Home

Part 2 - Section 3.2 through 3.4

1. 39 16 Now you can begin to format the appearance of the chart You can double click on the boxes to bring up the Format Up Bars window and change their color add shading etc Remove the legend and make your own like the one below Excel doesn t seem to have a legend that works for these graphs 17 To change the scale or fonts double click on those specific parts of the graph such as the site names on the X axis or the numeric values on the Y axis to open the Format Axis window and change the formatting scale or font size 18 To add a title go to the Chart Chart Options Title and fill in the appropriate title Also lines can be added to the chart to indicate water quality standards The final box and whisker may look like this a 2004 Fecal Coliform TMDL Monitoring 300 0 250 0 200 0 150 0 col 100 ml 100 0 50 0 0 0 CR23 5G130 Note If there is a large degree of difference between the sites you may want to adjust the scale to show the sites that are crunched up in a small data range You could also remove the sites 40 19 After completing the box and whisker plot save the style so that you can skip steps 1 12 the next time you want to create a box and whisker graph Do save the style right click on the chart and select Chart Type Click on the Custom Types tab Select the User defined button Click the Add button The Add Custom Char
2. If ties occur let q represent the number of tied groups and Wi represent the number of data points in STEP 2 Calculate the variance of S Fi S 7 the p group The variance of Sis FS rin D2n 3 w w 12w 5 E I I i i p j l Mi Calculate Z iiS 0 Z ifS or Z iiS 0 FS FSI Lise Table A 1 of Appendix A to find the critical value z such that 10011 0 of the normal distribution is below z For example if a 0 05 then z 1 645 For testing the hypothesis H ino trend against 1 H an upward trend reject H if Z gt z Or 2 H a downward trend reject H if Z 0 and the absolute value of 4 Z4 Figure 19 Directions for the Mann Kendall Procedure Using Normal Approximation for Samples Sizes Greater Than 10 from EPA Guidance for Data Quality Assessment A test for an upward trend with a 05 will be based on the 11 weekly measurements shown below STEP 1 Using Box 4 6 a triangular table was constructed of the possible differences A zero has been used if the difference is zero a sign if the difference is positive and a sign if the difference is negative 1 1 Mo of Mo of Signs Signs ul E3 a LA i Iz j me I terraa lz io 74 Go lz Er POE E E E E E E F Mu eum C Om uh onaman mlo umma ta L a 5 sum of signs sum of signs 35
3. c4 f4 f j f 4 25 Minimum maximum and range These statistics are self explanatory The minimum is the lowest value in the data set The maximum is the highest value in a data set Range is the difference between the minimum and the maximum Minimum and maximum values can easily be found in small data sets but equations like the MIN and MAX functions in Microsoft Excel can help find these values in a more numerous set of values in a spreadsheet Standard Deviation Standard variation 1s a measure of the amount of variance in a data set It is equal to the square root of the variance This calculation can be useful in determining precision for a set of replicate samples for example The standard equation for standard deviation 1s H Xx Xy i no In the equation above s standard deviation n the number of values in the data set X the first number of the data set X the second number and so on and Y 7 the mean of the data set Another way to calculate the standard deviation is shown below s the square root of Y X Y X n Y X Sum of the squares of the values n 1 XX Sum of the values n Number of values The easiest way to calculate standard deviation however is by using the Microsoft Excel equation STDEV A1 A5 where A1 A5 is an example of a range of cells that contain the data to be analyzed 2 06 3 22 QA QC Calculations Relative Percent Difference Calculating
4. calculators and with computer programs such as Convert Convert can be downloaded for free at http www joshmadison com software convert Now that you know for example that one Liter equals 0 03531467 cubic feet you still need to be able to conduct conversions based upon these conversion factors You will need to think back to your chemistry classes The point of a conversion is to arrive at the desired units For example if the average concentration of total suspended solids for a day 1s 50 milligrams per Liter mg L and the average rate of flow for the day 1s 500 cubic feet per second cfs how many tons per day were going through the monitoring site The desired units are tons day The beginning units are mg L and ft sec Equations can be created 1n Microsoft Excel to automate these calculations but first write out the equation and multiply by conversion factors to cancel out units until the desired units are achieved In this example we want to change seconds to days and milligrams to tons Liters and cubic feet ft3 are both measures of volume and will be canceled out of the equation 50mg 500ft 50mg 500 amp IL 86400see lkg IL sec lE 1 see 03531467 f 1 day 100 000 mg 611 643 83 kg day 611 643 83 ke Iton 674 36 tons day 1 day 907 ke After writing this conversion on paper it can be translated into a Microsoft Excel equation by noting the multiplication and division factors that are applied to the orig
5. 2 F 44 Series ir Rows 35 s 7 Skip the Chart Wizard Step 3 of 4 for now by clicking Next to continue 8 In the Chart Wizard Step 4 of 4 Chart Location you can choose the location of the graph It can either be placed in its own worksheet or in another worksheet that for example 1s dedicated to graphic analysis Chart Wizard Step 4 of 4 Chart Location Place chart As new sheet Charti 9 Now you have the beginnings of a chart that should look something like the one below You may need to adjust the scale and fonts to make sure the chart is readable This and other aspects of the appearance can also be adjusted when the chart is completed so it is not necessary at this point e min gm 25th median 75th 10 In the chart double click on the line that represents the maximum values in the data set In the Patterns tab remove the line by choosing None under Line change the Marker Style to a dash and change the Marker Foreground Color to black 36 Format Data Series Patterns Axis Y Error Bars Data Labels Series Order Options Line t Automatic Mone Custom Style B ubamatic Weight m Color Sample Marker f Automatic Mane Custom Style Foreground Background Mo Color Size 5 EI pts Shadow 11 Now the graph should look
6. and methods found in textbooks the goal of this document is to increase efficiency in data analysis Therefore the use of Microsoft Excel for the creation of scatter plots and trendlines is recommended In Excel a trendline regression line can easily added to a scatter plot Sections 2 25 and 2 31 give further instructions for creating and analyzing xy scatter plots in Excel The equation including the slope and the R coefficient of determination value for the line can be displayed on the graph as well Pearson s product moment correlation coefficient This is a commonly used method of correlation analysis that measures a linear relationship between two variables Possible values for the Pearson s correlation coefficient range from 1 to 1 Negative values signify a negative slope and positive values signify a positive slope A value of 1 represents a perfectly negative linear correlation A value of 1 indicates a perfectly positive linear correlation Values close to 0 indicate very little correlation between the two variables The closer the correlation coefficient is to 1 or 1 or the closer its square is to 1 the more correlation there is between the two variables The Pearson s correlation coefficient is calculated using the equation shown in the figure below taken from the EPA s Guidance for Data Quality Assessment Practical Methods for Data Analysis EPA QA G 9 44 It can also be calculated using the Microsoft Excel equ
7. be summarized in another form and the graphs may not necessarily appear in the final report or presentation Column graphs should be used with discrete data data that is not continuous Line graphs are used with continuous data Line graphs that are used for discrete measurements may mislead the viewer into thinking the data is continuous An example of a good line graph would be flow data that is collected at regular intervals hourly every 15 minutes Have a clear title Make sure you have simple clear label on the axes that shows reporting limits Use a scale size that reveals trends adjust it from the default scale to meet your needs Avoid clutter Illustrate information that allows the reader to get to the point quickly Use graphs only when they convey meaningful information When displaying data from multiple sites displaying information from upstream to downstream is an intuitive way to organize and present your results Consider the background and graph colors Do they print well Adjust colors to create a color scheme that will make sense to the reader Just do it Start in and play around with different types of graphs thankfully there 1s an undo button 30 Histogram Frequency Plot Histograms and frequency plots show the distribution of observations within a sample set They are usually used to visually assess the degree of scatter and whether the observations are normally distributed Meaning if the obse
8. set Boxplots show the relative positions of Q1 Q2 Q3 minimum and maximum are shown above a scaled real number line The minimum and maximum values of the data set are represented by lines drawn from the ends of the box The left side of the box represents Q1 the first quartile 25 of the samples are less than the value of Q1 Q3 is represented by the right side of the box and Q2 1s represented by a line drawn in the middle of the box They can be used to compare sites by placing a boxplot for each site on the same graph Box and whisker plots can also be used to determine if sites are even comparable If the boxes of two sites do not overlap the sites are not comparable This 1s because the best water quality of one site at its best 1s almost always worse than the water quality of the other site at its worst Thief River Watershed Total Suspended Solids Summary eae ee 15 MooseR 98 TR T57 Mud 40 TR 760 TRNof Thief L Outlet River Agassiz NWR Thief River Falls Figure 14 Boxplot of TSS results within the Thief River Watershed with map Several different methods for generating boxplots and histograms using software have been used by the RLWD One of these is the Analyse It software that can be purchased for approximately 100 as an add on for Microsoft Excel Existing Excel data can easily be used for the calculation of over 30 parametric amp non parametric statistics including descriptive statistics box whisker plots correlati
9. the relative percent difference RPD between samples and duplicates can be used to measure the precision of water quality measurements A smaller RPD indicates greater precision Standards for RPD may be set at the beginning of a monitoring program and included in a quality assurance project plan QAPP Acceptable RPD standards range from lt 20 to lt 30 in existing quality assurance plans from various agencies and laboratories The RPD between a sample and its duplicate 1s calculated by dividing the difference between the two samples by their average RPD Result 1 Result 2 Result 1 Result 2 2 100 Percent Recovery Percent recovery is a test of the accuracy of laboratory methods It is essentially a ratio of the measured value versus the expected value This test can be applied to performance evaluation sample results Performance evaluation samples are prepared by a third party and have a known concentration The percent recovery for a set of performance evaluation samples is equal to the measured concentration divided by the actual concentration then multiplied by 100 Percent recovery calculations can also be used as a method of quality control to determine if there is something in the sample or in the analytical technique that is interfering with the test A set of duplicate samples 1s created from the original real sample A matrix spike with a known concentration of the target analyte 1s added to one of the duplicate sample
10. 13 22 There are several observations tied at 10 and 15 Thus the formula for tied values will be used In this formula g 2 t 4 for tied values of 10 and t 2 for tied values of 15 Ws pa Ql 5 44 D 2 4 2 5 4 amp Fay From Table 4 1 of Appendix A Z 1 645 Since Sis positive 4 H is the altemative of interest Therefore since 1 605 is not greater than 1 645 Hais not rejected Therefore there is not enough evidence to determine that there is an upward trend Figure 20 Example of Mann Kendall Trend Test by Normal Approximation for Sample Sizes of 10 or More From EPA Guidance for Data Quality Assessment 60 Table 4 Critical Values of t Distribution Table A 1 from Appendix A of the EPA Guidance for Data Quality Assessment for Steps 5 6 in Figure 20 TABLE A 1 CRITICAL VALUES OF STUDENT S t DISTRIBUTION Decrees of Freedom 1 000 0 816 0 765 0 74 0 727 0 906 l3 Ae PX AAT 3 143 3 707 0 896 14 A15 U 2 365 2 998 3 499 0 889 108 AT SB 2 MG 2 896 3 355 3 250 0 697 0 695 0 694 0 692 0 69 0 690 0 089 0 088 0 6880 BST 0 686 BSY DEA 32 T 2 080 686 LE OG 32 A17 2074 0 085 LE he 319 Jl 2 069 0 085 AST 5 Ale Y 2 064 0 0e4 BSG O58 ALG TUR 2 060 0 684 0 684 0 683 0 683 0 683 PJ B3 PJ PJ Po O68 D 8 O30 ADD Asta 2 021 0 679 0 848 O46 9G 67 2 000 0 677 0 845 i 289 GAR 1 980 0 674 0 842 O36 28 Gd 1 960 PJ B3 PJ bd Mote The last row of the tablei deg
11. 3 2 Statistical Analysis Procedures There are many different types of statistical analysis that can be performed on water quality data sets for reporting and interpretation purposes Many inferences can be made about data from simple statistics such as mean minimum maximum median range and standard deviation Here is a quick review of how these statistics are calculated and how they can be used for analysis of water monitoring data Also included in this section are some slightly more advance statistics The following table derived from the MPCA s Volunteer Surface Water Monitoring Guide provides some guidance on the particular uses of these statistical methods Table 1 Suggested Statistical Summaries for General Chemical and Physical Parameters Adapted from We Have Stream Data Now What Statistical Summar Flow Weighted Confidence Intervals or Standard Deviation Seasonal Average Seasonal Median Geometric Mean Parameter Total Suspended Solids Temperature Dissolved Oxygen Turbidit Average Range Rame aii i Conductivity i i ET Chlorophyll a GRE Water Clarity Transparency Bacteria Average Nm 3 21 Statistics Median The median of a data set is the middle value after all the values have been ranked in order of value The median can easily be picked out in small data sets or can be calculated with the MEDIAN equation in Microsoft Excel for large data sets Mean The mean or a
12. 96 27 8 11 04 96 6 8 3 0670497 11 52 Use DATE as the date column heading A2 for all worksheets 0 09 17 97 5 85 12 09 97 053 05 13 98 g 8 04 98 g SE 10 19 98 3B ELS 01 05 99 5 46 4 06 89 41 5 28 89 23 318 10 21 88 12 e PE 02 24 00 5 No Gaps in Data 2 4 18 00 48 A 07 19 00 4B EN 10 17 00 10 03 01 01 4 24A 4 17 01 32 25 7 23 01 23 Automatic Arial q2 117047105 2 47 PM Li Ready Figure 21 Lotus Spreadsheet Configured for FLUX When the data to be analyzed has been set up in this fashion the FLUX program can be started Once you have gotten to the main menu you will need to tell the program to read your data Use the arrow keys to navigate the menu system from DATA down to READ and then down to RESET and then hit enter The program will then switch to the FLUX INPUT SCREEN shown below Gy C model flux FLUX EXE z o x F8 HELP FIELD F7 HELPA AEDITOH lt ESC gt ABORT Fi HELP F2 DONE SAVE F3 EDIT FIELD Figure 22 FLUX Input Screen 64 On the FLUX Input Screen Figure 12 enter a title such as the site name and the parameter being analyzed Then enter the DOS PATH which 1s the location of the folder in which the data 1s stored store the flow data and parameter data in the same folder as it would appear in the address bar of Windows Explorer In figure 12 the DOS path is C model DMData Be sure to include a backslash at the end of the DOS path If you are unsure of t
13. Cancel lt Back Finish In Step 3 you can edit details of your chart such as the chart title and axis labels Click next when you are finished to go to the next step In Step 4 of the chart wizard process simply select where you want the chart to appear and click finish Your time series graph is now complete There are several aesthetic alterations that can be made to the graph at this point by right clicking on the axis data series or chart area and using the respective formatting windows To apply regression to your graph to try to find a trend right click on your data series and select Add Trendline The Add Trendline window will now be visible on your screen Select Linear for the graph type and then click on the Options tab Under this tab you may choose to display the equation on the chart or display the r squared value if you so desire Press OK 54 11 A trendline will now be visible on your chart The slope of this line will indicate the direction of the trend in your data If a linear trendline doesn t show a trend there are other types of trendlines to try The types available in Microsoft Excel include logarithmic polynomial power exponential and moving average trendlines A moving average trendline is particularly useful for use on long term monitoring data sets from sites that have experienced both upward and downward trends over time 3 32 Statistical Trend Detection Methods If a trend is n
14. OWESS seasonal Kendall test data transformations and step trend analysis The EPA Guidance for Data Quality Assessment covers trend detection methods such as regression Sen s slope estimator seasonal Kendall slope estimator and hypothesis tests for detecting trends A concept behind some types of statistical analysis for trend detection involves disproving the null hypothesis which states that there 1s no trend In other words if there 1s not enough proof to say there is not a trend than a trend may exist Some of the tests and techniques do approximately the same thing that the Excel method described in Section 2 31 can do for you Some involve data transformations natural log to improve the performance of statistical tests Others involve techniques to determine a trend by reducing variability seasonality or by reducing the influence of flow on results LOWESS LOcally WEighted Scatterplot Smooth is a nonparametric method used to create a smooth line through a scatterplot It is useful when there 1s a non linear relationship between time x and concentration y Adding a moving average trendline to a scatter plot in Microsoft Excel will essentially accomplish this type of plot 55 Dealing with seasonality There are many exogenous variables external factors that can affect sample results and make trend detection difficult The variation of environmental conditions from season to season 1s one of these exogenous variables Sampl
15. and totals for rows Fields per column BH _ amp utaFarmat table o Subtotal hidden page items 7 ices e Far error values show 1 erge labels f Preserve formatting e For empty cells show Data options Data source options External data aptions wf Save data with table layout Save password wf Enable drilldown Background query _ Refresh on open Optimize memory 10 The window for PivotTable Wizard Step 3 of 3 will be active again Click Finish and the table will appear in the spreadsheet Here s the upper left corner of the table based on this example Note the field names 26 Count of PH Dare A 27 SITE NUMBER 9 3 00 S 30 00 10 1 00 10 28 00 20 BGR 1 25 2d CFBS VV I 30 ESHE 0 30 U 3l FC 116 I 34 FE 440 33 FC 453 34 LOR 0 1 U 35 MB Ha 11 If the values for pH in this case are not the ones from the source data it may be because they are actually calculated values In this case the values that appear in the cells are actually a count of the number of values in each cell of the source data This is stated in the upper left cell which says Count of PH What if we want to show the actual pH values Unfortunately PivotTables only display the results of calculations functions In this case the table is displaying the results 50 of calculation which counts the number of values in each cell This is easy t
16. arting you will need to create a column of values that will specify the borders of the intervals within the histogram you will be creating To see if this add in 1s loaded in your version of Excel click on Tools menu If you do not see Data Analysis in the Tools menu click on Add ins instead A window will appear that shows a list of possible add ins for Excel Check the box for Analysis ToolPak and click OK to install the add in You will likely need to insert your Microsoft Office CD in order to complete the installation Once the installation process 1s complete you can open the data analysis window by clicking on Data Analysis in the Tools menu Within this window you can see all the different types of statistical analysis that can be performed with this tool To create a histogram double click on Histogram in the list of options The histogram window will then appear In this window you will need to specify the input range This is the set of values you want to 39 analyze The BIN range is the column of numbers that you created at the beginning of these instructions Indicate where you want the histogram to appear by specifying an output range or by telling the program to create a new worksheet Check the chart output box to get a bar chart histogram When you click OK the program will create the histogram Boxplots Creating boxplots or box and whisker plots is another method for visually representing the distributions within a data
17. at the preview of the chart under the Format Data Series gt Options tab to determine whether or not you need to adjust the gap width Step 15 and 19 can be skipped but steps 16 18 are still needed in order to adjust the appearance of the graph add a title etc The following page shows what Steps 4 6 will look like when using the custom chart type for boxplots created in Step 19 aa zd Chart Wizard Step 1 of 4 Chart Type Standard Tvpes Coton e Chart type Sample 2004 Fecal Colf orm TMOL 350 0 300 0 250 0 5200 0 E 150 0 100 0 St 0 0 d gi 67 7 Sele Bm nm Make sure there are series For the 25th User derine percentile minimum median maximum f Built in and 75th percentile in that order Cancel next gt Finish l Chart Wizard Step 2 of 4 Chart Source Data U El r Dats Range Series E O04 Focal Colform TMOL Meaitorisg o fI 700 0 o 0 rT of FSS oc cog cll ll al CASS 2130 Datarange nDospo l amp bz E 4 Te T jes in L 7 Cancel z Eaz Einish 42 3 25 Measures of Association Correlation matrixes Pearson s correlation coefficient Spearman s rank correlation coefficient and serial correlation coefficient are all measures of association in data sets In other words the purpose of determining correlation 1s to tell how closely x and y values are related 1 e water temperature and dissolved oxygen or turbi
18. ata sets To calculate a trimmed mean calculate the mean of only the data that falls between the 25 and 75 percentiles of a data set Trimmed mean can be automatically calculated in Microsoft Excel by using the equation TRIMMEAN See the following section on quartiles to learn how to calculate the 25 and 75 percentiles Percentiles and Quartiles Percentiles are a measure of the relative position of a single value within a data set They are more valuable when applied to large data sets versus small ones Percentiles are labeled P1 Ps Pos etc The subscript number refers to the percentage of the values in the data set that are smaller than the value of the percentile So if the P39 percentile of a data set equals 10 30 of the measurements are less than 10 and 70 of the measurements are greater than 10 Three particular percentiles are used quite frequently in statistical analysis These are P25 Pso and P75 These percentiles are also referred to as the 1 2 and 3 quartiles or Qi Qo and Q3 respectively Other percentiles that are commonly used include the 5 and the 95 percentiles 24 Percentiles and quartiles are another type of statistical analysis that can be performed using Microsoft Excel and other computer programs Many programs that calculate a set of summary statistics will include the 1 2 and 3 quartiles To perform this calculation using a Microsoft Excel function simply go to Insert gt gt Function cl
19. ation PEARSON To insert this function into a cell go to Insert gt Function highlight the statistical category of available functions and then double click PEARSON or highlight it and click OK A box will then appear that will ask for the two data sets that will be analyzed for correlation array and array 2 Excel also has a CORREL function for calculating a correlation coefficient Box 2 6 Directions for Calculating Pearson s Correlation Coefficient with an Example Let X X X represent one variable of the n data points and let Ya Yn represent a second variable of the n data points The Pearson correlation coefficient r between X and Y is computed by E XT p er cU l H x no i vx p i YY y Dar Te yy _ i l n ilo n Example Consider the follwing data set in ppbj Sample 1 arsenic X 8 0 lead Y 8 0 Sample 2 arsenic 6 0 lead 7 0 Sample 3 arsenic 2 0 lead 7 0 and Sample 4 arsenic 1 0 0 865 E28 kia 105 z Since ri dose to 1 there is a strong linear relationship between these two contaminants Figure 16 Equations and Directions for Calculating Pearson s Correlation Coefficient by Hand Spearman s correlation is a method for calculating correlation coefficient that 1s less sensitive to extreme values than the Pearson s correlation coefficient and is not affected by transformed data For this method the same equation is used for calculating t
20. by time A maximum of five strata can be created in FLUX Stratifying data can improve the accuracy of load estimates as long as there are enough samples As with finding the best calculation method finding the best stratification method also involves trying to get the lowest CV possible In this example data will be stratified by flow FLUX will automatically set the boundaries of flow strata To stratify using dates or another stratification system use the General stratification option and the number of strata needed under Stratify in the MAIN MENU Then input dates or other values to specify boundaries between strata After choosing the best calculation estimation method highlight LIST and then BREAKDOWNS in the submenu and press ENTER to get the breakdowns by stratum Since the default stratification scheme is one stratum no stratification the breakdown results will be for one stratum the first time you do this Breakdowns show the number of samples flow volume per year in HM yr FLUX in Kg yr total volume in HM total mass in Kg mean concentration in ppb and the coefficient of variance CV Note the CV value 147 in the example below and press ESC to return to the main menu Now you will try to use additional strata in an attempt to decrease the CV 67 USE KEYPAD lt F1 gt HELP lt F8 gt SAUE lt ESC gt QUIT OUTPUT Figure 25 Breakdowns Screen This step demonstrated in the screen shot below is used to tes
21. cells within the table where it 1s needed If the cell reference is correct in the first cell it should be correct in the others as well because the cell reference within the equation based upon the receiving cells position relative to the cell the equation is copied from Zero values will have to be entered manually if an if then equation 1f an if then equation such as the example is used because zero values will be transformed into signs when the equation is initially copied across the table 58 Table 3 Table A 11 from Appendix A of the EPA Guidance for Data Quality Assessment 0 408 0 247 0 117 0 042 0 0084 0 548 0 452 0 460 0 274 0 199 0 138 0 089 0 054 0 04 0 016 0 007 0 0028 OCIS D Or 69 0 000025 0 54 0 SOC 0 460 3 0 360 386 0 38 5 235 0 28 0 306 T 0 136 0 19 0 238 0 068 0 119 0 179 0 028 0 068 0 130 3 00083 OLD 8 0 090 3 0 0014 0 015 0 060 0010S 4 0 038 0 0014 0 022 0 00020 0 012 TREERE 0 0029 0 0012 0 04 0 000 2 0 000025 0 ODOOO S 59 0 500 0 43 0 364 0 300 0 242 0 190 0 146 0 108 0 078 0 054 0 036 0 023 0 014 0 0083 0 0046 0 0023 0 0011 0 00047 0 00018 0 000058 0 000015 0000028 0 OOOO 28 Box 4 9 Directions for the Mann Kendall Procedure Using Normal Approximation If the sample size is 10 or more a normal approximation to the Mann Kendall procedure may be used STEP 1 Complete steps 1 2 and 3 of Box 4 7 nin Want 5
22. d intervals of 5 ppm will be used 0 5 ppm 5 10 ppm ete The endpoint convention will be that values are placed in the highest interval containing the value For example a value of 5 ppm will be placed in the interval 5 16 ppm instead of O 5 ppm SIEPZ The table below shows the number of observations within each interval defined in Step 1 STEPS The horizontal axis for the data is from O to 40 ppm The vertical axis for the frequency plot is from 10 and the vertical axis for the histogram is from 09s 1095 Wations total so the number observations shown in the table below will be he results are shown in column 3 of the table below SIEP4 There are 22 obse 2 T divided by 22 A common unit for this data is 1 ppm In each interval there are 5 common units so the percentage of observations column 3 of the table below should be divided by 5 reolumn 4 The frequency plot is shown in Figure z 1 and the histogram is shown in Figure 2 2 of Obs of Obs of Obs Interval in Interval in Interval per ppm O amp ppm a 10 5 10 ppm 13 60 10 15 ppm 35 36 15 20 ppm B ff et 20 25 ppm 4 55 25 30 ppm 4 55 30 35 ppm o 00 35 40 ppm 4 55 Figure 13 Example of Generating a Histogram and a Frequency Plot en M tn G3 oo ui ut EG E E un c The most common available option for the creation of a histogram within a spreadsheet is likely to be the data analysis add in for Microsoft Excel Before st
23. dity and total suspended solids Correlation matrixes are a graphical method of determining correlation In Microsoft Excel x values can be plotted against y values in a scatter plot This scatter plot can be created using methods similar to those described in section 2 3 A time series plot may be considered a correlation matrix of comparing water quality data to time This can be used as a quick way to determine correlation between two sets of data The difference between time series plots and correlation plots 1s that the data points are not chronological on correlation matrixes and correlation matrixes can have parameters on both the x and the y axis instead of just on the y axis In Microsoft Excel a trendline can be added to the data plot by right clicking on the data points and selecting Add Trendline and checking the Display R box under the Options tab in the Add Trendline window A user can visually assess how well the plotted points are clustered along the trendline and by observing the R value The R value also shows how reliably the equation of the trendline can be used to predict y values based on x values It is the square of the correlation coefficient An R value that is close to 1 indicates a close association between x and y values Since not all trends are linear using a trendline in Excel gives the user the advantage of being able to create polynomial exponential logarithmic and moving average trendlines When
24. e Dates If you receive an error instead of a list similar to the one above you will need to check the information entered into the FLUX INPUT SCREEN especially the data file location and file name Check to make sure that the data in the spreadsheets is entered correctly and make sure the data 1s arranged correctly on the spreadsheet Once data 1s loaded into FLUX one of the programs primary functions 1s calculating the load over the time period specified If multiple years of data are used it will calculate the average annual load If the data 1s stratified by season and includes multiple years of data it can calculate the average load for each season 65 One of the most time consuming parts of using FLUX is the determination of which calculation and stratification methods produce the most accurate results The best calculation method is found first and then that calculation method is applied to several different stratification schemes in an effort to find the lowest coefficient of variance The coefficient of variance 1s a measure of the accuracy of the estimate A lower CV means a higher level of accuracy in the model s calculations FLUX uses several different calculation methods Direct mean loading Flow weighted concentration ratio estimate Modified ratio estimate Regression first order Regression second order Regression applied to individual daily flows DEP ra a Fortunately knowledge of how all these ca
25. e results vary from season to season within a year This variation due to weather biological activity natural activities wildlife agricultural activity groundwater influence and surface runoff influence can make discerning a trend from an entire data set difficult A particular level of discharge can either come from either ground water or surface runoff depending on the time of the year so seasonal stratification makes more sense than flow stratification for trend analysis unless there 1s enough data to stratify by both season and flow In order to minimize the influence of seasons data can be stratified by season This way the sample results within each data set will have been influenced by similar environmental factors Finding a trend from summer data for example may be more successful than trying to find a trend from data from all seasons There may be upward trends 1n some seasons and downward trends in others even at the same monitoring site Trends may appear in seasonally stratified data that do not appear in the entire data set This may happen if both upward and downward trends exist for separate seasons that may cancel each other out when all the data 1s combined Seasonal strata can be quarterly four per year or monthly twelve per year Quarterly stratification will yield a more manageable amount of results than monthly stratification Once data has been stratified the Excel method described in this document can be appli
26. ear relationship between the two parameters Using a correlation matrix to identify and remove outliers can help increase any correlation coefficient This affects the Pearson s correlation coefficient more than it affects the Spearman s correlation coefficient since the Spearman s coefficient 1s affected less by extreme values After removing only two outliers in the site 760 TSS vs flow data set the Pearson s correlation coefficient increased from 27 to 55 while the Spearman s correlation coefficient only increased to 74 from 76 Since a data set with nearly zero correlation can be made to look like one with a good correlation if enough outlying data 1s removed the practice of removing a large number of outliers in order to improve correlation plots 1s not encouraged Instead analysis for association using the Spearman s correlation coefficient transformation of data to natural log values or using polynomial trendlines in Microsoft Excel may be used if a correlation 1s not found with other methods 3 26 Pivot Tables The user guide for Microsoft Excel describes a pivot table as an interactive worksheet table that quickly summarizes large amounts of data using a format and calculation methods you choose It is called a pivot table because you can rotate its row and column headings around the core data area to give you different views of the source data sic They are useful for summarizing large amounts of data such as continuous
27. easurement is then compared to previous measurements to determine whether there 1s a positive difference or a negative difference 56 Within this matrix the horizontal measurements are compared with those of vertical measurements The value from the vertical axis 1s subtracted from the value of the each measurement on the horizontal axis A plus or a minus is recorded to indicate whether the relationship is positive or negative values of 0 are not recorded on the table The number of pluses and the number of minuses are then added for each row and totaled at the bottom of the table The total number of minuses is subtracted from the total number of pluses Original Time i E time from earliest to latest Measuremeril MN factual values recorded Original Time i E 5 amp of 8 of Measurement u F e X Differences Differences O lt 0 NOTE X Y 0 do not contribute to either total and are Total gt 0 Total discarded where Y sign X A IX X O 0ifX X 0D e d4dXxX X 0 Figure 17 Upper Triangular Data for Basic Mann Kendall Trend Test with a Single Measurement at Each Time Point EPA Guidance for Data Quality Assessment e Su Consider 5 measurements ordered by the time of their collection 5 6 11 8 and 10 This data will be used to test the null hypothesis Ha no trend versus the altemative hypothesis H of an upward trend at an a 0 05 significance level STEP 1 The data listed i
28. ecord the breakdowns for the calculation method and stratification method combination with the lowest CV Note that in Figure 17 the CV was lowest using calculation method 5 regression second order and using this calculation method the CV was lower with two strata than for one stratum 130 vs 147 respectively Adding another strata did not reduce the CV any further 69 Gy C model flux FLUX EXE nx k ai mi x a Figure 28 Noting the Coefficient of Variance When the most accurate method has been found the values for flow cubic hectometers per year flux Kilograms per year total volume hectometers mass Kilograms and flow weighted concentration parts per billion can be recorded from the breakdowns page FLUX can also be used to evaluate your monitoring program Modeling results can be biased based upon the distribution of samples Since most of the sediment and nutrient loading from rivers occurs during high flows the majority of samples should be collected during high flows to achieve the most accurate annual load estimations FLUX contains a function that determines the optimal percentage of samples that should be collected for each stratum When the data has been stratified by FLUX whether by flow or temporally the distribution of the sample data with the optimal distribution of samples can be compared For example under a flow stratification system of high versus low flows the majority of sample
29. ed to each season s data set to create time series plots The seasonal Kendall test and regression analysis are two statistical methods that can be applied to seasonally stratified data 1n order to find a trend sen s Slope Estimator For this nonparametric alternative method for finding a slope the slopes between each set of points 1n time are calculated first The median of all these slopes is then used as the overall slope seasonal Kendall Test This slope test can be used to account for cyclical trends The concept presented by this test is that a trend may be evident if slope is calculated for each season month or week Mann Kendall Trend Test This method is used for testing a hypothesis for the purpose of trend detection This test involves calculating the statistic S by examining the individual slopes between all possible pairs of data A large negative value for S indicates a decreasing trend A large positive S value represents an increasing trend The null hypothesis or Ho is that there is no trend The alternative hypothesis Ha is that there is either an upward trend or a downward trend To calculate the Mann Kendall trend test list all observations 1n chronological order from left to right horizontally across the top of the table beginning in the same corner of a table as the horizontal lists Also list all measurements except for the last chronologically vertically from the top to bottom along the left side of the table Each m
30. he coefficient as the Pearson s coefficient but there 1s a data transformation involved The values for each variable are changed to their rank within their respective data sets This 1s relatively simple to do in Microsoft Excel New columns can be added to a spreadsheet next to each column of raw or transformed data that 1s going to be used for the correlation analysis Input the rank of each value into its respective new column Hint the Data gt Sort function and the sort ascending 2 button are useful for this task Once the ranks have been entered the correlation efficient 1s determined for each variable s ranking data If there is not a good statistical relationship between each variable Pearson s coefficient this type of correlation analysis will determine if larger values of x correlate with larger values of y and smaller values of x correlate with smaller values of y _45 For example the Pearson s correlation coefficient calculated to determine the correlation between total suspended solids and flow at site 760 on the Thief River was only 27 This indicates that there is not a strong relationship between the two variables However the Spearman s method resulted in a correlation coefficient of 74 which indicates a stronger relationship than the Pearson s correlation coefficient This tells us that higher flows at the monitoring site may be related to higher levels of total suspended solids even though there is not a lin
31. he exact path for your data file use Windows Explorer to find the file and use the path shown in the address bar to get the correct file location The FLOW DATA FILE is the name FLOW760 and extension wk1 of the Lotus file in which the flow data is stored FLOW LABEL is simply the column heading for flow within this spreadsheet SAMPLE DATA FILE is the name and extension of the file containing the sample data and CONC VARIABLE is the column heading for the sample data Entering LOOKUP for the flow variable will tell the program to lookup the corresponding flow for each sample result SAMPLE DATE RANGE and FLOW DATE RANGE are filled in with the beginning date on the left and the ending date on the right Press F2 when you are done If everything goes right you will get a screen with the information listed below that lists statistics such as the number of flow records and the number of samples like the one below You can then hit escape until you get back to the main menu Locating Sample File OPENING SAMPLE FILE TSS760 WK1 SAMPLE CONCENTRATION FIELD TSS CONCENTRATION UNITS FACTOR 1000 000000 Flow Scale Factor 8937 Conc Scale Factor 1000 0000 Reading Samples THIEF RIVER 760 NUMBER OF SAMPLES 16 Reading Flows OPENING FLOW FILE FLOW760 WK1 FLOW FIELD FLOW THIEF RIVER 760 NUMBER OF FLOW RECORDS 6999 substituting Daily Flows for Sample Flows Flow Concentration Pairs 16 Missing or Zero Flows on Sampl
32. hich there 1s data available 1 e April through October By comparing annual or seasonal loads the relative impact that a watersheds or subwatershed 1s having on water quality can be quantified There are a number of software programs that calculate loads and can estimate annual seasonal loads based upon flow and water quality data Some of these are free such as FLUX and Basins Others can be somewhat expensive The free versions are in some cases preferred by resource professionals because the models and the methods used within the models do not change as much as purchased software This makes it easier to compare results from different monitoring programs The RLWD currently uses FLUX for load estimation It is a DOS based program distributed by the U S Army Corps of Engineers that was developed by Dr William W Walker Some of the advantages of this program are that it is reliable relatively accurate and provides a lot of information for each data set There are some negatives and annoyances that have been encountered with _ 62 FLUX The old version that was not Y2K compliant this has since been fixed Data has to be transferred into Lotus spreadsheets before it can be loaded into the model causing extra work for Excel and Access users The program is very fussy about the organization of data within the spreadsheets The user manual does not always cover the quirks of the program very well This section will provide some tips that wi
33. how spatial trends like changes in water quality along the length of a stream Whether it is applied temporally or spatially trend analysis can be used to identify areas where water quality is being improved or degraded 3 31 Graphical Trend Analysis Methods Spreadsheet programs such as Microsoft Excel are a popular method for the easy creation of graphs showing trends in data Time series plots are created easily within this program Due to the seasonal variability of water quality measurements however identifying trends can still be difficult Software based regression analysis can be applied in order to smooth out the variation and show overall trends over a period of time Regression analysis can be easily applied within Excel using a trendline The methods below list the steps necessary for creating a simple time series plot and add a trendline to see if there is a trend in the data 1 The quickest and easiest way to start a time series plot is to highlight the two columns or rows of data that you will be using Highlight the values within the date column row that you wish to use for the graph and while holding the control key down select the corresponding values for your parameter as well 2 Now that your data 1s selected there are two ways to get to the chart wizard a Click the chart wizard button on your tool bar jii b Click on the Insert pull down menu and then click on Chart Insert Format Too Worksheet ial Char
34. ick on statistical and then choose either PERCENTILE or QUARTILE Choose the PERCENTILE function for percentiles other than the quartiles because you can input the percentile you wish to calculate between 0 and 1 QUARTILES is a simplified version of the PERCENTILE function The desired quartile is entered into the Quart field 0 for minimum for QI 2 for Q2 3 for Q3 and 4 for maximum Whichever function you choose a window with two fields will appear Enter the range of values to be analyzed into the Array field and indicate the desired percentile or quartile in the bottom field Click OK when the information has been correctly entered into the fields Loads Loads are calculated by multiplying concentration by flow volume Daily average concentrations and or flows can be used for continuous monitoring programs Often however only one measurement for each will be available for each sampling day Instantaneous loads can still be calculated with this data Loads in milligrams mg per second sec can be calculated by multiplying the concentration in milligrams per liter mg L or ppm by the flow in cubic feet per second ft sec or cfs and then multiplying by a conversion factor of 28 31685 L 1 ft Milligrams per day can be calculated by multiplying the mg sec result by a conversion factor of 86 400 sec day After this any other conversion factors can be applied Kilograms per day can be calculated by multiplying the mg day result by a conver
35. inal values If the 50 mg L 1s in cell A2 the 500 ft sec value is in cell B2 and you wish to calculate the load in tons day in cell C2 here is what the equation should look like in cell 2 A2 B2 86400 0353 1467 100000 907 or a simplified version A2 B2 86400 3203040 569 28 Table 2 Useful Conversions for Water Quality Data Analysis Common Conversions for the Water Quality Monitor 1 gram g 1000 milligrams mg township twp 36 sections sect 1 ton tn 2000 pounds Ibs section sect 1 square mile mi 1 kilogram kg 1000 grams g township twp 36 square miles mi 1 kilogram kg 2 20462 pounds lbs acre ac 43 560 square feet ft 1 pound Ib 453 5924 grams g square mile mi 2 640 acres ac square mile mi 2 589988 square kilometers km square foot 144 square inches in square meter m2 10 76391 square feet ft N N 1 mile mi 5280 feet ft 1 mile mi 1 609344 kilometers km 1 meter m 1 09361 yards yd Presue i ii 1 day 24 hours hrs 1 hour hr 60 minutes min 1 cubic foot f 28 31685 liters L 1 minute min 60 seconds sec 1 hour hr 2 3600 seconds sec 1 day 86 400 seconds sec liter L 2 33 81402 ounces oz cubic yard yd3 27 cubic feet Concentration 1 cubic foot second cfs 646316 9 1 milligram liter mg L 1 part per million ppm 1000 gallon
36. lculation methods work is not needed in order to run the model In order to choose the best calculation method for your data you will need to determine which method 1s the most accurate or which method has the lowest coefficient of variance FLUX calculates this value To find the method with the lowest coefficient of variance use your arrow keys to highlight CALCULATE in the main menu Highlight LOADS in the submenu that appears and press the ENTER key In the resulting window there will be a list of annual load results for each calculation method Make note of which method has the lowest CV and press the Esc key to get back to the main menu In the example below method 5 CV 147 will be the most accurate of the six methods ec C model flux FLUX EXE E x USE KEYPAD lt Fi gt HELP lt F8 gt SAVE lt ESC gt QUIT OUTPUT Figure 23 FLUX Calculated Loads Screen 66 The program must now be told to use the desired calculation method 5 in the example for use in the subsequent calculation of loads The program will apply the selected calculation method to each stratification method that you apply to the data To select a calculation method return to the MAIN MENU and highlight METHOD Then in the submenu highlight the chosen method and press the ENTER key Gy C model flux FLUX EXE J z a Figure 24 Choosing a Load Calculation Method in FLUX Stratification 1s a process that splits the data into groups by flow or
37. ll hopefully make the learning process a little smoother for those who wish to use the FLUX program The first step in creating load estimates is the collection of data Higher numbers of samples will generally result in load estimations of higher accuracy Also the collection of flow data 1s very important Daily average flow data should be obtained for the entire period of record that will be modeled This can be done using flow data from a nearby USGS gauge or by installing continuous stage recording equipment collecting a range of flow measurements and creating rating curves to estimate flows based on the stage data For more information on stream gauging flow monitoring and the creation of rating curves see Section 9 0 of the Standard Operating Procedures for Water Quality Monitoring in the Red River Watershed and Section 3 56 of this manual The next step 1s the preparation of data so that it can be used by FLUX For this step data can be prepared and organized in Excel much more quickly and easily than in Lotus 1 2 3 A separate work sheet 1s needed for each parameter and for flow Creating a workbook for each site and worksheets for each parameter within each workbook 1s recommended This is because there usually is less sampling data than flow data available If there 1s not a sample result for each day that there 1s a value for flow there will be gaps in the parameter data if it 1s placed in a column next to the flow data within the
38. monitoring data from which daily averages can be calculated from hourly data by creating a pivot table Tables can be created that summarize a data set using sum average maximum minimum standard deviation variance count or product calculations The following is a set of step by step directions that show how to create a basic pivot table Although menu composition precise methods and window appearance may vary among different versions of Microsoft Excel the basic process for creating the tables should be the same 1 Open an Excel file that contains a worksheet with the raw data you wish to analyze 2 Arrange the data so that columns represent fields and rows represent records 3 Start the PivotTable wizard There are two ways to do this a Click on the Pivot Table Wizard button lie in the standard toolbar 46 b Goto View Toolbars and select Pivot Table Wizard The pivot table toolbar will then be visible Click on Wizard in the PivotTable pull down menu on the toolbar PivotTable le Wizard Select k Formulas k 4 The first step of the pivot table wizard will then appear as a window For this example a pivot table will be created from an Excel database Select the Microsoft Excel List or Database option and the Pivot Table option and click Next PivotTable and PivotChart Wizard Step 1 of 3 where is Ehe daba Ehab vou want ko analyze Microsoft Excel list or database Externmab data sac t Multiple co
39. n order by time are 5 6 11 8 10 STEP 2 A triangular table see Box 4 6 was used to construct the possible differences The sum of signs of the differences across the rows are shown in the columns and 5 Time 1 2 3 4 Data 5 Wea of No of 10 Signs im e eu un m ra 5 G 1 8 cof cius ajo Pi co Using the table above 8 2 6 From Table 4 11 of Append A for n 5 and 6 p 0 117 Since S gt 0 but p 0 117 0 05 the null hypothesis is not rejected Therefore there is not enough evidence to conclude that there is an increasing trend in the data Figure 18 An Example of Mann Kendall Trend Test for Small Sample Sizes EPA Guidance for Data Quality Assessment To save a little time an equation can be used to arrive at the final table in Microsoft Excel An if then equation like F H15 lt 0 can be used This equation will determine whether or not the value in a cell is below zero and if it 1s it will display a negative sign in its cell It will display a positive sign for every value greater than or equal to zero Create a copy of the table containing the difference calculations and replace the values in the copy with the if then equation Start by placing the equation in one of the cells and making sure that it works properly Make sure the cell reference H15 in the example points to the corresponding place in the original table Copy the equation to the other
40. n the Analyze or Download Data page for each monitoring site To get to this page go to the RLWD website at www redlakewatershed org click on the Water Quality section search for a site using the interactive map or text search tools click on a blue site ID number the link to the informational pages for the monitoring site and then click on the Analyze or Download Data tab Scroll down to the blue link for the current version of StatCrunch After you have created an free account the software will automatically load the data from the monitoring site into the program The data can then be analyzed using nearly 3 any type of applicable statistical or graphical analysis The statistics available in StatCrunch include correlation covariance summary statistics for columns or rows frequency tables contingency tables z statistics proportions variance regression t statistics ANOVA and control charts The options available in StatCrunch for graphical analysis include bar plots pie charts histograms stem and leaf plots boxplots dot plots means plots QQ plots scatter plots index plots chart group statistics parallel coordinates pairs plots 3D rotating plots and color schemes Consider the following 22 samples of a contaminant concentration fin ppm 17 7 17 4 22 8 35 5 28 6 17 2 19 1 4 7 2 4 15 2 14 7 14 9 10 9 12 4 12 4 11 6 14 7 10 2 5 2 16 5 and 8 8 STEP 1 This data spans 40 ppm Equally size
41. ng site that may have had an effect on water quality and the dates of these actions are known they should be considered during trend analysis These actions could include the removal of a dam an upgrade to a waste water treatment plant erosion control projects impoundments implementation of buffers within the watershed and lake restoration projects The original data set may be split into before and after data sets Make sure that the data split is based on the timing of the event and not based upon an examination of water quality data or bias may be introduced into the analysis processes and trend analysis may show changes that aren t really there For more information on statistical methods for trend detection consult a statistics textbook or a free resource like the EPA Guidance for Data Quality Assessment Practical Methods for Data Analysis EPA QA G 9 QA00 Update that is available for free online at http www epa gov quality qs docs g9 final pdf 3 4 Developing Load Estimates Load estimates are used to determine the mass of a substance being carried by a river or stream through a sampling site within a particular amount of time Loads can be calculated on an annual or a seasonal basis depending upon how much data is available Annual loads can only be accurately estimated when there 1s a full year s worth of data If a full year s worth of data is not available seasonal estimates can be done for the period of time for w
42. nsolidation ranges What kind of report do vau want bo create t PiyvotTable t PivotChark with PivotTable IL ill IEE Cancel z Bacl Finish 5 The next window will be PivotTable Wizard Step 2 of 3 Select the spreadsheet that contains the source data In the spreadsheet select the range of cells containing the data you ll be working with including the column headings a must Select the entire range at once In the example window below the rvsdatal 101 B 2 K 237 text in the box refers to the file name rvsdatal 101 range of cells B 2 K 237 that were selected Click the Next button AT a Pivotiable Wizard Step 2 of 3 Where is the data that you want to use be used to create the pivot table Pivotiable Wizard Step 2 of 3 Where is the data that vau want to use Click here to select the data that will 6 Now you ll see the final step of the pivot table wizard PivotTable Wizard Step 3 of 3 see below Click the appropriate option to tell the program whether you want the table in a new worksheet or in the one you are working in in this case it will place the table in the existing worksheet with the upper left corner in cell I26 Note that you can specify a location by clicking the icon just to the right of the box and selecting the location in the spreadsheet Click the Layout button Pivotiable Wizard Step 3 of 3 Where do vau want to put the PivotTable C3 Me
43. o work around If we wish to view daily results for each site we just need to select another function that will return the original values 12 To change the type of calculation the Pivot Table toolbar will need to be open If it was not opened in Step 3 of these directions open the View menu by clicking on it move your cursor to Toolbars and select PivotTable This toolbar will then appear t e FH ALE ALIMIT TIME Al TEMP Ww ATER TEMP AEP AE LU MITE ATES TURBIGIT 13 Select a cell from the results area ex a data label Count of pH in order to alter the type of calculation Click on Pivot Fgble in the upper left corner of the PivotTable toolbar This is a pull down menu Select Field Settings from this menu The Field Settings option will only be available if a cell is selected as described at the beginning of this step The PivotTable Field window will open In the example below Average was selected Pivotlable Field Source field FH Mame Average of PH Cancel summarize by Delete Count Nums Options 14 Click OK to view your completed pivot table _5 3 3 Trend Analysis Most trend analysis that uses long term monitoring data 1s conducted to determine if there are changes in water quality over time It can even be used on data that spans a relatively short period of time to show for example changes in water quality throughout the duration of a storm event Trend analysis can be used to s
44. on multiple linear regression analysis ANOVA amp chi square statistics This program basically creates a worksheet that is set up as a report and includes histograms percentiles and summary statistics along with the boxplots Another way to create boxplots along with nearly any type of statistical analysis can be performed is by using the Webstat StatCrunch program eee The preceding methods definitely work but a user sometimes may want a worksheet dedicated to boxplots In this case boxplots can be created using the Chart Wizard in Microsoft Excel Since there 1s no preset setting as of Office 2000 for boxplots the program needs to be tricked into creating a boxplot The following step by step methods expound upon those found in We Have Data Now What a manual compiled for the Data Analysis and Interpretation Pilot Training Workshop for Citizen Volunteer Water Quality Monitoring Programs workshop by the Red River Basin Monitoring Network Rivers Council of Minnesota and the River Network 1 The first step to creating a box and whisker plot or boxplot is to determine which monitoring sites will be featured on the graph and create the summary statistics that will be used to create the plot In the summary statistics table sites should be placed in a significant order such as upstream to downstream The summary statistics necessary for creating a boxplot are the 25 percentile Q1 minimum median 504 percentile or Q2 maxim
45. ot easily detected by a time series plot or linear regression this does not necessarily mean that it does not exist There may simply be some complicating factors involved that will necessitate further statistical analysis There are many factors that can affect the determination of trends These include seasonal variation day to day variation and concentrations that vary with flow One thing to consider when conducting trend analysis 1s to try to compare apples to apples instead of apples to oranges For example instead of viewing all data results at once view just the results for one season or month at a time to determine a trend This concept and others are incorporated into some more technical methods of statistical analysis for the detection of trends Some of the concepts introduced by the more technical methods found in Statistical Methods in Water Resources by D R Helsel and Hirsch s Statistical Methods in Water Resources and the EPA Guidance Manual for Data Quality Assessment G 9 can be applied to the trend analysis that can be done with Excel Most of the descriptions of statistical methods found in Helsel and Hirsch are very technical while the EPA guidance manual EPA QA G 9 and hopefully the manual you are reading right now do a better job of explaining these methods in a more understandable fashion The different methods mentioned in Statistical Methods in Water Resources include the Mann Kendall test parametric regression L
46. rees of freedom gives the critical values for a standard normal distribution z 6 2 t oc Z gal 1 645 Alternative Methods and Data Transformations Some data sets may have non linear trends that won t be found using methods for determining a linear slope coefficient In these cases although not in all cases transforming data before trend analysis may increase the chance of success in finding a linear trend Transforming the data into natural log units 1s one way to do this Create a linear trend line using the transformed 61 data by using the methods described in a text book for linear regression or by using Microsoft Excel the easier way to create a trendline through a time series plot or a time graph of the data Once a trend is found the trend slope will be expressed in log units and the percentage of change can be calculated by using the equation e 1 100 where m is the slope of the linear trend in log units Remember that m slope in the equation of a line y mx b Therefore in the equation y 2x 3 the slope is equal to 2 For example the slope of the linear trend of the natural logs of spring total suspended solids results from the Clearwater River at the USGS gauge near the town of Plummer Minnesota 1s 1804 When m in log units 7 1804 the percentage of increase in spring total suspended solids concentrations each year 1s 19 77 If events have occurred within the watershed of a particular monitori
47. reporting results from trend analysis creating a summary table of trend analysis results may be preferable to pages and pages of correlation matrix graphs Plotting correlation matrixes is very helpful but not always necessary Direct calculation of a correlation coefficient may be a desirable alternative for measuring the amount of association between two sets of data Correlation matrixes can be used to find relationships between turbidity and total suspended solids turbidity and transparency tube readings water temperature and dissolved oxygen turbidity and dissolved oxygen turbidity or total suspended solids and phosphorus flow and temperature flow and dissolved oxygen or other parameter combinations 43 e Organic P Red Lake River vs TP 1 8 1 6 1 4 1 2 1 0 0 8 0 6 0 4 0 2 0 0 Crookston Sampson Bridge R 0 9319 Organic Phosphorus T5 2 Total Phosphorus Figure 15 Example of a Correlation Matrix Regression Regression as a statistic can be used to find a relationship between two variables and to estimate the value of one variable based upon the value of another Finding a relationship between two variables using regression is particularly useful because especially in water quality monitoring rarely if ever is there a direct mathematical relationship between variables Although linear regression can be calculated and plotted by hand using the equations
48. rvations are normally distributed the heights of the columns should be roughly shaped like the Normal distribution curve the superimposed blue line in the example below These graphs can be used to interpret the symmetry and variability of data symmetric data will be structured symmetrically around a central point The extent and direction to which data is being skewed will also be indicated by boxplots and frequency distributions 30 25 Frequency n2 C1 eo O 0 Figure 12 Example Frequency Plot Both histograms and frequency plots split data into intervals count the number of values in each group and displaying the data in the form of a bar chart green bars in Figure 12 There are two differences between the two graphs The vertical axis of a histogram represents the percentage of the total data set that is included in each interval The vertical axis of a frequency plot represents the number of observations within an interval These plots can either be created manually see example in figure 2 or using a computer program Analyse it an add in for Microsoft Excel 100 histogram creating add ins for Microsoft Excel around 30 the free data analysis add in for Microsoft Excel and StatCrunch free online at http www statcrunch com are some of the programs that can be used to create histograms The Webstat StatCrunch program is an online statistical analysis tool that can be accessed through the RLWD website o
49. s Both the spiked sample and the unmodified sample are analyzed at the same time The percent recovery of a matrix spike 1s calculated by dividing the difference in concentration between the results for the spiked sample and the results for the original sample by the concentration of the spike that was added Greater values for percent recovery indicate a higher level of accuracy The lab tests a spiked sample and the non spiked sample When the percent recovery 1s calculated 1t should be within the range of 90 to 110 percent A perfect percent recovery is 100 percent If the percent recovery is low there may be something in the sample that is interfering with the test The percent recovery equation for matrix spikes 1s shown below Recovery Conc of Spiked Sample Conc of Non spiked Sample X 100 Concentration of Spike Added x09 3 23 Conversions Conversions are often necessary when managing and analyzing water quality data Results from different sources may be in different units Conversions are nearly always a necessity when working with loads since the units of volume in concentration data are usually milligrams and the units of volume in flow measurements are usually cubic feet When converting data knowing conversion factors between units 1s essential Lists of conversion factors are available in table form see below but they are also very handy when they are in an electronic form Conversions can be performed with advanced
50. s day 2446576 liters day micrograms liter ug L 101940 6 liters day 2446 576 cubic meters day 3600 cubic feet hour 1 microgram Liter ug L 1 part per billion ppb Temperature Fahrenheit to Celsius C F 32 5 9 Subtract 32 multiply by 5 and then divide by 9 Celsius to Fahrenheit F 32 C 9 5 Multiply by 9 divide by 5 and then add 32 29 3 24 Graphical Methods Other forms of statistical analysis are often needed Summarizing analysis results in tables graphs or charts for reporting purposes can be very helpful to the reader Some of the descriptive statistical analysis performed for the Red River Watershed Assessment Protocol Project include the determination of minimum detection limits recommending methods for addressing values below the minimum detection limit histograms boxplots time series plots next section correlation matrixes and flow duration curves It 1s important to make graphs neat informative and understandable The graphs should be useful for interpreting the meaning of data and presenting findings from data There are many techniques involved in creating quality graphs Here are some tips V Graphing data is part of a process You may end up graphing more data than you will use in a report or presentation Some data you graph will be more valuable than others If graphs are used as part of the process of understanding data their meanings indications and other results may
51. s may have been collected during low flows but the optimal distribution that FLUX calculates will show that the majority of the samples should be collected during high flow periods Using this comparison a monitoring program can be adjusted to for example collect more samples during high flow periods than during low flow periods if one of the goals of the program is the calculation of annual or seasonal loads These calculations of optimal sample distributions are found in the optimal sample allocation section of the breakdowns screen see Figure 28 NE is the actual percentage of samples in each stratum NEOPT is the optimal percentage of samples in each stratum Below this section FLUX gives the CV that would have been obtained if the samples were optimally distributed among the strata In the Figure 28 the CV could have been reduced from 130 to 093 with an optimal sample allocation EUM
52. same table FLUX reads from the top down in each column of data and when it encounters a blank or zero value it stops reading values so if there are blank cells between results not all of the data will read by the FLUX program In the spreadsheet a title on the first line of the table and column headings in the second row are another necessity The DATE column headings should be typed in all capital letters Use consistent column headings for flow and other parameters You will need to remember what these column headings are writing them down helps when you are telling FLUX where to find the data Each individual worksheet within the workbook will need to be saved as a WK1 file if it has been created in Excel When saving the worksheets put them in a location where the file path 1s easy to remember C model Data The Lotus spreadsheet below is formatted to work with FLUX Keeping track recording file names column headings and date ranges 1s highly recommended so a quick reference 1s available when bringing data into FLUX 2635 3 Lotus 1 2 3 Release 5 TSS760 WK1 a x o File Edit View Style Tools Range Window Help m8 x F8 me 8 eos Name of the Site or other title in A1 f alt p PH New Sheet r8 NM C D E G H I J K 2 THIEF RIVER 760 2 e DATE T E 18 28 98 47 A 5 27 95 11 3 Remember the column heading B2 of the o5 09 26 95 E parameter data for later write it down Pe 023 95 ob 07 31
53. similar to this e min a 25th median 75th max E n 12 Repeat Step 10 for the minimum and median lines When you are done the graph should look like this min a 25th median 75th max 13 Double click on the line for the 25 or 75th data series to bring up the Format Data Series window This time select the Series Order tab Make sure that the order of the series to the following 25 percentile minimum median maximum 75 percentile This series order can be changed if needed by using the MOVE UP and MOVE DOWN keys Format Data Series 2 leg Patterns Axis Y Error Bars Data Labels Series Order Options eries order Move Up _38 14 Before clicking OK click the Options tab Check the boxes for High low lines and Up down bars Adjust the Gap width number to 150 A smaller gap width value will produce larger boxes in the box and whisker plot and vice versa y Format Data Series Patterns Axis Y Error Bars Data Labels Series Order Options Drop lines Gap width dis iw High low lines 15 Click on the Patterns tab and repeat Step 10 for the 25 and 75 percentile lines to remove the remaining lines and markers Now the graph should look similar to this 25th min median max 75th
54. sion factor of 1 Kg 1 000 000 mg Tons per day can be calculating by multiplying the kilograms per day by a conversion factor of 1 ton 907 1847 Kg Flow Weighted Mean Calculating the flow weighted mean concentrations of water quality parameters places more importance to concentrations recorded during higher flows when calculating an average concentration High flow periods can contribute the majority of the total flow volume for a given year The concentrations of water quality parameters during periods of high flows can have a greater impact on receiving waters than the concentrations during periods of low flow Weighted means are calculated by multiplying each individual datum in a data set by a weighting factor finding the sum of these products and then dividing this sum by the sum of the weighting factors In other words to find flow weighted mean concentrations first multiply parameter concentration by flow for each sampling event Find the sum of the products from all sampling events Finally divide this sum by the sum of all the flow values No conversions of concentration or flow should be needed Any conversion factors added to the equation would need to be applied to both the divisor and the dividend and will therefore cancel each other out and will be a waste of time The following equation will calculate the flow weighted mean using a data set of concentrations c c4 and flows fi f4 Flow weighted mean c f c2 fo c3 f3
55. t Page Break PI Function l hal t 3 You are now at Step 1 of 4 in the chart wizard process Select XY Scatter from the list of chart types You may choose what you want the chart to look like from the sub type options on the right Click Next when you are finished En Chart Wizard Step 1 of 4 Chart Type Standard Types Custom Types Chart Evpe Chart sub Evpe yoy a bax px Scatter with data points connected by smoothed Lines Press and Hold Ea View Sample Cancel wet Finish 4 When you get to Step 2 you will see a preview of your chart Click the Series tab 5 At this point you can enter a name for your data series in the Name box check to see if your graph will turn out the way you want it to If you want to add additional data series to the chart you can use the Add button to add another data series for the purpose of comparing data sets Once everything looks the way you want it to proceed to the next step by clicking Next At any point from this step forward you can click the Finish button and skip to Step 9 if you are satisfied with the appearance of the graph However going through all the steps will result in a more presentable graph 53 E 6 10 Chart Wizard Step 7 of 4 Chart Source Data E Ea Total Phasphoraz e Total Phosphor Total Phosphorus Mare Total Phosphorus 53 l 55 44 E AT76 53 I 148 44 5 76
56. t Type window will appear Name the new custom type Boxplot or Box and Whisker and type a description The necessary series order 1s an important piece of information to put in the description box Click OK when you are done An option for creating boxplots will appear among the chart type options Standard Tvpes Chart type Add Custom Chart Type az This dialog allows you ko make the active chart into a custom chart type Enter a text name Far the new custom chart type ame Boxplot Enter a kext description For the new custom chart type Description Make sure there are series For the 25th percentile minimum median maximum and 75th percentile in that order elect From User defined Built in Set as default chart Ok Cancel If a custom chart type has been created for box and whisker plots additional boxplots can be made very easily and efficiently Some of the steps in the process can be skipped To create a box and whisker plot using the custom chart type that was created in step 19 first complete Steps 1 3 Instead of choosing the chart type indicated in Step 4 choose the custom chart type created in Step 19 Chart Wizard Custom Types User defined Name of custom box and whisker plot chart type Perform steps 5 8 and then skip to step 13 If your columns were in the correct order 25 3 min gt median max 75 step 13 is also unnecessary and can be skipped For step 14 look
57. t other stratification schemes based upon flow by increasing the number of strata In the main menu with DATA highlighted use the arrow keys to get to Stratify then Flow then 2 Strata and then press ENTER t Letter TO RUN ROUTINE CALC METHOD Q UTD C Figure 26 Path Through the Menu to Stratification FLUX will automatically stratify the flow into two categories 68 es Ci model flux FLUX EXE F1 HELP F2 DOME SBRUE F3 EDIT FIELD F HELP EDITOR lt E amp C gt ABORT Figure 27 Stratification Screen FLUX will automatically use the mean flow volume as the boundary between Stratum 1 and Stratum 2 The flow levels for each category can be modified but for the sake of sticking to the basics press the F2 key and then the ESC key to go back to the MAIN MENU To see if this stratification improves the CV go to LIST gt BREAKDOWNS once again to see if the overall CV the average values are in the bottom rows of each table shown on this screen is larger or smaller than the previous CV If it is smaller try using 3 strata or even 4 to find the smallest possible CV A limiting factor for the amount of stratification that can be applied to data 1s the number of samples in each stratum When there are too few samples per strata too many strata the FLUX program will inform the user of this problem by displaying an illegal stratification error After you have achieved the lowest possible coefficient of variance r
58. um and the 75 percentile Q3 If the columns are in this order as shown below you will be able to skip Step 13 Also after saving the boxplot as a custom chart type having summary data arranged in this order will make the creation of boxplots easier in the future A B C 0 E F 5 H 1 Fecal Coliform 2 25th min median max 75th avg of samples 3 CR23 4 SG130 SS 20 sooj saoj zao s92 ESSEN 2 Select the site name pon percentile minimum median maximum and 755 percentile column headings and data 3 Select the Chart Wizard Button ji 4 In the Chart Wizard Step 1 of 4 click on the Standard Types tab and choose the Line chart Choose the chart sub type labeled line with markers displayed at each data value 34 Chart Wizard Step 1 of 4 Chart Type Standard Types Custom Types Chart type Chart sub Evpe I EN dr Radar dipl Surface Au Bubble lt Stock N Line with markers displayed at each data value Press and Hold to View Sample Cancel Next gt Finish 5 Click Next to continue 6 In Chart Wizard Step 2 of 4 the data range box should automatically contain the summary data cells you selected in Step 2 Click the round button that puts the series into Columns Click Next to continue Chart Wizard Step 2 of 4 Chart Source Data Aix Data Range Series 350 0 300 0 EINE 200 0 150 0 100 0 59 0 0 0 Data range Bosplot a
59. verage of a set of samples 1s one way of finding the center value of a data set Divide the sum of the results by the number of results Mean can be automatically calculated using the AVERAGE equation in Microsoft Excel Geometric Mean A geometric mean can be used to calculate a mean that is not skewed by extreme values It is one of the calculations used when assessing waters for impairment for the TMDL program particularly for fecal coliform Fecal coliform levels can be very low on one day and too numerous to count the next day on some streams The geometric mean is normally close to the median for positively skewed data sets Where G represents the geometric mean and the x values represent a series of numbers in a data set G xi x9 Vx Ga G Xi X2 X3 Qui x2 x3 And so on Note that geometric mean takes the product of all the numbers in the data set to the power of one over the number of values in the data set Geometric mean can also be calculated automatically using a function in Excel GEOMEAN A1 A5 where A1 A5 is the range of cells that contain the data to be analyzed for the example The geometric mean cannot be calculated for data sets that include values of zero Therefore values that are below the minimum detection limit represented by lt MDL in lab reports must be represented by a positive number such as one half of the MDL Trimmed Mean This is another way to remove the influence of outliers in d
60. w worksheet Existing worksheet Click Finish ta create your PivotTable _ 48 7 You ll see the following window PivotTable Layout The boxes on the right are the column headings Cfield buttons in thecell range you selected in Step 6 above UNPivotTable by dragging the field buttons oti COLUM ALKALINI TURBIDIT TIME AIR TEHP Cancel 8 Select and drag each of the field buttons to its appropriate place in the diagram In this case we want to create a table with the sites on the left of the table and the dates across the top This 1s shown by the window below Note that you can double click on the Count of pH field and you can proceed to the procedures described in step 13 at this point After dragging the fields to their desired locations and or selecting the desired summary statistics Click OK to go back to the PivotTable Wizard Step 3 of 3 m pivetTable Wizard Layout Construct your PivatTable by dragging the field buttons an the right ta the diagram on the left ITE NUPME WATER_TE DATE AYERADGE I PH NITRATES ALEALINITY TURBIDITY ROW TIME Cancel _ 49 9 Next click the Options button and make selections so that the options window looks like the window below or make modifications to suit your needs and then click OK PivotTable Options Mame Town Creek pH Format options C Grand totals for columns Page layout Down Then Over Gr

Part 2 - Section 3.2 through 3.4

Contents

Download Pdf Manuals

Related Search

Related Contents