
CatchAll Version 1.0 User Manual by Linda Woodard, Sean


Contents

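1. (2) Mixture of two exponentials-mixed Poisson. The stochastic abundance distribution is a mixture of two exponentials, and the mixed Poisson distribution is then a mixture of two geometrics:

$$P(X = j; \theta) = \theta_3 \frac{1}{1+\theta_1}\left(\frac{\theta_1}{1+\theta_1}\right)^{j} + (1-\theta_3)\frac{1}{1+\theta_2}\left(\frac{\theta_2}{1+\theta_2}\right)^{j}, \quad j = 0, 1, 2, \ldots; \quad \theta_1, \theta_2 > 0, \; 0 < \theta_3 < 1.$$

(3) Mixture of three exponentials-mixed Poisson. The stochastic abundance distribution is a mixture of three exponentials, and the mixed Poisson distribution is then a mixture of three geometrics:

$$P(X = j; \theta) = \theta_4 \frac{1}{1+\theta_1}\left(\frac{\theta_1}{1+\theta_1}\right)^{j} + \theta_5 \frac{1}{1+\theta_2}\left(\frac{\theta_2}{1+\theta_2}\right)^{j} + (1-\theta_4-\theta_5)\frac{1}{1+\theta_3}\left(\frac{\theta_3}{1+\theta_3}\right)^{j}, \quad j = 0, 1, 2, \ldots; \quad \theta_1, \theta_2, \theta_3 > 0, \; 0 < \theta_4, \theta_5 < 1.$$

(4) Mixture of four exponentials-mixed Poisson. The stochastic abundance distribution is a mixture of four exponentials, and the mixed Poisson distribution is then a mixture of four geometrics:

$$P(X = j; \theta) = \theta_5 \frac{1}{1+\theta_1}\left(\frac{\theta_1}{1+\theta_1}\right)^{j} + \theta_6 \frac{1}{1+\theta_2}\left(\frac{\theta_2}{1+\theta_2}\right)^{j} + \theta_7 \frac{1}{1+\theta_3}\left(\frac{\theta_3}{1+\theta_3}\right)^{j} + (1-\theta_5-\theta_6-\theta_7)\frac{1}{1+\theta_4}\left(\frac{\theta_4}{1+\theta_4}\right)^{j}, \quad j = 0, 1, 2, \ldots; \quad \theta_1, \theta_2, \theta_3, \theta_4 > 0, \; 0 < \theta_5, \theta_6, \theta_7 < 1.$$

CatchAll computes all five models at every value of τ having a non-zero frequency count in the data. This generates a combinatorial explosion of analyses, one for each model/τ combination, which must then be sifted to find a best model, or at least a collection of best models. We do this according to the following algorithm, which combines statistical principles and heuristic decisions based on empirical experience.

Model selection algorithm

1. (Statistical) Eliminate model/τ combinations f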
2. [Screenshot: Command Prompt window showing three runs of CatchAllCmdW with the arguments SampleInput.csv and OutputFolder. Each run prints the input file and output path, followed by the progress lines Poisson Model, SingleExp Model, TwoMixedExp Model, ThreeMixedExp Model, FourMixedExp Model (absent in the last run), Non-parametric Models and End of Analysis. The message "Singular matrix in routine ludcmp" appears after some of the mixture models.]

3c Analysis with the Linux command line version (CatchAllcmdL.exe)

Mono must be invoked to run this executable. At least two parameters must be supplied to the Linux/MAC command line version: the input filename (complete path if not in the same directory as the executable) and the path to the directory where the output files will be written. If no such folder exists, it will be created. See Output below for a detailed description of these files. Optionally you can
3. in the population. The ith species contributes a random number $X_i$ of individuals to the sample, where $X_i = 0, 1, 2, \ldots$. If $X_i = 0$ then the ith species is unobserved. $X_i$ has a Poisson distribution with mean $E(X_i) = \lambda_i$, $i = 1, \ldots, C$, and in general we assume that $\lambda_1, \ldots, \lambda_C$ are distributed according to a stochastic abundance model, that is, a probability distribution with probability density function, say, $f(\lambda)$. The stochastic abundance distribution depends on some number of parameters (in our implementation there are at most 7 parameters), called $\theta$. The observed frequency count data is then unconditionally distributed as zero-truncated f-mixed Poisson. We fit this distribution to the data via maximum likelihood, which yields an estimate $\hat{\theta}$ of the parameter vector, and from this we obtain an estimate $p(0; \hat{\theta})$ of the zero probability $p(0; \theta)$, which is the probability that an arbitrary species is unobserved in the sample. Our final estimate is then

$$\hat{C} = \frac{c}{1 - p(0; \hat{\theta})},$$

where $c$ is the number of observed species in the sample. This estimate has an associated standard error given by

$$SE(\hat{C}) = \left( \hat{C} \times \left( a_{00} - a_0^{T} A^{-1} a_0 \right)^{-1} \right)^{1/2},$$

where $a_{00} = (1 - p(0; \theta)) / p(0; \theta)$, $a_0 = (1 / p(0; \theta)) \, \nabla_{\theta} (1 - p(0; \theta))$, and $A = \mathrm{Info}(\theta; X)$ is the Fisher information about $\theta$ in $X$, all evaluated at $\theta = \hat{\theta}$.

Actually the situation is slightly more complicated. Because frequency count data typically exhibits a large number of rare species (graphically, a steep slope up
4. include a flag to have the program calculate the 4 Mixed Exponential Model; the default is to calculate it. N.B. the 4 Mixed Exponential Model will take longer to calculate.

Calculate 4 Mixed Exponential Model:
    mono CatchAllcmdL.exe inputfilename outputpath
or
    mono CatchAllcmdL.exe inputfilename outputpath 1

Don't calculate 4 Mixed Exponential Model:
    mono CatchAllcmdL.exe inputfilename outputpath 0

4 Output

Running the analysis program generates a number of files. If you use the GUI version, a folder called Output is created in the same directory as the input file. If you use either command line version, these files are put in the folder you designate. This folder contains the following files:

datasetname_Analysis.csv: This is a complete listing of all information from all analyses performed by CatchAll. See the section Statistical Procedures below for details.

datasetname_BestModelsAnalysis.csv: Column-formatted copy of the summary analysis output as displayed in the main CatchAll window when using the GUI version. This is to be read into the Summary Analysis worksheet in CatchAll.display.xlsm (see below).

datasetname_BestModelsFits.csv: Fitted values for the best models, as selected by the model selection algorithm (see Statistical Procedures). This is to be read into the Best Fits Data worksheet in CatchAll.display.xlsm (see below).

datasetname_BubblePlo
5. [Screenshot, continued: remaining rows of the Best Models table (Model 2b, Model 2c, Non-P 1 Chao1, Non-P 2 ACE, Parm Max Tau, Non-P Max Tau), followed by the OUTPUT FILES panel listing the paths of SampleInput_BestModelsFits.csv (Best Models Fits), SampleInput_BestModelsAnalysis.csv (Best Models Analysis) and SampleInput_BubblePlot.csv (Best Models Bubble Plot) in the Output folder, and the buttons 3. Close Window and About CatchAll.]

3b Analysis with the Windows command line version (CatchAllCmdW.exe)

At least two parameters must be supplied to the Windows command line version: the input filename (complete path if not in the same directory as the executable) and the path to the directory where the output files will be written. If no such folder exists, it will be created. See Output below for a detailed description of these files. Optionally you can include a flag to have the program calculate the 4 Mixed Exponential Model; the default is to calculate it. N.B. the 4 Mixed Exponential Model will take longer to calculate.

Calculate 4 Mixed Exponential Model:
    CatchAllcmdW.exe inputfilename outputpath
or
    CatchAllcmdW.exe inputfilename outputpath 1

Don't calculate 4 Mixed Exponential Model:
    CatchAllcmdW.exe inputfilename outputpath 0

[Screenshot: Command Prompt window.]
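The command line calls documented for the Windows and Linux/MAC versions can be assembled from a script. The following is a minimal Python sketch, not part of CatchAll: the helper name `catchall_command` and its keyword arguments are hypothetical, and it assumes the executable sits in the working directory under the name given in this manual. Only the documented arguments (input file, output path, optional 0/1 flag) are used.

```python
import subprocess


def catchall_command(input_file, output_dir, four_mixed=True, use_mono=False):
    """Build the argument list for a CatchAll command line run.

    four_mixed=True  -> trailing flag "1" (calculate the 4 Mixed Exponential Model)
    four_mixed=False -> trailing flag "0" (skip it)
    use_mono=True    -> Linux/MAC form: mono CatchAllcmdL.exe ...
    """
    exe = "CatchAllcmdL.exe" if use_mono else "CatchAllcmdW.exe"
    prefix = ["mono"] if use_mono else []
    flag = "1" if four_mixed else "0"
    return prefix + [exe, input_file, output_dir, flag]


cmd = catchall_command("SampleInput.csv", "OutputFolder", four_mixed=False)
print(cmd)

# To actually run the analysis (assumes the executable is present):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the input path contains spaces.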
6. CatchAll Version 1.0 User Manual, by Linda Woodard, Sean Connolly and John Bunge, Cornell University. Funded by National Science Foundation grant 0816638.

1 System requirements

There are two types of programs available: the main analysis program, in a variety of flavors (CatchAllName.exe), and an interactive graphics module (CatchAll.display.xlsm), written in Excel 2007, which uses macros that need to be enabled. The graphics module runs only on a Windows platform, assuming Excel 2007 or later is installed. Unfortunately Apple decided not to enable macros in this version of Excel, so the GUI version of the executable (CatchAllGUI.exe) will only run under Windows. There is also a Windows command line version (CatchAllcmdW.exe). The .NET framework must be installed to run either Windows version. In addition, there is a command line version (CatchAllcmdL.exe) that will run on the MAC OS and other Linux platforms, provided the appropriate version of Mono has been installed.

2 Input data

CatchAll is a set of two programs for analyzing data derived from experiments or observations of species abundances or multiple recapture counts. For simplicity we will use the species abundance terminology throughout this manual, but the same methods can be applied to the total counts (row sums) of recaptures in a multiple recapture or multiple list study. The fundamental dataset consists of frequency counts. This is a list of frequencies of occurrence followed b
7. bble plot with the nonparametric sequence deleted, for easier visual comparison of the parametric estimates.

Note that all Excel functions are available under CatchAll.display.xlsm; in particular, one can change the scale of axes, alter colors or shapes, delete plotted data sequences, etc., at will. Thus the program gives the user interactive control over the graphical displays.
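8. ers (see the section on CatchAll.display.xlsm for more discussion of this point).

1. Good-Turing (also called the homogeneous model, i.e., equal species sizes; same assumption as Model 0, Poisson, above):

$$\hat{C} = \frac{c_-(\tau)}{1 - f_1 / n_-(\tau)} + c_+(\tau), \quad \text{where } n_-(\tau) = \sum_{i=1}^{\tau} i f_i.$$

2. Chao1:

$$\hat{C} = \begin{cases} c + f_1 (f_1 - 1)/2, & f_2 = 0, \\ c + f_1^2 / (2 f_2), & f_2 > 0. \end{cases}$$

This is generally regarded as a lower bound for C.

3. ACE (Abundance-based Coverage Estimator):

$$\hat{C} = \frac{c_-(\tau)}{1 - f_1 / n_-(\tau)} + c_+(\tau) + \frac{f_1}{1 - f_1 / n_-(\tau)} \times \gamma^2_{rare} = \text{Good-Turing} + \frac{f_1}{1 - f_1 / n_-(\tau)} \times \gamma^2_{rare},$$

where

$$\gamma^2_{rare} = \max\left\{ \frac{c_-(\tau)}{1 - f_1 / n_-(\tau)} \cdot \frac{\sum_{i=1}^{\tau} i (i-1) f_i}{n_-(\tau) \, (n_-(\tau) - 1)} - 1, \; 0 \right\}.$$

4. ACE1 (Abundance-based Coverage Estimator for highly heterogeneous cases):

$$\hat{C} = \frac{c_-(\tau)}{1 - f_1 / n_-(\tau)} + c_+(\tau) + \frac{f_1}{1 - f_1 / n_-(\tau)} \times \tilde{\gamma}^2_{rare} = \text{Good-Turing} + \frac{f_1}{1 - f_1 / n_-(\tau)} \times \tilde{\gamma}^2_{rare},$$

where

$$\tilde{\gamma}^2_{rare} = \max\left\{ \gamma^2_{rare} \left( 1 + \frac{f_1 / n_-(\tau)}{1 - f_1 / n_-(\tau)} \cdot \frac{\sum_{i=1}^{\tau} i (i-1) f_i}{n_-(\tau) - 1} \right), \; 0 \right\}.$$

ACE is preferred when $\gamma_{rare} < 0.8$; otherwise ACE1 is preferred. CatchAll makes this selection automatically.

5. Chao-Bunge gamma-Poisson estimator:

$$\hat{C} = \frac{\sum_{i=2}^{\tau} f_i}{1 - f_1 \sum_{i=1}^{\tau} i^2 f_i \Big/ \left( \sum_{i=1}^{\tau} i f_i \right)^2} + c_+(\tau).$$

This is known to be consistent when the stochastic abundance model is the gamma distribution, i.e., when the sample counts follow the negative binomial distribution.

In each case we also compute a standard error based on an asymptotic approximation due to Chao. The variance for one of these estimators $\hat{C}$ is g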
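The Good-Turing, Chao1 and ACE point estimates listed in this part of the manual can be sketched in a few lines of Python. This is an illustration written from the standard forms of these estimators, not CatchAll's own code; the function names and the dictionary representation of the frequency counts are assumptions made for the example. The sketch does not guard against the degenerate case where every rare individual is a singleton (coverage zero).

```python
def chao1(f):
    """Chao1 estimate from frequency counts f = {frequency: number of species}."""
    c = sum(f.values())                      # observed species
    f1, f2 = f.get(1, 0), f.get(2, 0)
    if f2 > 0:
        return c + f1 * f1 / (2 * f2)
    return c + f1 * (f1 - 1) / 2


def good_turing_and_ace(f, tau=10):
    """Return (Good-Turing, ACE) estimates using frequencies <= tau as 'rare'."""
    rare = {i: fi for i, fi in f.items() if i <= tau}
    c_minus = sum(rare.values())                          # species with count <= tau
    c_plus = sum(fi for i, fi in f.items() if i > tau)    # species with count > tau
    n_minus = sum(i * fi for i, fi in rare.items())       # individuals among rare species
    f1 = f.get(1, 0)
    coverage = 1 - f1 / n_minus                           # estimated sample coverage
    good_turing = c_minus / coverage + c_plus
    s = sum(i * (i - 1) * fi for i, fi in rare.items())
    gamma2 = max(c_minus / coverage * s / (n_minus * (n_minus - 1)) - 1, 0.0)
    ace = good_turing + f1 / coverage * gamma2
    return good_turing, ace


# Example frequency counts from the Input data section of this manual.
example = {1: 295, 2: 63, 3: 30, 4: 6, 5: 4, 6: 6, 7: 1, 9: 6, 11: 1, 12: 2,
           13: 1, 14: 1, 17: 1, 21: 1, 25: 1, 30: 1, 31: 1, 55: 1, 69: 1, 86: 1}
print(chao1(example))
print(good_turing_and_ace(example, tau=10))
```

Keeping τ as a parameter mirrors the manual's practice of reporting the nonparametric estimates at τ = 10 or the nearest possible value.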
9. iven by the approximate formula

$$\widehat{\mathrm{Var}}(\hat{C}) \approx \sum_{i \ge 1} \sum_{j \ge 1} \frac{\partial \hat{C}}{\partial f_i} \frac{\partial \hat{C}}{\partial f_j} \, \widehat{\mathrm{cov}}(f_i, f_j), \quad \text{where } \widehat{\mathrm{cov}}(f_i, f_j) = \begin{cases} f_i (1 - f_i / \hat{C}), & i = j, \\ -f_i f_j / \hat{C}, & i \ne j. \end{cases}$$

The empirical standard error of $\hat{C}$ is then $\sqrt{\widehat{\mathrm{Var}}(\hat{C})}$. Thus the problem is to calculate $\partial \hat{C} / \partial f_i$, which in turn depends on the formula for $\hat{C}$ in each case. We omit the specific details here.

Finally, we display two analyses of the full dataset, that is, with no right truncation (τ = max τ). These are the minimum-AICc parametric model and the preferred ACE/ACE1 choice.

3 Graphical displays implemented by CatchAll.display.xlsm

The Microsoft Excel-based module CatchAll.display.xlsm generates four displays. See the section on Display for details on operation, files, etc. for this program. The displays are as follows.

1. Summary analysis. This is a copy of the CatchAll output window, formatted for columns.

2. Best Fits (Color). This is a scatterplot showing the frequency count data as points and the various fitted models (best, 2a, 2b, 2c) as curved lines.

3. Bubble Graph (Color). This graph shows the behavior of the estimates as τ is increased, i.e., as outliers (large frequencies) are progressively added to the data. Typically the nonparametric estimates diverge as τ increases, while the parametric estimates converge. The bubble sizes are proportional to SE/2 in each case; points are not plotted (they are blanked out) if either (i) $\hat{C} > 100 \times c$ or (ii) $SE > 10 \times \hat{C}$.

4. Bubble Color (No Non-Parametric). This is the bu
10. nutes, the Model Analysis Completed button will appear; click OK. N.B. the 4 Mixed Exponential Model will take longer to calculate.

[Screenshot: CatchAll window with the buttons 1. Locate Input Data (.csv file), 2a. Run Program with 4 Mixed Exp, 2b. Run Program without 4 Mixed Exp and 3. Close Window, showing the status "PROGRAM RUNNING: 4 Mixed Exp will not be calculated" and the program message "Model Analysis Completed".]

A summary of the analysis appears in the Best Models window, and the OUTPUT FILES window displays the pathnames for the files used by the interactive graphics program CatchAll.display.xlsm (see below for details). Other files created by the program are located in the same folder. See Output below for a detailed description of these files.

[Screenshot: CatchAll window after the analysis. The Best Models panel reports Total Number of Observed Species: 24 and a table with the columns Model, Tau, Observed Species, Estimated Species, SE, Lower CB, Upper CB, GOF0 and GOF5, with rows for Best Model, Model 2a and Model 2b (the table continues in the next part of the screenshot).]
11. or which GOF5 < 0.01.

2. (Statistical) For each τ, select the model with minimum AICc (Akaike Information Criterion, corrected where necessary for small sample sizes).

3. (Heuristic) Eliminate model/τ combinations for which estimate > 100 × ACE1, where ACE1 is the estimate at τ = 10.

4. (Heuristic) Eliminate model/τ combinations for which SE > estimate/2.

5. (Heuristic) Then:
• Best model: Select the largest τ for which GOF0 > 0.01.
• Model 2a: Select the τ with maximum GOF0.
• Model 2b: Select the largest τ.
• Model 2c: Select τ as close as possible to, but ≤, 10.

6. (Heuristic)
• If all model/τ combinations are eliminated, allow GOF5 above 0.001 but keep SE < estimate/2.
• If all combinations are still eliminated, allow GOF5 above 0.001 and allow SE up to the estimate.
• If there are still no combinations, require a calculable (computable) GOF5 but impose no restrictions on SE.

2.2 Nonparametric procedures

We compute five nonparametric estimates of C. All derive directly or indirectly from the coverage-based approach, under which the estimate of the total number of species is based on an estimate of the coverage of the sample (the proportion of the population represented by the sampled species). We compute the nonparametric estimates at every τ, as we do for the parametric estimates, but we report them only for τ = 10 (or the nearest possible value), because they tend to be highly sensitive to outli
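The main path of the parametric model selection algorithm (steps 1 to 5) can be sketched as a sequence of filters. This Python sketch is an illustration only: the record layout, the function name `select_models` and the sample numbers are hypothetical, the steps are assumed to be applied strictly in the listed order, Model 2c is read as "largest τ not exceeding 10", and the fallback rules of step 6 are omitted.

```python
def select_models(results, ace1_at_10):
    """results: list of dicts with keys model, tau, estimate, se, aicc, gof0, gof5."""
    # Step 1 (statistical): drop fits with GOF5 < 0.01.
    kept = [r for r in results if r["gof5"] >= 0.01]

    # Step 2 (statistical): for each tau keep the fit with minimum AICc.
    best_by_tau = {}
    for r in kept:
        current = best_by_tau.get(r["tau"])
        if current is None or r["aicc"] < current["aicc"]:
            best_by_tau[r["tau"]] = r
    kept = list(best_by_tau.values())

    # Steps 3 and 4 (heuristic): drop implausibly large or imprecise estimates.
    kept = [r for r in kept
            if r["estimate"] <= 100 * ace1_at_10 and r["se"] <= r["estimate"] / 2]
    if not kept:
        return {}  # the manual's fallback rules (step 6) would apply here

    # Step 5 (heuristic): pick the four reported models.
    picks = {}
    well_fitting = [r for r in kept if r["gof0"] > 0.01]
    if well_fitting:
        picks["Best model"] = max(well_fitting, key=lambda r: r["tau"])
    picks["Model 2a"] = max(kept, key=lambda r: r["gof0"])
    picks["Model 2b"] = max(kept, key=lambda r: r["tau"])
    low_tau = [r for r in kept if r["tau"] <= 10]
    if low_tau:
        picks["Model 2c"] = max(low_tau, key=lambda r: r["tau"])
    return picks


# Hypothetical fits, invented for illustration only.
fits = [
    {"model": "Poisson", "tau": 5, "estimate": 30.0, "se": 3.0, "aicc": 50.0, "gof0": 0.30, "gof5": 0.40},
    {"model": "SingleExp", "tau": 5, "estimate": 32.0, "se": 4.0, "aicc": 45.0, "gof0": 0.20, "gof5": 0.50},
    {"model": "SingleExp", "tau": 9, "estimate": 31.0, "se": 4.0, "aicc": 60.0, "gof0": 0.25, "gof5": 0.15},
    {"model": "SingleExp", "tau": 17, "estimate": 29.0, "se": 3.0, "aicc": 80.0, "gof0": 0.09, "gof5": 0.38},
    {"model": "SingleExp", "tau": 28, "estimate": 28.0, "se": 2.5, "aicc": 95.0, "gof0": 0.002, "gof5": 0.18},
]
picks = select_models(fits, ace1_at_10=57.0)
for label, fit in picks.items():
    print(label, fit["model"], fit["tau"])
```

Returning an empty dictionary when every combination is eliminated makes the point where the relaxed thresholds of step 6 would be tried explicit.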
12. splays. For example, one can easily change the scale of the axis of a plot, or its upper or lower cutoff values. For more information, see the appropriate Excel help screens.

CatchAll Version 1.0 User Manual, by Linda Woodard, Sean Connolly and John Bunge, Cornell University. Sponsored by NSF Grant 0816638. October 5, 2010.

1 Introduction

CatchAll is a set of programs for analyzing frequency count data arising from abundance- or incidence-based samples. Given the data, CatchAll estimates the total number of species (or individuals), observed + unobserved, and provides a variety of competing model fits and model assessments, along with interactive graphical displays of the data, fitted models, and comparisons of estimates. The first program is CatchAll.exe, which performs the necessary statistical and numerical analysis, and the second is CatchAll.display.xlsm, which is a Microsoft Excel-based program that generates the graphical displays. We first discuss the statistical procedures underlying the main program.

2 Statistical procedures implemented by CatchAll

2.1 Parametric models

We fit a suite of five parametric models to the data. These are increasingly complex versions of the standard model for species estimation. For full mathematical details see, e.g., Bunge and Barger (2008); here we give a sketch intended to briefly explain the computations performed by the program. We assume that there is a fixed number of species $C < \infty$
13. t.csv: Analysis data used to generate the bubble plot display; this is to be read into the Bubble Graph Data worksheet in CatchAll.display.xlsm (see below).

5 Display

To view the analysis results graphically, and to create graphics for presentations or articles, open CatchAll.display.xlsm by double-clicking. Near the top of the screen, click Options > Enable Macros.

On the worksheet Summary Analysis, click Import Summary Analysis and navigate to the file datasetname_BestModelsAnalysis.csv. This copies the CatchAll summary output display to the worksheet in column-formatted form.

On the worksheet Best Fits Data, click Import Best Fit Data and navigate to the file datasetname_BestModelsFits.csv. This imports the fitted values from the best selected models and automatically generates the comparative plot on the Best Fits (Color) worksheet.

On the worksheet Bubble Graph Data, click Insert Bubble Graph Data and navigate to the file datasetname_BubblePlot.csv. This imports the required data and automatically generates the bubble plot on the Bubble Graph (Color) worksheet.

See Statistical Graphics below for more details on these plots.

Note: CatchAll.display.xlsm is a fully functional Excel spreadsheet. This means in particular that all of the mathematical and data spreadsheet functions are available, and that the graphs may be modified in the usual way by altering the input data or by right-clicking within various parts of the di
14. two goodness-of-fit measures. GOF0 is the p-value for the Pearson $\chi^2$ goodness-of-fit test comparing the observed frequencies to the expected frequencies under the fitted model. This measure uses no adjustment for low cell counts, that is, every frequency is compared to its corresponding expected frequency. Since the $\chi^2$ test is based on an asymptotic approximation requiring cell counts ≥ 5 (although there is not a consensus on this value), we also compute a p-value for the Pearson $\chi^2$ test after concatenating adjacent cells so as to achieve a minimum expected cell count of 5 under the fitted model; this is GOF5. Since the null hypothesis in both cases is that the model fits, larger p-values support the choice of model.

We compute five progressively more complicated models, which we refer to as order 0, 1, 2, 3 and 4.

(0) Poisson. Here the stochastic abundance distribution is a point mass at a fixed λ, i.e., all of the species sizes are assumed to be equal. This is rarely if ever realistic and almost never fits real data, but it provides a readily computable lower-bound benchmark, since heterogeneous species sizes will render this model downwardly biased.

(1) Single exponential-mixed Poisson. The stochastic abundance distribution is exponential:

$$f(\lambda; \theta) = \frac{1}{\theta} e^{-\lambda/\theta}, \quad \lambda > 0, \; \theta > 0.$$

The mixed Poisson distribution of the frequency counts is then the geometric:

$$P(X = j; \theta) = \frac{1}{1+\theta} \left( \frac{\theta}{1+\theta} \right)^{j}, \quad j = 0, 1, 2, \ldots; \quad \theta > 0.$$
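For the single exponential (geometric) model, the estimate of the total number of species can be worked out by hand when no right truncation at τ is applied. The zero-truncated geometric has mean 1 + θ, so the maximum likelihood estimate is θ = n/c − 1, where n is the number of observed individuals and c the number of observed species; the zero probability is then 1/(1 + θ). The Python sketch below illustrates only this special case and is not CatchAll's fitting routine, which fits at every τ; the function name and the example counts are hypothetical. It fails with a division by zero when every species is a singleton (n = c).

```python
def fit_single_exponential(f):
    """Fit the geometric model to frequency counts f = {frequency: number of species}.

    Returns (theta_hat, p0_hat, c_hat), using all of the data (no cutoff tau).
    """
    c = sum(f.values())                          # observed species
    n = sum(i * fi for i, fi in f.items())       # observed individuals
    theta_hat = n / c - 1                        # MLE from the zero-truncated mean 1 + theta
    p0_hat = 1 / (1 + theta_hat)                 # estimated probability a species is unobserved
    c_hat = c / (1 - p0_hat)                     # estimated total number of species
    return theta_hat, p0_hat, c_hat


theta_hat, p0_hat, c_hat = fit_single_exponential({1: 4, 2: 2, 3: 2})
print(theta_hat, p0_hat, c_hat)
```

The result simplifies to c·n/(n − c), which gives a quick sanity check on any implementation of this model.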
15. ward to the left) and a small number of very abundant species (a long right-hand tail of outliers), parametric models typically do not fit the entire dataset. Instead, some outliers must be deleted. Specifically, we fit a parametric model up to some maximum frequency τ, deleting all of the frequency count data for frequencies > τ, and obtain an estimate that depends on τ: $\hat{C}(\tau)$. To complete the estimate we add the number of species with counts greater than τ, $c_+(\tau)$, and the final estimate is then $\hat{C} = \hat{C}(\tau) + c_+(\tau)$. Similarly, the SE is only computed on the data excluding outliers, i.e., on the frequency counts up to τ. Essentially we regard the frequencies > τ as constants or fixed points for the purposes of the analysis. This means that we compute every model at every possible value of τ and compare the results; we return to this issue below.

For confidence intervals we do not use the Wald or normal approximation interval $\hat{C} \pm 1.96 \, SE$, for various reasons. Instead we implement an asymmetric interval based on a lognormal transformation, proposed by Chao (1987), "Estimating the population size for capture-recapture data with unequal catchability", Biometrics 43(4), 783-791. Let $c_-(\tau)$ denote the total number of species with frequency counts ≤ τ, so that $c_-(\tau) + c_+(\tau) = c$ for all τ. The lognormal-based interval is then

$$\left( c + \frac{\hat{C}(\tau) - c_-(\tau)}{d}, \; c + (\hat{C}(\tau) - c_-(\tau)) \times d \right), \quad \text{where } d = \exp\left( 1.96 \sqrt{ \log\left( 1 + \frac{SE^2}{(\hat{C}(\tau) - c_-(\tau))^2} \right) } \right).$$

We also compute
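The lognormal-based confidence interval described in this part of the manual is straightforward to compute once the estimate and its standard error are known. The Python sketch below is an illustration, not CatchAll's code; the function and argument names are hypothetical, and the input values are illustrative numbers chosen to resemble a typical analysis.

```python
import math


def lognormal_interval(c, c_hat_tau, c_minus, se, z=1.96):
    """Asymmetric lognormal interval for the total number of species.

    c         : observed species in the whole sample
    c_hat_tau : parametric estimate from the data with frequencies <= tau
    c_minus   : observed species with frequency <= tau
    se        : standard error of c_hat_tau
    """
    unseen = c_hat_tau - c_minus     # estimated number of unobserved species
    d = math.exp(z * math.sqrt(math.log(1 + (se / unseen) ** 2)))
    return c + unseen / d, c + unseen * d


# Illustrative values: 24 observed species, 21 of them with frequency <= tau.
lower, upper = lognormal_interval(c=24, c_hat_tau=26.2, c_minus=21, se=2.8)
print(round(lower, 2), round(upper, 2))
```

Because the interval is built around the estimated number of unobserved species, its lower limit can never fall below the number of species actually observed, unlike the Wald interval.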
16. y the number of species occurring the given number of times in the sample. For example, in the following dataset

1,295
2,63
3,30
4,6
5,4
6,6
7,1
9,6
11,1
12,2
13,1
14,1
17,1
21,1
25,1
30,1
31,1
55,1
69,1
86,1

there are 295 species with exactly one representative in the sample (called singletons), 63 species with two representatives, 30 with 3, and so on; there are also some large or very abundant species in the right tail: 1 species with 55 representatives, 1 with 69, and 1 (the largest or most abundant) with 86. This structure, with a large number of rare species and a small number of very abundant species, is typical. The dataset must be in this comma-delimited format (frequency,count), with filename equal to datasetname.csv or datasetname.txt.

3a Analysis with the GUI version (CatchAllGUI.exe)

To read in the data, start CatchAll by double-clicking on CatchAll.exe, use the Locate Input Data button to navigate to your dataset, and double-click on the appropriate file. CatchAll then displays the first 10 lines of your dataset in a small window for verification; click OK.

[Screenshot: CatchAll window with the buttons 1. Locate Input Data (.csv file), 2a. Run Program with 4 Mixed Exp, 3. Close Window and About CatchAll.]

Once the data is loaded, perform the analysis by clicking one of the Run Program buttons. After a short time (ranging from < 1 sec to < 5 mi
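The comma-delimited frequency,count format described in the Input data section can be read with a few lines of Python, for instance to check a dataset before running CatchAll. This is a minimal sketch and not part of CatchAll; the function name `read_frequency_counts` is hypothetical, and the sample uses only the first five rows of the example dataset.

```python
import csv
import io


def read_frequency_counts(text):
    """Parse 'frequency,count' lines into a dict {frequency: number of species}."""
    counts = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or not "".join(row).strip():
            continue                      # skip blank lines
        frequency, count = int(row[0]), int(row[1])
        counts[frequency] = count
    return counts


sample = """1,295
2,63
3,30
4,6
5,4
"""
counts = read_frequency_counts(sample)
c = sum(counts.values())                       # number of observed species
n = sum(i * f for i, f in counts.items())      # number of observed individuals
print(c, n)
```

To read an actual file, pass its contents to the function, for example the result of `open("datasetname.csv").read()`.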
