
SSJ User's Guide Package gof Goodness-of

1. [1] N. H. Anderson and D. M. Titterington. A comparison of two statistics for detecting clustering in one dimension. Journal of Statistical Computation and Simulation, 53:103-125, 1995.
[2] T. W. Anderson and D. A. Darling. Asymptotic theory of certain goodness of fit criteria based on stochastic processes. Annals of Mathematical Statistics, 23:193-212, 1952.
[3] D. A. Darling. On the asymptotic distribution of Watson's statistic. The Annals of Statistics, 11(4):1263-1266, 1983.
[4] J. Durbin. Distribution Theory for Tests Based on the Sample Distribution Function. SIAM CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM, Philadelphia, PA, 1973.
[5] J. Glaz. Approximations and bounds for the distribution of the scan statistic. Journal of the American Statistical Association, 84:560-566, 1989.
[6] J. Glaz, J. Naus, and S. Wallenstein. Scan Statistics. Springer Series in Statistics. Springer, New York, NY, 2001.
[7] D. E. Knuth. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. Addison-Wesley, Reading, MA, third edition, 1998.
[8] A. M. Law and W. D. Kelton. Simulation Modeling and Analysis. McGraw-Hill, New York, NY, third edition, 2000.
[9] P. L'Ecuyer and R. Simard. TestU01: A Software Library in ANSI C for Empirical Testing of Random Number Generators, 2002. Software user's guide. Available at http://www.iro.umontreal.ca/~lecuyer.
[10] P. A. W. Lewis. Distribution of the Anderson-Darling
2. and the p-value computed with the above definition is $p = 1 - p_L = 0.632$. Note that if $p_L$ is very small, in this definition, $p$ becomes close to 1. If the left p-value was defined as $p_L = 1 - p_R = P[Y < y \mid H_0]$, this would also lead to problems; in the example, one would have $p_L = 0$ in that case.

A very common type of test in the discrete case is the chi-square test, which applies when the possible outcomes are partitioned into a finite number of categories. Suppose there are $k$ categories and that each observation belongs to category $i$ with probability $p_i$, for $0 \le i < k$. If there are $n$ independent observations, the expected number of observations in category $i$ is $e_i = n p_i$, and the chi-square test statistic is defined as

$$X^2 = \sum_{i=0}^{k-1} \frac{(o_i - e_i)^2}{e_i} \qquad (2)$$

where $o_i$ is the actual number of observations in category $i$. Assuming that all $e_i$'s are large enough (a popular rule of thumb asks for $e_i \ge 5$ for each $i$), $X^2$ follows approximately the chi-square distribution with $k - 1$ degrees of freedom [12]. The class GofStat.OutcomeCategoriesChi2, a nested class defined inside the GofStat class, provides tools to automatically regroup categories in the cases where some $e_i$'s are too small.

The class GofFormat contains methods used to format results of GOF test statistics, or to apply several such tests simultaneously to a given data set and format the results to produce a report that also contains the p-valu
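The chi-square statistic (2) described in excerpt 2 can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the class name Chi2Sketch and its method are hypothetical and are not part of SSJ, and the observed counts in main are made-up numbers chosen for the example.

```java
// Hypothetical helper, not part of SSJ: computes X^2 = sum_i (o_i - e_i)^2 / e_i.
final class Chi2Sketch {

    /** Returns the chi-square statistic for expected counts e_i and observed counts o_i. */
    static double chi2(double[] expected, int[] observed) {
        if (expected.length != observed.length) {
            throw new IllegalArgumentException("expected and observed must have the same length");
        }
        double x2 = 0.0;
        for (int i = 0; i < expected.length; i++) {
            double diff = observed[i] - expected[i];
            x2 += diff * diff / expected[i];
        }
        return x2;
    }

    public static void main(String[] args) {
        int n = 100;                               // number of observations (illustrative)
        double[] p = {0.25, 0.25, 0.25, 0.25};     // category probabilities p_i
        int[] observed = {18, 22, 31, 29};         // made-up observed counts o_i
        double[] expected = new double[p.length];
        for (int i = 0; i < p.length; i++) {
            expected[i] = n * p[i];                // e_i = n * p_i
        }
        System.out.println("X^2 = " + chi2(expected, observed)
                + " with " + (p.length - 1) + " degrees of freedom");
    }
}
```

Every expected count here is 25, which satisfies the rule of thumb $e_i \ge 5$, so no regrouping of categories would be needed in this example.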
3. contains random variables from the distribution function dist, then the result will contain uniform random variables over [0, 1].

public static DoubleArrayList unifTransform (DoubleArrayList data, DiscreteDistribution dist)
Applies the transformation $U_i = F(V_i)$ for $i = 0, 1, \ldots, n-1$, where $F$ is a discrete distribution function, and returns the result as an array of length $n$. $V$ represents the $n$ observations contained in data, and $U$ the returned transformed observations. Note: if $V$ are the values of random variables with distribution function dist, then the result will contain the values of discrete random variables distributed over the set of values taken by dist, not uniform random variables over [0, 1].

public static void diff (IntArrayList sortedData, IntArrayList spacings, int n1, int n2, int a, int b)
Assumes that the real-valued observations $U_0, \ldots, U_{n-1}$ contained in sortedData are already sorted in increasing order and computes the differences between the successive observations. Let $D$ be the differences returned in spacings. The difference $U_i - U_{i-1}$ is put in $D_i$ for n1 < $i \le$ n2, whereas $U_{n1} - a$ is put into $D_{n1}$ and $b - U_{n2}$ is put into $D_{n2+1}$. The number of observations must be greater than or equal to n2, we must have n1 < n2, and n1 and n2 are greater than 0. The size of spacings will be at least $n + 1$ after the call returns.

public static void diff (DoubleArrayList sortedData, DoubleArrayList spacings, int n1, int n2, doub
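The index bookkeeping of diff in excerpt 3 is easier to follow in code. The sketch below uses plain arrays instead of Colt lists; SpacingsSketch is a hypothetical name, and treating the upper bound on $i$ as inclusive is an assumption drawn from the fact that $D_{n2+1}$ is filled separately.

```java
// Hypothetical helper, not part of SSJ: spacings of sorted observations between indices n1 and n2.
final class SpacingsSketch {

    /**
     * Returns D with D[n1] = U[n1] - a, D[i] = U[i] - U[i-1] for n1 < i <= n2,
     * and D[n2 + 1] = b - U[n2]. Other entries are left at 0.
     */
    static double[] diff(double[] sorted, int n1, int n2, double a, double b) {
        if (n1 >= n2 || n1 < 0 || n2 >= sorted.length) {
            throw new IllegalArgumentException("need 0 <= n1 < n2 < number of observations");
        }
        double[] d = new double[sorted.length + 1];   // size n + 1, as in the description
        d[n1] = sorted[n1] - a;
        for (int i = n1 + 1; i <= n2; i++) {
            d[i] = sorted[i] - sorted[i - 1];
        }
        d[n2 + 1] = b - sorted[n2];
        return d;
    }

    public static void main(String[] args) {
        double[] u = {0.05, 0.1, 0.4, 0.7};           // already sorted, illustrative values
        double[] d = diff(u, 1, 3, 0.0, 1.0);
        System.out.println(java.util.Arrays.toString(d));
    }
}
```

With a = 0 and b = 1, the entries $D_{n1}, \ldots, D_{n2+1}$ sum to $b - a = 1$, which is a convenient sanity check.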
4. pl

public static double[] kolmogorovSmirnov (double[] sortedData)
Computes the Kolmogorov-Smirnov (KS) test statistics $D_n^+$, $D_n^-$, and $D_n$ (see method kolmogorovSmirnov). Returns the array $[D_n^+, D_n^-, D_n]$.

public static double[] kolmogorovSmirnov (DoubleArrayList sortedData)
Computes the Kolmogorov-Smirnov (KS) test statistics $D_n^+$, $D_n^-$, and $D_n$ defined by

$$D_n^+ = \max_{0 \le j \le n-1} \left( (j+1)/n - U_{(j)} \right), \qquad (14)$$
$$D_n^- = \max_{0 \le j \le n-1} \left( U_{(j)} - j/n \right), \qquad (15)$$
$$D_n = \max (D_n^+, D_n^-), \qquad (16)$$

and returns an array of length 3 that contains $[D_n^+, D_n^-, D_n]$. These statistics compare the empirical distribution of $U_{(0)}, \ldots, U_{(n-1)}$, which are assumed to be in sortedData, with the uniform distribution over [0, 1].

public static void kolmogorovSmirnov (double[] data, ContinuousDistribution dist, double[] sval, double[] pval)
Computes the Kolmogorov-Smirnov (KS) test statistics and their p-values. This is to compare the empirical distribution of the (unsorted) observations in data with the theoretical distribution dist. The KS statistics $D_n^+$, $D_n^-$, and $D_n$ are returned in sval[0], sval[1], and sval[2] respectively, and their corresponding p-values are returned in pval[0], pval[1], and pval[2].

public static double[] kolmogorovSmirnovJumpOne (DoubleArrayList sortedData, double a)
Computes the KS statistics $D_N^+(a)$ and $D_N^-(a)$ defined in the description of the method FDist.kolmogorovSmirnovPlusJumpOne, assuming that $F$ is the uniform distribution over [0, 1] and that $U_{(1)}, \ldots, U_{(N)}$ are
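Definitions (14) to (16) in excerpt 4 translate directly into a single pass over the sorted sample. The sketch below is a minimal stand-alone version on a plain array; the class name KsSketch is hypothetical, and the input is assumed to be sorted and to lie in [0, 1].

```java
// Hypothetical helper, not part of SSJ: KS statistics against the uniform distribution on [0, 1].
final class KsSketch {

    /** Returns {D+, D-, D} for observations sorted in increasing order. */
    static double[] kolmogorovSmirnov(double[] sorted) {
        int n = sorted.length;
        double dPlus = Double.NEGATIVE_INFINITY;
        double dMinus = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < n; j++) {
            dPlus = Math.max(dPlus, (j + 1.0) / n - sorted[j]);     // (j+1)/n - U_(j)
            dMinus = Math.max(dMinus, sorted[j] - (double) j / n);  // U_(j) - j/n
        }
        return new double[] {dPlus, dMinus, Math.max(dPlus, dMinus)};
    }

    public static void main(String[] args) {
        double[] u = {0.1, 0.4, 0.7};                               // illustrative sorted sample
        System.out.println(java.util.Arrays.toString(kolmogorovSmirnov(u)));
    }
}
```

For observations from another continuous distribution, one would first apply the probability integral transformation and sort the result before calling such a routine.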
5. to be printed explicitly. If EPSILONP = $\epsilon$, then any p-value less than $\epsilon$ or larger than $1 - \epsilon$ is not written explicitly; the program simply writes "eps" or "1 - eps". The default value is $10^{-15}$.

public static double SUSPECTP = 0.01
Environment variable used in formatp1 to determine which p-values should be marked as suspect when printing test results. If SUSPECTP = $\alpha$, then any p-value less than $\alpha$ or larger than $1 - \alpha$ is considered suspect and is singled out by formatp1. The default value is 0.01.

public static String formatp0 (double p)
Returns the p-value p of a test, in the format "1 - p" if p is close to 1, and p otherwise. Uses the environment variable EPSILONP and replaces p by $\epsilon$ when it is too small.

public static String formatp1 (double p)
Returns the string "p-value of test", then calls formatp0 to print p, and adds the marker if p is considered suspect (uses the environment variable SUSPECTP for this).

public static String formatp2 (double x, double p)
Returns x on a single line, then goes to the next line and calls formatp1.

public static String formatp3 (String testName, double x, double p)
Formats the test statistic x for a test named testName with p-value p. The first line of the returned string contains the name of the test and the statistic, whereas the second line contains its p-value. The formatted values of x and p are aligned.

public static String
6. dist, this method computes the kernel density estimate at each of the $m$ points Y[j], $j = 0, 1, \ldots, m-1$, where $m$ is the length of Y, the kernel is kern.density(x), and the bandwidth is $h$. Returns the estimates as an array of $m$ values.

public static double[] computeDensity (EmpiricalDist dist, ContinuousDistribution kern, double[] Y)
Similar to method computeDensity above, but the bandwidth $h$ is obtained from the method KernelDensityGen.getBaseBandwidth(dist) in package randvar.

GofStat

This class provides methods to compute several types of EDF goodness-of-fit test statistics and to apply certain transformations to a set of observations. This includes the probability integral transformation $U_i = F(X_i)$, as well as the power ratio and iterated spacings transformations [15]. Here, $U_{(0)}, \ldots, U_{(n-1)}$ stand for $n$ observations $U_0, \ldots, U_{n-1}$ sorted by increasing order, where $0 \le U_i \le 1$.

Note: This class uses the Colt library.

package umontreal.iro.lecuyer.gof;
import cern.colt.list.*;
public class GofStat

Transforming the observations

public static DoubleArrayList unifTransform (DoubleArrayList data, ContinuousDistribution dist)
Applies the probability integral transformation $U_i = F(V_i)$ for $i = 0, 1, \ldots, n-1$, where $F$ is a continuous distribution function, and returns the result as an array of length $n$. $V$ represents the $n$ observations contained in data, and $U$ the returned transformed observations. If data
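The probability integral transformation $U_i = F(V_i)$ mentioned in excerpt 6 can be illustrated without any library support. In the sketch below, the distribution function is passed as a plain function object rather than an SSJ ContinuousDistribution; UnifTransformSketch and the choice of the exponential distribution are assumptions made only for this example.

```java
// Hypothetical helper, not part of SSJ: applies U_i = F(V_i) to every observation.
final class UnifTransformSketch {

    /** Returns the array of F(v) for each v in data, where cdf evaluates F. */
    static double[] unifTransform(double[] data, java.util.function.DoubleUnaryOperator cdf) {
        double[] u = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            u[i] = cdf.applyAsDouble(data[i]);
        }
        return u;
    }

    public static void main(String[] args) {
        // Exponential distribution with rate 1: F(x) = 1 - exp(-x), written with expm1 for accuracy.
        java.util.function.DoubleUnaryOperator expCdf = x -> -Math.expm1(-x);
        double[] v = {0.2, 1.5, 0.7};                 // illustrative observations
        System.out.println(java.util.Arrays.toString(unifTransform(v, expCdf)));
    }
}
```

If the observations really follow $F$ and $F$ is continuous, the transformed values behave like a uniform sample over [0, 1], which is what the EDF statistics in this class expect after sorting.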
7. tests in this list are to be performed when asking for several simultaneous tests via the functions activeTests, formatActiveTests, etc.

public static final int KSP = 0          Kolmogorov-Smirnov+ test
public static final int KSM = 1          Kolmogorov-Smirnov- test
public static final int KS = 2           Kolmogorov-Smirnov test
public static final int AD = 3           Anderson-Darling test
public static final int CM = 4           Cramér-von Mises test
public static final int WG = 5           Watson G test
public static final int WU = 6           Watson U test
public static final int MEAN = 7         Mean
public static final int COR = 8          Correlation
public static final int NTESTTYPES = 9   Total number of test types

public static final String[] TESTNAMES
Name of each testType test. Could be used for printing the test results, for example.

public static boolean[] activeTests
The set of EDF tests that are to be performed when calling the methods activeTests, formatActiveTests, etc. By default, this set contains KSP, KSM, and AD. Note: MEAN and COR are always excluded from this set of active tests.

public static void tests (DoubleArrayList sortedData, double[] sVal)
Computes all EDF test statistics enumerated above (except COR) to compare the empirical distribution of $U_{(0)}, \ldots, U_{(N-1)}$ with the uniform distribution, assuming that these sorted observations are in sortedData. If $N > 1$, returns sVal with the values of the KS stati
8. SSJ User's Guide
Package gof
Goodness-of-fit test Statistics
Version: December 17, 2014

This package provides facilities for performing and reporting different types of univariate goodness-of-fit statistical tests.

Overview

This package contains tools for performing univariate goodness-of-fit (GOF) statistical tests. Methods for computing (or approximating) the distribution function $F(x)$ of certain GOF test statistics, as well as their complementary distribution function $\bar F(x) = 1 - F(x)$, are implemented in classes of package probdist. Tools for computing the GOF test statistics and the corresponding p-values, and for formatting the results, are provided in classes GofStat and GofFormat.

We are concerned here with GOF test statistics for testing the hypothesis $H_0$ that a sample of $N$ observations $X_1, \ldots, X_N$ comes from a given univariate probability distribution $F$. We consider tests such as those of Kolmogorov-Smirnov, Anderson-Darling, Cramér-von Mises, etc. These test statistics generally measure, in different ways, the distance between a continuous distribution function $F$ and the empirical distribution function (EDF) $\hat F_N$ of $X_1, \ldots, X_N$. They are also called EDF test statistics. The observations $X_i$ are usually
9. a data set. A pair of observations that are close to each other is transformed into an observation close to zero. A data set with unusually clustered observations is thus transformed to a data set with an accumulation of observations near zero, which is easily detected by the Anderson-Darling GOF test.

public static void powerRatios (DoubleArrayList sortedData)
Applies the power ratios transformation $W$ described in section 8.4 of Stephens [15]. Let $U$ be the $n$ observations contained in sortedData. Assumes that $U$ contains $n$ real numbers $U_{(0)}, \ldots, U_{(n-1)}$ from the interval [0, 1], already sorted in increasing order, and computes the transformations

$$U'_i = \left( U_{(i)} / U_{(i+1)} \right)^{i+1}, \qquad i = 0, \ldots, n-1,$$

with $U_{(n)} = 1$. These $U'_i$ are sorted in increasing order and put back in U[1...n]. If the $U_{(i)}$ are i.i.d. U(0, 1) sorted by increasing order, then the $U'_i$ are also i.i.d. U(0, 1). This transformation is useful to detect clustering, as explained in iterateSpacings, except that here a pair of observations close to each other is transformed into an observation close to 1. An accumulation of observations near 1 is also easily detected by the Anderson-Darling GOF test.

Partitions for the chi-square tests

public static class OutcomeCategoriesChi2
This class helps manage the partitions of possible outcomes into categories for applying chi-square tests. It permits one to automatically regroup categories to make sure that the expected number of observati
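The power ratios transformation of excerpt 9 can be sketched as follows, reading the formula as $U'_i = (U_{(i)} / U_{(i+1)})^{i+1}$ with $U_{(n)} = 1$. That reading is an assumption, since the formula is damaged in the extracted text; PowerRatiosSketch is a hypothetical name, and the sketch returns a new array instead of modifying its input in place.

```java
// Hypothetical helper, not part of SSJ: power ratios transformation of a sorted sample in (0, 1].
final class PowerRatiosSketch {

    /** Returns the sorted values (U_(i) / U_(i+1))^(i+1), i = 0..n-1, with U_(n) = 1. */
    static double[] powerRatios(double[] sorted) {
        int n = sorted.length;
        double[] w = new double[n];
        for (int i = 0; i < n; i++) {
            double next = (i + 1 < n) ? sorted[i + 1] : 1.0;   // U_(n) = 1 by convention
            w[i] = Math.pow(sorted[i] / next, i + 1);
        }
        java.util.Arrays.sort(w);                              // results are put back in increasing order
        return w;
    }

    public static void main(String[] args) {
        double[] u = {0.25, 0.5};                              // illustrative sorted sample
        System.out.println(java.util.Arrays.toString(powerRatios(u)));
    }
}
```

Two nearly equal neighbours give a ratio close to 1, which is why clustering shows up as an accumulation of transformed values near 1.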
10. category i in the nbExp array.

public OutcomeCategoriesChi2 (double[] nbExp)
Constructs an OutcomeCategoriesChi2 object using the array nbExp for the number of expected observations in each category. The smin and smax fields are set to 0 and $n - 1$, respectively, where $n$ is the length of array nbExp. The loc field is set such that loc[i] = i for each i. The field nbCategories is set to $n$.

public OutcomeCategoriesChi2 (double[] nbExp, int smin, int smax)
Constructs an OutcomeCategoriesChi2 object using the given nbExp expected observations array. Only the expected numbers from the smin to smax (inclusive) indices will be considered valid. The loc field is set such that loc[i] = i for each i in the interval [smin, smax]. All loc[i] for i $\le$ smin are set to smin, and all loc[i] for i $\ge$ smax are set to smax. The field nbCategories is set to smax - smin + 1.

public OutcomeCategoriesChi2 (double[] nbExp, int[] loc, int smin, int smax, int nbCat)
Constructs an OutcomeCategoriesChi2 object. The field nbCategories is set to nbCat.

public void regroupCategories (double minExp)
Regroups categories as explained earlier, so that the expected number of observations in each category is at least minExp. We usually choose minExp = 10.

public String toString()
Provides a report on the categories.

Computing EDF test statistics

public static double chi2 (double[] nbExp, int[] count, int smin, int smax)
Computes a
11. e minExp)
Computes the chi-square statistic for a continuous distribution. Here the equiprobable case can be used. Assuming that data contains observations coming from the uniform distribution, the [0, 1] interval is divided into $1/p$ subintervals, where $p$ = minExp/$n$, $n$ being the sample size, i.e., the number of observations stored in data. For each subinterval, the method counts the number of contained observations, and the chi-square statistic is computed using chi2Equal. We usually choose minExp = 10.

public static double chi2Equal (DoubleArrayList data)
Equivalent to chi2Equal (data, 10).

public static int scan (DoubleArrayList sortedData, double d)
Computes and returns the scan statistic $S_n(d)$, defined in (6). Let $U$ be the $n$ observations contained in sortedData. The $n$ observations in U[0...n-1] must be real numbers in the interval [0, 1], sorted in increasing order. (See FBar.scan for the distribution function of $S_n(d)$.)

public static double cramerVonMises (DoubleArrayList sortedData)
Computes and returns the Cramér-von Mises statistic $W_n^2$ (see [4, 13, 14]), defined by

$$W_n^2 = \frac{1}{12n} + \sum_{j=0}^{n-1} \left( U_{(j)} - \frac{j + 0.5}{n} \right)^2, \qquad (10)$$

assuming that sortedData contains $U_{(0)}, \ldots, U_{(n-1)}$ sorted in increasing order.

public static double watsonG (DoubleArrayList sortedData)
Computes and returns the Watson statistic $G_n$ (see [3]), defined by

$$G_n = \sqrt{n} \max_{0 \le j \le n-1} \left\{ (j+1)/n - U_{(j)} + \bar U_n - 1/2 \right\}, \qquad (11)$$

where $\bar U_n$ is the average of the observations $U_{(j)}$, assuming that sortedData con
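The Cramér-von Mises statistic (10) and the Watson statistic (11) from excerpt 11 are both simple sums or maxima over the sorted sample. The sketch below is a plain-array illustration; the class name EdfSketch is hypothetical, and the formulas are implemented as reconstructed here from the extracted text, so they should be checked against the original guide before being relied on.

```java
// Hypothetical helper, not part of SSJ: Cramer-von Mises W^2 and Watson G for a sorted sample in [0, 1].
final class EdfSketch {

    /** W^2 = 1/(12 n) + sum_j (U_(j) - (j + 0.5)/n)^2. */
    static double cramerVonMises(double[] sorted) {
        int n = sorted.length;
        double w2 = 1.0 / (12.0 * n);
        for (int j = 0; j < n; j++) {
            double t = sorted[j] - (j + 0.5) / n;
            w2 += t * t;
        }
        return w2;
    }

    /** G = sqrt(n) * max_j ((j+1)/n - U_(j) + mean - 1/2). */
    static double watsonG(double[] sorted) {
        int n = sorted.length;
        double sum = 0.0;
        for (double u : sorted) {
            sum += u;
        }
        double mean = sum / n;
        double best = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < n; j++) {
            best = Math.max(best, (j + 1.0) / n - sorted[j] + mean - 0.5);
        }
        return Math.sqrt(n) * best;
    }

    public static void main(String[] args) {
        double[] u = {0.1, 0.4, 0.7};                 // illustrative sorted sample
        System.out.println("W^2 = " + cramerVonMises(u) + ", G = " + watsonG(u));
    }
}
```

Since the term $\bar U_n - 1/2$ does not depend on $j$, $G_n$ equals $\sqrt{n}\,(D_n^+ + \bar U_n - 1/2)$, so an implementation that already has $D_n^+$ can reuse it.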
12. es of all these tests. A C version of this class is actually used extensively in the package TestU01, which applies statistical tests to random number generators [9]. The class also provides tools to plot an empirical or theoretical distribution function, by creating a data file that contains a graphic plot in a format compatible with a given software.

FDist

This class provides methods to compute (or approximate) the distribution functions of special types of goodness-of-fit test statistics.

package umontreal.iro.lecuyer.gof;
public class FDist

public static double kolmogorovSmirnovPlusJumpOne (int N, double a, double x)
Similar to KolmogorovSmirnovPlusDist but for the case where the distribution function $F$ has a jump of size $a$ at a given point $x_0$, is zero at the left of $x_0$, and is continuous at the right of $x_0$. The Kolmogorov-Smirnov statistic is defined in that case as

$$D_N^+(a) = \sup_{a \le u \le 1} \left( \hat F_N(F^{-1}(u)) - u \right) = \max_{\lfloor 1 + aN \rfloor \le j \le N} \left( j/N - F(V_{(j)}) \right), \qquad (3)$$

where $V_{(1)}, \ldots, V_{(N)}$ are the observations sorted by increasing order. The method returns an approximation of $P[D_N^+(a) \le x]$ computed via

$$P[D_N^+(a) \le x] = 1 - x \sum_{i=0}^{\lfloor N(1-a-x) \rfloor} \binom{N}{i} \left( \frac{i}{N} + x \right)^{i-1} \left( 1 - \frac{i}{N} - x \right)^{N-i} \qquad (4)$$

and

$$P[D_N^+(a) \le x] = x \sum_{j=0}^{\lfloor N(a+x) \rfloor} \binom{N}{j} \left( \frac{j}{N} - x \right)^{j} \left( 1 - \frac{j}{N} + x \right)^{N-j-1}. \qquad (5)$$

The current implementation uses formula (5) when $N(x + a) < 6.5$ and $x + a < 0.5$, and uses (4) when $Nx \ge 6.5$ or $x + a \ge 0.5$. Restriction: $0 < a < 1$.

public static double scan (int N, double d, int m)
Returns $F(m)$, t
13. et(0) in sVal[MEAN], and 1 - dist.cdf(data.get(0)) in sVal[KSP], pVal[KSP], and pVal[MEAN].

public static String formatActiveTests (int n, double[] sVal, double[] pVal)
Gets the p-values of the active EDF test statistics, which are in activeTests. It is assumed that the values of these statistics and their p-values are already computed, in sVal and pVal, and that the sample size is n. These statistics and p-values are formatted using formatp2 for each one. If n = 1, prints only pVal[KSP] using formatp1.

public static String iterSpacingsTests (DoubleArrayList sortedData, int k, boolean printval, boolean graph, PrintWriter f)
Repeats the following k times: applies the GofStat.iterateSpacings transformation to the $U_{(0)}, \ldots, U_{(N-1)}$, assuming that these observations are in sortedData, then computes the EDF test statistics and calls activeTests after each transformation. The function returns the original array sortedData (the transformations are applied on a copy of sortedData). If printval = true, stores all the values into the returned String after each iteration. If graph = true, calls graphDistUnif after each iteration to print to stream f the data for plotting the distribution function of the $U_i$.

public static String iterPowRatioTests (DoubleArrayList sortedData, int k, boolean printval, boolean graph, PrintWriter f)
Similar to iterSpacingsTests, but with the GofStat.powerRatios transformation.

References
14. for $i = 0, 1, \ldots, m$, and formats these points into a String in a format suitable for the software specified by graphSoft. NOTE: see also the more recent class ContinuousDistChart.

public static String drawDensity (ContinuousDistribution dist, double a, double b, int m, String desc)
Formats data to plot the graph of the density $f(x)$ over the interval $[a, b]$, and returns the result as a String. The method dist.density(x) returns the value of $f(x)$ at $x$. The String desc gives a short caption for the graphic plot. The method computes the $m + 1$ points $(x_i, f(x_i))$, where $x_i = a + i(b - a)/m$ for $i = 0, 1, \ldots, m$, and formats these points into a String in a format suitable for the software specified by graphSoft. NOTE: see also the more recent class ContinuousDistChart.

public static String graphDistUnif (DoubleArrayList data, String desc)
Formats data to plot the empirical distribution of $U_{(1)}, \ldots, U_{(N)}$, which are assumed to be in data[0...N-1], and to compare it with the uniform distribution. The $U_{(i)}$ must be sorted. The two endpoints (0, 0) and (1, 1) are always included in the plot. The string desc gives a short caption for the graphic plot. The data is printed in a format suitable for the software specified by graphSoft. NOTE: see also the more recent class EmpiricalChart.

Computing and printing p-values for EDF test statistics

public static double EPSILONP = 1.0E-15
Environment variable used in formatp0 to determine which p-values are too close to 0 or 1
15. formatChi2 (int k, int d, double chi2)
Computes the p-value of the chi-square statistic chi2 for a test with k intervals. Uses d decimal digits of precision in the calculations. The result of the test is returned as a string. The p-value is computed using pDisc.

public static String formatKS (int n, double dp, double dm, double d)
Computes the p-values of the three Kolmogorov-Smirnov statistics $D_N^+$, $D_N^-$, and $D_N$, whose values are in dp, dm, d, respectively, assuming a sample of size n. Then formats these statistics and their p-values using formatp2 for each one.

public static String formatKS (DoubleArrayList data, ContinuousDistribution dist)
Computes the KS test statistics to compare the empirical distribution of the observations in data with the theoretical distribution dist, and formats the results. See also method GofStat.kolmogorovSmirnov (double[], ContinuousDistribution, double[], double[]).

public static String formatKSJumpOne (int n, double a, double dp)
Similar to formatKS, but for the KS statistic $D_N^+(a)$ defined in (3). Writes a header, computes the p-value, and calls formatp2.

public static String formatKSJumpOne (DoubleArrayList data, ContinuousDistribution dist, double a)
Similar to formatKS, but for $D_N^+(a)$ defined in (3).

Applying several tests at once and printing results

Higher-level tools for applying several EDF goodness-of-fit tests simultaneously are offered here. The environment variable activeTests specifies which
16. he distribution function of the scan statistic with parameters $N$ and $d$, evaluated at $m$. For a description of this statistic and its distribution, see scan, which computes its complementary distribution $\bar F(m) = 1 - F(m - 1)$.

FBar

This class is similar to FDist, except that it provides static methods to compute or approximate the complementary distribution function of $X$, which we define as $\bar F(x) = P[X \ge x]$, instead of $F(x) = P[X \le x]$. Note that with our definition of $\bar F$, one has $\bar F(x) = 1 - F(x)$ for continuous distributions and $\bar F(x) = 1 - F(x - 1)$ for discrete distributions over the integers.

package umontreal.iro.lecuyer.gof;
public class FBar

public static double scan (int N, double d, int m)
Returns $P[S_N(d) \ge m]$, where $S_N(d)$ is the scan statistic (see [5, 6] and scan), defined as

$$S_N(d) = \sup_{0 \le y \le 1-d} \eta[y, y+d], \qquad (6)$$

where $d$ is a constant in (0, 1) and $\eta[y, y+d]$ is the number of observations falling inside the interval $[y, y+d]$, from a sample of $N$ i.i.d. U(0, 1) random variables. One has (see [1])

$$P[S_N(d) \ge m] \approx \left( \frac{m}{d} - N - 1 \right) b(m) + 2 \sum_{i=m}^{N} b(i) \qquad (7)$$
$$\approx 2 \left( 1 - \Phi(\theta\kappa) \right) + \theta\kappa \, \frac{\exp[-\theta^2 \kappa^2 / 2]}{d \sqrt{2\pi}}, \qquad (8)$$

where $\Phi$ is the standard normal distribution function, $b(i) = \binom{N}{i} d^i (1-d)^{N-i}$, $\theta = \sqrt{d/(1-d)}$, and $\kappa = m/(d\sqrt{N}) - \sqrt{N}$.

For $d \le 1/2$, (7) is exact for $m > N/2$, but only an approximation otherwise. The approximation (8) is good when $Nd$ is large or when $d > 0.3$ and $N > 50$. In other cases, this implementation sometimes uses the approximation pr
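The definition (6) of the scan statistic in excerpt 16 can be evaluated exactly with a sliding window over the sorted sample, because the supremum over $y$ is reached by a window whose left end sits on an observation. The sketch below illustrates this under that reasoning; ScanSketch is a hypothetical name, and the code computes only the statistic itself, not its distribution.

```java
// Hypothetical helper, not part of SSJ: scan statistic S(d) of a sorted sample in [0, 1].
final class ScanSketch {

    /** Returns the largest number of observations in any window [y, y + d]. */
    static int scan(double[] sorted, double d) {
        int n = sorted.length;
        int best = 0;
        int j = 0;                                    // first index outside the current window
        for (int i = 0; i < n; i++) {
            // Window starts at sorted[i]; extend j while observations stay within distance d.
            while (j < n && sorted[j] <= sorted[i] + d) {
                j++;
            }
            best = Math.max(best, j - i);
        }
        return best;
    }

    public static void main(String[] args) {
        double[] u = {0.1, 0.15, 0.2, 0.8};           // illustrative sorted sample
        System.out.println("S(0.15) = " + scan(u, 0.15));
    }
}
```

Both indices only move forward, so the loop does a linear amount of work once the data are sorted.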
17. in sortedData. Returns the array $[D_N^+(a), D_N^-(a)]$.

public static double pDisc (double pL, double pR)
Computes a variant of the p-value $p$ whenever a test statistic has a discrete probability distribution. This p-value is defined as follows:

$$p_L = P[Y \le y], \qquad p_R = P[Y \ge y],$$
$$p = \begin{cases} p_R, & \text{if } p_R < p_L, \\ 1 - p_L, & \text{if } p_R \ge p_L \text{ and } p_L < 0.5, \\ 0.5, & \text{otherwise.} \end{cases}$$

The function takes $p_L$ and $p_R$ as input and returns $p$.

GofFormat

This class contains methods used to format results of GOF test statistics, or to apply a series of tests simultaneously and format the results. It is in fact a translation from C to Java of a set of functions that were specially written for the implementation of TestU01, a software package for testing uniform random number generators [9].

Strictly speaking, applying several tests simultaneously makes the p-values "invalid" in the sense that the probability of having at least one p-value less than 0.01, say, is larger than 0.01. One must therefore be careful with the interpretation of these p-values (one could use, e.g., the Bonferroni inequality [8]). Applying simultaneous tests is convenient in some situations, such as in screening experiments for detecting statistical deficiencies in random number generators. In that context, rejection of the null hypothesis typically occurs with extremely small p-values (e.g., less than $10^{-15}$), and the interpretation is quite obvious in this case.

The class also provides tools to p
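The three-way rule for pDisc in excerpt 17 is short enough to write out directly. The sketch below is a stand-alone illustration whose class name PDiscSketch is hypothetical; the values in main reuse the 0.632 example quoted in the overview, where $p_L = 0.368$ and $p_R = 1$.

```java
// Hypothetical helper, not part of SSJ: p-value variant for a discrete test statistic.
final class PDiscSketch {

    /** pL = P[Y <= y], pR = P[Y >= y]; returns p as defined by the three cases. */
    static double pDisc(double pL, double pR) {
        if (pR < pL) {
            return pR;            // right tail is the smaller one
        }
        if (pL < 0.5) {
            return 1.0 - pL;      // here pR >= pL, so the left tail is the small one
        }
        return 0.5;               // neither tail is small
    }

    public static void main(String[] args) {
        System.out.println(pDisc(0.368, 1.0));   // left tail small: p is close to 1
        System.out.println(pDisc(0.9, 0.2));     // right tail small: p = pR
        System.out.println(pDisc(0.6, 0.7));     // neither tail small: p = 0.5
    }
}
```

With this convention, a value of $p$ near 0 points to a statistic that is too large and a value near 1 to one that is too small, which matches how formatp1 flags both ends as suspect.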
18. le a, double b)
Same as method diff (IntArrayList, IntArrayList, int, int, int, int), but for the continuous case.

public static void iterateSpacings (DoubleArrayList data, DoubleArrayList spacings)
Applies one iteration of the iterated spacings transformation [7, 15]. Let $U$ be the $n$ observations contained in data, and let $S$ be the spacings contained in spacings. Assumes that S[0...n] contains the spacings between $n$ real numbers $U_0, \ldots, U_{n-1}$ in the interval [0, 1]. These spacings are defined by

$$S_i = U_{(i)} - U_{(i-1)}, \qquad 1 \le i < n,$$

where $U_{(0)} = 0$, $U_{(n-1)} = 1$, and $U_{(0)}, \ldots, U_{(n-1)}$ are the $U_i$ sorted in increasing order. These spacings may have been obtained by calling diff. This method transforms the spacings into new spacings, by a variant of the method described in section 11 of [11] and also by Stephens [15]: it sorts $S_0, \ldots, S_n$ to obtain $S_{(0)} \le S_{(1)} \le S_{(2)} \le \cdots \le S_{(n)}$, computes the weighted differences

$$S_0 = (n+1)\, S_{(0)},$$
$$S_1 = n\, (S_{(1)} - S_{(0)}),$$
$$S_2 = (n-1)\, (S_{(2)} - S_{(1)}),$$
$$\vdots$$
$$S_n = S_{(n)} - S_{(n-1)},$$

and computes $V_i = S_0 + S_1 + \cdots + S_i$ for $0 \le i < n$. It then returns $S_0, \ldots, S_n$ in S[0...n] and the $V_i$ in V[1...n].

Under the assumption that the $U_i$ are i.i.d. U(0, 1), the new $S_i$ can be considered as a new set of spacings having the same distribution as the original spacings, and the $V_i$ are a new sample of i.i.d. U(0, 1) random variables, sorted by increasing order.

This transformation is useful to detect clustering in
19. lot an empirical or theoretical distribution function, by creating a data file that contains a graphic plot in a format compatible with the software specified by the environment variable graphSoft. NOTE: see also the more recent package charts.

Note: This class uses the Colt library.

package umontreal.iro.lecuyer.gof;
import cern.colt.list.*;
public class GofFormat

Plotting distribution functions

public static final int GNUPLOT
Data file format used for plotting functions with Gnuplot.

public static final int MATHEMATICA
Data file format used for creating graphics with Mathematica.

public static int graphSoft = GNUPLOT
Environment variable that selects the type of software to be used for plotting the graphs of functions. The data files produced by graphFunc and graphDistUnif will be in a format suitable for this selected software. The default value is GNUPLOT. To display a graphic in file f using gnuplot, for example, one can use the command plot "f" with steps, x with lines in gnuplot.

public static String drawCdf (ContinuousDistribution dist, double a, double b, int m, String desc)
Formats data to plot the graph of the distribution function $F$ over the interval $[a, b]$, and returns the result as a String. The method dist.cdf(x) returns the value of $F$ at $x$. The String desc gives a short caption for the graphic plot. The method computes the $m + 1$ points $(x_i, F(x_i))$, where $x_i = a + i(b - a)/m$
20. nd returns the chi-square statistic for the observations $o_i$ in count[smin...smax], for which the corresponding expected values $e_i$ are in nbExp[smin...smax]. Assuming that $i$ goes from 1 to $k$, where $k$ = smax - smin + 1 is the number of categories, the chi-square statistic is defined as

$$X^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}. \qquad (9)$$

Under the hypothesis that the $e_i$ are the correct expectations and if these $e_i$ are large enough, $X^2$ follows approximately the chi-square distribution with $k - 1$ degrees of freedom. If some of the $e_i$ are too small, one can use OutcomeCategoriesChi2 to regroup categories.

public static double chi2 (OutcomeCategoriesChi2 cat, int[] count)
Computes and returns the chi-square statistic for the observations $o_i$ in count, for which the corresponding expected values $e_i$ are in cat. This assumes that cat.regroupCategories has been called before to regroup categories in order to make sure that the expected numbers in each category are large enough for the chi-square test.

public static double chi2 (IntArrayList data, DiscreteDistributionInt dist, int smin, int smax, double minExp, int[] numCat)
Computes and returns the chi-square statistic for the observations stored in data, assuming that these observations follow the discrete distribution dist. For dist, we assume that there is one set $S = \{a, a+1, \ldots, b-1, b\}$, where $a < b$ and $a \ge 0$, for which $p(s) > 0$ if $s \in S$ and $p(s) = 0$ otherwise.

Generally, it is not possible to divide the integers in i
21. ntervals satisfying $nP(a_0 \le s < a_1) = nP(a_1 \le s < a_2) = \cdots = nP(a_{j-1} \le s < a_j)$ for a discrete distribution, where $n$ is the sample size, i.e., the number of observations stored in data. To perform a general chi-square test, the method starts from smin and finds the first non-negligible probability $p(s) \ge \epsilon$, where $\epsilon$ = DiscreteDistributionInt.EPSILON. It uses smax to allocate an array storing the number of expected observations $np(s)$ for each $s \ge$ smin. Starting from $s$ = smin, the $np(s)$ terms are computed and the allocated array grows if required until a negligible probability term is found. This gives the number of expected elements for each category, where an outcome category corresponds here to an interval in which sample observations could lie. The categories are regrouped to have at least minExp observations per category. The method then counts the number of samples in each category and calls chi2 to get the chi-square test statistic. If numCat is not null, the number of categories after regrouping is returned in numCat[0]. The number of degrees of freedom is equal to numCat[0] - 1. We usually choose minExp = 10.

public static double chi2Equal (double nbExp, int[] count, int smin, int smax)
Similar to chi2, except that the expected number of observations per category is assumed to be the same for all categories, and equal to nbExp.

public static double chi2Equal (DoubleArrayList data, doubl
22. ons in each category is large enough.

To use this facility, one must first construct an OutcomeCategoriesChi2 object by passing to the constructor the expected number of observations for each original category. Then, calling the method regroupCategories will regroup categories in a way that the expected number of observations in each category reaches a given threshold minExp. Experts in statistics recommend that minExp be always larger than or equal to 5 for the chi-square test to be valid. Thus, minExp = 10 is a safe value to use. After the call, nbExp gives the expected numbers in the new categories and loc[i] gives the relocation of category i, for each i. That is, loc[i] = j means that category i has been merged with category j because its original expected number was too small, and nbExp[i] has been added to nbExp[j] and then set to zero. In this case, all observations that previously belonged to category i are redirected to category j. The variable nbCategories gives the final number of categories, smin contains the new index of the lowest category, and smax the new index of the highest category.

public int nbCategories
Total number of categories.

public int smin
Minimum index for valid expected numbers in the array nbExp.

public int smax
Maximum index for valid expected numbers in the array nbExp.

public double[] nbExp
Expected number of observations for each category.

public int[] loc
loc[i] gives the relocation of the
23. oposed by Glaz [5]. For more information, see [16]. The approximation returned by this function is generally good when it is close to 0, but is not very reliable when it exceeds, say, 0.4.

If $m \le (N + 1)d$, the method returns 1. Else, if $Nd \le 10$, it returns the approximation given by Glaz [5]. If $Nd > 10$, it computes (7) or (8) and returns the result if it does not exceed 0.4; otherwise it computes the approximation from [5], returns it if it is less than 1.0, and returns 1.0 otherwise. The relative error can reach 10% when $Nd \le 10$ or when the returned value is less than 0.4. For $m > Nd$ and $Nd > 10$, a returned value that exceeds 0.4 should be regarded as unreliable. For $m = 3$, the returned values are totally unreliable. (There may be an error in the original formulae in [5].) Restrictions: $N \ge 2$ and $d \le 1/2$.

KernelDensity

This class provides methods to compute a kernel density estimator from a set of $n$ individual observations $x_0, \ldots, x_{n-1}$, and returns its value at $m$ selected points. For details on how the kernel density is defined, and how to select the kernel and the bandwidth $h$, see the documentation of class KernelDensityGen in package randvar.

package umontreal.iro.lecuyer.gof;
import umontreal.iro.lecuyer.probdist.*;
public class KernelDensity

Methods

public static double[] computeDensity (EmpiricalDist dist, ContinuousDistribution kern, double h, double[] Y)
Given the empirical distribution
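Excerpt 23 defers the definition of the kernel density estimator to class KernelDensityGen. As a point of reference, the sketch below uses the textbook form $\hat f(y) = \frac{1}{nh} \sum_{i=0}^{n-1} K\big((y - x_i)/h\big)$. Assuming that SSJ uses exactly this form is a guess; KernelDensitySketch, its method, and the Gaussian kernel are hypothetical choices for illustration.

```java
// Hypothetical helper, not part of SSJ: textbook kernel density estimate at selected points.
final class KernelDensitySketch {

    /** Standard normal density, used here as the kernel K. */
    static double gaussian(double z) {
        return Math.exp(-0.5 * z * z) / Math.sqrt(2.0 * Math.PI);
    }

    /** Returns the estimate (1/(n h)) * sum_i K((y - x_i)/h) at each point y of Y. */
    static double[] computeDensity(double[] obs, java.util.function.DoubleUnaryOperator kern,
                                   double h, double[] Y) {
        int n = obs.length;
        double[] f = new double[Y.length];
        for (int j = 0; j < Y.length; j++) {
            double sum = 0.0;
            for (double x : obs) {
                sum += kern.applyAsDouble((Y[j] - x) / h);
            }
            f[j] = sum / (n * h);
        }
        return f;
    }

    public static void main(String[] args) {
        double[] obs = {0.2, 0.5, 0.55, 0.9};         // illustrative observations
        double[] Y = {0.0, 0.5, 1.0};                 // evaluation points
        double h = 0.1;                               // bandwidth chosen by hand for the example
        System.out.println(java.util.Arrays.toString(
                computeDensity(obs, KernelDensitySketch::gaussian, h, Y)));
    }
}
```

The cost is proportional to $n \times m$ kernel evaluations, which is fine for the modest sample sizes typical of goodness-of-fit diagnostics.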
24. statistic. Annals of Mathematical Statistics, 32:1118-1124, 1961.
[11] G. Marsaglia. A current view of random number generators. In L. Billard, editor, Computer Science and Statistics, Sixteenth Symposium on the Interface, pages 3-10, North-Holland, Amsterdam, 1985. Elsevier Science Publishers.
[12] T. R. C. Read and N. A. C. Cressie. Goodness-of-Fit Statistics for Discrete Multivariate Data. Springer Series in Statistics. Springer-Verlag, New York, NY, 1988.
[13] M. A. Stephens. Use of the Kolmogorov-Smirnov, Cramér-Von Mises and related statistics without extensive tables. Journal of the Royal Statistical Society, Series B, 33(1):115-122, 1970.
[14] M. S. Stephens. Tests based on EDF statistics. In R. B. D'Agostino and M. S. Stephens, editors, Goodness-of-Fit Techniques. Marcel Dekker, New York and Basel, 1986.
[15] M. S. Stephens. Tests for the uniform distribution. In R. B. D'Agostino and M. S. Stephens, editors, Goodness-of-Fit Techniques, pages 331-366. Marcel Dekker, New York and Basel, 1986.
[16] S. R. Wallenstein and N. Neff. An approximation for the distribution of the scan statistic. Statistics in Medicine, 6:197-207, 1987.
[17] G. S. Watson. Optimal invariant tests for uniformity. In Studies in Probability and Statistics, pages 121-127. North-Holland, Amsterdam, 1976.
25. statistics $D_N^+$, $D_N^-$, and $D_N$, of the Cramér-von Mises statistic $W_N^2$, Watson's $G_N^2$ and $U_N^2$, Anderson-Darling's $A_N^2$, and the average of the $U_i$'s, respectively. If $N = 1$, only puts 1 − sortedData.get(0) in sVal[KSP]. Calling this method is more efficient than computing these statistics separately by calling the corresponding methods in GofStat.

public static void tests (DoubleArrayList data, ContinuousDistribution dist, double[] sVal)
The observations $V_i$ are in data, not necessarily sorted, and their empirical distribution is compared with the continuous distribution dist. If $N = 1$, only puts data.get(0) in sVal[MEAN], and 1 − dist.cdf(data.get(0)) in sVal[KSP].

public static void activeTests (DoubleArrayList sortedData, double[] sVal, double[] pVal)
Computes the EDF test statistics by calling tests, then computes the p-values of those that currently belong to activeTests, and returns these quantities in sVal and pVal, respectively. Assumes that $U_{(0)}, \dots, U_{(N-1)}$ are in sortedData and that we want to compare their empirical distribution with the uniform distribution. If $N = 1$, only puts 1 − sortedData.get(0) in sVal[KSP], pVal[KSP], and pVal[MEAN].

public static void activeTests (DoubleArrayList data, ContinuousDistribution dist, double[] sVal, double[] pVal)
The observations are in data, not necessarily sorted, and we want to compare their empirical distribution with the distribution dist. If $N = 1$, only puts data g
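
The pattern described above for unsorted data (transform each observation by the hypothesized distribution function, sort, then compute the EDF statistics) can be sketched in plain Java for the Kolmogorov-Smirnov statistics alone. This is not the GofFormat code: the class name `KsSketch` and the use of a `DoubleUnaryOperator` in place of ContinuousDistribution are assumptions for this example, and the formulas are the usual textbook definitions $D^+ = \max_j \{(j+1)/N - U_{(j)}\}$ and $D^- = \max_j \{U_{(j)} - j/N\}$ with 0-based indices.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

public class KsSketch {
    // Returns {D+, D-, D} for the given (unsorted) data and hypothesized cdf F.
    // The cdf argument is a stand-in for a distribution object (assumption of this sketch).
    public static double[] ksStats(double[] data, DoubleUnaryOperator cdf) {
        int n = data.length;
        double[] u = new double[n];
        for (int i = 0; i < n; i++) {
            u[i] = cdf.applyAsDouble(data[i]);   // probability integral transformation
        }
        Arrays.sort(u);                           // U_(0) <= ... <= U_(n-1)
        double dPlus = 0.0;
        double dMinus = 0.0;
        for (int j = 0; j < n; j++) {
            dPlus = Math.max(dPlus, (j + 1.0) / n - u[j]);
            dMinus = Math.max(dMinus, u[j] - (double) j / n);
        }
        return new double[] { dPlus, dMinus, Math.max(dPlus, dMinus) };
    }
}
```

For a single observation, the first returned value reduces to 1 − F(x), which agrees with the special case stated above for $N = 1$.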
26. contains the sorted $U_{(0)}, \dots, U_{(n-1)}$.

public static double watsonU (DoubleArrayList sortedData)
Computes and returns the Watson statistic $U_n^2$ (see [4, 13, 14]), defined by
$$W_n^2 = \frac{1}{12n} + \sum_{j=0}^{n-1} \left( U_{(j)} - \frac{j + 0.5}{n} \right)^2, \qquad (12)$$
$$U_n^2 = W_n^2 - n \left( \bar{U}_n - 1/2 \right)^2, \qquad (13)$$
where $\bar{U}_n$ is the average of the observations $U_{(j)}$, assuming that sortedData contains the sorted $U_{(0)}, \dots, U_{(n-1)}$.

public static double EPSILONAD = Num.DBL_EPSILON/2
Used by andersonDarling. Num.DBL_EPSILON is usually $2^{-52}$.

public static double andersonDarling (DoubleArrayList sortedData)
Computes and returns the Anderson-Darling statistic $A_n^2$ (see method andersonDarling).

public static double andersonDarling (double[] sortedData)
Computes and returns the Anderson-Darling statistic $A_n^2$ (see [2]), defined by
$$A_n^2 = -n - \frac{1}{n} \sum_{j=0}^{n-1} \left\{ (2j+1) \ln(U_{(j)}) + (2n - 1 - 2j) \ln(1 - U_{(j)}) \right\},$$
assuming that sortedData contains $U_{(0)}, \dots, U_{(n-1)}$ sorted in increasing order. When computing $A_n^2$, all observations $U_i$ are projected on the interval $(\epsilon, 1 - \epsilon)$ for some $\epsilon > 0$, in order to avoid numerical overflow when taking the logarithm of $U_i$ or $1 - U_i$. The variable EPSILONAD gives the value of $\epsilon$.

public static double[] andersonDarling (double[] data, ContinuousDistribution dist)
Computes the Anderson-Darling statistic $A_n^2$ and the corresponding p-value p. The n (unsorted) observations in data are assumed to be independent and to come from the continuous distribution dist. Returns the 2-element array A
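
The Watson and Anderson-Darling statistics described in this item can be sketched in self-contained Java as follows. This is an illustration under stated assumptions, not the GofStat source: the class name `EdfStatsSketch` is hypothetical, the inputs are plain sorted `double[]` arrays, the formulas are the standard 0-indexed forms of these statistics, and `EPS` assumes the machine epsilon is $2^{-52}$ (obtained here from `Math.ulp(1.0)`).

```java
public class EdfStatsSketch {
    // Half of the machine epsilon, playing the role of EPSILONAD (assumes DBL_EPSILON = 2^-52).
    static final double EPS = Math.ulp(1.0) / 2.0;

    // Cramer-von Mises: W^2 = 1/(12n) + sum_j (U_(j) - (j + 0.5)/n)^2.
    public static double cramerVonMises(double[] sortedU) {
        int n = sortedU.length;
        double w2 = 1.0 / (12.0 * n);
        for (int j = 0; j < n; j++) {
            double d = sortedU[j] - (j + 0.5) / n;
            w2 += d * d;
        }
        return w2;
    }

    // Watson: U^2 = W^2 - n * (mean(U) - 1/2)^2.
    public static double watsonU(double[] sortedU) {
        int n = sortedU.length;
        double sum = 0.0;
        for (double u : sortedU) {
            sum += u;
        }
        double mean = sum / n;
        return cramerVonMises(sortedU) - n * (mean - 0.5) * (mean - 0.5);
    }

    // Anderson-Darling, with each U clamped to [EPS, 1 - EPS] so both logarithms stay finite.
    public static double andersonDarling(double[] sortedU) {
        int n = sortedU.length;
        double sum = 0.0;
        for (int j = 0; j < n; j++) {
            double u = Math.min(Math.max(sortedU[j], EPS), 1.0 - EPS);
            sum += (2 * j + 1) * Math.log(u) + (2 * n - 1 - 2 * j) * Math.log(1.0 - u);
        }
        return -n - sum / n;
    }
}
```

The clamping step is what keeps the result finite when an observation equals exactly 0 or 1; without it, one of the logarithms would evaluate to negative infinity.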
27. transformed into $U = F(X)$, which satisfy $0 \le U \le 1$ and which follow the $U(0,1)$ distribution under $H_0$. This is called the probability integral transformation. Methods for applying this transformation, as well as other types of transformations, to the observations X or U are provided in GofStat. Then the GOF tests are applied to the U sorted by increasing order. The corresponding p-values are easily computed by calling the appropriate methods in the classes of package probdist.

If a GOF test statistic Y has a continuous distribution under $H_0$ and takes the value y, its right p-value is defined as $p = P[Y \ge y \mid H_0]$. The test usually rejects $H_0$ if p is deemed too close to 0 (for a one-sided test) or too close to 0 or 1 (for a two-sided test).

In the case where Y has a discrete distribution under $H_0$, we distinguish the right p-value $p_R = P[Y \ge y \mid H_0]$ and the left p-value $p_L = P[Y \le y \mid H_0]$. We then define the p-value for a two-sided test as
$$p = \begin{cases} p_R & \text{if } p_R < p_L, \\ 1 - p_L & \text{if } p_R \ge p_L \text{ and } p_L < 0.5, \\ 0.5 & \text{otherwise.} \end{cases} \qquad (1)$$

Why such a definition? Consider for example a Poisson random variable Y with mean 1 under $H_0$. If Y takes the value 0, the right p-value is $p_R = P[Y \ge 0 \mid H_0] = 1$. In the uniform case, this would obviously lead to rejecting $H_0$ on the basis that the p-value is too close to 1. However, $P[Y = 0 \mid H_0] = 1/e \approx 0.368$, so it does not really make sense to reject $H_0$ in this case. In fact, the left p-value here is $p_L = 0.368$
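
The two-sided rule for the discrete case, applied to the Poisson example, can be written as a small Java sketch. The class and method names (`DiscretePValue`, `twoSided`) are hypothetical and chosen only for this illustration; the logic is a direct transcription of the three cases in the definition.

```java
import java.util.Locale;

public class DiscretePValue {
    // Two-sided p-value from the right p-value pR = P[Y >= y | H0]
    // and the left p-value pL = P[Y <= y | H0].
    public static double twoSided(double pR, double pL) {
        if (pR < pL) {
            return pR;
        }
        if (pL < 0.5) {
            return 1.0 - pL;
        }
        return 0.5;
    }

    public static void main(String[] args) {
        // Poisson with mean 1 under H0, observed value y = 0.
        double pR = 1.0;             // P[Y >= 0 | H0]
        double pL = Math.exp(-1.0);  // P[Y <= 0 | H0] = P[Y = 0 | H0] = 1/e
        System.out.println(String.format(Locale.ROOT, "p = %.3f", twoSided(pR, pL)));  // prints p = 0.632
    }
}
```

Here $p_R = 1 \ge p_L$ and $p_L < 0.5$, so the second case applies and the result is $1 - 1/e$, which is neither close to 0 nor close to 1.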
