
Statistics Toolbox User's Guide

1. Example and Plot
Suppose the income of a family of four in the United States follows a lognormal distribution with μ = log(20,000) and σ = 1.0. Plot the income density.

    x = (10:1000:125010)';
    y = lognpdf(x,log(20000),1.0);
    plot(x,y)
    set(gca,'Xtick',[0 30000 60000 90000 120000])
    set(gca,'Xticklabels',str2mat('0','30,000','60,000','90,000','120,000'))

[Figure: lognormal income density; x-axis ticks at 0, 30,000, 60,000, 90,000, 120,000]

Negative Binomial Distribution

Background
The geometric distribution is a special case of the negative binomial distribution (also called the Pascal distribution). The geometric distribution models the number of successes before one failure in an independent succession of tests where each test results in success or failure. In the negative binomial distribution the number of failures is a parameter of the distribution. The parameters are the probability of success, p, and the number of failures, r.

Mathematical Definition
The negative binomial pdf is

$y = f(x \mid r, p) = \binom{r + x - 1}{x} p^{r} q^{x} I_{(0,1,\ldots)}(x)$, where $q = 1 - p$.

Example and Plot

    x = (0:10);
    y = nbinpdf(x,3,0.5);
    plot(x,y)
    set(gca,'XLim',[-0.5,10.5])

[Figure: negative binomial pdf for r = 3, p = 0.5; x-axis from 0 to 10]

Normal Distribution

Background
The normal distribution is a two-parameter family of curves. The first parameter, μ, is the mean. The second, σ, is the standard deviation. The
2. Contents
Control Charts
    Xbar Charts
    S Charts
    EWMA Charts
Capability Studies
Design of Experiments (DOE)
    Full Factorial Designs
    Fractional Factorial Designs
    D-optimal Designs
        Generating D-optimal Designs
        Augmenting D-Optimal Designs
        Design of Experiments with Known but Uncontrolled Inputs
Demos
    The disttool Demo
    The randtool Demo
    The polytool Demo
    The rsmdemo Demo
        Part 1
        Part 2
References
Reference

Before You Begin
This introduction describes how to begin using the Statistics Toolbox. It explains how to use this guide and points you to additional books for toolbox installation information.

What is the Statistics Toolbox?
The Statistics Toolbox is a collection of tools buil
3. Suppose we want the D-optimal design for fitting this model with nine runs.

    settings = cordexch(2,9,'q')

[Output: 9-by-2 settings matrix; the values are not recoverable from the source text]

We can plot the columns of settings against each other to get a better picture of the design.

    h = plot(settings(:,1),settings(:,2),'.');
    set(gca,'Xtick',[-1 0 1])
    set(gca,'Ytick',[-1 0 1])
    set(h,'Markersize',20)

[Figure: plot of the nine design points; both axes have ticks at -1, 0, 1]

For a simple example using the row-exchange algorithm, consider the interaction model with two inputs. The model form is

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2$

Suppose we want the D-optimal design for fitting this model with four runs.

    [settings, X] = rowexch(2,4,'i')

[Output: 4-by-2 settings matrix and 4-by-4 design matrix X; the values are not recoverable from the source text]

The settings matrix shows how to vary the inputs from run to run. The X matrix is the design matrix for fitting the above regression model. The first column of X is for fitting the constant term. The last column is the element-wise product of the 2nd and 3rd columns.

The associated plot is simple but elegant.

    h = plot(settings(:,1),settings(:,2),'.');
    set(gca,'Xtick',[-1 0 1])
    set(gca,'Ytick',[-1 0 1])
    set(h,'Markersize',20)

[Figure: plot of the four design points; both axes have ticks at -1, 0, 1]

Augmenting D-Optimal Designs
In practice, experimentation is an iterative process. We often want to add runs to a completed experiment to learn more about our system. The function daugment allows y
4. 0)
    set(gca,'YTicklabels',categories)

These commands generate the plot below. Note that there is substantially more variability in the ratings of the arts and housing than in the ratings of crime and climate.

[Figure: box plot of the ratings by category (climate, housing, health, crime, transportation, education, arts, recreation, economics); x-axis: Values (×10^4), y-axis: Column Number]

Ordinarily you might also graph pairs of the original variables, but there are 36 two-variable plots. Maybe Principal Components Analysis can reduce the number of variables we need to consider.

Sometimes it makes sense to compute principal components for raw data. This is appropriate when all the variables are in the same units. Standardizing the data is reasonable when the variables are in different units or when the variances of the columns differ substantially (as in this case).

You can standardize the data by dividing each column by its standard deviation.

    stdr = std(ratings);
    sr = ratings./stdr(ones(329,1),:);

Now we are ready to find the principal components.

    [pcs, newdata, variances, t2] = princomp(sr);

The Principal Components (First Output)
The first output of princomp, pcs, is the nine principal components. These are the linear combinations of the original variables that generate the new variables. Let's look a
5. polyfit, polyval, refline

refline

Purpose
Add a reference line to the current axes.

Syntax
    refline(slope,intercept)
    refline(slope)
    h = refline(slope,intercept)
    refline

Description
refline(slope,intercept) adds a reference line with the given slope and intercept to the current axes.

refline(slope), where slope is a two-element vector, adds the line y = SLOPE(2) + SLOPE(1)*x to the figure.

h = refline(slope,intercept) returns the handle to the line.

refline with no input arguments superimposes the least squares line on each line object in the current figure (except LineStyles '-', '--', '.-'). This behavior is equivalent to lsline.

Example
    y = [3.2 2.6 3.1 3.4 2.4 2.9 3.0 3.3 3.2 2.1 2.6]';
    plot(y,'+')
    refline(0,3)

[Figure: plot of y with a horizontal reference line at 3]

See Also
lsline, polyfit, polyval, refcurve

regress

Purpose
Multiple linear regression.

Syntax
    b = regress(y,X)
    [b,bint,r,rint,stats] = regress(y,X)
    [b,bint,r,rint,stats] = regress(y,X,alpha)

Description
b = regress(y,X) returns the least squares fit of y on X. regress solves the linear model

$y = X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I)$

for β, where:
• y is an n×1 vector of observations,
• X is an n×p matrix of regressors,
• β is a p×1 vector of parameters, and
• ε is an n×1 vector of random disturbances.

[b,bint,r,rint,stats] = regress(y,X) returns an estimate
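The least squares fit that the regress snippet above describes can be sketched with core MATLAB only. This is a minimal illustration, not the toolbox implementation: the data are made up, and the backslash operator is used here as one standard way to solve the least squares problem.

```matlab
% Made-up data for illustration: an intercept column plus one regressor.
X = [ones(5,1) (1:5)'];          % n-by-p matrix of regressors (n = 5, p = 2)
y = [2.1 3.9 6.2 7.8 10.1]';     % n-by-1 vector of observations

b = X \ y;                       % least squares estimate of beta
r = y - X*b;                     % residuals

fprintf('intercept = %.4f, slope = %.4f\n', b(1), b(2));
```

With an intercept column in X, the residuals sum to zero up to rounding, which is a quick sanity check on any least squares fit.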
6. Rayleigh Distribution

Background
The Rayleigh distribution is a special case of the Weibull distribution, substituting 2 for the parameter p in the equation below.

[Equation: Weibull pdf with shape parameter p; not recoverable from the source text]

If the velocities of a particle in the x and y directions are two independent normal random variables with zero means and equal variances, then the distance the particle travels per unit time has a Rayleigh distribution.

Mathematical Definition
The Rayleigh pdf is

$y = f(x \mid b) = \frac{x}{b^2} e^{-x^2 / (2b^2)}$

Example and Plot

    x = [0:0.01:2];
    p = raylpdf(x,0.5);
    plot(x,p)

[Figure: Rayleigh pdf with b = 0.5; x-axis from 0 to 2]

Parameter Estimation
The MLE of the Rayleigh parameter is

$\hat{b} = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{2n}}$

Student's t Distribution

Background
The t distribution is a family of curves depending on a single parameter ν (the degrees of freedom). As ν goes to infinity, the t distribution converges to the standard normal distribution.

W. S. Gosset (1908) discovered the distribution through his work at the Guinness brewery. At that time, Guinness did not allow its staff to publish, so Gosset used the pseudonym Student.

If $\bar{x}$ and s are the mean and standard deviation of an independent random sample of size n from a normal distribution with mean μ, then

$t(\nu) = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \quad \nu = n - 1$

Mathematical Definition
Student's t pdf is

$y = f(x \mid \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)} \frac{1}{\sqrt{\nu\pi}} \frac{1}{\left(1 + \frac{x^2}{\nu}\right)^{\frac{\nu+1}{2}}}$

Example and Plot
The plot compares the t distribution with
7.  logL =
      203.8216
    info =
      0.0021    0.0022
      0.0022    0.0056

Reference
J. K. Patel, C. H. Kapadia, and D. B. Owen, Handbook of Statistical Distributions, Marcel Dekker, 1976.

See Also
betalike, gamlike, mle, weibfit

weibpdf

Purpose
Weibull probability density function (pdf).

Syntax
    Y = weibpdf(X,A,B)

Description
weibpdf(X,A,B) computes the Weibull pdf with parameters A and B at the values in X. The arguments X, A, and B must all be the same size, except that scalar arguments function as constant matrices of the common size of the other arguments. Parameters A and B are positive.

The Weibull pdf is

$y = f(x \mid a, b) = a b x^{b-1} e^{-a x^{b}} I_{(0,\infty)}(x)$

Some references refer to the Weibull distribution with a single parameter. This corresponds to weibpdf with A = 1.

Examples
The exponential distribution is a special case of the Weibull distribution.

    lambda = 1:6;
    y = weibpdf(0.1:0.1:0.6,lambda,1)
    y =
      0.9048    1.3406    1.2197    0.8076    0.4104    0.1639

    y1 = exppdf(0.1:0.1:0.6,1./lambda)
    y1 =
      0.9048    1.3406    1.2197    0.8076    0.4104    0.1639

Reference
Devroye, L., Non-Uniform Random Variate Generation, Springer-Verlag, New York, 1986.

weibplot

Purpose
Weibull probability plot.

Syntax
    weibplot(X)
    h = weibplot(X)

Description
weibplot(X) displays a Weibull probability plot of the data in X. If X is a matrix, weibplot displays a plot for each column.

h = weibplot(X) returns handles to
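The weibpdf snippet above states that the exponential distribution is a special case of the Weibull. A minimal sketch of that check follows, written with core MATLAB only. The helper names wpdf and epdf are hypothetical anonymous functions, not toolbox calls, and the Weibull form assumed is f(x|a,b) = a·b·x^(b−1)·exp(−a·x^b), with the exponential parameterized by its mean.

```matlab
% Hypothetical local helpers (not toolbox functions).
% Weibull pdf, assumed form: f(x|a,b) = a*b*x^(b-1)*exp(-a*x^b), x > 0.
wpdf = @(x,a,b) a .* b .* x.^(b-1) .* exp(-a .* x.^b);
% Exponential pdf with mean mu: f(x|mu) = exp(-x/mu)/mu.
epdf = @(x,mu) exp(-x ./ mu) ./ mu;

x = 0.1:0.1:0.6;
lambda = 1:6;

y  = wpdf(x, lambda, 1);      % Weibull with b = 1 and a = lambda
y1 = epdf(x, 1 ./ lambda);    % exponential with mean 1/lambda

disp([y; y1])                 % the two rows should agree
```

Setting b = 1 removes the power of x, so the Weibull density reduces to a·exp(−a·x), which is the exponential density with mean 1/a.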
8. Purpose
Mean and variance for the noncentral chi-square distribution.

Syntax
    [M,V] = ncx2stat(NU,DELTA)

Description
[M,V] = ncx2stat(NU,DELTA) returns the mean and variance of the noncentral chi-square pdf with NU degrees of freedom and noncentrality parameter DELTA.
• The mean is $\nu + \delta$.
• The variance is $2(\nu + 2\delta)$.

Example
    [m,v] = ncx2stat(4,2)
    m =
      6
    v =
      16

References
Johnson, Norman, and Kotz, Samuel, Distributions in Statistics: Continuous Univariate Distributions-2, Wiley, 1970, pp. 130-148.
Evans, Merran, Hastings, Nicholas, and Peacock, Brian, Statistical Distributions, Second Edition, Wiley, 1993, pp. 50-52.

See Also
ncx2cdf, ncx2inv, ncx2pdf, ncx2rnd

nlinfit

Purpose
Nonlinear least squares data fitting by the Gauss-Newton method.

Syntax
    [beta,r,J] = nlinfit(X,y,'model',beta0)

Description
beta = nlinfit(X,y,'model',beta0) returns the coefficients of the nonlinear function described in model. model is a user-supplied function having the form

$y = f(\beta, X)$

That is, model returns the predicted values of y given initial parameter estimates β and the independent variable X.

The matrix X has one column per independent variable. The response, y, is a column vector with the same number of rows as X.

[beta,r,J] = nlinfit(X,y,'model',beta0) returns the fitted coefficients, beta, the residuals, r, and the Jacobian, J, for use with nlintool to produce erro
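The two ncx2stat formulas in the snippet above can be worked through directly. This is a short arithmetic sketch in core MATLAB using the same inputs as the example (4 degrees of freedom, noncentrality 2); it does not call the toolbox.

```matlab
nu = 4;                        % degrees of freedom
delta = 2;                     % noncentrality parameter

m = nu + delta;                % mean: nu + delta
v = 2*(nu + 2*delta);          % variance: 2*(nu + 2*delta)

fprintf('m = %g, v = %g\n', m, v);
```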
9. X = ncx2inv(P,V,DELTA) returns the inverse of the noncentral chi-square cdf with parameters V and DELTA at the probabilities in P.

The size of X is the common size of the input arguments. A scalar input functions as a constant matrix of the same size as the other inputs.

ncx2inv uses Newton's method to converge to the solution.

Example
    ncx2inv([0.01 0.05 0.1],4,2)
    ans =
      0.4858    1.1498    1.7066

References
Johnson, Norman, and Kotz, Samuel, Distributions in Statistics: Continuous Univariate Distributions-2, Wiley, 1970, pp. 130-148.
Evans, Merran, Hastings, Nicholas, and Peacock, Brian, Statistical Distributions, Second Edition, Wiley, 1993, pp. 50-52.

See Also
ncx2cdf, ncx2pdf, ncx2rnd, ncx2stat

ncx2pdf

Purpose
Noncentral chi-square probability density function (pdf).

Syntax
    Y = ncx2pdf(X,V,DELTA)

Description
Y = ncx2pdf(X,V,DELTA) returns the noncentral chi-square pdf with V degrees of freedom and positive noncentrality parameter DELTA at the values in X.

The size of Y is the common size of the input arguments. A scalar input functions as a constant matrix of the same size as the other inputs.

Some texts refer to this distribution as the generalized Rayleigh, Rayleigh-Rice, or Rice distribution.

Example
As the noncentrality parameter δ increases, the distribution flattens, as in the plot.

    x = (0:0.1:10)';
    p1 = ncx2pdf(x,4,2);
    p = chi2pdf(x,4);
    plot(x,p,x,p1)

[Figure: chi-square pdf and noncentral chi-square pdf, both with 4 degrees of freedom]
10. x x1 y y1

The variables x and y are observations made with error from a cubic polynomial. The variables x1 and y1 are data points from the true function without error.

If you do not specify the degree of the polynomial, polytool does a linear fit to the data.

    polytool(x,y)

[Figure: polytool window with annotations: box for controlling polynomial degree, upper confidence bound, data point, predicted value, fitted line, 95% confidence interval, lower confidence bound, draggable reference line, x value]

The linear fit is not very good. The bulk of the data with x values between zero and two has a steeper slope than the fitted line. The two points to the right are dragging down the estimate of the slope.

Go to the data entry box at the top and type 3 for a cubic model. Then drag the vertical reference line to the x value of two (or type 2 in the x-axis data entry box).

This graph shows a much better fit to the data. The confidence bounds are closer together, indicating that there is less uncertainty in prediction. The data at both ends of the plot tracks the fitted curve.

The true function in this case is cubic.

$y = 4 + 4.3444x - 1.4533x^2 + 0.1089x^3 + \varepsilon, \quad \varepsilon \sim N(0, 0.1 I)$

To superimpose the true function on the plot, use the command:

    plot(x1,y1)

[Figure: fitted polynomial and true function]

The true function is quite close to the fitted polynomial in the region of the data
11. prices =
      119   118
      117   115
      115   115
      116   122
      112   118
      121   121
      115   120
      122   122
      116   120
      118   113
      109   120
      112   123
      119   121
      112   109
      117   117
      113   117
      114   120
      109   116
      109   118
      118   125

Suppose it is historically true that the standard deviation of gas prices at gas stations around Massachusetts is four cents a gallon. The Z-test is a procedure for testing the null hypothesis that the average price of a gallon of gas in January (price1) is $1.15.

    [h,pvalue,ci] = ztest(price1/100,1.15,0.04)
    h =
      0
    pvalue =
      0.8668
    ci =
      1.1340    1.1690

The result of the hypothesis test is the boolean variable h. When h = 0, you do not reject the null hypothesis. The result suggests that $1.15 is reasonable. The 95% confidence interval [1.1340 1.1690] neatly brackets $1.15.

What about February? Try a t-test with price2. Now you are not assuming that you know the standard deviation in price.

    [h,pvalue,ci] = ttest(price2/100,1.15)
    h =
      1
    pvalue =
      4.9517e-04
    ci =
      1.1675    1.2025

With the boolean result h = 1, you can reject the null hypothesis at the default significance level, 0.05. It looks like $1.15 is not a reasonable estimate of the gasoline price in February. The low end of the 95% confidence interval is greater than 1.15.

The function ttest2 allows you to compare the means of the two data samples.

    [h,sig,ci] = ttest2(price1,price2)
    ci =
      -5.7845   -0.9155

The confidence interval (ci above) indicates that ga
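The Z-test arithmetic in the snippet above can be reproduced with core MATLAB functions, which shows where the p-value and confidence interval come from. This is a sketch under two assumptions: the flattened price listing alternates January and February values (so price1 is the first value of each pair), and the normal cdf is expressed through erfc rather than a toolbox call.

```matlab
% January prices in cents (assumed: first value of each pair in the listing).
price1 = [119 117 115 116 112 121 115 122 116 118 ...
          109 112 119 112 117 113 114 109 109 118]';

x     = price1/100;                   % prices in dollars
mu0   = 1.15;                         % hypothesized mean
sigma = 0.04;                         % standard deviation, assumed known
n     = numel(x);

se     = sigma/sqrt(n);               % standard error of the mean
z      = (mean(x) - mu0)/se;          % z statistic
pvalue = erfc(abs(z)/sqrt(2));        % two-sided p-value, 2*(1 - Phi(|z|))
zcrit  = sqrt(2)*erfcinv(0.05);       % 97.5th percentile of N(0,1), about 1.96
ci     = mean(x) + [-1 1]*zcrit*se;   % 95% confidence interval
h      = pvalue < 0.05;               % 1 means reject the null hypothesis

fprintf('h = %d, pvalue = %.4f, ci = [%.4f %.4f]\n', h, pvalue, ci(1), ci(2));
```

The interval is centered on the sample mean, so it brackets 1.15 exactly when the two-sided p-value exceeds 0.05.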
12. tcdf

Purpose
Student's t cumulative distribution function (cdf).

Syntax
    P = tcdf(X,V)

Description
tcdf(X,V) computes Student's t cdf with V degrees of freedom at the values in X. The arguments X and V must be the same size, except that a scalar argument functions as a constant matrix of the same size as the other argument.

The parameter V is a positive integer.

The t cdf is

$p = F(x \mid \nu) = \int_{-\infty}^{x} \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)} \frac{1}{\sqrt{\nu\pi}} \frac{1}{\left(1 + \frac{t^2}{\nu}\right)^{\frac{\nu+1}{2}}} \, dt$

The result, p, is the probability that a single observation from the t distribution with ν degrees of freedom will fall in the interval (−∞, x].

Examples
Suppose 10 samples of Guinness beer have a mean alcohol content of 5.5% by volume and the standard deviation of these samples is 0.5%. What is the probability that the true alcohol content of Guinness beer is less than 5%?

    t = (5.0 - 5.5) / 0.5;
    probability = tcdf(t,10 - 1)
    probability =
      0.1717

tinv

Purpose
Inverse of the Student's t cumulative distribution function (cdf).

Syntax
    X = tinv(P,V)

Description
tinv(P,V) computes the inverse of Student's t cdf with parameter V for the probabilities in P. The arguments P and V must be the same size, except that a scalar argument functions as a constant matrix of the size of the other argument.

The degrees of freedom, V, must be a positive integer and P must lie in the interval [0 1].

The t inverse function in terms of the t cdf is

$x = F^{-1}(p \mid \nu) = \{x : F(x \mid \nu) = p\}$
13. where

$p = F(x \mid \nu) = \int_{-\infty}^{x} \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)} \frac{1}{\sqrt{\nu\pi}} \frac{1}{\left(1 + \frac{t^2}{\nu}\right)^{\frac{\nu+1}{2}}} \, dt$

The result, x, is the solution of the integral equation of the t cdf with parameter ν where you supply the desired probability p.

Examples
What is the 99th percentile of the t distribution for one to six degrees of freedom?

    percentile = tinv(0.99,1:6)
    percentile =
      31.8205    6.9646    4.5407    3.7469    3.3649    3.1427

tpdf

Purpose
Student's t probability density function (pdf).

Syntax
    Y = tpdf(X,V)

Description
tpdf(X,V) computes Student's t pdf with parameter V at the values in X. The arguments X and V must be the same size, except that a scalar argument functions as a constant matrix of the same size as the other argument.

The degrees of freedom, V, must be a positive integer.

Student's t pdf is

$y = f(x \mid \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)} \frac{1}{\sqrt{\nu\pi}} \frac{1}{\left(1 + \frac{x^2}{\nu}\right)^{\frac{\nu+1}{2}}}$

Examples
The mode of the t distribution is at x = 0. This example shows that the value of the function at the mode is an increasing function of the degrees of freedom.

    tpdf(0,1:6)
    ans =
      0.3183    0.3536    0.3676    0.3750    0.3796    0.3827

The t distribution converges to the standard normal distribution as the degrees of freedom approach infinity. How good is the approximation for ν = 30?

    difference = tpdf(-2.5:2.5,30) - normpdf(-2.5:2.5)
    difference =
      0.0035   -0.0006   -0.0042   -0.0042   -0.0006    0.0035

trimmean

Purpose
Mean of a sample of data excluding extreme values
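The two tpdf examples in the snippet above can be reproduced from the standard closed form of the Student's t density using only core MATLAB. The helper names tpdf_sketch and npdf_sketch are hypothetical anonymous functions written for this illustration; they are not the toolbox functions.

```matlab
% Student's t pdf from the standard formula (local helper, not toolbox tpdf).
tpdf_sketch = @(x,v) gamma((v+1)/2) ./ (gamma(v/2) .* sqrt(v*pi)) ...
                     .* (1 + x.^2 ./ v).^(-(v+1)/2);
% Standard normal pdf (local helper, not toolbox normpdf).
npdf_sketch = @(x) exp(-x.^2/2) / sqrt(2*pi);

% Value at the mode x = 0 for 1 to 6 degrees of freedom.
mode_values = tpdf_sketch(0, 1:6);

% Pointwise gap between t(30) and the standard normal.
x = -2.5:2.5;
difference = tpdf_sketch(x, 30) - npdf_sketch(x);

disp(mode_values)
disp(difference)
```

The difference is positive in the tails and negative near the center, which reflects the heavier tails of the t distribution.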
14. [Figure: control chart with the LCL and LSL marked; x-axis: Sample Number]

Capability Studies
Before going into full-scale production, many manufacturers run a pilot study to determine whether their process can actually build parts to the specifications demanded by the engineering drawing.

Using the data from these capability studies with a statistical model allows us to get a preliminary estimate of the percentage of parts that will fall outside the specifications.

    [p, Cp, Cpk] = capable(mean(runout),spec)
    p =
      1.3940e-09
    Cp =
      2.3950
    Cpk =
      1.9812

The result above shows that the probability (p = 1.3940e-09) of observing an unacceptable runout is extremely low. Cp and Cpk are two popular capability indices.

Cp is the ratio of the range of the specifications to six times the estimate of the process standard deviation.

$C_p = \frac{USL - LSL}{6\sigma}$

For a process that has its average value on target, a Cp of one translates to about 2.7 defects per thousand (a little more than one per thousand beyond each specification limit). Recently many industries have set a quality goal of one part per million. This would correspond to a Cp = 1.6. The higher the value of Cp, the more capable the process.

For processes that do not maintain their average on target, Cpk is a more descriptive index of process capability. Cpk is the ratio of the difference between the process mean and the closer specification limit to three times the estimate of the process standard deviat
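The two capability indices described above can be sketched in a few lines of core MATLAB. Everything numeric here is hypothetical (the specification limits, mean, and standard deviation are invented for illustration), and the out-of-spec probability assumes a normal process model. This is not the toolbox capable function.

```matlab
% Hypothetical values, for illustration only.
LSL = -1;  USL = 1;            % lower and upper specification limits
mu = 0.2;  sigma = 0.25;       % estimated process mean and standard deviation

Cp  = (USL - LSL)/(6*sigma);               % spec range over six sigma
Cpk = min(USL - mu, mu - LSL)/(3*sigma);   % distance to closer limit over three sigma

% Expected fraction outside the limits under a normal model.
Phi = @(z) 0.5*erfc(-z/sqrt(2));           % standard normal cdf
p = Phi((LSL - mu)/sigma) + 1 - Phi((USL - mu)/sigma);

fprintf('Cp = %.4f, Cpk = %.4f, p = %.3g\n', Cp, Cpk, p);
```

Because the hypothetical mean is off target, Cpk comes out smaller than Cp; the two coincide only when the mean sits midway between the limits.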
15. mad

Purpose
Mean absolute deviation (MAD) of a sample of data.

Syntax
    y = mad(X)

Description
mad(X) computes the average of the absolute differences between a set of data and the sample mean of that data. For vectors, mad(x) returns the mean absolute deviation of the elements of x. For matrices, mad(X) returns the MAD of each column of X.

The MAD is less efficient than the standard deviation as an estimate of the spread when the data is all from the normal distribution.

Multiply the MAD by 1.3 to estimate σ, the second parameter of the normal distribution (the exact factor for normal data is √(π/2) ≈ 1.2533).

Examples
This example shows a Monte Carlo simulation of the relative efficiency of the MAD to the sample standard deviation for normal data.

    x = normrnd(0,1,100,100);
    s = std(x);
    s_MAD = 1.3 * mad(x);
    efficiency = (norm(s - 1)./norm(s_MAD - 1)).^2
    efficiency =
      0.5972

See Also
std, range

mahal

Purpose
Mahalanobis distance.

Syntax
    d = mahal(Y,X)

Description
mahal(Y,X) computes the Mahalanobis distance of each point (row) of the matrix Y from the sample in the matrix X.

The number of columns of Y must equal the number of columns in X, but the number of rows may differ. The number of rows in X must exceed the number of columns.

The Mahalanobis distance is a multivariate measure of the separation of a data set from a point in space. It is the criterion minimized in linear discri
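The definition of the mean absolute deviation given in the mad snippet above is short enough to write out by hand. This sketch uses a made-up sample and core MATLAB only; the scaling factor sqrt(pi/2) is the standard normal-theory constant, used here as an assumption about the data being normal.

```matlab
x = [2 4 4 5 10]';                   % made-up sample, for illustration only

mad_x = mean(abs(x - mean(x)));      % average absolute difference from the sample mean
sigma_hat = sqrt(pi/2) * mad_x;      % estimate of sigma if the data were normal

fprintf('MAD = %g, sigma estimate = %.4f\n', mad_x, sigma_hat);
```

For this sample the mean is 5 and the absolute deviations are 3, 1, 1, 0, and 5, so the MAD is 2.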
16.   1.2140
      1.1415
      0.9209
      0.7533
      0.6306
      0.4930
      0.3180
      0.1204

You can easily calculate the percent of the total variability explained by each principal component.

    percent_explained = 100*variances/sum(variances)
    percent_explained =
      37.8699
      13.4886
      12.6831
      10.2324
       8.3698
       7.0062
       5.4783
       3.5338
       1.3378

A "Scree" plot is a pareto plot of the percent variability explained by each principal component.

    pareto(percent_explained)
    xlabel('Principal Component')
    ylabel('Variance Explained (%)')

[Figure: pareto plot; x-axis: Principal Component, y-axis: Variance Explained (%)]

We can see that the first three principal components explain roughly two thirds of the total variability in the standardized ratings.

Hotelling's T² (Fourth Output)
The last output of the function princomp, Hotelling's T², is a statistical measure of the multivariate distance of each observation from the center of the data set. This is an analytical way to find the most extreme points in the data.

    [st2, index] = sort(t2);    % Sort in ascending order.
    st2 = flipud(st2);          % Values in descending order.
    index = flipud(index);      % Indices in descending order.
    extreme = index(1)
    extreme =
      213

    names(extreme,:)
    ans =
      New York, NY

It is not surprising that the ratings for New York are the furthest from the average U.S. town
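The "roughly two thirds" statement above can be checked with a running total of the percentages. This is a minimal sketch in core MATLAB. The leading digits of the last four percentages are garbled in the source; the values used here assume they are 7, 5, 3, and 1, which is the reading that makes the column sum to 100.

```matlab
% Percent of variability per component (last four leading digits are assumed).
percent_explained = [37.8699 13.4886 12.6831 10.2324 8.3698 ...
                     7.0062 5.4783 3.5338 1.3378];

cumulative  = cumsum(percent_explained);   % running total across components
first_three = cumulative(3);               % share explained by the first three

fprintf('first three components: %.4f%%\n', first_three);
fprintf('all nine components:    %.4f%%\n', cumulative(end));
```

The running total is also a convenient way to pick how many components to keep for a given coverage target.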
17. Johnson, Norman, and Kotz, Samuel, Distributions in Statistics: Continuous Univariate Distributions-2, Wiley, 1970, pp. 130-148.

See Also
ncx2cdf, ncx2inv, ncx2rnd, ncx2stat

ncx2rnd

Purpose
Random matrices from the noncentral chi-square distribution.

Syntax
    R = ncx2rnd(V,DELTA)
    R = ncx2rnd(V,DELTA,m)
    R = ncx2rnd(V,DELTA,m,n)

Description
R = ncx2rnd(V,DELTA) returns a matrix of random numbers chosen from the noncentral chi-square distribution with parameters V and DELTA. The size of R is the common size of V and DELTA if both are matrices. If either parameter is a scalar, the size of R is the size of the other parameter.

R = ncx2rnd(V,DELTA,m) returns a matrix of random numbers with parameters V and DELTA. m is a 1-by-2 vector that contains the row and column dimensions of R.

R = ncx2rnd(V,DELTA,m,n) generates random numbers with parameters V and DELTA. The scalars m and n are the row and column dimensions of R.

Example
    ncx2rnd(4,2,6,3)
    ans =
       6.8552      5.9650     11.2961
      [illegible]  4.2640      5.9495
       9.1939      6.7162      3.8315
      10.3100      4.4828      7.1653
       2.1142      1.9826      4.6400
       3.8852      5.3999      0.9282

References
Johnson, Norman, and Kotz, Samuel, Distributions in Statistics: Continuous Univariate Distributions-2, Wiley, 1970, pp. 130-148.
Evans, Merran, Hastings, Nicholas, and Peacock, Brian, Statistical Distributions, Second Edition, Wiley, 1993, pp. 50-52.

See Also
ncx2cdf, ncx2inv, ncx2pdf, ncx2stat

ncx2stat
18. Weibull Probability Plots
A Weibull probability plot is a useful graph for assessing whether data comes from a Weibull distribution. Many reliability analyses make the assumption that the underlying distribution of the lifetimes is Weibull, so this plot can provide some assurance that this assumption is not being violated, or provide an early warning of a problem with your assumptions.

The scale of the y-axis is not uniform. The y-axis values are probabilities and, as such, go from zero to one. The distance between the tick marks on the y-axis matches the distance between the quantiles of a Weibull distribution.

If the data points (pluses) fall near the line, the assumption that the data come from a Weibull distribution is reasonable.

This example shows a typical Weibull probability plot.

    y = weibrnd(2,0.5,100,1);
    weibplot(y)

[Figure: Weibull Probability Plot; x-axis: Data, y-axis: Probability]

Statistical Process Control (SPC)
SPC is an omnibus term for a number of methods for assessing and monitoring the quality of manufactured goods. These methods are simple, which makes them easy to implement even in a production environment.

Control Charts
These graphs were popularized by Walter Shewhart in his work in the 1920s at Western Electric. A control chart is a plot of a measurem
19. ans =
      0.6827

More generally, about 68% of the observations from a normal distribution fall within one standard deviation, σ, of the mean, μ.

normfit

Purpose
Parameter estimates and confidence intervals for normal data.

Syntax
    [muhat,sigmahat,muci,sigmaci] = normfit(X)
    [muhat,sigmahat,muci,sigmaci] = normfit(X,alpha)

Description
[muhat,sigmahat,muci,sigmaci] = normfit(X) returns estimates, muhat and sigmahat, of the parameters μ and σ of the normal distribution given the matrix of data, X. muci and sigmaci are 95% confidence intervals. muci and sigmaci have two rows and as many columns as the data matrix X. The top row is the lower bound of the confidence interval and the bottom row is the upper bound.

[muhat,sigmahat,muci,sigmaci] = normfit(X,alpha) gives estimates and 100(1−alpha) percent confidence intervals. For example, alpha = 0.01 gives 99% confidence intervals.

Example
In this example the data is a two-column random normal matrix. Both columns have μ = 10 and σ = 2. Note that the confidence intervals below contain the true values.

    r = normrnd(10,2,100,2);
    [mu,sigma,muci,sigmaci] = normfit(r)
    mu =
      10.1455   10.0527
    sigma =
      1.9072    2.1256
    muci =
       9.7652    9.6288
      10.5258   10.4766
    sigmaci =
      1.6745    1.8663
      2.2155    2.4693

See Also
betafit, binofit, expfit, gamfit, poissfit, unifit, weibfit

norminv

Purpose
Inverse of the normal
20. iqr(x);
    efficiency = (norm(s - 1)./norm(s_IQR - 1)).^2
    efficiency =
      0.3297

See Also
std, mad, range

kurtosis

Purpose
Sample kurtosis.

Syntax
    k = kurtosis(X)

Description
k = kurtosis(X) returns the sample kurtosis of X. For vectors, kurtosis(x) is the kurtosis of the elements in the vector x. For matrices, kurtosis(X) returns the sample kurtosis for each column of X.

Kurtosis is a measure of how outlier-prone a distribution is. The kurtosis of the normal distribution is 3. Distributions that are more outlier-prone than the normal distribution have kurtosis greater than 3; distributions that are less outlier-prone have kurtosis less than 3.

The kurtosis of a distribution is defined as

$k = \frac{E(x - \mu)^4}{\sigma^4}$

where E(x) is the expected value of x.

Note: Some definitions of kurtosis subtract 3 from the computed value, so that the normal distribution has kurtosis of 0. The kurtosis function does not use this convention.

Example
    X = randn([5 4])
    X =
       1.1650    1.6961   -1.4462   -0.3600
       0.6268    0.0591   -0.7012   -0.1356
       0.0751    1.7971    1.2460   -1.3493
       0.3516    0.2641   -0.6390   -1.2704
      -0.6965    0.8717    0.5774    0.9846

    k = kurtosis(X)
    k =
      2.1658    1.2967    1.6378    1.9589

See Also
mean, moment, skewness, std, var

leverage

Purpose
Leverage values for a regression.

Syntax
    h = leverage(DATA)
    h = leverage(DATA,'model')

Description
h = leverage(DATA) finds the leverage
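The kurtosis definition above can be applied by hand to the first column of the example matrix. This is a sketch in core MATLAB, not the toolbox kurtosis function. It assumes the last entry of that column is negative (minus signs are lost in the source listing) and uses moments with a divisor of n.

```matlab
% First column of the example matrix X (sign of the last value is assumed).
x = [1.1650 0.6268 0.0751 0.3516 -0.6965]';

d  = x - mean(x);       % deviations from the sample mean
m2 = mean(d.^2);        % second central moment (divisor n)
m4 = mean(d.^4);        % fourth central moment (divisor n)
k  = m4 / m2^2;         % kurtosis; equals 3 for a normal distribution

fprintf('k = %.4f\n', k);
```

The result is close to the first value reported in the example output; the inputs are rounded to four decimals, so agreement is only to about three decimal places.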
21. load hald
    b = zeros(4,100);
    kvec = 0.01:0.01:1;
    count = 0;
    for k = 0.01:0.01:1
        count = count + 1;
        b(:,count) = ridge(heat,ingredients,k);
    end
    plot(kvec',b')
    xlabel('k')
    ylabel('b','FontName','Symbol')

[Figure: ridge coefficient estimates plotted against k, for k from 0 to 1]

See Also
regress, stepwise

rowexch

Purpose
D-optimal design of experiments, row-exchange algorithm.

Syntax
    settings = rowexch(nfactors,nruns)
    [settings,X] = rowexch(nfactors,nruns)
    [settings,X] = rowexch(nfactors,nruns,'model')

Description
settings = rowexch(nfactors,nruns) generates the factor settings matrix, settings, for a D-optimal design using a linear additive model with a constant term. settings has nruns rows and nfactors columns.

[settings,X] = rowexch(nfactors,nruns) also generates the associated design matrix, X.

[settings,X] = rowexch(nfactors,nruns,'model') produces a design for fitting a specified regression model. The input, 'model', can be one of these strings:
• 'interaction': includes constant, linear, and cross-product terms.
• 'quadratic': interactions plus squared terms.
• 'purequadratic': includes constant, linear, and squared terms.

Example
This example illustrates that the D-optimal design for 3 factors in 8 runs, using an interactions model, is a two-level full factorial design.

    s = rowexch(3,8,'interaction')

See Also
cordexch, daugment, dcovary, fullfact, ff2n, hadamard

rsm
22. m = moment(X,order) returns the central moment of X specified by the positive integer, order. For vectors, moment(x,order) returns the central moment of the specified order for the elements of x. For matrices, moment(X,order) returns the central moment of the specified order for each column.

Note that the central first moment is zero, and the second central moment is the variance computed using a divisor of n rather than n − 1, where n is the length of the vector x or the number of rows in the matrix X.

The central moment of order k of a distribution is defined as

$m_k = E(x - \mu)^k$

where E(x) is the expected value of x.

Example
    X = randn([6 5])
    X =
       1.1650    0.0591    1.2460   -1.2704   -0.0562
       0.6268    1.7971   -0.6390    0.9846    0.5135
       0.0751    0.2641    0.5774   -0.0449    0.3967
       0.3516    0.8717   -0.3600   -0.7989    0.7562
      -0.6965   -1.4462   -0.1356   -0.7652    0.4005
       1.6961   -0.7012   -1.3493    0.8617   -1.3414

    m = moment(X,3)
    m =
      -0.0282    0.0571    0.1253    0.1460   -0.4486

See Also
kurtosis, mean, skewness, std, var

mvnrnd

Purpose
Random matrices from the multivariate normal distribution.

Syntax
    r = mvnrnd(mu,SIGMA,cases)

Description
r = mvnrnd(mu,SIGMA,cases) returns a matrix of random numbers chosen from the multivariate normal distribution with mean vector, mu, and covariance matrix, SIGMA. cases is the number of rows in r.

SIGMA is a symmetric positive definite matrix with size equal to the length of mu.

Example
    mu = [2 3];
    sigma = [1 1.5
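The central moment definition above can be evaluated column by column with core MATLAB. This sketch is not the toolbox moment function. The minus signs in the matrix are an assumption: they are lost in the source listing and are placed here so that the third moments match the reported magnitudes.

```matlab
% Example matrix (minus signs are assumed; the source listing drops them).
X = [ 1.1650  0.0591  1.2460 -1.2704 -0.0562
      0.6268  1.7971 -0.6390  0.9846  0.5135
      0.0751  0.2641  0.5774 -0.0449  0.3967
      0.3516  0.8717 -0.3600 -0.7989  0.7562
     -0.6965 -1.4462 -0.1356 -0.7652  0.4005
      1.6961 -0.7012 -1.3493  0.8617 -1.3414];

order = 3;
D = X - repmat(mean(X), size(X,1), 1);   % center each column on its mean
m = mean(D.^order);                      % central moment of each column (divisor n)

disp(m)
```

Setting order to 1 gives zeros up to rounding, and order 2 gives the variance with a divisor of n, as the description states.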
23. m = trimmean(X,percent)

Description
trimmean(X,percent) calculates the mean of a sample X excluding the highest and lowest percent/2 of the observations. The trimmed mean is a robust estimate of the location of a sample. If there are outliers in the data, the trimmed mean is a more representative estimate of the center of the body of the data. If the data is all from the same probability distribution, then the trimmed mean is less efficient than the sample average as an estimator of the location of the data.

Examples
This example shows a Monte Carlo simulation of the relative efficiency of the 10% trimmed mean to the sample average for normal data.

    x = normrnd(0,1,100,100);
    m = mean(x);
    trim = trimmean(x,10);
    sm = std(m);
    strim = std(trim);
    efficiency = (sm/strim).^2
    efficiency =
      0.9702

See Also
mean, median, geomean, harmmean

trnd

Purpose
Random numbers from Student's t distribution.

Syntax
    R = trnd(V)
    R = trnd(V,m)
    R = trnd(V,m,n)

Description
R = trnd(V) generates random numbers from Student's t distribution with V degrees of freedom. The size of R is the size of V.

R = trnd(V,m) generates random numbers from Student's t distribution with V degrees of freedom. m is a 1-by-2 vector that contains the row and column dimensions of R.

R = trnd(V,m,n) generates random numbers from Student's t distribution with V degrees of freedom. The scalars m and n are the row and column dimensions of R.

Examples
    noisy = trnd(ones(1
24. ncfcdf(X,NU1,NU2,DELTA)

Description
P = ncfcdf(X,NU1,NU2,DELTA) returns the noncentral F cdf with numerator degrees of freedom (df), NU1, denominator df, NU2, and positive noncentrality parameter, DELTA, at the values in X.

The size of P is the common size of the input arguments. A scalar input functions as a constant matrix of the same size as the other inputs.

The noncentral F cdf is

$F(x \mid \nu_1, \nu_2, \delta) = \sum_{j=0}^{\infty} \left( \frac{\left(\frac{1}{2}\delta\right)^{j}}{j!} e^{-\delta/2} \right) I\left( \frac{\nu_1 x}{\nu_2 + \nu_1 x} \,\Big|\, \frac{\nu_1}{2} + j, \frac{\nu_2}{2} \right)$

where I(x|a,b) is the incomplete beta function with parameters a and b.

Example
Compare the noncentral F cdf with δ = 10 to the F cdf with the same number of numerator and denominator degrees of freedom (5 and 20 respectively).

    x = (0.01:0.1:10.01)';
    p1 = ncfcdf(x,5,20,10);
    p = fcdf(x,5,20);
    plot(x,p,x,p1)

[Figure: F cdf and noncentral F cdf; x-axis from 0 to 12, y-axis from 0 to 1]

References
Johnson, Norman, and Kotz, Samuel, Distributions in Statistics: Continuous Univariate Distributions-2, Wiley, 1970, pp. 189-200.

ncfinv

Purpose
Inverse of the noncentral F cumulative distribution function (cdf).

Syntax
    X = ncfinv(P,NU1,NU2,DELTA)

Description
X = ncfinv(P,NU1,NU2,DELTA) returns the inverse of the noncentral F cdf with numerator degrees of freedom (df), NU1, denominator df, NU2, and positive noncentrality parameter, DELTA, for the probabilities, P.

The size of X is the commo
25. nctrnd(V,DELTA,m,n)

Description
R = nctrnd(V,DELTA) returns a matrix of random numbers chosen from the noncentral T distribution with parameters V and DELTA. The size of R is the common size of V and DELTA if both are matrices. If either parameter is a scalar, the size of R is the size of the other parameter.

R = nctrnd(V,DELTA,m) returns a matrix of random numbers with parameters V and DELTA. m is a 1-by-2 vector that contains the row and column dimensions of R.

R = nctrnd(V,DELTA,m,n) generates random numbers with parameters V and DELTA. The scalars m and n are the row and column dimensions of R.

Example
    nctrnd(10,1,5,1)

[Output: 5-by-1 vector; only the fractional digits survive in the source text: .6576, .0617, .4491, .2930, .6297]

References
Johnson, Norman, and Kotz, Samuel, Distributions in Statistics: Continuous Univariate Distributions-2, Wiley, 1970, pp. 201-219.
Evans, Merran, Hastings, Nicholas, and Peacock, Brian, Statistical Distributions, Second Edition, Wiley, 1993, pp. 147-148.

See Also
nctcdf, nctinv, nctpdf, nctstat

nctstat

Purpose
Mean and variance for the noncentral t distribution.

Syntax
    [M,V] = nctstat(NU,DELTA)

Description
[M,V] = nctstat(NU,DELTA) returns the mean and variance of the noncentral t pdf with NU degrees of freedom and noncentrality parameter DELTA.

• The mean is $\delta \sqrt{\frac{\nu}{2}} \, \frac{\Gamma((\nu-1)/2)}{\Gamma(\nu/2)}$, where ν > 1.
• The variance is $\frac{\nu}{\nu-2}(1 + \delta^2) - \frac{\nu}{2} \delta^2 \left[ \frac{\Gamma((\nu-1)/2)}{\Gamma(\nu/2)} \right]^2$, where ν > 2.

Example
    [m,v] = nctstat(10,1)
    m =
      1
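The nctstat example above is cut off after the first digit of the mean. The standard formulas for the mean and variance of the noncentral t distribution can be evaluated with the core gamma function, which gives a way to see the full values. This is a sketch under the assumption that the toolbox uses these standard formulas; it does not call nctstat.

```matlab
nu = 10;       % degrees of freedom
delta = 1;     % noncentrality parameter

g = gamma((nu-1)/2) / gamma(nu/2);                  % Gamma((nu-1)/2) / Gamma(nu/2)
m = delta * sqrt(nu/2) * g;                         % mean, valid for nu > 1
v = nu/(nu-2)*(1 + delta^2) - (nu/2)*delta^2*g^2;   % variance, valid for nu > 2

fprintf('m = %.4f, v = %.4f\n', m, v);
```

The mean exceeds delta for every finite nu and approaches it as nu grows, since the factor sqrt(nu/2)·g tends to one.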
26. -0.2914    0.1618

References

Johnson, Norman, and Kotz, Samuel, Distributions in Statistics: Continuous Univariate Distributions-2, Wiley, 1970, pp. 201-219.

Evans, Merran, Hastings, Nicholas, and Peacock, Brian, Statistical Distributions, Second Edition, Wiley, 1993, pp. 147-148.

See Also

icdf, nctcdf, nctpdf, nctrnd, nctstat

nctpdf

Purpose

Noncentral T probability density function (pdf).

Syntax

    Y = nctpdf(X,V,DELTA)

Description

Y = nctpdf(X,V,DELTA) returns the noncentral T pdf with V degrees of freedom and noncentrality parameter DELTA at the values in X.

The size of Y is the common size of the input arguments. A scalar input functions as a constant matrix of the same size as the other inputs.

Example

Compare the noncentral T pdf with DELTA = 1 to the T pdf with the same number of degrees of freedom (10).

    x = -5:0.1:5;
    p1 = nctpdf(x,10,1);
    p = tpdf(x,10);
    plot(x,p,x,p1)

[Figure: the T pdf and the noncentral T pdf plotted against x]

References

Johnson, Norman, and Kotz, Samuel, Distributions in Statistics: Continuous Univariate Distributions-2, Wiley, 1970, pp. 201-219.

Evans, Merran, Hastings, Nicholas, and Peacock, Brian, Statistical Distributions, Second Edition, Wiley, 1993, pp. 147-148.

See Also

nctcdf, nctinv, nctrnd, nctstat, pdf

nctrnd

Purpose

Random matrices from noncentral T distribution.

Syntax

    R = nctrnd(V,DELTA)
    R = nctrnd(V,DELTA,m)
    R
27. Uniform (Continuous) Distribution

Background

The uniform distribution (also called rectangular) has a constant pdf between its two parameters, a (the minimum) and b (the maximum). The standard uniform distribution (a = 0 and b = 1) is a special case of the beta distribution, obtained by setting both of its parameters to one.

The uniform distribution is appropriate for representing the distribution of round-off errors in values tabulated to a particular number of decimal places.

Mathematical Definition

The uniform cdf is

$$p = F(x \mid a, b) = \frac{x - a}{b - a} I_{[a,b]}(x)$$

Parameter Estimation

The sample minimum and maximum are the MLEs of a and b, respectively.

Example and Plot

The example illustrates the inversion method for generating normal random numbers using rand and norminv. Note that the MATLAB function randn does not use inversion, since it is not efficient for this case.

    u = rand(1000,1);
    x = norminv(u,0,1);
    hist(x)

[Figure: histogram of x]

Weibull Distribution

Background

Waloddi Weibull (1939) offered the distribution that bears his name as an appropriate analytical tool for modeling the breaking strength of materials. Current usage also includes reliability and lifetime modeling. The Weibull distribution is more flexible than the exponential for these purposes.

To see why, consider the hazard rate function inst
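The inversion method used in the uniform distribution example works for any distribution whose cdf can be inverted. The sketch below applies it to an exponential target, where the inverse has a closed form, so only core MATLAB is needed. The choice of the exponential distribution, the value of mu, and the variable names are assumptions made for illustration.

```matlab
% Sketch: inversion method for an exponential target (illustration only).
mu = 2;                     % assumed mean parameter for the example
u  = rand(1000,1);          % standard uniform draws

% Invert F(x) = 1 - exp(-x/mu) to map uniform draws to exponential draws.
x = -mu * log(1 - u);

% Applying the cdf to x recovers the original uniform draws.
uBack  = 1 - exp(-x / mu);
maxErr = max(abs(uBack - u));
```

Plotting hist(x) afterwards should show the decaying shape of the exponential density rather than the bell shape obtained with norminv.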
28. [Figure: Stepwise Regression Diagnostics table. Bottom row: RMSE 2.734, R-square 0.9725, F 176.6, P 1.581e-08]

Coefficients and Confidence Intervals. The table at the top of the figure shows the regression coefficient and confidence interval for every term (in or out of the model). The green rows in the table (on your monitor) represent terms in the model, while red rows indicate terms not currently in the model.

Clicking on a row in this table toggles the state of the corresponding term. That is, a term in the model (green row) gets removed (turns red), and terms out of the model (red rows) enter the model (turn green).

The coefficient for a term out of the model is the coefficient resulting from adding that term to the current model.

Additional Diagnostic Statistics. There are also several diagnostic statistics at the bottom of the table:

• RMSE: the root mean squared error of the current model.
• R-square: the amount of response variability explained by the model.
• F: the overall F statistic for the regression.
• P: the associated significance probability.

Close Button. Shuts down all windows.

Help Button. Activates online help.

Stepwise History

This plot shows the RMSE and a confidence interval for every model generated in the course of the interactive use of the other windows.

Recreating a Previous Model. Clicking on one of these lines re-creates the current model at that point in the analysis using a new set of windows. So you can thus compare the tw
29. Statistical Plots

The Statistics Toolbox adds specialized plots to the extensive graphics capabilities of MATLAB. Box plots are graphs for data sample description. They are also useful for graphic comparison of the means of many samples (see the discussion of one-way ANOVA on page 1-51). Normal probability plots are graphs for determining whether a data sample has normal distribution. Quantile-quantile plots graphically compare the distributions of two samples.

Box Plots

The graph shows an example of a notched box plot.

[Figure: notched box plot; x-axis: Column Number]

This plot has several graphic elements:

• The lower and upper lines of the box are the 25th and 75th percentiles of the sample. The distance between the top and bottom of the box is the interquartile range.
• The line in the middle of the box is the sample median. If the median is not centered in the box, that is an indication of skewness.
• The whiskers are lines extending above and below the box. They show the extent of the rest of the sample (unless there are outliers). Assuming no outliers, the maximum of the sample is the top of the upper whisker. The minimum of the sample is the bottom of the lower whisker. By default, an outlier is a value that is more than 1.5 times the interquartile range away from the top or bottom of the box.
• The plus sign at the top of the plot is an indication of an outlier in
31. Between the two groups of data points the two functions separate, but both fall inside the 95% confidence bounds.

If the cubic polynomial is a good fit, it is tempting to try a higher order polynomial to see if even more precise predictions are possible.

Since the true function is cubic, this amounts to overfitting the data. Use the data entry box for degree and type 5 for a quintic model.

The resulting fit again does well predicting the function near the data points. But in the region between the data groups, the uncertainty of prediction rises dramatically.

This bulge in the confidence bounds happens because the data really do not contain enough information to estimate the higher order polynomial terms precisely, so even interpolation using polynomials can be risky in some cases.

The rsmdemo Demo

rsmdemo is an interactive graphic environment that demonstrates design of experiments and surface fitting through the simulation of a chemical reaction. The goal of the demo is to find the levels of the reactants needed to maximize the reaction rate.

There are two parts to the demo:

1. Compare data gathered through trial and error with data from a designed experiment.
2. Compare response surface (polynomial) modeling with nonlinear modeling.

Part 1

Begin the demo by using the sliders in the Reaction Simulator to control the partial pressures of three reactants: Hydrogen, n-Pentane, and Isopentane. Each
32. In each column, the expected value of y is one.

    x = normrnd(0,1,100,6);
    y = std(x)
    y =
        0.9536    1.0628    1.0860    0.9927    0.9605    1.0254

    y = std(-1:2:1)
    y =
        1.4142

See Also

cov, var

std is a function in MATLAB.

stepwise

Purpose

Interactive environment for stepwise regression.

Syntax

    stepwise(X,y)
    stepwise(X,y,inmodel)
    stepwise(X,y,inmodel,alpha)

Description

stepwise(X,y) fits a regression model of y on the columns of X. It displays three figure windows for interactively controlling the stepwise addition and removal of model terms.

stepwise(X,y,inmodel) allows control of the terms in the original regression model. The values of vector inmodel are the indices of the columns of the matrix X to include in the initial model.

stepwise(X,y,inmodel,alpha) allows control of the length of the confidence intervals on the fitted coefficients. alpha is the significance for testing each term in the model. By default, alpha = 1 - (1 - 0.025)^(1/p), where p is the number of columns in X. This translates to plotted 95% simultaneous confidence intervals (Bonferroni) for all the coefficients.

The least squares coefficient is plotted with a green filled circle. A coefficient is not significantly different from zero if its confidence interval crosses the white zero line. Significant model terms are plotted using solid lines. Terms not significantly different from zero are plotted with dotted lines.

Click on the confid
33. Rayleigh mean and variance
Student's t mean and variance
Discrete uniform mean and variance
Continuous uniform mean and variance
Weibull mean and variance

Descriptive Statistics

    corrcoef    Correlation coefficients (in MATLAB)
    cov         Covariance matrix (in MATLAB)
    geomean     Geometric mean
    harmmean    Harmonic mean
    iqr         Interquartile range
    kurtosis    Sample kurtosis
    mad         Mean absolute deviation
    mean        Arithmetic average (in MATLAB)
    median      50th percentile (in MATLAB)
    moment      Central moments of all orders
    nanmax      Maximum ignoring missing data
    nanmean     Average ignoring missing data
    nanmedian   Median ignoring missing data
    nanmin      Minimum ignoring missing data
    nanstd      Standard deviation ignoring missing data
    nansum      Sum ignoring missing data
    prctile     Empirical percentiles of a sample
    range       Sample range
    skewness    Sample skewness
    std         Standard deviation (in MATLAB)
    trimmean    Trimmed mean
    var         Variance

Statistical Plotting

    boxplot     Box plots
    errorbar    Error bar plot
    fsurfht     Interactive contour plot of a function
    gline       Interactive line drawing
    gname       Interactive point labeling
    lsline      Add least-squares fit line to plotted data
    normplot    Normal probability plots
    pareto      Pareto charts
    qqplot      Quantile-Quantile plots
    rcoplot     Regression case order plot
    refcurve    Reference polynomial
    refline     Reference line
    weibplot
    surfht      Interactiv
34. Stepwise Regression Diagnostics Figure
Nonlinear Regression Models
    Mathematical Form
    Nonlinear Modeling Example
    The Variables
    Fitting the Hougen-Watson Model
    Confidence Intervals on the Parameter Estimates
    Confidence Intervals on the Predicted Responses
    An Interactive GUI for Nonlinear Fitting and Prediction
Hypothesis Tests
    Terminology
    Assumptions
    Example
Multivariate Statistics
    Principal Components Analysis
    Example
    The Principal Components (First Output)
    The Component Scores (Second Output)
    The Component Variances (Third Output)
    Hotelling's T2 (Fourth Output)
Statistical Plots
    Box Plots
    Normal Probability Plots
    Quantile-Quantile Plots
    Weibull Probability Plots
Statistical Process Control (SPC)
35. University of Minnesota, Department of Agricultural Engineering, 1975.

Poisson, S. D., Recherches sur la Probabilité des Jugements en Matière Criminelle et en Matière Civile, Précédées des Règles Générales du Calcul des Probabilités. Paris: Bachelier, Imprimeur-Libraire pour les Mathématiques, 1837.

"Student," On the probable error of the mean. Biometrika, 6, 1908, pp. 1-25.

Weibull, W., A Statistical Theory of the Strength of Materials. Ingeniörs Vetenskaps Akademiens Handlingar, Royal Swedish Institute for Engineering Research, Stockholm, Sweden, No. 153, 1939.

Reference

The Statistics Toolbox provides several categories of functions. These categories appear in the table below.

The Statistics Toolbox's Main Categories of Functions

    probability   Probability distribution functions
    descriptive   Descriptive statistics for data samples
    plots         Statistical plots
    SPC           Statistical Process Control
    linear        Fitting linear models to data
    nonlinear     Fitting nonlinear regression models
    DOE           Design of Experiments
    PCA           Principal Components Analysis
    hypotheses    Statistical tests of hypotheses
    file I/O      Reads data from/writes data to operating-system files
    demos         Demonstrations
    data          Data for examples

The following pages contain tables of functions from each of these specific areas. The first seven tables contain probability distribution functions. The remaining tables describe
36. contains the upper bound of the interval.

[ahat,bhat,ACI,BCI] = unifit(X,alpha) allows control of the confidence level alpha. For example, if alpha is 0.01 then ACI and BCI are 99% confidence intervals.

Example

    r = unifrnd(10,12,100,2);
    [ahat,bhat,aci,bci] = unifit(r)
    ahat =
       10.0154   10.0060
    bhat =
       11.9989   11.9743
    aci =
        9.9551    9.9461
       10.0154   10.0060
    bci =
       11.9989   11.9743
       12.0592   12.0341

See Also

betafit, binofit, expfit, gamfit, normfit, poissfit, weibfit

unifpdf

Purpose

Continuous uniform probability density function (pdf).

Syntax

    Y = unifpdf(X,A,B)

Description

unifpdf(X,A,B) computes the continuous uniform pdf with parameters A and B at the values in X. The arguments X, A, and B must all be the same size, except that scalar arguments function as constant matrices of the common size of the other arguments.

The parameter B must be greater than A.

The continuous uniform distribution pdf is

$$y = f(x \mid a, b) = \frac{1}{b - a} I_{[a,b]}(x)$$

The standard uniform distribution has A = 0 and B = 1.

Examples

For fixed a and b, the uniform pdf is constant.

    x = 0.1:0.1:0.6;
    y = unifpdf(x)
    y =
         1     1     1     1     1     1

What if x is not between a and b?

    y = unifpdf(-1,0,1)
    y =
         0

unifrnd

Purpose

Random numbers from the continuous uniform distribution.

Syntax

    R = unifrnd(A,B)
    R = unifrnd(A,B,m)
    R = unifrnd(A,B,m,n)

Description

R = unifrnd(A,B) generates uniform random numbers with parameters
37. density function with n degrees of freedom is the same as the gamma density function with parameters n/2 and 2.

    probability = chi2cdf(5,1:5)
    probability =
        0.9747    0.9179    0.8282    0.7127    0.5841

    probability = chi2cdf(1:5,1:5)
    probability =
        0.6827    0.6321    0.6084    0.5940    0.5841

chi2inv

Purpose

Inverse of the chi-square (χ²) cumulative distribution function (cdf).

Syntax

    X = chi2inv(P,V)

Description

chi2inv(P,V) computes the inverse of the χ² cdf with parameter V for the probabilities in P. The arguments P and V must be the same size, except that a scalar argument functions as a constant matrix of the size of the other argument.

The degrees of freedom, V, must be a positive integer and P must lie in the interval [0 1].

We define the χ² inverse function in terms of the χ² cdf:

$$x = F^{-1}(p \mid \nu) = \{ x : F(x \mid \nu) = p \}$$

where

$$p = F(x \mid \nu) = \int_0^x \frac{t^{(\nu-2)/2} e^{-t/2}}{2^{\nu/2} \, \Gamma(\nu/2)} \, dt$$

The result, x, is the solution of the integral equation of the χ² cdf with parameter ν, where you supply the desired probability p.

Examples

Find a value that exceeds 95% of the samples from a χ² distribution with 10 degrees of freedom.

    x = chi2inv(0.95,10)
    x =
       18.3070

You would observe values greater than 18.3 only 5% of the time by chance.

chi2pdf

Purpose

Chi-square (χ²) probability density function (pdf).

Syntax

    Y = chi2pdf(X,V)

Description

chi2pdf(X,V) computes the χ² pdf with parameter V at the values in X. The
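The chi2inv result and the chi2cdf outputs quoted in this entry can be cross-checked with core MATLAB, because the χ² cdf with ν degrees of freedom equals the regularized lower incomplete gamma function evaluated at (x/2, ν/2). The sketch below relies on that identity; the variable names are illustrative, and gammainc is used here as a stand-in for the toolbox functions rather than as their actual implementation.

```matlab
% Sketch: check the chi-square examples with the incomplete gamma function.
v = 10;
x = 18.3070;                         % value reported by chi2inv(0.95,10)

% Chi-square cdf at x, written with the core function gammainc(x, a).
p = gammainc(x/2, v/2);              % should be close to 0.95

% Same identity applied to the chi2cdf(1:5,1:5) example.
pDiag = gammainc((1:5)/2, (1:5)/2);  % compare with 0.6827 0.6321 ... 0.5841
```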
38. n,[1 6])
    r2 =
         0     1     2     1     3     1

    r3 = binornd(n,1./n,1,6)

binostat

Purpose

Mean and variance for the binomial distribution.

Syntax

    [M,V] = binostat(N,P)

Description

For the binomial distribution:

• The mean is np.
• The variance is npq, where q = 1 - p.

Examples

    n = logspace(1,5,5)
    n =
        10    100    1000    10000    100000

    [m,v] = binostat(n,1./n)
    m =
         1     1     1     1     1
    v =
        0.9000    0.9900    0.9990    0.9999    1.0000

    [m,v] = binostat(n,1/2)
    m =
         5    50    500    5000    50000
    v =
      1.0e+04 *
        0.0003    0.0025    0.0250    0.2500    2.5000

bootstrp

Purpose

Bootstrap statistics through resampling of data.

Syntax

    bootstat = bootstrp(nboot,'bootfun',d1,...)
    [bootstat,bootsam] = bootstrp(...)

Description

bootstrp(nboot,'bootfun',d1,...) draws nboot bootstrap data samples and analyzes them using the function bootfun. nboot must be a positive integer. bootstrp passes the data d1, d2, etc., to bootfun.

[bootstat,bootsam] = bootstrp(...) returns the bootstrap statistics in bootstat. Each row of bootstat contains the results of applying bootfun to one bootstrap sample. If bootfun returns a matrix, then this output is converted to a long vector for storage in bootstat. bootsam is a matrix of indices into the rows of the data matrix.

Example

Correlate the LSAT scores and law school GPA for 15 students. These 15 data points are resampled to create 1000 different datasets, and the correlation between the two variables is computed for each dataset.

    load lawdata
    bootstat = bo
39. p =
        0.0000    0.0039    0.8411

There are three models of cars (columns) and two factories (rows). The reason there are six rows instead of two is that each factory provides three cars of each model for the study. The data from the first factory is in the first three rows, and the data from the second factory is in the last three rows.

The standard ANOVA table has columns for the sums of squares, degrees of freedom, mean squares (SS/df), and F statistics.

ANOVA Table

    Source         SS       df    MS       F
    Columns        53.35     2    26.68    234.2
    Rows           1.445     1    1.445    12.69
    Interaction    0.04      2    0.02     0.1756
    Error          1.367    12    0.1139
    Total          56.2     17

You can use the F statistics to do hypotheses tests to find out if the mileage is the same across models, factories, and model-factory pairs (after adjusting for the additive effects). anova2 returns the p-value from these tests.

The p-value for the model effect is zero to four decimal places. This is a strong indication that the mileage varies from one model to another. An F statistic as extreme as the observed F would occur by chance less than once in 10,000 times if the gas mileage were truly equal from model to model.

The p-value for the factory effect is 0.0039, which is also highly significant. This indicates that one factory is out-performing the other in the gas mileage of the cars it produces. The observed p-value indicates that an F statistic as extreme as the observed F would occur by chance about four out of 1000 times if the gas mil
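The arithmetic behind the two-way ANOVA table can be reproduced from its SS and df columns. The sketch below is an illustration that starts from the rounded values printed in the table, so the results agree with the table only to the displayed precision. The variable names are assumptions, and fcdf is the toolbox F cdf.

```matlab
% Sketch: rebuild MS, F, and p-values from the printed ANOVA table.
SS  = [53.35 1.445 0.04];    % Columns, Rows, Interaction sums of squares
df  = [2 1 2];               % matching degrees of freedom
SSE = 1.367;  dfE = 12;      % Error row

MS  = SS ./ df;              % mean squares, SS/df
MSE = SSE / dfE;             % error mean square
F   = MS ./ MSE;             % F statistics for the three effects

% Upper-tail probability of each F statistic under the null hypothesis.
p = 1 - fcdf(F, df, dfE);
```

Each F statistic is a mean square divided by the error mean square, so a large F (and a small p) means the effect explains far more variation than the residual noise does.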
40. trials, where the probability of success in any given trial is p.

If a baseball team plays 162 games in a season and has a 50-50 chance of winning any game, then the probability of that team winning more than 100 games in a season is:

    1 - binocdf(100,162,0.5)

The result is 0.001 (i.e., 1 - 0.999). If a team wins 100 or more games in a season, this result suggests that it is likely that the team's true probability of winning any game is greater than 0.5.

binofit

Purpose

Parameter estimates and confidence intervals for binomial data.

Syntax

    phat = binofit(x,n)
    [phat,pci] = binofit(x,n)
    [phat,pci] = binofit(x,n,alpha)

Description

binofit(x,n) returns the estimate of the probability of success for the binomial distribution given the data in the vector x.

[phat,pci] = binofit(x,n) gives the maximum likelihood estimate, phat, and 95% confidence intervals, pci.

[phat,pci] = binofit(x,n,alpha) gives 100(1-alpha) percent confidence intervals. For example, alpha = 0.01 yields 99% confidence intervals.

Example

First we generate one binomial sample of 100 elements with a probability of success of 0.6. Then, we estimate this probability given the results from the sample. The 95% confidence interval, pci, contains the true value, 0.6.

    r = binornd(100,0.6);
    [phat,pci] = binofit(r,100)
    phat =
        0.5800
    pci =
        0.4771    0.6780

Reference

Johnson, Norman L., Kotz, Samuel, & Kemp, Adrienne W
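The baseball probability from the binocdf example can also be obtained by summing the binomial pmf over the upper tail. The following sketch uses only core MATLAB and computes the pmf on the log scale to avoid overflow in the binomial coefficients. It is an illustrative cross-check with assumed variable names, not the method binocdf itself uses.

```matlab
% Sketch: P(more than 100 wins in 162 games) by direct summation.
n = 162; q = 0.5;            % games per season, chance of winning each game
k = 0:n;                     % possible numbers of wins

% log of nchoosek(n,k) * q^k * (1-q)^(n-k), using gammaln for stability.
logpmf = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) ...
         + k*log(q) + (n-k)*log(1-q);
pmf = exp(logpmf);

% Upper tail: 101 wins or more, the same event as 1 - binocdf(100,162,0.5).
pTail = sum(pmf(k > 100));
```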
41. Negative Binomial Distribution
Normal Distribution
Poisson Distribution
Rayleigh Distribution
Student's t Distribution
Noncentral t Distribution
Uniform (Continuous) Distribution
Weibull Distribution
Descriptive Statistics
    Measures of Central Tendency (Location)
    Measures of Dispersion
    Functions for Data with Missing Values (NaNs)
    Percentiles and Graphical Descriptions
    The Bootstrap
Linear Models
    One-way Analysis of Variance (ANOVA)
    Two-way Analysis of Variance (ANOVA)
    Multiple Linear Regression
    Example
    Quadratic Response Surface Models
    An Interactive GUI for Response Surface Fitting and Prediction
Stepwise Regression
    Stepwise Regression Interactive GUI
    Stepwise Regression Plot
42. [Figure: quantile-quantile plot; x-axis: X Quantiles]

random

Purpose

Random numbers from a specified distribution.

Syntax

    y = random('name',A1,A2,A3,m,n)

Description

random is a utility routine allowing you to access all the random number generators in the Statistics Toolbox using the name of the distribution as a parameter.

y = random('name',A1,A2,A3,m,n) returns a matrix of random numbers. name is a string containing the name of the distribution. A1, A2, and A3 are matrices of distribution parameters. Depending on the distribution, some of the parameters may not be necessary.

The arguments containing distribution parameters must all be the same size, except that scalar arguments function as constant matrices of the common size of the other arguments.

The last two parameters, m and n, are the size of the matrix y. If the distribution parameters are matrices, then these parameters are optional, but they must match the size of the other matrix arguments (see the second example).

Examples

    rn = random('Normal',0,1,2,4)
    rn =
        1.1650    0.0751    0.6965    0.0591
        0.6268    0.3516    1.6961    1.7971

    rp = random('Poisson',1:6,1,6)

randtool

Purpose

Interactive random number generation using histograms for display.

Syntax

    randtool
    r = randtool('output')

Description

The randtool command sets up a graphic user interface for exploring the effects of changing parameters and sample size on the histogram of random
43. 0837    1.3255

References

Johnson, Norman, and Kotz, Samuel, Distributions in Statistics: Continuous Univariate Distributions-2, Wiley, 1970, pp. 201-219.

Evans, Merran, Hastings, Nicholas, and Peacock, Brian, Statistical Distributions, Second Edition, Wiley, 1993, pp. 147-148.

See Also

nctcdf, nctinv, nctpdf, nctrnd

ncx2cdf

Purpose

Noncentral chi-square cumulative distribution function (cdf).

Syntax

    P = ncx2cdf(X,V,DELTA)

Description

ncx2cdf(X,V,DELTA) returns the noncentral chi-square cdf with V degrees of freedom and positive noncentrality parameter DELTA at the values in X.

The size of P is the common size of the input arguments. A scalar input functions as a constant matrix of the same size as the other inputs.

Some texts refer to this distribution as the generalized Rayleigh, Rayleigh-Rice, or Rice distribution.

The noncentral chi-square cdf is

$$F(x \mid \nu, \delta) = \sum_{j=0}^{\infty} \left( \frac{(\delta/2)^j}{j!} e^{-\delta/2} \right) \Pr\left[ \chi^2_{\nu+2j} \le x \right]$$

Example

    x = 0:0.1:10;
    p1 = ncx2cdf(x,4,2);
    p = chi2cdf(x,4);
    plot(x,p,x,p1)

[Figure: the chi-square cdf and the noncentral chi-square cdf plotted against x]

References

Johnson, Norman, and Kotz, Samuel, Distributions in Statistics: Continuous Univariate Distributions-2, Wiley, 1970, pp. 130-148.

See Also

cdf, ncx2inv, ncx2pdf, ncx2rnd, ncx2stat

ncx2inv

Purpose

Inverse of the noncentral chi-square cdf.

Syntax

    X = ncx2inv(P,V,DELTA)
44. [Figure: scatter plot of the component scores; x-axis: 1st Principal Component]

Note the outlying points in the upper right corner.

The function gname is useful for graphically identifying a few points in a plot like this. You can call gname with a string matrix containing as many case labels as points in the plot. The string matrix names works for labeling points with the city names.

    gname(names)

Move your cursor over the plot and click once near each point at the top right. When you finish, press the return key. Here is the resulting plot.

[Figure: the same scatter plot with labeled points: Chicago, IL; Washington, DC-MD-VA; New York, NY; Boston, MA; Los Angeles, Long Beach, CA; San Francisco, CA. x-axis: 1st Principal Component]

The labeled cities are the biggest population centers in the U.S. Perhaps we should consider them
45. 1     1     0     0     4     0     2     0

poisstat

Purpose

Mean and variance for the Poisson distribution.

Syntax

    M = poisstat(LAMBDA)
    [M,V] = poisstat(LAMBDA)

Description

M = poisstat(LAMBDA) returns the mean of the Poisson distribution with parameter LAMBDA. M and LAMBDA match each other in size.

[M,V] = poisstat(LAMBDA) also returns the variance of the Poisson distribution.

For the Poisson distribution:

• The mean is λ.
• The variance is λ.

Examples

Find the mean and variance for the Poisson distribution with λ = 2:

    [m,v] = poisstat([1 2; 3 4])
    m =
         1     2
         3     4
    v =
         1     2
         3     4

polyconf

Purpose

Polynomial evaluation and confidence interval estimation.

Syntax

    [Y,DELTA] = polyconf(p,X,S)
    [Y,DELTA] = polyconf(p,X,S,alpha)

Description

[Y,DELTA] = polyconf(p,X,S) uses the optional output S generated by polyfit to give 95% confidence intervals Y ± DELTA. This assumes the errors in the data input to polyfit are independent normal with constant variance.

[Y,DELTA] = polyconf(p,X,S,alpha) gives 100(1-alpha)% confidence intervals. For example, alpha = 0.1 yields 90% intervals.

If p is a vector whose elements are the coefficients of a polynomial in descending powers, such as those output from polyfit, then polyconf(p,X) is the value of the polynomial evaluated at X. If X is a matrix or vector, the polynomial is evaluated at each of the elements.

Examples

This example gives pred
46. 1.5 3];
    r = mvnrnd(mu,sigma,100);
    plot(r(:,1),r(:,2),'+')

[Figure: scatter plot of the 100 bivariate normal samples]

See Also

normrnd

nanmax

Purpose

Maximum ignoring NaNs.

Syntax

    m = nanmax(a)
    [m,ndx] = nanmax(a)
    m = nanmax(a,b)

Description

m = nanmax(a) returns the maximum with NaNs treated as missing. For vectors, nanmax(a) is the largest non-NaN element in a. For matrices, nanmax(A) is a row vector containing the maximum non-NaN element from each column.

[m,ndx] = nanmax(a) also returns the indices of the maximum values in vector ndx.

m = nanmax(a,b) returns the larger of a or b, which must match in size.

Example

    m = magic(3);
    m([1 6 8]) = [NaN NaN NaN]
    m =
       NaN     1     6
         3     5   NaN
         4   NaN     2

    [nmax,maxidx] = nanmax(m)
    nmax =
         4     5     6
    maxidx =
         3     2     1

See Also

nanmin, nanmean, nanmedian, nanstd, nansum

nanmean

Purpose

Mean ignoring NaNs.

Syntax

    y = nanmean(X)

Description

nanmean(X) is the average, treating NaNs as missing values. For vectors, nanmean(x) is the mean of the non-NaN elements of x. For matrices, nanmean(X) is a row vector containing the mean of the non-NaN elements in each column.

Example

    m = magic(3);
    m([1 6 8]) = [NaN NaN NaN]
    m =
       NaN     1     6
         3     5   NaN
         4   NaN     2

    nmean = nanmean(m)
    nmean =
        3.5000    3
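The column-wise behavior described for nanmean can be mimicked with logical masking in core MATLAB, which makes the "treat NaNs as missing" rule explicit. This is a sketch of the idea with assumed variable names, not the toolbox source.

```matlab
% Sketch: column means that skip NaNs, using a logical mask.
m = magic(3);
m([1 6 8]) = NaN;            % same missing-value pattern as the example

valid = ~isnan(m);           % true where a value is present
z = m;
z(~valid) = 0;               % missing entries contribute nothing to the sum

% Divide each column sum by the number of values actually present.
nmean = sum(z,1) ./ sum(valid,1);   % gives 3.5 3 4 for this matrix
```

A column that consists entirely of NaNs would produce 0/0 = NaN here, which is a natural way to signal that no data were available.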
47. 2 3;4 5 6],0.1,2,3)
    n3 =
        0.9299    1.9361    2.9640
        4.1246    5.0577    5.9864

normspec

Purpose

Plot normal density between specification limits.

Syntax

    p = normspec(specs,mu,sigma)
    [p,h] = normspec(specs,mu,sigma)

Description

p = normspec(specs,mu,sigma) plots the normal density between a lower and upper limit defined by the two elements of the vector specs. mu and sigma are the parameters of the plotted normal distribution.

[p,h] = normspec(specs,mu,sigma) returns the probability, p, of a sample falling between the lower and upper limits. h is a handle to the line objects.

If specs(1) is -Inf, there is no lower limit, and similarly if specs(2) = Inf, there is no upper limit.

Example

Suppose a cereal manufacturer produces 10 ounce boxes of corn flakes. Variability in the process of filling each box with flakes causes a 1.25 ounce standard deviation in the true weight of the cereal in each box. The average box of cereal has 11.5 ounces of flakes. What percentage of boxes will have less than 10 ounces?

    normspec([10 Inf],11.5,1.25)

[Figure: normal density with the region above 10 shaded. Title: Probability Between Limits is 0.8849; x-axis: Critical Value; y-axis: Density]

The probability of a box holding at least 10 ounces is 0.8849, so about 11.5% of boxes (1 - 0.8849 = 0.1151) have less than 10 ounces.

See Also

capaplot, disttool, histfit, normpdf

normstat

Purpose

Mean and variance for the normal distribution.

Syntax

    [M,V] = normstat(MU,SIGMA)

Description

For the normal distribution:

• The mean is μ.
• The variance is σ².

Examples

f
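The probability that normspec reports for the cereal example can be reproduced from the standard normal cdf, written here with the core MATLAB function erfc. This is a small worked check with assumed variable names; it does not draw the plot that normspec produces.

```matlab
% Sketch: the cereal-box probabilities without plotting.
mu = 11.5; sigma = 1.25;     % process mean and standard deviation (ounces)
lower = 10;                  % lower specification limit

z = (lower - mu) / sigma;            % standardized limit, here -1.2
pBelow = 0.5 * erfc(-z / sqrt(2));   % standard normal cdf at z
pAbove = 1 - pBelow;                 % probability between 10 and Inf
```

Here pAbove matches the 0.8849 shown in the plot title, and pBelow is the share of underweight boxes.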
48. hist is a function in MATLAB.

histfit

Purpose

Histogram with superimposed normal density.

Syntax

    histfit(data)
    histfit(data,nbins)
    h = histfit(data,nbins)

Description

histfit(data,nbins) plots a histogram of the values in the vector data using nbins bars in the histogram. With one input argument, nbins is set to the square root of the number of elements in data.

h = histfit(data,nbins) returns a vector of handles to the plotted lines. h(1) is the handle to the histogram, h(2) is the handle to the density curve.

Example

    r = normrnd(10,1,100,1);
    histfit(r)

[Figure: histogram of r with a superimposed normal density curve]

See Also

hist, normfit

hougen

Purpose

Hougen-Watson model for reaction kinetics.

Syntax

    yhat = hougen(beta,X)

Description

yhat = hougen(beta,x) gives the predicted values of the reaction rate, yhat, as a function of the vector of parameters, beta, and the matrix of data, X. beta must have 5 elements and X must have three columns.

hougen is a utility function for rsmdemo.

The model form is

$$\hat{y} = \frac{\beta_1 x_2 - x_3/\beta_5}{1 + \beta_2 x_1 + \beta_3 x_2 + \beta_4 x_3}$$

Reference

Bates, Douglas, and Watts, Donald, Nonlinear Regression Analysis and Its Applications, Wiley, 1988, p. 271-272.

See Also

rsmdemo

hygecdf

Purpose

Hypergeometric cumulative distribution function (cdf).

Syntax

    P = hyg
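The Hougen-Watson model form given for hougen translates into a one-line function. The sketch below is an assumed, self-contained re-statement of that formula as an anonymous function named hw. It is not the toolbox file, and the parameter values and data rows are hypothetical numbers chosen only to exercise the formula.

```matlab
% Sketch: the Hougen-Watson rate equation as an anonymous function.
% X has one row per run and three columns (x1, x2, x3); beta has 5 elements.
hw = @(beta,X) (beta(1)*X(:,2) - X(:,3)/beta(5)) ./ ...
               (1 + beta(2)*X(:,1) + beta(3)*X(:,2) + beta(4)*X(:,3));

beta = [1 0.05 0.02 0.1 2];       % hypothetical parameter vector
X = [470 300 10; 285 80 10];      % hypothetical partial pressures

yhat = hw(beta, X);               % predicted reaction rate for each row
```

For the first row the numerator is 300 - 10/2 = 295 and the denominator is 1 + 23.5 + 6 + 1 = 31.5, so yhat(1) is 295/31.5.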
50. ewmaplot(r,0.4,0.01,[9.3 10.7])

[Figure: Exponentially Weighted Moving Average (EWMA) Chart with USL and LSL lines; x-axis: Sample Number]

Reference

Montgomery, Douglas, Introduction to Statistical Quality Control, John Wiley & Sons, 1991, p. 299.

See Also

xbarplot, schart

expcdf

Purpose

Exponential cumulative distribution function (cdf).

Syntax

    P = expcdf(X,MU)

Description

expcdf(X,MU) computes the exponential cdf with parameter settings MU at the values in X. The arguments X and MU must be the same size, except that a scalar argument functions as a constant matrix of the same size as the other argument.

The parameter MU must be positive.

The exponential cdf is

$$p = F(x \mid \mu) = \int_0^x \frac{1}{\mu} e^{-t/\mu} \, dt = 1 - e^{-x/\mu}$$

The result, p, is the probability that a single observation from an exponential distribution will fall in the interval [0 x].

Examples

The median of the exponential distribution is μ·log(2). Demonstrate this fact.

    mu = 10:10:60;
    p = expcdf(log(2)*mu,mu)
    p =
        0.5000    0.5000    0.5000    0.5000    0.5000    0.5000

What is the probability that an exponential random variable will be less than or equal to the mean, μ?

    mu = 1:6;
    x = mu;
    p = expcdf(x,mu)
    p =
        0.6321    0.6321    0.6321    0.6321    0.6321    0.6321

expfit

Purpose

Parameter estimates and confidence in
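Both expcdf examples follow directly from the closed form 1 - exp(-x/μ). The sketch below evaluates that expression with core MATLAB only, as a cross-check on the printed outputs; the variable names are illustrative.

```matlab
% Sketch: the two expcdf examples from the closed-form cdf.
mu = 10:10:60;

% At the median x = mu*log(2) the cdf is 1 - exp(-log(2)) = 0.5.
pMedian = 1 - exp(-(log(2)*mu) ./ mu);

% At the mean x = mu the cdf is 1 - exp(-1), about 0.6321, for every mu.
pMean = 1 - exp(-mu ./ mu);
```

Neither result depends on μ, which is why each printed output row repeats a single value.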
51. 6))
    noisy =
       19.7250    0.3488    0.2843    0.4034    0.4816    2.4190

    numbers = trnd(1:6,[1 6])
    numbers =
        1.9500    0.9611    0.9038    0.0754    0.9820    1.0115

    numbers = trnd(3,2,6)
    numbers =
        0.3177    0.0812    0.6627    0.1905    1.5585    0.0433
        0.2536    0.5502    0.8646    0.8060    0.5216    0.0891

tstat

Purpose

Mean and variance for the Student's t distribution.

Syntax

    [M,V] = tstat(NU)

Description

For the Student's t distribution with parameter ν:

• The mean is zero for values of ν greater than 1. If ν is one, the mean does not exist.
• The variance, for values of ν greater than 2, is ν/(ν - 2).

Examples

The mean and variance for 1 to 30 degrees of freedom:

    [m,v] = tstat(reshape(1:30,6,5))
    m =
       NaN     0     0     0     0
         0     0     0     0     0
         0     0     0     0     0
         0     0     0     0     0
         0     0     0     0     0
         0     0     0     0     0
    v =
           NaN    1.4000    1.1818    1.1176    1.0870
           NaN    1.3333    1.1667    1.1111    1.0833
        3.0000    1.2857    1.1538    1.1053    1.0800
        2.0000    1.2500    1.1429    1.1000    1.0769
        1.6667    1.2222    1.1333    1.0952    1.0741
        1.5000    1.2000    1.1250    1.0909    1.0714

Note that the variance does not exist for one and two degrees of freedom.

ttest

Purpose

Hypothesis testing for a single sample mean.

Syntax

    h = ttest(x,m)
    h = ttest(x,m,alpha)
    [h,sig,ci] = ttest(x,m,alpha,tail)

Description

ttest(x,m) performs a t-test at significance level 0.05 to determine whether a sample from a normal distribution (in x) could have mean m when the standard deviation is unknown.

h = ttest(x,m,alpha) gives control of the significance
52. A and B. The size of R is the common size of A and B if both are matrices. If either parameter is a scalar, the size of R is the size of the other parameter.

R = unifrnd(A,B,m) generates uniform random numbers with parameters A and B. m is a 1-by-2 vector that contains the row and column dimensions of R.

R = unifrnd(A,B,m,n) generates uniform random numbers with parameters A and B. The scalars m and n are the row and column dimensions of R.

    random = unifrnd(0,1:6)
    random =
        0.2190  0.0941  2.0366  2.7172  4.6735  2.3010

    random = unifrnd(0,1:6,[1 6])
    random =
        0.5194  1.6619  0.1037  0.2138  2.6485  4.0269

    random = unifrnd(0,1,2,3)
    random =
        0.0077  0.0668  0.6868
        0.3834  0.4175  0.5890

unifstat

Purpose
Mean and variance for the continuous uniform distribution.

Syntax
    [M,V] = unifstat(A,B)

Description
For the continuous uniform distribution:
• The mean is $\frac{a+b}{2}$.
• The variance is $\frac{(b-a)^2}{12}$.

Examples
    a = 1:6;
    b = 2.*a;
    [m,v] = unifstat(a,b)
    m =
        1.5000  3.0000  4.5000  6.0000  7.5000  9.0000
    v =
        0.0833  0.3333  0.7500  1.3333  2.0833  3.0000

var

Purpose
Variance of a sample.

Syntax
    y = var(X)
    y = var(X,1)
    y = var(X,w)

Description
var(X) computes the variance of the data in X. For vectors, var(x) is the variance of the elements in x. For matrices, var(X) is a row vector containing the variance of each column of X.

var(x) normalizes by n-1, where n is the sequence length. For normally distributed data
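The unifstat output above is just the two formulas in its Description evaluated elementwise. A minimal base-MATLAB sketch, not the toolbox code:

```matlab
% Mean and variance of the continuous uniform distribution on [a, b].
a = 1:6;
b = 2 .* a;

m = (a + b) ./ 2;          % mean: (a+b)/2
v = (b - a).^2 ./ 12;      % variance: (b-a)^2/12

disp(m)
disp(v)
```

With b = 2a the variance reduces to a^2/12, which is why the second row grows quadratically.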
53. Distribution Function

The inverse cumulative distribution function returns critical values for hypothesis testing, given significance probabilities. To understand the relationship between a continuous cdf and its inverse function, try the following:

    x = -3:0.1:3;
    xnew = norminv(normcdf(x,0,1),0,1);

How does xnew compare with x? Conversely, try this:

    p = 0.1:0.1:0.9;
    pnew = normcdf(norminv(p,0,1),0,1);

How does pnew compare with p?

Calculating the cdf of values in the domain of a continuous distribution returns probabilities between zero and one. Applying the inverse cdf to these probabilities yields the original values.

For discrete distributions, the relationship between a cdf and its inverse function is more complicated. It is likely that there is no x value such that the cdf of x yields p. In these cases, the inverse function returns the first value x such that the cdf of x equals or exceeds p. Try this:

    x = 0:10;
    y = binoinv(binocdf(x,10,0.5),10,0.5);

How does x compare with y?

The commands below show the problem with going the other direction for discrete distributions.

    p = 0.1:0.2:0.9;
    pnew = binocdf(binoinv(p,10,0.5),10,0.5)
    pnew =
        0.1719  0.3770  0.6230  0.8281  0.9453

The inverse function is useful in hypothesis testing and production of confidence intervals. Here is the way to get a 99% confidence interval for a normally distributed sample.

    p = [0.005 0.995];
    x = norminv(p,0,1)
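The discrete case above can be traced by hand. The sketch below builds the binomial cdf for n = 10, p = 0.5 from its definition and applies the rule "first x whose cdf equals or exceeds the target". It uses base MATLAB only; binoinv_sketch is a hypothetical helper written for this illustration, not the toolbox function.

```matlab
% Binomial(10, 0.5) cdf built from the pdf, then inverted by search.
n = 10;
p = 0.5;
x = 0:n;

pdf = arrayfun(@(k) nchoosek(n, k), x) .* p.^x .* (1 - p).^(n - x);
cdf = cumsum(pdf);

% Hypothetical helper: smallest x whose cdf equals or exceeds t.
binoinv_sketch = @(t) x(find(cdf >= t, 1, 'first'));

targets = 0.1:0.2:0.9;
xinv = arrayfun(binoinv_sketch, targets);   % inverse cdf at each target
pnew = cdf(xinv + 1);                       % cdf evaluated back at those x

disp(xinv)
disp(pnew)
```

Each entry of pnew is at least as large as the matching target but rarely equal to it, which is the round-trip mismatch the text describes.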
54. [m,v] = normstat(n'*n,n'*n)
    m =
         1   2   3   4   5
         2   4   6   8  10
         3   6   9  12  15
         4   8  12  16  20
         5  10  15  20  25
    v =
         1    4    9   16   25
         4   16   36   64  100
         9   36   81  144  225
        16   64  144  256  400
        25  100  225  400  625

pareto

Purpose
Pareto charts for Statistical Process Control.

Syntax
    pareto(y)
    pareto(y,names)
    h = pareto(...)

Description
pareto(y,names) displays a Pareto chart where the values in the vector y are drawn as bars in descending order. Each bar is labeled with the associated value in the string matrix names. pareto(y) labels each bar with the index of the corresponding element in y.

The line above the bars shows the cumulative percentage.

pareto(y,names) labels each bar with the row of the string matrix names that corresponds to the plotted element of y.

h = pareto(...) returns a combination of patch and line handles.

Example
Create a Pareto chart from data measuring the number of manufactured parts rejected for various types of defects.

    defects = ['pits  ';'cracks';'holes ';'dents '];
    quantity = [5 3 19 25];
    pareto(quantity,defects)

[Figure: Pareto chart with bars in descending order: dents, holes, pits, cracks.]

See Also
bar, capaplot, ewmaplot, hist, histfit, schart, xbarplot

pcacov

Purpose
Principal Components Analysis using the covariance matrix.

Syntax
    pc = pcacov(X)
    [pc,latent,explained] = pcacov(X)

Description
[pc,latent,expl
55. Find a value that should exceed 95% of the samples from an F distribution with 5 degrees of freedom in the numerator and 10 degrees of freedom in the denominator.

    x = finv(0.95,5,10)
    x =
        3.3258

You would observe values greater than 3.3258 only 5% of the time by chance.

fpdf

Purpose
F probability density function (pdf).

Syntax
    Y = fpdf(X,V1,V2)

Description
fpdf(X,V1,V2) computes the F pdf with parameters V1 and V2 at the values in X. The arguments X, V1, and V2 must all be the same size, except that scalar arguments function as constant matrices of the common size of the other arguments.

The parameters V1 and V2 must both be positive integers and X must lie on the interval [0 ∞).

The probability density function for the F distribution is

$y = f(x|\nu_1,\nu_2) = \frac{\Gamma\left[\frac{\nu_1+\nu_2}{2}\right]}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} \frac{x^{(\nu_1-2)/2}}{\left[1+\left(\frac{\nu_1}{\nu_2}\right)x\right]^{(\nu_1+\nu_2)/2}}$

Examples
    y = fpdf(1:6,2,2)
    y =
        0.2500  0.1111  0.0625  0.0400  0.0278  0.0204

    z = fpdf(3,5:10,5:10)
    z =
        0.0689  0.0659  0.0620  0.0577  0.0532  0.0487

frnd

Purpose
Random numbers from the F distribution.

Syntax
    R = frnd(V1,V2)
    R = frnd(V1,V2,m)
    R = frnd(V1,V2,m,n)

Description
R = frnd(V1,V2) generates random numbers from the F distribution with numerator degrees of freedom V1 and denominator degrees of freedom V2. The size of R is the common size of V1 and V2 if both are matrices. If either parameter is a scalar, th
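As a check on the F pdf formula given for fpdf, the sketch below evaluates it with the base-MATLAB gamma function and compares against the first example. The helper name fpdf_sketch is hypothetical; it is a direct transcription of the formula for x > 0, not the toolbox implementation.

```matlab
% F pdf written out from its definition (valid for x > 0).
% fpdf_sketch is a hypothetical helper, not a toolbox function.
fpdf_sketch = @(x, v1, v2) ...
    gamma((v1 + v2) ./ 2) ./ (gamma(v1 ./ 2) .* gamma(v2 ./ 2)) ...
    .* (v1 ./ v2).^(v1 ./ 2) ...
    .* x.^((v1 - 2) ./ 2) ...
    ./ (1 + (v1 ./ v2) .* x).^((v1 + v2) ./ 2);

x = 1:6;
y = fpdf_sketch(x, 2, 2);       % with v1 = v2 = 2 this reduces to 1/(1+x)^2
z = fpdf_sketch(3, 5:10, 5:10); % vector parameters, scalar x

disp(y)
disp(z)
```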
56. Maximum likelihood estimation (MLE) involves calculating the value of p that gives the highest likelihood given the particular set of data.

The function binofit returns the MLEs and confidence intervals for the parameters of the binomial distribution. Here is an example using random numbers from the binomial distribution with n = 100 and p = 0.9.

    r = binornd(100,0.9)
    r =
        88

    [phat,pci] = binofit(r,100)
    phat =
        0.8800
    pci =
        0.7998  0.9364

The MLE for the parameter p is 0.8800, compared to the true value of 0.9. The 95% confidence interval for p goes from 0.7998 to 0.9364, which includes the true value. Of course, in this made-up example we know the "true value" of p.

Example and Plot. The following commands generate a plot of the binomial pdf for n = 10 and p = 1/2.

    x = 0:10;
    y = binopdf(x,10,0.5);
    plot(x,y)

[Figure: binomial pdf for n = 10 and p = 1/2, plotted for x from 0 to 10.]

Chi-square (χ²) Distribution

Background. The χ² distribution is a special case of the gamma distribution where b = 2 in the equation for the gamma distribution below.

$y = f(x|a,b) = \frac{1}{b^a\,\Gamma(a)}\, x^{a-1} e^{-x/b}$

The χ² distribution gets special attention because of its importance in normal sampling theory. If a set of n observations are normally distributed with variance σ² and s is the sample standard deviation, then

$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2(n-1)$

The Statistics Toolbox uses the
57. • Random number generator
• Mean and variance

This section discusses each of these functions.

Probability Density Function (pdf)

The probability density function has a different meaning depending on whether the distribution is discrete or continuous.

For discrete distributions, the pdf is the probability of observing a particular outcome. In our videotape example, the probability that there is exactly one defect in a given hundred feet of tape is the value of the pdf at 1.

Unlike discrete distributions, the pdf of a continuous distribution at a value is not the probability of observing that value. For continuous distributions, the probability of observing any particular value is zero. To get probabilities you must integrate the pdf over an interval of interest. For example, the probability of the thickness of a videotape being between one and two millimeters is the integral of the appropriate pdf from one to two.

A pdf has two theoretical properties:
• The pdf is zero or positive for every possible outcome.
• The integral of a pdf over its entire range of values is one.

A pdf is not a single function. Rather, a pdf is a family of functions characterized by one or more parameters. Once you choose (or estimate) the parameters of a pdf, you have uniquely specified the function.

The pdf function call has the same general format for every distribution in the Statistics Toolbox. The following commands illustrate how to call the pdf for the normal
58. That is, ν₁ and ν₂ are the number of independent pieces of information used to calculate χ₁² and χ₂², respectively.

Mathematical Definition. The pdf for the F distribution is

$y = f(x|\nu_1,\nu_2) = \frac{\Gamma\left[\frac{\nu_1+\nu_2}{2}\right]}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} \frac{x^{(\nu_1-2)/2}}{\left[1+\left(\frac{\nu_1}{\nu_2}\right)x\right]^{(\nu_1+\nu_2)/2}}$

Example and Plot. The most common application of the F distribution is in standard tests of hypotheses in analysis of variance and regression.

The plot shows that the F distribution exists on the positive real numbers and is skewed to the right.

    x = 0:0.01:10;
    y = fpdf(x,5,3);
    plot(x,y)

[Figure: F pdf with 5 and 3 degrees of freedom, plotted for x from 0 to 10.]

Noncentral F Distribution

Background. As with the χ², the F distribution is a special case of the noncentral F distribution. The F distribution is the result of taking the ratio of two χ² random variables, each divided by its degrees of freedom.

If the numerator of the ratio is a noncentral chi-square random variable divided by its degrees of freedom, the resulting distribution is the noncentral F.

The main application of the noncentral F distribution is to calculate the power of a hypothesis test relative to a particular alternative.

Mathematical Definition. Similarly to the noncentral chi-square, the Statistics Toolbox calculates noncentral F distribution probabilities as a weighted sum of incomplete beta functions, using Poisson probabilities as the weights.

$F(x|\nu_1,\nu_2,\delta) = \sum_{j=0}^{\infty} \left(\frac{\left(\frac{1}{2}\delta\right)^j}{j!}\, e^{-\delta/2}\right) I\left(\frac{\nu_1 x}{\nu_2+\nu_1 x}\,\Big|\,\frac{\nu_1}{2}+j,\ \frac{\nu_2}{2}\right)$
59. Univariate Discrete Distributions, Second Edition, Wiley, 1992, pp. 124-130.

See Also
mle

binoinv

Purpose
Inverse of the binomial cumulative distribution function (cdf).

Syntax
    X = binoinv(Y,N,P)

Description
binoinv(Y,N,P) returns the smallest integer X such that the binomial cdf evaluated at X is equal to or exceeds Y. You can think of Y as the probability of observing X successes in N independent trials, where P is the probability of success in each trial.

The parameter N must be a positive integer and both P and Y must lie on the interval [0 1]. Each X is a positive integer less than or equal to N.

Examples
If a baseball team has a 50-50 chance of winning any game, what is a reasonable range of games this team might win over a season of 162 games? We assume that a surprising result is one that occurs by chance once in a decade.

    binoinv([0.05 0.95],162,0.5)
    ans =
        71  91

This result means that in 90% of baseball seasons, a .500 team should win between 71 and 91 games.

binopdf

Purpose
Binomial probability density function (pdf).

Syntax
    Y = binopdf(X,N,P)

Description
binopdf(X,N,P) computes the binomial pdf with parameters N and P at the values in X. The arguments X, N, and P must all be the same size, except that scalar arguments function as constant matrices of the common size of the other arguments.

N must be a positive integer and P must lie on the interval [0 1].

The
60. X followed by the backslash operator to compute b. The QR decomposition is not necessary for computing b, but the matrix R is useful for computing confidence intervals.

You can plug b back into the model formula to get the predicted y values at the data points.

$\hat{y} = Xb = Hy$

$H = X(X^T X)^{-1} X^T$

Statisticians use a hat (circumflex) over a letter to denote an estimate of a parameter or a prediction from a model. The projection matrix H is called the hat matrix, because it puts the "hat" on y.

The residuals are the difference between the observed and predicted y values.

$r = y - \hat{y} = (I - H)y$

The residuals are useful for detecting failures in the model assumptions, since they correspond to the errors, ε, in the model equation. By assumption, these errors each have independent normal distributions with mean zero and a constant variance.

The residuals, however, are correlated and have variances that depend on the locations of the data points. It is a common practice to scale ("Studentize") the residuals so they all have the same variance.

In the equation below, the scaled residual, $t_i$, has a Student's t distribution with (n-p) degrees of freedom.

$t_i = \frac{r_i}{\hat{\sigma}_{(i)}\sqrt{1-h_i}}$

where

$\hat{\sigma}_{(i)}^2 = \frac{\|r\|^2}{n-p-1} - \frac{r_i^2}{(n-p-1)(1-h_i)}$

• $t_i$ is the scaled residual for the ith data point.
• $r_i$ is the raw residual for the ith data point.
• n is the sample size.
• p is the number of parameters in the model.
• $h_i$ is the ith diagonal elem
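The hat matrix and the scaled residuals above can be computed in a few lines of base MATLAB. This is a minimal sketch on made-up data: the x and y values and the straight-line model are assumptions for illustration, and the variable names are not taken from the toolbox.

```matlab
% Hypothetical data for a straight-line model y = b0 + b1*x + error.
x = (1:8)';
y = [1.1 1.9 3.2 3.9 5.1 6.2 6.8 8.1]';

X = [ones(size(x)) x];          % design matrix with an intercept column
[n, p] = size(X);

[Q, R] = qr(X, 0);              % economy-size QR decomposition
b = R \ (Q' * y);               % least-squares coefficients (same as X \ y)

H = Q * Q';                     % hat matrix, equal to X*inv(X'*X)*X'
yhat = H * y;                   % predicted values at the data points
r = y - yhat;                   % raw residuals, (I - H)*y
h = diag(H);                    % leverages h_i

% Scaled residuals, following the two equations in the text.
s2_i = norm(r)^2 ./ (n - p - 1) - r.^2 ./ ((n - p - 1) .* (1 - h));
t = r ./ (sqrt(s2_i) .* sqrt(1 - h));

disp([r t])
```

Forming H as Q*Q' avoids inverting X'*X explicitly, which is the usual reason for reaching for the QR factors here.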
61. above relationship to calculate confidence intervals for the estimate of the normal parameter σ in the function normfit.

Mathematical Definition. The χ² pdf is

$y = f(x|\nu) = \frac{x^{(\nu-2)/2}\, e^{-x/2}}{2^{\nu/2}\,\Gamma(\nu/2)}$

Example and Plot. The χ² distribution is skewed to the right, especially for few degrees of freedom (ν). The plot shows the χ² distribution with four degrees of freedom.

    x = 0:0.2:15;
    y = chi2pdf(x,4);
    plot(x,y)

[Figure: χ² pdf with four degrees of freedom, plotted for x from 0 to 15.]

Noncentral Chi-square Distribution

Background. The χ² distribution is actually a simple special case of the noncentral chi-square distribution. One way to generate random numbers with a χ² distribution (with ν degrees of freedom) is to sum the squares of ν standard normal random numbers (mean equal to zero).

What if we allow the normally distributed quantities to have a mean other than zero? The sum of squares of these numbers yields the noncentral chi-square distribution. The noncentral chi-square distribution requires two parameters: the degrees of freedom and the noncentrality. The noncentrality parameter is the sum of the squared means of the normally distributed quantities.

The noncentral chi-square has scientific application in thermodynamics and signal processing. The literature in these areas may refer to it as the Ricean or generalized Rayleigh distribution.

Mathematical Definition. There are many equivalent formulae for the noncentral chi squar
62. as a completely separate group. If we call gname without arguments, it labels each point with its row number.

[Figure: scatter plot of the principal component scores with the outlying points labeled by row number (43, 65, 179, 213, 234, 270, 314); x-axis: 1st Principal Component.]

We can create an index variable containing the row numbers of all the metropolitan areas we chose.

    metro = [43 65 179 213 234 270 314];
    names(metro,:)
    ans =
        Boston, MA
        Chicago, IL
        Los Angeles, Long Beach, CA
        New York, NY
        Philadelphia, PA-NJ
        San Francisco, CA
        Washington, DC-MD-VA

To remove these rows from the ratings matrix:

    rsubset = ratings;
    nsubset = names;
    nsubset(metro,:) = [];
    rsubset(metro,:) = [];
    size(rsubset)
    ans =
        322  9

To practice, repeat the analysis using the variable rsubset as the new data matrix and nsubset as the string matrix of labels.

The Component Variances (Third Output). The third output, variances, is a vector containing the variance explained by the corresponding column of newdata.

    variances
    variances =
        4083
63. be the same size, except that a scalar argument functions as a constant matrix of the same size as the other argument.

The maximum observable value, N, is a positive integer.

The discrete uniform cdf is

$p = F(x|N) = \frac{\lfloor x \rfloor}{N}\, I_{(1,\ldots,N)}(x)$

The result, p, is the probability that a single observation from the discrete uniform distribution with maximum N will be a positive integer less than or equal to x. The values x do not need to be integers.

Examples
What is the probability of drawing a number 20 or less from a hat with the numbers from 1 to 50 inside?

    probability = unidcdf(20,50)
    probability =
        0.4000

unidinv

Purpose
Inverse of the discrete uniform cumulative distribution function.

Syntax
    X = unidinv(P,N)

Description
unidinv(P,N) returns the smallest integer X such that the discrete uniform cdf evaluated at X is equal to or exceeds P. You can think of P as the probability of drawing a number as large as X out of a hat with the numbers 1 through N inside.

The argument P must lie on the interval [0 1] and N must be a positive integer. Each element of X is a positive integer.

Examples
    x = unidinv(0.7,20)
    x =
        14

    y = unidinv(0.7+eps,20)
    y =
        15

A small change in the first parameter produces a large jump in output. The cdf and its inverse are both step functions. The example shows what happens at a step.

unidpdf

Purpose
Discrete uniform probability density fun
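The step behavior of the discrete uniform cdf and its inverse can be sketched from the definitions: the cdf is floor(x)/N on 1 through N, and the inverse is the smallest integer whose cdf reaches p. The helpers unidcdf_sketch and unidinv_sketch below are hypothetical names in base MATLAB, not the toolbox routines, and the ceil-based inverse is an assumption that follows from the floor-based cdf.

```matlab
% Discrete uniform cdf on 1..N: floor(x)/N, clamped to the range [0, 1].
unidcdf_sketch = @(x, N) min(max(floor(x), 0), N) ./ N;

% Smallest integer X with cdf(X) >= p, written as ceil(N*p).
unidinv_sketch = @(p, N) ceil(N .* p);

prob = unidcdf_sketch(20, 50);        % numbers 1..50, draw 20 or less
same = unidcdf_sketch(20.9, 50);      % non-integer x falls on the same step

xa = unidinv_sketch(0.72, 20);        % inside a step
xb = unidinv_sketch(0.75, 20);        % exactly at the top of that step
xc = unidinv_sketch(0.76, 20);        % just past it, so the next integer

disp([prob same])
disp([xa xb xc])
```

Any p between 0.70 and 0.75 maps to the same integer, and moving just past 0.75 jumps to the next one, which is the step effect the unidinv example illustrates.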
64. binomial pdf is

$y = f(x|n,p) = \binom{n}{x} p^x q^{(n-x)} I_{(0,1,\ldots,n)}(x)$, where $q = 1 - p$.

The result, y, is the probability of observing x successes in n independent trials, where the probability of success in any given trial is p.

Examples
A Quality Assurance inspector tests 200 circuit boards a day. If 2% of the boards have defects, what is the probability that the inspector will find no defective boards on any given day?

    binopdf(0,200,0.02)
    ans =
        0.0176

What is the most likely number of defective boards the inspector will find?

    y = binopdf([0:200],200,0.02);
    [x,i] = max(y);
    i

binornd

Purpose
Random numbers from the binomial distribution.

Syntax
    R = binornd(N,P)
    R = binornd(N,P,mm)
    R = binornd(N,P,mm,nn)

Description
R = binornd(N,P) generates binomial random numbers with parameters N and P. The size of R is the common size of N and P if both are matrices. If either parameter is a scalar, the size of R is the size of the other.

R = binornd(N,P,mm) generates binomial random numbers with parameters N and P. mm is a 1-by-2 vector that contains the row and column dimensions of R.

R = binornd(N,P,mm,nn) generates binomial random numbers with parameters N and P. The scalars mm and nn are the row and column dimensions of R.

Algorithm
The binornd function uses the direct method, using the definition of the binomial distribution as a sum of Bernoulli random variables.

Examples
    n = 10:10:60;
    r1 = binornd(n,1./n)
    r1 =
        2  1  0  1  1  2

    r2 = binornd(n,1
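The "direct method" named in the binornd Algorithm note can be sketched as follows: draw n Bernoulli trials with success probability p and count the successes. This is a base-MATLAB illustration of that idea with assumed values n = 10, p = 0.3 and 2000 draws; it is not the toolbox source.

```matlab
% Direct method: a binomial draw is a sum of n Bernoulli(p) trials.
n = 10;          % trials per draw (assumed for illustration)
p = 0.3;         % success probability (assumed for illustration)
m = 2000;        % number of binomial draws to generate

trials = rand(n, m) < p;    % each column holds n Bernoulli outcomes
r = sum(trials, 1);         % column sums are the binomial random numbers

disp(mean(r))               % should be near n*p
disp(var(r))                % should be near n*p*(1-p)
```

The cost grows linearly with n, which is the usual tradeoff of the direct method against its simplicity.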
65. binv(P,A,B)

Description
weibinv(P,A,B) computes the inverse of the Weibull cdf with parameters A and B for the probabilities in P. The arguments P, A, and B must all be the same size, except that scalar arguments function as constant matrices of the common size of the other arguments.

The parameters A and B must be positive.

The inverse of the Weibull cdf is

$x = F^{-1}(p|a,b) = \left[\frac{1}{a}\ln\left(\frac{1}{1-p}\right)\right]^{1/b} I_{[0,1]}(p)$

Examples
A batch of light bulbs have lifetimes (in hours) distributed Weibull with parameters a = 0.15 and b = 0.24. What is the median lifetime of the bulbs?

    life = weibinv(0.5,0.15,0.24)
    life =
        588.4721

What is the 90th percentile?

    life = weibinv(0.9,0.15,0.24)
    life =
        8.7536e+04

weiblike

Purpose
Weibull negative log-likelihood function.

Syntax
    logL = weiblike(params,data)
    [logL,info] = weiblike(params,data)

Description
logL = weiblike(params,data) returns the Weibull log-likelihood with parameters params(1) = a and params(2) = b given the data.

[logL,info] = weiblike(params,data) adds Fisher's information matrix, info. The diagonal elements of info are the asymptotic variances of their respective parameters.

The Weibull negative log-likelihood is

$-\log L = -\log \prod_{i=1}^{n} f(a,b|x_i) = -\sum_{i=1}^{n} \log f(a,b|x_i)$

weiblike is a utility function for maximum likelihood estimation.

Example
Continuing the example for weibfit:

    r = weibrnd(0.5,0.8,100,1);
    [logL,info] = weiblike([0.4746 0.7832],r)
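The light-bulb percentiles above come straight from the inverse-cdf formula. The sketch below evaluates that formula in base MATLAB and checks it by a round trip through the matching cdf, 1 - exp(-a*x^b). Both helper names are hypothetical, and the cdf form is the one implied by the inverse given in the text.

```matlab
% Weibull inverse cdf from the formula in the text, and the matching cdf.
% Both helpers are hypothetical names, not toolbox functions.
weibinv_sketch = @(p, a, b) ((1 ./ a) .* log(1 ./ (1 - p))).^(1 ./ b);
weibcdf_sketch = @(x, a, b) 1 - exp(-a .* x.^b);

a = 0.15;
b = 0.24;

life50 = weibinv_sketch(0.5, a, b);    % median lifetime in hours
life90 = weibinv_sketch(0.9, a, b);    % 90th percentile

% Round trip: the cdf at each percentile should return the probability.
back = weibcdf_sketch([life50 life90], a, b);

disp([life50 life90])
disp(back)
```

The small shape parameter b = 0.24 gives a very long right tail, which is why the 90th percentile is more than a hundred times the median.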
66. calculates the harmonic mean of a sample. For vectors, harmmean(x) is the harmonic mean of the elements in x. For matrices, harmmean(X) is a row vector containing the harmonic means of each column.

The harmonic mean is

$m = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$

Examples
The sample average is greater than or equal to the harmonic mean.

    x = exprnd(1,10,6);
    harmonic = harmmean(x)
    harmonic =
        0.3382  0.3200  0.3710  0.0540  0.4936  0.0907

    average = mean(x)
    average =
        1.3509  1.1583  0.9741  0.5319  1.0088  0.8122

See Also
mean, median, geomean, trimmean

hist

Purpose
Plot histograms.

Syntax
    hist(y)
    hist(y,nb)
    hist(y,x)
    [n,x] = hist(y,...)

Description
hist calculates or plots histograms.

hist(y) draws a 10-bin histogram for the data in vector y. The bins are equally spaced between the minimum and maximum values in y.

hist(y,nb) draws a histogram with nb bins.

hist(y,x) draws a histogram using the bins in the vector x.

[n,x] = hist(y), [n,x] = hist(y,nb), and [n,x] = hist(y,x) do not draw graphs, but return vectors n and x containing the frequency counts and the bin locations such that bar(x,n) plots the histogram. This is useful in situations where more control is needed over the appearance of a graph, for example, to combine a histogram into a more elaborate plot statement.

Examples
Generate bell-curve histograms from Gaussian data.

    x = -2.9:0.1:2.9;
    y = normrnd(0,1,1000,1);
    hist(y,x)

[Figure: histogram of 1000 standard normal random numbers.]
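The harmonic mean formula and the inequality stated in the harmmean example can be checked with a small deterministic matrix. The data below are made up for illustration, and the computation uses base MATLAB rather than harmmean itself.

```matlab
% Column-wise harmonic mean, n / sum(1/x_i), compared with the average.
x = [1 2  4
     2 2  8
     4 2 16];                         % hypothetical positive data

n = size(x, 1);
harmonic = n ./ sum(1 ./ x, 1);       % one harmonic mean per column
average = mean(x, 1);                 % arithmetic mean per column

disp(harmonic)
disp(average)
```

The second column is constant, so its harmonic mean and average coincide; the other columns show the strict inequality.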
67. centered.

Reference
Montgomery, Douglas, Introduction to Statistical Quality Control, John Wiley & Sons, 1991, pp. 369-374.

See Also
capaplot, histfit

capaplot

Purpose
Process capability plot.

Syntax
    p = capaplot(data,specs)
    [p,h] = capaplot(data,specs)

Description
capaplot(data,specs) fits the observations in the vector data, assuming a normal distribution with unknown mean and variance, and plots the distribution of a new observation (T distribution). The part of the distribution between the lower and upper bounds contained in the two-element vector specs is shaded in the plot.

[p,h] = capaplot(data,specs) returns the probability of the new observation being within specification in p and handles to the plot elements in h.

Example
Imagine a machined part with specifications requiring a dimension to be within 3 thousandths of an inch of nominal. Suppose that the machining process cuts too thick by one thousandth of an inch on average and also has a standard deviation of one thousandth of an inch.

    data = normrnd(1,1,30,1);
    p = capaplot(data,[-3 3])
    p =
        0.9784

The probability of a new observation being within specs is 97.84%.

[Figure: fitted distribution with the region between the specification limits shaded; title: Probability Between Limits is 0.9784.]

See Also
capable, histfit

caseread

Purpose
Read casenames from a file.

Syntax
    names = caseread(filename)
    names = caseread

Description
names = caseread(filenam
68. constant matrix of the same size as the other inputs.

The lognormal cdf is

$p = F(x|\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_0^x \frac{e^{-(\ln(t)-\mu)^2/(2\sigma^2)}}{t}\,dt$

Example
    x = (0:0.2:10);
    y = logncdf(x,0,1);
    plot(x,y); grid;
    xlabel('x'); ylabel('p')

[Figure: lognormal cdf with µ = 0 and σ = 1; x-axis: x; y-axis: p.]

Reference
Evans, Merran, Hastings, Nicholas and Peacock, Brian, Statistical Distributions, Second Edition, Wiley, 1993, pp. 102-105.

See Also
cdf, logninv, lognpdf, lognrnd, lognstat

logninv

Purpose
Inverse of the lognormal cumulative distribution function (cdf).

Syntax
    X = logninv(P,MU,SIGMA)

Description
X = logninv(P,MU,SIGMA) computes the inverse lognormal cdf with mean MU and standard deviation SIGMA, at the probabilities in P.

The size of X is the common size of P, MU, and SIGMA.

We define the lognormal inverse function in terms of the lognormal cdf.

$x = F^{-1}(p|\mu,\sigma) = \{x : F(x|\mu,\sigma) = p\}$

where

$p = F(x|\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_0^x \frac{e^{-(\ln(t)-\mu)^2/(2\sigma^2)}}{t}\,dt$

Example
    p = (0.005:0.01:0.995);
    crit = logninv(p,1,0.5);
    plot(p,crit)
    xlabel('Probability'); ylabel('Critical Value'); grid

[Figure: lognormal inverse cdf; x-axis: Probability; y-axis: Critical Value.]

Reference
Evans, Merran, Hastings, Nicholas and Peacock, Brian, Statistical Distributions, Second Edition, Wiley, 1993, pp. 102-105.

See Also
icdf, logncdf, lognpdf, lognrnd, lognstat

lognpdf

Purpose
Lognormal probability de
69. distribution.

    x = -3:0.1:3;
    f = normpdf(x,0,1);

The variable f contains the density of the normal pdf with parameters 0 and 1 at the values in x. The first input argument of every pdf is the set of values for which you want to evaluate the density. Other arguments contain as many parameters as are necessary to define the distribution uniquely. The normal distribution requires two parameters: a location parameter (the mean, µ) and a scale parameter (the standard deviation, σ).

Cumulative Distribution Function (cdf)

If f is a probability density function, the associated cumulative distribution function F is

$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$

The cdf of a value x, F(x), is the probability of observing any outcome less than or equal to x.

A cdf has two theoretical properties:
• The cdf ranges from 0 to 1.
• If y > x, then the cdf of y is greater than or equal to the cdf of x.

The cdf function call has the same general format for every distribution in the Statistics Toolbox. The following commands illustrate how to call the cdf for the normal distribution.

    x = -3:0.1:3;
    p = normcdf(x,0,1);

The variable p contains the probabilities associated with the normal cdf with parameters 0 and 1 at the values in x. The first input argument of every cdf is the set of values for which you want to evaluate the probability. Other arguments contain as many parameters as are necessary to define the distribution uniquely.

Inverse Cumulative
70. distribution models the total number of successes in repeated trials from an infinite population under the following conditions:
• Only two outcomes are possible on each of n trials.
• The probability of success for each trial is constant.
• All trials are independent of each other.

James Bernoulli derived the binomial distribution in 1713 (Ars Conjectandi). Earlier, Blaise Pascal had considered the special case where p = 1/2.

Mathematical Definition. The binomial pdf is

$y = f(x|n,p) = \binom{n}{x} p^x q^{(n-x)} I_{(0,1,\ldots,n)}(x)$

where $\binom{n}{x} = \frac{n!}{x!(n-x)!}$ and $q = 1 - p$.

The binomial distribution is discrete. The pdf is nonzero only for the integers from zero through n.

Parameter Estimation. Suppose you are collecting data from a widget manufacturing process, and you record the number of widgets within specification in each batch of 100. You might be interested in the probability that an individual widget is within specification. Parameter estimation is the process of determining the parameter, p, of the binomial distribution that fits this data best in some sense.

One popular criterion of goodness is to maximize the likelihood function. The likelihood has the same form as the binomial pdf above. But for the pdf, the parameters (n and p) are known constants and the variable is x. The likelihood function reverses the roles of the variables. Here, the sample values (the xs) are already observed, so they are the fixed constants. The variables are the unknown parameters.
71. fixed parameters.

F(x|a,b): Cumulative distribution function.

I[a,b](x): Indicator function. In this example the function takes the value 1 on the closed interval from a to b and is 0 elsewhere.

p and q: p is the probability of some event. q is the probability of ~p, so q = 1 - p.

Typographical Conventions

To Indicate: Example code
This Guide Uses: Monospace type
Example: To assign the value 5 to A, enter A = 5

To Indicate: MATLAB output
This Guide Uses: Monospace type
Example: MATLAB responds with A = 5

To Indicate: MATLAB strings
This Guide Uses: Quoted italic Monospace type
Example: 'model'

To Indicate: Function names
This Guide Uses: Monospace type
Example: The cos function finds the cosine of each array element.

To Indicate: Mathematical expressions
This Guide Uses: Variables in italics. Functions, operators, and constants in standard type.
Example: This vector represents the polynomial p = x² + 2x + 3

Tutorial

Probability Distributions
Descriptive Statistics
Linear Models
Nonlinear Regression Models
Hypothesis Tests
Multivariate Statistics
Statistical Plots
Statistical Process Control (SPC)
Design of Experiments (DOE)
Demos
References

The Statistics Toolbox, for use with MATLAB, supplies basic statistics capability on the level of a first course in engineering or scientific statistics. The statistics functions it provides
72. gamlike, mle, weiblike

betapdf

Purpose
Beta probability density function (pdf).

Syntax
    Y = betapdf(X,A,B)

Description
betapdf(X,A,B) computes the beta pdf with parameters A and B at the values in X. The arguments X, A, and B must all be the same size, except that scalar arguments function as constant matrices of the common size of the other arguments.

The parameters A and B must both be positive and X must lie on the interval [0 1].

The probability density function for the beta distribution is

$y = f(x|a,b) = \frac{1}{B(a,b)}\, x^{a-1} (1-x)^{b-1} I_{(0,1)}(x)$

A likelihood function is the pdf viewed as a function of the parameters. Maximum likelihood estimators (MLEs) are the values of the parameters that maximize the likelihood function for a fixed value of x.

The uniform distribution on [0 1] is a degenerate case of the beta where a = 1 and b = 1.

Examples
    a = [0.5 1; 2 4]
    a =
        0.5000  1.0000
        2.0000  4.0000

    y = betapdf(0.5,a,a)
    y =
        0.6366  1.0000
        1.5000  2.1875

betarnd

Purpose
Random numbers from the beta distribution.

Syntax
    R = betarnd(A,B)
    R = betarnd(A,B,m)
    R = betarnd(A,B,m,n)

Description
R = betarnd(A,B) generates beta random numbers with parameters A and B. The size of R is the common size of A and B if both are matrices. If either parameter is a scalar, the size of R is the size of the other.

R = betarnd(A,B,m) generates beta random numbers with parameters
73. > µy. tail = -1 specifies the alternative µx < µy.

Examples
This example generates 100 normal random numbers with theoretical mean zero and standard deviation one. We then generate 100 more normal random numbers with theoretical mean one half and standard deviation one. The observed means and standard deviations are different from their theoretical values, of course. We test the hypothesis that there is no true difference between the two means. Notice that the true difference is only one half of the standard deviation of the individual observations, so we are trying to detect a signal that is only one half the size of the inherent noise in the process.

    x = normrnd(0,1,100,1);
    y = normrnd(0.5,1,100,1);
    [h,significance,ci] = ttest2(x,y)
    h =
        1
    significance =
        0.0017
    ci =
        -0.7352  -0.1720

The result, h = 1, means that we can reject the null hypothesis. The significance is 0.0017, which means that by chance we would have observed values of t more extreme than the one in this example in only 17 of 10,000 similar experiments! A 95% confidence interval on the mean is [-0.7352 -0.1720], which includes the theoretical (and hypothesized) difference of -0.5.

unidcdf

Purpose
Discrete uniform cumulative distribution (cdf) function.

Syntax
    P = unidcdf(X,N)

Description
unidcdf(X,N) computes the discrete uniform cdf with parameter settings N at the values in X. The arguments X and N must
74. hygernd(M,K,N,mm,nn)

Description
R = hygernd(M,K,N) generates hypergeometric random numbers with parameters M, K, and N. The size of R is the common size of M, K, and N if all are matrices. If any parameter is a scalar, the size of R is the common size of the nonscalar parameters.

R = hygernd(M,K,N,mm) generates hypergeometric random numbers with parameters M, K, and N. mm is a 1-by-2 vector that contains the row and column dimensions of R.

R = hygernd(M,K,N,mm,nn) generates hypergeometric random numbers with parameters M, K, and N. The scalars mm and nn are the row and column dimensions of R.

Examples
    numbers = hygernd(1000,40,50)
    numbers =
        1

hygestat

Purpose
Mean and variance for the hypergeometric distribution.

Syntax
    [MN,V] = hygestat(M,K,N)

Description
For the hypergeometric distribution:
• The mean is $N\frac{K}{M}$.
• The variance is $N\,\frac{K}{M}\,\frac{M-K}{M}\,\frac{M-N}{M-1}$.

Examples
The hypergeometric distribution approaches the binomial, where p = K/M, as M goes to infinity.

    [m,v] = hygestat(10.^(1:4),10.^(0:3),9)
    m =
        0.9000  0.9000  0.9000  0.9000
    v =
        0.0900  0.7445  0.8035  0.8094

    [m,v] = binostat(9,0.1)
    m =
        0.9000
    v =
        0.8100

icdf

Purpose
Inverse of a specified cumulative distribution function (icdf).

Syntax
    X = icdf('name',P,A1,A2,A3)

Description
icdf is a utility routine allowing you to access all the inverse cdfs in the Statistics Toolbox using the name of the distribution as a parameter.

icdf('na
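The hygestat example above can be reproduced from the mean and variance formulas alone. The sketch below is base MATLAB and evaluates those formulas elementwise, then compares the variance with the binomial limit n*p*(1-p); it is not the toolbox code.

```matlab
% Hypergeometric mean and variance: population M, K marked items, N draws.
M = 10.^(1:4);
K = 10.^(0:3);
N = 9;

m = N .* K ./ M;
v = N .* (K ./ M) .* ((M - K) ./ M) .* ((M - N) ./ (M - 1));

% Binomial limit with p = K/M = 0.1 and n = N = 9.
p = 0.1;
v_binomial = N * p * (1 - p);

disp(m)
disp(v)
disp(v_binomial)
```

The factor (M-N)/(M-1) is the finite-population correction; it tends to one as M grows, so v climbs toward the binomial variance.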
75. in a multiple regression do not have an important explanatory effect on the response. If this assumption is true, then it is a convenient simplification to keep only the statistically significant terms in the model.

One common problem in multiple regression analysis is multicollinearity of the input variables. The input variables may be as correlated with each other as they are with the response. If this is the case, the presence of one input variable in the model may mask the effect of another input. Stepwise regression used as a canned procedure is a dangerous tool, because the resulting model may include different variables depending on the choice of starting model and inclusion strategy.

The Statistics Toolbox uses an interactive graphical user interface (GUI) to provide a more understandable comparison of competing models. You can explore the GUI using the Hald (1960) data set. Here are the commands to get started.

    load hald
    stepwise(ingredients,heat)

The Hald data come from a study of the heat of reaction of various cement mixtures. There are 4 components in each mixture, and the amount of heat produced depends on the amount of each ingredient in the mixture.

Stepwise Regression Interactive GUI

The interface consists of three interactively linked figure windows:
• The Stepwise Regression Plot
• The Stepwise Regression Diagnostics Table
• The Stepwise History Plot

All three windows have hot regions. When your
76. likelihood has the same form as the gamma pdf above. But for the pdf, the parameters are known constants and the variable is x. The likelihood function reverses the roles of the variables. Here, the sample values (the xs) are already observed, so they are the fixed constants. The variables are the unknown parameters. Maximum likelihood estimation (MLE) involves calculating the values of the parameters that give the highest likelihood given the particular set of data.

The function gamfit returns the MLEs and confidence intervals for the parameters of the gamma distribution. Here is an example using random numbers from the gamma distribution with a = 10 and b = 5.

    lifetimes = gamrnd(10,5,100,1);
    [phat,pci] = gamfit(lifetimes)
    phat =
        10.9821  4.7258
    pci =
         7.4001  3.1543
        14.5640  6.2974

Note phat(1) = â and phat(2) = b̂. The MLE for the parameter a is 10.98, compared to the true value of 10. The 95% confidence interval for a goes from 7.4 to 14.6, which includes the true value.

Similarly, the MLE for the parameter b is 4.7, compared to the true value of 5. The 95% confidence interval for b goes from 3.2 to 6.3, which also includes the true value.

In our life tests we do not know the true value of a and b, so it is nice to have a confidence interval on the parameters to give a range of likely values.

Example and Plot. In the example, the gamma pdf is plotted with the solid line. The normal pdf has a dashed line type.

    x = gaminv((0.005:0.01:0.995),10
77. linear model.

One-way Analysis of Variance (ANOVA)

The purpose of a one-way ANOVA is to find out whether data from several groups have a common mean. That is, to determine whether the groups are actually different in the measured characteristic. One-way ANOVA is a simple special case of the linear model. The one-way ANOVA form of the model is

$y_{ij} = \alpha_{.j} + \varepsilon_{ij}$

where

• $y_{ij}$ is a matrix of observations.
• $\alpha_{.j}$ is a matrix whose columns are the group means. (The "dot j" notation means that $\alpha$ applies to all rows of the jth column.)
• $\varepsilon_{ij}$ is a matrix of random disturbances.

The model posits that the columns of y are a constant plus a random disturbance. You want to know if the constants are all the same.

The data below come from a study of bacteria counts in shipments of milk (Hogg and Ledolter 1987). The columns of the matrix hogg represent different shipments. The rows are bacteria counts from cartons of milk chosen randomly from each shipment. Do some shipments have higher counts than others?

    load hogg
    p = anova1(hogg)

    p =
       1.1971e-04

    hogg

    hogg =
        24    14    11     7    19
        15     7     9     7    24
        21    12     7     4    19
        27    17    13     7    15
        33    14    12    12    10
        23    16    18    18    20

The standard ANOVA table has columns for the sums of squares, degrees of freedom, mean squares (SS/df), and F statistic.

    ANOVA Table
    Source       SS       df    MS       F
    Columns      803       4    200.7    9.008
    Error        557.2    25    22.29
    Total        1360     29

You can use the F statistic to do a hypothesis test to find o
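The entries of the ANOVA table in this entry can be recomputed by hand, which shows where each column comes from. The sketch below uses only base MATLAB and does not call anova1. It takes the hogg matrix from the example, reading the one entry that extraction garbled in row 3 as 7, which is the value consistent with the printed sums of squares. The variable names are illustrative.

```matlab
% Bacteria counts: rows are cartons, columns are shipments.
hogg = [24 14 11  7 19
        15  7  9  7 24
        21 12  7  4 19
        27 17 13  7 15
        33 14 12 12 10
        23 16 18 18 20];
[n, k] = size(hogg);

grandMean = mean(hogg(:));
colMeans  = mean(hogg);                                  % 1-by-k group means

% Sums of squares: between columns, within columns (error), and total.
ssColumns = n * sum((colMeans - grandMean).^2);
ssError   = sum(sum((hogg - repmat(colMeans, n, 1)).^2));
ssTotal   = sum((hogg(:) - grandMean).^2);

% Degrees of freedom, mean squares (SS/df), and the F statistic.
dfColumns = k - 1;
dfError   = k*(n - 1);
msColumns = ssColumns/dfColumns;
msError   = ssError/dfError;
F = msColumns/msError;
```

The between-column and error sums of squares add up to the total, which is why the table lists the total row without its own mean square.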
78. missing values. For vectors, nanstd(x) is the standard deviation of the non-NaN elements of x. For matrices, nanstd(X) is a row vector containing the standard deviations of the non-NaN elements in each column of X.

Example
    m = magic(3);
    m([1 6 8]) = [NaN NaN NaN]

    m =
       NaN     1     6
         3     5   NaN
         4   NaN     2

    nstd = nanstd(m)

    nstd =
        0.7071    2.8284    2.8284

See Also
nanmax, nanmin, nanmean, nanmedian, nansum

nansum

Purpose
Sum ignoring NaNs.

Syntax
    y = nansum(X)

Description
nansum(X) is the sum, treating NaNs as missing values. For vectors, nansum(x) is the sum of the non-NaN elements of x. For matrices, nansum(X) is a row vector containing the sum of the non-NaN elements in each column of X.

Example
    m = magic(3);
    m([1 6 8]) = [NaN NaN NaN]

    m =
       NaN     1     6
         3     5   NaN
         4   NaN     2

    nsum = nansum(m)

    nsum =

See Also
nanmax, nanmin, nanmean, nanmedian, nanstd

nbincdf

Purpose
Negative binomial cumulative distribution function.

Syntax
    Y = nbincdf(X,R,P)

Description
Y = nbincdf(X,R,P) returns the negative binomial cumulative distribution function with parameters R and P at the values in X. The size of Y is the common size of the input arguments. A scalar input functions as a constant matrix of the same size as the other inputs.

The negative binomial cdf is

$y = F(x|r,p) = \sum_{i=0}^{x} \binom{r+i-1}{i} p^r q^i I_{(0,1,\ldots)}(i)$

The motivation for the negative binomial is performing successive trials, each having a cons
79. models. Even though the true model is nonlinear, you may find that the polynomial model provides a good fit. Because polynomial models are much easier to fit and work with than nonlinear models, a polynomial model is often preferable even when modeling a nonlinear process. Keep in mind, however, that such models are unlikely to be reliable for extrapolating outside the range of the data.

References

Atkinson, A. C., and A. N. Donev, Optimum Experimental Designs, Oxford Science Publications, 1992.
Bates, Douglas, and Donald Watts, Nonlinear Regression Analysis and Its Applications, John Wiley and Sons, 1988, pp. 271-272.
Bernoulli, J., Ars Conjectandi, Basiliea: Thurnisius [11.19].
Chatterjee, S., and A. S. Hadi, Influential Observations, High Leverage Points, and Outliers in Linear Regression, Statistical Science, 1986, pp. 379-416.
Efron, Bradley, and Robert J. Tibshirani, An Introduction to the Bootstrap, Chapman and Hall, New York, 1993.
Evans, M., N. Hastings, and B. Peacock, Statistical Distributions, Second Edition, John Wiley and Sons, 1993.
Hald, A., Statistical Theory with Engineering Applications, John Wiley and Sons, 1960, p. 647.
Hogg, R. V., and J. Ledolter, Engineering Statistics, MacMillan Publishing Company, 1987.
Johnson, N., and S. Kotz, Distributions in Statistics: Continuous Univariate Distributions, John Wiley and Sons, 1970.
Moore, J., Total Biochemical Oxygen Demand of Dairy Manures, Ph.D. thesis
80. mouse is above one of these regions, the pointer changes from an arrow to a circle. Clicking on this point initiates some activity in the interface.

Stepwise Regression Plot

This plot shows the regression coefficient and confidence interval for every term (in or out of the model). The green lines represent terms in the model, while red lines indicate that the term is not currently in the model.

Statistically significant terms are solid lines. Dotted lines show that the fitted coefficient is not significantly different from zero.

Clicking on a line in this plot toggles its state. That is, a term in the model (green line) gets removed (turns red), and terms out of the model (red line) enter the model (turn green).

The coefficient for a term out of the model is the coefficient resulting from adding that term to the current model.

Scale Inputs. Pressing this button centers and normalizes the columns of the input matrix to have a standard deviation of one.

Export. This pop-up menu allows you to export variables from the stepwise function to the base workspace.

Close. The Close button removes all the figure windows.

Stepwise Regression Diagnostics Figure

This table is a quantitative view of the information in the Stepwise Regression Plot. The table shows the Hald model with the second and third terms removed.

                          Confidence Intervals
    Column    Parameter    Lower      Upper
    1         1.44         1.02       1.86
    2         0.4161      -0.1602     0.9924
    3        -0.41        -1.029      0.2086
    4        -0.614
81. noncentral F where $\delta = 0$. As $\delta$ increases, the distribution flattens like the plot in the example.

Example
Compare the noncentral F pdf with $\delta = 10$ to the F pdf with the same number of numerator and denominator degrees of freedom (5 and 20, respectively).

    x = (0.01:0.1:10.01)';
    p1 = ncfpdf(x,5,20,10);
    p = fpdf(x,5,20);
    plot(x,p,x,p1)

[Figure: the F pdf and the noncentral F pdf, plotted for x from 0 to 12.]

References
Johnson, Norman, and Samuel Kotz, Distributions in Statistics: Continuous Univariate Distributions-2, Wiley, 1970, pp. 189-200.

See Also
ncfcdf, ncfinv, ncfrnd, ncfstat

ncfrnd

Purpose
Random matrices from the noncentral F distribution.

Syntax
    R = ncfrnd(NU1,NU2,DELTA)
    R = ncfrnd(NU1,NU2,DELTA,m)
    R = ncfrnd(NU1,NU2,DELTA,m,n)

Description
R = ncfrnd(NU1,NU2,DELTA) returns a matrix of random numbers chosen from the noncentral F distribution with parameters NU1, NU2, and DELTA. The size of R is the common size of NU1, NU2, and DELTA if all are matrices. If any parameter is a scalar, the size of R is the size of the other parameters.

R = ncfrnd(NU1,NU2,DELTA,m) returns a matrix of random numbers with parameters NU1, NU2, and DELTA. m is a 1-by-2 vector that contains the row and column dimensions of R.

R = ncfrnd(NU1,NU2,DELTA,m,n) generates random numbers with parameters NU1, NU2, and DELTA. The scalars m and n are the row and colu
82. of (X,y) data using a linear additive model.

rstool(x,y,'model') allows control over the initial regression model. 'model' can be one of the following strings:

• 'interaction' includes constant, linear, and cross product terms.
• 'quadratic' includes interactions plus squared terms.
• 'purequadratic' includes constant, linear, and squared terms.

rstool(x,y,'model',alpha) plots a 100(1 - alpha)% global confidence interval for predictions as two red curves. For example, alpha = 0.01 gives 99% confidence intervals.

rstool displays a "vector" of plots, one for each column of the matrix of inputs, x. The response variable, y, is a column vector that matches the number of rows in x.

rstool(x,y,'model',alpha,'xname','yname') labels the graph using the string matrix 'xname' for the labels to the x-axes and the string 'yname' to label the y-axis common to all the plots.

Drag the dotted white reference line and watch the predicted values update simultaneously. Alternatively, you can get a specific prediction by typing the value of x into an editable text field. Use the pop-up menu labeled Model to interactively change the model. Use the pop-up menu labeled Export to move specified variables to the base workspace.

Example
See "Quadratic Response Surface Models" on page 1-59.

See Also
nlintool

schart

Purpose
Chart of standard deviation for Statistical Process Control.

Syntax
    schart(DATA,conf)
    schart(DATA,conf
83. of $\beta$ in b, a 95% confidence interval for $\beta$ in the p-by-2 vector bint. The residuals are in r, and a 95% confidence interval for each residual is in the n-by-2 vector rint. The vector stats contains the $R^2$ statistic along with the F and p values for the regression.

[b,bint,r,rint,stats] = regress(y,X,alpha) gives 100(1 - alpha)% confidence intervals for bint and rint. For example, alpha = 0.2 gives 80% confidence intervals.

Example
Suppose the true model is

$y = 10 + x + \varepsilon, \qquad \varepsilon \sim N(0, 0.01I)$

where I is the identity matrix.

    X = [ones(10,1) (1:10)']

    X =
         1     1
         1     2
         1     3
         1     4
         1     5
         1     6
         1     7
         1     8
         1     9
         1    10

    y = X * [10;1] + normrnd(0,0.1,10,1)

    y =
       11.1165
       12.0627
       13.0075
       14.0352
       14.9303
       16.1696
       17.0059
       18.1797
       19.0264
       20.0872

    [b,bint] = regress(y,X,0.05)

    b =
       10.0456
        1.0030

    bint =
        9.9165
        0.9822

Compare b to [10 1]'. Note that bint includes the true model values.

Reference
Chatterjee, S., and A. S. Hadi, Influential Observations, High Leverage Points, and Outliers in Linear Regression, Statistical Science, 1986, pp. 379-416.

regstats

Purpose
Regression diagnostics graphical user interface.

Syntax
    regstats(responses,DATA)
    regstats(responses,DATA,'model')

Description
regstats(responses,DATA) generates regression diagnostics for a linear additive model with a constant term. The dependent variable is the vector responses. Values of the independent variables are in the matri
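The coefficient estimates printed in the regress example of this entry can be reproduced with an ordinary least-squares solve in base MATLAB. This is a sketch of the estimate only: it does not compute bint, rint, or stats, and it makes no claim about how regress is implemented internally. The y values are copied from the printed output rather than regenerated, since normrnd draws are random.

```matlab
% Design matrix: a column of ones (constant term) and x = 1,...,10.
X = [ones(10,1) (1:10)'];

% Responses copied from the printed example output.
y = [11.1165 12.0627 13.0075 14.0352 14.9303 ...
     16.1696 17.0059 18.1797 19.0264 20.0872]';

b = X \ y;        % least-squares solution of X*b = y
r = y - X*b;      % residuals; orthogonal to the columns of X
```

The backslash solve returns an intercept near 10.0456 and a slope near 1.0030, matching the printed b to the displayed precision.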
84. of each row (point) in the matrix DATA for a linear additive regression model.

h = leverage(DATA,'model') finds the leverage on a regression, using a specified model type. 'model' can be one of these strings:

• 'interaction' includes constant, linear, and cross product terms.
• 'quadratic' includes interactions plus squared terms.
• 'purequadratic' includes constant, linear, and squared terms.

Leverage is a measure of the influence of a given observation on a regression due to its location in the space of the inputs.

Example
One rule of thumb is to compare the leverage to 2p/n, where n is the number of observations and p is the number of parameters in the model. For the Hald dataset this value is 0.7692.

    load hald
    h = max(leverage(ingredients,'linear'))

    h =
        0.7004

Since 0.7004 < 0.7692, there are no high leverage points using this rule.

Algorithm
    [Q,R] = qr(x2fx(DATA,'model'));
    leverage = (sum(Q'.*Q'))'

Reference
Goodall, C. R. (1993). Computation using the QR decomposition. Handbook in Statistics, Volume 9. Statistical Computing (C. R. Rao, ed.). Amsterdam, NL: Elsevier/North-Holland.

See Also
regstats

logncdf

Purpose
Lognormal cumulative distribution function.

Syntax
    P = logncdf(X,MU,SIGMA)

Description
P = logncdf(X,MU,SIGMA) computes the lognormal cdf with mean MU and standard deviation SIGMA at the values in X.

The size of P is the common size of X, MU, and SIGMA. A scalar input functions as a
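The leverage computation and the 2p/n rule of thumb from the leverage entry can be sketched in base MATLAB without the toolbox. The input matrix below is hypothetical (it is not the Hald data), the design matrix is built by hand for a linear additive model instead of calling x2fx, and the economy-size QR factorization is an assumption made here so that Q has one column per model parameter.

```matlab
% Hypothetical inputs: 6 observations of 2 factors (illustration only).
X = [1 2; 2 1; 3 4; 4 3; 5 6; 10 1];
n = size(X,1);

% Design matrix for a linear additive model with a constant term.
D = [ones(n,1) X];
p = size(D,2);                 % number of parameters in the model

% Economy-size QR: Q is n-by-p with orthonormal columns.
[Q,R] = qr(D,0);

% Leverage is the diagonal of the hat matrix Q*Q',
% i.e. the sum of squares of each row of Q.
h = sum(Q.^2, 2);

% Rule of thumb: flag observations whose leverage exceeds 2p/n.
cutoff = 2*p/n;
highLeverage = find(h > cutoff);
```

The leverages always sum to p when the design matrix has full column rank, so 2p/n is twice the average leverage.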
85. out the price of gas at a small number of randomly chosen stations around the state and compare the average price to 1.15.

Of course, the average price you get will probably not be exactly 1.15 due to variability in price from one station to the next. Suppose your average price was 1.18. Is this three cent difference a result of chance variability, or is the original assertion incorrect? A hypothesis test can provide an answer.

Terminology

To get started, there are some terms to define and assumptions to make.

Terms
• Null hypothesis
• Alternative hypothesis
• Significance level
• p-value
• Confidence interval

The null hypothesis is the original assertion. In this case the null hypothesis is that the average price of a gallon of gas is 1.15. The notation is H0: µ = 1.15.

There are three possibilities for the alternative hypothesis. You might only be interested in the result if gas prices were actually higher. In this case, the alternative hypothesis is H1: µ > 1.15. The other possibilities are H1: µ < 1.15 and H1: µ ≠ 1.15.

The significance level is related to the degree of certainty you require in order to reject the null hypothesis in favor of the alternative. By taking a small sample you cannot be certain about your conclusion. So you decide in advance to reject the null hypothesis if the probability of observing your sampled result is less than the significance level. For a typical significance level of 5% the nota
86. parameter SIGMA must be positive.

The normal pdf is

$y = f(x|\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

The likelihood function is the pdf viewed as a function of the parameters. Maximum likelihood estimators (MLEs) are the values of the parameters that maximize the likelihood function for a fixed value of x.

The standard normal distribution has $\mu = 0$ and $\sigma = 1$.

If x is standard normal, then $x\sigma + \mu$ is also normal with mean $\mu$ and standard deviation $\sigma$. Conversely, if y is normal with mean $\mu$ and standard deviation $\sigma$, then $x = (y - \mu)/\sigma$ is standard normal.

Examples
    mu = [0:0.1:2];
    [y i] = max(normpdf(1.5,mu,1));
    MLE = mu(i)

    MLE =
        1.5000

normplot

Purpose
Normal probability plot for graphical normality testing.

Syntax
    normplot(X)
    h = normplot(X)

Description
normplot(X) displays a normal probability plot of the data in X. For matrix X, normplot displays a line for each column of X.

The plot has the sample data displayed with the plot symbol '+'. Superimposed on the plot is a line joining the first and third quartiles of each column of X (a robust linear fit of the sample order statistics). This line is extrapolated out to the ends of the sample to help evaluate the linearity of the data.

If the data does come from a normal distribution, the plot will appear linear. Other probability density functions will introduce curvature in the plot.

h = normplot(X) returns a handle to the plotted lines.

Examples
Generate a normal s
87.
    pc =
        0.0678   -0.6460    0.5673   -0.5062
        0.6785   -0.0200   -0.5440   -0.4933
       -0.0290    0.7553    0.4036   -0.5156
       -0.7309   -0.1085   -0.4684   -0.4844

    latent =
      517.7969
       67.4964
       12.4054
        0.2372

Reference
Jackson, J. Edward, A User's Guide to Principal Components, John Wiley & Sons, Inc., 1991, pp. 1-25.

See Also
barttest, pcacov, pcares

qqplot

Purpose
Quantile-quantile plot of two samples.

Syntax
    qqplot(X,Y)
    qqplot(X,Y,pvec)
    h = qqplot(...)

Description
qqplot(X,Y) displays a quantile-quantile plot of two samples. If the samples do come from the same distribution, the plot will be linear.

For matrix X and Y, qqplot displays a separate line for each pair of columns. The plotted quantiles are the quantiles of the smaller dataset.

The plot has the sample data displayed with the plot symbol '+'. Superimposed on the plot is a line joining the first and third quartiles of each distribution (this is a robust linear fit of the order statistics of the two samples). This line is extrapolated out to the ends of the sample to help evaluate the linearity of the data.

Use qqplot(X,Y,pvec) to specify the quantiles in the vector pvec.

h = qqplot(X,Y,pvec) returns handles to the lines in h.

Examples
Generate two normal samples with different means and standard deviations. Then make a quantile-quantile plot of the two samples.

    x = normrnd(0,1,100,1);
    y = normrnd(0.5,2,50,1);
    qqplot(x,y);

[Figure: quantile-quantile plot of the two samples.]
88. practice since all the parts are well within specification.

S Charts

The S chart is a plot of the standard deviation of a process taken at regular intervals. The standard deviation is a measure of the variability of a process. So, the plot indicates whether there is any systematic change in the process variability. Continuing with the piston manufacturing example, we can look at the standard deviation of each set of 4 measurements of runout.

    schart(runout)

[Figure: S chart of runout. Axes: Sample Number (0 to 40) against Standard Deviation, with UCL and LCL lines.]

The average runout is about one ten-thousandth of an inch. There is no indication of nonrandom variability.

EWMA Charts

The EWMA chart is another chart for monitoring the process average. It operates on slightly different assumptions than the Xbar chart. The mathematical model behind the Xbar chart posits that the process mean is actually constant over time and any variation in individual measurements is due entirely to chance.

The EWMA model is a little looser. Here we assume that the mean may be varying in time. Here is an EWMA chart of our runout example. Compare this with the plot on page 1-96.

    ewmaplot(runout,0.5,0.01,spec)

[Figure: Exponentially Weighted Moving Average (EWMA) Chart of runout, with USL and UCL lines.]
89. samples from the supported probability distributions. The M-file calls itself recursively using the action and flag parameters. For general use call randtool without parameters.

To output the current set of random numbers, press the Output button. The results are stored in the variable ans. Alternatively, the command r = randtool('output') places the sample of random numbers in the vector r.

To sample repetitively from the same distribution, press the Resample button.

To change the distribution function, choose from the pop-up menu of functions at the top of the figure.

To change the parameter settings, move the sliders or type a value in the edit box under the name of the parameter. To change the limits of a parameter, type a value in the edit box at the top or bottom of the parameter slider.

To change the sample size, type a number in the Sample Size edit box.

When you are done, press the Close button.

For an extensive discussion, see "The disttool Demo" on page 1-109.

See Also
disttool

range

Purpose
Sample range.

Syntax
    y = range(X)

Description
range(X) returns the difference between the maximum and the minimum of a sample. For vectors, range(x) is the range of the elements. For matrices, range(X) is a row vector containing the range of each column of X.

The range is an easily calculated estimate of the spread of a sample. Outliers have an undue influence on this statistic, which makes it an u
90. the plotted lines.

The purpose of a Weibull probability plot is to graphically assess whether the data in X could come from a Weibull distribution. If the data are Weibull, the plot will be linear. Other distribution types may introduce curvature in the plot.

Example
    r = weibrnd(1.2,1.5,50,1);
    weibplot(r)

[Figure: Weibull Probability Plot. Axes: Data (logarithmic scale) against Probability.]

See Also
normplot

weibrnd

Purpose
Random numbers from the Weibull distribution.

Syntax
    R = weibrnd(A,B)
    R = weibrnd(A,B,m)
    R = weibrnd(A,B,m,n)

Description
R = weibrnd(A,B) generates Weibull random numbers with parameters A and B. The size of R is the common size of A and B if both are matrices. If either parameter is a scalar, the size of R is the size of the other parameter.

R = weibrnd(A,B,m) generates Weibull random numbers with parameters A and B. m is a 1-by-2 vector that contains the row and column dimensions of R.

R = weibrnd(A,B,m,n) generates Weibull random numbers with parameters A and B. The scalars m and n are the row and column dimensions of R.

Devroye refers to the Weibull distribution with a single parameter; this is weibrnd with A = 1.

Examples
    n1 = weibrnd(0.5:0.5:2,0.5:0.5:2)

    n1 =
        0.0093    1.5189    0.8308    0.7541

    n2 = weibrnd(1/2,1/2,[1 6])

    n2 =
       29.7822    0.9359    2.1477   12.6402    0.005
91. this makes var(x) the minimum variance unbiased estimator (MVUE) of $\sigma^2$ (the second parameter).

var(x,1) normalizes by n and yields the second moment of the sample data about its mean (moment of inertia).

var(X,w) computes the variance using the vector of weights, w. The number of elements in w must equal the number of rows in the matrix X. For vector x, w and x must match in length. Each element of w must be positive.

var supports both common definitions of variance. Let SS be the sum of the squared deviations of the elements of a vector x from their mean. Then var(x) = SS/(n-1), the MVUE, and var(x,1) = SS/n, the maximum likelihood estimator (MLE) of $\sigma^2$.

Examples
    x = [-1 1];
    w = [1 3];
    v1 = var(x)

    v1 =
         2

    v2 = var(x,1)

    v2 =
         1

    v3 = var(x,w)

    v3 =
        0.7500

See Also
cov, std

weibcdf

Purpose
Weibull cumulative distribution function (cdf).

Syntax
    P = weibcdf(X,A,B)

Description
weibcdf(X,A,B) computes the Weibull cdf with parameters A and B at the values in X. The arguments X, A, and B must all be the same size, except that scalar arguments function as constant matrices of the common size of the other arguments.

Parameters A and B are positive.

The Weibull cdf is

$p = F(x|a,b) = \int_0^x a b t^{b-1} e^{-a t^b}\,dt = 1 - e^{-a x^b} I_{(0,\infty)}(x)$

Examples
What is the probability that a value from a Weibull distribution with parameters a = 0.15 and b = 0.24 is less than 500?

    probability = weib
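The three values printed in the var example of this entry can be recomputed from the definitions using only base arithmetic, which makes the difference between the normalizations explicit. The weighted formula below, which rescales w to sum to one and takes the weighted second moment about the weighted mean, is an assumption chosen because it reproduces the printed v3; the guide does not spell out the formula.

```matlab
x = [-1 1];
w = [1 3];
n = numel(x);

% Sum of squared deviations from the mean.
SS = sum((x - mean(x)).^2);

v1 = SS/(n - 1);      % normalizes by n-1 (the MVUE)
v2 = SS/n;            % normalizes by n (second moment about the mean)

% Weighted version (assumed form): weights rescaled to sum to one.
wn = w/sum(w);
mw = sum(wn.*x);                 % weighted mean
v3 = sum(wn.*(x - mw).^2);       % weighted second moment about mw
```

With these inputs the sketch gives v1 = 2, v2 = 1, and v3 = 0.75, the same values the example prints.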
92. time you click the Run button, the levels for the reactants and results of the run are entered in the Trial and Error Data window.

Based on the results of previous runs, you can change the levels of the reactants to increase the reaction rate. The results are determined using an underlying model that takes into account the noise in the process, so even if you keep all of the levels the same, the results will vary from run to run. You are allotted a budget of 13 runs. When you have completed the runs, you can use the Plot menu on the Trial and Error Data window to plot the relationships between the reactants and the reaction rate, or click the Analyze button. When you click Analyze, rsmdemo calls the rstool function, which you can then use to try to optimize the results.

Next, perform another set of 13 runs, this time from a designed experiment. In the Experimental Design Data window, click the Do Experiment button. rsmdemo calls the cordexch function to generate a D-optimal design, and then, for each run, computes the reaction rate.

Now use the Plot menu on the Experimental Design Data window to plot the relationships between the levels of the reactants and the reaction rate, or click the Response Surface button to call rstool to find the optimal levels of the reactants.

Compare the analysis results for the two sets of data. It is likely (though not certain) that you'll find some or all of these differences:

• You can fit a full quadratic
93. $\nu = 5$ (solid line) to the shorter tailed, standard normal distribution (dashed line).

    x = -5:0.1:5;
    y = tpdf(x,5);
    z = normpdf(x,0,1);
    plot(x,y,x,z)

[Figure: Student's t pdf and standard normal pdf, plotted for x from -5 to 5.]

Noncentral t Distribution

Background
The noncentral t distribution is a generalization of the familiar Student's t distribution.

If $\bar{x}$ and s are the mean and standard deviation of an independent random sample of size n from a normal distribution with mean $\mu$ and $\sigma^2 = n$, then

$t(\nu) = \frac{\bar{x} - \mu}{s}, \qquad \nu = n - 1$

Suppose that the mean of the normal distribution is not $\mu$. Then the ratio has the noncentral t distribution. The noncentrality parameter is the difference between the sample mean and $\mu$.

The noncentral t distribution allows us to determine the probability that we would detect a difference between $\bar{x}$ and $\mu$ in a t test. This probability is the power of the test. As $\bar{x} - \mu$ increases, the power of a test also increases.

Mathematical Definition
The most general representation of the noncentral t distribution is quite complicated. Johnson and Kotz (1970) give a formula for the probability that a noncentral t variate falls in the range [-t, t].

$\Pr\big((-t) < x < t \mid (\nu,\delta)\big) = \sum_{j=0}^{\infty} \frac{\left(\frac{1}{2}\delta^2\right)^j}{j!} e^{-\frac{\delta^2}{2}} I\!\left(\frac{t^2}{\nu + t^2} \,\Big|\, \frac{1}{2} + j, \frac{\nu}{2}\right)$

where $I(x|a,b)$ is the incomplete beta function with parameters a and b.

Example and Plot
    x = (-5:0.1:5)';
    p1 = nctcdf(x,10,1);
    p = tcdf(x,10);
    plot(x,p,x,p1)

[Figure: the t cdf and the noncentral t cdf, plotted for x from -5 to 5.]
94. where $I(x|a,b)$ is the incomplete beta function with parameters a and b.

Example and Plot
    x = (0.01:0.1:10.01)';
    p1 = ncfpdf(x,5,20,10);
    p = fpdf(x,5,20);
    plot(x,p,x,p1)

[Figure: the F pdf and the noncentral F pdf, plotted for x from 0 to 12.]

Gamma Distribution

Background
The gamma distribution is a family of curves based on two parameters. The chi-square and exponential distributions, which are children of the gamma distribution, are one-parameter distributions that fix one of the two gamma parameters.

The gamma distribution has the following relationship with the incomplete gamma function.

$\Gamma(x|a,b) = \mathrm{gammainc}\!\left(\frac{x}{b}, a\right)$

For b = 1 the functions are identical.

When a is large, the gamma distribution closely approximates a normal distribution, with the advantage that the gamma distribution has density only for positive real numbers.

Mathematical Definition
The gamma pdf is

$y = f(x|a,b) = \frac{1}{b^a \Gamma(a)} x^{a-1} e^{-\frac{x}{b}}$

Parameter Estimation
Suppose you are stress testing computer memory chips and collecting data on their lifetimes. You assume that these lifetimes follow a gamma distribution. You want to know how long you can expect the average computer memory chip to last. Parameter estimation is the process of determining the parameters of the gamma distribution that fit this data best in some sense.

One popular criterion of goodness is to maximize the likelihood function. The
95. which includes the theoretical (and hypothesized) mean of zero.

ttest2

Purpose
Hypothesis testing for the difference in means of two samples.

Syntax
    [h,significance,ci] = ttest2(x,y)
    [h,significance,ci] = ttest2(x,y,alpha)
    [h,significance,ci] = ttest2(x,y,alpha,tail)

Description
h = ttest2(x,y) performs a t-test to determine whether two samples from a normal distribution (in x and y) could have the same mean when the standard deviations are unknown but assumed equal.

h, the result, is 1 if you can reject the null hypothesis at the 0.05 significance level alpha and 0 otherwise.

significance is the p-value associated with the T statistic. significance is the probability that the observed value of T could be as large or larger by chance under the null hypothesis that the mean of x is equal to the mean of y.

ci is a 95% confidence interval for the true difference in means.

[h,significance,ci] = ttest2(x,y,alpha) gives control of the significance level, alpha. For example, if alpha = 0.01 and the result, h, is 1, you can reject the null hypothesis at the significance level 0.01. ci in this case is a 100(1 - alpha)% confidence interval for the true difference in means.

ttest2(x,y,alpha,tail) allows specification of one- or two-tailed tests. tail is a flag that specifies one of three alternative hypotheses:

• tail = 0 (default) specifies the alternative $\mu_x \ne \mu_y$.
• tail = 1 specifies the alternative
96. with the Z statistic

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

sig is the probability that the observed value of Z could be as large or larger by chance under the null hypothesis that the mean of x is equal to $\mu$.

ci is a (1 - alpha) confidence interval for the true mean.

Example
This example generates 100 normal random numbers with theoretical mean zero and standard deviation one. The observed mean and standard deviation are different from their theoretical values, of course. We test the hypothesis that there is no true difference.

    x = normrnd(0,1,100,1);
    m = mean(x)

    m =
        0.0727

    [h,sig,ci] = ztest(x,0,1)

    h =
         0

    sig =
        0.4669

    ci =
       -0.1232    0.2687

The result, h = 0, means that we cannot reject the null hypothesis. The significance level is 0.4669, which means that by chance we would have observed values of Z more extreme than the one in this example in 47 of 100 similar experiments. A 95% confidence interval on the mean is [-0.1232 0.2687], which includes the theoretical (and hypothesized) mean of zero.
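The numbers in the ztest example of this entry follow directly from the Z statistic, and can be checked in base MATLAB with erfc and erfinv in place of the toolbox's normal distribution functions. The sketch starts from the printed sample mean m = 0.0727 rather than from new random draws, so its results agree with the printed ones only up to the rounding of m. The variable names are illustrative.

```matlab
m = 0.0727;        % observed sample mean, as printed in the example
mu0 = 0;           % hypothesized mean
sigma = 1;         % known standard deviation
n = 100;           % sample size
alpha = 0.05;      % significance level

% Z statistic for the sample mean.
z = (m - mu0)/(sigma/sqrt(n));

% Two-sided p-value: 2*(1 - Phi(|z|)), written with erfc.
sig = erfc(abs(z)/sqrt(2));

% Critical value Phi^{-1}(1 - alpha/2), written with erfinv.
zcrit = sqrt(2)*erfinv(1 - alpha);

% Confidence interval for the true mean, and the test decision.
ci = m + [-1 1]*zcrit*sigma/sqrt(n);
h = double(sig < alpha);
```

The sketch gives a p-value of about 0.467 and an interval of about [-0.123 0.269], so h is 0, in line with the printed output.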
97. 0. For vectors, prctile(X,p) is the pth percentile of the elements in X. For instance, if p = 50 then Y is the median of X.

For matrix X and scalar p, prctile(X,p) is a row vector containing the pth percentile of each column. If p is a vector, the ith row of Y is p(i) of X.

Examples
    x = (1:5)'*(1:5)

    x =
         1     2     3     4     5
         2     4     6     8    10
         3     6     9    12    15
         4     8    12    16    20
         5    10    15    20    25

    y = prctile(x,[25 50 75])

    y =
        1.7500    3.5000    5.2500    7.0000    8.7500
        3.0000    6.0000    9.0000   12.0000   15.0000
        4.2500    8.5000   12.7500   17.0000   21.2500

princomp

Purpose
Principal Components Analysis.

Syntax
    PC = princomp(X)
    [PC,SCORE,latent,tsquare] = princomp(X)

Description
[PC,SCORE,latent,tsquare] = princomp(X) takes a data matrix X and returns the principal components in PC, the so-called Z-scores in SCORE, the eigenvalues of the covariance matrix of X in latent, and Hotelling's $T^2$ statistic for each data point in tsquare.

The Z-scores are the data formed by transforming the original data into the space of the principal components. The values of the vector latent are the variance of the columns of SCORE. Hotelling's $T^2$ is a measure of the multivariate distance of each observation from the center of the data set.

Example
Compute principal components for the ingredients data in the Hald dataset, and the variance accounted for by each component.

    load hald;
    [pc,score,latent,tsquare] = princomp(ingredients);
    pc,latent
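The relationship stated in the princomp description, that latent holds the eigenvalues of the covariance matrix and equals the variance of the columns of SCORE, can be checked with a base MATLAB sketch. The data matrix below is hypothetical (it is not the Hald ingredients data), and the eigendecomposition route is one way to obtain principal components; the guide does not say which algorithm princomp uses, and the sign of each component is arbitrary.

```matlab
% Hypothetical data: 6 observations of 3 variables (illustration only).
X = [ 7 26  6
      1 29 15
     11 56  8
     11 31  8
      7 52  6
     11 55  9];

% Center each column.
Xc = X - repmat(mean(X), size(X,1), 1);

% Eigendecomposition of the covariance matrix.
[V,D] = eig(cov(X));

% Order the components by decreasing eigenvalue.
[latent,idx] = sort(diag(D), 'descend');
pc = V(:,idx);

% Scores: the centered data expressed in the principal component basis.
score = Xc*pc;
```

After running it, var(score) returns the same values as latent, which is the property the description refers to.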
98. 0],[1 0.99; 0.99 1],20);
    [ndim, prob] = barttest(x,0.05)

    ndim =

    prob =
             0
             0
             0
        0.5081
        0.6618
        1.0000

See Also
princomp, pcacov, pcares

betacdf

Purpose
Beta cumulative distribution function (cdf).

Syntax
    P = betacdf(X,A,B)

Description
betacdf(X,A,B) computes the beta cdf with parameters A and B at the values in X. The arguments X, A, and B must all be the same size, except that scalar arguments function as constant matrices of the common size of the other arguments.

The parameters A and B must both be positive, and X must lie on the interval [0 1].

The beta cdf is

$p = F(x|a,b) = \frac{1}{B(a,b)} \int_0^x t^{a-1} (1-t)^{b-1}\,dt$

The result, p, is the probability that a single observation from a beta distribution with parameters a and b will fall in the interval [0 x].

Examples
    x = 0.1:0.2:0.9;
    a = 2;
    b = 2;
    p = betacdf(x,a,b)

    p =
        0.0280    0.2160    0.5000    0.7840    0.9720

    a = [1 2 3];
    p = betacdf(0.5,a,a)

    p =
        0.5000    0.5000    0.5000

betafit

Purpose
Parameter estimates and confidence intervals for beta distributed data.

Syntax
    phat = betafit(x)
    [phat,pci] = betafit(x,alpha)

Description
betafit computes the maximum likelihood estimates of the parameters of the beta distribution from the data in the vector x. With two output parameters, betafit also returns confidence intervals on the parameters, in the form of a 2-by-2 matrix. The first column of the matrix con
99. 0,10);
    y = gampdf(x,100,10);
    y1 = normpdf(x,1000,100);
    plot(x,y,x,y1)

[Figure: gamma pdf (solid line) and normal pdf (dashed line), plotted for x from 700 to 1300.]

Geometric Distribution

Background
The geometric distribution is discrete, existing only on the nonnegative integers. It is useful for modeling the runs of consecutive successes (or failures) in repeated independent trials of a system.

The geometric distribution models the number of successes before one failure in an independent succession of tests where each test results in success or failure.

Mathematical Definition
The geometric pdf is

$y = f(x|p) = p q^x I_{(0,1,\ldots)}(x)$

where q = 1 - p.

Example and Plot
Suppose the probability of a five-year-old battery failing in cold weather is 0.03. What is the probability of starting 25 consecutive days during a long cold snap?

    1 - geocdf(25,0.03)

    ans =
        0.4530

The plot shows the cdf for this scenario.

    x = 0:25;
    y = geocdf(x,0.03);
    stairs(x,y)

[Figure: stairstep plot of the geometric cdf for x from 0 to 25.]

Hypergeometric Distribution

Background
The hypergeometric distribution models the total number of successes in a fixed size sample drawn without replacement from a finite population.

The distribution is discrete, existing only for nonnegative integers no greater than the number of samples or the number of possible successes, whichever is smaller. The hypergeometric
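The battery example in the geometric distribution section can be verified from the pdf given there, without calling geocdf. Summing $p q^i$ for i = 0,...,x gives the closed form $1 - q^{x+1}$ for the cdf, so the printed answer 1 - geocdf(25,0.03) equals $0.97^{26}$. The sketch below is base MATLAB only and the variable names are illustrative.

```matlab
p = 0.03;          % probability of failure on any one day
q = 1 - p;         % probability of a successful start
x = 25;

% cdf by direct summation of the geometric pdf p*q^i, i = 0..x.
cdfSummed = sum(p * q.^(0:x));

% cdf in closed form (finite geometric series).
cdfClosed = 1 - q^(x + 1);

% Upper tail, the quantity computed in the example.
tailProb = 1 - cdfClosed;      % equals q^(x+1)
```

Both routes agree, and the tail probability comes out at about 0.453, matching the printed answer.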
100. 0    0.0121

Reference
Devroye, L., Non-Uniform Random Variate Generation, Springer-Verlag, New York, 1986.

weibstat

Purpose
Mean and variance for the Weibull distribution.

Syntax
    [M,V] = weibstat(A,B)

Description
For the Weibull distribution:

• The mean is $a^{-\frac{1}{b}}\,\Gamma(1 + b^{-1})$
• The variance is $a^{-\frac{2}{b}}\left[\Gamma(1 + 2b^{-1}) - \Gamma^2(1 + b^{-1})\right]$

Examples
    [m,v] = weibstat(1:4,1:4)

    m =
        1.0000    0.6267    0.6192    0.6409

    v =
        1.0000    0.1073    0.0506    0.0323

    weibstat(0.5,0.7)

    ans =
        3.4073

x2fx

Purpose
Transform a factor settings matrix to a design matrix.

Syntax
    D = x2fx(X)
    D = x2fx(X,'model')

Description
D = x2fx(X) transforms a matrix of system inputs, X, to a design matrix for a linear additive model with a constant term.

D = x2fx(X,'model') allows control of the order of the regression model. 'model' can be one of these strings:

• 'interaction' includes constant, linear, and cross product terms.
• 'quadratic' includes interactions plus squared terms.
• 'purequadratic' includes constant, linear, and squared terms.

Alternatively, the argument model can be a matrix of terms. In this case each row of model represents one term. The value in a column is the exponent to raise the same column in X for that term. This allows for models with polynomial terms of arbitrary order.

x2fx is a utility function for rstool, regstats, and cordexch.

Example
    x = [1 2 3;4 5 6]';
    model = 'quadratic';
    D = x2fx(x,m
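The mean and variance formulas in the weibstat entry can be evaluated directly with the base MATLAB gamma function, which gives a quick check of the printed example. This is a sketch of the formulas only, not of weibstat itself. It assumes the same parameterization as the rest of this guide, in which the Weibull cdf is $1 - e^{-a x^b}$.

```matlab
a = 1:4;
b = 1:4;

% Mean: a^(-1/b) * Gamma(1 + 1/b), evaluated elementwise.
m = a.^(-1./b) .* gamma(1 + 1./b);

% Variance: a^(-2/b) * (Gamma(1 + 2/b) - Gamma(1 + 1/b)^2).
v = a.^(-2./b) .* (gamma(1 + 2./b) - gamma(1 + 1./b).^2);

% Scalar case from the same example.
m1 = 0.5^(-1/0.7) * gamma(1 + 1/0.7);
```

To four decimal places, m and v agree with the vectors printed in the example, and m1 agrees with the printed 3.4073.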
101. 000, the two popper types (0.0001), and the interaction between brand and popper type (0.7462). These values indicate that both popcorn brand and popper type affect the yield of popcorn, but there is no evidence of a synergistic (interaction) effect of the two.

The conclusion is that you can get the greatest yield using the Gourmet brand and an Air popper (the three values located in popcorn(4:6,1)).

Reference
Hogg, R. V. and J. Ledolter. Engineering Statistics. MacMillan Publishing Company, 1987.

barttest

Purpose
Bartlett's test for dimensionality.

Syntax
    ndim = barttest(x,alpha)
    [ndim,prob,chisquare] = barttest(x,alpha)

Description
ndim = barttest(x,alpha) returns the number of dimensions necessary to explain the nonrandom variation in the data matrix x, using the significance probability alpha. The dimension is determined by a series of hypothesis tests. The test for ndim = 1 tests the hypothesis that the variances of the data values along each principal component are equal, the test for ndim = 2 tests the hypothesis that the variances along the second through last components are equal, and so on.

[ndim,prob,chisquare] = barttest(x,alpha) returns the number of dimensions, the significance values for the hypothesis tests, and the $\chi^2$ values associated with the tests.

Example
    x = mvnrnd([0 0],[1 0.99; 0.99 1],20);
    x(:,3:4) = mvnrnd([0 0],[1 0.99; 0.99 1],20);
    x(:,5:6) = mvnrnd([0
102. 0000 4 0000

See Also
nanmin, nanmax, nanmedian, nanstd, nansum

nanmedian

Purpose
Median ignoring NaNs.

Syntax
    y = nanmedian(X)

Description
nanmedian(X) is the median, treating NaNs as missing values. For vectors, nanmedian(x) is the median of the non-NaN elements of x. For matrices, nanmedian(X) is a row vector containing the median of the non-NaN elements in each column of X.

Example
    m = magic(4);
    m([1 6 9 11]) = [NaN NaN NaN NaN]
    m =
       NaN     2   NaN    13
         5   NaN    10     8
         9     7   NaN    12
         4    14    15     1

    nmedian = nanmedian(m)
    nmedian = 5.0000 7.0000 12.5000 10.0000

See Also
nanmin, nanmax, nanmean, nanstd, nansum

nanmin

Purpose
Minimum ignoring NaNs.

Syntax
    m = nanmin(a)
    [m,ndx] = nanmin(a)
    m = nanmin(a,b)

Description
m = nanmin(a) returns the minimum with NaNs treated as missing. For vectors, nanmin(a) is the smallest non-NaN element in a. For matrices, nanmin(A) is a row vector containing the minimum non-NaN element from each column.

[m,ndx] = nanmin(a) also returns the indices of the minimum values in vector ndx.

m = nanmin(a,b) returns the smaller of a or b, which must match in size.

Example
    m = magic(3);
    m([1 6 8]) = [NaN NaN NaN]
    m =
       NaN     1     6
         3     5   NaN
         4   NaN     2

    [nmin,minidx] = nanmin(m)
    nmin = 3 1 2
    minidx = 2 1 3

See Also
nanmax, nanmean, nanmedian, nanstd, nansum

nanstd

Purpose
Standard deviation ignoring NaNs.

Syntax
    y = nanstd(X)

Description
nanstd(X) is the standard deviation, treating NaNs as
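The column-wise behavior described for nanmedian above can be reproduced with base MATLAB alone. This is a minimal sketch of the idea (drop the NaNs in each column, then take the median), not the toolbox implementation; the loop variable names are illustrative.

```matlab
% Column medians that skip NaNs, using only base MATLAB.
m = magic(4);
m([1 6 9 11]) = NaN;                 % same missing entries as the example
nmedian = zeros(1, size(m, 2));
for j = 1:size(m, 2)
    col = m(:, j);
    nmedian(j) = median(col(~isnan(col)));   % median of the non-NaN values
end
disp(nmedian)
```

The same pattern, with `min` in place of `median`, gives the column minima that nanmin returns for a matrix.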
103. 2 2 3425

    r31 = r3(1,:)
    r31 = 0.2008 0.1957 0.2045 0.1921

Reference
Jackson, J. Edward. A User's Guide to Principal Components. John Wiley & Sons, Inc., 1991, pp. 1-25.

See Also
barttest, pcacov, princomp

pdf

Purpose
Probability density function (pdf) for a specified distribution.

Syntax
    Y = pdf('name',X,A1,A2,A3)

Description
pdf('name',X,A1,A2,A3) returns a matrix of densities. 'name' is a string containing the name of the distribution. X is a matrix of values, and A1, A2, and A3 are matrices of distribution parameters. Depending on the distribution, some of the parameters may not be necessary.

The arguments X, A1, A2, and A3 must all be the same size, except that scalar arguments function as constant matrices of the common size of the other arguments.

pdf is a utility routine allowing access to all the pdfs in the Statistics Toolbox using the name of the distribution as a parameter.

Examples
    p = pdf('Normal',-2:2,0,1)
    p = 0.0540 0.2420 0.3989 0.2420 0.0540

    p = pdf('Poisson',0:4,1:5)
    p = 0.3679 0.2707 0.2240 0.1954 0.1755

perms

Purpose
All permutations.

Syntax
    P = perms(v)

Description
P = perms(v), where v is a row vector of length n, creates a matrix whose rows consist of all possible permutations of the n elements of v. The matrix P contains n! rows and n columns.

perms is only practical when n is less than 8 or 9.

Example
    perms([2 4 6])
    ans = [6-by-3 matrix listing the permutations of 2, 4, and 6]

poisscd
104. 3 1208

Confidence Intervals on the Predicted Responses
Using nlpredci, form 95% confidence intervals on the predicted responses from the reaction kinetics example.

    [yhat,delta] = nlpredci('hougen',reactants,betahat,f,J);
    opd = [rate yhat delta]
    opd =
        8.5500    8.2937    0.9178
        3.7900    3.8584    0.7244
        4.8200    4.7950    0.8267
        0.0200    0.0725    0.4775
        2.7500    2.5687    0.4987
       14.3900   14.2227    0.9666
        2.5400    2.4393    0.9247
        4.3500    3.9360    0.7327
       13.0000   12.9440    0.7210
        8.5000    8.2670    0.9459
        0.0500    0.1437    0.9537
       11.3200   11.3484    0.9228
        3.1300    3.3145    0.8418

The matrix opd has the observed rates in column 1 and the predictions in column 2. The 95% confidence interval is column 2 ± column 3. Note that the confidence interval contains the observations in each case.

An Interactive GUI for Nonlinear Fitting and Prediction
The function nlintool for nonlinear models is a direct analog of rstool for polynomial models. nlintool requires the same inputs as nlinfit. nlintool calls nlinfit.

The purpose of nlintool is larger than just fitting and prediction for nonlinear models. This GUI provides an environment for exploration of the graph of a multidimensional nonlinear function.

If you have already loaded reaction.mat, you can start nlintool:

    nlintool(reactants,rate,'hougen',beta,0.01,xn,yn)

You will see a vector of three plots. The dependent variable of all three plots is the reaction rate. The first plot has hy
107. 5 0 5

    rndgeo = 0 1 3 1 0 0 1 0 2 0

geomean

Purpose
Geometric mean of a sample.

Syntax
    m = geomean(X)

Description
geomean calculates the geometric mean of a sample. For vectors, geomean(x) is the geometric mean of the elements in x. For matrices, geomean(X) is a row vector containing the geometric means of each column.

The geometric mean is $m = \left[\prod_{i=1}^{n} x_i\right]^{1/n}$.

Examples
The sample average is greater than or equal to the geometric mean.

    x = exprnd(1,10,6);
    geometric = geomean(x)
    geometric = 0.7466 0.6061 0.6038 0.2569 0.7539 0.3478

    average = mean(x)
    average = 1.3509 1.1583 0.9741 0.5319 1.0088 0.8122

See Also
mean, median, harmmean, trimmean

geopdf

Purpose
Geometric probability density function (pdf).

Syntax
    Y = geopdf(X,P)

Description
geopdf(X,P) computes the geometric pdf with probabilities P at the values in X. The arguments X and P must be the same size, except that a scalar argument functions as a constant matrix of the same size as the other argument.

The parameter P is on the interval [0 1].

The geometric pdf is $y = f(x \mid p) = p q^{x} I_{(0,1,\ldots)}(x)$, where $q = 1 - p$.

Examples
Suppose you toss a fair coin repeatedly. If the coin lands face up (heads), that is a success. What is the probability of observing exactly three tails before getting a heads?

    p = geopdf(3,0.5)
    p = 0.0625

geornd

Purpose
Random numbers from the geometr
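The relation between the geometric mean and the sample average stated in the geomean excerpt above can be seen on a small fixed sample. This is a minimal sketch using the identity that the geometric mean equals the exponential of the mean of the logs; the sample values are hypothetical and assumed positive.

```matlab
% Geometric mean via logs, compared with the arithmetic mean.
% Illustrative only; assumes every element of x is positive.
x = [1 2 4 8];               % hypothetical sample
g = exp(mean(log(x)));       % geometric mean, equals prod(x)^(1/numel(x))
a = mean(x);                 % arithmetic mean
fprintf('geometric = %.4f, average = %.4f\n', g, a);
```

Working through logs avoids overflow in the product for long samples, which is one reason this form is often preferred over `prod(x)^(1/n)`.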
108. 50

unifinv

Purpose
Inverse continuous uniform cumulative distribution function (cdf).

Syntax
    X = unifinv(P,A,B)

Description
unifinv(P,A,B) computes the inverse of the uniform cdf with parameters A and B at the probabilities in P. The arguments P, A, and B must all be the same size, except that scalar arguments function as constant matrices of the common size of the other arguments.

A and B are the minimum and maximum values, respectively.

The inverse of the uniform cdf is $x = F^{-1}(p \mid a,b) = a + p(b - a)\, I_{[0,1]}(p)$.

The standard uniform distribution has A = 0 and B = 1.

Examples
What is the median of the standard uniform distribution?

    median_value = unifinv(0.5)
    median_value = 0.5000

What is the 99th percentile of the uniform distribution between -1 and 1?

    percentile = unifinv(0.99,-1,1)
    percentile = 0.9800

unifit

Purpose
Parameter estimates for uniformly distributed data.

Syntax
    [ahat,bhat] = unifit(X)
    [ahat,bhat,ACI,BCI] = unifit(X)
    [ahat,bhat,ACI,BCI] = unifit(X,alpha)

Description
[ahat,bhat] = unifit(X) returns the maximum likelihood estimates (MLEs) of the parameters of the uniform distribution given the data in X.

[ahat,bhat,ACI,BCI] = unifit(X) also returns 95% confidence intervals, ACI and BCI, which are matrices with two rows. The first row contains the lower bound of the interval for each column of the matrix X. The second row
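The inverse uniform cdf in the unifinv excerpt above is a single linear map, so both of its examples can be checked by hand. The sketch below assumes probabilities already lie in [0, 1]; the helper name unifinvSketch is hypothetical and does not perform the range checking a library routine would.

```matlab
% Inverse of the continuous uniform cdf: x = a + p*(b - a), for 0 <= p <= 1.
% unifinvSketch is a hypothetical helper, not the toolbox function.
unifinvSketch = @(p, a, b) a + p .* (b - a);

medianValue  = unifinvSketch(0.5, 0, 1);     % median of the standard uniform
percentile99 = unifinvSketch(0.99, -1, 1);   % 99th percentile on [-1, 1]
fprintf('%.4f %.4f\n', medianValue, percentile99);
```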
109. 50 1

    [table,chi2,p] = crosstab(r1,r2)
    table = 10 5 6 13
    chi2 = 4.1723
    p = 0.1242

The result, 0.1242, is not a surprise. A very small value of p would make us suspect the randomness of the random number generator.

See Also
tabulate

daugment

Purpose
D-optimal augmentation of an experimental design.

Syntax
    settings = daugment(startdes,nruns)
    [settings,X] = daugment(startdes,nruns,'model')

Description
settings = daugment(startdes,nruns) augments an initial experimental design, startdes, with nruns new tests.

[settings,X] = daugment(startdes,nruns,'model') also supplies the design matrix, X. The input 'model' controls the order of the regression model. By default, daugment assumes a linear additive model. Alternatively, 'model' can be any of these:
- 'interaction' includes constant, linear, and cross-product terms.
- 'quadratic' includes interactions plus squared terms.
- 'purequadratic' includes constant, linear, and squared terms.

daugment uses the coordinate exchange algorithm.

Example
We add 5 runs to a $2^2$ factorial design to allow us to fit a quadratic model.

    startdes = [-1 -1; 1 -1; -1 1; 1 1];
    settings = daugment(startdes,5,'quadratic')
    settings =
        -1    -1
         1    -1
        -1     1
         1     1
         1     0
        -1     0
         0     1
         0     0
         0    -1

The result is a $3^2$ factorial design.

See Also
cordexch, dcovary, rowexch

dcovary

Purpose
D-optimal design with specified fix
110. 679 0.3033 0.2388 0.1947 0.1637

    y1 = exppdf(1,mu)
    y1 = 0.3679 0.3033 0.2388 0.1947 0.1637

gamrnd

Purpose
Random numbers from the gamma distribution.

Syntax
    R = gamrnd(A,B)
    R = gamrnd(A,B,m)
    R = gamrnd(A,B,m,n)

Description
R = gamrnd(A,B) generates gamma random numbers with parameters A and B. The size of R is the common size of A and B if both are matrices. If either parameter is a scalar, the size of R is the size of the other parameter.

R = gamrnd(A,B,m) generates gamma random numbers with parameters A and B. m is a 1-by-2 vector that contains the row and column dimensions of R.

R = gamrnd(A,B,m,n) generates gamma random numbers with parameters A and B. The scalars m and n are the row and column dimensions of R.

Examples
    n1 = gamrnd(1:5,6:10)
    n1 = 9.1132 12.8431 24.8025 38.5960 106.4164

    n2 = gamrnd(5,10,[1 5])
    n2 = 30.9486 33.5667 33.6837 55.2014 46.8265

    n3 = gamrnd(2:6,3,1,5)
    n3 = 12.8715 11.3068 3.0982 15.6012 21.6739

gamstat

Purpose
Mean and variance for the gamma distribution.

Syntax
    [M,V] = gamstat(A,B)

Description
For the gamma distribution:
- The mean is $ab$.
- The variance is $ab^2$.

Examples
    [m,v] = gamstat(1:5,1:5)
    m = 1 4 9 16 25
    v = 1 8 27 64 125

    [m,v] = gamstat(1:5,1./(1:5))
    m = 1 1 1 1 1
    v = 1.0000 0.5000 0.3333 0.2500 0.2000

geocdf

Purpose
Geometric cumulative distribution function (cdf).

Syntax
    Y = geocdf(X,P)

Description
geocdf(X,P) com
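The gamstat formulas in the excerpt above (mean $ab$, variance $ab^2$) are simple enough to evaluate elementwise. The following is a minimal sketch that reproduces the first gamstat example with base MATLAB; it is a check of the formulas, not the toolbox code.

```matlab
% Mean and variance of the gamma distribution from the stated formulas.
a = 1:5;            % shape parameters from the example
b = 1:5;            % scale parameters from the example
m = a .* b;         % mean:     a*b
v = a .* b.^2;      % variance: a*b^2
disp([m; v])
```

Replacing `b` with `1./(1:5)` gives a mean of 1 in every position and variances of 1, 1/2, 1/3, 1/4, and 1/5, matching the second example.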
111. 8-run experiment with 3 factors that is optimal with respect to a linear drift in the response over time.

First we create our drift input variable. Note that drift is normalized to have mean zero. Its minimum is -1 and its maximum is +1.

    drift = (linspace(-1,1,8))'
    drift = -1.0000 -0.7143 -0.4286 -0.1429 0.1429 0.4286 0.7143 1.0000

    settings = dcovary(3,drift,'linear')

[Output: an 8-by-4 settings matrix. The first three columns hold the factor settings and the fourth column repeats the values of drift.]

Demos
The Statistics Toolbox has demonstration programs that create an interactive environment for exploring the probability distribution, random number generation, curve fitting, and design of experiments functions.

    Demo        Purpose
    disttool    Graphic interaction with probability distributions
    randtool    Interactive control of random number generation
    polytool    Interactive graphic prediction of polynomial fits
    rsmdemo     Design of Experiments and regression modeling

The disttool Demo
disttool is a graphic environment for developing an intuitive understanding of probability distributions.

The disttool demo has the following features:
- A graph of the cdf (pdf) for the given parameters of a distribution.
- A pop-up menu for changing the distribution function.
- A pop-up
112. A and B. m is a 1-by-2 vector that contains the row and column dimensions of R.

R = betarnd(A,B,m,n) generates an m-by-n matrix of beta random numbers with parameters A and B.

Examples
    a = [1 1;2 2];
    b = [1 2;1 2];
    r = betarnd(a,b)
    r =
        0.6987    0.6139
        0.9102    0.8067

    r = betarnd(10,10,[1 5])
    r = 0.5974 0.4777 0.5538 0.5465 0.6327

    r = betarnd(4,2,2,3)
    r =
        0.3943    0.6101    0.5768
        0.5990    0.2760    0.5474

betastat

Purpose
Mean and variance for the beta distribution.

Syntax
    [M,V] = betastat(A,B)

Description
For the beta distribution:
- The mean is $\frac{a}{a + b}$.
- The variance is $\frac{ab}{(a + b + 1)(a + b)^2}$.

Examples
If the parameters are equal, the mean is 1/2.

    a = 1:6;
    [m,v] = betastat(a,a)
    m = 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000
    v = 0.0833 0.0500 0.0357 0.0278 0.0227 0.0192

binocdf

Purpose
Binomial cumulative distribution function (cdf).

Syntax
    Y = binocdf(X,N,P)

Description
binocdf(X,N,P) computes the binomial cdf with parameters N and P at the values in X. The arguments X, N, and P must all be the same size, except that scalar arguments function as constant matrices of the common size of the other arguments.

The parameter N must be a positive integer and P must lie on the interval [0 1].

The binomial cdf is $y = F(x \mid n,p) = \sum_{i=0}^{x} \binom{n}{i} p^{i} q^{n-i} I_{(0,1,\ldots,n)}(i)$.

The result, y, is the probability of observing up to x successes in n independent
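The betastat formulas in the excerpt above can be evaluated directly to confirm the equal-parameter example. This is a minimal sketch in base MATLAB that applies the stated mean and variance expressions elementwise; it is not the toolbox implementation.

```matlab
% Mean and variance of the beta distribution from the stated formulas.
a = 1:6;
b = a;                                         % equal parameters, as in the example
m = a ./ (a + b);                              % mean: a/(a+b)
v = (a .* b) ./ ((a + b + 1) .* (a + b).^2);   % variance: ab/((a+b+1)(a+b)^2)
fprintf('%.4f ', v); fprintf('\n');
```

With equal parameters the mean stays at 1/2 while the variance shrinks as the parameters grow, which is the pattern shown in the example output.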
113. Also
raylcdf, raylinv, raylpdf, raylrnd

rcoplot

Purpose
Residual case order plot.

Syntax
    rcoplot(r,rint)

Description
rcoplot(r,rint) displays an errorbar plot of the confidence intervals on the residuals from a regression. The residuals appear in the plot in case order. r and rint are outputs from the regress function.

Example
    X = [ones(10,1) (1:10)'];
    y = X * [10;1] + normrnd(0,0.1,10,1);
    [b,bint,r,rint] = regress(y,X,0.05);
    rcoplot(r,rint);

[Figure: residuals with error bars plotted against Case Number]

The figure shows a plot of the residuals with error bars showing 95% confidence intervals on the residuals. All the error bars pass through the zero line, indicating that there are no outliers in the data.

See Also
regress

refcurve

Purpose
Add a polynomial curve to the current plot.

Syntax
    h = refcurve(p)

Description
refcurve adds a graph of the polynomial, p, to the current axes. The function for a polynomial of degree n is

$y = p_1 x^{n} + p_2 x^{n-1} + \cdots + p_n x + p_{n+1}$

Note that $p_1$ goes with the highest order term.

h = refcurve(p) returns the handle to the curve.

Example
Plot data for the height of a rocket against time and add a reference curve showing the theoretical height (assuming no air friction). The initial velocity of the rocket is 100 m/sec.

    h = [85 162 230 289 339 381 413 437 452 458 456 440 400 356];
    plot(h)
    refcurve([-4.9 100 0
114. OVA for comparing the means of two or more columns and two or more rows of the sample in X. The data in different columns represent changes in one factor. The data in different rows represent changes in the other factor. If there is more than one observation per row-column pair, then the argument reps indicates the number of observations per "cell."

The matrix below shows the format for a set-up where the column factor has two levels, the row factor has three levels, and there are two replications. The subscripts indicate row, column, and replicate, respectively.

    x111  x121
    x112  x122
    x211  x221
    x212  x222
    x311  x321
    x312  x322

anova2 returns the p-values for the null hypotheses that the means of the columns and the means of the rows of X are equal. If any p-value is near zero, this casts doubt on the null hypothesis and suggests that the means of the source of variability associated with that p-value are in fact different.

The choice of a limit for the p-value to determine whether the result is "statistically significant" is left to the researcher. It is common to declare a result significant if the p-value is less than 0.05 or 0.01.

anova2 also displays a figure showing the standard ANOVA table, which divides the variability of the data in X into three or four parts depending on the value of reps:
- The variability due to the differences among the column means.
- The variabilit
115. P,m,n) generates random numbers with parameters R and P. The scalars m and n are the row and column dimensions of RND.

The negative binomial models consecutive trials, each having a constant probability, P, of success. The parameter R is the number of successes required before stopping.

Example
Suppose you want to simulate a process that has a defect probability of 0.01. How many units might Quality Assurance inspect before finding 3 defective items?

    r = nbinrnd(3,0.01,1,6) + 3
    r = 496 142 420 396 851 178

See Also
nbincdf, nbininv, nbinpdf, nbinstat

nbinstat

Purpose
Mean and variance of the negative binomial distribution.

Syntax
    [M,V] = nbinstat(R,P)

Description
[M,V] = nbinstat(R,P) returns the mean and variance of the negative binomial distribution with parameters R and P.
- The mean is $\frac{rq}{p}$.
- The variance is $\frac{rq}{p^2}$.
where $q = 1 - p$.

Example
    p = 0.1:0.2:0.9;
    r = 1:5;
    [R,P] = meshgrid(r,p);
    [M,V] = nbinstat(R,P)
    M =
        9.0000   18.0000   27.0000   36.0000   45.0000
        2.3333    4.6667    7.0000    9.3333   11.6667
        1.0000    2.0000    3.0000    4.0000    5.0000
        0.4286    0.8571    1.2857    1.7143    2.1429
        0.1111    0.2222    0.3333    0.4444    0.5556
    V =
       90.0000  180.0000  270.0000  360.0000  450.0000
        7.7778   15.5556   23.3333   31.1111   38.8889
        2.0000    4.0000    6.0000    8.0000   10.0000
        0.6122    1.2245    1.8367    2.4490    3.0612
        0.1235    0.2469    0.3704    0.4938    0.6173

See Also
nbincdf, nbininv, nbinpdf, nbinrnd

ncfcdf

Purpose
Noncentral F cumulative distribution function (cdf).

Syntax
    P
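The nbinstat example above can be reproduced from the two formulas alone. The sketch below assumes the parameterization given in the description (mean $rq/p$, variance $rq/p^2$ with $q = 1 - p$) and uses base MATLAB only; it is a check of the formulas rather than the toolbox routine.

```matlab
% Mean and variance of the negative binomial distribution on a parameter grid.
p = 0.1:0.2:0.9;
r = 1:5;
[R, P] = meshgrid(r, p);   % rows follow p, columns follow r
Q = 1 - P;
M = R .* Q ./ P;           % mean:     r*q/p
V = R .* Q ./ P.^2;        % variance: r*q/p^2
disp(M(1,:))
disp(V(1,:))
```

The first row (p = 0.1) should read 9, 18, 27, 36, 45 for the mean and 90, 180, 270, 360, 450 for the variance, as in the printed matrices.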
116. Purpose
Interactive plot for prediction of fitted polynomials.

Syntax
    polytool(x,y)
    polytool(x,y,n)
    polytool(x,y,n,alpha)

Description
polytool(x,y) fits a line to the column vectors x and y and displays an interactive plot of the result. This plot is a graphic user interface for exploring the effects of changing the polynomial degree of the fit. The plot shows the fitted curve and 95% global confidence intervals on a new predicted value for the curve. Text with the current predicted value of y and its uncertainty appears left of the y-axis.

polytool(x,y,n) initially fits a polynomial of order n.

polytool(x,y,n,alpha) plots 100(1 - alpha)% confidence intervals on the predicted values.

polytool fits by least squares using the regression model

$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^{2} + \cdots + \beta_n x_i^{n} + \epsilon_i$
$\epsilon_i \sim N(0,\sigma^2) \quad \forall i$
$\mathrm{Cov}(\epsilon_i,\epsilon_j) = 0 \quad \forall i \neq j$

Evaluate the function by typing a value in the x-axis edit box or dragging the vertical reference line on the plot. The shape of the pointer changes from an arrow to a cross hair when you are over the vertical line to indicate that the line is draggable. The predicted value of y will update as you drag the reference line.

The argument n controls the degree of the polynomial fit. To change the degree of the polynomial, choose from the pop-up menu at the top of the figure.

When you are done, press the Close button.

polyval

Purpose Syntax Description Examples See Also
Po
117. Sign test for paired samples.

Syntax
    p = signtest(x,y,alpha)
    [p,h] = signtest(x,y,alpha)

Description
p = signtest(x,y,alpha) returns the significance probability that the medians of two matched samples, x and y, are equal. x and y must be vectors of equal length. y may also be a scalar; in this case, signtest computes the probability that the median of x is different from the constant y. alpha is the desired level of significance and must be a scalar between zero and one.

[p,h] = signtest(x,y,alpha) also returns the result of the hypothesis test, h. h is zero if the difference in medians of x and y is not significantly different from zero. h is one if the two medians are significantly different.

p is the probability of observing a result equally or more extreme than the one using the data (x and y) if the null hypothesis is true. p is calculated using the signs (plus or minus) of the differences between corresponding elements in x and y. If p is near zero, this casts doubt on this hypothesis.

Example
This example tests the hypothesis of equality of means for two samples generated with normrnd. The samples have the same theoretical mean but different standard deviations.

    x = normrnd(0,1,20,1);
    y = normrnd(0,2,20,1);
    [p,h] = signtest(x,y,0.05)
    p = 0.8238

See Also
ranksum, signrank, ttest

skewness

Purpose
Sample skewness.

Syntax
    y = skewness(X)

Description
skewness(X) returns the sample skewness of X. For vectors, skewness(x) is the skewn
118. Statistics Toolbox
Computation, Visualization, Programming
User's Guide, Version 2.1

How to Contact The MathWorks:
    508-647-7000                Phone
    508-647-7001                Fax
    The MathWorks, Inc.         Mail
    24 Prime Park Way
    Natick, MA 01760-1500
    http://www.mathworks.com    Web
    ftp.mathworks.com           Anonymous FTP server
    comp.soft-sys.matlab        Newsgroup
    support@mathworks.com       Technical support
    suggest@mathworks.com       Product enhancement suggestions
    bugs@mathworks.com          Bug reports
    doc@mathworks.com           Documentation error reports
    subscribe@mathworks.com     Subscribing user registration
    service@mathworks.com       Order status, license renewals, passcodes
    info@mathworks.com          Sales, pricing, and general information

Statistics Toolbox User's Guide
COPYRIGHT 1993 - 1998 by The MathWorks, Inc. All Rights Reserved.

The software described in this document is furnished under a license agreement. The software may be used or copied only under the terms of the license agreement. No part of this manual may be photocopied or reproduced in any form without prior written consent from The MathWorks, Inc.

U.S. GOVERNMENT: If Licensee is acquiring the Programs on behalf of any unit or agency of the U.S. Government, the following shall apply: (a) For units of the Department of Defense: the Government shall have only the rights specified in the license under which the commercial computer software or commercial software documentation was obtained, as set forth in subparagr
119. TA and the matrix of data, X. BETA must have 5 elements and X must have three columns.

    % The model form is:
    % y = (b1*x2 - x3/b5)./(1+b2*x1+b3*x2+b4*x3)
    %
    % Reference:
    % [1] Bates, Douglas, and Watts, Donald, "Nonlinear
    % Regression Analysis and Its Applications", Wiley
    % 1988 p. 271-272.
    % Copyright (c) 1993-97 by The MathWorks, Inc.
    % B.A. Jones 1-06-95.
    b1 = beta(1);
    b2 = beta(2);
    b3 = beta(3);
    b4 = beta(4);
    b5 = beta(5);
    x1 = x(:,1);
    x2 = x(:,2);
    x3 = x(:,3);
    yhat = (b1*x2 - x3/b5)./(1+b2*x1+b3*x2+b4*x3);

To fit the reaction data, call the function nlinfit.

    load reaction
    betahat = nlinfit(reactants,rate,'hougen',beta)
    betahat =
        1.2526
        0.0628
        0.0400
        0.1124
        1.1914

nlinfit has two optional outputs. They are the residuals and Jacobian matrix at the solution. The residuals are the differences between the observed and fitted responses. The Jacobian matrix is the direct analog of the matrix, X, in the standard linear regression model.

These outputs are useful for obtaining confidence intervals on the parameter estimates and predicted responses.

Confidence Intervals on the Parameter Estimates
Using nlparci, form 95% confidence intervals on the parameter estimates, betahat, from the reaction kinetics example.

    [betahat,f,J] = nlinfit(reactants,rate,'hougen',beta);
    betaci = nlparci(betahat,f,J)
    betaci =
       -0.7467    3.2519
       -0.0377    0.1632
       -0.0312    0.1113
       -0.0609    0.2857
       -0.7381
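The model form in the hougen listing above can be written compactly as an anonymous function for quick experimentation. This is a minimal sketch under stated assumptions: the name hougenSketch, the parameter vector, and the two rows of inputs are hypothetical values chosen so the arithmetic is easy to follow, and they are not the fitted estimates or the reaction data.

```matlab
% Hougen-Watson rate model as an anonymous function (illustrative sketch).
% beta has 5 elements; x has three columns, one per reactant.
hougenSketch = @(beta, x) (beta(1)*x(:,2) - x(:,3)/beta(5)) ./ ...
    (1 + beta(2)*x(:,1) + beta(3)*x(:,2) + beta(4)*x(:,3));

beta = [1 0.05 0.02 0.1 2]';          % hypothetical parameters
x    = [100 200 50; 300 100 20];      % hypothetical inputs, two runs
yhat = hougenSketch(beta, x);
disp(yhat)
```

For the first run the numerator is 200 - 50/2 = 175 and the denominator is 1 + 5 + 4 + 5 = 15, so the predicted rate is 175/15; the elementwise `./` is what lets one call evaluate every run at once.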
120. U)
    R = exprnd(MU,m)
    R = exprnd(MU,m,n)

Description
R = exprnd(MU) generates exponential random numbers with mean MU. The size of R is the size of MU.

R = exprnd(MU,m) generates exponential random numbers with mean MU. m is a 1-by-2 vector that contains the row and column dimensions of R.

R = exprnd(MU,m,n) generates exponential random numbers with mean MU. The scalars m and n are the row and column dimensions of R.

Examples
    n1 = exprnd(5:10)
    n1 = 7.5943 18.3400 2.7113 3.0936 0.6078 9.5841

    n2 = exprnd(5:10,[1 6])
    n2 = 3.2752 1.1110 23.5530 23.4303 5.7190 3.9876

    n3 = exprnd(5,2,3)
    n3 =
       24.3339   13.5271    1.8788
        4.7932    4.3675    2.6468

expstat

Purpose
Mean and variance for the exponential distribution.

Syntax
    [M,V] = expstat(MU)

Description
For the exponential distribution:
- The mean is $\mu$.
- The variance is $\mu^2$.

Examples
    [m,v] = expstat([1 10 100 1000])
    m = 1 10 100 1000
    v = 1 100 10000 1000000

fcdf

Purpose
F cumulative distribution function (cdf).

Syntax
    P = fcdf(X,V1,V2)

Description
fcdf(X,V1,V2) computes the F cdf with parameters V1 and V2 at the values in X. The arguments X, V1, and V2 must all be the same size, except that scalar arguments function as constant matrices of the common size of the other arguments.

Parameters V1 and V2 must contain positive integers.

The F cdf is

$p = F(x \mid \nu_1,\nu_2) = \int_0^x \frac{\Gamma\left[\frac{\nu_1+\nu_2}{2}\right]}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} \frac{t^{(\nu_1-2)/2}}{\left[1+\left(\frac{\nu_1}{\nu_2}\right)t\right]^{(\nu_1+\nu_2)/2}}\,dt$

The result, p, is the probability t
121. a distribution (obtained by setting a = 1 in the equation below).

$y = f(x \mid a,b) = \frac{1}{b^{a}\Gamma(a)} x^{a-1} e^{-x/b}$

The exponential distribution is special because of its utility in modeling events that occur randomly over time. The main application area is in studies of lifetimes.

Mathematical Definition
The exponential pdf is

$y = f(x \mid \mu) = \frac{1}{\mu} e^{-x/\mu}$

Parameter Estimation
Suppose you are stress testing light bulbs and collecting data on their lifetimes. You assume that these lifetimes follow an exponential distribution. You want to know how long you can expect the average light bulb to last. Parameter estimation is the process of determining the parameters of the exponential distribution that fit this data best in some sense.

One popular criterion of goodness is to maximize the likelihood function. The likelihood has the same form as the exponential pdf above. But for the pdf, the parameters are known constants and the variable is x. The likelihood function reverses the roles of the variables. Here, the sample values (the xs) are already observed, so they are the fixed constants. The variables are the unknown parameters. Maximum likelihood estimation (MLE) involves calculating the values of the parameters that give the highest likelihood given the particular set of data.

The function expfit returns the MLEs and confidence intervals for the parameters of the exponential distribution. Here is an example
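The maximum likelihood idea described in the parameter estimation passage above can be illustrated numerically. For the exponential pdf $\frac{1}{\mu} e^{-x/\mu}$, the log-likelihood of a sample is $-n \log \mu - \sum x_i / \mu$, which is maximized at the sample mean. The sketch below assumes a small set of hypothetical lifetimes and compares the log-likelihood at the sample mean with a few nearby candidates; it does not use expfit.

```matlab
% Log-likelihood of exponential data, evaluated at several candidate means.
x = [12 7 30 18 3 25 9 16];            % hypothetical lifetimes
muhat = mean(x);                       % closed-form MLE for the exponential mean
loglik = @(mu) -numel(x)*log(mu) - sum(x)/mu;

candidates = muhat * [0.5 0.9 1 1.1 2];
ll = arrayfun(loglik, candidates);     % log-likelihood at each candidate
[~, best] = max(ll);                   % index of the most likely candidate
fprintf('muhat = %g, best candidate = %g\n', muhat, candidates(best));
```

The largest log-likelihood falls on the candidate equal to the sample mean, which is the sense in which the MLE "fits the data best."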
122. ables.

Note that while this example only uses three reactants, rstool can accommodate an arbitrary number of independent variables. Interpretability may be limited by the size of the monitor for large numbers of inputs.

The GUI also has two pop-up menus. The Export menu facilitates saving various important variables in the GUI to the base workspace. Below the Export menu there is another menu that allows you to change the order of the polynomial model from within the GUI. If you used the commands above, this menu will have the string Full Quadratic. Other choices are:
- Linear has the constant and first order terms only.
- Pure Quadratic includes constant, linear, and squared terms.
- Interactions includes constant, linear, and cross-product terms.

Stepwise Regression
Stepwise regression is a technique for choosing the variables to include in a multiple regression model. Forward stepwise regression starts with no model terms. At each step it adds the most statistically significant term (the one with the highest F statistic or lowest p-value) until there are none left. Backward stepwise regression starts with all the terms in the model and removes the least significant terms until all the remaining terms are statistically significant. It is also possible to start with a subset of all the terms and then add significant terms or remove insignificant terms.

An important assumption behind the method is that some input variables
123. ained] = pcacov(X) takes the covariance matrix, X, and returns the principal components in pc, the eigenvalues of the covariance matrix of X in latent, and the percentage of the total variance in the observations explained by each eigenvector in explained.

Example
    load hald
    covx = cov(ingredients);
    [pc,variances,explained] = pcacov(covx)
    pc =
        0.0678    0.6460    0.5673    0.5062
        0.6785    0.0200    0.5440    0.4933
        0.0290    0.7553    0.4036    0.5156
        0.7309    0.1085    0.4684    0.4844
    variances =
      517.7969
       67.4964
       12.4054
        0.2372
    explained =
       86.5974
       11.2882
        2.0747
        0.0397

Reference
Jackson, J. Edward. A User's Guide to Principal Components. John Wiley & Sons, Inc., 1991, pp. 1-25.

See Also
barttest, pcares, princomp

pcares

Purpose
Residuals from a Principal Components Analysis.

Syntax
    residuals = pcares(X,ndim)

Description
pcares(X,ndim) returns the residuals obtained by retaining ndim principal components of X. Note that ndim is a scalar and must be less than the number of columns in X. Use the data matrix, not the covariance matrix, with this function.

Example
This example shows the drop in the residuals from the first row of the Hald data as the number of component dimensions increases from one to three.

    load hald
    r1 = pcares(ingredients,1);
    r2 = pcares(ingredients,2);
    r3 = pcares(ingredients,3);

    r11 = r1(1,:)
    r11 = 2.0350 2.8304 6.8378 3.0879

    r21 = r2(1,:)
    r21 = 2.4037 2.6930 1.648
124. aining the names of each case in the first column.
- data is a numeric matrix with a value for each variable-case pair.

Example
    [data,varnames,casenames] = tblread('sat.dat')
    data =
       470   530
       520   480
    varnames =
       Male
       Female
    casenames =
       Verbal
       Quantitative

See Also
caseread, tblwrite

tblwrite

Purpose
Writes tabular data to the file system.

Syntax
    tblwrite(data,'varnames','casenames')
    tblwrite(data,'varnames','casenames','filename')

Description
tblwrite(data,'varnames','casenames') displays the File Open dialog box for interactive specification of the tabular data output file. The file format has variable names in the first row, case names in the first column, and data starting in the (2,2) position.

'varnames' is a string matrix containing the variable names. 'casenames' is a string matrix containing the names of each case in the first column. data is a numeric matrix with a value for each variable-case pair.

tblwrite(data,'varnames','casenames','filename') allows command line specification of a file in the current directory, or the complete pathname of any file, in the string 'filename'.

Example
Continuing the example from tblread:

    tblwrite(data,varnames,casenames,'sattest.dat')
    type sattest.dat

                    Male    Female
    Verbal          470     530
    Quantitative    520     480

See Also
casewrite, tblread
125. ample, here is the way to generate random numbers from the beta distribution. Four statements obtain random numbers: the first returns a single number, the second returns a 2-by-2 matrix of random numbers, and the third and fourth return 2-by-3 matrices of random numbers.

    a = 1;
    b = 2;
    c = [.1 .5; 1 2];
    d = [.25 .75; 5 10];
    m = [2 3];
    nrow = 2;
    ncol = 3;

    r1 = betarnd(a,b)
    r1 = 0.4469

    r2 = betarnd(c,d)
    r2 =
        0.8931    0.4832
        0.1316    0.2403

    r3 = betarnd(a,b,m)
    r3 =
        0.4196    0.6078    0.1392
        0.0410    0.0723    0.0782

    r4 = betarnd(a,b,nrow,ncol)
    r4 =
        0.0520    0.3975    0.1284
        0.3891    0.1848    0.5186

Mean and Variance
The mean and variance of a probability distribution are generally simple functions of the parameters of the distribution. The Statistics Toolbox functions ending in "stat" all produce the mean and variance of the desired distribution given the parameters.

The example shows a contour plot of the mean of the Weibull distribution as a function of the parameters.

    x = (0.5:0.1:5);
    y = (1:0.04:2);
    [X,Y] = meshgrid(x,y);
    Z = weibstat(X,Y);
    [c,h] = contour(x,y,Z,[0.4 0.6 1.0 1.8]);
    clabel(c);

[Figure: contour plot of the Weibull mean Z over the parameter grid X, Y]

Overview of the Distributions
The Statistics Toolbox supports 20 probability distributions. These are:
- Beta
- Binomial
- Chi-square
- Noncentral Chi-square
- Discrete Uniform
- Exponential
- F
- Noncentral F
- Gamma
- Geometric
- Hypergeometric
- Lognormal
- Negative Bi
126. ample and a normal probability plot of the data.

    x = normrnd(0,1,50,1);
    h = normplot(x);

[Figure: Normal Probability Plot; x-axis: Data, y-axis: Probability.]

The plot is linear, indicating that you can model the sample by a normal distribution.

normrnd

Purpose
Random numbers from the normal distribution.

Syntax

    R = normrnd(MU,SIGMA)
    R = normrnd(MU,SIGMA,m)
    R = normrnd(MU,SIGMA,m,n)

Description
R = normrnd(MU,SIGMA) generates normal random numbers with mean MU and standard deviation SIGMA. The size of R is the common size of MU and SIGMA if both are matrices. If either parameter is a scalar, the size of R is the size of the other parameter.

R = normrnd(MU,SIGMA,m) generates normal random numbers with parameters MU and SIGMA. m is a 1-by-2 vector that contains the row and column dimensions of R.

R = normrnd(MU,SIGMA,m,n) generates normal random numbers with parameters MU and SIGMA. The scalars m and n are the row and column dimensions of R.

Examples

    n1 = normrnd(1:6,1./(1:6))
    n1 =
        2.1650    2.3134    3.0250    4.0879    4.8607    6.2827

    n2 = normrnd(0,1,[1 5])
    n2 =
        0.0591    1.7971    0.2641    0.8717    1.4462

    n3 = normrnd(1
127. and has the desirable property of being in the same units as the data. That is, if the data is in meters, the standard deviation is in meters as well. The variance is in meters², which is more difficult to interpret.

Neither the standard deviation nor the variance is robust to outliers. A data value that is separate from the body of the data can increase the value of the statistics by an arbitrarily large amount.

The mean absolute deviation (mad) is also sensitive to outliers. But the mad does not move quite as much as the standard deviation or variance in response to bad data.

The interquartile range (iqr) is the difference between the 75th and 25th percentile of the data. Since only the middle 50% of the data affects this measure, it is robust to outliers.

The example below shows the behavior of the measures of dispersion for a sample with one outlier.

    x = [ones(1,6) 100]
    x =
        1     1     1     1     1     1   100

    stats = [iqr(x) mad(x) range(x) std(x)]
    stats =
             0   24.2449   99.0000   37.4185

Functions for Data with Missing Values (NaNs)

Most real-world datasets have one or more missing elements. It is convenient to code missing entries in a matrix as NaN (Not a Number). Here is a simple example.

    m = magic(3);
    m([1 5 9]) = [NaN NaN NaN]
    m =
       NaN     1     6
         3   NaN     7
         4     9   NaN

Simply removing any row with a NaN in it would leave us with nothing, but any arithmetic operation involving NaN yields NaN, as below.

    sum(m)
    ans =
       NaN   NaN   NaN

T
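The dispersion example above can be cross-checked outside MATLAB. The following is a minimal Python sketch using only the standard library; the helper name mean_abs_dev is illustrative and not part of the toolbox, and the quartile rule used by statistics.quantiles is an assumption that need not match the toolbox's percentile rule on other data.

```python
import statistics

x = [1, 1, 1, 1, 1, 1, 100]   # six ones and one outlier

def mean_abs_dev(data):
    """Mean absolute deviation about the mean (illustrative helper)."""
    center = statistics.fmean(data)
    return statistics.fmean([abs(v - center) for v in data])

q1, _, q3 = statistics.quantiles(x, n=4)   # quartiles, default "exclusive" rule
iqr = q3 - q1                               # interquartile range
mad = mean_abs_dev(x)                       # mean absolute deviation
rng = max(x) - min(x)                       # range
std = statistics.stdev(x)                   # sample standard deviation (n - 1)

print(iqr, round(mad, 4), rng, round(std, 4))
```

Only the interquartile range ignores the outlier; the other three measures are pulled upward by the single value of 100.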
128. antaneous failure rate. If f(t) and F(t) are the pdf and cdf of a distribution, then the hazard rate is

$$h(t) = \frac{f(t)}{1 - F(t)}$$

Substituting the pdf and cdf of the exponential distribution for f(t) and F(t) above yields a constant. The example below shows that the hazard rate for the Weibull distribution can vary.

Mathematical Definition

The Weibull pdf is

$$y = f(x \mid a,b) = a b x^{b-1} e^{-a x^{b}} I_{(0,\infty)}(x)$$

Parameter Estimation

Suppose we wish to model the tensile strength of a thin filament using the Weibull distribution. The function weibfit gives MLEs and confidence intervals for the Weibull parameters.

    strength = weibrnd(0.5,2,100,1);   % Simulated strengths
    [p,ci] = weibfit(strength)
    p =
        0.4746    1.9582
    ci =
        0.3851    1.6598
        0.5641    2.2565

The default 95% confidence interval for each parameter contains the true value.

Example and Plot

The exponential distribution has a constant hazard function, which is not generally the case for the Weibull distribution.

The plot shows the hazard functions for exponential (dashed line) and Weibull (solid line) distributions having the same mean life. The Weibull hazard rate here increases with age (a reasonable assumption).

    t = 0:0.1:3;
    h1 = exppdf(t,0.6267)./(1 - expcdf(t,0.6267));
    h2 = weibpdf(t,2,2)./(1 - weibcdf(t,2,2));
    plot(t,h1,'--',t,h2,'-')

[Figure: hazard rate versus t for the exponential (dashed) and Weibull (solid) distributions.]

Descriptive Statistics

Data samples can have thousands (even mil
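The hazard comparison above can be reproduced numerically without the toolbox. This is a minimal Python sketch; the function names are illustrative, and it assumes the Weibull parameterization with cdf 1 - exp(-a x^b) and an exponential distribution parameterized by its mean. The closed-form Weibull mean a^(-1/b) Γ(1 + 1/b) is a standard result for that parameterization rather than something quoted in the text.

```python
import math

def weib_pdf(t, a, b):
    return a * b * t ** (b - 1) * math.exp(-a * t ** b)

def weib_cdf(t, a, b):
    return 1.0 - math.exp(-a * t ** b)

def exp_pdf(t, mu):
    return math.exp(-t / mu) / mu

def exp_cdf(t, mu):
    return 1.0 - math.exp(-t / mu)

def hazard(pdf, cdf, t, *params):
    """Hazard rate h(t) = f(t) / (1 - F(t))."""
    return pdf(t, *params) / (1.0 - cdf(t, *params))

a, b = 2.0, 2.0
# Mean life of the Weibull above; the exponential is given the same mean.
weib_mean = a ** (-1.0 / b) * math.gamma(1.0 + 1.0 / b)

for t in (0.5, 1.0, 2.0):
    h_exp = hazard(exp_pdf, exp_cdf, t, weib_mean)
    h_weib = hazard(weib_pdf, weib_cdf, t, a, b)
    print(t, round(h_exp, 4), round(h_weib, 4))
```

The exponential hazard stays at 1/mean for every t, while the Weibull hazard with a = b = 2 grows linearly as 4t.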
129. aph (a) of the Rights in Commercial Computer Software or Commercial Software Documentation Clause at DFARS 227.7202-3, therefore the rights set forth herein shall apply; and (b) For any other unit or agency: NOTICE: Notwithstanding any other lease or license agreement that may pertain to, or accompany the delivery of, the computer software and accompanying documentation, the rights of the Government regarding its use, reproduction, and disclosure are as set forth in Clause 52.227-19 (c)(2) of the FAR.

MATLAB, Simulink, Handle Graphics, and Real-Time Workshop are registered trademarks and Stateflow and Target Language Compiler are trademarks of The MathWorks, Inc. Other product or brand names are trademarks or registered trademarks of their respective holders.

Printing History:
    September 1993   First printing    Version 1
    March 1996       Second printing   Version 2
    January 1997     Third printing    For MATLAB 5
    May 1997         Revised for MATLAB 5.1 (online version)
    January 1998     Revised for MATLAB 5.2 (online version)

Contents

Before You Begin
    What is the Statistics Toolbox?
    How to Use This Guide
    Mathematical Notation
    Typographical Conventions

Tutorial
    Probability Distributions
    Parameter Estimation
    Descriptive Statistics
    Linear Mo
130. are building blocks suitable for use inside other analytical tools.

The Statistics Toolbox has more than 200 M-files, supporting work in the topical areas below:

• Probability distributions
• Parameter estimation
• Descriptive statistics
• Linear models
• Nonlinear models
• Hypothesis tests
• Multivariate statistics
• Statistical plots
• Statistical Process Control
• Design of Experiments

Probability Distributions

The Statistics Toolbox supports 20 probability distributions. For each distribution there are five associated functions. They are:

• Probability density function (pdf)
• Cumulative distribution function (cdf)
• Inverse of the cumulative distribution function
• Random number generator
• Mean and variance as a function of the parameters

Parameter Estimation

The Statistics Toolbox has functions for computing parameter estimates and confidence intervals for data-driven distributions (beta, binomial, exponential, gamma, normal, Poisson, uniform, and Weibull).

Descriptive Statistics

The Statistics Toolbox provides functions for describing the features of a data sample. These descriptive statistics include measures of location and spread, percentile estimates, and functions for dealing with data having missing values.

Linear Models

In the area of linear models, the Statistics Toolbox supports one-way and two-way analysis of variance (ANOVA), multiple linear regression, stepwise regression, response surface prediction, and
131. arguments X and V must be the same size, except that a scalar argument functions as a constant matrix of the same size of the other argument.

The degrees of freedom, V, must be a positive integer.

The chi-square pdf is

$$y = f(x \mid \nu) = \frac{x^{(\nu-2)/2} e^{-x/2}}{2^{\nu/2}\,\Gamma(\nu/2)}$$

The χ² density function with n degrees of freedom is the same as the gamma density function with parameters n/2 and 2.

If x is standard normal, then x² is distributed χ² with one degree of freedom. If x1, x2, ..., xn are n independent standard normal observations, then the sum of the squares of the x's is distributed χ² with n degrees of freedom.

Examples

    nu = 1:6;
    x = nu;
    y = chi2pdf(x,nu)
    y =
        0.2420    0.1839    0.1542    0.1353    0.1220    0.1120

The mean of the χ² distribution is the value of the parameter, nu. The above example shows that the probability density of the mean falls as nu increases.

chi2rnd

Purpose
Random numbers from the chi-square (χ²) distribution.

Syntax

    R = chi2rnd(V)
    R = chi2rnd(V,m)
    R = chi2rnd(V,m,n)

Description
R = chi2rnd(V) generates χ² random numbers with V degrees of freedom. The size of R is the size of V.

R = chi2rnd(V,m) generates χ² random numbers with V degrees of freedom. m is a 1-by-2 vector that contains the row and column dimensions of R.

R = chi2rnd(V,m,n) generates χ² random numbers with V degrees of freedom. The scalars m and n are the row and column dimensions of R.

Examples
Note that the first and third command
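The chi2pdf example above evaluates the density at its own mean for ν = 1 to 6. A minimal Python sketch of the same calculation, written directly from the chi-square pdf formula (chi2_pdf is an illustrative name, not a toolbox function):

```python
import math

def chi2_pdf(x, v):
    """Chi-square density with v degrees of freedom, for x > 0."""
    return x ** ((v - 2) / 2) * math.exp(-x / 2) / (2 ** (v / 2) * math.gamma(v / 2))

# Density evaluated at the mean (x = nu) for nu = 1..6.
density_at_mean = [chi2_pdf(nu, nu) for nu in range(1, 7)]
print([round(d, 4) for d in density_at_mean])
```

The values fall steadily as ν grows, which is the behavior the example points out.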
132. at,lambdaci] = poissfit(X,alpha)

Description
poissfit(X) returns the maximum likelihood estimate (MLE) of the parameter of the Poisson distribution, λ, given the data X.

[lambdahat,lambdaci] = poissfit(X) also gives 95% confidence intervals in lambdaci.

[lambdahat,lambdaci] = poissfit(X,alpha) gives 100(1-alpha) percent confidence intervals. For example, alpha = 0.001 yields 99.9% confidence intervals.

The sample average is the MLE of λ.

Example

    r = poissrnd(5,10,2);
    [l,lci] = poissfit(r)
    l =
        4.8000    4.8000
    lci =
        3.5000    3.5000
        6.2000    6.2000

See Also
betafit, binofit, expfit, gamfit, poissfit, unifit, weibfit

poissinv

Purpose
Inverse of the Poisson cumulative distribution function (cdf).

Syntax

    X = poissinv(P,LAMBDA)

Description
poissinv(P,LAMBDA) returns the smallest value, X, such that the Poisson cdf evaluated at X equals or exceeds P.

Examples
If the average number of defects (λ) is two, what is the 95th percentile of the number of defects?

    poissinv(0.95,2)
    ans =
        5

What is the median number of defects?

    median_defects = poissinv(0.50,2)
    median_defects =
        2

poisspdf

Purpose
Poisson probability density function (pdf).

Syntax

    Y = poisspdf(X,LAMBDA)

Description
poisspdf(X,LAMBDA) computes the Poisson pdf with parameter settings LAMBDA at the values in X. The arguments X and LAMBDA must be the same size, except that a scalar argument functions as a constant matrix of the same size of th
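The definition of poissinv above (the smallest X at which the cdf equals or exceeds P) translates into a short search. The following Python sketch illustrates that definition only; it is not the toolbox's implementation, the name poisson_inv is hypothetical, and it assumes 0 <= p < 1.

```python
import math

def poisson_inv(p, lam):
    """Smallest integer x whose Poisson(lam) cdf is at least p (0 <= p < 1)."""
    x = 0
    term = math.exp(-lam)   # P(X = 0)
    cdf = term
    while cdf < p:
        x += 1
        term *= lam / x     # P(X = x) from P(X = x - 1)
        cdf += term
    return x

print(poisson_inv(0.95, 2), poisson_inv(0.50, 2))
```

With λ = 2 the cdf reaches 0.9473 at x = 4 and 0.9834 at x = 5, so the 95th percentile is 5; it passes 0.50 at x = 2.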
133. at takes nonnegative integer values. The parameter, λ, is both the mean and the variance of the distribution. Thus, as the size of the numbers in a particular sample of Poisson random numbers gets larger, so does the variability of the numbers.

As Poisson (1837) showed, the Poisson distribution is the limiting case of a binomial distribution where N approaches infinity and p goes to zero while Np = λ.

The Poisson and exponential distributions are related. If the number of counts follows the Poisson distribution, then the interval between individual counts follows the exponential distribution.

Mathematical Definition

The Poisson pdf is

$$y = f(x \mid \lambda) = \frac{\lambda^{x}}{x!} e^{-\lambda} I_{(0,1,\ldots)}(x)$$

Parameter Estimation

The MLE and the MVUE of the Poisson parameter, λ, is the sample mean. The sum of independent Poisson random variables is also Poisson with parameter equal to the sum of the individual parameters. The Statistics Toolbox makes use of this fact to calculate confidence intervals on λ. As λ gets large, the Poisson distribution can be approximated by a normal distribution with μ = λ and σ² = λ. The Statistics Toolbox uses this approximation for calculating confidence intervals for values of λ greater than 100.

Example and Plot

The plot shows the probability for each non-negative integer when λ = 5.

    x = 0:15;
    y = poisspdf(x,5);
    plot(x,y)

[Figure: Poisson probabilities for x = 0 to 15 with λ = 5.]
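The limiting relationship between the binomial and Poisson distributions described above can be checked numerically. This Python sketch holds Np = λ fixed at 5 and measures the largest pointwise gap between the two probability functions over x = 0 to 10; the function names and the choice of N values are illustrative.

```python
import math

def poisson_pmf(x, lam):
    return lam ** x * math.exp(-lam) / math.factorial(x)

def binomial_pmf(x, n, p):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

lam = 5.0
gaps = []
for n in (10, 100, 1000, 10000):
    p = lam / n   # keep N * p equal to lambda
    gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(11))
    gaps.append(gap)
    print(n, round(gap, 6))
```

The gap shrinks as N grows, which is the convergence the text attributes to Poisson.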
134. bust to outliers. They are useful when the sample is distributed lognormal or heavily skewed.

The example shows the behavior of the measures of location for a sample with one outlier.

    x = [ones(1,6) 100]
    x =
        1     1     1     1     1     1   100

    locate = [geomean(x) harmmean(x) mean(x) median(x) trimmean(x,25)]
    locate =
        1.9307    1.1647   15.1429    1.0000    1.0000

You can see that the mean is far from any data value because of the influence of the outlier. The median and trimmed mean ignore the outlying value and describe the location of the rest of the data values.

Measures of Dispersion

The purpose of measures of dispersion is to find out how spread out the data values are on the number line. Another term for these statistics is measures of spread.

The table gives the function names and descriptions.

Measures of Dispersion
    iqr      Interquartile Range
    mad      Mean Absolute Deviation
    range    Range
    std      Standard Deviation (in MATLAB)
    var      Variance

The range (the difference between the maximum and minimum values) is the simplest measure of spread. But if there is an outlier in the data, it will be the minimum or maximum value. Thus, the range is not robust to outliers.

The standard deviation and the variance are popular measures of spread that are optimal for normally distributed samples. The sample variance is the minimum variance unbiased estimator of the normal parameter σ². The standard deviation is the square root of the variance
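The measures of location in the example above can be recomputed with the Python standard library as a cross-check. The trimmed_mean helper below is a hypothetical stand-in: it assumes the trimming rule "drop about half of the stated percentage from each end", which may differ in detail from how trimmean selects points.

```python
import statistics

x = [1, 1, 1, 1, 1, 1, 100]   # six ones and one outlier

def trimmed_mean(data, percent):
    """Average after dropping about percent/2 of the points from each end."""
    k = round(len(data) * percent / 100 / 2)
    kept = sorted(data)[k:len(data) - k]
    return statistics.fmean(kept)

locate = [
    statistics.geometric_mean(x),
    statistics.harmonic_mean(x),
    statistics.fmean(x),
    statistics.median(x),
    trimmed_mean(x, 25),
]
print([round(v, 4) for v in locate])
```

Only the arithmetic mean is dragged far from the bulk of the data; the median and trimmed mean stay at 1.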
135. cdf(500,0.15,0.24)
    probability =
        0.4865

How sensitive is this result to small changes in the parameters?

    [A,B] = meshgrid(0.1:0.05:0.2,0.2:0.05:0.3);
    probability = weibcdf(500,A,B)
    probability =
        0.2929    0.4054    0.5000
        0.3768    0.5080    0.6116
        0.4754    0.6201    0.7248

weibfit

Purpose
Parameter estimates and confidence intervals for Weibull data.

Syntax

    phat = weibfit(x)
    [phat,pci] = weibfit(x)
    [phat,pci] = weibfit(x,alpha)

Description
phat = weibfit(x) returns the maximum likelihood estimates, phat, of the parameters of the Weibull distribution given the data in the vector x. phat is a two-element row vector. phat(1) estimates the Weibull parameter a, and phat(2) estimates b in the pdf

$$y = f(x \mid a,b) = a b x^{b-1} e^{-a x^{b}} I_{(0,\infty)}(x)$$

[phat,pci] = weibfit(x) also returns 95% confidence intervals in a matrix, pci, with 2 rows. The first row contains the lower bound of the confidence interval. The second row contains the upper bound. The columns of pci correspond to the columns of phat.

[phat,pci] = weibfit(x,alpha) allows control over the confidence interval returned, 100(1-alpha)%.

Example

    r = weibrnd(0.5,0.8,100,1);
    [phat,pci] = weibfit(r)
    phat =
        0.4746    0.7832
    pci =
        0.3851    0.6367
        0.5641    0.9298

See Also
betafit, binofit, expfit, gamfit, normfit, poissfit, unifit

weibinv

Purpose
Inverse of the Weibull cumulative distribution function.

Syntax

    X = wei
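The sensitivity table above follows directly from the Weibull cdf. A minimal Python sketch, assuming the parameterization F(x) = 1 - exp(-a x^b) that matches the pdf given for weibfit (weib_cdf is an illustrative name):

```python
import math

def weib_cdf(x, a, b):
    """Weibull cdf in the a, b parameterization: 1 - exp(-a * x**b)."""
    return 1.0 - math.exp(-a * x ** b)

print(round(weib_cdf(500, 0.15, 0.24), 4))

# Rows vary b, columns vary a, mirroring the meshgrid layout of the example.
for b in (0.20, 0.25, 0.30):
    print([round(weib_cdf(500, a, b), 4) for a in (0.10, 0.15, 0.20)])
```

Moving either parameter by 0.05 shifts the probability by roughly 0.1, so the result is quite sensitive to the parameter values.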
136. ction (pdf).

Syntax

    Y = unidpdf(X,N)

Description
unidpdf(X,N) computes the discrete uniform pdf with parameter settings, N, at the values in X. The arguments X and N must be the same size, except that a scalar argument functions as a constant matrix of the same size of the other argument.

The parameter N must be a positive integer.

The discrete uniform pdf is

$$y = f(x \mid N) = \frac{1}{N} I_{(1,\ldots,N)}(x)$$

You can think of y as the probability of observing any one number between 1 and n.

Examples
For fixed n, the uniform discrete pdf is a constant.

    y = unidpdf(1:6,10)
    y =
        0.1000    0.1000    0.1000    0.1000    0.1000    0.1000

Now fix x, and vary n.

    likelihood = unidpdf(5,4:9)
    likelihood =
             0    0.2000    0.1667    0.1429    0.1250    0.1111

unidrnd

Purpose
Random numbers from the discrete uniform distribution.

Syntax

    R = unidrnd(N)
    R = unidrnd(N,mm)
    R = unidrnd(N,mm,nn)

Description
The discrete uniform distribution arises from experiments equivalent to drawing a number from one to N out of a hat.

R = unidrnd(N) generates discrete uniform random numbers with maximum N. The size of R is the size of N.

R = unidrnd(N,mm) generates discrete uniform random numbers with maximum N. mm is a 1-by-2 vector that contains the row and column dimensions of R.

R = unidrnd(N,mm,nn) generates discrete uniform random numbers with maximum N. The scalars mm and nn are the row and column dimensions of R.

The parameter N must have positive integer ele
137. cumulative distribution function (cdf).

Syntax

    X = norminv(P,MU,SIGMA)

Description
norminv(P,MU,SIGMA) computes the inverse of the normal cdf with parameters MU and SIGMA at the values in P. The arguments P, MU, and SIGMA must all be the same size, except that scalar arguments function as constant matrices of the common size of the other arguments.

The parameter SIGMA must be positive and P must lie on [0 1].

We define the normal inverse function in terms of the normal cdf:

$$x = F^{-1}(p \mid \mu,\sigma) = \{x : F(x \mid \mu,\sigma) = p\}$$

where

$$p = F(x \mid \mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{(t-\mu)^{2}}{2\sigma^{2}}}\,dt$$

The result, x, is the solution of the integral equation above with the parameters μ and σ, where you supply the desired probability, p.

Examples
Find an interval that contains 95% of the values from a standard normal distribution.

    x = norminv([0.025 0.975],0,1)
    x =
       -1.9600    1.9600

Note the interval x is not the only such interval, but it is the shortest.

    xl = norminv([0.01 0.96],0,1)
    xl =
       -2.3263    1.7507

The interval xl also contains 95% of the probability, but it is longer than x.

normpdf

Purpose
Normal probability density function (pdf).

Syntax

    Y = normpdf(X,MU,SIGMA)

Description
normpdf(X,MU,SIGMA) computes the normal pdf with parameters MU and SIGMA at the values in X. The arguments X, MU, and SIGMA must all be the same size, except that scalar arguments function as constant matrices of the common size of the other arguments. The
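The two 95% intervals in the norminv example can be reproduced with the inverse cdf in Python's standard library. This sketch uses statistics.NormalDist purely as a cross-check; it says nothing about how norminv is implemented.

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)   # standard normal

x = [z.inv_cdf(0.025), z.inv_cdf(0.975)]   # central interval, 2.5% in each tail
xl = [z.inv_cdf(0.01), z.inv_cdf(0.96)]    # 1% in the lower tail, 4% in the upper

print([round(v, 4) for v in x], round(x[1] - x[0], 4))
print([round(v, 4) for v in xl], round(xl[1] - xl[0], 4))
```

Both intervals cover 95% of the probability, but the symmetric one is shorter because the normal density is highest around the mean.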
138. d be an overall difference in mileage due to a difference in the production methods between factories. There is probably a difference in the mileage of the different models (irrespective of the factory) due to differences in design specifications. These effects are called additive.

Finally, a factory might make high mileage cars in one model (perhaps because of a superior production line), but not be different from the other factory for other models. This effect is called an interaction. It is impossible to detect an interaction unless there are duplicate observations for some combination of factory and car model.

Two-way ANOVA is a special case of the linear model. The two-way ANOVA form of the model is

$$y_{ijk} = \mu + \alpha_{.j} + \beta_{i.} + \gamma_{ij} + \varepsilon_{ijk}$$

where:

• y_ijk is a matrix of observations.
• μ is a constant matrix of the overall mean.
• α.j is a matrix whose columns are the group means (the rows of α sum to 0).
• βi. is a matrix whose rows are the group means (the columns of β sum to 0).
• γij is a matrix of interactions (the rows and columns of γ sum to zero).
• εijk is a matrix of random disturbances.

The purpose of the example is to determine the effect of car model and factory on the mileage rating of cars.

    load mileage
    mileage
    mileage =
       33.3000   34.5000   37.4000
       33.4000   34.8000   36.8000
       32.9000   33.8000   37.6000
       32.6000   33.4000   36.6000
       32.5000   33.7000   37.0000
       33.0000   33.9000   36.7000

    cars = 3;
    p = anova2(mileage,cars)
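To make the decomposition behind the two-way model concrete, the sketch below partitions the total sum of squares of the mileage data into column (car model), row (factory), interaction, and error parts and forms the F ratios. It is a plain Python illustration of the textbook balanced two-way ANOVA, not the code anova2 runs; it assumes rows 1 to 3 belong to one factory and rows 4 to 6 to the other, and it stops at the F statistics because the standard library has no F distribution for p-values.

```python
mileage = [
    [33.3, 34.5, 37.4],
    [33.4, 34.8, 36.8],
    [32.9, 33.8, 37.6],
    [32.6, 33.4, 36.6],
    [32.5, 33.7, 37.0],
    [33.0, 33.9, 36.7],
]
reps = 3                          # replicate cars per factory/model cell
n_cols = len(mileage[0])          # car models
n_fact = len(mileage) // reps     # factories (blocks of rows)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

grand = mean(v for row in mileage for v in row)
col_means = [mean(row[j] for row in mileage) for j in range(n_cols)]
fac_means = [mean(v for row in mileage[f * reps:(f + 1) * reps] for v in row)
             for f in range(n_fact)]
cell_means = [[mean(mileage[f * reps + r][j] for r in range(reps))
               for j in range(n_cols)] for f in range(n_fact)]

ss_cols = n_fact * reps * sum((m - grand) ** 2 for m in col_means)
ss_rows = n_cols * reps * sum((m - grand) ** 2 for m in fac_means)
ss_err = sum((mileage[f * reps + r][j] - cell_means[f][j]) ** 2
             for f in range(n_fact) for r in range(reps) for j in range(n_cols))
ss_total = sum((v - grand) ** 2 for row in mileage for v in row)
ss_int = ss_total - ss_cols - ss_rows - ss_err   # interaction by subtraction

ms_err = ss_err / (n_fact * n_cols * (reps - 1))
f_cols = (ss_cols / (n_cols - 1)) / ms_err
f_rows = (ss_rows / (n_fact - 1)) / ms_err
f_int = (ss_int / ((n_cols - 1) * (n_fact - 1))) / ms_err

print(round(ss_cols, 3), round(ss_rows, 3), round(ss_int, 3), round(ss_err, 3))
print(round(f_cols, 2), round(f_rows, 2), round(f_int, 4))
```

The model effect dominates, the factory effect is smaller but clear, and the interaction F ratio is well below 1, in line with a large interaction p-value.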
139. d moving average chart for SPC.

Syntax

    ewmaplot(data)
    ewmaplot(data,lambda)
    ewmaplot(data,lambda,alpha)
    ewmaplot(data,lambda,alpha,specs)
    h = ewmaplot(...)

Description
ewmaplot(data) produces an EWMA chart of the grouped responses in data. The rows of data contain replicate observations taken at a given time. The rows should be in time order.

ewmaplot(data,lambda) produces an EWMA chart of the grouped responses in data, and specifies how much the current prediction is influenced by past observations. Higher values of lambda give more weight to past observations. By default, lambda = 0.4; lambda must be between 0 and 1.

ewmaplot(data,lambda,alpha) produces an EWMA chart of the grouped responses in data, and specifies the significance level of the upper and lower plotted confidence limits. alpha is 0.01 by default. This means that roughly 99% of the plotted points should fall between the control limits.

ewmaplot(data,lambda,alpha,specs) produces an EWMA chart of the grouped responses in data, and specifies a two-element vector, specs, for the lower and upper specification limits of the response.

h = ewmaplot(...) returns a vector of handles to the plotted lines.

Example
Consider a process with a slowly drifting mean over time. An EWMA chart is preferable to an x-bar chart for monitoring this kind of process. This simulation demonstrates an EWMA chart for a slow linear drift.

    t = (1:30)';
    r = normrnd(10+0.02*t(:,ones(4,1)),0
140. dels
    Nonlinear Models
    Hypothesis Tests
    Multivariate Statistics
    Statistical Plots
    Statistical Process Control (SPC)
    Design of Experiments (DOE)

Probability Distributions
    Overview of the Functions
        Probability Density Function (pdf)
        Cumulative Distribution Function (cdf)
        Inverse Cumulative Distribution Function
        Random Numbers
        Mean and Variance
    Overview of the Distributions
        Beta Distribution
        Binomial Distribution
        Chi-square (χ²) Distribution
        Noncentral Chi-square Distribution
        Discrete Uniform Distribution
        Exponential Distribution
        F Distribution
        Noncentral F Distribution
        Gamma Distribution
        Geometric Distribution
        Hypergeometric Distribution
        Lognormal Distribution
141. demo

Purpose
Demo of design of experiments and surface fitting.

Syntax

    rsmdemo

Description
rsmdemo creates a GUI that simulates a chemical reaction. To start, you have a budget of 13 test reactions. Try to find out how changes in each reactant affect the reaction rate. Determine the reactant settings that maximize the reaction rate. Estimate the run-to-run variability of the reaction. Now run a designed experiment using the model pop-up. Compare your previous results with the output from response surface modeling or nonlinear modeling of the reaction.

The GUI has the following elements:

• A Run button to perform one reactor run at the current settings.
• An Export button to export the X and y data to the base workspace.
• Three sliders with associated data entry boxes to control the partial pressures of the chemical reactants: Hydrogen, n-Pentane, and Isopentane.
• A text box to report the reaction rate.
• A text box to keep track of the number of test reactions you have left.

Example
See "The rsmdemo Demo."

See Also
rstool, nlintool, cordexch

rstool

Purpose
Interactive fitting and visualization of a response surface.

Syntax

    rstool(x,y)
    rstool(x,y,'model')
    rstool(x,y,'model',alpha,'xname','yname')

Description
rstool(x,y) displays an interactive prediction plot with 95% global confidence intervals. This plot results from a multiple regression
143. distribution differs from the binomial only in that the population is finite and the sampling from the population is without replacement.

The hypergeometric distribution has three parameters that have direct physical interpretation. M is the size of the population. K is the number of items with the desired characteristic in the population. n is the number of samples drawn. Sampling "without replacement" means that once a particular sample is chosen, it is removed from the relevant population for drawing the next sample.

Mathematical Definition

The hypergeometric pdf is

$$y = f(x \mid M,K,n) = \frac{\binom{K}{x}\binom{M-K}{n-x}}{\binom{M}{n}}$$

Example and Plot

The plot shows the cdf of an experiment taking 20 samples from a group of 1000 where there are 50 items of the desired type.

    x = 0:10;
    y = hygecdf(x,1000,50,20);
    stairs(x,y)

[Figure: stairstep plot of the hypergeometric cdf for x = 0 to 10.]

Lognormal Distribution

Background

The normal and lognormal distributions are closely related. If X is distributed lognormal with parameters μ and σ, then ln X is distributed normal with parameters μ and σ.

The lognormal distribution is applicable when the quantity of interest must be positive, since ln X exists only when the random variable X is positive. Economists often model the distribution of income using a lognormal distribution.

Mathematical Definition

The lognormal pdf is

$$y = f(x \mid \mu,\sigma) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{(\ln x - \mu)^{2}}{2\sigma^{2}}}$$
144. drogen as the independent variable. The second and third plots have n-pentane and isopentane respectively.

Each plot shows the fitted relationship of the reaction rate to the independent variable at a fixed value of the other two independent variables. The fixed value of each independent variable is in an editable text box below each axis. You can change the fixed value of any independent variable by either typing a new value in the box or by dragging any of the 3 vertical lines to a new position.

When you change the value of an independent variable, all the plots update to show the current picture at the new point in the space of the independent variables.

Note that while this example only uses three reactants, nlintool can accommodate an arbitrary number of independent variables. Interpretability may be limited by the size of the monitor for large numbers of inputs.

Hypothesis Tests

A hypothesis test is a procedure for determining if an assertion about a characteristic of a population is reasonable.

For example, suppose that someone says that the average price of a gallon of regular unleaded gas in Massachusetts is $1.15. How would you decide whether this statement is true? You could try to find out what every gas station in the state was charging and how many gallons they were selling at that price. That approach might be definitive, but it could end up costing more than the information is worth.

A simpler approach is to find
145. dying the effects of two machines and three operators on a process. The first column of group would have the values one or two depending on which machine was used. The second column of group would have the values one, two, or three depending on which operator ran the machine.

    group = [1 1;1 2;1 3;2 1;2 2;2 3];
    D = dummyvar(group)
    D =
        1     0     1     0     0
        1     0     0     1     0
        1     0     0     0     1
        0     1     1     0     0
        0     1     0     1     0
        0     1     0     0     1

See Also
pinv, regress

errorbar

Purpose
Plot error bars along a curve.

Syntax

    errorbar(X,Y,L,U,symbol)
    errorbar(X,Y,L)
    errorbar(Y,L)

Description
errorbar(X,Y,L,U,symbol) plots X versus Y with error bars specified by L and U. X, Y, L, and U must be the same length. If X, Y, L, and U are matrices, then each column produces a separate line. The error bars are each drawn a distance of U(i) above and L(i) below the points in (X,Y). symbol is a string that controls the line type, plotting symbol, and color of the error bars.

errorbar(X,Y,L) plots X versus Y with symmetric error bars about Y.

errorbar(Y,L) plots Y with error bars [Y-L Y+L].

Example

    lambda = (0.1:0.2:0.5);
    r = poissrnd(lambda(ones(50,1),:));
    [p,pci] = poissfit(r,0.001);
    L = p - pci(1,:)
    U = pci(2,:) - p
    errorbar(1:3,p,L,U)

    L =
        0.1200    0.1600    0.2600
    U =
        0.2000    0.2200    0.3400

[Figure: the three Poisson estimates plotted with asymmetric error bars.]

See Also
errorbar is a function in MATLAB.

ewmaplot

Purpose
Exponentially weighte
146. e reads the contents of filename and returns a string matrix of names. filename is the name of a file in the current directory or the complete pathname of any file elsewhere. caseread treats each line as a separate case.

names = caseread displays the File Open dialog box for interactive selection of the input file.

Example
Use the file months.dat created using the function casewrite.

    type months.dat

    January
    February
    March
    April
    May

    names = caseread('months.dat')
    names =
    January
    February
    March
    April
    May

See Also
tblread, gname, casewrite

casewrite

Purpose
Write casenames from a string matrix to a file.

Syntax

    casewrite(strmat,'filename')
    casewrite(strmat)

Description
casewrite(strmat,'filename') writes the contents of strmat to filename. Each row of strmat represents one casename. filename is the name of a file in the current directory or the complete pathname of any file elsewhere. casewrite writes each name to a separate line in filename.

casewrite(strmat) displays the File Open dialog box for interactive specification of the output file.

Example

    strmat = str2mat('January','February','March','April','May')
    strmat =
    January
    February
    March
    April
    May

    casewrite(strmat,'months.dat')
    type months.dat

    January
    February
    March
    April
    May

See Also
gname, caseread, tblwrite

cdf

Purpose
Computes a ch
147. e distribution function. One formulation uses a modified Bessel function of the first kind. Another uses the generalized Laguerre polynomials. The Statistics Toolbox computes the cumulative distribution function values using a weighted sum of χ² probabilities with the weights equal to the probabilities of a Poisson distribution. The Poisson parameter is one-half of the noncentrality parameter of the noncentral chi-square.

$$F(x \mid \nu,\delta) = \sum_{j=0}^{\infty} \left( \frac{(\delta/2)^{j}}{j!}\, e^{-\delta/2} \right) \Pr\left[ \chi^{2}_{\nu+2j} \le x \right]$$

Example and Plot

    x = (0:0.1:10)';
    p1 = ncx2pdf(x,4,2);
    p = chi2pdf(x,4);
    plot(x,p,x,p1)

[Figure: chi-square and noncentral chi-square densities for x = 0 to 10.]

Discrete Uniform Distribution

Background

The discrete uniform distribution is a simple distribution that puts equal weight on the integers from one to N.

Mathematical Definition

The discrete uniform pdf is

$$y = f(x \mid N) = \frac{1}{N} I_{(1,\ldots,N)}(x)$$

Example and Plot

As for all discrete distributions, the cdf is a step function. The plot shows the discrete uniform cdf for N = 10.

    x = 0:10;
    y = unidcdf(x,10);
    stairs(x,y)
    set(gca,'Xlim',[0 11])

[Figure: stairstep plot of the discrete uniform cdf for N = 10.]

To pick a random sample of 10 from a list of 553 items:

    numbers = unidrnd(553,1,10)
    numbers =
        293   372     5   213    37   231   380   326   515   468

Exponential Distribution

Background

Like the chi-square, the exponential distribution is a special case of the gamm
148. e interpolating contour plot
Weibull plotting

Statistical Process Control
    capable     Quality capability indices
    capaplot    Plot of process capability
    ewmaplot    Exponentially weighted moving average plot
    histfit     Histogram and normal density curve
    normspec    Plots normal density between limits
    schart      Time plot of standard deviation
    xbarplot    Time plot of means

Linear Models
    anova1      One-way Analysis of Variance (ANOVA)
    anova2      Two-way Analysis of Variance
    lscov       Regression given a covariance matrix (in MATLAB)
    polyconf    Polynomial prediction with confidence intervals
    polyfit     Polynomial fitting (in MATLAB)
    polyval     Polynomial prediction (in MATLAB)
    regress     Multiple linear regression
    ridge       Ridge regression
    rstool      Response surface tool
    stepwise    Stepwise regression GUI

Nonlinear Regression
    nlinfit     Nonlinear least-squares fitting
    nlintool    Prediction graph for nonlinear fits
    nlparci     Confidence intervals on parameters
    nlpredci    Confidence intervals for prediction
    nnls        Non-negative Least Squares (in MATLAB)

Design of Experiments
    cordexch    D-optimal design using coordinate exchange
    daugment    D-optimal augmentation of designs
    dcovary     D-optimal design with fixed covariates
    ff2n        Two-level full factorial designs
    fullfact    Mixed-level full factorial designs
    hadamard    Hadamard designs (in MATLAB)
    rowexch     D-optimal design using row exchange

Principal Compo
149. e other argument.

The parameter, λ, must be positive.

The Poisson pdf is

$$y = f(x \mid \lambda) = \frac{\lambda^{x}}{x!} e^{-\lambda} I_{(0,1,\ldots)}(x)$$

x can be any non-negative integer. The density function is zero unless x is an integer.

Examples
A computer hard disk manufacturer has observed that flaws occur randomly in the manufacturing process at the average rate of two flaws in a 4 Gb hard disk and has found this rate to be acceptable. What is the probability that a disk will be manufactured with no defects?

In this problem, λ = 2 and x = 0.

    p = poisspdf(0,2)
    p =
        0.1353

poissrnd

Purpose
Random numbers from the Poisson distribution.

Syntax

    R = poissrnd(LAMBDA)
    R = poissrnd(LAMBDA,m)
    R = poissrnd(LAMBDA,m,n)

Description
R = poissrnd(LAMBDA) generates Poisson random numbers with mean LAMBDA. The size of R is the size of LAMBDA.

R = poissrnd(LAMBDA,m) generates Poisson random numbers with mean LAMBDA. m is a 1-by-2 vector that contains the row and column dimensions of R.

R = poissrnd(LAMBDA,m,n) generates Poisson random numbers with mean LAMBDA. The scalars m and n are the row and column dimensions of R.

Examples
Generate a random sample of 10 pseudo-observations from a Poisson distribution with λ = 2.

    lambda = 2;
    random_sample1 = poissrnd(lambda,1,10)
    random_sample1 =
        1     0     1     2     1     3     4     2     0     0

    random_sample2 = poissrnd(lambda,[1 10])
    random_sample2 =
        1     1     1     5     0     3     2     2     3     4

    random_sample3 = poissrnd(lambda(ones(1,10)))
    random_sample3 =
        3     2
150. e size of R is the size of the other parameter.

R = frnd(V1,V2,m) generates random numbers from the F distribution with parameters V1 and V2. m is a 1-by-2 vector that contains the row and column dimensions of R.

R = frnd(V1,V2,m,n) generates random numbers from the F distribution with parameters V1 and V2. The scalars m and n are the row and column dimensions of R.

Examples

    n1 = frnd(1:6,1:6)
    n1 =
        0.0022    0.3121    3.0528    0.3189    0.2715    0.9539

    n2 = frnd(2,2,[2 3])
    n2 =
        0.3186    0.9727    3.0268
        0.2052  148.5816    0.2191

    n3 = frnd([1 2 3;4 5 6],1,2,3)
    n3 =
        0.6233    0.2322   31.5458
        2.5848    0.2121    4.4955

fstat

Purpose
Mean and variance for the F distribution.

Syntax

    [M,V] = fstat(V1,V2)

Description
For the F distribution:

• The mean, for values of ν2 greater than 2, is

$$\frac{\nu_2}{\nu_2 - 2}$$

• The variance, for values of ν2 greater than 4, is

$$\frac{2\nu_2^{2}(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 2)^{2}(\nu_2 - 4)}$$

The mean of the F distribution is undefined if ν2 is less than 3. The variance is undefined for ν2 less than 5.

Examples
fstat returns NaN when the mean and variance are undefined.

    [m,v] = fstat(1:5,1:5)
    m =
           NaN       NaN    3.0000    2.0000    1.6667
    v =
           NaN       NaN       NaN       NaN    8.8889

fsurfht

Purpose
Interactive contour plot of a function.

Syntax

    fsurfht('fun',xlims,ylims)
    fsurfht('fun',xlims,ylims,p1,p2,p3,p4,p5)

Description
fsurfht('fun',xlims,ylims) is an interactive contour plot of the function specified by the text variab
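The fstat example above can be verified by evaluating the mean and variance formulas for the F distribution directly. The following Python sketch does that; f_stat is a hypothetical helper that returns NaN where the text says the moments are undefined, mirroring the documented behavior rather than reproducing the toolbox code.

```python
import math

def f_stat(v1, v2):
    """Mean and variance of the F distribution; NaN where undefined."""
    mean = v2 / (v2 - 2) if v2 > 2 else math.nan
    if v2 > 4:
        var = 2 * v2 ** 2 * (v1 + v2 - 2) / (v1 * (v2 - 2) ** 2 * (v2 - 4))
    else:
        var = math.nan
    return mean, var

for v in range(1, 6):
    m, var = f_stat(v, v)
    print(v, round(m, 4), round(var, 4))
```

With both degrees of freedom equal, the mean first becomes defined at 3 and the variance at 5, matching the pattern of NaN values in the example output.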
151. eage were truly equal from factory to factory.

There does not appear to be any interaction between factories and models. The p-value, 0.8411, means that the observed result is quite likely (84 out of 100 times) given that there is no interaction.

The p-values returned by anova2 depend on assumptions about the random disturbances in the model equation. For the p-values to be correct, these disturbances need to be independent, normally distributed, and have constant variance.

Multiple Linear Regression

The purpose of multiple linear regression is to establish a quantitative relationship between a group of predictor variables (the columns of X) and a response, y. This relationship is useful for:

• Understanding which predictors have the most effect.
• Knowing the direction of the effect (i.e., increasing x increases or decreases y).
• Using the model to predict future values of the response when only the predictors are currently known.

The linear model takes its common form:

$$y = X\beta + \varepsilon$$

where

• y is an n-by-1 vector of observations.
• X is an n-by-p matrix of regressors.
• $\beta$ is a p-by-1 vector of parameters.
• $\varepsilon$ is an n-by-1 vector of random disturbances.

The solution to the problem is a vector, b, which estimates the unknown vector of parameters, $\beta$. The least squares solution is:

$$b = \hat{\beta} = (X^{T}X)^{-1}X^{T}y$$

This equation is useful for developing later statistical formulas, but has poor numeric properties. regress uses QR decomposition of
152. ecdf(X,M,K,N)

Description: hygecdf(X,M,K,N) computes the hypergeometric cdf with parameters M, K, and N at the values in X. The arguments X, M, K, and N must all be the same size except that scalar arguments function as constant matrices of the common size of the other arguments.

The hypergeometric cdf is

$$p = F(x \mid M, K, N) = \sum_{i=0}^{x} \frac{\binom{K}{i}\binom{M-K}{N-i}}{\binom{M}{N}}$$

The result, p, is the probability of drawing up to x items of a possible K in N drawings without replacement from a group of M objects.

Examples: Suppose you have a lot of 100 floppy disks and you know that 20 of them are defective. What is the probability of drawing zero to two defective floppies if you select 10 at random?

    p = hygecdf(2,100,20,10)
    p =
        0.6812

hygeinv

Purpose: Inverse of the hypergeometric cumulative distribution function (cdf).

Syntax:
    X = hygeinv(P,M,K,N)

Description: hygeinv(P,M,K,N) returns the smallest integer X such that the hypergeometric cdf evaluated at X equals or exceeds P. You can think of P as the probability of observing X defective items in N drawings without replacement from a group of M items where K are defective.

Examples: Suppose you are the Quality Assurance manager of a floppy disk manufacturer. The production line turns out floppy disks in batches of 1,000. You want to sample 50 disks from each batch to see if they have defects. You want to accept 99% of the batches if there are no more than 10 defective disks in the batch. What is the maximum
153. ed covariates.

Syntax:
    settings = dcovary(factors,covariates)
    [settings,X] = dcovary(factors,covariates,'model')

Description: settings = dcovary(factors,covariates,'model') creates a D-optimal design subject to the constraint of fixed covariates for each run. factors is the number of experimental variables you wish to control.

[settings,X] = dcovary(factors,covariates,'model') also creates the associated design matrix, X. The input 'model' controls the order of the regression model. By default, dcovary assumes a linear additive model. Alternatively, 'model' can be any of these:

• 'interaction' includes constant, linear, and cross-product terms.
• 'quadratic' is interactions plus squared terms.
• 'purequadratic' includes constant, linear, and squared terms.

Example: Suppose we wish to block an 8-run experiment into 4 blocks of size 2 to fit a linear model on two factors.

    covariates = dummyvar([1 1 2 2 3 3 4 4]);
    settings = dcovary(2,covariates(:,1:3),'linear')
    settings =
        [8-by-5 matrix; values not recoverable from the source]

The first two columns of the output matrix contain the settings for the two factors. The last 3 columns are dummy variable codings for the 4 blocks.

See Also: daugment, cordexch

disttool

Purpose: Interactive graph of cdf (or pdf) for many probability distributions.

Syntax:
    disttool

Description: The disttool command sets up a graphic user interface for exploring the effects of changing parameters on
154. The Bootstrap

In the last decade the statistical literature has examined the properties of resampling as a means to acquire information about the uncertainty of statistical estimators.

The bootstrap is a procedure that involves choosing random samples with replacement from a data set and analyzing each sample the same way. Sampling with replacement means that every sample is returned to the data set after sampling. So a particular data point from the original data set could appear multiple times in a given bootstrap sample. The number of elements in each bootstrap sample equals the number of elements in the original data set. The range of sample estimates we obtain allows us to establish the uncertainty of the quantity we are estimating.

Here is an example taken from Efron and Tibshirani (1993) comparing LSAT scores and subsequent law school GPA for a sample of 15 law schools.

    load lawdata
    plot(lsat,gpa,'+')
    lsline

[Figure: law school GPA plotted against LSAT score, with the least squares line]

The least squares fit line indicates that higher LSAT scores go with higher law school GPAs. But how sure are we of this conclusion? The plot gives us some intuition but nothing quantitative.

We can calculate the correlation coefficient of the variables using the corrcoef function.

    rhohat = corrcoef(lsat,gpa)
    rhohat =
        1.0000    0.7764
        0.7764    1.0000

Now we have a number, 0.7764, describing
155. emo
Reaction kinetics data
ASCII data for tblread example

anova1

Purpose: One-way Analysis of Variance (ANOVA).

Syntax:
    p = anova1(X)
    p = anova1(x,group)

Description: anova1(X) performs a balanced one-way ANOVA for comparing the means of two or more columns of data on the sample in X. It returns the p-value for the null hypothesis that the means of the columns of X are equal. If the p-value is near zero, this casts doubt on the null hypothesis and suggests that the means of the columns are, in fact, different.

anova1(x,group) performs a one-way ANOVA for comparing the means of two or more samples of data in x indexed by the vector, group. The input, group, identifies the group of the corresponding element of the vector x.

The values of group are integers with minimum equal to one and maximum equal to the number of different groups to compare. There must be at least one element in each group. This two-input form of anova1 does not require equal numbers of elements in each group, so it is appropriate for unbalanced data.

The choice of a limit for the p-value to determine whether the result is "statistically significant" is left to the researcher. It is common to declare a result significant if the p-value is less than 0.05 or 0.01.

anova1 also displays two figures.

The first figure is the standard ANOVA table, which divides the variability of the data in X into two parts:

• The variability due to the differe
156. ence interval lines to toggle the state of the model coefficients. If the confidence interval line is green, the term is in the model. If the confidence interval line is red, the term is not in the model.

Use the pop-up menu, Export, to move variables to the base workspace. See "Stepwise Regression" on page 1-61.

Reference: Draper, Norman and Smith, Harry, Applied Regression Analysis, Second Edition, John Wiley & Sons, Inc., 1981, pp. 307-312.

See Also: regstats, regress, rstool

surfht

Purpose: Interactive contour plot.

Syntax:
    surfht(Z)
    surfht(x,y,Z)

Description: surfht(Z) is an interactive contour plot of the matrix Z, treating the values in Z as height above the plane. The x-values are the column indices of Z while the y-values are the row indices of Z.

surfht(x,y,Z), where x and y are vectors, specifies the x and y axes on the contour plot. The length of x must match the number of columns in Z, and the length of y must match the number of rows in Z.

There are vertical and horizontal reference lines on the plot whose intersection defines the current x-value and y-value. You can drag these dotted white reference lines and watch the interpolated z-value (at the top of the plot) update simultaneously. Alternatively, you can get a specific interpolated z-value by typing the x-value and y-value into editable text fields on the x-axis and y-axis respectively.

tabulate

Purpose Syntax Description Example See A
157. ent of H.

The left-hand side of the second equation is the estimate of the variance of the errors excluding the ith data point from the calculation.

A hypothesis test for outliers involves comparing $t_i$ with the critical values of the t distribution. If $t_i$ is large, this casts doubt on the assumption that this residual has the same variance as the others.

A confidence interval for the mean of each error is

    [equation not recoverable from the source]

Confidence intervals that do not include zero are equivalent to rejecting the hypothesis (at a significance probability of $\alpha$) that the residual mean is zero. Such confidence intervals are good evidence that the observation is an outlier for the given model.

Example

The example comes from Chatterjee and Hadi (1986) in a paper on regression diagnostics. The dataset (originally from Moore (1975)) has five predictor variables and one response.

    load moore
    X = [ones(size(moore,1),1) moore(:,1:5)];

The matrix, X, has a column of ones, then one column of values for each of the five predictor variables. The column of ones is necessary for estimating the y-intercept of the linear model.

    y = moore(:,6);
    [b,bint,r,rint,stats] = regress(y,X);

The y-intercept is b(1), which corresponds to the column index of the column of ones.

    stats
    stats =
        0.8107   11.9886    0.0001

The elements of the vector stats are the regression $R^2$ statistic, the F statistic (for the hypothesis test that all the regression coeff
158. ents over time with statistical limits applied. Actually, control chart is a slight misnomer. The chart itself is actually a monitoring tool. The control activity may occur if the chart indicates that the process is changing in an undesirable systematic direction.

The Statistics Toolbox supports three common control charts:

• Xbar charts
• S charts
• Exponentially weighted moving average (EWMA) charts

Xbar Charts

Xbar charts are a plot of the average of a sample of a process taken at regular intervals. Suppose we are manufacturing pistons to a tolerance of 0.5 thousandths of an inch. We measure the runout (deviation from circularity in thousandths of an inch) at 4 points on each piston.

    load parts
    conf = 0.99;
    spec = [-0.5 0.5];
    xbarplot(runout,conf,spec)

[Figure: Xbar Chart; measurements plotted against samples, with USL, UCL, LCL, and LSL lines]

The lines at the bottom and the top of the plot show the process specifications. The central line is the average runout over all the pistons. The two lines flanking the center line are the 99% statistical control limits. By chance only one measurement in 100 should fall outside these lines. We can see that even in this small run of 36 parts, there are several points outside the boundaries (labeled by their observation numbers). This is an indication that the process mean is not in statistical control. This might not be of much concern in
159. equation of the beta cdf with parameters a and b where you supply the desired probability, p.

Algorithm: We use Newton's Method with modifications to constrain steps to the allowable range for x, i.e., [0 1].

Example:

    p = [0.01 0.5 0.99];
    x = betainv(p,10,5)
    x =
        0.3726    0.6742    0.8981

betalike

Purpose: Negative beta log-likelihood function.

Syntax:
    logL = betalike(params,data)
    [logL,info] = betalike(params,data)

Description: logL = betalike(params,data) returns the negative of the beta log-likelihood function for the two beta parameters, params, given the column vector, data. The length of logL is the length of data.

[logL,info] = betalike(params,data) also returns Fisher's information matrix, info. The diagonal elements of info are the asymptotic variances of their respective parameters.

betalike is a utility function for maximum likelihood estimation of the beta distribution. The likelihood assumes that all the elements in the data sample are mutually independent. Since betalike returns the negative beta log-likelihood function, minimizing betalike using fmins is the same as maximizing the likelihood.

Example: This continues the example for betafit, where we calculated estimates of the beta parameters for some randomly generated beta distributed data.

    r = betarnd(4,3,100,1);
    [logl,info] = betalike([3.9010 2.6193],r)
    logl =
        -33.0514
    info =
        0.2856    0.1528
        0.1528    0.1142

See Also: betafit, fmins
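The quantity that betalike describes can be written out from the beta pdf. The following is a minimal sketch in base MATLAB, not the toolbox source: the function handle negloglik and the five data values are hypothetical, and the sketch returns a single scalar (the sum over all observations) rather than reproducing the exact output shape of betalike.

```matlab
% Negative beta log-likelihood written out from the beta pdf:
%   log f(x | a,b) = (a-1)*log(x) + (b-1)*log(1-x) - log(B(a,b))
% negloglik is an illustrative name, not a toolbox function.
negloglik = @(params, data) -sum( ...
    (params(1) - 1) * log(data) + ...
    (params(2) - 1) * log(1 - data) - ...
    betaln(params(1), params(2)));

data = [0.35; 0.55; 0.62; 0.71; 0.48];   % hypothetical sample in (0,1)
nll = negloglik([4 3], data);

fprintf('negative log-likelihood = %.4f\n', nll);
```

Because the sign is flipped, a minimizer applied to this function over params finds the same parameters that maximize the likelihood.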
160. ere are several functions in the Statistics Toolbox that generate D-optimal designs. These are cordexch, daugment, dcovary, and rowexch.

Generating D-optimal Designs

cordexch and rowexch are two competing optimization algorithms for computing a D-optimal design given a model specification.

Both cordexch and rowexch are iterative algorithms. They operate by improving a starting design by making incremental changes to its elements. In the coordinate exchange algorithm, the increments are the individual elements of the design matrix. In row exchange, the elements are the rows of the design matrix. Atkinson and Donev (1992) is a reference.

To generate a D-optimal design you must specify the number of inputs, the number of runs, and the order of the model you wish to fit.

Both cordexch and rowexch take the following strings to specify the model:

• 'linear' ('l'): the default model with constant and first order terms.
• 'interaction' ('i'): includes constant, linear, and cross-product terms.
• 'quadratic' ('q'): interactions plus squared terms.
• 'purequadratic' ('p'): includes constant, linear, and squared terms.

Alternatively, you can use a matrix of integers to specify the terms. Details are in the help for the utility function x2fx.

For a simple example using the coordinate-exchange algorithm, consider the problem of quadratic modeling with two inputs. The model form is:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \beta_{11} x_1^{2} + \beta_{22} x_2^{2} + \varepsilon$$
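To make the quadratic model form concrete, the sketch below expands a set of factor settings into one column per model term and evaluates det(X'*X), the quantity that a D-optimal design makes as large as possible. It uses only base MATLAB. The 3-level grid of nine runs is a hypothetical starting design chosen for illustration, and the column order is an assumption that need not match the order used by x2fx.

```matlab
% Expand two-factor settings into the terms of the full quadratic model:
%   constant, x1, x2, x1*x2, x1^2, x2^2
[a, b] = ndgrid([-1 0 1]);     % hypothetical 3-level grid: 9 runs
settings = [a(:) b(:)];        % one row per run, one column per factor

x1 = settings(:,1);
x2 = settings(:,2);
X = [ones(size(x1)) x1 x2 x1.*x2 x1.^2 x2.^2];   % 9-by-6 design matrix

% The D-optimality criterion is the determinant of X'*X.
d = det(X' * X);
fprintf('det(X''*X) = %g\n', d);
```

A design with fewer than six distinct runs cannot support this model: X would have fewer rows than columns, so X'*X would be singular and the determinant zero.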
161. ervals, delta, on the nonlinear least squares predictions, pred. The confidence interval calculation is valid for systems where the length of r exceeds the length of beta and j is of full column rank.

nlpredci uses the outputs of nlinfit for its inputs.

Example: Continuing the example from nlinfit:

    load reaction
    [beta,resids,J] = nlinfit(reactants,rate,'hougen',beta);
    ci = nlpredci('hougen',reactants,beta,resids,J)
    ci =
        [13 values; the source preserves only their fractional digits:
         2937 8584 7950 0729 5687 2227 4393 9360 9440 2670 1437 3484 3145]

See Also: nlinfit, nlintool, nlparci

normcdf

Purpose: Normal cumulative distribution function (cdf).

Syntax:
    P = normcdf(X,MU,SIGMA)

Description: normcdf(X,MU,SIGMA) computes the normal cdf with parameters MU and SIGMA at the values in X. The arguments X, MU, and SIGMA must all be the same size except that scalar arguments function as constant matrices of the common size of the other arguments. The parameter SIGMA must be positive.

The normal cdf is

$$p = F(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{(t-\mu)^{2}}{2\sigma^{2}}}\, dt$$

The result, p, is the probability that a single observation from a normal distribution with parameters $\mu$ and $\sigma$ will fall in the interval $(-\infty, x]$.

The standard normal distribution has $\mu = 0$ and $\sigma = 1$.

Examples: What is the probability that an observation from a standard normal distribution will fall on the interval [-1 1]?

    p = normcdf([-1 1]);
    p(2) - p(1)
162. ess of the elements of x. For matrices, skewness(X) is a row vector containing the sample skewness of each column.

Skewness is a measure of the asymmetry of the data around the sample mean. If skewness is negative, the data are spread out more to the left of the mean than to the right. If skewness is positive, the data are spread out more to the right. The skewness of the normal distribution (or any perfectly symmetric distribution) is zero.

The skewness of a distribution is defined as

$$y = \frac{E(x - \mu)^{3}}{\sigma^{3}}$$

where E(x) is the expected value of x.

Example:

    X = randn([5 4])
    X =
        1.1650    1.6961    1.4462    0.3600
        0.6268    0.0591    0.7012    0.1356
        0.0751    1.7971    1.2460    1.3493
        0.3516    0.2641    0.6390    1.2704
        0.6965    0.8717    0.5774    0.9846

    y = skewness(X)
    y =
        0.2933    0.0482    0.2735    0.4641

See Also: kurtosis, mean, moment, std, var

std

Purpose: Standard deviation of a sample.

Syntax:
    y = std(X)

Description: std(X) computes the sample standard deviation of the data in X. For vectors, std(x) is the standard deviation of the elements in x. For matrices, std(X) is a row vector containing the standard deviation of each column of X.

std(X) normalizes by n-1 where n is the sequence length. For normally distributed data, the square of the standard deviation is the minimum variance unbiased estimator of $\sigma^2$ (the second parameter).

The standard deviation is

$$s = \left( \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^{2} \right)^{\frac{1}{2}}$$

where the sample average is
163. ew such driving forces. But an abundance of instrumentation allows us to measure dozens of system variables. When this happens, we can take advantage of this redundancy of information. We can simplify our problem by replacing a group of variables with a single new variable.

Principal Components Analysis is a quantitatively rigorous method for achieving this simplification. The method generates a new set of variables, called principal components. Each principal component is a linear combination of the original variables. All the principal components are orthogonal to each other so there is no redundant information. The principal components as a whole form an orthogonal basis for the space of the data.

There are an infinite number of ways to construct an orthogonal basis for several columns of data. What is so special about the principal component basis?

The first principal component is a single axis in space. When you project each observation on that axis, the resulting values form a new variable. And the variance of this variable is the maximum among all possible choices of the first axis.

The second principal component is another axis in space, perpendicular to the first. Projecting the observations on this axis generates another new variable. The variance of this variable is the maximum among all possible choices of this second axis.

The full set of principal components is as large as the original set of variables. But
164. f

Purpose: Poisson cumulative distribution function (cdf).

Syntax:
    P = poisscdf(X,LAMBDA)

Description: poisscdf(X,LAMBDA) computes the Poisson cdf with parameter settings LAMBDA at the values in X. The arguments X and LAMBDA must be the same size except that a scalar argument functions as a constant matrix of the same size of the other argument. The parameter, LAMBDA, is positive.

The Poisson cdf is

$$p = F(x \mid \lambda) = e^{-\lambda} \sum_{i=0}^{\lfloor x \rfloor} \frac{\lambda^{i}}{i!}$$

Examples: Quality Assurance performs random tests of individual hard disks. Their policy is to shut down the manufacturing process if an inspector finds more than four bad sectors on a disk. What is the probability of shutting down the process if the mean number of bad sectors ($\lambda$) is two?

    probability = 1 - poisscdf(4,2)
    probability =
        0.0527

About 5% of the time, a normally functioning manufacturing process will produce more than four flaws on a hard disk.

Suppose the average number of flaws ($\lambda$) increases to four. What is the probability of finding fewer than five flaws on a hard drive?

    probability = poisscdf(4,4)
    probability =
        0.6288

This means that this faulty manufacturing process continues to operate after this first inspection almost 63% of the time.

poissfit

Purpose: Parameter estimates and confidence intervals for Poisson data.

Syntax:
    lambdahat = poissfit(X)
    [lambdahat,lambdaci] = poissfit(X)
    [lambdah
165. file reaction.mat contains simulated data from this reaction.

    load reaction
    who

    Your variables are:

    beta         rate         xn
    model        reactants    yn

The Variables:

• rate is a vector of observed reaction rates 13-by-1.
• reactants is a three column matrix of reactants 13-by-3.
• beta is vector of initial parameter estimates 5-by-1.
• 'model' is a string containing the nonlinear function name.
• 'xn' is a string matrix of the names of the reactants.
• 'yn' is a string containing the name of the response.

Fitting the Hougen-Watson Model

The Statistics Toolbox provides the function nlinfit for finding parameter estimates in nonlinear modeling. nlinfit returns the least-squares parameter estimates. That is, it finds the parameters that minimize the sum of the squared differences between the observed responses and their fitted values. It uses the Gauss-Newton algorithm with Levenberg-Marquardt modifications for global convergence.

nlinfit requires the input data, the responses, and an initial guess of the unknown parameters. You must also supply a function that takes the input data and the current parameter estimate and returns the predicted responses. In MATLAB this is called a "function" function.

Here is the hougen function:

    function yhat = hougen(beta,x)
    %HOUGEN Hougen-Watson model for reaction kinetics.
    %   YHAT = HOUGEN(BETA,X) gives the predicted values of the
    %   reaction rate, YHAT, as a function of the vector of
    %   parameters, BE
166. fit

gaminv

Purpose: Inverse of the gamma cumulative distribution function (cdf).

Syntax:
    X = gaminv(P,A,B)

Description: gaminv(P,A,B) computes the inverse of the gamma cdf with parameters A and B for the probabilities in P. The arguments P, A, and B must all be the same size except that scalar arguments function as constant matrices of the common size of the other arguments. The parameters A and B must both be positive and P must lie on the interval [0 1].

The gamma inverse function in terms of the gamma cdf is

$$x = F^{-1}(p \mid a, b) = \{\, x : F(x \mid a, b) = p \,\}$$

where

$$p = F(x \mid a, b) = \frac{1}{b^{a}\,\Gamma(a)} \int_{0}^{x} t^{a-1} e^{-\frac{t}{b}}\, dt$$

Algorithm: There is no known analytic solution to the integral equation above. gaminv uses an iterative approach (Newton's method) to converge to the solution.

Examples: This example shows the relationship between the gamma cdf and its inverse function.

    a = 1:5;
    b = 6:10;
    x = gaminv(gamcdf(1:5,a,b),a,b)
    x =
        1.0000    2.0000    3.0000    4.0000    5.0000

gamlike

Purpose: Negative gamma log-likelihood function.

Syntax:
    logL = gamlike(params,data)
    [logL,info] = gamlike(params,data)

Description: logL = gamlike(params,data) returns the negative of the gamma log-likelihood function for the parameters, params, given data. The length of the vector, logL, is the length of the vector, data.

[logL,info] = gamlike(params,data) adds Fisher's information matrix, info. The diagonal eleme
167. gned to make this visualization more intuitive.

An Interactive GUI for Response Surface Fitting and Prediction

The function rstool is useful for fitting response surface models. The purpose of rstool is larger than just fitting and prediction for polynomial models. This GUI provides an environment for exploration of the graph of a multidimensional polynomial.

You can learn about rstool by trying the commands below. The chemistry behind the data in reaction.mat deals with reaction kinetics as a function of the partial pressure of three chemical reactants: hydrogen, n-pentane, and isopentane.

    load reaction
    rstool(reactants,rate,'quadratic',0.01,xn,yn)

You will see a "vector" of three plots. The dependent variable of all three plots is the reaction rate. The first plot has hydrogen as the independent variable. The second and third plots have n-pentane and isopentane respectively.

Each plot shows the fitted relationship of the reaction rate to the independent variable at a fixed value of the other two independent variables. The fixed value of each independent variable is in an editable text box below each axis. You can change the fixed value of any independent variable by either typing a new value in the box or by dragging any of the 3 vertical lines to a new position.

When you change the value of an independent variable, all the plots update to show the current picture at the new point in the space of the independent vari
168. hat a single observation from an F distribution with parameters $\nu_1$ and $\nu_2$ will fall in the interval [0 x].

Examples: This example illustrates an important and useful mathematical identity for the F distribution.

    nu1 = 1:5;
    nu2 = 6:10;
    x = 2:6;

    F1 = fcdf(x,nu1,nu2)
    F1 =
        0.7930    0.8854    0.9481    0.9788    0.9919

    F2 = 1 - fcdf(1./x,nu2,nu1)
    F2 =
        0.7930    0.8854    0.9481    0.9788    0.9919

ff2n

Purpose: Two-level full-factorial designs.

Syntax:
    X = ff2n(n)

Description: X = ff2n(n) creates a two-level full-factorial design, X. n is the number of columns of X. The number of rows is $2^n$.

Example:

    X = ff2n(3)
    X =
        0  0  0
        0  0  1
        0  1  0
        0  1  1
        1  0  0
        1  0  1
        1  1  0
        1  1  1

X is the binary representation of the numbers from 0 to $2^n - 1$.

See Also: fullfact

finv

Purpose: Inverse of the F cumulative distribution function (cdf).

Syntax:
    X = finv(P,V1,V2)

Description: finv(P,V1,V2) computes the inverse of the F cdf with numerator degrees of freedom, V1, and denominator degrees of freedom, V2, for the probabilities in P. The arguments P, V1, and V2 must all be the same size except that scalar arguments function as constant matrices of the common size of the other arguments.

The parameters V1 and V2 must both be positive integers and P must lie on the interval [0 1].

The F inverse function is defined in terms of the F cdf:

$$x = F^{-1}(p \mid \nu_1, \nu_2) = \{\, x : F(x \mid \nu_1, \nu_2) = p \,\}$$

where $p = F(x \mid \nu_1, \nu_2) =$
169. he NaN functions support the tabled arithmetic operations ignoring NaN.

    nansum(m)

NaN functions

    nanmax       Maximum ignoring NaNs
    nanmean      Mean ignoring NaNs
    nanmedian    Median ignoring NaNs
    nanmin       Minimum ignoring NaNs
    nanstd       Standard deviation ignoring NaNs
    nansum       Sum ignoring NaNs

Percentiles and Graphical Descriptions

Trying to describe a data sample with two numbers, a measure of location and a measure of spread, is frugal but may be misleading. Another option is to compute a reasonable number of the sample percentiles. This provides information about the shape of the data as well as its location and spread.

The example shows the result of looking at every quartile of a sample containing a mixture of two distributions.

    x = [normrnd(4,1,1,100) normrnd(6,0.5,1,200)];
    p = 100*(0:0.25:1);
    y = prctile(x,p);
    z = [p;y]
    z =
             0   25.0000   50.0000   75.0000  100.0000
        1.5172    4.6842    5.6706    6.1804    7.6035

Compare the first two quantiles to the rest.

The box plot is a graph for descriptive statistics. The graph below is a box plot of the data above.

    boxplot(x)

[Figure: box plot; values against column number]

The long lower tail and plus signs show the lack of symmetry in the sample values. For more information on box plots see page 1-88.

The histogram is a complementary graph.

    hist(x)

[Figure: histogram of x]
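The percentile calculation described in this section can be sketched in base MATLAB by sorting the sample and interpolating between the sorted values. The plotting positions 100*(i - 0.5)/n used here are an assumed convention, so the results may differ slightly from those of prctile, and the ten-point sample is hypothetical.

```matlab
% Sample percentiles by linear interpolation between sorted values.
x = 1:10;                        % hypothetical sample
p = [0 25 50 75 100];            % requested percentiles

xs = sort(x(:))';                % sorted data as a row vector
n = numel(xs);
pp = 100 * ((1:n) - 0.5) / n;    % assumed percentile of each sorted value

% Pad both ends so that 0 and 100 map to the sample minimum and maximum.
q = interp1([0 pp 100], [xs(1) xs xs(end)], p);

disp([p; q]);                    % percentiles in row 1, values in row 2
```

Reading the second row against the first gives the same kind of summary as the matrix z in the quartile example.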
170. he matrix training. sample and training must have the same number of columns. training and group must have the same number of rows. class is a vector with the same number of rows as sample.

Example:

    load discrim
    sample = ratings(idx,:);
    training = ratings(1:200,:);
    g = group(1:200);
    class = classify(sample,training,g);
    first5 = class(1:5)
    first5 =
        2
        2
        2
        2
        2

See Also: mahal

combnk

Purpose: Enumeration of all combinations of n objects k at a time.

Syntax:
    C = combnk(v,k)

Description: C = combnk(v,k) returns all combinations of the n elements in v taken k at a time.

C = combnk(v,k) produces a matrix with k columns. Each row of C has k of the elements in the vector v. C has $\frac{n!}{k!\,(n-k)!}$ rows.

It is not feasible to use this function if v has more than about 10 elements.

Example: Combinations of characters from a string.

    c = combnk('bradley',4);
    last5 = c(31:35,:)
    last5 =
        brdl
        bray
        brae
        bral
        brad

Combinations of elements from a numeric vector.

    c = combnk(1:4,2)
    c =
        [6-by-2 matrix listing the six pairs drawn from 1:4; values not recoverable from the source]

cordexch

Purpose: D-optimal design of experiments, coordinate exchange algorithm.

Syntax:
    settings = cordexch(nfactors,nruns)
    [settings,X] = cordexch(nfactors,nruns)
    [settings,X] = cordexch(nfactors,nruns,'model')

Description: settings = cordexch(nfactors,nruns) generates the factor settings matrix, settings, for a D-optimal design using a linear additive model with a co
171. he sample. The dashed line extends the solid line to the ends of the sample.

The scale of the y-axis is not uniform. The y-axis values are probabilities and, as such, go from zero to one. The distance between the tick marks on the y-axis matches the distance between the quantiles of a normal distribution. The quantiles are close together near the median (probability = 0.5) and stretch out symmetrically moving away from the median. Compare the vertical distance from the bottom of the plot to the probability 0.25 with the distance from 0.25 to 0.50. Similarly, compare the distance from the top of the plot to the probability 0.75 with the distance from 0.75 to 0.50.

If all the data points fall near the line, the assumption of normality is reasonable. But, if the data is nonnormal, the plus signs may follow a curve, as in the example using exponential data below.

    x = exprnd(10,100,1);
    normplot(x)

[Figure: Normal Probability Plot; probability against data]

This plot is clear evidence that the underlying distribution is not normal.

Quantile-Quan
172.
    i2rnd       Chi-square random numbers
    exprnd      Exponential random numbers
    frnd        F random numbers
    gamrnd      Gamma random numbers

Random Number Generators

    geornd      Geometric random numbers
    hygernd     Hypergeometric random numbers
    lognrnd     Lognormal random numbers
    nbinrnd     Negative binomial random numbers
    ncfrnd      Noncentral F random numbers
    nctrnd      Noncentral t random numbers
    ncx2rnd     Noncentral Chi-square random numbers
    normrnd     Normal (Gaussian) random numbers
    poissrnd    Poisson random numbers
    raylrnd     Rayleigh random numbers
    random      Parameterized random number routine
    trnd        Student's t random numbers
    unidrnd     Discrete uniform random numbers
    unifrnd     Continuous uniform random numbers
    weibrnd     Weibull random numbers

Moments of Distribution Functions

    betastat    Beta mean and variance
    binostat    Binomial mean and variance
    chi2stat    Chi-square mean and variance
    expstat     Exponential mean and variance
    fstat       F mean and variance
    gamstat     Gamma mean and variance
    geostat     Geometric mean and variance
    hygestat    Hypergeometric mean and variance
    lognstat    Lognormal mean and variance
    nbinstat    Negative binomial mean and variance
    ncfstat     Noncentral F mean and variance
    nctstat     Noncentral t mean and variance
    ncx2stat    Noncentral Chi-square mean and variance
    normstat    Normal (Gaussian) mean and variance
    poisstat    Poisson mean and variance
    raylstat
    tstat
    unidstat
    unifstat
    weibstat
173. ibutions: continuous and discrete.

    Continuous (data)    Continuous (statistics)    Discrete
    Beta                 Chi-square                 Binomial
    Exponential          Noncentral Chi-square      Discrete Uniform
    Gamma                F                          Geometric
    Lognormal            Noncentral F               Hypergeometric
    Normal               t                          Negative Binomial
    Rayleigh             Noncentral t               Poisson
    Uniform
    Weibull

Suppose you are studying a machine that produces videotape. One measure of the quality of the tape is the number of visual defects per hundred feet of tape. The result of this experiment is an integer, since you cannot observe 1.5 defects. To model this experiment you should use a discrete probability distribution.

A measure affecting the cost and quality of videotape is its thickness. Thick tape is more expensive to produce, while variation in the thickness of the tape on the reel increases the likelihood of breakage. Suppose you measure the thickness of the tape every 1000 feet. The resulting numbers can take a continuum of possible values, which suggests using a continuous probability distribution to model the results.

Using a probability model does not allow you to predict the result of any individual experiment, but you can determine the probability that a given outcome will fall inside a specific range of values.

Overview of the Functions

MATLAB provides five functions for each distribution:

• Probability density function (pdf)
• Cumulative distribution function (cdf)
• Inverse cumulative distribution function
•
174. ic distribution.

Syntax:
    R = geornd(P)
    R = geornd(P,m)
    R = geornd(P,m,n)

Description: The geometric distribution is useful when you wish to model the number of failed trials in a row before a success, where the probability of success in any given trial is the constant P.

R = geornd(P) generates geometric random numbers with probability parameter, P. The size of R is the size of P.

R = geornd(P,m) generates geometric random numbers with probability parameter, P. m is a 1-by-2 vector that contains the row and column dimensions of R.

R = geornd(P,m,n) generates geometric random numbers with probability parameter, P. The scalars m and n are the row and column dimensions of R.

The parameter P must lie on the interval [0 1].

Examples:

    r1 = geornd(1 ./ 2.^(1:6))
    r1 =
        2    10     2     5     2    60

    r2 = geornd(0.01,[1 5])
    r2 =
        65    18   334   291    63

    r3 = geornd(0.5,1,6)

geostat

Purpose: Mean and variance for the geometric distribution.

Syntax:
    [M,V] = geostat(P)

Description: For the geometric distribution,

• The mean is $\frac{q}{p}$.
• The variance is $\frac{q}{p^{2}}$.

where q = 1 - p.

Examples:

    [m,v] = geostat(1./(1:6))
    m =
        0    1.0000    2.0000    3.0000    4.0000    5.0000
    v =
        0    2.0000    6.0000   12.0000   20.0000   30.0000

gline

Purpose: Interactively draw a line in a figure.

Syntax:
    gline(fig)
    h = gline(fig)
    gline

Description: gline(fig) draws a line segment by clicking the mouse at the two end-points of the line segment in the figure, fig. A
175. coefficients are zero, and the p-value associated with this F statistic. $R^2$ is 0.8107, indicating the model accounts for over 80% of the variability in the observations. The F statistic of about 12 and its p-value of 0.0001 indicate that it is highly unlikely that all of the regression coefficients are zero.
    rcoplot(r,rint)
[Figure: residuals with error bars, plotted against Case Number]
The plot shows the residuals plotted in case order (by row). The 95% confidence intervals about these residuals are plotted as error bars. The first observation is an outlier, since its error bar does not cross the zero reference line.
Quadratic Response Surface Models
Response Surface Methodology (RSM) is a tool for understanding the quantitative relationship between multiple input variables and one output variable.
Consider one output, z, as a polynomial function of two inputs, x and y. z = f(x,y) describes a two-dimensional surface in the space (x,y,z). Of course, you can have as many input variables as you want, and the resulting surface becomes a hyper-surface.
For three inputs $(x_1, x_2, x_3)$ the equation of a quadratic response surface is
$y = b_0 + b_1x_1 + b_2x_2 + b_3x_3$ (linear terms)
$\quad + b_{12}x_1x_2 + b_{13}x_1x_3 + b_{23}x_2x_3$ (interaction terms)
$\quad + b_{11}x_1^2 + b_{22}x_2^2 + b_{33}x_3^2$ (quadratic terms)
It is difficult to visualize a k-dimensional surface in (k+1)-dimensional space when k > 2. The function rstool is a GUI desi
176. predictions and 90% confidence intervals for computing time for LU factorizations of square matrices with 100 to 200 columns. The hardware was a Power Macintosh 7100/80.
    n = [100 100:20:200];
    for i = n
      A = rand(i,i);
      tic
      B = lu(A);
      t(ceil((i-80)/20)) = toc;
    end
    [p,S] = polyfit(n(2:7),t,3);
    [time,delta_t] = polyconf(p,n(2:7),S,0.1)
    time = 0.0829 0.1476 0.2277 0.3375 0.4912 0.7032
    delta_t = 0.0064 0.0057 0.0055 0.0055 0.0057 0.0064
polyfit
Purpose: Polynomial curve fitting.
Syntax: [p,S] = polyfit(x,y,n)
Description: p = polyfit(x,y,n) finds the coefficients of a polynomial p(x) of degree n that fits the data, p(x(i)) to y(i), in a least-squares sense. The result p is a row vector of length n+1 containing the polynomial coefficients in descending powers:
$p(x) = p_1x^n + p_2x^{n-1} + \cdots + p_nx + p_{n+1}$
[p,S] = polyfit(x,y,n) returns polynomial coefficients p and matrix S for use with polyval to produce error estimates on predictions. If the errors in the data, y, are independent normal with constant variance, polyval will produce error bounds which contain at least 50% of the predictions.
You may omit S if you are not going to pass it to polyval or polyconf for calculating error estimates.
Example:
    [p,S] = polyfit(1:10,[1:10] + normrnd(0,1,1,10),1)
    p = 1.0300 0.4561
    S = 19.6214 2.8031
        0 1.4639
        8.0000 0
        2.3180 0
See Also: polyval, polytool, polyconf
polyfit is a function in MATLAB.
polytool
177. binomial probability density function.
Syntax: Y = nbinpdf(X,R,P)
Description: nbinpdf(X,R,P) returns the negative binomial probability density function with parameters R and P at the values in X. Note that the density function is zero unless X is an integer.
The size of Y is the common size of the input arguments. A scalar input functions as a constant matrix of the same size as the other inputs.
The negative binomial pdf is
$y = f(x|r,p) = \binom{r+x-1}{x} p^r q^x I_{(0,1,\ldots)}(x)$, where $q = 1 - p$.
The negative binomial models consecutive trials, each having a constant probability P of success. The parameter R is the number of successes required before stopping.
Example:
    x = (0:10);
    y = nbinpdf(x,3,0.5);
    plot(x,y)
    set(gca,'XLim',[-0.5,10.5])
[Figure: negative binomial pdf for x = 0 to 10]
See Also: nbincdf, nbininv, nbinrnd, nbinstat, pdf
nbinrnd
Purpose: Random matrices from a negative binomial distribution.
Syntax: RND = nbinrnd(R,P); RND = nbinrnd(R,P,m); RND = nbinrnd(R,P,m,n)
Description: RND = nbinrnd(R,P) is a matrix of random numbers chosen from a negative binomial distribution with parameters R and P. The size of RND is the common size of R and P if both are matrices. If either parameter is a scalar, the size of RND is the size of the other parameter.
RND = nbinrnd(R,P,m) generates random numbers with parameters R and P. m is a 1-by-2 vector that contains the row and column dimensions of RND.
RND = nbinrnd(R
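To make the nbinpdf example concrete, the first few values can be worked by hand. This sketch assumes the pdf has the standard form $f(x|r,p) = \binom{r+x-1}{x} p^r q^x$ with $q = 1 - p$, and uses the example's parameters r = 3 and p = 0.5.

```latex
\begin{aligned}
f(0 \mid 3, 0.5) &= \binom{2}{0} (0.5)^3 (0.5)^0 = 1 \times 0.125 = 0.125 \\
f(1 \mid 3, 0.5) &= \binom{3}{1} (0.5)^3 (0.5)^1 = 3 \times 0.0625 = 0.1875 \\
f(2 \mid 3, 0.5) &= \binom{4}{2} (0.5)^3 (0.5)^2 = 6 \times 0.03125 = 0.1875
\end{aligned}
```

The peak value of 0.1875 is consistent with a plot whose vertical axis tops out at 0.2.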
178. $C_{pk} = \min\left(\frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma}\right)$ where the process mean is μ.
Design of Experiments (DOE)
There is a world of difference between data and information. To extract information from data you have to make assumptions about the system that generated the data. Using these assumptions and physical theory you may be able to develop a mathematical model of the system.
Generally, even rigorously formulated models have some unknown constants. The goal of experimentation is to acquire data that allow us to estimate these constants.
But why do we need to experiment at all? We could instrument the system we want to study and just let it run. Sooner or later we would have all the data we could use.
In fact, this is a fairly common approach. There are three characteristics of historical data that pose problems for statistical modeling:
• Suppose we observe a change in the operating variables of a system followed by a change in the outputs of the system. That does not necessarily mean that the change in the system caused the change in the outputs.
• A common assumption in statistical modeling is that the observations are independent of each other. This is not the way a system in normal operation works.
• Controlling a system in operation often means changing system variables in tandem. But if two variables change together, it is impossible to separate their effects mathematically.
Designed experiments directly addre
179. deviation is not the y value at the maximum.
    mumax = mean(price1)
    mumax = 115.1500
    sigmamax = std(price1)*sqrt(19/20)
    sigmamax = 3.7719
fullfact
Purpose: Full factorial experimental design.
Syntax: design = fullfact(levels)
Description: design = fullfact(levels) gives the factor settings for a full factorial design. Each element in the vector levels specifies the number of unique values in the corresponding column of design.
For example, if the first element of levels is 3, then the first column of design contains only integers from 1 to 3.
Example: If levels = [2 4], fullfact generates an 8-run design with 2 levels in the first column and 4 in the second column.
    d = fullfact([2 4])
d is an 8-by-2 matrix whose rows list every combination of the two factors' levels.
See Also: ff2n, dcovary, daugment, cordexch
gamcdf
Purpose: Gamma cumulative distribution function (cdf).
Syntax: P = gamcdf(X,A,B)
Description: gamcdf(X,A,B) computes the gamma cdf with parameters A and B at the values in X. The arguments X, A, and B must all be the same size, except that scalar arguments function as constant matrices of the common size of the other arguments.
Parameters A and B are positive.
The gamma cdf is
$p = F(x|a,b) = \frac{1}{b^a \Gamma(a)} \int_0^x t^{a-1} e^{-t/b}\,dt$
The result, p, is the probability that a single observation from a gamma distribution with parameters a and b will fall in the interval [0 x].
gammainc is the gamma distribution with a single paramete
180. it is commonplace for the sum of the variances of the first few principal components to exceed 80% of the total variance of the original data. By examining plots of these few new variables, researchers often develop a deeper understanding of the driving forces that generated the original data.
Example
Let us look at a sample application that uses 9 different indices of the quality of life in 329 U.S. cities. These are climate, housing, health, crime, transportation, education, arts, recreation, and economics. For each index, higher is better; so, for example, a higher index for crime means a lower crime rate.
We start by loading the data in cities.mat.
    load cities
    whos
    Name         Size
    categories   9 by 14
    names        329 by 43
    ratings      329 by 9
The whos command generates a table of information about all the variables in the workspace. The cities data set contains three variables:
• categories, a string matrix containing the names of the indices
• names, a string matrix containing the 329 city names
• ratings, the data matrix with 329 rows and 9 columns
Here are the categories:
    categories
    categories =
    climate
    housing
    health
    crime
    transportation
    education
    arts
    recreation
    economics
Let's look at the first several rows of city names, too.
    first5 = names(1:5,:)
    first5 =
    Abilene, TX
    Akron, OH
    Albany, GA
    Albany-Troy, NY
    Albuquerque, NM
To get a quick impression of the ratings data, make a boxplot.
    boxplot(ratings,0
181. limits of the parameter sliders
• An Output button to output the current sample to the variable ans
• A Resample button to allow repetitive sampling with constant sample size and fixed parameters
• A Close button to end the demonstration
[Figure: randtool window, with callouts for the distributions pop-up, histogram, upper and lower parameter bounds, draw again from the same distribution, output to variable ans, sample size, parameter value, and parameter control]
The polytool Demo
The polytool demo is an interactive graphic environment for polynomial curve fitting and prediction.
The polytool demo has the following features:
• A graph of the data, the fitted polynomial, and global confidence bounds on a new predicted value
• y-axis text to display the predicted y-value and its uncertainty at the current x-value
• A data entry box to change the degree of the polynomial fit
• A data entry box to evaluate the polynomial at a specific x-value
• A draggable vertical reference line to do interactive evaluation of the polynomial at varying x-values
• A Close button to end the demonstration
You can use polytool to do curve fitting and prediction for any set of x-y data, but for the sake of demonstration, the Statistics Toolbox provides a dataset (polydata.mat) to teach some basic concepts.
To start the demonstration you must first load the dataset.
    load polydata
    who
    Your variables are
182. le fun. The x-axis limits are specified by xlims = [xmin xmax] and the y-axis limits specified by ylims.
fsurfht(fun,xlims,ylims,p1,p2,p3,p4,p5) allows for five optional parameters that you can supply to the function fun. The first two arguments of fun are the x-axis variable and y-axis variable, respectively.
There are vertical and horizontal reference lines on the plot whose intersection defines the current x-value and y-value. You can drag these dotted white reference lines and watch the calculated z-values (at the top of the plot) update simultaneously. Alternatively, you can get a specific z-value by typing the x-value and y-value into editable text fields on the x-axis and y-axis respectively.
Example: Plot the Gaussian likelihood function for the gas.mat data.
    load gas
Write the M-file gauslike.m:
    function z = gauslike(mu,sigma,p1)
    n = length(p1);
    z = ones(size(mu));
    for i = 1:n
      z = z .* (normpdf(p1(i),mu,sigma));
    end
gauslike calls normpdf, treating the data sample as fixed and the parameters μ and σ as variables. Assume that the gas prices are normally distributed and plot the likelihood surface of the sample.
    fsurfht('gauslike',[112 118],[3 5],price1)
[Figure: fsurfht likelihood surface for the gas price sample, with X Value and Y Value axes and the Z Value displayed above the plot]
The sample mean is the x-value at the maximum, but the sample standard deviat
183. level alpha. For example, if alpha = 0.01 and the result h is 1, you can reject the null hypothesis at the significance level 0.01. If h = 0, you cannot reject the null hypothesis at the alpha level of significance.
[h,sig,ci] = ttest(x,m,alpha,tail) allows specification of one- or two-tailed tests. tail is a flag that specifies one of three alternative hypotheses:
• tail = 0 (default) specifies the alternative $\bar{x} \neq \mu$
• tail = 1 specifies the alternative $\bar{x} > \mu$
• tail = -1 specifies the alternative $\bar{x} < \mu$
sig is the p-value associated with the T statistic
$T = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
sig is the probability that the observed value of T could be as large or larger by chance under the null hypothesis that the mean of x is equal to μ.
ci is a 1-alpha confidence interval for the true mean.
Example: This example generates 100 normal random numbers with theoretical mean zero and standard deviation one. The observed mean and standard deviation are different from their theoretical values, of course. We test the hypothesis that there is no true difference.
Normal random number generator test:
    x = normrnd(0,1,1,100);
    [h,sig,ci] = ttest(x,0)
    h = 0
    sig = 0.4474
    ci = -0.1165 0.2620
The result h = 0 means that we cannot reject the null hypothesis. The significance level is 0.4474, which means that by chance we would have observed values of T more extreme than the one in this example in 45 of 100 similar experiments. A 95% confidence interval on the mean is [-0.1165 0.2620]
184. millions of values. Descriptive statistics are a way to summarize this data into a few numbers that contain most of the relevant information.
Measures of Central Tendency (Location)
The purpose of measures of central tendency is to locate the data values on the number line. In fact, another term for these statistics is measures of location.
The table gives the function names and descriptions.
Measures of Location
    geomean    Geometric mean
    harmmean   Harmonic mean
    mean       Arithmetic average (in MATLAB)
    median     50th percentile (in MATLAB)
    trimmean   Trimmed mean
The average is a simple and popular estimate of location. If the data sample comes from a normal distribution, then the sample average is also optimal (minimum variance unbiased estimate of μ).
Unfortunately, outliers, data entry errors, or glitches exist in almost all real data. The sample average is sensitive to these problems. One bad data value can move the average away from the center of the rest of the data by an arbitrarily large distance.
The median and trimmed mean are two measures that are resistant (robust) to outliers. The median is the 50th percentile of the sample, which will only change slightly if you add a large perturbation to any value. The idea behind the trimmed mean is to ignore a small percentage of the highest and lowest values of a sample for determining the center of the sample.
The geometric mean and harmonic mean, like the average, are not ro
185. tabulate
Purpose: Frequency table.
Syntax: table = tabulate(x); tabulate(x)
Description: table = tabulate(x) takes a vector of positive integers, x, and returns a matrix, table. The first column of table contains the values of x. The second contains the number of instances of this value. The last column contains the percentage of each value.
tabulate with no output arguments displays a formatted table in the command window.
Example:
    tabulate([1 2 4 4 3 4])
    Value   Count   Percent
    1       1       16.67%
    2       1       16.67%
    3       1       16.67%
    4       3       50.00%
See Also: pareto
tblread
Purpose: Read tabular data from the file system.
Syntax: [data,varnames,casenames] = tblread; [data,varnames,casenames] = tblread(filename); [data,varnames,casenames] = tblread(filename,'delimiter')
Description: [data,varnames,casenames] = tblread displays the File Open dialog box for interactive selection of the tabular data file. The file format has variable names in the first row, case names in the first column, and data starting in the (2,2) position.
[data,varnames,casenames] = tblread(filename) allows command line specification of the name of a file in the current directory, or the complete pathname of any file.
[data,varnames,casenames] = tblread(filename,'delimiter') allows specification of the field delimiter in the file. Accepted values are 'tab', 'space', or 'comma'.
• varnames is a string matrix containing the variable names in the first row.
• casenames is a string matrix cont
186. Polynomial evaluation.
Syntax: Y = polyval(p,X); [Y,DELTA] = polyval(p,X,S)
Description: Y = polyval(p,X) returns the predicted value of a polynomial given its coefficients, p, at the values in X.
[Y,DELTA] = polyval(p,X,S) uses the optional output S generated by polyfit to generate error estimates, Y ± DELTA. If the errors in the data input to polyfit are independent normal with constant variance, Y ± DELTA contains at least 50% of the predictions.
If p is a vector whose elements are the coefficients of a polynomial in descending powers, then polyval(p,X) is the value of the polynomial evaluated at X. If X is a matrix or vector, the polynomial is evaluated at each of the elements.
Examples: Simulate the function y = x, adding normal random errors with a standard deviation of 0.1. Then use polyfit to estimate the polynomial coefficients. Note that the predicted Y values are within DELTA of the integer, X, in every case.
    [p,S] = polyfit(1:10,(1:10) + normrnd(0,0.1,1,10),1);
    X = magic(3);
    [Y,D] = polyval(p,X,S)
    Y = 8.0696 1.0486 6.0636
        3.0546 5.0606 7.0666
        4.0576 9.0726 2.0516
    D = 0.0889 0.0951 0.0861
        0.0889 0.0861 0.0870
        0.0870 0.0916 0.0916
See Also: polyfit, polytool, polyconf
polyval is a function in MATLAB.
prctile
Purpose: Percentiles of a sample.
Syntax: Y = prctile(X,p)
Description: Y = prctile(X,p) calculates a value that is greater than p percent of the values in X. The values of p must lie in the interval [0 10
187. icdf('name',P,A1,A2,A3) returns a matrix of critical values, X. 'name' is a string containing the name of the distribution. P is a matrix of probabilities, and A1, A2, and A3 are matrices of distribution parameters. Depending on the distribution, some of the parameters may not be necessary.
The arguments P, A1, A2, and A3 must all be the same size, except that scalar arguments function as constant matrices of the common size of the other arguments.
Examples:
    x = icdf('Normal',0.1:0.2:0.9,0,1)
    x = -1.2816 -0.5244 0 0.5244 1.2816
    x = icdf('Poisson',0.1:0.2:0.9,1:5)
iqr
Purpose: Interquartile range (IQR) of a sample.
Syntax: y = iqr(X)
Description: iqr(X) computes the difference between the 75th and the 25th percentiles of the sample in X. The IQR is a robust estimate of the spread of the data, since changes in the upper and lower 25% of the data do not affect it.
If there are outliers in the data, then the IQR is more representative than the standard deviation as an estimate of the spread of the body of the data. The IQR is less efficient than the standard deviation as an estimate of the spread when the data is all from the normal distribution.
Multiply the IQR by 0.7413 to estimate σ (the second parameter of the normal distribution).
Examples: This Monte Carlo simulation shows the relative efficiency of the IQR to the sample standard deviation for normal data.
    x = normrnd(0,1,100,100);
    s = std(x);
    s_IQR = 0.7413
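The factor 0.7413 in the iqr description can be traced to the quartiles of the normal distribution. The short derivation below is a sketch that assumes the data are exactly normal and uses the rounded standard normal 75th percentile, 0.6745.

```latex
\begin{aligned}
Q_{75} &= \mu + 0.6745\,\sigma, \qquad Q_{25} = \mu - 0.6745\,\sigma \\
\mathrm{IQR} &= Q_{75} - Q_{25} = 2 \times 0.6745\,\sigma = 1.3490\,\sigma \\
\sigma &= \frac{\mathrm{IQR}}{1.3490} \approx 0.7413 \times \mathrm{IQR}
\end{aligned}
```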
188. arguments.
Example: In the Massachusetts lottery, a player chooses a four-digit number. Generate random numbers for Monday through Saturday.
    numbers = unidrnd(10000,1,6) - 1
    numbers = 2189 470 6788 6792 9346
unidstat
Purpose: Mean and variance for the discrete uniform distribution.
Syntax: [M,V] = unidstat(N)
Description: For the discrete uniform distribution,
• The mean is $\frac{N+1}{2}$.
• The variance is $\frac{N^2-1}{12}$.
Examples:
    [m,v] = unidstat(1:6)
    m = 1.0000 1.5000 2.0000 2.5000 3.0000 3.5000
    v = 0 0.2500 0.6667 1.2500 2.0000 2.9167
unifcdf
Purpose: Continuous uniform cumulative distribution function (cdf).
Syntax: P = unifcdf(X,A,B)
Description: unifcdf(X,A,B) computes the uniform cdf with parameters A and B at the values in X. The arguments X, A, and B must all be the same size, except that scalar arguments function as constant matrices of the common size of the other arguments.
A and B are the minimum and maximum values, respectively.
The uniform cdf is
$p = F(x|a,b) = \frac{x-a}{b-a} I_{[a,b]}(x)$
The standard uniform distribution has A = 0 and B = 1.
Examples: What is the probability that an observation from a standard uniform distribution will be less than 0.75?
    probability = unifcdf(0.75)
    probability = 0.7500
What is the probability that an observation from a uniform distribution with a = -1 and b = 1 will be less than 0.75?
    probability = unifcdf(0.75,-1,1)
    probability = 0.87
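The unidstat and unifcdf results above can be reproduced by hand. The block below is a worked check, assuming the mean $(N+1)/2$ and variance $(N^2-1)/12$ for the discrete uniform distribution and the cdf $(x-a)/(b-a)$ for the continuous uniform distribution; N = 6 and the interval from -1 to 1 are taken from the examples.

```latex
\begin{aligned}
\text{mean}(N = 6) &= \frac{6 + 1}{2} = 3.5 \\
\text{variance}(N = 6) &= \frac{6^2 - 1}{12} = \frac{35}{12} \approx 2.9167 \\
F(0.75 \mid a = -1,\, b = 1) &= \frac{0.75 - (-1)}{1 - (-1)} = \frac{1.75}{2} = 0.875
\end{aligned}
```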
189. menu for changing the function type (cdf <-> pdf)
• Sliders to change the parameter settings
• Data entry boxes to choose specific parameter values
• Data entry boxes to change the limits of the parameter sliders
• Draggable horizontal and vertical reference lines to do interactive evaluation of the function at varying values
• A data entry box to evaluate the function at a specific x-value
• For cdf plots, a data entry box on the probability axis (y-axis) to find critical values corresponding to a specific probability
• A Close button to end the demonstration
[Figure: disttool window, with callouts for the distributions pop-up, function type pop-up, cdf value, x value, cdf function, draggable horizontal reference line, draggable vertical reference line, upper and lower parameter bounds, parameter value, and parameter control]
The randtool Demo
randtool is a graphic environment for generating random samples from various probability distributions and displaying the sample histogram.
The randtool demo has the following features:
• A histogram of the sample
• A pop-up menu for changing the distribution function
• Sliders to change the parameter settings
• A data entry box to choose the sample size
• Data entry boxes to choose specific parameter values
• Data entry boxes to change the lim
190. discriminant analysis.
Example: The Mahalanobis distance of a matrix, r, when applied to itself is a way to find outliers.
    r = mvnrnd([0 0],[1 0.9;0.9 1],100);
    r = [r;10 10];
    d = mahal(r,r);
    last6 = d(96:101)
    last6 = 1.1036 2.2393 2.0219 0.3876 1.5571 52.7381
The last element is clearly an outlier.
See Also: classify
mean
Purpose: Average or mean value of vectors and matrices.
Syntax: m = mean(X)
Description: mean calculates the sample average
$\bar{x}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}$
For vectors, mean(x) is the mean value of the elements in vector x. For matrices, mean(X) is a row vector containing the mean value of each column.
Example: These commands generate five samples of 100 normal random numbers with mean zero and standard deviation one. The sample averages in xbar are much less variable (0.00 ± 0.10).
    x = normrnd(0,1,100,5);
    xbar = mean(x)
    xbar = 0.0727 0.0264 0.0351 0.0424 0.0752
See Also: median, std, cov, corrcoef, var
mean is a function in the MATLAB Toolbox.
median
Purpose: Median value of vectors and matrices.
Syntax: m = median(X)
Description: median(X) calculates the median value, which is the 50th percentile of a sample. The median is a robust estimate of the center of a sample of data, since outliers have little effect on it.
For vectors, median(x) is the median value of the elements in vector x. For matrices, median(X) is a row vector co
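The figure 0.00 ± 0.10 quoted in the mean example follows from the standard error of a sample average. A brief worked step, assuming independent draws with standard deviation σ = 1 and sample size n = 100 as in the example:

```latex
\begin{aligned}
\operatorname{sd}(\bar{x}) &= \frac{\sigma}{\sqrt{n}} = \frac{1}{\sqrt{100}} = 0.10
\end{aligned}
```

So each column average is expected to fall within roughly one tenth of a unit of the true mean of zero, which is why the entries of xbar vary far less than the individual observations.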
191. column dimensions of R.
Example: Compute 6 random numbers from a noncentral F distribution with 10 numerator degrees of freedom, 100 denominator degrees of freedom, and a noncentrality parameter, δ, of 4.0. Compare this to the F distribution with the same degrees of freedom.
    r = ncfrnd(10,100,4,1,6)
    r = 2.5995 0.8824 0.8220 1.4485 1.4415 1.4864
    r1 = frnd(10,100,1,6)
    r1 = 0.9826 0.5911 1.0967 0.9681 2.0096 0.6598
References: Johnson, Norman, and Kotz, Samuel. Distributions in Statistics: Continuous Univariate Distributions-2. Wiley, 1970. pp. 189-200.
See Also: ncfcdf, ncfinv, ncfpdf, ncfstat
ncfstat
Purpose: Mean and variance of the noncentral F distribution.
Syntax: [M,V] = ncfstat(NU1,NU2,DELTA)
Description: [M,V] = ncfstat(NU1,NU2,DELTA) returns the mean and variance of the noncentral F pdf with NU1 and NU2 degrees of freedom and noncentrality parameter DELTA.
• The mean is $\frac{\nu_2(\delta + \nu_1)}{\nu_1(\nu_2 - 2)}$, where $\nu_2 > 2$.
• The variance is $2\left(\frac{\nu_2}{\nu_1}\right)^2 \left[\frac{(\delta + \nu_1)^2 + (2\delta + \nu_1)(\nu_2 - 2)}{(\nu_2 - 2)^2(\nu_2 - 4)}\right]$, where $\nu_2 > 4$.
Example:
    [m,v] = ncfstat(10,100,4)
    m = 1.4286
    v = 3.9200
References: Johnson, Norman, and Kotz, Samuel. Distributions in Statistics: Continuous Univariate Distributions-2. Wiley, 1970. pp. 189-200.
Evans, Merran, Hastings, Nicholas, and Peacock, Brian. Statistical Distributions, Second Edition. Wiley, 1993. pp. 73-74.
See Also: ncfcdf, ncfinv, ncfpdf, ncfrnd
nctcdf
Purpose Syntax Description E
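The mean reported by the ncfstat example can be checked by hand. This sketch assumes the noncentral F mean has the form $\nu_2(\delta + \nu_1) / (\nu_1(\nu_2 - 2))$ and substitutes the example's arguments, $\nu_1 = 10$, $\nu_2 = 100$, and $\delta = 4$.

```latex
\begin{aligned}
\text{mean} &= \frac{\nu_2(\delta + \nu_1)}{\nu_1(\nu_2 - 2)}
            = \frac{100 \times (4 + 10)}{10 \times (100 - 2)}
            = \frac{1400}{980} \approx 1.4286
\end{aligned}
```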
192. model with the data from the designed experiment, but the trial and error data may be insufficient for fitting a quadratic model or interactions model.
• Using the data from the designed experiment, you are more likely to be able to find levels for the reactants that result in the maximum reaction rate. Even if you find the best settings using the trial and error data, the confidence bounds are likely to be wider than those from the designed experiment.
Part 2
Now analyze the experimental design data with a polynomial model and a nonlinear model, and compare the results. The true model for the process, which is used to generate the data, is actually a nonlinear model. However, within the range of the data, a quadratic model approximates the true model quite well.
To see the polynomial model, click the Response Surface button on the Experimental Design Data window. rsmdemo calls rstool, which fits a full quadratic model to the data. Drag the reference lines to change the levels of the reactants, and find the optimal reaction rate. Observe the width of the confidence intervals.
Now click the Nonlinear Model button on the Experimental Design Data window. rsmdemo calls nlintool, which fits a Hougen-Watson model to the data. As with the quadratic model, you can drag the reference lines to change the reactant levels. Observe the reaction rate and the confidence intervals.
Compare the analysis results for the two
193. common size of the input arguments. A scalar input functions as a constant matrix of the same size as the other inputs.
Example: One hypothesis test for comparing two sample variances is to take their ratio and compare it to an F distribution. If the numerator and denominator degrees of freedom are 5 and 20 respectively, then you reject the hypothesis that the first variance is equal to the second variance if their ratio exceeds the critical value computed below.
    critical = finv(0.95,5,20)
    critical = 2.7109
Suppose the truth is that the first variance is twice as big as the second variance. How likely is it that you would detect this difference?
    prob = 1 - ncfcdf(critical,5,20,2)
    prob = 0.1297
References: Johnson, Norman, and Kotz, Samuel. Distributions in Statistics: Continuous Univariate Distributions-2. Wiley, 1970. pp. 189-200.
Evans, Merran, Hastings, Nicholas, and Peacock, Brian. Statistical Distributions, Second Edition. Wiley, 1993. pp. 73-74.
See Also: icdf, ncfcdf, ncfpdf, ncfrnd, ncfstat
ncfpdf
Purpose: Noncentral F probability density function.
Syntax: Y = ncfpdf(X,NU1,NU2,DELTA)
Description: Y = ncfpdf(X,NU1,NU2,DELTA) returns the noncentral F pdf with numerator degrees of freedom (df) NU1, denominator df NU2, and positive noncentrality parameter DELTA, at the values in X.
The size of Y is the common size of the input arguments. A scalar input functions as a constant matrix of the same size as the other inputs.
The F distribution is a special case of the
194. differences among the column means
• The variability due to the differences between the data in each column and the column mean
The ANOVA table has five columns:
• The first shows the source of the variability.
• The second shows the Sum of Squares (SS) due to each source.
• The third shows the degrees of freedom (df) associated with each source.
• The fourth shows the Mean Squares (MS), which is the ratio SS/df.
• The fifth shows the F statistic, which is the ratio of the MS's.
The p-value is a function (fcdf) of F. As F increases, the p-value decreases.
The second figure displays box plots of each column of X. Large differences in the center lines of the box plots correspond to large values of F and correspondingly small p-values.
Examples: The five columns of x are the constants one through five plus a random normal disturbance with mean zero and standard deviation one. The ANOVA procedure detects the difference in the column means with great assurance. The probability (p) of observing the sample x by chance, given that there is no difference in the column means, is less than 6 in 100,000.
    x = meshgrid(1:5)
    x = 1 2 3 4 5
        1 2 3 4 5
        1 2 3 4 5
        1 2 3 4 5
        1 2 3 4 5
    x = x + normrnd(0,1,5,5)
    x = 2.1650 3.6961 1.5538 3.6400 4.9551
        1.6268 2.0591 2.2988 3.8644 4.2011
        1.0751 3.7971 4.2460 2.6507 4.2348
        1.3516 2.2641 2.3610 2.1296 5.8617
        0.3035 2.8717 3.5774 4.9846 4.9438
    p = anova1(x)
    p = 5.9952e-05
ANOVA Table S
195. Principal Components Analysis
    barttest: Bartlett's test
    pcacov: PCA from covariance matrix
    pcares: Residuals from PCA
    princomp: PCA from raw data matrix
Hypothesis Tests
    ranksum: Wilcoxon rank sum test
    signrank: Wilcoxon signed rank test
    signtest: Sign test for paired samples
    ttest: One sample t-test
    ttest2: Two sample t-test
    ztest: Z-test
File I/O
    caseread: Read casenames from a file
    casewrite: Write casenames from a string matrix to a file
    tblread: Retrieve tabular data from the file system
    tblwrite: Write data in tabular form to the file system
Demonstrations
    disttool: Interactive exploration of distribution functions
    randtool: Interactive random number generation
    polytool: Interactive fitting of polynomial models
    rsmdemo: Interactive process experimentation and analysis
    statdemo: Demonstrates capabilities of the Statistics Toolbox
Data
    census.mat: U.S. Population 1790 to 1980
    cities.mat: Names of U.S. metropolitan areas
    discrim.mat: Classification data
    gas.mat: Gasoline prices
    hald.mat: Hald data
    hogg.mat: Bacteria counts from milk shipments
    lawdata.mat: GPA versus LSAT for 15 law schools
    mileage.mat: Mileage data for three car models from two factories
    moore.mat: Five factor, one response regression data
    parts.mat: Dimensional runout on 36 circular parts
    popcorn.mat: Data for popcorn example
    reaction.mat
    sat.dat
    polydata.mat: Data for polytool d
196. nomial
• Noncentral t
• Normal
• Poisson
• Rayleigh
• Uniform
• Weibull
This section gives a short introduction to each distribution.
Beta Distribution
Background
The beta distribution describes a family of curves that are unique in that they are nonzero only on the interval [0 1]. A more general version of the function assigns parameters to the end-points of the interval.
The beta cdf is the same as the incomplete beta function.
The beta distribution has a functional relationship with the t distribution. If Y is an observation from Student's t distribution with ν degrees of freedom, then the following transformation generates X, which is beta distributed:
$X = \frac{1}{2} + \frac{1}{2} \frac{Y}{\sqrt{\nu + Y^2}}$; if $Y \sim t(\nu)$ then $X \sim \beta\left(\frac{\nu}{2}, \frac{\nu}{2}\right)$
The Statistics Toolbox uses this relationship to compute values of the t cdf and inverse function as well as generating t distributed random numbers.
Mathematical Definition
The beta pdf is
$y = f(x|a,b) = \frac{1}{B(a,b)} x^{a-1} (1-x)^{b-1} I_{(0,1)}(x)$
Parameter Estimation
Suppose you are collecting data that has hard lower and upper bounds of zero and one respectively. Parameter estimation is the process of determining the parameters of the beta distribution that fit this data best in some sense.
One popular criterion of goodness is to maximize the likelihood function. The likelihood has the same form as the beta pdf on the previous page. But for the pdf, the parameters are known constants and the variable is
197. signrank, signtest, ttest2
raylcdf
Purpose: Rayleigh cumulative distribution function (cdf).
Syntax: P = raylcdf(X,B)
Description: P = raylcdf(X,B) returns the Rayleigh cumulative distribution function with parameter B at the values in X.
The size of P is the common size of X and B. A scalar input functions as a constant matrix of the same size as the other input.
The Rayleigh cdf is
$y = F(x|b) = \int_0^x \frac{t}{b^2} e^{-t^2/(2b^2)}\,dt$
Example:
    x = 0:0.1:3;
    p = raylcdf(x,1);
    plot(x,p)
[Figure: Rayleigh cdf for x from 0 to 3]
Reference: Evans, Merran, Hastings, Nicholas, and Peacock, Brian. Statistical Distributions, Second Edition. Wiley, 1993. pp. 134-136.
See Also: cdf, raylinv, raylpdf, raylrnd, raylstat
raylinv
Purpose: Inverse of the Rayleigh cumulative distribution function.
Syntax: X = raylinv(P,B)
Description: X = raylinv(P,B) returns the inverse of the Rayleigh cumulative distribution function with parameter B at the probabilities in P.
The size of X is the common size of P and B. A scalar input functions as a constant matrix of the same size as the other input.
Example:
    x = raylinv(0.9,1)
    x = 2.1460
See Also: icdf, raylcdf, raylpdf, raylrnd, raylstat
raylpdf
Purpose: Rayleigh probability density function.
Syntax: Y = raylpdf(X,B)
Description: Y = raylpdf(X,B) returns the Rayleigh probability density function with parameter B at the value
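The raylinv example can be verified in closed form. The derivation below assumes the Rayleigh cdf is the integral of $\frac{t}{b^2} e^{-t^2/(2b^2)}$ from 0 to x, evaluates that integral, and inverts it for the example's values p = 0.9 and b = 1.

```latex
\begin{aligned}
F(x \mid b) &= \int_0^x \frac{t}{b^2}\, e^{-t^2/(2b^2)}\,dt = 1 - e^{-x^2/(2b^2)} \\
x &= b\,\sqrt{-2 \ln(1 - p)} \\
x &= \sqrt{-2 \ln(0.1)} = \sqrt{4.6052} \approx 2.1460 \quad (p = 0.9,\; b = 1)
\end{aligned}
```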
198. unreliable estimator.
Example: The range of a large sample of standard normal random numbers is approximately 6. This is the motivation for the process capability indices Cp and Cpk in statistical quality control applications.
    rv = normrnd(0,1,1000,5);
    near6 = range(rv)
    near6 = 6.1451 6.4986 6.2909 5.8894 7.0002
See Also: std, iqr, mad
ranksum
Purpose: Wilcoxon rank sum test that two populations are identical.
Syntax: p = ranksum(x,y,alpha); [p,h] = ranksum(x,y,alpha)
Description: p = ranksum(x,y,alpha) returns the significance probability that the populations generating two independent samples, x and y, are identical. x and y are vectors but can have different lengths; if they are unequal in length, x must be smaller than y. alpha is the desired level of significance and must be a scalar between zero and one.
[p,h] = ranksum(x,y,alpha) also returns the result of the hypothesis test, h. h is zero if the populations of x and y are not significantly different. h is one if the two populations are significantly different.
p is the probability of observing a result equally or more extreme than the one using the data (x and y) if the null hypothesis is true. If p is near zero, this casts doubt on this hypothesis.
Example: This example tests the hypothesis of equality of means for two samples generated with poissrnd.
    x = poissrnd(5,10,1);
    y = poissrnd(2,20,1);
    [p,h] = ranksum(x,y,0.05)
    p = 0.0028
sig
199. nsity function (pdf). Syntax: Y = lognpdf(X,MU,SIGMA). Description: Y = lognpdf(X,MU,SIGMA) computes the lognormal pdf with mean MU and standard deviation SIGMA at the values in X. The size of Y is the common size of X, MU, and SIGMA. A scalar input functions as a constant matrix of the same size as the other inputs. The lognormal pdf is $y = f(x|\mu,\sigma) = \frac{1}{x\sigma\sqrt{2\pi}} e^{-(\ln x - \mu)^2/(2\sigma^2)}$. Example: x = (0:0.02:10); y = lognpdf(x,0,1); plot(x,y); grid; xlabel('x'); ylabel('p'). [Figure: lognormal pdf for mu = 0, sigma = 1, with x on the horizontal axis and p on the vertical axis.] Reference: Mood, Alexander M., Graybill, Franklin A., and Boes, Duane C., Introduction to the Theory of Statistics, Third Edition, McGraw-Hill, 1974, pp. 540-541. See Also: logncdf, logninv, lognrnd, lognstat. lognrnd. Purpose: Random matrices from the lognormal distribution. Syntax: R = lognrnd(MU,SIGMA); R = lognrnd(MU,SIGMA,m); R = lognrnd(MU,SIGMA,m,n). Description: R = lognrnd(MU,SIGMA) generates lognormal random numbers with parameters MU and SIGMA. The size of R is the common size of MU and SIGMA if both are matrices. If either parameter is a scalar, the size of R is the size of the other parameter. R = lognrnd(MU,SIGMA,m) generates lognormal random numbers with parameters MU and SIGMA. m is a 1-by-2 vector that contains the row and column dimensions of R. R = lognrnd(MU,SIGMA,m,n) generates lognormal random numbers with parameters MU and SIGMA. The scalars m and n are the row and column dimen
200. nstant term. settings has nruns rows and nfactors columns. [settings,X] = cordexch(nfactors,nruns) also generates the associated design matrix X. [settings,X] = cordexch(nfactors,nruns,'model') produces a design for fitting a specified regression model. The input, 'model', can be one of these strings: 'interaction' (includes constant, linear, and cross-product terms); 'quadratic' (interactions plus squared terms); 'purequadratic' (includes constant, linear, and squared terms). Example: The D-optimal design for two factors in nine runs using a quadratic model is the 3^2 factorial, as shown by settings = cordexch(2,9,'quadratic'). [Output: the nine-row settings matrix is not recoverable from the source.] See Also: rowexch, daugment, dcovary, hadamard, fullfact, ff2n. corrcoef. Purpose: Correlation coefficients. Syntax: R = corrcoef(X). Description: R = corrcoef(X) returns a matrix of correlation coefficients calculated from an input matrix whose rows are observations and whose columns are variables. The element (i,j) of the matrix R is related to the corresponding element of the covariance matrix C = cov(X) by $R(i,j) = \frac{C(i,j)}{\sqrt{C(i,i)\,C(j,j)}}$. See Also: cov, mean, std, var. corrcoef is a function in MATLAB. cov. Purpose: Covariance matrix. Syntax: C = cov(X); C = cov(x,y). Description: cov computes the covariance matrix. For a single vector, cov(x) returns a scalar containing the variance. For matrices, where each row is an ob
201. ntaining the median value of each column. Since median is implemented using sort, it can be costly for large matrices. Examples: xodd = 1:5; modd = median(xodd) returns modd = 3. meven = median(xeven) returns meven = 2.5000. This example shows robustness of the median to outliers: xoutlier = [x 10000]; moutlier = median(xoutlier) returns moutlier = 3. See Also: mean, std, cov, corrcoef. median is a function in MATLAB. mle. Purpose: Maximum likelihood estimation. Syntax: phat = mle('dist',data); [phat,pci] = mle('dist',data); [phat,pci] = mle('dist',data,alpha); [phat,pci] = mle('dist',data,alpha,p1). Description: phat = mle('dist',data) returns the maximum likelihood estimates (MLEs) for the distribution specified in 'dist' using the sample in the vector data. [phat,pci] = mle('dist',data) returns the MLEs and 95 percent confidence intervals. [phat,pci] = mle('dist',data,alpha) returns the MLEs and 100(1-alpha) percent confidence intervals given the data and the specified alpha. [phat,pci] = mle('dist',data,alpha,p1) is used for the binomial distribution only. p1 is the number of trials. Example: rv = binornd(20,0.75) returns rv = 16. [p,pci] = mle('binomial',rv,0.05,20) returns p = 0.8000 and pci = 0.5634 0.9427. See Also: betafit, binofit, expfit, gamfit, normfit, poissfit, weibfit. moment. Purpose: Central moment of all orders. Syntax: m = moment(X,order)
202. nts independent samples from the same normal distribution, and that you know the standard deviation, σ. The t-test has the same assumptions except that you estimate the standard deviation using the data instead of specifying it as a known quantity. Both tests have an associated signal-to-noise ratio: $Z = \frac{\bar{x} - \mu}{\sigma}$ or $T = \frac{\bar{x} - \mu}{s}$, where $\bar{x} = \sum_{i=1}^{n} \frac{x_i}{n}$. The signal is the difference between the average and the hypothesized mean. The noise is the standard deviation, posited or estimated. If the null hypothesis is true, then Z has a standard normal distribution, N(0,1). T has a Student's t distribution with the degrees of freedom, ν, equal to one less than the number of data values. Given the observed result for Z or T, and knowing their distribution assuming the null hypothesis is true, it is possible to compute the probability (p-value) of observing this result. If the p-value is very small, then that casts doubt on the truth of the null hypothesis. For example, suppose that the p-value was 0.001, meaning that the probability of observing the given Z (or T) was one in a thousand. That should make you skeptical enough about the null hypothesis that you reject it rather than believe that your result was just a lucky 999 to 1 shot. Example: This example uses the gasoline price data in gas.mat. There are two samples of 20 observed gas prices for the months of January and February 1993. load gas; prices = [price1 price2]
203. nts of info are the asymptotic variances of their respective parameters. gamlike is a utility function for maximum likelihood estimation of the gamma distribution. Since gamlike returns the negative gamma log-likelihood function, minimizing gamlike using fmins is the same as maximizing the likelihood. Example: Continuing the example for gamfit: a = 2; b = 3; r = gamrnd(a,b,100,1); [logL,info] = gamlike([2.1990 2.8069],r) returns logL = 267.5585 and info = [0.0690 0.0790; 0.0790 0.1220]. See Also: betalike, fmins, gamfit, mle, weiblike. gampdf. Purpose: Gamma probability density function (pdf). Syntax: Y = gampdf(X,A,B). Description: gampdf(X,A,B) computes the gamma pdf with parameters A and B at the values in X. The arguments X, A, and B must all be the same size except that scalar arguments function as constant matrices of the common size of the other arguments. The parameters A and B must both be positive and X must lie on the interval [0 ∞). The gamma pdf is $y = f(x|a,b) = \frac{1}{b^{a}\,\Gamma(a)}\, x^{a-1} e^{-x/b}$. The gamma probability density function is useful in reliability models of lifetimes. The gamma distribution is more flexible than the exponential in that the probability of surviving an additional period may depend on age. Special cases of the gamma distribution are the exponential and χ² distributions. Examples: The exponential distribution is a special case of the gamma distribution. mu = 1:5; y = gampdf(1,1,mu) returns y = 0.3
204. number of defective disks should you allow in your sample of 50? x = hygeinv(0.99,1000,10,50) returns x = 3. What is the median number of defective floppy disks in samples of 50 disks from batches with 10 defective disks? x = hygeinv(0.50,1000,10,50) returns x = 0. hygepdf. Purpose: Hypergeometric probability density function (pdf). Syntax: Y = hygepdf(X,M,K,N). Description: hygepdf(X,M,K,N) computes the hypergeometric pdf with parameters M, K, and N at the values in X. The arguments X, M, K, and N must all be the same size except that scalar arguments function as constant matrices of the common size of the other arguments. The parameters M, K, and N must be positive integers. Also, X must be less than or equal to all the parameters and N must be less than or equal to M. The hypergeometric pdf is $y = f(x|M,K,N) = \frac{\binom{K}{x}\binom{M-K}{N-x}}{\binom{M}{N}}$. The result, y, is the probability of drawing exactly x items of a possible K in N drawings without replacement from a group of M objects. Examples: Suppose you have a lot of 100 floppy disks and you know that 20 of them are defective. What is the probability of drawing 0 through 5 defective floppy disks if you select 10 at random? p = hygepdf(0:5,100,20,10) returns p = 0.0951 0.2679 0.3182 0.2092 0.0841 0.0215. hygernd. Purpose: Random numbers from the hypergeometric distribution. Syntax: R = hygernd(M,K,N); R = hygernd(M,K,N,mm); R
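The hypergeometric pdf can be checked without the toolbox. The following is a minimal sketch, assuming only base MATLAB (nchoosek); the variable names are illustrative and are not part of the toolbox. It evaluates the ratio of binomial coefficients for the floppy disk example (a lot of 100 disks, 20 defective, 10 drawn).

```matlab
% Sketch: hypergeometric pdf from binomial coefficients (base MATLAB only).
% M = lot size, K = defective items in the lot, N = number of items drawn.
M = 100; K = 20; N = 10;
x = 0:5;                       % number of defective items in the sample
p = zeros(size(x));
for i = 1:numel(x)
    % choose x(i) of the K defectives and N-x(i) of the M-K good items
    p(i) = nchoosek(K, x(i)) * nchoosek(M-K, N-x(i)) / nchoosek(M, N);
end
disp(p)
```

Rounded to four decimals, the values should agree with the hygepdf output quoted in the example.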
205. o candidate models directly. Nonlinear Regression Models. RSM is an empirical modeling approach using polynomials as local approximations to the true input/output relationship. This empirical approach is often adequate for process improvement in an industrial setting. In scientific applications there is usually relevant theory that allows us to make a mechanistic model. Often such models are nonlinear in the unknown parameters. Nonlinear models are more difficult to fit, requiring iterative methods that start with an initial guess of the unknown parameters. Each iteration alters the current guess until the algorithm converges. Mathematical Form. The Statistics Toolbox has functions for fitting nonlinear models of the form $y = f(X,\beta) + \varepsilon$, where: y is an n-by-1 vector of observations; f is any function of X and β; X is an n-by-p matrix of input variables; β is a p-by-1 vector of unknown parameters to be estimated; ε is an n-by-1 vector of random disturbances. Nonlinear Modeling Example. The Hougen-Watson model (Bates and Watts 1988) for reaction kinetics is one specific example of this type. The form of the model is $rate = \frac{\beta_1 x_2 - x_3/\beta_5}{1 + \beta_2 x_1 + \beta_3 x_2 + \beta_4 x_3}$, where β1, β2, ..., β5 are the unknown parameters, and x1, x2, and x3 are the three input variables. The three inputs are hydrogen, n-pentane, and isopentane. It is easy to see that the parameters do not enter the model linearly. The
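To make the nonlinearity concrete, the Hougen-Watson rate can be written as a function of the parameters and inputs. This is only a sketch: hw_rate is a hypothetical name (it is not the toolbox's own model function), and the parameter and input values are made up for illustration.

```matlab
% Sketch: Hougen-Watson rate model as an anonymous function.
% beta is a 5-element parameter vector; x is n-by-3 with columns
% hydrogen, n-pentane, isopentane. All numbers below are illustrative.
hw_rate = @(beta, x) (beta(1)*x(:,2) - x(:,3)/beta(5)) ./ ...
    (1 + beta(2)*x(:,1) + beta(3)*x(:,2) + beta(4)*x(:,3));

beta = [1; 0.05; 0.02; 0.1; 2];      % hypothetical parameter values
x = [470 300 10; 285 80 10];         % hypothetical input settings
rate = hw_rate(beta, x);

% Doubling one parameter does not double the rate: the model is
% nonlinear in beta.
beta2 = beta; beta2(2) = 2*beta(2);
rate2 = hw_rate(beta2, x);
disp([rate rate2])
```

Because beta(2) appears in the denominator, doubling it changes the rate by a factor that depends on the inputs, which is why an iterative fit is needed.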
206. obability 2-2; probability density function (pdf) 1-6; probability distributions 1-5; p-value 1-55, 1-71; qqplot 2-10, 2-177; QR decomposition 1-56; quality assurance 2-33; quantile-quantile plots 1-88, 1-91. R: random 2-178; random number generator 1-6; random numbers 1-9; randtool 1-109, 2-13, 2-59, 2-179; range 2-9, 2-180; ranksum 2-13, 2-181; raylcdf 2-4, 2-182; Rayleigh distribution 1-13; raylinv 2-6, 2-183; raylpdf 2-5, 2-184; raylrnd 2-7, 2-185; raylstat 2-8, 2-186; rcoplot 2-10, 2-187; reaction 2-14; refcurve 2-10, 2-188; reference lines 1-109; references 1-119; refline 2-10, 2-189; regress 2-11, 2-190; regression 1-24 (nonlinear 1-65, stepwise 1-61); regstats 2-192; relative efficiency 2-106; residuals 1-59; Response Surface Methodology (RSM) 1-59; ridge 2-11, 2-195; robust 1-42; robust linear fit 2-177; rowexch 2-12, 2-196; rsmdemo 1-109, 2-13, 2-197; R-square 1-58; rstool 2-11, 2-198. S: S charts 1-96; sat 2-14; schart 2-11, 2-199; Scree plot 1-86; significance level 1-71; signrank 2-13, 2-201; signtest 2-13, 2-202; simulation 2-106; skewness 1-88; skewness 2-9, 2-203; SPC 2-2; standard normal 2-155; statdemo 2-13, 2-204; statistical plots 1-88; Statistical Process Control 1-95 (capability studies 1-98, control charts 1-95); statistical references 1-119; statistically significant 2-15; std 2-9, 2-204; stepwise 2-11, 2-205; stepwise regression 1-61; Sum of Squares (SS) 2-15; surfht 2-10
207. odel. D = [1 1 4 4 1 16; 1 2 5 10 4 25; 1 3 6 18 9 36]. Let x1 be the first column of x and x2 be the second. Then the first column of D is for the constant term. The second column is x1. The third column is x2. The fourth is x1x2. The fifth is x1^2 and the last is x2^2. See Also: rstool, cordexch, rowexch, regstats. xbarplot. Purpose: X-bar chart for Statistical Process Control. Syntax: xbarplot(DATA); xbarplot(DATA,conf); xbarplot(DATA,conf,specs); [outlier,h] = xbarplot(...). Description: xbarplot(DATA) displays an x-bar chart of the grouped responses in DATA. The rows of DATA contain replicate observations taken at a given time. The rows must be in time order. The upper and lower control limits are a 99% confidence interval on a new observation from the process. So, roughly 99% of the plotted points should fall between the control limits. xbarplot(DATA,conf) allows control of the confidence level of the upper and lower plotted confidence limits. For example, conf = 0.95 plots 95% confidence intervals. xbarplot(DATA,conf,specs) plots the specification limits in the two-element vector, specs. [outlier,h] = xbarplot(DATA,conf,specs) returns outlier, a vector of indices to the rows where the mean of DATA is out of control, and h, a vector of handles to the plotted lines. Example: Plot an x-bar chart of measurements on newly machined parts, taken at one-hour intervals for 36 hours. Each row of the runout matrix contains
208. opdf: Geometric pdf; hygepdf: Hypergeometric pdf. Probability Density Functions (pdf): normpdf: Normal (Gaussian) pdf; lognpdf: Lognormal pdf; nbinpdf: Negative binomial pdf; ncfpdf: Noncentral F pdf; nctpdf: Noncentral t pdf; ncx2pdf: Noncentral Chi-square pdf; pdf: Parameterized pdf routine; poisspdf: Poisson pdf; raylpdf: Rayleigh pdf; tpdf: Student's t pdf; unidpdf: Discrete uniform pdf; unifpdf: Continuous uniform pdf; weibpdf: Weibull pdf. Inverse Cumulative Distribution Functions: betainv: Beta critical values; binoinv: Binomial critical values; chi2inv: Chi-square critical values; expinv: Exponential critical values; finv: F critical values; gaminv: Gamma critical values; geoinv: Geometric critical values; hygeinv: Hypergeometric critical values; logninv: Lognormal critical values; nbininv: Negative binomial critical values; ncfinv: Noncentral F critical values; nctinv: Noncentral t critical values; ncx2inv: Noncentral Chi-square critical values; icdf: Parameterized inverse distribution routine; norminv: Normal (Gaussian) critical values; poissinv: Poisson critical values; raylinv: Rayleigh critical values; tinv: Student's t critical values; unidinv: Discrete uniform critical values; unifinv: Continuous uniform critical values; weibinv: Weibull critical values. Random Number Generators: betarnd: Beta random numbers; binornd: Binomial random numbers; ch
209. osen cumulative distribution function (cdf). Syntax: P = cdf('name',X,A1,A2,A3). Description: cdf is a utility routine allowing you to access all the cdfs in the Statistics Toolbox using the name of the distribution as a parameter. P = cdf('name',X,A1,A2,A3) returns a matrix of probabilities. name is a string containing the name of the distribution. X is a matrix of values, and A1, A2, and A3 are matrices of distribution parameters. Depending on the distribution, some of the parameters may not be necessary. The arguments X, A1, A2, and A3 must all be the same size except that scalar arguments function as constant matrices of the common size of the other arguments. Examples: p = cdf('Normal',-2:2,0,1) returns p = 0.0228 0.1587 0.5000 0.8413 0.9772. p = cdf('Poisson',0:5,1:6) returns p = 0.3679 0.4060 0.4232 0.4335 0.4405 0.4457. See Also: icdf, mle, pdf, random. chi2cdf. Purpose: Chi-square (χ²) cumulative distribution function (cdf). Syntax: P = chi2cdf(X,V). Description: chi2cdf(X,V) computes the χ² cdf with parameter V at the values in X. The arguments X and V must be the same size except that a scalar argument functions as a constant matrix of the same size as the other argument. The degrees of freedom, V, must be a positive integer. The χ² cdf is $p = F(x|\nu) = \int_0^x \frac{t^{(\nu-2)/2} e^{-t/2}}{2^{\nu/2}\,\Gamma(\nu/2)}\,dt$. The result, p, is the probability that a single observation from the χ² distribution with degrees of freedom, ν, will fall in the interval [0 x]. The χ²
210. otsam] = bootstrp(1000,'corrcoef',lsat,gpa); bootstat(1:5,:) returns ans = [1.0000 0.3021 0.3021 1.0000; 1.0000 0.6869 0.6869 1.0000; 1.0000 0.8346 0.8346 1.0000; 1.0000 0.8711 0.8711 1.0000; 1.0000 0.8043 0.8043 1.0000]. bootsam(:,1:5) returns ans = [4 7 5 12 8; 1 11 10 8 4; 11 9 12 4 2; 11 14 15 5 15; 15 13 6 6 2; 6 8 4 3 8; 8 2 15 8 6; 13 10 11 14 5; 1 7 12 14 14; 1 11 10 1 8; 8 14 2 14 7; 11 12 10 8 15; 1 4 14 8 1; 6 1 5 5 12; 2 12 7 15 12]. hist(bootstat(:,2)). [Figure: histogram of the bootstrap correlation coefficients, on a horizontal axis from 0.2 to 1.] The histogram shows the variation of the correlation coefficient across all the bootstrap samples. The sample minimum is positive, indicating that the relationship between LSAT and GPA is not accidental. boxplot. Purpose: Box plots of a data sample. Syntax: boxplot(X); boxplot(X,notch); boxplot(X,notch,'sym'); boxplot(X,notch,'sym',vert); boxplot(X,notch,'sym',vert,whis). Description: boxplot(X) produces a box and whisker plot for each column of X. The box has lines at the lower quartile, median, and upper quartile values. The whiskers are lines extending from each end of the box to show the extent of the rest of the data. Outliers are data with values beyond the ends of the whiskers. boxplot(X,notch) with notch = 1 produces a notched-box plot. Notches graph a robust estimate of the uncertaint
211. ou choose these extra runs optimally. Suppose we have run the 8-run design below for fitting a linear model to 4 input variables. settings = cordexch(4,8). [Output: the 8-by-4 settings matrix is not recoverable from the source.] This design is adequate to fit the linear model for four inputs, but cannot fit the six cross-product (interaction) terms. Suppose we are willing to do 8 more runs to fit these extra terms. Here's how: [augmented,X] = daugment(settings,8,'i'); augmented; info = X'*X. [Output: the augmented and info matrices are not recoverable from the source.] The augmented design is orthogonal, since X'*X is a multiple of the identity matrix. In fact, this design is the same as a 2^4 factorial design. Design of Experiments with Known but Uncontrolled Inputs. Sometimes it is impossible to control every experimental input. But you may know the values of some inputs in advance. An example is the time each run takes place. If a process is experiencing linear drift, you may want to include the time of each test run as a variable in the model. The function dcovary allows you to choose the settings for each run in order to maximize your information despite a linear drift in the process. Suppose we wish to run an
212. ource SS df MS F; Columns 32.93 4 8.232 11.26; Error 14.62 20 0.7312; Total 47.55 24. [Figure: box plot of Values against Column Number.] The following example comes from a study of material strength in structural beams (Hogg 1987). The vector strength measures the deflection of a beam in thousandths of an inch under 3,000 pounds of force. Stronger beams deflect less. The civil engineer performing the study wanted to determine whether the strength of steel beams was equal to the strength of two more expensive alloys. Steel is coded 1 in the vector alloy. The other materials are coded 2 and 3. strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 79 77 78 82 79]; alloy = [1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3]; Though alloy is sorted in this example, you do not need to sort the grouping variable. p = anova1(strength,alloy) returns p = 1.5264e-04. ANOVA Table: Source SS df MS F; Columns 184.8 2 92.4 15.4; Error 102 17 6; Total 286.8 19. [Figure: box plot of Values against Group Number (1, 2, 3).] The p-value indicates that the three alloys are significantly different. The box plot confirms this graphically and shows that the steel beams deflect more than the more expensive alloys. Reference: Hogg, R. V., and J. Ledolter, Engineering Statistics, MacMillan Publishing Company, 1987. anova2. Purpose: Two-way Analysis of Variance (ANOVA). Syntax: p = anova2(X,reps). Description: anova2(X,reps) performs a balanced two-way AN
213. p; x = -2.5758 2.5758. The variable x contains the values associated with the normal inverse function with parameters 0 and 1 at the probabilities in p. The difference p(2) - p(1) is 0.99. Thus, the values in x define an interval that contains 99% of the standard normal probability. The inverse function call has the same general format for every distribution in the Statistics Toolbox. The first input argument of every inverse function is the set of probabilities for which you want to evaluate the critical values. Other arguments contain as many parameters as are necessary to define the distribution uniquely. Random Numbers. The methods for generating random numbers from any distribution all start with uniform random numbers. Once you have a uniform random number generator, you can produce random numbers from other distributions either directly or by using inversion or rejection methods. Direct. Direct methods flow from the definition of the distribution. As an example, consider generating binomial random numbers. You can think of binomial random numbers as the number of heads in n tosses of a coin with probability p of a heads on any toss. If you generate n uniform random numbers and count the number that are less than p, the result is binomial with parameters n and p. Inversion. The inversion method works due to a fundamental theorem that relates the uniform distribution to other continuous distributions. If F is a continuous di
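The direct method for binomial random numbers can be sketched in a few lines of base MATLAB. This is an illustration only, not the toolbox's own generator; the values of n, p, and reps are arbitrary. A uniform draw on (0,1) falls below p with probability p, so each such draw counts as one head.

```matlab
% Sketch: direct generation of binomial(n,p) random numbers from uniforms.
% n, p, and reps are illustrative values.
n = 20; p = 0.3; reps = 5000;
u = rand(n, reps);        % each column holds n uniform draws (n coin tosses)
x = sum(u < p, 1);        % heads per column: one binomial(n,p) draw each
fprintf('sample mean %.3f, expected %.3f\n', mean(x), n*p);
```

With a few thousand replicates the sample mean should sit close to n*p, although the exact figure changes from run to run.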
214. putes the geometric cdf with probabilities, P, at the values in X. The arguments X and P must be the same size except that a scalar argument functions as a constant matrix of the same size as the other argument. The parameter, P, is on the interval [0 1]. The geometric cdf is $y = F(x|p) = \sum_{i=0}^{\lfloor x \rfloor} p q^{i}$, where q = 1 - p. The result, y, is the probability of observing up to x trials before a success when the probability of success in any given trial is p. Examples: Suppose you toss a fair coin repeatedly. If the coin lands face up (heads), that is a success. What is the probability of observing three or fewer tails before getting a heads? p = geocdf(3,0.5) returns p = 0.9375. geoinv. Purpose: Inverse of the geometric cumulative distribution function (cdf). Syntax: X = geoinv(Y,P). Description: geoinv(Y,P) returns the smallest integer X such that the geometric cdf evaluated at X is equal to or exceeds Y. You can think of Y as the probability of observing X successes in a row in independent trials where P is the probability of success in each trial. The arguments P and Y must lie on the interval [0 1]. Each X is a positive integer. Examples: The probability of correctly guessing the result of 10 coin tosses in a row is less than 0.001 (unless the coin is not fair). psychic = geoinv(0.999,0.5) returns psychic = 9. The example below shows the inverse method for generating random numbers from the geometric distribution. rndgeo = geoinv(rand(2
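The geometric cdf and its inverse can be reproduced with base MATLAB alone. The sketch below is not the toolbox implementation. It sums the series directly, compares the sum with the closed form 1 - q^(floor(x)+1), and inverts that closed form to find the smallest integer whose cdf reaches a target probability. The variable names are illustrative.

```matlab
% Sketch: geometric cdf by direct summation and by closed form.
p = 0.5; q = 1 - p;
x = 3;
i = 0:floor(x);
F_sum = sum(p * q.^i);               % series definition of the cdf
F_closed = 1 - q^(floor(x) + 1);     % sum of the finite geometric series

% Inverse: smallest integer X with 1 - q^(X+1) >= Y.
Y = 0.999;
psychic = ceil(log(1 - Y) / log(q)) - 1;
disp([F_sum F_closed psychic])
```

For the fair coin example this gives 0.9375 for three or fewer tails before a head, and 9 for the inverse at 0.999, matching the geocdf and geoinv results quoted in the examples.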
215. r a with b at its default value of 1. a = 1:6; b = 5:10; prob = gamcdf(a.*b,a,b) returns prob = 0.6321 0.5940 0.5768 0.5665 0.5595 0.5543. The mean of the gamma distribution is the product of the parameters, a*b. In this example, as the mean increases, it approaches the median (i.e., the distribution gets more symmetric). gamfit. Purpose: Parameter estimates and confidence intervals for gamma distributed data. Syntax: phat = gamfit(x); [phat,pci] = gamfit(x); [phat,pci] = gamfit(x,alpha). Description: phat = gamfit(x) returns the maximum likelihood estimates of the parameters of the gamma distribution given the data in the vector, x. [phat,pci] = gamfit(x) gives MLEs and 95 percent confidence intervals. The first row of pci is the lower bound of the confidence intervals; the last row is the upper bound. [phat,pci] = gamfit(x,alpha) returns 100(1-alpha) percent confidence intervals. For example, alpha = 0.01 yields 99% confidence intervals. Example: Note the 95% confidence intervals in the example bracket the true parameter values, 2 and 4, respectively. a = 2; b = 4; r = gamrnd(a,b,100,1); [p,ci] = gamfit(r) returns p = 2.1990 3.7426 and ci = [1.6840 2.8298; 2.7141 4.6554]. Reference: Hahn, Gerald J., & Shapiro, Samuel S., Statistical Models in Engineering, Wiley Classics Library, John Wiley & Sons, New York, 1994, p. 88. See Also: betafit, binofit, expfit, normfit, poissfit, unifit, weib
216. r estimates on predictions. Example: load reaction; betafit = nlinfit(reactants,rate,'hougen',beta) returns betafit = 1323 0582 0354 1025 2801. See Also: nlintool. nlintool. Purpose: Fits a nonlinear equation to data and displays an interactive graph. Syntax: nlintool(x,y,'model',beta0); nlintool(x,y,'model',beta0,alpha); nlintool(x,y,'model',beta0,alpha,'xname','yname'). Description: nlintool(x,y,'model',beta0) is a prediction plot that provides a nonlinear curve fit to (x,y) data. It plots a 95% global confidence interval for predictions as two red curves. beta0 is a vector containing initial guesses for the parameters. nlintool(x,y,'model',beta0,alpha) plots a 100(1-alpha) percent confidence interval for predictions. nlintool displays a "vector" of plots, one for each column of the matrix of inputs, x. The response variable, y, is a column vector that matches the number of rows in x. The default value for alpha is 0.05, which produces 95% confidence intervals. nlintool(x,y,'model',beta0,alpha,'xname','yname') labels the plot using the string matrix, 'xname', for the X variables and the string 'yname' for the Y variable. You can drag the dotted white reference line and watch the predicted values update simultaneously. Alternatively, you can get a specific prediction by typing the value for X into an editable text field. Use the pop-up menu labeled Export to move specified variables
217. rbances. Let X = Q*R, where Q and R come from a QR decomposition of X. Q is orthogonal and R is triangular. Both of these matrices are useful for calculating many regression diagnostics (Goodall 1993). The standard textbook equation for the least squares estimator of β is $b = (X'X)^{-1}X'y$. However, this definition has poor numeric properties. Particularly dubious is the computation of $(X'X)^{-1}$, which is both expensive and imprecise. Numerically stable MATLAB code for β is b = R\(Q'*y). Reference: Goodall, C. R. (1993). Computation using the QR decomposition. Handbook in Statistics, Volume 9. Statistical Computing (C. R. Rao, ed.). Amsterdam, NL: Elsevier/North-Holland. See Also: leverage, stepwise, regress. ridge. Purpose: Parameter estimates for ridge regression. Syntax: b = ridge(y,X,k). Description: b = ridge(y,X,k) returns the ridge regression coefficients, b. Given the linear model y = Xβ + ε, where X is an n-by-p matrix, y is the n-by-1 vector of observations, and k is a scalar constant (the ridge parameter), the ridge estimator of β is $b = (X'X + kI)^{-1}X'y$. When k = 0, b is the least squares estimator. For increasing k, the bias of b increases, but the variance of b falls. For poorly conditioned X, the drop in the variance more than compensates for the bias. Example: This example shows how the coefficients change as the value of k increases, using data from the hald dataset
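The three estimators discussed here can be compared on a small data set. This is a sketch in base MATLAB, with made-up data. The ridge line applies the formula literally; the toolbox ridge function may standardize the predictors first, which is not reproduced here, so its output need not match.

```matlab
% Sketch: textbook normal equations, QR solution, and the ridge formula.
% The data are illustrative: a constant column plus one predictor.
X = [1 1; 1 2; 1 3; 1 4; 1 5];
y = [1.1; 1.9; 3.2; 3.9; 5.1];

b_textbook = inv(X'*X) * X'*y;       % textbook form, shown for comparison only
[Q, R] = qr(X, 0);                   % economy-size QR decomposition
b_qr = R \ (Q'*y);                   % numerically stable least squares

k = 0.1;                             % illustrative ridge parameter
p = size(X, 2);
b_ridge  = (X'*X + k*eye(p)) \ (X'*y);   % ridge formula applied literally
b_ridge0 = (X'*X + 0*eye(p)) \ (X'*y);   % k = 0 recovers least squares
disp([b_textbook b_qr b_ridge])
```

On a well-conditioned problem like this one the first two columns agree to rounding error; the difference between them matters when X is nearly rank deficient. The ridge coefficients are shrunk toward zero relative to least squares.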
218. ridge regression. Nonlinear Models. For nonlinear models there are functions for parameter estimation, interactive prediction and visualization of multidimensional nonlinear fits, and confidence intervals for parameters and predicted values. Hypothesis Tests. There are also functions that do the most common tests of hypothesis: t-tests and Z-tests. Multivariate Statistics. The Statistics Toolbox supports methods in Multivariate Statistics, including Principal Components Analysis and Linear Discriminant Analysis. Statistical Plots. The Statistics Toolbox adds box plots, normal probability plots, Weibull probability plots, control charts, and quantile-quantile plots to the arsenal of graphs in MATLAB. There is also extended support for polynomial curve fitting and prediction. Statistical Process Control (SPC). For SPC there are functions for plotting common control charts and performing process capability studies. Design of Experiments (DOE). The Statistics Toolbox supports both factorial and D-optimal design. There are functions for generating designs, augmenting designs, and optimally assigning units with fixed covariates. Probability Distributions. Probability distributions arise from experiments where the outcome is subject to chance. The nature of the experiment dictates which probability distributions may be appropriate for modeling the resulting random outcomes. There are two types of probability distr
219. rubber band line tracks the mouse movement. h = gline(fig) returns the handle to the line in h. gline with no input arguments draws in the current figure. See Also: refline, gname. gname. Purpose: Label plotted points with their case names or case number. Syntax: gname('cases'); gname; h = gname('cases',line_handle). Description: gname('cases') displays the graph window, puts up a cross-hair, and waits for a mouse button or keyboard key to be pressed. Position the cross-hair with the mouse and click once near each point that you want to label. When you are done, press the Return or Enter key and the labels will appear at each point that you clicked. 'cases' is a string matrix. Each row is the case name of a data point. gname with no arguments labels each case with its case number. h = gname('cases',line_handle) returns a vector of handles to the text objects on the plot. Use the scalar line_handle to identify the correct line if there is more than one line object on the plot. Example: Let's use the city ratings datasets to find out which cities are the best and worst for education and the arts. load cities; education = ratings(:,6); arts = ratings(:,7); plot(education,arts); gname(names). [Figure: scatter plot of arts against education ratings, with the points for Pascagoula, MS and New York, NY labeled.] See Also: gtext. grpstats Purpose Syntax De
220. s analysis, design of experiments, statistical process control, and descriptive statistics. All toolbox users should use Chapter 2, Reference, for information about specific tools. For functions, reference descriptions include a synopsis of the function's syntax, as well as a complete explanation of options and operation. Many reference descriptions also include examples, a description of the function's algorithm, and references to additional reading material. Use this guide in conjunction with the software to learn about the powerful features that MATLAB provides. Each chapter provides numerous examples that apply the toolbox to representative statistical tasks. The random number generation functions for various probability distributions are all based on the primitive functions, randn and rand. There are many examples that start by generating data using random numbers. To duplicate the results in these examples, first execute the commands below: seed = 931316785; rand('seed',seed); randn('seed',seed); You might want to save these commands in an M-file script called init.m. Then, instead of three separate commands, you need only type init. Mathematical Notation. This manual and the Statistics Toolbox functions use the following mathematical notation conventions. β: Parameters in a linear model. E(x): Expected value of x, $E(x) = \int t f(t)\,dt$. f(x|a,b): Probability density function. x is the independent variable; a and b are
221. s are the same but are different from the second command. r = chi2rnd(1:6) returns r = 0.0037 3.0377 7.8142 0.9021 3.2019 9.0729. r = chi2rnd(6,[1 6]) returns r = 6.5249 2.6226 12.2497 3.0388 6.3133 5.0388. r = chi2rnd(1:6,1,6) returns r = 0.7638 6.0955 0.8273 3.2506 1.5469 10.9197. chi2stat. Purpose: Mean and variance for the chi-square (χ²) distribution. Syntax: [M,V] = chi2stat(NU). Description: For the χ² distribution, the mean is ν and the variance is 2ν. Example: nu = 1:10; nu = nu'*nu; [m,v] = chi2stat(nu) returns m = [1 2 3 4 5 6 7 8 9; 2 4 6 8 10 12 14 16 18; 3 6 9 12 15 18 21 24 27; 4 8 12 16 20 24 28 32 36; 5 10 15 20 25 30 35 40 45; 6 12 18 24 30 36 42 48 54; 7 14 21 28 35 42 49 56 63; 8 16 24 32 40 48 56 64 72; 9 18 27 36 45 54 63 72 81; 10 20 30 40 50 60 70 80 90] and v = [2 4 6 8 10 12 14 16 18; 4 8 12 16 20 24 28 32 36; 6 12 18 24 30 36 42 48 54; 8 16 24 32 40 48 56 64 72; 10 20 30 40 50 60 70 80 90; 12 24 36 48 60 72 84 96 108; 14 28 42 56 70 84 98 112 126; 16 32 48 64 80 96 112 128 144; 18 36 54 72 90 108 126 144 162; 20 40 60 80 100 120 140 160 180]. classify. Purpose: Linear discriminant analysis. Syntax: class = classify(sample,training,group). Description: class = classify(sample,training,group) assigns each row of the data in sample into one of the values of the vector group. group contains integers from one to the number of groups. The training set is t
222. s in X. The size of Y is the common size of X and B. A scalar input functions as a constant matrix of the same size as the other input. The Rayleigh pdf is $y = f(x|b) = \frac{x}{b^2} e^{-x^2/(2b^2)}$. Example: p = raylpdf(x,1); plot(x,p). [Figure: Rayleigh pdf for b = 1, plotted for x from 0 to 3.] See Also: raylcdf, raylinv, raylrnd, raylstat. raylrnd. Purpose: Random matrices from the Rayleigh distribution. Syntax: R = raylrnd(B); R = raylrnd(B,m); R = raylrnd(B,m,n). Description: R = raylrnd(B) returns a matrix of random numbers chosen from the Rayleigh distribution with parameter B. The size of R is the size of B. R = raylrnd(B,m) returns a matrix of random numbers chosen from the Rayleigh distribution with parameter B. m is a 1-by-2 vector that contains the row and column dimensions of R. R = raylrnd(B,m,n) returns a matrix of random numbers chosen from the Rayleigh distribution with parameter B. The scalars m and n are the row and column dimensions of R. Example: r = raylrnd(1:5) returns r = 1.7986 0.8795 3.3473 8.9159 3.5182. See Also: random, raylcdf, raylinv, raylpdf, raylstat. raylstat. Purpose: Mean and variance for the Rayleigh distribution. Syntax: M = raylstat(B); [M,V] = raylstat(B). Description: [M,V] = raylstat(B) returns the mean and variance of the Rayleigh distribution with parameter B. The mean is $b\sqrt{\pi/2}$. The variance is $\frac{4-\pi}{2}b^2$. Example: [mn,v] = raylstat(1) returns mn = 1.2533 and v = 0.4292. See
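The Rayleigh formulas can be checked against the quoted examples with base MATLAB alone. The sketch below is not the toolbox code: the function handles ray_pdf, ray_cdf, and ray_inv are hypothetical names, and the closed-form cdf 1 - exp(-x^2/(2b^2)) is obtained by integrating the pdf.

```matlab
% Sketch: Rayleigh pdf, cdf, inverse cdf, mean, and variance for parameter b.
b = 1;
ray_pdf = @(x) (x ./ b^2) .* exp(-x.^2 ./ (2*b^2));
ray_cdf = @(x) 1 - exp(-x.^2 ./ (2*b^2));     % integral of the pdf from 0 to x
ray_inv = @(p) b .* sqrt(-2 .* log(1 - p));   % solves ray_cdf(x) = p for x

x90 = ray_inv(0.9);                  % 90th percentile
ray_mean = b * sqrt(pi/2);
ray_var  = (4 - pi)/2 * b^2;

% Numerical check: integrate the pdf up to x90 with the trapezoid rule.
xs = linspace(0, x90, 2001);
cdf_numeric = trapz(xs, ray_pdf(xs));
disp([x90 ray_mean ray_var cdf_numeric])
```

For b = 1 the percentile, mean, and variance should round to 2.1460, 1.2533, and 0.4292, the values given in the raylinv and raylstat examples, and the numerical integral should come out close to 0.9.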
223. scription Example See Also

Purpose      Summary statistics by group.

Syntax       means = grpstats(X,group)
             [means,sem,counts] = grpstats(X,group)
             grpstats(x,group)
             grpstats(x,group,alpha)

Description  means = grpstats(X,group) returns the means of each column of X by group. X is a matrix of observations. group is a column of positive integers that indicates the group membership of each row in X.

[means,sem,counts] = grpstats(x,group,alpha) supplies the standard error of the mean in sem. counts is the same size as the other outputs. The i-th row of counts contains the number of elements in the i-th group.

grpstats(x,group) displays a plot of the means versus index with 95% confidence intervals about the mean value for each value of index.

grpstats(x,group,alpha) plots 100(1 - alpha)% confidence intervals around each mean.

Example      We assign 100 observations to one of 4 groups. For each observation we measure 5 quantities with true means from 1 to 5. grpstats allows us to compute the means for each group.

    group = unidrnd(4,100,1);
    true_mean = 1:5;
    true_mean = true_mean(ones(100,1),:);
    x = normrnd(true_mean,1);
    means = grpstats(x,group)
    means =
        0.7947    2.0908    2.8969    3.6749    4.6555
        0.9377    1.7600    3.0285    3.9484    4.8169
        1.0549    2.0255    2.8793    4.0799    5.3740
        0.7107    1.9264    2.8232    3.8815    4.9689

See Also     tabulate, crosstab

harmmean

Purpose      Harmonic mean of a sample of data.

Syntax       m = harmmean(X)

harmmean
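To make the grouping in grpstats concrete, the group means and counts can be reproduced with the base MATLAB function accumarray. This is a sketch under stated assumptions: the six-row data set and all variable names are made up for illustration, and the code is not the toolbox implementation.

```matlab
% Illustrative group means and counts using accumarray (base MATLAB only).
group = [1; 2; 1; 3; 2; 3];                    % hypothetical group labels
x = [1 10; 2 20; 3 30; 4 40; 6 60; 8 80];      % hypothetical observations

ngroups = max(group);
counts = accumarray(group, 1);                 % number of rows in each group
means = zeros(ngroups, size(x, 2));
for j = 1:size(x, 2)
    % Sum column j within each group, then divide by the group size.
    means(:, j) = accumarray(group, x(:, j)) ./ counts;
end

disp(means)
disp(counts)
```

Each row of means corresponds to one group and each column to one measured quantity, the same layout as the grpstats output above.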
224. servation and each column a variable. cov(X) is the covariance matrix. The variance function var(X) is the same as diag(cov(X)). The standard deviation function std(X) is equivalent to sqrt(diag(cov(X))).

cov(x,y), where x and y are column vectors of equal length, gives the same result as cov([x y]).

Algorithm    The algorithm for cov is

    [n,p] = size(X);
    X = X - ones(n,1)*mean(X);
    Y = X'*X/(n-1);

See Also     corrcoef, mean, std, var
             xcov, xcorr in the Signal Processing Toolbox

cov is a function in MATLAB.

crosstab

Purpose      Cross-tabulation of two vectors.

Syntax       table = crosstab(col1,col2)
             [table,chi2,p] = crosstab(col1,col2)

Description  table = crosstab(col1,col2) takes two vectors of positive integers and returns a matrix, table, of cross-tabulations. The ij-th element of table contains the count of all instances where col1 = i and col2 = j.

[table,chi2,p] = crosstab(col1,col2) also returns the chi-square statistic, chi2, for testing the independence of the rows and columns of table. The scalar p is the significance level of the test. Values of p near zero cast doubt on the assumption of independence of the rows and columns of table.

Example      We generate 2 columns of 50 discrete uniform random numbers. The first column has numbers from one to three. The second has only ones and twos. The two columns are independent, so we would be surprised if p were near zero.

    r1 = unidrnd(3,50,1);
    r2 = unidrnd(2
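The cov algorithm quoted above can be run step by step and compared with the built-in function. This is a small sketch with a made-up 4-by-2 data matrix; the variable names Xc and C are assumptions introduced so that the original X is not overwritten.

```matlab
% Illustrative run of the centering-and-cross-product algorithm for cov.
X = [2 1; 4 3; 6 8; 8 4];            % hypothetical data: rows are observations
[n, p] = size(X);
Xc = X - ones(n, 1)*mean(X);         % subtract each column mean
C = Xc'*Xc/(n - 1);                  % covariance matrix

disp(C)
disp(max(max(abs(C - cov(X)))))      % difference from the built-in cov
disp(max(abs(diag(C)' - var(X))))    % diag(cov(X)) versus var(X)
```

Both printed differences should be zero up to rounding error, which also illustrates the statement that var(X) equals diag(cov(X)).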
225. sions of R.

Example
    r = lognrnd(0,1,4,3)
    r =
        3.2058    0.4983    1.3022
        1.8717    5.4529    2.3909
        1.0780    1.0608    0.2355
        1.4213    6.0320    0.4960

Reference    Evans, Merran, Hastings, Nicholas and Peacock, Brian, Statistical Distributions, Second Edition, Wiley 1993. p. 102-105.

See Also     random, logncdf, logninv, lognpdf, lognstat

lognstat

Purpose      Mean and variance for the lognormal distribution.

Syntax       [M,V] = lognstat(MU,SIGMA)

Description  [M,V] = lognstat(MU,SIGMA) returns the mean and variance of the lognormal distribution with parameters MU and SIGMA. The size of M and V is the common size of MU and SIGMA if both are matrices. If either parameter is a scalar, the size of M and V is the size of the other parameter.

For the lognormal distribution:
             • The mean is $e^{\mu + \sigma^2/2}$.
             • The variance is $e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2}$.

Example
    [m,v] = lognstat(0,1)
    m =
        1.6487
    v =
        4.6708

Reference    Mood, Alexander M., Graybill, Franklin A. and Boes, Duane C., Introduction to the Theory of Statistics, Third Edition, McGraw-Hill 1974. p. 540-541.

See Also     logncdf, logninv, lognrnd

lsline

Purpose      Least squares fit line(s).

Syntax       lsline
             h = lsline

Description  lsline superimposes the least squares line on each line object in the current axes, except those with particular LineStyles.

h = lsline returns the handles to the line objects.

Example
    y = [2 3.4 5.6 8 11 12.3 13.8 16 18.8 19.9]';
    plot(y,'+');
    lsline;

See Also     polyfit, polyval
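The lognormal mean and variance formulas listed for lognstat can be evaluated directly. The sketch below uses only base MATLAB; the variable names are assumptions, and the code is a plain transcription of the two formulas rather than the toolbox source.

```matlab
% Illustrative evaluation of the lognormal mean and variance formulas.
mu = 0;                                        % mean of log(x)
sigma = 1;                                     % standard deviation of log(x)

m = exp(mu + sigma^2/2);                       % mean
v = exp(2*mu + 2*sigma^2) - exp(2*mu + sigma^2);   % variance

fprintf('mean = %.4f, variance = %.4f\n', m, v);
```

Because exp works elementwise, the same two lines also accept vectors or matrices for mu and sigma, provided the variance line is written with elementwise powers (sigma.^2).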
226. soline prices were between one and six cents lower in January than February.

The box plot gives the same conclusion graphically. Note that the notches have little, if any, overlap. Refer back to the Statistical Plots section for more on box plots.

    boxplot(prices,1)
    set(gca,'XtickLabels',str2mat('January','February'))
    xlabel('Month')
    ylabel('Prices ($0.01)')

[Figure: notched box plots of gasoline prices for January and February; x-axis Month, y-axis Prices ($0.01)]

Multivariate Statistics

Multivariate statistics is an omnibus term for a number of different statistical methods. The defining characteristic of these methods is that they all aim to understand a data set by considering a group of variables together rather than focusing on only one variable at a time.

Principal Components Analysis

One of the difficulties inherent in multivariate statistics is the problem of visualizing multi-dimensionality. In MATLAB, the plot command displays a graph of the relationship between two variables. The plot3 and surf commands display different three-dimensional views. When there are more than three variables, it stretches the imagination to visualize their relationships.

Fortunately, in data sets with many variables, groups of variables often move together. One reason for this is that more than one variable may be measuring the same driving principle governing the behavior of the system. In many systems there are only a f
227. specs
             schart(DATA,conf,specs)
             [outliers,h] = schart(DATA,conf,specs)

Description  schart(data) displays an S chart of the grouped responses in DATA. The rows of DATA contain replicate observations taken at a given time. The rows must be in time order. The upper and lower control limits are a 99% confidence interval on a new observation from the process, so roughly 99% of the plotted points should fall between the control limits.

schart(DATA,conf) allows control of the confidence level of the upper and lower plotted confidence limits. For example, conf = 0.95 plots 95% confidence intervals.

schart(DATA,conf,specs) plots the specification limits in the two-element vector specs.

[outliers,h] = schart(data,conf,specs) returns outliers, a vector of indices to the rows where the mean of DATA is out of control, and h, a vector of handles to the plotted lines.

Example      This example plots an S chart of measurements on newly machined parts, taken at one-hour intervals for 36 hours. Each row of the runout matrix contains the measurements for 4 parts chosen at random. The values indicate, in thousandths of an inch, the amount the part radius differs from the target radius.

    load parts
    schart(runout)

[Figure: S chart with upper (UCL) and lower (LCL) control limits; x-axis Sample Number, y-axis Standard Deviation]

Reference    Montgomery, Douglas, Introduc
228. ss these problems. The overwhelming advantage of a designed experiment is that you actively manipulate the system you are studying. With DOE you may generate fewer data points than by using passive instrumentation, but the quality of the information you get will be higher.

The Statistics Toolbox provides several functions for generating experimental designs appropriate to various situations.

Full Factorial Designs

Suppose you wish to determine whether the variability of a machining process is due to the difference in the lathes that cut the parts or the operators who run the lathes. If the same operator always runs a given lathe, then you cannot tell whether the machine or the operator is the cause of the variation in the output. By allowing every operator to run every lathe, you can separate their effects.

This is a factorial approach. fullfact is the function that generates the design. Suppose we have four operators and three machines. What is the factorial design?

    d = fullfact([4 3])
    d =
         1     1
         2     1
         3     1
         4     1
         1     2
         2     2
         3     2
         4     2
         1     3
         2     3
         3     3
         4     3

Each row of d represents one operator/machine combination. Note that there are 4*3 = 12 rows.

One special subclass of factorial designs is when all the variables take only two values. Suppose you want to quickly determine the sensitivity of a process to high and low values of 3 variables.

    d2 = ff2n(3)
    d2 =
         0     0     0
         0     0     1
         0     1     0
         0     1     1
         1     0     0
         1     0     1
         1     1     0
         1     1     1

There are $2^3 = 8$ combina
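The idea behind a full factorial design, every level of one factor paired with every level of the other, can be sketched in base MATLAB with ndgrid. This is an illustration under assumptions, not the fullfact implementation; the variable names are hypothetical, and the only claim is that the rows enumerate all operator/machine combinations.

```matlab
% Illustrative full factorial enumeration with ndgrid (base MATLAB only).
levels = [4 3];                          % 4 operators, 3 machines
[op, mach] = ndgrid(1:levels(1), 1:levels(2));
d = [op(:) mach(:)];                     % one row per combination

disp(d)
fprintf('%d rows, %d distinct combinations\n', ...
        size(d, 1), size(unique(d, 'rows'), 1));
```

With ndgrid the first factor varies fastest down the rows, so the operator index cycles 1 to 4 within each machine.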
229. standard normal distribution, written $\Phi(x)$, sets $\mu$ to zero and $\sigma$ to one. $\Phi(x)$ is functionally related to the error function, erf:

    $\mathrm{erf}(x) = 2\,\Phi(x\sqrt{2}) - 1$

The first use of the normal distribution was as a continuous approximation to the binomial.

The usual justification for using the normal distribution for modeling is the Central Limit Theorem, which states (roughly) that the sum of independent samples from any distribution with finite mean and variance converges to the normal distribution as the sample size goes to infinity.

Mathematical Definition. The normal pdf is

    $y = f(x \mid \mu, \sigma) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$

Parameter Estimation. One of the first applications of the normal distribution in data analysis was modeling the height of school children. Suppose we wish to estimate the mean, $\mu$, and the variance, $\sigma^2$, of the heights of all the 4th graders in the United States.

We have already introduced maximum likelihood estimators (MLEs). Another desirable criterion in a statistical estimator is unbiasedness. A statistic is unbiased if the expected value of the statistic is equal to the parameter being estimated. MLEs are not always unbiased. For any data sample, there may be more than one unbiased estimator of the parameters of the parent distribution of the sample. For instance, every sample value is an unbiased estimate of the parameter $\mu$ of a normal distribution. The minimum variance unbiased estimator (MVUE) is the statistic that has the minimum variance of all unbiased estima
230. stribution with inverse $F^{-1}$, and U is a uniform random number, then $F^{-1}(U)$ has distribution F.

So, you can generate a random number from a distribution by applying the inverse function for that distribution to a uniform random number. Unfortunately, this approach is usually not the most efficient.

Rejection. The functional form of some distributions makes it difficult or time consuming to generate random numbers using direct or inversion methods. Rejection methods can sometimes provide an elegant solution in these cases.

Suppose you want to generate random numbers from a distribution with pdf f. To use rejection methods you must first find another density, g, and a constant, c, so that the inequality below holds.

    $f(x) \le c\, g(x) \quad \forall x$

You then generate the random numbers you want using the following steps:

1 Generate a random number x from distribution G with density g.
2 Form the ratio $r = \dfrac{c\, g(x)}{f(x)}$.
3 Generate a uniform random number u.
4 If the product of u and r is less than one, return x.
5 Otherwise repeat steps one to three.

For efficiency you need a cheap method for generating random numbers from G, and the scalar c should be small. The expected number of iterations is c.

Syntax for Random Number Functions

You can generate random numbers from each distribution. This function provides a single random number or a matrix of random numbers, depending on the arguments you specify in the function call. For ex
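The five rejection steps above can be sketched for a simple case. The choices here are assumptions made purely for illustration: the target density is f(x) = 2x on [0,1], the proposal g is the uniform density on [0,1], and c = 2, so that f(x) is at most c*g(x) everywhere on the interval. The seed, sample size, and variable names are likewise hypothetical.

```matlab
% Illustrative rejection sampler (base MATLAB only).
rng(2);                              % assumed seed
f = @(x) 2*x;                        % target density on [0,1] (assumed example)
g = @(x) ones(size(x));              % proposal density: uniform on [0,1]
c = 2;                               % satisfies f(x) <= c*g(x) on [0,1]

n = 50000;
samples = zeros(n, 1);
trials = 0;
for k = 1:n
    accepted = false;
    while ~accepted
        x = rand;                    % step 1: draw x from G
        r = c*g(x)/f(x);             % step 2: form the ratio
        u = rand;                    % step 3: draw a uniform number
        trials = trials + 1;
        accepted = (u*r < 1);        % step 4: accept if u*r < 1, else repeat
    end
    samples(k) = x;
end

fprintf('sample mean %.4f (target mean is 2/3)\n', mean(samples));
fprintf('average trials per accepted draw %.3f (c = %g)\n', trials/n, c);
```

The average number of trials per accepted value should be close to c, which matches the statement that the expected number of iterations is c.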
231. t on the MATLAB numeric computing environment. The toolbox supports a wide range of common statistical tasks, from random number generation, to curve fitting, to design of experiments and statistical process control. The toolbox provides two categories of tools:

• Building-block probability and statistics functions
• Graphical, interactive tools

The first category of tools is made up of functions that you can call from the command line or from your own applications. Many of these functions are MATLAB M-files, series of MATLAB statements that implement specialized Statistics algorithms. You can view the MATLAB code for these functions using the statement

    type function_name

You can change the way any toolbox function works by copying and renaming the M-file, then modifying your copy. You can also extend the toolbox by adding your own M-files.

Secondly, the toolbox provides a number of interactive tools that let you access many of the functions through a graphical user interface (GUI). Together, the GUI-based tools provide an environment for polynomial fitting and prediction, as well as probability function exploration.

How to Use This Guide

If you are a new user, begin with Chapter 1, Tutorial. This chapter introduces the MATLAB statistics environment through the toolbox functions. It describes the functions with regard to particular areas of interest, such as probability distributions, linear and nonlinear models, principal component
232. t the first three principal component vectors.

    p3 = pcs(:,1:3)
    p3 =
        0.2064    0.2178   -0.6900
        0.3565    0.2506   -0.2082
        0.4602   -0.2995   -0.0073
        0.2813    0.3553    0.1851
        0.3512   -0.1796    0.1464
        0.2753   -0.4834    0.2297
        0.4631   -0.1948   -0.0265
        0.3279    0.3845   -0.0509
        0.1354    0.4713    0.6073

The largest weights in the first column (1st principal component) are the 3rd and 7th elements, corresponding to the variables arts and health. All the elements of the first principal component are the same sign, making it a weighted average of all the variables.

To show the orthogonality of the principal components, note that premultiplying them by their transpose yields the identity matrix.

    I = p3'*p3
    I =
        1.0000    0.0000    0.0000
        0.0000    1.0000    0.0000
        0.0000    0.0000    1.0000

The Component Scores (Second Output). The second output, newdata, is the data in the new coordinate system defined by the principal components. This output is the same size as the input data matrix.

A plot of the first two columns of newdata shows the ratings data projected onto the first two principal components.

    plot(newdata(:,1),newdata(:,2),'+')
    xlabel('1st Principal Component');
    ylabel('2nd Principal Component');

[Figure: scatter plot of the ratings data; x-axis 1st Principal Component, y-axis 2nd Principal Component]
233. tains the lower and upper confidence bounds for parameter A, and the second column contains the confidence bounds for parameter B.

The optional input argument alpha controls the width of the confidence interval. By default, alpha is 0.05, which corresponds to 95% confidence intervals.

Example      This example generates 100 beta distributed observations. The true parameters are 4 and 3 respectively. Compare these to the values in p. Note that the columns of ci both bracket the true parameters.

    r = betarnd(4,3,100,1);
    [p,ci] = betafit(r,0.01)
    p =
        3.9010    2.6193
    ci =
        2.5244    1.7488
        5.2777    3.4899

Reference    Hahn, Gerald J. & Shapiro, Samuel S., Statistical Models in Engineering, Wiley Classics Library, John Wiley & Sons, New York. 1994. p. 95.

See Also     betalike, mle

betainv

Purpose      Inverse of the beta cumulative distribution function.

Syntax       X = betainv(P,A,B)

Description  betainv(P,A,B) computes the inverse of the beta cdf with parameters A and B for the probabilities in P. The arguments P, A, and B must all be the same size, except that scalar arguments function as constant matrices of the common size of the other arguments.

The parameters A and B must both be positive, and P must lie on the interval [0 1].

The beta inverse function in terms of the beta cdf is

    $x = F^{-1}(p \mid a, b) = \{x : F(x \mid a, b) = p\}$

    where $p = F(x \mid a, b) = \dfrac{1}{B(a,b)} \displaystyle\int_0^x t^{a-1}(1-t)^{b-1}\, dt$

Algorithm    The result, x, is the solution of the integral
234. tant probability P of success. What you want to find out is how many extra trials you must do to observe a given number R of successes.

Example
    x = (0:15);
    p = nbincdf(x,3,0.5);
    stairs(x,p)

[Figure: stairstep plot of the negative binomial cdf with R = 3 and P = 0.5, for x from 0 to 15]

See Also     nbininv, nbinpdf, nbinrnd, nbinstat

nbininv

Purpose      Inverse of the negative binomial cumulative distribution function (cdf).

Syntax       X = nbininv(Y,R,P)

Description  nbininv(Y,R,P) returns the inverse of the negative binomial cdf with parameters R and P. Since the negative binomial distribution is discrete, nbininv returns the least integer X such that the negative binomial cdf evaluated at X equals or exceeds Y.

The size of X is the common size of the input arguments. A scalar input functions as a constant matrix of the same size as the other inputs.

The negative binomial models consecutive trials, each having a constant probability P of success. The parameter R is the number of successes required before stopping.

Example      How many times would you need to flip a fair coin to have a 99% probability of having observed 10 heads?

    flips = nbininv(0.99,10,0.5) + 10
    flips =
        33

Note that you have to flip at least 10 times to get 10 heads. That is why the second term on the right side of the equals sign is a 10.

See Also     nbincdf, nbinpdf, nbinrnd, nbinstat

nbinpdf

Purpose      Negative b
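The "least integer X such that the cdf equals or exceeds Y" rule described for nbininv can be traced by accumulating the negative binomial probabilities directly. This is a hedged sketch in base MATLAB, not the toolbox algorithm: it assumes the pmf of the number of extra trials x is C(r+x-1, x) p^r (1-p)^x, evaluates it through gammaln to avoid large binomial coefficients, and uses hypothetical variable names.

```matlab
% Illustrative inverse cdf search for the coin-flip example (base MATLAB only).
r = 10;                  % required number of heads
p = 0.5;                 % probability of heads on each flip
target = 0.99;           % desired cumulative probability

x = 0;                   % number of extra flips (tails) so far
cdf = p^r;               % P(no extra flips needed)
while cdf < target
    x = x + 1;
    % log of C(r+x-1, x) * p^r * (1-p)^x, computed with gammaln
    logpmf = gammaln(r + x) - gammaln(x + 1) - gammaln(r) ...
             + r*log(p) + x*log(1 - p);
    cdf = cdf + exp(logpmf);
end
flips = x + r;           % extra flips plus the 10 heads themselves

fprintf('extra flips = %d, total flips = %d\n', x, flips);
```

The total agrees with the flips value reported in the nbininv example above.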
235. tervals for exponential data.

Syntax       muhat = expfit(x)
             [muhat,muci] = expfit(x)
             [muhat,muci] = expfit(x,alpha)

Description  muhat = expfit(x) returns the estimate of the parameter, $\mu$, of the exponential distribution given the data, x.

[muhat,muci] = expfit(x) also returns the 95% confidence interval in muci.

[muhat,muci] = expfit(x,alpha) gives 100(1 - alpha) percent confidence intervals. For example, alpha = 0.01 yields 99% confidence intervals.

Example      We generate 100 independent samples of exponential data with $\mu$ = 3. muhat is an estimate of true_mu, and muci is a 99% confidence interval around muhat. Notice that muci contains true_mu.

    true_mu = 3;
    r = exprnd(true_mu,100,1);
    [muhat,muci] = expfit(r,0.01)
    muhat =
        2.8835
    muci =
        2.1949
        3.6803

See Also     betafit, binofit, gamfit, normfit, poissfit, unifit, weibfit

expinv

Purpose      Inverse of the exponential cumulative distribution function (cdf).

Syntax       X = expinv(P,MU)

Description  expinv(P,MU) computes the inverse of the exponential cdf with parameter MU for the probabilities in P. The arguments P and MU must be the same size, except that a scalar argument functions as a constant matrix of the size of the other argument.

The parameter MU must be positive, and P must lie on the interval [0 1].

The inverse of the exponential cdf is

    $x = F^{-1}(p \mid \mu) = -\mu \ln(1 - p)$

The result, x, is the value such that the probability is p that an observation from an exponential distribution with parameter $\mu$ will fall in the range [0
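The closed-form inverse of the exponential cdf given for expinv is easy to verify by a round trip through the cdf $F(x \mid \mu) = 1 - e^{-x/\mu}$. The sketch below uses base MATLAB only; the parameter value, the probabilities, and the variable names are illustrative assumptions.

```matlab
% Illustrative round trip: inverse cdf followed by the cdf (base MATLAB only).
mu = 2;                          % exponential parameter (mean)
p = [0.1 0.5 0.9];               % probabilities to invert

x = -mu*log(1 - p);              % inverse cdf formula from expinv
p_back = 1 - exp(-x/mu);         % exponential cdf evaluated at x

disp([p; x; p_back])
```

The second entry of x is the median, mu*log(2), and p_back should reproduce p up to rounding error.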
236. the xbarplot measurements for four parts chosen at random. The values indicate, in thousandths of an inch, the amount the part radius differs from the target radius.

    load parts
    xbarplot(runout,0.999,[-0.5 0.5])

[Figure: Xbar Chart showing the upper and lower control limits (UCL, LCL) and the specification limits (USL, LSL); x-axis Samples]

See Also     capaplot, histfit, ewmaplot, schart

ztest

Purpose      Hypothesis testing for the mean of one sample with known variance.

Syntax       h = ztest(x,m,sigma)
             h = ztest(x,m,sigma,alpha)
             [h,sig,ci] = ztest(x,m,sigma,alpha,tail)

Description  ztest(x,m,sigma) performs a Z test at significance level 0.05 to determine whether a sample from a normal distribution (in x) could have mean m and standard deviation sigma.

h = ztest(x,m,sigma,alpha) gives control of the significance level, alpha. For example, if alpha = 0.01 and the result, h, is 1, you can reject the null hypothesis at the significance level 0.01. If h = 0, you cannot reject the null hypothesis at the alpha level of significance.

[h,sig,ci] = ztest(x,m,sigma,alpha,tail) allows specification of one- or two-tailed tests. tail is a flag that specifies one of three alternative hypotheses:

• tail = 0 (default) specifies the alternative $\bar{x} \ne \mu$.
• tail = 1 specifies the alternative $\bar{x} > \mu$.
• tail = -1 specifies the alternative $\bar{x} < \mu$.

sig is the p-value associated
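The two-tailed case of the Z test described above can be worked by hand in base MATLAB. This sketch is an illustration under assumptions, not the ztest source: the eight data values are made up, the two-sided p-value is obtained from erfc, and the decision rule h = (sig < alpha) is assumed to mirror the description of h.

```matlab
% Illustrative two-tailed Z test with known sigma (base MATLAB only).
x = [9.8 10.4 10.1 9.9 10.6 10.2 9.7 10.3];   % hypothetical sample
m = 10;                                       % hypothesized mean
sigma = 0.3;                                  % known standard deviation
alpha = 0.05;                                 % significance level

zval = (mean(x) - m)/(sigma/sqrt(numel(x)));  % standardized sample mean
sig = erfc(abs(zval)/sqrt(2));                % two-sided p-value, 2*(1 - Phi(|z|))
h = sig < alpha;                              % true means reject the null hypothesis

fprintf('z = %.4f, p-value = %.4f, h = %d\n', zval, sig, h);
```

For these values the sample mean is 10.125, which is not far enough from 10 to reject the null hypothesis at the 0.05 level.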
237. the data. This point may be the result of a data entry error, a poor measurement, or a change in the system that generated the data.

• The notches in the box are a graphic confidence interval about the median of a sample. Box plots do not have notches by default.

A side-by-side comparison of two notched box plots is the graphical equivalent of a t-test. See the section Hypothesis Tests on page 1-71.

Normal Probability Plots

A normal probability plot is a useful graph for assessing whether data comes from a normal distribution. Many statistical procedures make the assumption that the underlying distribution of the data is normal, so this plot can provide some assurance that the assumption of normality is not being violated, or provide an early warning of a problem with your assumptions.

This example shows a typical normal probability plot.

    x = normrnd(10,1,25,1);
    normplot(x)

[Figure: Normal Probability Plot; x-axis Data, y-axis Probability]

The plot has three graphic elements. The plus signs show the empirical probability versus the data value for each point in the sample. The solid line connects the 25th and 75th percentiles of the data and represents a robust linear fit (i.e., insensitive to the extremes of t
238. the other categories of functions.

Parameter Estimation

    betafit      Parameter estimation for the beta distribution
    betalike     Beta log-likelihood function
    binofit      Parameter estimation for the binomial distribution
    expfit       Parameter estimation for the exponential distribution
    gamfit       Parameter estimation for the gamma distribution
    gamlike      Gamma log-likelihood function
    mle          Maximum likelihood estimation
    normlike     Normal log-likelihood function
    normfit      Parameter estimation for the normal distribution
    poissfit     Parameter estimation for the Poisson distribution
    unifit       Parameter estimation for the uniform distribution

Cumulative Distribution Functions (cdf)

    betacdf      Beta cdf
    binocdf      Binomial cdf
    cdf          Parameterized cdf routine
    chi2cdf      Chi-square cdf
    expcdf       Exponential cdf
    fcdf         F cdf
    gamcdf       Gamma cdf
    geocdf       Geometric cdf
    hygecdf      Hypergeometric cdf
    logncdf      Lognormal cdf
    nbincdf      Negative binomial cdf
    ncfcdf       Noncentral F cdf
    nctcdf       Noncentral t cdf
    ncx2cdf      Noncentral Chi-square cdf
    normcdf      Normal (Gaussian) cdf
    poisscdf     Poisson cdf
    raylcdf      Rayleigh cdf
    tcdf         Student's t cdf
    unidcdf      Discrete uniform cdf
    unifcdf      Continuous uniform cdf
    weibcdf      Weibull cdf

Probability Density Functions (pdf)

    betapdf      Beta pdf
    binopdf      Binomial pdf
    chi2pdf      Chi-square pdf
    exppdf       Exponential pdf
    fpdf         F pdf
    gampdf       Gamma pdf
    ge
239. the plot of a cdf or pdf. Clicking and dragging a vertical line on the plot allows you to evaluate the function over its entire domain interactively.

Evaluate the plotted function by typing a value in the x-axis edit box or dragging the vertical reference line on the plot. For cdfs, you can evaluate the inverse function by typing a value in the y-axis edit box or dragging the horizontal reference line on the plot. The shape of the pointer changes from an arrow to a crosshair when you are over the vertical or horizontal line, to indicate that the reference line is draggable.

To change the distribution function, choose from the pop-up menu of functions at the top left of the figure. To change from cdfs to pdfs, choose from the pop-up menu at the top right of the figure.

To change the parameter settings, move the sliders or type a value in the edit box under the name of the parameter. To change the limits of a parameter, type a value in the edit box at the top or bottom of the parameter slider.

When you are done, press the Close button.

See Also     randtool

dummyvar

Purpose      Matrix of 0-1 "dummy" variables.

Syntax       D = dummyvar(group)

Description  D = dummyvar(group) generates a matrix, D, of 0-1 columns. D has one column for each unique value in each column of the matrix group. Each column of group contains positive integers that indicate the group membership of an individual row.

Example      Suppose we are stu
240. the positive connection between LSAT and GPA, but though 0.7764 may seem large, we still do not know if it is statistically significant.

Using the bootstrp function we can resample the lsat and gpa vectors as many times as we like and consider the variation in the resulting correlation coefficients.

Here is an example:

    rhos1000 = bootstrp(1000,'corrcoef',lsat,gpa);

This command resamples the lsat and gpa vectors 1000 times and computes the corrcoef function on each sample. Here is a histogram of the result.

    hist(rhos1000(:,2),30)

[Figure: histogram of the 1000 bootstrap correlation coefficients, spanning roughly 0.2 to 1]

Nearly all the estimates lie on the interval [0.4 1.0].

This is strong quantitative evidence that LSAT and subsequent GPA are positively correlated. Moreover, it does not require us to make any strong assumptions about the probability distribution of the correlation coefficient.

Linear Models

Linear models are problems that take the form

    $y = X\beta + \varepsilon$

where:

• y is an n by 1 vector of observations.
• X is the n by p design matrix for the model.
• $\beta$ is a p by 1 vector of parameters.
• $\varepsilon$ is an n by 1 vector of random disturbances.

One-way analysis of variance (ANOVA), two-way ANOVA, polynomial regression, and multiple linear regression are specific cases of the
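The resampling that bootstrp performs for the correlation coefficient can be sketched as an explicit loop in base MATLAB. Everything about the data here is an assumption: the lsat and gpa vectors below are synthetic stand-ins generated for illustration, not the law school data used in the guide, and the loop is a sketch of the bootstrap idea rather than the bootstrp implementation.

```matlab
% Illustrative bootstrap of a correlation coefficient (base MATLAB only).
rng(3);                                      % assumed seed
n = 15;
lsat = 600 + 40*randn(n, 1);                 % synthetic stand-in scores
gpa = 3 + 0.005*(lsat - 600) + 0.15*randn(n, 1);   % synthetic, positively related

nboot = 1000;
rhos = zeros(nboot, 1);
for b = 1:nboot
    idx = randi(n, n, 1);                    % resample row indices with replacement
    c = corrcoef(lsat(idx), gpa(idx));       % 2-by-2 correlation matrix
    rhos(b) = c(1, 2);                       % off-diagonal entry is the correlation
end

fprintf('bootstrap correlations range from %.3f to %.3f\n', min(rhos), max(rhos));
```

Resampling the row indices, rather than each vector separately, keeps every LSAT score paired with its own GPA, which is what makes the resampled correlations meaningful.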
241. tically independent.

[p,Cp,Cpk] = capable(data,lower,upper) also returns the capability indices Cp and Cpk.

Cp is the ratio of the range of the specifications to six times the estimate of the process standard deviation:

    $C_p = \dfrac{USL - LSL}{6\sigma}$

For a process that has its average value on target, a Cp of one translates to a little more than one defect per thousand. Recently many industries have set a quality goal of one part per million. This would correspond to a Cp = 1.6. The higher the value of Cp, the more capable the process.

For processes that do not maintain their average on target, Cpk is a more descriptive index of process capability. Cpk is the ratio of the difference between the process mean and the closer specification limit to three times the estimate of the process standard deviation:

    $C_{pk} = \min\left( \dfrac{USL - \mu}{3\sigma},\ \dfrac{\mu - LSL}{3\sigma} \right)$

where the process mean is $\mu$.

Example      Imagine a machined part with specifications requiring a dimension to be within 3 thousandths of an inch of nominal. Suppose that the machining process cuts too thick by one thousandth of an inch on average and also has a standard deviation of one thousandth of an inch. What are the capability indices of this process?

    data = normrnd(1,1,30,1);
    [p,Cp,Cpk] = capable(data,-3,3);
    indices = [p Cp Cpk]
    indices =
        0.0172    1.1144    0.7053

We expect 17 parts out of a thousand to be out of specification. Cpk is less than Cp because the process is not
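The Cp and Cpk definitions above can be evaluated for the machined-part example using the true process parameters instead of estimates from data. This is a sketch under assumptions: it does not call capable, it computes the out-of-specification probability from the normal cdf written in terms of erfc, and the variable names are hypothetical. Because capable works from a random sample, its printed indices differ somewhat from these population values.

```matlab
% Illustrative capability indices from known process parameters (base MATLAB only).
mu = 1;                  % process mean, thousandths of an inch off nominal
sigma = 1;               % process standard deviation
LSL = -3;                % lower specification limit
USL = 3;                 % upper specification limit

Phi = @(z) 0.5*erfc(-z/sqrt(2));          % standard normal cdf

Cp = (USL - LSL)/(6*sigma);               % spec range over six sigma
Cpk = min(USL - mu, mu - LSL)/(3*sigma);  % distance to the closer limit over three sigma
p = Phi((LSL - mu)/sigma) + 1 - Phi((USL - mu)/sigma);   % P(outside the specs)

fprintf('p = %.4f, Cp = %.4f, Cpk = %.4f\n', p, Cp, Cpk);
```

Cpk comes out smaller than Cp here because the process mean sits closer to the upper limit than to the lower one.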
242. tile Plots

A quantile-quantile plot is useful for determining whether two samples come from the same distribution (whether normally distributed or not).

The example shows a quantile-quantile plot of two samples from a Poisson distribution.

    x = poissrnd(10,50,1);
    y = poissrnd(5,100,1);
    qqplot(x,y);

[Figure: quantile-quantile plot; x-axis X Quantiles, y-axis Y Quantiles]

Even though the parameters and sample sizes are different, the straight line relationship shows that the two samples come from the same distribution.

Like the normal probability plot, the quantile-quantile plot has three graphic elements. The pluses are the quantiles of each sample. By default the number of pluses is the number of data values in the smaller sample. The solid line joins the 25th and 75th percentiles of the samples. The dashed line extends the solid line to the extent of the sample.

The example below shows what happens when the underlying distributions are not the same.

    x = normrnd(5,1,100,1);
    y = weibrnd(2,0.5,100,1);
    qqplot(x,y);

[Figure: quantile-quantile plot; x-axis X Quantiles, y-axis Y Quantiles]

These samples clearly are not from the same distribution.

It is incorrect to interpret a linear plot as a guarantee that the two samples come from the same distribution. But, for assessing the validity of a statistical procedure that depends on the two samples coming from the same distribution, a linear quantile-quantile plot should be sufficient
243. tion is $\alpha$ = 0.05. For this significance level, the probability of incorrectly rejecting the null hypothesis when it is actually true is 5%. If you need more protection from this error, then choose a lower value of $\alpha$.

The p-value is the probability of observing the given sample result under the assumption that the null hypothesis is true. If the p-value is less than $\alpha$, then you reject the null hypothesis. For example, if $\alpha$ = 0.05 and the p-value is 0.03, then you reject the null hypothesis.

The converse is not true. If the p-value is greater than $\alpha$, you do not accept the null hypothesis. You just have insufficient evidence to reject the null hypothesis (which is the same for practical purposes).

The outputs for the hypothesis test functions also include confidence intervals. Loosely speaking, a confidence interval is a range of values that have a chosen probability of containing the true hypothesized quantity. Suppose, in our example, 1.15 is inside a 95% confidence interval for the mean, $\mu$. That is equivalent to being unable to reject the null hypothesis at a significance level of 0.05. Conversely, if the 100(1 - $\alpha$)% confidence interval does not contain 1.15, then you reject the null hypothesis at the $\alpha$ level of significance.

Assumptions

The difference between hypothesis test procedures often arises from differences in the assumptions that the researcher is willing to make about the data sample. The Z-test assumes that the data represe
244. tion to Statistical Quality Control, John Wiley & Sons 1991. p. 235.

See Also     capaplot, ewmaplot, histfit, xbarplot

signrank

Purpose      Wilcoxon signed rank test of equality of medians.

Syntax       p = signrank(x,y,alpha)
             [p,h] = signrank(x,y,alpha)

Description  p = signrank(x,y,alpha) returns the significance probability that the medians of two matched samples, x and y, are equal. x and y must be vectors of equal length. alpha is the desired level of significance, and must be a scalar between zero and one.

[p,h] = signrank(x,y,alpha) also returns the result of the hypothesis test, h. h is zero if the difference in medians of x and y is not significantly different from zero. h is one if the two medians are significantly different.

p is the probability of observing a result equally or more extreme than the one using the data (x and y) if the null hypothesis is true. p is calculated using the rank values for the differences between corresponding elements in x and y. If p is near zero, this casts doubt on this hypothesis.

Example      This example tests the hypothesis of equality of means for two samples generated with normrnd. The samples have the same theoretical mean but different standard deviations.

    x = normrnd(0,1,20,1);
    y = normrnd(0,2,20,1);
    [p,h] = signrank(x,y,0.05)
    p =
        0.2568

See Also     ranksum, signtest, ttest

signtest

Purpose Syntax Description Example See Also
245. tions to check.

Fractional Factorial Designs

One difficulty with factorial designs is that the number of combinations increases exponentially with the number of variables you want to manipulate.

For example, the sensitivity study discussed above might be impractical if there were 7 variables to study instead of just 3. A full factorial design would require $2^7 = 128$ runs.

If we assume that the variables do not act synergistically in the system, we can assess the sensitivity with far fewer runs. The theoretical minimum number is 8. To see the design (X) matrix we use the hadamard function.

    X = hadamard(8)

The last seven columns of X are the actual variable settings (-1 for low, 1 for high). The first column (all ones) allows us to measure the mean effect in the linear equation $y = X\beta + \varepsilon$.

D-optimal Designs

All the designs above were in use by early in the 20th century. In the 1970s statisticians started to use the computer in experimental design by recasting DOE in terms of optimization. A D-optimal design is one that maximizes the determinant of Fisher's information matrix, $X^T X$. This matrix is proportional to the inverse of the covariance matrix of the parameters. So maximizing det($X^T X$) is equivalent to minimizing the determinant of the covariance of the parameters.

A D-optimal design minimizes the volume of the confidence ellipsoid of the regression estimates of the linear model parameters, Th
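The 8-run design built from hadamard(8) can be inspected to see why it separates the mean and the seven variable effects. The sketch below uses the base MATLAB hadamard function; the names settings and G are assumptions introduced for illustration.

```matlab
% Illustrative look at the 8-run Hadamard design (base MATLAB only).
X = hadamard(8);             % 8-by-8 matrix of +1 and -1 entries
settings = X(:, 2:end);      % last seven columns: -1 for low, 1 for high
G = X'*X;                    % cross-product (information) matrix

disp(X)
disp(G)
fprintf('det(X''*X) = %g\n', det(G));
```

Because the columns of X are mutually orthogonal, G is 8 times the identity matrix, so each effect is estimated independently of the others.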
246. to the base workspace. See the section Nonlinear Regression Models in Chapter 1.

See Also     nlinfit, rstool

nlparci

Purpose      Confidence intervals on estimates of parameters in nonlinear models.

Syntax       ci = nlparci(beta,r,J)

Description  nlparci(beta,r,J) returns the 95% confidence interval ci on the nonlinear least squares parameter estimates beta, given the residuals, r, and the Jacobian matrix, J, at the solution. The confidence interval calculation is valid for systems where the number of rows of J exceeds the length of beta.

nlparci uses the outputs of nlinfit for its inputs.

Example      Continuing the example from nlinfit:

    load reaction
    [beta,resids,J] = nlinfit(reactants,rate,'hougen',beta);
    ci = nlparci(beta,resids,J)
    ci =
       -1.0798    3.3445
       -0.0524    0.1689
       -0.0437    0.1145
       -0.0891    0.2941
       -1.1719    3.7321

See Also     nlinfit, nlintool, nlpredci

nlpredci

Purpose      Confidence intervals on predictions of nonlinear models.

Syntax       ypred = nlpredci(model,inputs,beta,r,J)
             [ypred,delta] = nlpredci(model,inputs,beta,r,J)

Description  ypred = nlpredci(model,inputs,beta,r,J) returns the predicted responses, ypred, given the fitted parameters, beta, residuals, r, and the Jacobian matrix, J. inputs is a matrix of values of the independent variables in the nonlinear function.

[ypred,delta] = nlpredci(model,inputs,beta,r,J) also returns 95% confidence int
247. tors of a parameter. The minimum variance unbiased estimators of the parameters $\mu$ and $\sigma^2$ for the normal distribution are the sample average and variance. The sample average is also the maximum likelihood estimator for $\mu$. There are two common textbook formulae for the variance. They are

$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 \qquad (1)$$

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \qquad (2)$$

where $\bar{x} = \sum_{i=1}^{n} \frac{x_i}{n}$.

Equation 1 is the maximum likelihood estimator for $\sigma^2$, and equation 2 is the minimum variance unbiased estimator.

The function normfit returns estimates of $\mu$ and $\sigma$ (the sample average and the square root of the unbiased variance estimate) together with confidence intervals for both. Here is a playful example modeling the heights (inches) of a randomly chosen 4th grade class.

    height = normrnd(50,2,30,1);    % Simulate heights.
    [mu,s,muci,sci] = normfit(height)

    mu =
       50.2025
    s =
        1.7946
    muci =
       49.5210
       50.8841
    sci =
        1.4292
        2.4125

Example and Plot
The plot shows the "bell" curve of the standard normal pdf, $\mu = 0$, $\sigma = 1$.

[Figure: standard normal pdf for x from -3 to 3]

Poisson Distribution

Background
The Poisson distribution is appropriate for applications that involve counting the number of times a random event occurs in a given amount of time, distance, area, etc. Sample applications that involve Poisson distributions include the number of Geiger counter clicks per second, the number of people walking into a store in an hour, and the number of flaws per 1000 feet of video tape.

The Poisson distribution is a one parameter discrete distribution th
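The difference between the two variance formulae is only the divisor, which a few lines of plain Python can make concrete. This is an illustrative sketch, not toolbox code, and the height values are hypothetical numbers chosen for the example.

```python
import statistics

heights = [48.1, 50.3, 51.7, 49.0, 52.4, 50.5]   # hypothetical heights, inches

n = len(heights)
xbar = sum(heights) / n
ss = sum((x - xbar) ** 2 for x in heights)   # sum of squared deviations

var_mle = ss / n             # divides by n: maximum likelihood estimator
var_unbiased = ss / (n - 1)  # divides by n - 1: unbiased estimator

# Cross-check against the standard library's two variance functions.
lib_population = statistics.pvariance(heights)
lib_sample = statistics.variance(heights)
```

The maximum likelihood value is always the smaller of the two by the factor (n - 1)/n, so the gap matters for a class of 30 far less than for a sample of 6.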
248. using random numbers from the exponential distribution with $\mu = 700$.

    lifetimes = exprnd(700,100,1);
    [muhat,muci] = expfit(lifetimes)

    muhat =
      672.8207
    muci =
      547.4338
      810.9437

The MLE for the parameter $\mu$ is 672, compared to the true value of 700. The 95% confidence interval for $\mu$ goes from 547 to 811, which includes the true value.

In our life tests we do not know the true value of $\mu$, so it is nice to have a confidence interval on the parameter to give a range of likely values.

Example and Plot
For exponentially distributed lifetimes, the probability that an item will survive an extra unit of time is independent of the current age of the item. The example shows a specific case of this special property.

    l = 10:10:60;
    lpd = l+0.1;
    deltap = (expcdf(lpd,50)-expcdf(l,50))./(1-expcdf(l,50))

    deltap =
        0.0020    0.0020    0.0020    0.0020    0.0020    0.0020

The plot shows the exponential pdf with its parameter (and mean), lambda, set to two.

    x = 0:0.1:10;
    y = exppdf(x,2);
    plot(x,y)

[Figure: exponential pdf with mean 2 for x from 0 to 10]

F Distribution

Background
The F distribution has a natural relationship with the chi-square distribution. If $\chi_1$ and $\chi_2$ are both chi-square with $\nu_1$ and $\nu_2$ degrees of freedom respectively, then the statistic F is F distributed.

$$F(\nu_1,\nu_2) = \frac{\chi_1/\nu_1}{\chi_2/\nu_2}$$

The two parameters, $\nu_1$ and $\nu_2$, are the numerator and denominator degrees of freedom
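The constant value of deltap follows directly from the exponential cdf, and this can be checked outside the toolbox. The sketch below is plain Python; `expcdf` is a local reimplementation that only borrows the toolbox name, assuming the cdf $1 - e^{-x/\mu}$ with mean $\mu$.

```python
import math


def expcdf(x, mu):
    """Exponential cdf with mean mu: 1 - exp(-x / mu)."""
    return 1.0 - math.exp(-x / mu)


mu = 50
ages = range(10, 70, 10)   # current ages 10, 20, ..., 60

# Probability of failing in the next 0.1 time units, given survival to age t.
deltap = [
    (expcdf(t + 0.1, mu) - expcdf(t, mu)) / (1.0 - expcdf(t, mu))
    for t in ages
]

# The age cancels algebraically, leaving a value that depends only on mu.
closed_form = 1.0 - math.exp(-0.1 / mu)
```

Every entry of `deltap` equals `closed_form`, which rounds to 0.0020, matching the displayed result for all six ages.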
249. ut if the bacteria counts are the same.

anova1 returns the p-value from this hypothesis test. In this case the p-value is about 0.0001, a very small value. This is a strong indication that the bacteria counts from the different tankers are not the same. An F statistic as extreme as the observed F would occur by chance only once in 10,000 times if the counts were truly equal.

The p-value returned by anova1 depends on assumptions about the random disturbances in the model equation. For the p-value to be correct, these disturbances need to be independent, normally distributed, and have constant variance.

You can get some graphic assurance that the means are different by looking at the box plots in the second figure window displayed by anova1.

[Figure: notched box plots of the five columns; x-axis: Column Number]

Since the notches in the box plots do not all overlap, this is strong confirming evidence that the column means are not equal.

Two-way Analysis of Variance (ANOVA)

The purpose of two-way ANOVA is to find out whether data from several groups have a common mean. One-way ANOVA and two-way ANOVA differ in that the groups in two-way ANOVA have two categories of defining characteristics instead of one.

Suppose an automobile company has two factories that both make three models of car. It is reasonable to ask if the gas mileage in the cars varies from factory to factory as well as model to model. There coul
250. x

Let the lifetime of light bulbs be exponentially distributed with mu equal to 700 hours. What is the median lifetime of a bulb?

    expinv(0.50,700)

    ans =
      485.2030

So, suppose you buy a box of "700 hour" light bulbs. If 700 hours is the mean life of the bulbs, then half of them will burn out in less than 500 hours.

exppdf

Purpose
Exponential probability density function (pdf).

Syntax
    Y = exppdf(X,MU)

Description
exppdf(X,MU) computes the exponential pdf with parameter settings MU at the values in X. The arguments X and MU must be the same size, except that a scalar argument functions as a constant matrix of the same size as the other argument.

The parameter MU must be positive.

The exponential pdf is

$$y = f(x \mid \mu) = \frac{1}{\mu} e^{-x/\mu}$$

The exponential pdf is the gamma pdf with its first parameter (a) equal to 1.

The exponential distribution is appropriate for modeling waiting times when you think the probability of waiting an additional period of time is independent of how long you've already waited. For example, the probability that a light bulb will burn out in its next minute of use is relatively independent of how many minutes it has already burned.

Examples
    y = exppdf(5,1:5)

    y =
        0.0067    0.0410    0.0630    0.0716    0.0736

    y = exppdf(1:5,1:5)

    y =
        0.3679    0.1839    0.1226    0.0920    0.0736

exprnd

Purpose
Random numbers from the exponential distribution.

Syntax
    R = exprnd(M
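Both the median and the pdf values quoted in this excerpt follow from closed-form expressions, so they can be reproduced in a few lines of plain Python. The functions `expinv` and `exppdf` here are local reimplementations under the mean parameterization $f(x \mid \mu) = e^{-x/\mu}/\mu$; they borrow the toolbox names only for readability.

```python
import math


def expinv(p, mu):
    """Inverse exponential cdf: solve 1 - exp(-x / mu) = p for x."""
    return -mu * math.log(1.0 - p)


def exppdf(x, mu):
    """Exponential pdf with mean mu."""
    return math.exp(-x / mu) / mu


median = expinv(0.50, 700)   # equals 700 * ln 2

# Scalar x = 5 against mu = 1..5, then x and mu paired element by element.
y1 = [round(exppdf(5, mu), 4) for mu in range(1, 6)]
y2 = [round(exppdf(k, k), 4) for k in range(1, 6)]
```

The median is $700 \ln 2 \approx 485.2$ hours, which is why half the bulbs fail before 500 hours even though the mean life is 700 hours.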
251. x DATA

The function creates a figure with a group of checkboxes that save diagnostic statistics to the base workspace using variable names you can specify.

regstats(responses,data,'model') controls the order of the regression model. 'model' can be one of these strings:

• 'interaction': includes constant, linear, and cross product terms.
• 'quadratic': interactions plus squared terms.
• 'purequadratic': includes constant, linear, and squared terms.

Algorithm
The literature suggests many diagnostic statistics for evaluating multiple linear regression. regstats provides these diagnostics:

• Q from QR Decomposition
• R from QR Decomposition
• Regression Coefficients
• Covariance of regression coefficients
• Fitted values of the response data
• Residuals
• Mean Squared Error
• Leverage
• "Hat" Matrix
• Delete-1 Variance
• Delete-1 Coefficients
• Standardized Residuals
• Studentized Residuals
• Change in Regression Coefficients
• Change in Fitted Values
• Scaled Change in Fitted Values
• Change in Covariance
• Cook's Distance

For more detail, press the Help button in the regstats window. This displays a hypertext help that gives formulae and interpretations for each of these regression diagnostics.

The usual regression model is $y = X\beta + \varepsilon$, where $y$ is an n by 1 vector of responses, $X$ is an n by p matrix of predictors, $\beta$ is a p by 1 vector of parameters, $\varepsilon$ is an n by 1 vector of random distu
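The three model strings differ only in which terms they add to the design matrix, which the following plain-Python sketch makes explicit for one observation. The function `design_row` and its column ordering are hypothetical choices for illustration; the order of columns used by regstats itself is not stated in this excerpt.

```python
from itertools import combinations


def design_row(x, model):
    """Expand one observation (a sequence of predictor values) into model terms."""
    x = list(x)
    constant = [1]
    linear = x
    cross = [a * b for a, b in combinations(x, 2)]   # pairwise products
    squared = [v * v for v in x]
    if model == "interaction":
        return constant + linear + cross
    if model == "quadratic":
        return constant + linear + cross + squared
    if model == "purequadratic":
        return constant + linear + squared
    raise ValueError("unknown model string: " + repr(model))


row_interaction = design_row([2, 3], "interaction")
row_quadratic = design_row([2, 3], "quadratic")
row_purequadratic = design_row([2, 3], "purequadratic")
```

With two predictors the three models have 4, 6, and 5 columns respectively, so the choice of string directly sets the number of parameters p in the regression.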
252. x

The likelihood function reverses the roles of the variables. Here, the sample values (the xs) are already observed. So they are the fixed constants. The variables are the unknown parameters. Maximum likelihood estimation (MLE) involves calculating the values of the parameters that give the highest likelihood given the particular set of data.

The function betafit returns the MLEs and confidence intervals for the parameters of the beta distribution. Here is an example using random numbers from the beta distribution with a = 5 and b = 0.2.

    r = betarnd(5,0.2,100,1);
    [phat,pci] = betafit(r)

    phat =
        4.5330    0.2301
    pci =
        2.8051    0.1771
        6.2610    0.2832

The MLE for the parameter a is 4.5330, compared to the true value of 5. The 95% confidence interval for a goes from 2.8051 to 6.2610, which includes the true value.

Similarly, the MLE for the parameter b is 0.2301, compared to the true value of 0.2. The 95% confidence interval for b goes from 0.1771 to 0.2832, which also includes the true value.

Of course, in this made-up example we know the "true value." In experimentation we do not.

Example and Plot
The shape of the beta distribution is quite variable depending on the values of the parameters, as illustrated by this plot.

[Figure: beta pdfs for several parameter values]

The constant pdf (the flat line) shows that the standard uniform distribution is a special case of the beta distribution.

Binomial Distribution

Background
The binomial
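To make "the values of the parameters that give the highest likelihood" concrete, here is a minimal sketch in plain Python that evaluates the beta log-likelihood on a fixed sample and picks the best point on a coarse grid. The sample values, the grid, and the function name `beta_loglik` are all assumptions for illustration; a grid search is a stand-in for a proper optimizer and is not claimed to be how betafit works.

```python
import math


def beta_loglik(a, b, data):
    """Log-likelihood of Beta(a, b) for observations strictly inside (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return sum(
        log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)
        for x in data
    )


# Hypothetical observed sample: the data are fixed, the parameters vary.
data = [0.62, 0.71, 0.55, 0.80, 0.67, 0.74, 0.59, 0.69]

grid = [0.5 * k for k in range(1, 61)]   # candidate values 0.5, 1.0, ..., 30.0
best = max(
    ((a, b) for a in grid for b in grid),
    key=lambda ab: beta_loglik(ab[0], ab[1], data),
)

uniform_loglik = beta_loglik(1.0, 1.0, data)   # Beta(1, 1) is the uniform pdf
```

The log-likelihood at a = b = 1 is exactly zero for any sample, which is the flat-line uniform special case; the grid maximizer does better by concentrating the density near the observed values.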
253.

Purpose
Noncentral T cumulative distribution function.

Syntax
    P = nctcdf(X,NU,DELTA)

Description
P = nctcdf(X,NU,DELTA) returns the noncentral T cdf with NU degrees of freedom and noncentrality parameter, DELTA, at the values in X.

The size of P is the common size of the input arguments. A scalar input functions as a constant matrix of the same size as the other inputs.

Example
Compare the noncentral T cdf with DELTA = 1 to the T cdf with the same number of degrees of freedom (10).

    x = (-5:0.1:5)';
    p1 = nctcdf(x,10,1);
    p = tcdf(x,10);
    plot(x,p,x,p1)

[Figure: T cdf and noncentral T cdf for x from -5 to 5]

References
Johnson, Norman, and Kotz, Samuel, Distributions in Statistics: Continuous Univariate Distributions-2, Wiley, 1970, pp. 201-219.
Evans, Merran, Hastings, Nicholas, and Peacock, Brian, Statistical Distributions, Second Edition, Wiley, 1993, pp. 147-148.

See Also
cdf, nctcdf, nctinv, nctpdf, nctrnd, nctstat

nctinv

Purpose
Inverse of the noncentral T cumulative distribution.

Syntax
    X = nctinv(P,NU,DELTA)

Description
X = nctinv(P,NU,DELTA) returns the inverse of the noncentral T cdf with NU degrees of freedom and noncentrality parameter, DELTA, for the probabilities, P.

The size of X is the common size of the input arguments. A scalar input functions as a constant matrix of the same size as the other inputs.

Example
    x = nctinv([.1 .2],10,1)
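The comparison between the central and noncentral T cdfs can be approximated by simulation, using the standard definition of a noncentral T variate as $(Z + \delta)/\sqrt{V/\nu}$ with $Z$ standard normal and $V$ chi-square with $\nu$ degrees of freedom. The sketch below is plain Python and a Monte Carlo estimate, not the toolbox's numerical method; the sample size, seed, and helper names are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist


def nct_sample(nu, delta, rng):
    """Draw one noncentral t variate: (Z + delta) / sqrt(V / nu)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(nu / 2.0, 2.0)   # chi-square with nu degrees of freedom
    return (z + delta) / math.sqrt(v / nu)


def empirical_cdf(samples, x):
    """Fraction of samples less than or equal to x."""
    return sum(s <= x for s in samples) / len(samples)


rng = random.Random(0)   # fixed seed so the run is repeatable
n = 20000
central = [nct_sample(10, 0.0, rng) for _ in range(n)]
noncentral = [nct_sample(10, 1.0, rng) for _ in range(n)]

p_central_0 = empirical_cdf(central, 0.0)
p_noncentral_0 = empirical_cdf(noncentral, 0.0)

# T <= 0 exactly when Z <= -delta, so the cdf at 0 is the normal cdf at -delta.
exact_noncentral_0 = NormalDist().cdf(-1.0)
```

A positive noncentrality parameter shifts probability mass to the right, so at any given x the noncentral cdf lies below the central one; at x = 0 the two values are about 0.16 and 0.5.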
254. y about the means for box to box comparison. The default, notch = 0, produces a rectangular box plot.

boxplot(X,notch,'sym'), where 'sym' is a plotting symbol, allows control of the symbol for outliers, if any (default = '+').

boxplot(X,notch,'sym',vert) with vert = 0 makes the boxes horizontal (default: vert = 1, for vertical).

boxplot(X,notch,'sym',vert,whis) enables you to specify the length of the "whiskers." whis defines the length of the whiskers as a function of the inter-quartile range (default = 1.5 * IQR). If whis = 0, then boxplot displays all data values outside the box using the plotting symbol, 'sym'.

Examples
    x1 = normrnd(5,1,100,1);
    x2 = normrnd(6,1,100,1);
    x = [x1 x2];
    boxplot(x,1)

[Figure: notched box plots of the two columns of x; x-axis: Column Number]

The difference between the means of the two columns of x is 1. We can detect this difference graphically since the notches do not overlap.

capable

Purpose
Process capability indices.

Syntax
    p = capable(data,lower,upper)
    [p,Cp,Cpk] = capable(data,lower,upper)

Description
capable(data,lower,upper) computes the probability that a sample, data, from some process falls outside the bounds specified in lower and upper.

The assumptions are that the measured values in the vector, data, are normally distributed with constant mean and variance and that the measurements are statis
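Under the stated normality assumption, the out-of-bounds probability and the two capability indices can be sketched in plain Python. The formulas for Cp and Cpk below are the common textbook definitions and are an assumption here, since this excerpt does not give them; the function name `capability`, the use of the sample standard deviation, and the measurement values are likewise hypothetical.

```python
from statistics import NormalDist, mean, stdev


def capability(data, lower, upper):
    """Return (p, Cp, Cpk) for data assumed normal with constant mean and variance."""
    mu = mean(data)
    sigma = stdev(data)   # sample standard deviation (n - 1 divisor)
    dist = NormalDist(mu, sigma)
    # Probability that a measurement falls below lower or above upper.
    p = dist.cdf(lower) + (1.0 - dist.cdf(upper))
    # Cp compares the specification width with the process spread.
    cp = (upper - lower) / (6.0 * sigma)
    # Cpk uses the distance from the mean to the nearer specification limit.
    cpk = min((upper - mu) / (3.0 * sigma), (mu - lower) / (3.0 * sigma))
    return p, cp, cpk


data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 10.1, 9.9]   # hypothetical measurements
p, cp, cpk = capability(data, 9.5, 10.5)
```

Because this sample is centered between the limits, Cp and Cpk coincide; an off-center process would give a Cpk below Cp and a larger p.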
255. y due to the differences among the row means.
• The variability due to the interaction between rows and columns (if reps is greater than its default value of one).
• The remaining variability not explained by any systematic source.

The ANOVA table has five columns:

• The first shows the source of the variability.
• The second shows the Sum of Squares (SS) due to each source.
• The third shows the degrees of freedom (df) associated with each source.
• The fourth shows the Mean Squares (MS), which is the ratio SS/df.
• The fifth shows the F statistics, which is the ratio of the mean squares.

The p-value is a function (fcdf) of F. As F increases, the p-value decreases.

The data below comes from a study of popcorn brands and popper type (Hogg 1987). The columns of the matrix popcorn are brands (Gourmet, National, and Generic). The rows are popper type (Oil and Air). The study popped a batch of each brand three times with each popper. The values are the yield in cups of popped popcorn.

    load popcorn
    popcorn

    popcorn =
        5.5000    4.5000    3.5000
        5.5000    4.5000    4.0000
        6.0000    4.0000    3.0000
        6.5000    5.0000    4.0000
        7.0000    5.5000    5.0000
        7.0000    5.0000    4.5000

    p = anova2(popcorn,3)

    p =
        0.0000    0.0001    0.7462

    ANOVA Table
    Source         SS         df    MS         F
    Columns        15.75       2    7.875      56.7
    Rows            4.5        1    4.5        32.4
    Interaction     0.08333    2    0.04167     0.3
    Error           1.667     12    0.1389
    Total          22         17

The vector, p, shows the p-values for the three brands of popcorn, 0.0
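The entries of the ANOVA table can be recomputed from the popcorn matrix with the standard balanced two-way decomposition. The sketch below is plain Python rather than a call to anova2; it assumes, as the excerpt states, that columns are brands and that each consecutive block of three rows is one popper type. It stops at the F statistics because the p-values need the F cdf, which the standard library does not provide.

```python
popcorn = [
    [5.5, 4.5, 3.5],
    [5.5, 4.5, 4.0],
    [6.0, 4.0, 3.0],
    [6.5, 5.0, 4.0],
    [7.0, 5.5, 5.0],
    [7.0, 5.0, 4.5],
]
reps = 3   # replicates per cell


def mean(values):
    values = list(values)
    return sum(values) / len(values)


n_cols = len(popcorn[0])
n_groups = len(popcorn) // reps   # row groups (popper types)
n_total = len(popcorn) * n_cols
grand = mean(v for row in popcorn for v in row)

col_means = [mean(row[j] for row in popcorn) for j in range(n_cols)]
group_rows = [popcorn[g * reps:(g + 1) * reps] for g in range(n_groups)]
group_means = [mean(v for row in rows for v in row) for rows in group_rows]
cell_means = [
    [mean(row[j] for row in rows) for j in range(n_cols)] for rows in group_rows
]

# Sums of squares for each source of variability.
ss_cols = reps * n_groups * sum((m - grand) ** 2 for m in col_means)
ss_rows = reps * n_cols * sum((m - grand) ** 2 for m in group_means)
ss_total = sum((v - grand) ** 2 for row in popcorn for v in row)
ss_error = sum(
    (row[j] - cell_means[g][j]) ** 2
    for g, rows in enumerate(group_rows)
    for row in rows
    for j in range(n_cols)
)
ss_inter = ss_total - ss_cols - ss_rows - ss_error

# Degrees of freedom, then mean squares (SS / df) and F ratios.
df_cols = n_cols - 1
df_rows = n_groups - 1
df_inter = df_cols * df_rows
df_error = n_total - n_cols * n_groups

ms_error = ss_error / df_error
f_cols = (ss_cols / df_cols) / ms_error
f_rows = (ss_rows / df_rows) / ms_error
f_inter = (ss_inter / df_inter) / ms_error
```

The computed values reproduce the table: sums of squares 15.75, 4.5, 0.08333, and 1.667, and F statistics 56.7, 32.4, and 0.3.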
