Using MATLAB for analysing generalised linear models

Contents

1. 5.9 npplot
   This function produces a normal probability plot, which is useful for checking the normality of the data. It is called by the glmlab plotting window but can also be used outside of glmlab. The general form of the command is npplot(y,s), where y is the data and s is 0 to include a dotted line corresponding to the standard normal distribution, or 1 to include a dotted line corresponding to a normal distribution with the same mean and variance as the given data y. The Statistics Toolbox of MATLAB has its own function for plotting a normal probability plot.
   5.10 Changing the Fitting Parameters
   There are three parameters in glmlab that are used when fitting models: the maximum number of iterations to use, the accuracy (tolerance) of the parameters, and the ill-conditioning tolerance, which in simple terms refers to the sensitivity of glmlab to X'X being singular, where X is the covariate matrix. All of these can be changed from the Options menu by selecting Change Fitting Parameters. To change the parameters, the new values are entered into the input window shown in Figure 5.4. The default settings are:
   • maximum number of iterations: 20
   • parameter tolerance: 0.00001
   • ill-conditioning tolerance: 1e-10
   The values of the parameters are stored in the file PARVALS.mat in the same directory as the file DETAILS.m.
   References
   [1] Murray Aitken, Dorothy Anderson, Brian Francis and John Hinde. Statistical Modelli
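   To make the role of the three fitting parameters concrete, the following is a minimal sketch of an iteratively reweighted least squares loop for a Poisson model with a log link. It is not glmlab's own fitting code: the data, the variable names and the use of rcond on the weighted matrix X'WX (rather than X'X) are assumptions made for illustration only.

```matlab
% Hypothetical data (not from the manual): counts y at covariate values x
x = (1:6)';
y = [2 3 6 7 12 20]';
X = [ones(size(x)) x];            % covariate matrix with a constant term

maxit  = 20;                      % maximum number of iterations
tol    = 0.00001;                 % parameter tolerance
illtol = 1e-10;                   % ill-conditioning tolerance

mu   = y + 0.5;                   % starting values for the fitted means
eta  = log(mu);                   % linear predictor under the log link
beta = zeros(size(X, 2), 1);
converged = false;
for iter = 1:maxit
    z = eta + (y - mu) ./ mu;     % working response for the log link
    W = diag(mu);                 % working weights for the Poisson variance
    XtWX = X' * W * X;
    if rcond(XtWX) < illtol       % stop if the system is close to singular
        error('X''WX is too close to singular');
    end
    newbeta = XtWX \ (X' * W * z);
    converged = max(abs(newbeta - beta)) < tol;
    beta = newbeta;
    eta  = X * beta;
    mu   = exp(eta);
    if converged, break; end
end
```

   Raising maxit or tightening tol trades running time for accuracy; the ill-conditioning check guards the linear solve inside each iteration.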
2. Vol: The volume of toxic by-product produced.
   Notice that variable names have been used with the first letter capitalised. This is a precaution against using names already used by MATLAB or glmlab. For example, length is a MATLAB function that returns the length of a vector. If length were used as a variable name, any MATLAB code that used the function length would not work. When defining variables, use names with capitalised first letters, since most MATLAB commands contain all lower-case letters.
   Remember that Method is a qualitative (or categorical) variable. To represent Method, a 1 has been used for Method A and a 2 for Method B.
   To begin fitting, start glmlab; the initial screen (Figure 4.1) should again appear. If glmlab is already running and a new model is to be fitted, or new data is to be loaded, choose Declare New Model from the Options menu. This clears all the internal settings and resets the default options.
   For demonstration purposes, only one variable at a time will be included in the model. The variables are then typed into their place. The response variable is Vol and the first covariate is Temp. When these are typed into the appropriate places in the glmlab window and the FIT MODEL button is pressed, the following parameter estimates appear on the scr
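   The naming advice can be checked directly at the MATLAB prompt. In this small sketch the vector v and the variable Length are hypothetical; the point is only that a capitalised variable name leaves the built-in function length usable.

```matlab
v = [3 1 4 1 5];
Length = 10;       % capitalised variable name: the built-in length is untouched
n = length(v);     % still calls the built-in function
```

   Had the variable been named length instead, the last line would index into that variable rather than call the function.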
3. at all other times they are unavailable and appear dimmed in the Link menu.
   5.3 Changing the Scale Parameter
   Two types of scale parameter can be chosen: the mean deviance and a positive fixed value. The defaults are given in Table 5.1. When the scale parameter is estimated by the mean deviance, glmlab uses D/(n − r), where D is the deviance, n is the number of data points and r is the number of estimated parameters; n − r is the number of degrees of freedom. The scale parameter can be fixed to any positive value: when the Fixed Value option is selected, the user is prompted for the value, whose default is 1. Although the scale parameter is set to 1 by default for the binomial and Poisson distributions, it is not uncommon to use the mean deviance instead. This commonly occurs if under- or over-dispersion is present.
   CHAPTER 5. ADVANCED TOPICS AND EXAMPLES
   Table 5.1: Default Settings for Chosen Distributions (η is the linear predictor and μ is the mean, π in the binomial case)
   Error Distribution   Default Link Function                   Default Scale Parameter
   Normal               Identity, η = μ                         Mean Deviance
   Inverse Gaussian     Inverse Quadratic, η = 1/μ²             Mean Deviance
   Poisson              Logarithm, η = log μ                    Fixed at 1
   Binomial             Logit, η = log(π/(1 − π))               Fixed at 1
   Gamma                Reciprocal, η = 1/μ                     Mean Deviance
   Link Function
   Identity                 η = μ
   Logarithm                η = log μ
   Reciprocal               η = 1/μ
   Square Root              η = √μ
   Power                    η = μ^p (p non-zero)
   Logit                    η = log(π/(1 − π))
   Complementary log-log    η = l
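   As a quick arithmetic check of the mean deviance formula D/(n − r), the sketch below uses the deviance and residual degrees of freedom reported for the first model of the multiple regression example (deviance 334.000000 on 6 degrees of freedom). The variable names are illustrative only.

```matlab
D   = 334.000000;   % residual deviance reported in the regression example
df  = 6;            % residual degrees of freedom, n - r
phi = D / df;       % mean deviance estimate of the scale parameter
```

   The result agrees with the scale parameter of 55.666667 printed in that example's output.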
4. >> Type
   ans =
        1 1 2 2 3 3 4 4 5 5
   There is one further problem with the data: males cannot be exposed to post-abortion trauma. This is called a structural zero. To circumvent this problem, define a vector of prior weights that effectively ignores the cell corresponding to male post-abortion trauma:
   >> Priorw = [1 1 0 1 1 1 1 1 1 1];
   The prior weights could also have been defined in this manner:
   >> Priorw = ones(size(Count)); Priorw(3) = 0;
   CHAPTER 4. EXAMPLES USING glmlab
   Figure 4.4: Selecting the Poisson Distribution
   Figure 4.5: Response and Prior Weights Only Entered
   Having defined the variables, glmlab can be started. As discussed before, if glmlab is already running, declare a new model in the Options menu. For this problem, the default normal distribution is no longer adequate. To change the distribution, press the Distribution menu item on the main glmlab window and choose poisson, as shown in Figure 4.4. This will cause the link function and the scale parameter to change to the Poisson defaults, but no changes will be obvious in the glmlab window. Click on the Link menu item and the logarithm link should be selected; this is the default link function for the Poisson distribution.
   The variables can now be typed into the main glmlab window. First type the name of the response variable, Count, and the prior weights, Priorw, as shown in Figure 4.5. After pr
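   The two ways of defining the prior weights can be compared in a short sketch. Count is the vector defined in this example; nUsed is a hypothetical helper name introduced here to show how many cells remain in the fit.

```matlab
Count  = [65 31 0 26 21 49 20 52 39 29];
Priorw = ones(size(Count));   % start with every cell included
Priorw(3) = 0;                % drop the structural zero (third cell)
nUsed  = sum(Priorw > 0);     % number of cells that contribute to the fit
```

   With one cell given zero weight, nine of the ten cells contribute, which is why the constant-only model has 8 residual degrees of freedom.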
5. B =
        0 3
        3 4
        6 6
   >> B'
   ans =
        0 3 6
        3 4 6
   The semicolon at the end of a line stops MATLAB from displaying the answer on the screen. Basic matrix operations can be performed:
   >> A+B
   ??? Error using ==> +
   Matrix dimensions must agree.
   CHAPTER 2. STARTING WITH MATLAB
   >> A+B'
   ans =
        1  6  8
        2  4 15
   >> A-B'
   ans =
        1  0 -4
       -4 -4  3
   >> A*B
   ans =
       21 27
       54 51
   >> inv(A*B)
   ans =
      -0.1318  0.0698
       0.1395 -0.0543
   There are other operations in MATLAB apart from the standard operations. Preceding an operation with a period causes the operation to be applied on an element-by-element basis. For example, consider the following commands:
   >> C = [0 3 4; 5 4 3]
   C =
        0 3 4
        5 4 3
   >> D = [1 1 9; 3 0 1]
   D =
        1 1 9
        3 0 1
   >> C*D
   ??? Error using ==> *
   Inner matrix dimensions must agree.
   >> C.*D
   ans =
        0  3 36
       15  0  3
   Here's another example of MATLAB's element-by-element procedure:
   >> E = 1:4
   E =
        1 2 3 4
   >> E*E
   ??? Error using ==> *
   Inner matrix dimensions must agree.
   >> E^2
   ??? Error using ==> ^
   Matrix must be square.
   >> E.^2   % is the same as E.*E
   ans =
        1 4 9 16
   As demonstrated above, comments can be included in MATLAB with the % character; MATLAB ignores any characters after this symbol. It is good practice in any progr
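   The session above can be re-run as a script. The definition of A is not shown in this excerpt; the value used below is an assumption, chosen because it reproduces the displayed results for A+B', A-B' and A*B. The try/catch block simply records that the matrix product E*E fails.

```matlab
A = [1 3 2; -1 0 9];    % assumed: consistent with the displayed results
B = [0 3; 3 4; 6 6];
S = A + B';             % element-wise sum with the transpose of B
P = A * B;              % ordinary matrix product, 2-by-2
E = 1:4;
Esq = E.^2;             % element-by-element square, same as E.*E
try
    E * E;              % 1-by-4 times 1-by-4: dimensions do not agree
    failed = false;
catch
    failed = true;
end
```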
6. The fitted values are plotted against the quantile equivalents of the residuals. For quantile residuals this option is disabled.
   3.1.8 The Help Menu
   The help menu is far from thorough, but should be sufficient when used in conjunction with this manual and the on-line manual.
   3.1.8.1 On-line Manual
   Provided a Web browser is running, this option takes you to the on-line version of the manual. This allows easy navigation around the manual.
   3.1.8.2 Help with Main Window Items
   This presents a screen of basic help concerning the items in the main area of the main glmlab window.
   3.1.8.3 Help with Menu Items
   This presents a screen of basic help concerning the drop-down menu items.
   3.1.8.4 Help with Interactions
   glmlab uses the symbol @ to specify that two or more variables interact. This help screen gives brief instructions on how to specify interactions.
   CHAPTER 3. glmlab REFERENCE
   3.1.8.5 Help with OUTPUT VARIABLES
   A brief explanation is provided of all variables returned into the workspace by glmlab. The variables returned are explained in Section 3.5.
   3.1.8.6 Help with Residuals Items
   This presents a screen of basic help for the Residuals window.
   3.1.8.7 Run glmlab Demo
   This begins a small introductory demonstration of glmlab. While the demonstration is running, the user is unable to fit models.
   3.1.8.8 Where to get glmlab
   Places where glmlab can be found are listed here.
   3.1.8.9 Contact the Author
   This
7. declared, and
   • whenever glmlab is started.
   Completing the analysis would include looking at some of the residual plots.
   Table 4.2: Counselling Service Data (Type of Counselling: Post-Abortion Trauma, Relationship Breakdown, Loss of Loved One)
   4.2 Example: Log-Linear Models
   Table 4.2 is a contingency table containing fictitious data on the patients at a local counselling service over the past year. Suppose a log-linear model is to be fitted to the data. This problem concerns count data and can therefore be modelled by the Poisson distribution. We first define the variables in the problem:
   >> Count = [65 31 0 26 21 49 20 52 39 29];
   The makefac command can be used to define Gender and Type (see Section 3.4.2). Count has been defined to contain the first column of the table, then the second column, then the third, etc. Gender, then, should be a variable of length 10, the same size as Count. There are two levels of this variable (male and female) and they occur one at a time. That is, according to the ordering of the variable Count, Gender is listed as male, female, male, female, ..., so that each gender is listed as a block of size one. Therefore, Gender can be specified as
   >> Gender = makefac(10,2,1);
   >> Gender
   ans =
        1 2 1 2 1 2 1 2 1 2
   Similarly, makefac can be used for the type of counselling, Type:
   >> Type = makefac(10,5,2);
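   For readers without glmlab at hand, the two factor vectors can be rebuilt with core MATLAB functions. This is not the source of makefac; it is a sketch that assumes the three arguments mean total length, number of levels and block size, as the two calls above suggest.

```matlab
n   = 10;
idx = 0:(n - 1);
Gender = mod(floor(idx / 1), 2) + 1;   % 2 levels, blocks of size 1
Type   = mod(floor(idx / 2), 5) + 1;   % 5 levels, blocks of size 2
```

   Dividing the zero-based position by the block size and wrapping with mod gives the repeating pattern of levels.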
8. the glmlog folder (directory), but this can be changed by moving the file dummy log m to the desired directory.
   3.1.1.4 QUIT glmlab
   Quits glmlab and loses all information. The DETAILS.m file will retain its information until glmlab is started again. This option is the same as pressing the QUIT button.
   Figure 3.1: A Quick Tour of the glmlab Screen (labels: the menus; the Response variable; the Covariates; the Prior Weights variable; the Offset variable; buttons that reset glmlab, fit the model and quit glmlab)
   Figure 3.2: A Quick Tour of the Drop Down Menus (labels: loading data, loading and saving models, and quitting glmlab; selecting the error distribution; selecting the link function; selecting the scale parameter as the mean deviance or a fixed value; selecting the types of residuals to calculate; producing various plots of the residuals; help, a demonstration and links to Web pages)
   3.1.1.5 EXIT matlab
   Exits MATLAB, and hence glmlab. The DETAILS.m file will retain its information until glmlab is started again.
   3.1.2 The Distributions Menu
   Alters the error distribution. The five choices are normal, gamma, inverse Gaussian, binomial and Poisson. User-defined distributions are also possible; see Section 5.6. Each distribution has its own set of defaul
9. 0.328278 Gender(2).Type(5)
   Scaled Deviance: 0.000000 (change: -39.197423)
   Residual df: 0 (change: -3)
   Scale parameter (dispersion parameter): 1.000000
   Notice that the interaction between Gender and Type produces four terms in the model. There is no residual deviance and there are no residual degrees of freedom, as there are more parameters than there are observations with which to estimate them. The presence of the term aliased indicates that the variable Gender(2).Type(2) contains no new information.
   Chapter 5 Advanced Topics and Examples
   This section contains some topics more complicated than those previously discussed. Further examples are also given.
   5.1 Offsets
   Offsets are variables with known parameters. The name of the offset variable is listed in the main glmlab window in the appropriate area. The output on the screen and in the output file DETAILS.m indicates the name of the offset variable used.
   5.2 Default Settings for Distributions
   The error distribution type can be altered in glmlab via the Distributions menu item (see Section 4.2). The distributions available are normal, inverse Gaussian, Poisson, binomial and gamma. Likewise, the type of link function used can be changed in the Link menu. The link functions available are summarised in Table 5.2. Each distribution has a default link function and scale parameter, as summarised in Table 5.1. Only when the binomial distribution is chosen are the logit, probit and complementary log-log links available
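   The aliasing can be seen by building the covariate matrix by hand. The sketch below assumes indicator (first level as reference) coding for the factors, which matches the labels in the output but is not taken from glmlab's source; all variable names are illustrative.

```matlab
idx    = (0:9)';
Gender = mod(idx, 2) + 1;              % 1 2 1 2 ...
Type   = floor(idx / 2) + 1;           % 1 1 2 2 ... 5 5
Priorw = ones(10, 1);  Priorw(3) = 0;  % structural zero gets weight 0

X = [ones(10, 1), double(Gender == 2)];
for k = 2:5
    X = [X, double(Type == k)];                  % main effects of Type
end
for k = 2:5
    X = [X, double(Gender == 2 & Type == k)];    % interaction terms
end
Xused   = X(Priorw > 0, :);            % rows that actually enter the fit
nParams = size(X, 2);
r       = rank(Xused);
resdf   = size(Xused, 1) - r;
aliased = isequal(Xused(:, 3), Xused(:, 7));     % Type(2) vs Gender(2).Type(2)
```

   Ten parameters are requested but only nine can be estimated from the nine weighted cells; once the male Type 2 cell is dropped, the Type(2) column and the Gender(2).Type(2) column coincide, so the later one is reported as aliased.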
10. 2. The second step is to edit the file named dfourpwr.m using any text editor. Near the end of the file is a section with the following information:
      %Calculate
      if strcmp(what,'varfn')
         %%%In here, you need lines to find the variance function
         %%%from mu and y. The section should return the variance function as answ
      elseif strcmp(what,'scdev')
         %%%In here, you need lines to find the scaled deviance
         %%%from mu and y. The section should return the scaled deviance as answ
    It should be clear that there are two sections to alter. One section requires information about the variance function and the other about the deviance. For the variance function given in Equation 5.1, the MATLAB equivalent code is
      answ = mu.^4;
    This line should be entered into the section requiring information about the variance function. The deviance can be found from the integral
      D(y, μ) = 2 ∫_μ^y (y − u)/V(u) du,
    where, for the given variance function, the integral evaluates to
      1/(6y²) + y/(3μ³) − 1/(2μ²).
    The equivalent MATLAB code is
      answ = 1./(6*y.^2) + y./(3*mu.^3) - 1./(2*mu.^2);
    This line of code is placed in the section requiring the deviance. Now that glmlab has the information that it requires, it needs to be told to use the file dfourpwr.m.
    3. In the dist directory there is a file called dlist.m which lists all the files that contain information about distributions. There is a corresponding file llist.m for the link f
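    The closed form can be checked against a numerical evaluation of the integral of (y − u)/V(u) with V(u) = u^4. The values of y and mu below are hypothetical, and the sketch compares the integral itself (without the leading factor of 2), which is what the one-line expression for answ evaluates; multiply by 2 to obtain the deviance as defined by the integral formula.

```matlab
y  = 2.0;                                  % hypothetical observation
mu = 1.5;                                  % hypothetical fitted mean
u  = linspace(mu, y, 20001);               % fine grid from mu to y
numeric = trapz(u, (y - u) ./ u.^4);       % trapezoidal rule for the integral
closed  = 1/(6*y^2) + y/(3*mu^3) - 1/(2*mu^2);
```

    The two values agree to well within the accuracy of the trapezoidal rule, which supports the algebra of the closed form.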
11. 5.10 Changing the Fitting Parameters
    References
    Index
    List of Figures
    1.1 The MATLAB Windows Path
    1.2 The MATLAB UNIX Path in tc shell
    1.3 The Structure of glmlab
    2.1 Graph
    3.1 A Quick Tour of the glmlab Screen
    3.2 A Quick Tour of the Drop Down Menus
    4.1 The Initial glmlab Screen
    4.2 Variables Being Entered for Chemical Data
    4.3 Entering Data Using the fac Command
    4.4 Selecting the Poisson Distribution
    4.5 Response and Prior Weights Only Entered
    4.6 Including an Interaction Term
    5.1 Entering a Fixed Value for the Scale Parameter
    5.2 Variables Entered for Example 5.4
    5.3 Variables Entered for the Beetle Mortality Example
    5.4 The Fitting Parameters Window
    List of Tables
    4.1 Toxic Chemical Production Data
    4.2 Counselling Service Data
    5.1 Default Settings for Chosen Distributions
    5.2 Link Functions in glmlab
    5.3 Feigl and Zelen's Leukemia Data
    5.4 Beetles Mortality Data
    Acknowledgements
12. The author wishes to thank Mr Henry Eastment and Mr Michael Simpson for their help and advice with glmlab, and for enduring the many frustrating errors of the earlier versions. Thanks also to Jim Albert and especially G. Janacek for their encouragement in trying early versions of glmlab. Thanks also to Jean-Marc Fromentin for help with later problems.
    A Note on What You Read
    Some small differences may exist between the figures that appear in this document and the figures that appear on your computer screen. Most of the figures in this document are shown as they appear in the UNIX or LINUX versions; Windows and Macintosh versions will produce figures that are similar but not identical. glmlab is often updated in small ways. Some of the most recent changes may not appear in the figures and output shown in this document, especially where dates are given. This document was produced using LaTeX and TeX, with the pictures produced (obviously) using MATLAB.
    Contact Information
    There is no charge for using glmlab. However, if you find it useful, it would be greatly appreciated if you could contact the author and let him know how you are using glmlab, by emailing him at dunn@sci.usq.edu.au or, preferably, by writing to:
    Peter Dunn, Faculty of Sciences, USQ, Toowoomba Q 4350, AUSTRALIA
    You may also find it useful to visit the glmlab Home Page on the web at http://www.sci.usq.edu.au/staff/dunn/glmlab/glmlab.html
    Chapter 1 I
13. actor to maintain full rank. These four variables correspond to levels two, three, four and five of the Type variable.
    It is common to want to fit interaction terms during modelling. To fit interaction terms, glmlab uses the @ character (usually SHIFT-2 on the keyboard); see Section 3.4.3. In this problem, the interaction between the two qualitative variables Gender and Type could be included. This can be done in glmlab by entering the following string in the covariate section of the main glmlab window:
      fac(Gender), fac(Type), fac(Gender)@fac(Type)
    Notice that the fac command has been used again. Thus, interactions between variables are specified by placing the @ character between the interacting variables. Remember to still use fac for qualitative variables.
    Figure 4.6: Including an Interaction Term
    The interaction variable can be included in the glmlab Covariates area as shown in Figure 4.6. Pressing the FIT MODEL button then gives the following parameter estimates:
      Estimate     S.E.       Variable
       4.174387    0.124035   Constant
      -0.740400    0.218272   Gender(2)
      -0.175891    0.265932   Type(2)
      -1.129865    0.251005   Type(3)
      -1.178655    0.255704   Type(4)
      -0.510826    0.202548   Type(5)
       0.000000    aliased    Gender(2).Type(2)
       1.587698    0.340103   Gender(2).Type(3)
       1.695912    0.341868   Gender(2).Type(4)
       0.444134
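    Because the model with the interaction reproduces every weighted cell exactly, its estimates can be checked by hand from logarithms of the counts. The sketch below assumes the first levels (male, Type 1) are the reference levels, as the output labels suggest; the variable names are illustrative.

```matlab
Count = [65 31 0 26 21 49 20 52 39 29];
constant = log(Count(1));                  % male, Type 1 cell
gender2  = log(Count(2) / Count(1));       % female relative to male, Type 1
type3    = log(Count(5) / Count(1));       % Type 3 relative to Type 1, male
inter23  = log(Count(6)) - constant - gender2 - type3;   % female, Type 3 cell
```

    These four values match the Constant, Gender(2), Type(3) and Gender(2).Type(3) estimates in the table above, including their signs.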
14. INDEX
    exponential distribution: see distributions
    gamma distribution: see distributions
    generalised linear model
    glmlab: changing options; Home Page; installing; Main Window; output explained
    histograms: see hist
    Home Page (World Wide Web)
    identity link: see link
    installing glmlab: see glmlab, installing
    interaction
    inverse Gaussian distribution: see distributions
    iterations: default number; displaying; maximum number
    labels: see xlabel and ylabel
    line continuation
    linear predictor
    link: complementary log-log; logarithm; power
15. amming to include comments to explain details that are not immediately obvious.
    2.2 Obtaining Help from MATLAB
    General help can be found by using the Help Desk, which can be started by typing helpdesk at the MATLAB prompt. This opens a Web browser (Netscape or Internet Explorer, for example) that enables the user to search for help on topics.
    In general, finding functions for a particular purpose in MATLAB is achieved by typing lookfor and then a relevant word. MATLAB will then display any functions that may be relevant. For example, try typing lookfor cos at the MATLAB prompt. My computer produces the following (some output has been omitted):
      >> lookfor cos
      ACOS     Inverse cosine.
      ACOSH    Inverse hyperbolic cosine.
      ACSC     Inverse cosecant.
      LOGCOST  Function which returns the difference between th
    The MATLAB functions are displayed on the left, with a brief description of their purpose on the right.
    Sometimes the information to be displayed will be too much to fit on the screen. If this happens, type more on at the MATLAB prompt so that MATLAB will display only one screen at a time. Pressing the space bar presents a new screen of information; pressing RETURN or ENTER presents one more line. Type more off to turn off this feature.
    The MATLAB function help can then be used to explain how to use the function. For exam
16. be edited. Change some of the settings to see what happens.
    To print the plots that are produced on the screen, the print command can be used, and usually the Print option from the File menu of the Figure works also. In some cases it may be best to save the picture to a file and then print the file. The print command is still used in this case, but in a different form. See the help for print for more information. (You may need to use more on.) According to the help for print, the command print -depson will print the current picture on an Epson or Epson-compatible 9- or 24-pin printer. Also, print -depson lookatme.prt should produce a file called lookatme.prt that can be sent to an Epson printer. Type help print to see what other types of printers MATLAB supports (like LaserJets, BubbleJets and colour printers).
    2.5 Other Miscellaneous MATLAB Commands
    In this section, some other useful MATLAB commands are discussed. For a full description, see the MATLAB help for the function.
    • sort: As the name suggests, sort sorts the data in ascending order; -sort(-x) sorts the data in x in descending order.
    • diary: Typing diary mine.txt at the MATLAB prompt creates a file called mine.txt containing all the information printed in the MATLAB command window from the time the command is entered. To turn off this feature, type diary off. This can be useful when submitting assignments to show what happened at the computer.
    • ones and zeros: These
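    The sorting idiom can be tried on a small vector. The data below are hypothetical; the last line uses the 'descend' option of sort, which is available in later MATLAB releases as an alternative to negating twice.

```matlab
x     = [3 1 4 1 5];
up    = sort(x);              % ascending order
down  = -sort(-x);            % descending order by negating twice
down2 = sort(x, 'descend');   % same result using the 'descend' option
```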
17. belled Method(2). This indicates that the parameter estimate corresponds to the second level of the variable Method; that is, Method B. We could therefore write the model as
      V = 66.45 0.35T 1.17C 13.03M
    where V is the volume of the toxic chemical produced, T is the temperature of the process, C is the amount of catalyst used, and M is 1 if Method B is used and 0 otherwise.
    All the variables are now fitted. To record all the important information, glmlab produces a file called DETAILS.m. (On some Windows machines, the file name may be all lower case.) The file is stored in the glmlab/glmlog directory. The DETAILS.m file for this session should contain information similar to what follows:
      >> type DETAILS
      Created at 3:06:44pm on 19-Apr-96
      Deviance      Change         df   Variables
      334.000000                   6    Vol  Const Temp
      274.172619    -59.827381     5    Vol  Const Temp Cat
       59.981132   -214.191487     4    Vol  Const Temp Cat fac(Method)
    Whenever a new model is declared in the Options menu, or whenever glmlab is started, glmlab determines if the file called DETAILS.m is in the appropriate directory. If so, it is deleted without warning so that a new DETAILS.m file can be created. This file must be copied to another file if the information needs to be kept.
    Remember to copy the file DETAILS.m to another file if the information is to be kept, as DETAILS.m is overwritten whenever a new model is
18. ber
    Covariates: Dose (fitting a constant term/intercept)
      Estimate      S.E.       Variable
      -60.717455    5.180701   Constant
       34.270326    2.912134   Dose
    Scaled deviance: 11.232231   Link: LOGIT
    Residual df: 6               Distribution: BINOML
    Scale parameter (dispersion parameter): 1.000000
    Output variables: BETA SERRORS FITS RESIDS COVB COVD DEVLIST LINPRED XMATRIX XVARNAMES
    The results agree with those given in Dobson. An alternative method is to fit the model using the proportions r/n as the response (that is, one column of proportions) and use n as the prior weights. The parameter estimates are identical. The residuals plotted against the fitted values show a possible curvature. However, using the complementary log-log link function improves the fit greatly. Use this link and see the improvement yourself.
    Figure 5.3: Variables Entered for the Beetle Mortality Example
    5.6 User Defined Links and Distributions
    glmlab allows the user to define link functions and error distributions that are not included with glmlab. This is a more difficult section and requires some knowledge of vectorised programming in MATLAB. The files dstyle and lstyle give a template for the files to edit, and the files dlist.m and llist.m contain the list of f
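    The logit fit can be reproduced outside glmlab with a short iteratively reweighted least squares loop. This is a sketch, not glmlab's fitting routine: the starting values and stopping rule are assumptions, and the data are the standard Bliss beetle mortality figures (61 of 62 beetles killed at dose 1.8610), with Dose, Number and Killed named as in the example.

```matlab
Dose   = [1.6907 1.7242 1.7552 1.7842 1.8113 1.8369 1.8610 1.8839]';
Number = [59 60 62 56 63 59 62 60]';
Killed = [6 13 18 28 52 53 61 60]';

X    = [ones(size(Dose)) Dose];              % constant term and Dose
mu   = (Killed + 0.5) ./ (Number + 1);       % starting proportions
eta  = log(mu ./ (1 - mu));                  % logit link
beta = zeros(2, 1);
for iter = 1:25
    v = mu .* (1 - mu);                      % binomial variance function
    z = eta + (Killed ./ Number - mu) ./ v;  % working response
    W = diag(Number .* v);                   % working weights
    newbeta = (X' * W * X) \ (X' * W * z);
    done = max(abs(newbeta - beta)) < 1e-8;
    beta = newbeta;
    eta  = X * beta;
    mu   = 1 ./ (1 + exp(-eta));
    if done, break; end
end
se = sqrt(diag(inv(X' * W * X)));            % approximate standard errors
```

    The estimates and standard errors should agree with the output above to the precision shown; swapping the link for the complementary log-log requires changing only the lines that compute eta, mu and the working quantities.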
19. commands create vectors of ones and zeros of a given size.
    • eye: This command creates an identity matrix of a given size (usually called I).
    • size: This command returns the number of rows and the number of columns of a given matrix.
    • ans: Issuing this command at the MATLAB prompt recalls the last answer that wasn't assigned to another variable name.
    • If the command won't fit on one line, end the line with three dots (...). This tells MATLAB that the line that follows is a continuation of the previous line. For example, a large vector can be defined like this:
      >> HPrices = [150000 200000 125000 350000 89000 110000 ...
                    110000 150000 120000 100900 93500 97000 ...
                    88000 118500 123300 165000 95000];
    When defining a long vector, it can be very useful to use the line continuation facility by ending the lines with the characters ...
    Figure 2.1: Graph of y exp x
    Some Examples
    Here are a few examples of using MATLAB, showing some important details, like how MATLAB uses i and j to represent one of the square roots of -1.
    1. >> conj([1+7i 2-6j])
       ans =
          1.0000 - 7.0000i   2.0000 + 6.0000i
    2. >> w = -1:1:2
       w =
          -1 0 1 2
       >> log(w)
       Warning: Log of zero
       ans =
          0 + 3.1416i   -Inf   0   0.6931
    3. >> x = -3:0.2:3;
       >> y exp x Xx SOR y exp x 7 2
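    The commands in this list can be tried together in a short script. The matrix sizes are arbitrary choices for illustration, and the placement of the line breaks in HPrices is an assumption; the three dots work wherever the line is split between elements.

```matlab
U = ones(2, 3);               % 2-by-3 matrix of ones
Z = zeros(1, 4);              % row vector of four zeros
I = eye(3);                   % 3-by-3 identity matrix
[nr, nc] = size(U);           % number of rows and columns of U
HPrices = [150000 200000 125000 350000 89000 110000 ...
           110000 150000 120000 100900 93500 97000 ...
           88000 118500 123300 165000 95000];
nPrices = length(HPrices);    % number of prices entered
```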
20. e current variables using clear, and then load this file:
      >> clear all
      >> load loadme
    By typing who at the MATLAB prompt, the variables that have been loaded can be seen (try typing whos also):
      >> who
      Your variables are:
      x y Z
    Have a look at the variables, especially the variable called x, by typing variable names at the MATLAB prompt.
    When saving data, the best method is to use the syntax save filename variables, giving the name of the file followed by the names of the variables to save.
    2.4 Using MATLAB for Plotting
    MATLAB has many features to produce high-quality plots. This section in no way explains all the plotting capabilities of MATLAB or discusses all the ways in which things can be changed. Hopefully, it will supply you with enough information to be able to create plots that are near enough to what you want.
    The basic plotting command is plot. If you have defined two vectors of the same length named x and y, then plot(x,y) will generate a plot of y versus x. A word of warning: by default, MATLAB joins the points with straight lines in the order in which they are given. To plot the points in other ways, use the following commands:
    • plot(x,y,'+') will use only plus signs at the points
    • plot(x,y,'x') will use only crosses at the points
    • plot(x,y,'.') will use only dots at the points
    For both lines and points, try these:
    • plot(x,y,'+-') will indicate the points with a plus sign and joi
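    The save, clear and load cycle can be rehearsed with a temporary file. The variables and the file name below are hypothetical (the file loadme used in the text is not needed), and tempname is used only so that the sketch does not overwrite anything in the current directory.

```matlab
x = 1:5;
y = x.^2;
fname = [tempname '.mat'];    % temporary file name for this illustration
save(fname, 'x', 'y');        % save only the named variables
clear x y                     % remove them from the workspace
load(fname);                  % bring them back from the file
delete(fname);                % tidy up the temporary file
```

    Naming the variables explicitly in save keeps unrelated workspace variables out of the file.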
21. een
      Estimate     S.E.        Variable
      12.000000    36.091550   Constant
       0.200000     0.433023   Temp
    Deviance: 334.000000   Link: ID
    Residual df: 6         Distribution: NORMAL
    Scale parameter (dispersion parameter): 55.666667
    Variable names are listed in the right column, labelled Variable, and the corresponding parameter estimates are given on the left in the Estimate column. The column labelled S.E. contains the standard errors for the parameter estimates. The deviance listed under the parameter estimates is a measure of the goodness of fit of the model. For normal regression models only, the deviance is equivalent to the residual sum of squares. The scale parameter (in the case of a normal distribution only) is an estimate of the residual variance. When it is estimated by the mean deviance, as here, the scale parameter is found by dividing the residual deviance by the residual degrees of freedom.
    Suppose that another variable is included, say Cat. This variable is then added to the covariate list so that the glmlab screen will appear as in Figure 4.2. Pressing the FIT MODEL button produces the following parameter estimates:
      Estimate     S.E.        Variable
      22.404762    37.179998   Constant
       0.172619     0.430573   Temp
       5.202381     4.980572   Cat
    Deviance: 274.172619 (change: -59.827381)
    Residual df: 5 (change: -1)
    Scale parameter (dispersion parameter): 54.834524
    The deviance is given again, but also the change in the deviance. For normal distribution models, this change i
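    The statement that the normal-model deviance is the residual sum of squares can be illustrated with a small least squares fit. The data below are hypothetical (the chemical data of Table 4.1 are not reproduced here), and the backslash operator stands in for glmlab's fitting routine.

```matlab
x = [1 2 3 4 5]';
y = [2.1 3.9 6.2 7.8 10.1]';
X = [ones(size(x)) x];            % constant term and one covariate
b = X \ y;                        % least squares estimates
resid = y - X * b;
D   = sum(resid.^2);              % deviance = residual sum of squares
df  = length(y) - size(X, 2);     % residual degrees of freedom
phi = D / df;                     % mean deviance estimate of the scale parameter
```

    Here D is about 0.107 on 3 degrees of freedom, so the scale parameter estimates the residual variance in the usual way.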
22. efer to MATLAB or glmlab commands.
    Symbols
    @: see interaction
    ': see transpose
    ...: see line continuation
    .: see period
    :: see colon
    ;: see semicolon
    ^: see caret
    *: see asterisk
    %: see comments
    analysis of variance
    asterisk
    categorical: see qualitative
    clear
    colon
    comments
    complementary log-log: see link
    contingency table
    cos
    covariates
    data files
    default: data directory; distribution; link
    degrees of freedom
    DETAILS
    deviance
    distributions: binomial; changing; exponential; gamma; inverse Gaussian; normal; Poisson; user defined
23. enu
    3.1.4 The Scale Parameter Menu
    3.1.5 The Residuals Type Menu
    3.1.6 The Options Menu
    3.1.7 The Plots Menu
    3.1.8 The Help Menu
    3.2 The Main Window Edit Areas
    3.2.1 The Response Area
    3.2.2 The Covariates Area
    3.2.3 The Prior Weights Area
    3.2.4 The Offset Area
    3.3 The Main Window Buttons
    3.3.1 QUIT
    3.3.2 FIT SPECIFIED MODEL
    3.3.3 NEW MODEL
    3.4 Extra Commands
    3.4.1 fac
    3.4.2 makefac
    3.4.3 The @ Character
    3.5 Returned Variables
    4 Examples Using glmlab
    4.1 Example: Multiple Regression
    4.2 Example: Log-Linear Model
    5 Advanced Topics and Examples
    5.1 Offsets
    5.2 Default Settings for Distributions
    5.3 Changing the Scale Parameter
    5.4 Example: A Generalised Linear Model
    5.5 Example: Binomial Distributions
    5.6 User Defined Links and Distributions
    5.7 Example: User Defined Distributions
    5.8 The Default Data Directory
    5.9 npplot
24. essing the FIT MODEL button, the following parameter estimates are produced:
      Estimate     S.E.       Variable
       3.607910    0.054882   Constant
    Scaled deviance: 50.434008   Link: LOG
    Residual df: 8               Distribution: POISSON
    Scale parameter (dispersion parameter): 1.000000
    To add Gender to the covariate list, remember to use the fac command, as Gender is qualitative. Add fac(Gender) to the covariate list and press FIT MODEL again to produce new estimates:
      Estimate     S.E.       Variable
       3.590439    0.083044   Constant
       0.031231    0.110653   Gender(2)
    Scaled Deviance: 50.354244 (change: -0.079764)
    Residual df: 7 (change: -1)
    Scale parameter (dispersion parameter): 1.000000
    Now add the last variable, Type, to the covariate list, so the list of variables in the Covariates area includes fac(Gender) and fac(Type). Remember to use the fac command again because Type is qualitative. Press the FIT MODEL button to produce new estimates:
      Estimate     S.E.       Variable
       3.817497    0.118512   Constant
       0.104671    0.114489   Gender(2)
      -0.664071    0.227643   Type(2)
      -0.315853    0.157170   Type(3)
      -0.287682    0.155902   Type(4)
      -0.344840    0.158501   Type(5)
    Scaled Deviance: 39.197423 (change: -11.156820)
    Residual df: 3 (change: -4)
    Scale parameter (dispersion parameter): 1.000000
    The variable Type appears four times to indicate that there are five levels of this qualitative variable; only four variables are needed for a five-level f
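    The first two sets of estimates have closed forms, which gives a quick check on the output: with a log link and a single factor, the fitted means are the weighted group means. The sketch below uses the data of this example; the helper names mNull, mMale and mFemale are illustrative.

```matlab
Count  = [65 31 0 26 21 49 20 52 39 29];
Gender = [1 2 1 2 1 2 1 2 1 2];
Priorw = [1 1 0 1 1 1 1 1 1 1];

mNull   = sum(Priorw .* Count) / sum(Priorw);     % overall weighted mean
mMale   = sum(Priorw .* Count .* (Gender == 1)) / sum(Priorw .* (Gender == 1));
mFemale = sum(Priorw .* Count .* (Gender == 2)) / sum(Priorw .* (Gender == 2));

constNull = log(mNull);              % Constant in the constant-only model
constant  = log(mMale);              % Constant when fac(Gender) is included
gender2   = log(mFemale / mMale);    % Gender(2) estimate
```

    The three values reproduce 3.607910, 3.590439 and 0.031231 from the output above.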
25. ession
• weighted regression
• log-linear models
• logistic regression
• other generalised linear modelling techniques, and
• residual plotting.

glmlab also allows the user to save models between sessions, and returns a number of variables to the workspace for further analysis (see Section 3.5).

    matlabpath
    C:\MATLAB\toolbox\local
    C:\MATLAB\toolbox\matlab\datafun
    C:\MATLAB\toolbox\matlab\elfun
    C:\MATLAB\toolbox\local
    C:\glmlab             <- A new line to be included
    C:\glmlab\fit         <- A new line to be included
    C:\glmlab\fit\link    <- A new line to be included
    C:\glmlab\fit\dist    <- A new line to be included
    C:\glmlab\misc        <- A new line to be included
    C:\glmlab\plotting    <- A new line to be included
    C:\glmlab\data        <- A new line to be included
    C:\glmlab\glmhelp     <- A new line to be included
    C:\glmlab\glmlog      <- A new line to be included

Figure 1.1: The MATLAB Windows Path

1.3 Installing glmlab

glmlab is available from the MathWorks user-contributed site at http://www.mathworks.com/statsv5.html as glmlab.zip (best for Windows or Macintosh users) or glmlab.tar (best for UNIX users). When installing glmlab, you will need to have write access to the glmlab/glmlog directory (folder) so that it can create a file of the fitting parameters and to place the log fi
26. eter: 1.000000

    Dose                 Number of Beetles   Number of Beetles
    (log CS2 mg l^-1)    n                   Killed, r
    1.6907               59                   6
    1.7242               60                  13
    1.7552               62                  18
    1.7842               56                  28
    1.8113               63                  52
    1.8369               59                  53
    1.8610               62                  60
    1.8839               60                  60

Table 5.4: Beetles Mortality Data

5.5 Example: Binomial Distributions

Because of the special nature of the binomial distribution, a small illustrative example will be given. A binomial response variable consists of two columns: the first for the counts and the second for the sample sizes. Alternatively, the counts can be given as the response data, with the sample sizes as prior weights. When the data to be analysed is in the form of probabilities, only one column is needed.

The data comes from Bliss [2] (cited in Dobson [3]) and is shown in Table 5.4. The data involves counting the number of beetles killed after five hours of exposure to various concentrations of gaseous carbon disulphide (CS2). The analysis concerns estimating the proportion $r/n$ of beetles killed by the gas. The variables in MATLAB were named Dose, Number and Killed for the obvious variables. Dobson analyses the data using a logit link function. The data is entered into glmlab as shown in Figure 5.3. Note the entry for the response variable, which is entered as two columns. After choosing the binomial distribution and the logit link function from the menus, the results are given below.

INFO: Response Variable: Killed Num
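As a rough illustration of the data layout described in Section 5.5, the values of Table 5.4 can be typed in at the MATLAB prompt and the observed proportions computed. This is only a sketch: Prop and Resp are assumed names that the manual does not use, and the numbers are copied from the table as printed here.

```matlab
% Beetle mortality data from Table 5.4, stored as column vectors
Dose   = [1.6907 1.7242 1.7552 1.7842 1.8113 1.8369 1.8610 1.8839]';
Number = [59 60 62 56 63 59 62 60]';   % sample sizes, n
Killed = [ 6 13 18 28 52 53 60 60]';   % counts, r

% Observed proportion killed at each dose (Prop is a hypothetical name)
Prop = Killed ./ Number;

% A two-column response: counts first, sample sizes second
% (Resp is a hypothetical name)
Resp = [Killed Number];

disp([Dose Prop])
```

The same information could instead be supplied as Prop in the response with Number as the prior weights, which is the alternative layout the section mentions.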
27. fused if a matrix is entered, or if some covariates are listed by variable name and others using legal MATLAB commands. glmlab requires that each column of input into the Covariates window has its own MATLAB variable name or command. For example, valid strings are 1 2 31 32 92 21 Wt Ln, where Wt and Ln are valid MATLAB variables with one column each. Invalid entries are 1 2 3 3 4 5 magic(8) Msrments 1 Msrments, where Msrments is a matrix.

It is best to define vector variables in the MATLAB workspace and enter the variable names in the glmlab window. It is best not to use matrices.

Strings of variables can be separated by commas or spaces. glmlab tries to sort out the string if it contains other characters, double commas and so on, but it is certainly not foolproof. The constant term is included in fitted models by default, but this can be controlled through the Options menu.

3.2.3 The Prior Weights Area

If prior weights are to be used, the variable name is entered here. Prior weights are used, for example, in weighted regression, in log-linear models where there are structural zeroes, or when using the binomial distribution (see Section 5.5). If the variable consists of all zeroes, an error message is given. Valid MATLAB vectors are allowed.

3.2.4 The Offset Area

An offset is a variable with a known coefficient. The offset variable name is entered here. Valid MATLAB vectors are allowed.

3.3 The Main Window B
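The advice to use one named vector per covariate can be followed by splitting a matrix into columns before starting a fit. The sketch below assumes a small made-up matrix called Msrments; the values and the name Ht are purely illustrative.

```matlab
% Msrments is a hypothetical 4-by-3 matrix of measurements
Msrments = [1.2 3.4 5.6
            2.3 4.5 6.7
            3.4 5.6 7.8
            4.5 6.7 8.9];

% Give every column its own name (capitalised first letter, as the
% manual recommends) so that each covariate is a single column vector
Wt = Msrments(:,1);
Ln = Msrments(:,2);
Ht = Msrments(:,3);

size(Wt)   % each covariate is now 4-by-1
```

The names Wt, Ln and Ht can then be typed into the Covariates area, separated by spaces or commas, instead of the matrix itself.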
28. glmlab: Using MATLAB for analysing generalised linear models

Current version: 2.5

Peter K. Dunn
Department of Mathematics and Computing
University of Southern Queensland
Toowoomba, Australia

Printed April 26, 2000

This manual has been produced using LaTeX and MATLAB, converted to HTML using hyperlatex, and is © 1997-2000 Peter K. Dunn.

Table of Contents

Acknowledgements
A Note on What You Read
Contact Information
1 Introduction
  1.1 Why glmlab?
  1.2 What can glmlab do?
  1.3 Installing glmlab
    1.3.1 The PC Version of MATLAB
    1.3.2 The UNIX and LINUX Versions of MATLAB
  1.4 Directory Structure
2 Starting with MATLAB
  2.1 Basic Use of MATLAB
  2.2 Obtaining Help from MATLAB
  2.3 Using Data Files
  2.4 Using MATLAB for Plotting
  2.5 Other Miscellaneous MATLAB Commands
  2.6 Some Examples
3 glmlab Reference
  3.1 The Main Window Items
    3.1.1 The File Menu
    3.1.2 The Distributions Menu
    3.1.3 The Link M
29. glmlab/fit/link:/home/myname/glmlab/fit/dist:/home/myname/glmlab/data:/home/myname/glmlab/glmhelp:/home/myname/glmlab/glmlog

Figure 1.2: The MATLAB UNIX Path in tc shell

    glmlab            Contains general information and files used in starting glmlab.
    glmlab/fit        Contains numerous files for fitting the model and parsing the input.
    glmlab/fit/dist   Contains information about the distributions that can be used.
    glmlab/fit/link   Contains information about the link functions that can be used.
    glmlab/plotting   Contains plotting routines.
    glmlab/misc       Contains other miscellaneous files used in glmlab, including formatting and tricks.
    glmlab/glmhelp    Contains the glmlab help menu information.
    glmlab/glmlog     Contains log files and fitting parameters that come with glmlab.
    glmlab/data       Contains the data files that come with glmlab.

Figure 1.3: The Structure of glmlab

1.3.2 The UNIX and LINUX Versions of MATLAB

To use glmlab with UNIX, the file .cshrc (or its equivalent, depending on the shell you are using) needs to have a line directing MATLAB to glmlab. There should be a line in the .cshrc (or equivalent) file starting with setenv MATLABPATH (or equivalent). If not, then a line beginning with this will need to be created. The Systems Administrator will be of great assistance here. The appropriate section of the .cshrc file (or equivalent) should include a single line that looks like Figure 1.2; it may include ot
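The manual sets the path through the shell start-up file. As an alternative that is not described in the manual, the same directories can be added from inside a running MATLAB session with the standard addpath function. The sketch below assumes glmlab was unpacked under /home/myname/glmlab, as in Figure 1.2; the variable names root, dirs and found are hypothetical, and directories that do not exist are skipped.

```matlab
% Assumed installation root (taken from the example path in Figure 1.2)
root = fullfile('/home', 'myname', 'glmlab');

% The glmlab directories listed in Figure 1.3
dirs = {root, ...
        fullfile(root, 'fit'), ...
        fullfile(root, 'fit', 'link'), ...
        fullfile(root, 'fit', 'dist'), ...
        fullfile(root, 'plotting'), ...
        fullfile(root, 'misc'), ...
        fullfile(root, 'data'), ...
        fullfile(root, 'glmhelp'), ...
        fullfile(root, 'glmlog')};

% Add each directory that actually exists to the search path
for k = 1:numel(dirs)
    if exist(dirs{k}, 'dir')
        addpath(dirs{k});
    end
end

% exist returns 2 when glmlab.m can be found on the path, 0 otherwise
found = exist('glmlab', 'file');
```

A path changed this way lasts only for the current session unless it is saved, so the start-up file approach in the text remains the more permanent route.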
30. >> plot(x,y)
>> title('Graph of y = exp(-x^2)')
>> xlabel('x values'); ylabel('y values')

The final graph is shown in Figure 2.1. Notice that the top of the graph is not very smooth. What can be done to make the graph smoother? It would also be good to have the graph not touching the top border. Have a look at the help for the function axis.

Chapter 3: glmlab Reference

This section explains briefly most of the details of glmlab, including all of the main menu items. Figure 3.1 shows the main glmlab window.

3.1 The Main Window Items

There are eight drop-down menus on the main glmlab screen. For a quick explanation, see Figure 3.2.

3.1.1 The File Menu

3.1.1.1 LOAD Data File
Allows the user to load a data file for use in glmlab using a graphical interface. A dialog box is issued confirming that the file has loaded correctly. The dialog box initially defaults to the glmlab/data directory, but the default directory can be altered by moving the dummydta.m file to the required default folder (directory). See also Section 5.8.

3.1.1.2 Load glmlab Model
Loads a previously saved glmlab model, restoring details such as the error distribution, link function, scale parameter, residual type and variables entered.

3.1.1.3 Save glmlab Model
Saves a glmlab model, saving details such as the error distribution, link function, scale parameter, residual type and variables entered. A dialog box defaults to saving in
31. her MATLAB directories besides the glmlab directories. You may have to log out and log in again. Type help glmlab at the MATLAB prompt next time MATLAB is started. If the message glmlab.m not found appears on the screen, then the installation did not work and will need to be repeated.

NOTE: glmlab needs to have write permission to the glmlab/glmlog directory so that it can create a file of the fitting parameters and to place the log file. For this reason, it is suggested that glmlab be installed in your own directories.

1.4 Directory Structure

glmlab consists of over 70 MATLAB files in a number of directories (or folders). They are structured as shown in Figure 1.3.

Chapter 2: Starting with MATLAB

This section discusses, very briefly, how to start using MATLAB. Only details important for using glmlab are discussed. A more thorough introduction is given in http://www.math.utah.edu/lab/ms/matlab/matlab.html. The MATLAB manual will also prove useful.

2.1 Basic Use of MATLAB

MATLAB is designed to work easily with matrices and vectors, and a number of methods exist to declare matrices and vectors. For example, an entire matrix can be entered on one line, with rows separated by semicolons:

[matrix entry and output not recoverable from the source]

The matrix can also be typed in row by row:

[matrix entry and output not recoverable from the source]

Notice that the spacing used has no effect. The transpose of a matrix can be found using the quote symbol ('): >>
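The three points made in Section 2.1 (one-line entry, row-by-row entry and the transpose) can be tried with any small matrix. The matrix below is an arbitrary illustration, not the one printed in the manual, and the names B and At are assumptions.

```matlab
% An entire matrix on one line, with rows separated by semicolons
A = [1 2 3; 4 5 6];

% The same matrix typed in row by row; the extra spacing has no effect
B = [ 1   2   3
      4   5   6 ];

% The transpose is found using the quote symbol
At = A';

isequal(A, B)   % displays 1 (true): both entries give the same matrix
size(At)        % displays 3 2: rows and columns are swapped
```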
33. iduals (see Dunn and Smyth [4]) or the raw residuals (that is, $y - \mu$). Quantile residuals are only available for the distributions that come with glmlab, and not for user defined distributions.

3.1.6 The Options Menu

There are several options that can be chosen, as detailed below.

3.1.6.1 Declare New Model
After fitting models, this option should be selected to restore all options and reset glmlab. After selecting this option, the DETAILS.m file will be overwritten. This is the same as pressing the NEW MODEL button.

3.1.6.2 Restore Default Settings
This option restores all the default settings. It can be selected if the user has become lost in setting the options elsewhere.

3.1.6.3 Warning Message Status
MATLAB often issues warnings. These warnings can be disabled or enabled in this menu. For more details, type help warning.

3.1.6.4 Change Fitting Parameters
There are three parameters that the user can change:
• The number of iterations: This determines the maximum number of iterations used in the fitting algorithm. The default of 20 iterations will be sufficient for almost all cases.
• Parameter Tolerance: This parameter determines the accuracy of the parameter estimates.
• Ill-conditioning Tolerance: In general, this parameter determines the sensitivity of glmlab to the singularity of the $X^T X$ matrix, where $X$ is the covariate matrix. If $X^T X$ is singular, or close to singular, the parameter estima
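To get a feel for what closeness to singularity of $X^T X$ means, MATLAB's built-in rcond function (a reciprocal condition number estimate) can be compared for a well-behaved and a nearly collinear covariate matrix. This is a sketch only: the manual does not say how glmlab itself tests for ill-conditioning, the data are made up, and the value 1e-10 is used here simply because it is the default ill-conditioning tolerance quoted in the manual.

```matlab
tol = 1e-10;                          % default tolerance quoted in the manual

x1 = (1:6)';
x2 = [3 1 4 1 5 9]';                  % hypothetical, unrelated covariate
x3 = 2*x1 + 1e-9*[1 -1 1 -1 1 -1]';   % hypothetical, almost a multiple of x1

Xgood = [ones(6,1) x1 x2];            % constant term plus two covariates
Xbad  = [ones(6,1) x1 x3];            % nearly collinear columns

% rcond is near 1 for a well-conditioned matrix and near 0 for a
% matrix that is singular or close to singular
rgood = rcond(Xgood' * Xgood);
rbad  = rcond(Xbad'  * Xbad);

fprintf('well conditioned: %g   nearly singular: %g\n', rgood, rbad);
isBad = rbad < tol;                   % true for the nearly collinear case
```

In a situation like Xbad, dropping or redefining one of the collinear covariates is usually a better remedy than changing the tolerance.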
34. ificial data recording the volume of a toxic chemical that is produced as a by-product in a certain industrial manufacturing process. glmlab is started by typing glmlab at the MATLAB prompt, producing the initial glmlab screen given in Figure 4.1. The data is stored in the file chemical.mat in the data subdirectory of glmlab. This file can be loaded using the LOAD Data File option from the glmlab File menu. By default, the window opens in the glmlab/data folder. To load the same file at the MATLAB prompt, type the following:

>> load chemical   % if not using the glmlab menu

After loading this file, check to see what variables have been loaded using MATLAB's who command. Looking at the variable Chemicalhelp will also be useful (type Chemicalhelp at the MATLAB prompt).

[Table 4.1: Toxic Chemical Production Data. Columns: Volume (in litres), Temperature (in °C), Weight of Catalyst (in kg), Method. Only the values 30, 39, 26, 36, 22, 18, 32, 26 survive in the source.]

[Figure 4.1: The Initial glmlab Screen]

>> Chemicalhelp
Chemicalhelp =
The file chemical contains these variables:
Cat: The weight of catalyst (in kg)
Method: The method used to produce the chemical (qualitative)
Temp: The temperature of the manufacturing process
35. iles containing information about the distributions and the links respectively. To demonstrate how to include user defined distributions in glmlab, see Section 5.7 (the next section).

5.7 Example: User Defined Distributions

Nelder and Pregibon [8] and McCullagh and Nelder [7, Chapter 9] discuss quasi-likelihood, where the complete error distribution need not be specified; all that is required is information about the first two moments (the mean and the variance). As an example, consider including a new distribution specified by the variance function

$V(\mu) = \mu^4$.   (5.1)

This variance function defines one of the Tweedie family of distributions defined in Tweedie [10], sometimes called a positive stable distribution. To incorporate such a distribution in glmlab, proceed as follows.

1. The first step is to create a file that contains the pertinent information. Suppose that the file for the given variance function is called dfourpwr.m. The d at the start of the file name is mandatory for defining new distributions; in a similar fashion, new link functions must be created in files that begin with an l. The first step, then, is to create this file in the dist subdirectory of the fit directory (or the link subdirectory if a new link is being defined). The best way is to copy the file dstyle.m to dfourpwr.m. The file dstyle.m contains the template (style) for making new distributions (lstyle.m for new link functions).
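Before filling in the template, it can help to check the variance function on its own at the MATLAB prompt. The sketch below assumes the variance function is the fourth power of the mean, as the file name dfourpwr.m suggests; it does not reproduce the contents of dstyle.m, and the data and the names V and rP are hypothetical.

```matlab
% Assumed variance function: the fourth power of the mean
V = @(mu) mu.^4;

% Made-up observations and fitted means, for illustration only
y  = [1.2 0.8 2.5 3.1]';
mu = [1.0 1.0 2.0 3.0]';

% Pearson residuals use only the mean and the variance function,
% which is all that quasi-likelihood requires
rP = (y - mu) ./ sqrt(V(mu));

disp([y mu rP])
```

Because sqrt(V(mu)) equals mu.^2 here, observations with large fitted means are scaled down strongly, which is the practical effect of choosing such a steep variance function.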
36. ion are used, though the original paper uses the identity link function. The exponential distribution is equivalent to the gamma distribution with the scale parameter set to 1. The first step is to alter the error distribution by selecting the Distribution menu item in the main glmlab window and choosing the gamma distribution. The scale parameter is then altered by clicking on the Scale Parameter menu item in the main glmlab window and selecting Fixed Value. A new window appears; enter the fixed value of the scale parameter (in this case 1; see Figure 5.1). This has effectively selected the exponential distribution.

[Figure 5.1: Entering a Fixed Value for the Scale Parameter]

[Figure 5.2: Variables Entered for Example 5.4]

The original paper analyses the logarithm of the white blood cell counts with the survival time, the AG factor, plus their interaction as covariates. The complete model can be fitted by typing in the variables in the main glmlab window as shown in Figure 5.2. Note the use of the fac command (because Ag is qualitative) and the @ symbol. Pressing the FIT MODEL button produces the following estimates:

    Estimate    S.E.        Variable
    8.478205    1.655453    Constant
    0.481829    0.173635    log(Wbc)
    4.137813    2.570290    Ag 2
    0.328110    0.266888    log(Wbc)@Ag 2
    Scaled Deviance: 38.554607 (change: 0.000000)
    Residual df: 29 (change: 0)
    Scale parameter (dispersion param
37. le. For this reason, it is suggested that glmlab be installed in your own directories. This will make more sense after reading the rest of this section. Please consult your MATLAB documentation to help with setting the appropriate MATLAB path. The steps outlined below can change and are sometimes platform dependent.

1.3.1 The PC Version of MATLAB

If you have the file glmlab.zip, you will need to unzip the file using a program such as PKunZip or WinZip. After loading glmlab onto the computer's hard disk, MATLAB must be told where to find the glmlab files. There are two ways to do this. One is to use the PathTool that comes with MATLAB, by typing pathtool at the MATLAB prompt. The second method is to edit the file matlabrc.m, generally found in the matlab directory. Load this file using a text editor (such as notepad or edit, or even a word processor) and find the section that begins with matlabpath. This section must be edited to include the glmlab directories. Use the editor to include some extra lines so that it ends up looking similar to Figure 1.1 (many lines have been omitted). Type help glmlab at the MATLAB prompt next time MATLAB is started. If the message glmlab.m not found appears on the screen, then the installation did not work and will need to be repeated.

setenv MATLABPATH /home/myname/glmlab:/home/myname/glmlab/fit:/home/myname/glmlab/plotting:/home/myname/glmlab/misc:/home/myname/
39. n block of size 4. The variable can be generated using makefac(8,2,4). If Gender was recorded as M M F F M M F F, then the variable would be generated using makefac(8,2,2).

3.4.3 The @ Character

The @ character is used to specify interactions, and is usually obtained by holding down the SHIFT key while pressing 2. For example, to interact two variables called Height and Gender (which is a factor), use Height@fac(Gender).

3.5 Returned Variables

glmlab returns ten variables to the workspace for further manipulation. The variables are:
• BETA: The parameter estimates.
• SERRORS: The standard errors of the parameter estimates.
• FITS: The fitted values.
• RESIDS: The residuals calculated. The type of residual calculated is determined by the Residual Type menu item (see Section 3.1.5).
• COVB: The covariance matrix for the parameter estimates.
• COVD: The covariance matrix for the differences between the parameter estimates.
• DEVLIST: The deviance at each iteration of the fitting algorithm.
• LINPRED: The linear predictor, $\eta = X\beta$.
• XMATRIX: The X matrix used in the fitting.
• XVARNAMES: The names of the X variables as recorded in the glmlab output.

Chapter 4: Examples Using glmlab

This section gives some examples of how to use glmlab for some basic tasks. glmlab can do more than is demonstrated here; see Chapter 5, for example.

4.1 Example: Multiple Regression

Table 4.1 shows art
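Once a model has been fitted, the returned variables can be combined at the MATLAB prompt. The sketch below shows two typical uses: the ratio of each estimate to its standard error, and the linear predictor rebuilt from XMATRIX and BETA. So that it runs without glmlab, stand-in values are first created with an ordinary least-squares fit to made-up data; after a real fit these variables would already be in the workspace. The names y, res, s2, z and eta are hypothetical, and the sketch assumes no offset is used.

```matlab
% --- Stand-in values (NOT produced by glmlab; made-up data) ---
XMATRIX = [ones(5,1) (1:5)'];            % constant term plus one covariate
y       = [2.1 3.9 6.2 7.8 10.1]';
BETA    = XMATRIX \ y;                   % least-squares estimates
res     = y - XMATRIX*BETA;
s2      = sum(res.^2) / (size(XMATRIX,1) - size(XMATRIX,2));
COVB    = s2 * inv(XMATRIX' * XMATRIX);  % covariance of the estimates
SERRORS = sqrt(diag(COVB));

% --- Typical manipulations of the returned variables ---
z   = BETA ./ SERRORS;    % estimate divided by its standard error
eta = XMATRIX * BETA;     % linear predictor, eta = X*beta

disp([BETA SERRORS z])
```

With a real glmlab fit, only the last block is needed; eta could then be compared with the returned LINPRED.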
40. n the deviance and corresponding change in the degrees of freedom can be recorded in an analysis of variance table for testing. The change in deviance is equivalent to a change in the sum of squares for a normal distribution model only. The sign of the change in deviance is important. The negative sign implies that the deviance has become smaller than at the last fitting, and therefore that the sum of the squared residuals is smaller.

The last variable to include is Method. This variable is qualitative, and not quantitative like Vol, Temp and Cat. So to indicate to glmlab that Method is qualitative, type fac(Method) in the covariate list. The fac command indicates to glmlab that the variable is qualitative rather than quantitative (the glmlab default). See Figure 4.3.

When using qualitative variables, remember to use the fac command.

Pressing FIT MODEL again produces these estimates:

[Figure 4.2: Variables Being Entered for Chemical Data]

[Figure 4.3: Entering Data Using the fac Command]

    Estimate     S.E.         Variable
    66.452830    22.668419    Constant
    0.354717     0.264890     Temp
    1.169811     2.814611     Cat
    13.028302    3.447181     Method 2
    Deviance: 59.981132 (change: -214.191487)
    Residual df: 4 (change: -1)
    Scale parameter (dispersion parameter): 14.995283

Footnote 1: Some users may recognise this as the GLIM standard.

The output includes a term la
41. n the points with straight, unbroken lines.
• plot(x,y,'o--') will indicate the points with a circle sign, and join the points with dashed lines.

A full list of the available types of points and lines is available by typing help plot at the MATLAB prompt. You may first need to type more on at the prompt to see all the information.

The plots should be annotated with labels on the x-axis and the y-axis, and with a title. To place a label on the x-axis, use xlabel; to place a label on the y-axis, use ylabel; and to place a title, use title. Remember to always give pictures meaningful titles and axis labels.

Let's have a look at using some of this information. First, define some vectors to manipulate:

>> x = 1:10;          % The numbers 1, 2, ..., 10
>> y = randn(10,1);   % 10 by 1 vector of normally distributed random numbers
>> plot(x,y)

This gives a plot joining the points with lines, but crosses would be better:

>> plot(x,y,'x')

The plot and the axes should be labelled:

>> xlabel('x values')
>> ylabel('y values')
>> title('A plot of y versus x')

The following functions may also be useful:
• axis: For example, axis([0 10 0 5]) sets the axes such that the x-axis ranges from 0 to 10, and the y-axis ranges from 0 to 5.
• propedit: The property editor allows the user to make many changes to plots. Type propedit at the MATLAB prompt, and then click on the Figure to
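The plotting commands above can be collected into one short script. The sketch below uses only standard MATLAB plotting functions; the axis limits are arbitrary choices made for illustration, picked so that the points do not touch the border of the plot.

```matlab
x = (1:10)';          % the numbers 1, 2, ..., 10 as a column vector
y = randn(10,1);      % 10 by 1 vector of normally distributed random numbers

figure                % open a new figure window
plot(x, y, 'x')       % mark each point with a cross, with no joining lines

% Always give plots meaningful axis labels and a title
xlabel('x values')
ylabel('y values')
title('A plot of y versus x')

% Leave some room around the points (limits are arbitrary choices)
axis([0 11 -4 4])
```

Replacing 'x' with 'o--' would instead mark the points with circles and join them with dashed lines.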
42. ng in GLIM. Number 4 in Oxford Statistical Sciences Series. Oxford University Press, 1990.
[2] C. I. Bliss. The calculation of the dosage-mortality curve. The Annals of Applied Biology, 22:134-167, 1935.
[3] Annette J. Dobson. An Introduction to Statistical Modelling. Chapman and Hall, 1983.
[4] Peter K. Dunn and Gordon K. Smyth. Randomized quantile residuals. The Journal of Computational and Graphical Statistics, 5:1-10, September 1996.
[5] Polly Feigl and Marvin Zelen. Estimation of exponential survival probabilities with concomitant information. Biometrics, 21:826-838, December 1965.
[6] Brian Francis, Mick Green, and Clive Payne. The GLIM System: generalised linear interactive modelling, Release 4 Manual. Clarendon Press, 1993.
[7] P. McCullagh and J. A. Nelder. Generalized Linear Models. Number 37 in Monographs on Statistics and Applied Probability. Chapman and Hall, second edition, 1994.
[8] J. A. Nelder and D. Pregibon. An extended quasi-likelihood function. Biometrika, 74(2):221-232, 1987.
[9] Statistical Sciences, Inc. S-PLUS User's Manual, Version 3.3 for Windows, 1995.
[10] M. C. K. Tweedie. An index which distinguishes between some important exponential families. In Proceedings of the Indian Statistical Institute Golden Jubilee International Conference on Statistics: Applications and New Directions, pages 579-604, December 1981.
43. ntroduction

1.1 Why glmlab?

glmlab was first written in 1995 to emulate some of the features of GLIM (Aitken et al. [1], Francis et al. [6]), industry standard software for analysing generalised linear models. However, GLIM is an expensive program to purchase, even in student editions. Since MATLAB is inexpensive in student editions (especially considering its capabilities) and is a widely used numerical computation package, MATLAB was considered an ideal platform in which to perform the analyses performed in GLIM. glmlab does not attempt to replace packages such as GLIM and S-PLUS (Statistical Sciences [9]), but rather to bring the world of generalised linear models to the MATLAB environment. glmlab uses a graphical user interface, which means very few additional commands need to be learnt to use glmlab. This manual discusses some of the basic skills needed to use glmlab. While it assumes some knowledge of MATLAB, there is an introduction to MATLAB in Chapter 2.

1.2 What can glmlab do?

MATLAB is a powerful computational tool that can be programmed to perform practically any numerical task. MATLAB has some basic statistical procedures built in, for example hist, mean, median and std. More statistical procedures are available, at extra cost, in the Statistics Toolbox. glmlab extends MATLAB's statistical capabilities without needing the Statistics Toolbox, allowing the user to perform tasks such as:
• simple linear regression
• multiple regr
44. Complementary log-log: $\eta = \log(-\log(1 - \pi))$; Probit: $\eta = \Phi^{-1}(\pi)$

Table 5.2: Link Functions in glmlab. $\eta$ is the linear predictor and $\mu$ is the mean ($\pi$ in the binomial case).

    AG Positive                        AG Negative
    White Blood     Survival Time      White Blood     Survival Time
    Count (WBC)     (in weeks)         Count (WBC)     (in weeks)
      2300           65                  4400           56
       750          156                  3000           65
      4300          100                  4000           17
      2600          134                  1500            7
      6000           16                  9000           16
     10500          108                  5300           22
     10000          121                 10000            3
     17000            4                 19000            4
      5400           39                 27000            2
      7000          143                 28000            3
      9400           56                 31000            8
     32000           26                 26000            4
     35000           22                 21000            3
    100000            1                 79000           30
    100000            1                100000            4
     52000            5                100000           43
    100000           65

Table 5.3: Feigl and Zelen's Leukemia Data

5.4 Example: A Generalised Linear Model

The data in Table 5.3 is taken from Feigl and Zelen [5]. The white blood cell counts for patients who died of acute myelogenous leukemia, and their survival times, were recorded. The patients were classified as AG positive or AG negative according to the presence or absence of a certain characteristic of white blood cells. The data is analysed to determine if survival times can be predicted. The data can be loaded from the file leuk.mat in the data folder using the MATLAB load command or the LOAD Data File menu item.

>> who
Your variables are:
Ag   Time   Wbc

The variable Ag contains a 1 if the patient is AG positive and a 0 if AG negative; Wbc is the white blood cell count; Time is the survival time. The exponential distribution and reciprocal link funct
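The binomial link functions named in Table 5.2 can be written as one-line MATLAB functions to see how they map a probability to the linear predictor. This is a sketch of the standard textbook definitions, not of glmlab's own link files; the function names are hypothetical, and the probit link is written with the built-in erfinv so that the Statistics Toolbox is not needed.

```matlab
% Standard definitions of three links for a probability p (0 < p < 1)
logit   = @(p) log(p ./ (1 - p));
cloglog = @(p) log(-log(1 - p));            % complementary log-log
probit  = @(p) sqrt(2) * erfinv(2*p - 1);   % inverse standard normal cdf

% Inverse of the logit link: maps the linear predictor back to p
invlogit = @(eta) 1 ./ (1 + exp(-eta));

p = [0.1 0.5 0.9]';
disp([p logit(p) probit(p) cloglog(p)])

% Applying a link and then its inverse returns the original probabilities
pBack = invlogit(logit(p));
```

The logit and probit links are symmetric about p = 0.5, where both give zero, while the complementary log-log link is not.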
45. option allows the user to email the author of glmlab. The author encourages you to email him if you are using the program, to let him know what you think, to give suggestions for improvements, or to describe how you are using glmlab.

3.1.8.10 Initial Splash Screen
This option redisplays the splash screen that is displayed when glmlab is first started.

3.1.8.11 Last Revision
This is only for information. The date of the latest change (even if a minor one) is given here.

3.2 The Main Window Edit Areas

3.2.1 The Response Area

The user types the name of the response variable (usually designated as y) in this area. Most legal MATLAB commands can be used (like [1 2 3] or randn(100,1)), though glmlab could complain if the commands become too complicated. The binomial distribution requires two columns in the Response area: one for the observed counts, and the other for the sample size. If only one column is given, it is assumed to be probabilities if all its elements are between 0 and 1; otherwise, glmlab issues a warning. See Section 5.5. No model can be fitted until a valid string is entered into the Response area.

3.2.2 The Covariates Area

The user types the names of the covariates (usually designated as X) in this area. It is possible to use most legal MATLAB commands (like [1 2 3], [3 0 1] or randn(100,4)), though glmlab could complain if they become too complicated. In particular, glmlab will become con
46. ple:

>> help cos
 COS    Cosine.
    COS(X) is the cosine of the elements of X.
>> help acosh
 ACOSH  Inverse hyperbolic cosine.
    ACOSH(X) is the inverse hyperbolic cosine of the elements of X.

Although it doesn't say, MATLAB always assumes the values are in radians when using trigonometric functions.

MATLAB commands need to be entered in lower case. While the help places the commands in upper case to distinguish commands from the text, this can be confusing. Always use lower case for MATLAB commands.

Here is an example of using cos:

>> cos(2)
ans =
   -0.4161
>> cos(pi)   % Notice that pi is used for 3.141592...
ans =
    -1
>> x = [0 pi/6 pi/4 pi/3 pi/2];
>> cos(x)
ans =
    1.0000    0.8660    0.7071    0.5000    0.0000

2.3 Using Data Files

In MATLAB, data can easily be loaded from a data file. This has many advantages: it is quicker to load a data file than to type in all the numbers; entering data at the keyboard is likely to introduce typing errors; and if there are errors in the data, it is easier to alter the file than to re-enter all the data again. A data file called tester.txt, in the data subdirectory of glmlab, contains the following data:

[file contents not recoverable from the source]

Here is an example of how to load the file, play with it and save another file:

>> % LOAD THE FILE tester.txt
>> load tester.txt

The data is now loaded as a matrix called tester.

>
47. ropriate button is pressed. The type of residual used depends on the type of residual chosen in the Residuals Type menu, and some plots are unavailable for certain distributions and residual types. glmlab labels the plots as best as it is able. Apart from the plots listed below, glmlab can also plot histograms using MATLAB's hist command. The six options are:
• Residuals vs Response (y): This option plots the residuals against the response variable, y.
• Residuals vs Covariates: If there is more than one covariate, a window is displayed where the user can select the covariates against which to plot. The constant term is not an option. At present, only the first 40 covariates can be chosen, but this limit should be sufficient for almost all cases. If only a constant term is fitted, this option is disabled.
• Normal Probability Plot of Residuals: This option produces a normal probability plot of the residuals. It does not use the MATLAB function normplot.
• Residuals vs Fitted Values: This option plots the residuals against the fitted values.
• Residuals vs Transformed Fitted Values: This option plots the residuals versus the transformed fitted values, as defined by the constant information criterion discussed by McCullagh and Nelder [7, page 398]. For the normal distribution, the transformation is simply an identity relationship, and so this option is disabled in that case.
• Fitted Values vs Quantile Equivalents
48. > tester
tester =
[output not recoverable from the source]

Let's play with this a little. First, call the first column of data y:

>> % PLAY WITH THE DATA
>> y = tester(:,1)
y =
[output not recoverable from the source]

Call the second column x1, the third x2 and the fourth x3:

>> x1 = tester(:,2); x2 = tester(:,3); x3 = tester(:,4);

Many commands can be placed on one line, separated by a semicolon or a comma. Notice also how to access particular columns of a matrix using a colon. Rows can be accessed in a similar way: r = tester(1,:) would place the first row of tester into the row vector r.

In MATLAB, it is standard to define variables as column vectors. This standard is adopted by glmlab.

We can now manipulate the vectors:

[commands and output not recoverable from the source]

To save y and x in the file prac.txt, proceed as follows:

>> save prac.txt y x -ascii

That is, save <filename>.<extension> <variables> -ascii, where the -ascii is so that humans can read it. However, the best way to save the file is to use save <filename> <variables>:

>> save prac2 y x

This creates a file called prac2.mat which MATLAB can use but humans can't understand. To load the file prac2.mat, simply type load prac2. This will automatically create two variables called x and y without you having to declare them explicitly.

A file called loadme.mat should be in the glmlab/data folder. First, clear all th
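The two ways of saving can be tried out safely in a temporary folder. The sketch below uses small made-up vectors rather than the contents of tester.txt, and writes to MATLAB's temporary directory so that nothing in the glmlab folders is touched; the names fmat, ftxt and M are hypothetical.

```matlab
% Made-up column vectors standing in for y and x
y = [1 2 3]';
x = [4 5 6]';

% Save in MATLAB's own format, clear the variables, then load them back
fmat = fullfile(tempdir, 'prac2.mat');
save(fmat, 'y', 'x');
clear y x
load(fmat)            % recreates y and x without declaring them

% Save as plain text that humans can read
ftxt = fullfile(tempdir, 'prac.txt');
save(ftxt, 'y', 'x', '-ascii');

% Reading the text file back gives a single matrix: the variable
% names are not stored, and y is simply stacked on top of x
M = load(ftxt);
```

The loss of the variable names in the text file is one reason the MAT-file format is described above as the best way to save.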
49. t link functions and scale parameters, as detailed in Table 5.1. User-defined distributions are also possible; see Section 5.6. The binomial distribution requires two columns in the Response area: one for the observed counts and the other for the sample size. If only one column is given, it is assumed to contain probabilities if all its elements are between 0 and 1; otherwise glmlab issues a warning.
3.1.3 The Link Menu
Alters the link function. The choices are identity, logarithm, reciprocal, square root, power, logit, probit and complementary log-log. The last three are only available when the binomial distribution is selected. User-defined link functions are also possible; see Section 5.6. An edit box is presented for altering the power link function. If the power is given as 0 1 0 5, the link function is automatically altered to identity, reciprocal or square root respectively. Each distribution has its own default link function (the distribution's canonical link), as shown in Table 5.1.
3.1.4 The Scale Parameter Menu
The user can choose to use the mean deviance as the scale parameter or to use a fixed value. When a fixed value is chosen, an input dialog box is presented that only allows positive values. Each distribution has its own default scale parameter, as shown in Table 5.1.
3.1.5 The Residuals Type Menu
Sets the type of residual that is calculated. The user can choose from Pearson residuals (the default), deviance residuals, quantile res
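The three links that are only available for the binomial distribution have standard textbook definitions, sketched below. This is not glmlab's internal code; the counts and sample sizes are made-up values in the two-column form the Response area expects.

```matlab
% Standard definitions of the three binomial-only link functions.
logit   = @(p) log(p ./ (1 - p));
probit  = @(p) sqrt(2) * erfinv(2*p - 1);   % inverse normal CDF via erfinv
cloglog = @(p) log(-log(1 - p));            % complementary log-log

% Hypothetical binomial response: observed counts and sample sizes.
counts = [3; 7; 12];
sizes  = [20; 20; 20];
p = counts ./ sizes;                        % observed proportions

% Linear-predictor scale under each link.
eta_logit   = logit(p);
eta_probit  = probit(p);
eta_cloglog = cloglog(p);
```

The logit and probit links are both zero at p = 0.5, while the complementary log-log link is asymmetric and is zero at p = 1 - exp(-1).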
50. tes can be inaccurate. The parameter values are stored in the PARVALS.mat file in the same directory as the DETAILS.m file (usually the glmlog directory).
3.1.6.5 Include Constant Term
This option toggles the inclusion of the constant term in the fitting of the model. By default, the constant term is included.
3.1.6.6 Output Display Information
The glmlab output usually contains the parameter estimates and their standard errors, and the deviance, the degrees of freedom and the changes in both. The user can choose not to display the parameter estimates and standard errors by disabling Display Parameter Values. The user can also display more information, such as the number of iterations to convergence, by selecting Display Fitting Information.
3.1.6.7 Recycle Fitted Values
At the start of each fit, glmlab usually uses an initial estimate such that μ = y. In some cases, adjustments are made in case the response is zero. By selecting this option, however, glmlab will use the previous fitted values as the initial estimate in the iterations.
3.1.7 The Plots Menu
Produces residual plots from the current fit. This menu item is unavailable until a model has been fitted, after changes have been made to the model structure (altering the distribution, link function or scale parameter), and after changing the type of residual that is to be calculated.
These plots are plotted when the app
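To show where the initial estimate μ = y enters the fitting iterations, here is a minimal sketch of iteratively reweighted least squares for a Poisson model with a log link. It is a textbook version, not glmlab's implementation: the data are made up, the zero adjustment of 0.1 is an assumption, and the iteration limit and tolerance simply reuse the default values quoted elsewhere in the manual.

```matlab
% Hypothetical count data and a design matrix with a constant term.
x = (1:6)';
y = [0; 1; 3; 4; 9; 14];
X = [ones(size(x)) x];

maxit = 20;        % maximum number of iterations
tol   = 1e-5;      % parameter tolerance

% Initial estimate mu = y, nudged away from zero so log(mu) is finite.
mu = y;
mu(mu == 0) = 0.1; % the size of this adjustment is an assumption
eta  = log(mu);
beta = zeros(size(X, 2), 1);

for it = 1:maxit
    z = eta + (y - mu) ./ mu;          % working response for the log link
    W = diag(mu);                      % working weights for the Poisson
    beta_new  = (X' * W * X) \ (X' * W * z);
    converged = max(abs(beta_new - beta)) < tol;
    beta = beta_new;
    eta  = X * beta;
    mu   = exp(eta);                   % updated fitted values
    if converged, break; end
end
```

Recycling fitted values would correspond to starting the loop from the mu left by a previous fit instead of from y.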
51. unctions. Add a new line to the end of this file that says only dfourpwr.m.
4. All the required tasks have been completed. Next time glmlab is started, there should be a distribution named fourpwr at the bottom of the list of distributions.
A similar procedure is used for defining new link functions, but the corresponding files in the link subdirectory of the fit directory are used. Adding new distributions or link functions is not recommended unless the user understands how the distribution or link function is used, as odd results may otherwise eventuate.
5.8 The Default Data Directory
As discussed in Sections 4.1 and 5.4, data can be loaded with the graphical interface by selecting LOAD Data File under the File menu. The default data directory is initially glmlab data. This can, however, be easily changed. In the glmlab data directory there is a file named dummydta.m which contains basically nothing; the sole purpose of the file is to define the default data directory. To have a different default directory, simply move this file into the appropriate directory. Next time the graphical load data window appears, the default directory will be the directory where this file is located.
[Figure 5.4: The Fitting Parameters Window, with fields Maximum Number of Iterations (default 20), Parameter Accuracy (default 0.00005) and Ill-Conditioning Tolerance (default 1e-010), and Cancel and OK buttons.]
52. uttons
3.3.1 QUIT
Quits glmlab and loses all information. The DETAILS.m file will retain its information until glmlab is started again (see Section 5.8). The option is the same as choosing Quit from the File menu.
3.3.2 FIT SPECIFIED MODEL
Pressing this button fits the model currently specified in the edit areas and in the pull-down menus. If there is no response variable defined, or there are no covariates defined (including a constant term), the button is dimmed since a model cannot be fitted.
3.3.3 NEW MODEL
This button prepares glmlab for fitting a new model. It performs such tasks as restoring the default options and clearing the DETAILS.m file. It is the same as choosing the Options menu and selecting Declare New Model.
3.4 Extra Commands
3.4.1 fac
The fac command indicates that the variable is a factor. Factors are also called qualitative variables or categorical variables. To fit a model using the factor Gender, use fac(Gender).
3.4.2 makefac
makefac is used to generate a factor that has some regularity. It is similar to the GLIM command %gl. The general syntax of the makefac command is
    makefac(length, number_of_levels, size_of_the_blocks)
For example, if the variable Gender in a data set was recorded as M M M M F F F F, then makefac could be used to generate the numeric equivalent. The vector would need to be of length 8, there are two levels (M and F), and they are repeated i
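The kind of regular factor described for makefac can be sketched in one line of MATLAB. The helper name regular_factor is hypothetical and this is not glmlab's makefac source; it only shows the level pattern implied by the three arguments (length, number of levels, block size), with the level codes 1, 2, ... assumed.

```matlab
% Hypothetical helper: levels 1..nlev, each repeated in blocks of blk,
% cycling until the vector has len elements.
regular_factor = @(len, nlev, blk) mod(floor((0:len-1)' / blk), nlev) + 1;

% Gender recorded as M M M M F F F F: length 8, 2 levels, blocks of 4.
gender = regular_factor(8, 2, 4);

% A factor with 3 levels in blocks of 2, cycling over 12 observations.
treatment = regular_factor(12, 3, 2);
```

Here gender codes M as 1 and F as 2, and treatment cycles through 1 1 2 2 3 3 twice.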
