Home

MLA. Software for multilevel analysis of data with two levels. User's

1. 1 200 0 10 20 30 40 50 Histogram estimate G2 Count Midpoint 3 Extreme 0 6904 0 7041 0 7061 6 6 7648 xxxxxx 3 6 7971 xxx 18 6 8294 xxxxxxxxxx 17 6 8617 XXXXXXXXXXXXXXXXX 16 6 8940 XXXXXXXXXXXXXXXX 28 6 9263 XXXXXXXXXXXXXXXXXXXX 16 6 9586 XXXXXXXXXXXXXXXX 21 6 9989 24 1 0232 XXXXXXXXXXXXXXXXXXXXXXXX 21 1 0554 XXXXXXXXXXXXXXXXXXXXX 11 1 0877 xxxxxxxxxxx 7 1 1208 xxxxxxx 7 1 1523 xxxxxxx 18 1 1846 xxxxxxxxxx 4 1 2169 xxxx 3 Extreme 1 2318 1 2538 1 2652 1 200 0 10 20 30 40 50 Histogram estimate G3 Figure 4 3 An example of histograms of bootstrap replications 72 Appendix A Technical appendix In this appendix the theory of maximum likelihood estimation used in the MLA program will be discussed in detail Other authors such as Raudenbush and Bryk 2002 and Longford 1987 give some technical detail as well but much is left to the reader We think however that it is useful to explain in much more detail what is actually done in the program and this appendix serves this purpose The opinion of De Leeuw and Kreft 2001 about this appendix in the previous version of this manual was The manual is a bit wordy because it tries to spell out all details especially the technical ones The ultimate example is the 20 page T
2. 8 08 7 1 6 and given by 2 41 The mathematics leading to 2 38 2 41 can be found in the standard jackknife literature For example Shao and Tu 1995 provide a systematic introduction to the theory of the jackknife including a discussion of its theoretical properties Delete m j jackknife Above we discussed the classic jackknife approach to estimating the bias of an estimator and to obtain a bias corrected version of this estimator Using the pseudo values an accompanying estimator for the variance of the original or bias corrected estimator and hence for the standard error can be obtained as well The jacknife version we discussed is based on subsamples obtained from the original sample by successively removing mutually exclusive groups of observations of size m Furthermore it relies on the assumption of independently and identically distributed observations Both features influence the formulation of a jackknife resampling scheme for multilevel data and models The independence assumption restricts the application of the jackknife to the highest level in the data In the two level case independence can only be assumed for the groups Within the groups data are dependent Consequently a multilevel jackknife approach must be based on sub samples obtained by removing complete Level 2 units In fact Wolter 1985 section 4 6 already stated that the delete m jackknife can be used in cluster sampling when t
3. ee 45 3 5 2 minimization ee 45 3 5 3 reparameterization optional 46 3 5 4 warnings opona 4 2 2 22 A 46 3 5 5 miaxiter optional com e s UR AU ROSA EAS RTA is 46 3 5 6 convergence optional ees 46 3 iseed op onab ced EE EX Bg See Ea de a Wr 47 35 8 ft Le optional a on dee Ue oe Be e Ee SX 47 SIMULATION optional sess 47 3 6 1 kind required oe eae A ES a eA E Y 47 3 62 method optional 48 3 63 type optional 49 3 6 4 balancing optional ee 49 3 65 resample optional 22s 49 3 6 6 linking optional 0 s 49 3 6 7 replications optional 49 3 6 8 convergence optional 49 3100 file op onal 4 aoe Xa Race 4x owed as aus 50 INTERVAL optional 2 50 3 1 1 kind required 0E ooo ote ee 50 3 7 2 alpha optional 2 22 2 50 3 7 3 weight A 51 3 7 4 replications
4. g T Zu 2 27 the estimate for o becomes N zu 1 zg F E 2 28 i l This estimate on is the two step OLS estimate of the variance of the elements of The estimated standard error for F becomes analogous to Equation 2 22 2 ay _ 2 29 se 0 0 NG 2 29 As with previously presented standard errors the standard errors given in this section will be supplemented with robust ones in future versions of MLA All the estimators in this section are consistent if J gt and N for each j Al though this may be unrealistic these estimators may be good initial estimators starting values for maximum likelihood estimators In some cases the differences between these estimators and the maximum likelihood estimators is small and therefore these estimators can be used as well Kreft 1996 Van der Leeden amp Busing 1994 2 5 Full Information Maximum Likelihood FIML One of the most important parts of the program consists of the maximum likelihood estima tion This estimation method was chosen for its desirable properties such as consistency and efficiency In maximum likelihood estimation given the observations parameters are found that maximize the likelihood function Mood et al 1974 This is the same as minimizing the minus log likelihood function Assuming normally distributed errors the density of y p given X f and Z p is 1 TO X Z e 20
5. 51 3 7 5 convergence optional 51 3 7 6 file optional 4 om hoo ded ui ek a dd ae 51 PRINT optional 4 cese d om hook ded ua dowd a ae 51 3 8 input optional 2 ns 51 3 8 2 descriptives optional 52 3 8 3 random level 1 coefficients optional 52 3 84 olsquares optional 0 en 52 3 85 residuals optional 20 52 3 8 6 posterior means optional 2 0 0 52 3 8 7 diagnostics optional 52 ZPEOT optional a uio ar Bs 2 Sa ha 53 3 9 histograms optional 2 53 3 9 2 scatters optiondl 0 E 53 vi 4 Worked examples 4 1 Random effects analysis of variance 4 2 Random effects analysis of covariance 4 3 Repeated measures analysis 2 22e 4 4 Multilevel analysis with bootstrap lee Technical appendix A The model and the likelihood function 20 2 Some useful formulas eee A 3 Computational formulas for the function and gradient A 4 The asymptotic covariance matrix of the
6. This substatement specifies the method of bootstrap to be performed It is required whenever kind bootstrap One can choose between three different methods residuals cases and parametric The three methods differ in the way the bootstrap sample is obtained They are described in more detail in chapter 2 A synonym for residuals is error so in the following whenever residuals is mentioned error is also implied residuals or error This method resamples the elements of the estimated Level 1 and Level 2 error vectors Sub sequently a new outcome or dependent variable is computed using these error vectors the orig inal predictor or independent variables and their corresponding parameter estimates With this method the type substatement can be used to choose which type of estimated errors is used for resampling see below cases Using this method a bootstrap sample is created by resampling the original data Thus complete cases are randomly drawn with replacement from the original cases The procedure follows the nested structure in the data by a nested resampling of cases Level 2 units are randomly drawn with replacement and cases within a particular drawn unit are resampled It is also possible to resample only complete Level 2 units where the Level 1 units within a sampled Level 2 units are the same as in the original data set which is useful for repeated measures data or to resample only Level 1 units within Level 2 units
7. 3 4 CONSTRAINTS optional MLA has a limited option for imposing parameter constraints The CONSTRAINTS statement allows parameters to be fixed to a certain value Constraints are imposed as parameter value The parameter is held fixed during estimation and is not used for estimation of the standard errors either The standard error will be zero and no t test is performed for this parameter This feature is 44 still only implemented for the FIML estimation part with the BFGS estimation method It is simply ignored for the various OLS estimators but when requesting constraints with the EM estimation method the program stops with a fatal error Values must be specified as floating point numbers Variances and covariances are specified by connecting the appropriate Level 2 residual terms by an asterisk Example CONSTRAINTS G1 1 0 fix component G1 to 1 0 U1 U1 0 5 fix level 2 variance of U1 to 0 5 U1 U2 0 0 fix level 2 covariance U1 U2 to 0 0 Requests for constraints on the elements of the Level 2 covariance matrix O are only handled well if O is not reparameterized see the reparameterization subcommand of the TECHNICAL command below No error is given if both are requested but the results are typically incorrect This will be fixed in a future version of MLA 3 5 TECHNICAL optional The TECHNICAL statement provides useful possibilities to alter the estimation process It con cerns the estimation method estima
8. A 34 IO sea A 34 and L is the minus log likelihood function Therefore the asymptotic covariance matrix of the estimators is derived from the matrix of second derivatives of L the Hessian matrix From A 29 we have J OL eee XV X y Oy J J J J Thus OL I d 1 v 1 2 xav Xy Xo e 2 EN X dy j j J 23 XIV do OZ v 10 Xy j l J XV X dy using A 21 j l J 117 2 2 yj X do j l J X O j l J v 1 gt dy A 35 j l Therefore L OX X V X A 36 Oy Oy 23 J L 117 2 Dau X XVP Oj A 37 84 and using A 25 and A 26 PL d n 35 08 2 5 Zul ZiVj10 Xjl CV ZAZ V 65 A 38 0 Yay 7 Z Deal ZjV Xv A 39 From A 30 we have z DEL J Xp Thus 1 1 75 5 2s Vies 26 Xy VAO J XyyV X dy 2 q 2 1 y l 34 uv do V Z d6 Z V 1 XW AVDO yj Xy Vi X dy using A 21 My 1 J 1 J 1 v 1 732 zyaexzv tI a Sai Il J x 2 X yy d VDV V7 dvz X jy XyyYVj X dy 2 j l J X 0j Xp Vi Vibo j l J 1 Mas do 5 gt t Ziv Z d6 j l J XyyV X dy 85 y E Hh t2 Il NI Scis m oj
9. at the school level and the slope coefficient for each separate school which is part of the model at the student level As was said above this term refers to the expected influence of size on the regression of math on homework In the terminology of multilevel analysis this term is called a cross level interaction For some researchers this interaction term provides the main attraction to multilevel analysis It is the cross level interaction parameter that leads to the interpretation of slopes as outcomes cf Aitkin Longford 1986 The number of levels Theoretically we can model as many levels as we know the hierarchy has or as we think it will have In practice however most applications of multilevel analysis concern problems with two or three levels Data sets with more than three levels are rare In fact a majority of applications just concerns two level data and can be viewed as within and between analysis problems It should be noted that models with more than three levels show a rapid increase in complexity especially where interpretation is concerned If such models are necessary they should be limited to rather simple cases that is to cases with only a few predictor variables Random Level 1 coefficients In multilevel modeling we are usually not looking for estimates of the regression coefficients within each separate group but for their variances and covariances However there can be
10. Usually however the BC interval is only correct up to order O N and is therefore typically worse than the BC a interval Efron 1987 provided several formulas for a In a one parameter parametric model where 0 is the ML estimator a good approximation for a is a sSKEW where SKEW denotes the skewness of a random variable and is the score function derivative of the loglikelihood with respect to 0 When more parameters are to be estimated which is the case in multilevel analysis these results are no longer valid Efron 1987 gave a formula for a based on reducing the multipa rameter problem to a one parameter problem defined by the so called east favorable direction This formula may be used for the parametric bootstrap in multilevel analysis although the ex pressions will be quite complicated This was done for a simple two level variance components model by LeBlond 2005 sec 7 3 4 but expressions for general implementation in MLA are not yet available For the nonparametric bootstrap the standard formula for a is based on the empirical influence function of 6 This is however not well defined for multilevel data so that this formula cannot be used Tu and Zhang 1992 proposed to estimate a by the jackknife according to the formula N 09 io em 2 6 4 2 71 For multilevel data we would have to replace this jackknife formula with a grouped jackknife method for unequal group sizes s
11. 2 2 e E the Level 1 random term e This term is considered to be a residual or error term The variance of this term has to be estimated from the data The Level 2 equations partly consist of the same terms but also of specific Level 2 equation terms e Bk beta component k corresponding with the Level 1 regression coefficient At this level however Bk is an outcome variable e Gk gamma component k y These are the fixed parameters to be estimated in the multilevel model 43 e Vi one of the variables from the data file as explained above In this case it is a Level 2 predictor variable It means that this variable is considered to have the same value for all Level 1 units within a particular Level 2 unit To be certain that this is the case for each Level 2 variable the average is computed over all Level 1 units within the particular Level 2 unit Note that this feature may be used to create an aggregated Level 1 variable serving as a Level 2 predictor variable simply by specifying a Level 1 variable as a Level 2 variable as well e Uk Level 2 random term k i e u je the k th element of a typical u As with the first level this component is considered a residual or error term but now for the second level The second level may have more than one error term one for each Level 2 equation i e for each element The variances and the covariances of these terms have to be estimated from the data In the eq
12. 869 Hall P 1992 The bootstrap and Edgeworth expansion New York Springer Harville D A 1977 Maximum likelihood approaches to variance component estimation and to related problems Journal of the American Statistical Association 72 320 340 Hox J 2002 Multilevel analysis Techniques and applications Mahwah NJ Erlbaum Kish L 1965 Survey sampling New York Wiley Kreft I G G 1996 Are multilevel techniques necessary an overview including simula tion studies Retrieved August 22 2005 from http www calstatela edu faculty ikreft quarterly quarterly html Kreft I G G amp De Leeuw J 1991 Model based ranking of schools International Journal of Educational Research 15 45 59 Kreft I G G amp De Leeuw J 1998 Introducing multilevel modelling London Sage Kreft I G G De Leeuw J amp Aiken L S 1995 The effect of different forms of centering in hierarchical linear models Multivariate Behavioral Research 30 1 21 Kreft I G G amp Van der Leeden R 1994 Random coefficient linear regression models Tech Rep No PRM 03 94 Leiden The Netherlands Leiden University Department of Psychometrics and Research Methodology Langer W 2004 Mehrebenenanalyse eine Einf hrung f r Forschung und Praxis Wiesbaden Germany VS Verlag f r Sozialwissenschaften LeBlond D J 2005 Methodology for predicting batch manufacturing risk Unpublished master s thes
13. B5x e 8 2 10 Bx ej E 2 11 In the first of these equations the new intercept is 6 x which seems hard to interpret generally whereas in the second the intercept has disappeared Starting from 2 8 the zero intercept in 2 11 is just a special case so nothing to worry about However starting with 2 11 as the model specification i e without intercept 2 8 cannot be written in the same form anymore The intercept has to be introduced again This is not a problem but a logical consequence of the choice with respect to centering In this situation this is obvious but we will see below that similar specification issues occur in multilevel models where they are not so obvious anymore The error term in 2 10 and 2 11 is e 8 The presence of the mean of the errors means that the centered error terms are not independent anymore The covariance matrix of the error terms is not diagonal anymore Moreover it is singular Although this seems like an important problem with centering of the outcome variable it is actually not The OLS estimator of is still efficient and equivalent to the OLS estimator based on 2 8 and the OLS standard errors are also equivalent 2 3 2 Centering around the grand mean in a multilevel model Now let us look at the following two level random coefficients model Vij Bi Bo jij 2 12a By y t Yo ij Uy 2 12b Baj y Y4Waj Uy j 2 12c where 5
14. By Box 8 B B 4 x X amp j B B2 Box Thus by noting that x is a given constant because we condition on x we find Bi B P Bx and a and the model 2 9 satisfies all the usual assumptions Analogously by starting from 2 9 we obtain 2 8 with B 55 B B p5x and o Although the models are equivalent one form may be easier to interpret than the other The difference between them is the intercept parameter In 2 8 6 is the mean of y for x 0 If x Ois not a relevant value then the intercept does not have a useful interpretation For example it is not relevant to know the mean intellectual achievement y of someone with IQ score x equal to zero In 2 9 B is the mean of y for x x Clearly the mean intellectual achievement of someone with average IQ is an easily interpretable and highly relevant number On the other hand if x is gender coded 0 for males and 1 for females then it is not very interesting to know what the mean achievement of someone with average gender is whereas the mean achievement 12 of males x 0 is easily interpretable and highly relevant So it depends on the meaning and coding of the variables whether centering is interpretationally useful Instead of or in addition to the centering of the explanatory variable x we can also center the outcome variable y giving y y y Starting with 2 8 we obtain y B5X
15. This reflects the possible effect of school size on school effectiveness School size may influence the estimated relationship between math and homework At first glance the model presented above seems to lead to a hierarchically structured regres sion procedure which proceeds in two steps First the models for all schools are estimated and then the intercept and slope estimates are used as the dependent variables in the Level 2 model which is then estimated Although such procedures have been proposed in the past this is not what will be discussed here under the heading of multilevel models because there is no statistical connection between the Level 1 and Level 2 models In multilevel models separate regression equations for each level are only formulated because they facilitate insight and understanding The statistical linkage of both levels is created by the Level 2 model which states that Level 1 re gression coefficients intercepts and slopes are treated as random variables at the second level The Level 2 model models intercept and slope estimates as a mean value over all schools plus a school specific deviation or residual It follows that we are not primarily looking for intercept and slope estimates for each separate school but for their means and variances and their covari ance over all schools In this way just as students are considered a sample from a population of students schools are considered a sample from a
16. XY V doy J J 1 EO Oj Xp V Z d OZV Gr e J 1 j do r52 tr V Z d j l 2 Y xps J XWV O 4 REN tr IG XV V Zi Kd OZ V G X I Nile M3 i ll 80 J 1 117 1 2 2 tr V7 IE 22 4 j l J fio Xa yx j l 1 J 17 2 2 2210 Xy de J J 32y 7o Xll j So aL d T ay 7 Xp V X A 29 E and J A 2 D 5 0 XWV O A 30 r and using A 23 and A 24 Sew 7 Zua J v 1 v 1l 2 2 2 7 X Zv o XM fu J J 2 Pu 1 117 1 Xy z v o A 31 and A 32 E 7 3 em 5 2 ziv o Now using A 9 A 12 A 13 and A 15 we find computationally more efficient formulas 81 for the derivatives es s d yd X A j 0 99 j l J te 2 4 ly Xo Iy 6 200 Xj j l J J 2 X9 j l j l J 4 ly7z t07 3 X Z 0G Z y ZIX y j l g y UE Ue ao E 483722 732 017 m V jn J J l l 2 1 30 N Jq 50 J 1 1 2 4 l l 2210 X 1 2 66 Z V 0 J J 1 2 1 2 1 7 50 2 56 J NE VAL 5 25 0 XV j l J 1 4 lz y 1 307 9 0 X ZOG Zi j l J 1 1 N Jq Suet
17. where the Level 2 units are the same as in the original sample but the Level 1 units within each Level 2 units are resampled useful when there are few Level 2 units and many Level 1 units in each Level 2 unit such in studies with many subjects from a few countries parametric This method computes a new outcome or dependent variable using the original predictor vari ables their corresponding parameter estimates and a set of random Level 1 and Level 2 error terms These random terms are obtained as follows New Level 1 errors are drawn from a normal distribution with mean zero and variance G which is the original estimate of the Level 1 vari ance component New Level 2 errors are drawn from a multivariate normal distribution with zero mean vector and covariance matrix which contains the original estimates of the Level 2 variance components 48 3 6 3 type optional The substatement type is only required whenever the substatement kind bootstrap is used in combination with method residuals The type substatement specifies the type of estima tion that is used to determine the Level 1 and Level 2 residuals One can choose between raw shrunken bartlett green and mcdonald More details can be found in Chapter 2 The way in which the Level 1 and Level 2 error terms are estimated from the total residuals is discussed in chapter 2 3 6 4 balancing optional For the bootstrap methods residuals and cases a balanced bootstr
18. y5 is still given by the expression X V X Furthermore assume that all within group sample sizes are equal N x N N J for all j and even and x ye 1 if iis odd and xj 1 if iis even Then TRO ed 0 XV X 0 n where o 0 o is the variance of the random part With independent data with residual variance Bs the variance of the optimal estimator of y which is the OLS estimator is gib N Consequently the effective sample size for y is N 1 N 1 p as before but the effective sample size for y is N 1 p which is larger than N Explanatory variables with random coefficients complicate the situation even further As shown in 2 4 such models can be written as mixed linear models For a typical observation this is written as EE Vij aye where x j and z j are vectors of explanatory variables typically including the constant y is a vector of fixed coefficients u j is a vector of random coefficients with mean zero and e j is a random residual Let as usual O be the covariance matrix of u p and let r j5 jl j 8 be the total residual Then the variance of r j is z Oz j 92 which generally depends on z pb Analogously the covariance of the residuals of observations ij and k is z j9 j and the intra class correlation of these two observations is rd jeu Piiki a V a UA 979 o2 762 o2 38 Thus this depends on z j and z j For this reason it may be called the co
19. 0 0 to 1 0 46 3 5 7 seed optional For diagnostic purposes one can provide an initial number seed for the random number gen erator This is specified by the substatement seed Using the same initial seed the simulation samples will be identical The seed value must be an integer between 1 and 1 073 735 823 The random number generator used is the Mersenne Twister with improved initialization Matsumoto amp Nishimura 1998 Nishimura amp Matsumoto 2002 3 5 8 file optional The technical output can be written to a separate file The file is specified after the file sub statement under the TECHNICAL statement and must be a valid file name Only the essential information is written to this file Its content changes over time and some inspection will show what is written in this file 3 6 SIMULATION optional Several options for simulation are available in MLA These are jackknife bootstrap and permu tation Theoretical details concerning the implementation of these resampling methods for the two level model can be found in chapter 2 With the substatements provided with the SIMULATION statement one can choose between the different kinds of simulation using the keyword kind and specify special simulation features using the keywords method type and resample Additional features are the number of repli cations and the initial seed for the random number generator replications and seed Finally one can specify a s
20. 11 21 1 1 1 1 11 1 1 1 1 1 1 1 3 2004 du mseLceencn2s dedl2n elccnli 2 3101 predictor 2 5676 Scatterplot predict or V5 vs residual E Figure 4 1 An example of Level 1 scatterplots 70 Scatterplots resumed 1 0222 4 2 2 2 2252 9 2 2o 4 desece 2e c2ee RBes 2 1 1 1 r 1 1 6 1 1 5 1 I s 1 i 1 d u a 1 1 1 1 1 1 4021 Ha SRSsa an 1 3677 predictor 1 0436 Scatterplot predictor V6 vs residual U1 0 6003 4 Sate ERS 1 1 1 r 1 e s 1 i 1 d 1 u 1 i 1 a 1 1 1 1 1 1 1 0 5131 45 2 2 9 2 2e oe sooo com Se te tc es aaa 1 3677 predictor 1 0436 Scatterplot predictor V6 vs residual U2 Figure 4 2 An example of Level 2 scatterplots 71 Histograms Count Midpoint 3 Extreme 1 2472 1 4059 1 4347 5 1 5925 xxxxx 7 1 6784 xxxxxxx 6 1 7644 xxxxxx 20 1 8503 XXXXXXXXXXXXXXXXXXXX 17 1 9363 XXXXXXXXXXXXXXXXX 22 2 0223 XXXXXXXXXXXXXXXXXXXXXX 28 2 1082 XXXXXXXXXXXXXXXXXXXX 22 2 1942 XXXXXXXXXXXXXXXXXXXXXX 18 2 2801 XXXXXXXXXXXXXXXXXX 17 2 3661 XXXXXXXXXXXXXXXXX 11 2 4520 xxXXXXXXXXX 11 2 5380 xxxxxxxxxxx 8 2 6240 xxxxxxxx 5 2 7899 xxxxx 4 2 7959 xxxx 3 Extreme 2 8032 2 8839 3 0018
21. 8 3 30 Mood A M Graybill F A amp Boes D C 1974 Introduction to the theory of statistics 3rd ed Singapore McGraw Hill Murphy K amp Topel R H 1985 Estimation and inference in two step econometric models Journal of Business amp Economic Statistics 3 370 379 Nishimura T amp Matsumoto M 2002 A C program for MT19937 with initialization improved 2002 1 26 Computer software Retrieved August 29 2005 from http www math sci hiroshima u ac jp m mat MT emt html Nocedal J amp Wright S J 1999 Numerical optimization New York Springer Parke W R 1986 Pseudo maximum likelihood estimation The asymptotic distribution The Annals of Statistics 14 355 357 Putter H 1994 Consistency of resampling methods Unpublished doctoral dissertation Leiden University Leiden The Netherlands Quenouille M H 1949 Approximate tests of correlation in time series Journal of the Royal Statistical Society B 11 18 84 Quenouille M H 1956 Notes on bias in estimation Biometrika 43 353 360 Rasbash J Steele E Browne W amp Prosser B 2004 A user s guide to MLwiN Version 2 0 London Multilevel Models Project Institute of Education University of London Raudenbush S W 1988 Educational applications of hierarchical linear models A review Journal of Educational Statistics 13 85 116 Raudenbush S W amp Bryk A S 2002 Hierarchical line
22. Analogously from A 32 we have T LO ory 70 2 377 2 zv 6 J and J 2 ZVP zia OZV Za j l Ms 1 24 i NI E M y AZ V Zjd ll x 207 0 X0 XrV zj j nm 1 p ES xpo xmv z a ll an 2 S 5 V X dy Z V 0 Xj J 2 2 V l X dy j l 89 zv o x Zv o ol do fe 9 46 A 47 Combining A 47 with A 26 we have L J 1 2 i 4i Piu E 1 d ZVi Zpy v 1 x Ziv o Xv Xr v zj J 1 117 1 v 1 t3 2 ZV X9 XV Zj x Z V VZ Juk N 1 1 2 326 vy S Y ZVZ din j l v 1 17 1 x Ziv o Xy XV Zhu because the matrices between brackets and parentheses in A 48 are symmetric A 48 Now from A 33 and A 34 we have to take expectations of the second derivatives There fore from A 4 and A 5 the following expectations will be used 0 XN xy Vi From A 36 A 37 4 38 and 4 39 we have 757 j l 2 E OL Oy do 2 E oL 0 Oy O y and 9L E 0 E 26 From A 41 A 42 and A 43 we have A 49 A 50 A 51 A 52 J l sA A 53 j l 2 J xs 2 E J 2 E Ziv 6 XO Xp V7 Z ZVj6 X9 Xo Zah J 2 ZVP Zui A
23. Example DATA file sesame dat data set from Glasnapp and Poggio 1985 variables 3 total of three variables id2 m Level 2 identification given by first variable missing v3 999 v3 999 means missing 3 2 1 file required This substatement indicates the name of the data file The name is given after the equals sign and must satisfy the usual DOS conventions on filenames If the file is in the current directory the complete pathname is not necessary The file itself is a free field formatted numbers only ascii file This means that values of variables must be separated by at least one blank A case may consist of more than one line Cases must be sorted by the Level 2 identifier variable see below 3 2 20 variables required The variables substatement specifies the number of variables in the data file Because the data file is a free field formatted file and one case may consist of more than one line this is necessary information for the program to determine when to start a new case 3 2 3 141 optional With this substatement a case number variable can be given Level 1 units are interchangeable within a Level 2 unit Therefore a Level 1 identifier variable is not necessary However it can be useful in those situations where the output gives specific information about cases at the first level The variable number has to follow the keyword id2 and it must indicate the position of the identifier variable in the data file Th
24. H 2 i l N c3 EE X pu skewness Ha 2 5 ES 1 X HU kurtosis he i 3 3 N where is the measurement of individual i on a typical variable and is the total sample size Another descriptive statistic that is provided is the Kolmogorov Smirnov Z statistic This is a measure of deviation from the normal distribution It tests whether the observed variable has a normal distribution It is defined as the maximum distance between the estimated empirical cumulative distribution function and the best fitting cumulative normal distribution function It is computed as follows Stephens 1974 First sort the values of a given variable X such that X is the smallest value and Xy is the largest Then compute w X 10 8 i 1 N and z P w where is the cumulative distribution function of the standard normal distribution Now Kolmogorov Smirnov s Z is defined as icd e nu may max z NN 3l The asymptotic distribution of Z was derived by Durbin 1973 but this is very complicated and more importantly this derivation does not apply to multilevel data because of the intra class dependency Therefore in MLA a simpler approach is used where p values are reported that are based on the assumption that the normal distribution is completely specified beforehand not estimated This is also not entirely correct because u and are estimated but it is sufficient for descriptive
25. The delete m j jackknife estimator and the delete m j jackknife variance estimator can be used to construct the jackknife normal confidence interval Zaf jm Z1 La7 Jon 2 49 where z D and X is the standard normal distribution function The jackknife normal confidence interval relaxes the normality assumption for the data However the interval relies on the asymptotic normality of the estimators which may in finite samples not be approximately satisfied Busing 1993 Other jackknife confidence intervals are not applicable or are probably worse due to the limited use of the pseudo values 2 10 3 The bootstrap The bootstrap was introduced by Efron 1979 as an alternative to the jackknife The idea of the bootstrap is that the empirical distribution function is a consistent estimator of the distribution function in the population Let Z be a random variable with distribution function F and let z4 Z2 zy be a random sample of size N from F Now the empirical distribution function F N in some point z is the proportion of z that are smaller than or equal to z 2 Fy 2 50 If Z has a multivariate distribution this formula has an obvious generalization and all subsequent formulas will also have obvious generalizations It is known e g Mood et al 1974 p 507 that as N oo F y z F Z Let 0 be a parameter associated with the distribution F 6 O F and
26. before the analysis have been included See section 2 3 e Restricted maximum likelihood REML estimators have been added These tend to give less biased estimators of the random parameters See section 2 6 e Improved grouped jackknife estimators have been implemented These are more efficient than the grouped jackknife estimators that were implemented in MLA 1 0b The correspond ing jackknife variance estimators are much better than the original ones See section 2 10 1 e Balanced bootstrapping has been added for the cases and residual bootstrap This may be slightly more efficient statistically although it is computationally a bit more complicated See section 2 10 3 e For the residual bootstrap linked resampling is added which gives more robust estimates when the Level 1 and Level 2 errors are dependent although it is less efficient when they are independent as assumed in the model specification See section 2 10 3 For all bootstrap methods various types of bootstrap confidence intervals have been added Some of these are typically much better for the random parameters than the ordinary estimate 2s e confidence intervals because they take the skewness of the distribution of the estimators into account See section 2 10 3 e The OUTPUT command has been replaced by the PRINT command and a few options have been changed See section 3 8 e The PLOT has been added which offers some rough plotting options See secti
27. circumstances in which we still want to obtain the best estimates for these coefficients also called random Level 1 coefficients Such questions may arise for example in education when schools are to be ranked in terms of effectiveness using their estimated slope coefficients Kreft amp De Leeuw 1991 The first thing that comes to mind is to simply estimate them by a separate OLS regression for each school However this procedure has the serious disadvantage that the coefficients will not be estimated with the same precision for each school For instance in one school we could have say 45 students whereas in another school we only have 7 students This will definitely influence the accuracy of results Within the framework of multilevel analysis there is a way to obtain best estimates of these coefficients by a method called shrinkage estimation The underlying idea of this estimation is that there are basically two sources of information the estimates from each group separately and the estimates that could be obtained from the total sample ignoring any grouping Shrinkage estimation consists of a weigted combination of these two sources The more reliable the estimates are within the separate groups the more weight is put on them Vice versa the less reliable these estimates are that is the less precise the more weight is put on the estimates obtained from the total sample The result is that estimates are shrunken towards
28. computational advantages as well This practice has also been advocated for multilevel analysis but the consequences for multilevel analysis are not as innocuous as for ordinary linear regression analysis Moreover in multilevel analysis there are two possibilities for centering the data The first is grand mean centering i e the sample average of all observations is subtracted and the second is Level 2 centering or within groups centering where the sample average of only the observations within the same Level 2 unit is subtracted Both forms are implemented in MLA as options Here we will discuss the basic issues involved For a more extensive analysis see Kreft De Leeuw and Aiken 1995 Van Landeghem Onghena and Van Damme 2001 De Leeuw 2005a and the references therein 2 3 1 Centering in linear regression Let us start by considering a simple bivariate regression model y By Box Ep 2 8 where 6 and B are fixed coefficients and the are i i d random errors with zero mean and variance o2 and the other notation is obvious Suppose now that we center the explanatory variable x Then in the model specification x is replaced by x x x where x is the sample average of x This leads to the model y Bi Box E 2 9 with Var Toe It is easy to see that without restrictions on the parameters the models 2 8 and 2 9 are equivalent if 2 8 and its parameters are given we can write y
29. concerning employees nested within industries It should be noted that nested structures naturally arise where explicit hierarchical sampling schemes are used This is often the case in large scale educational research where for instance a set of schools is sampled first followed by the sampling of a set of students within these schools However there are many other cases where data are not explicitly sampled in that way but where it appears to be a fruitful approach to treat them as having a hierarchical structure For instance in a medical study one could consider it to be important that patients can be viewed as nested within general practitioners Apart from this there are several types of data for which it proves to be very useful to apply the concept of hierarchy because it makes their analysis more easy and transparent One example is the hierarchical treatment of repeated measures data where measurements at different points in time are considered nested within individuals Another example is the analysis of data from meta analysis where say p values can be treated as being nested within studies providing a partial solution for the problem of comparing apples with oranges With hierarchical data it is common to have information obtained from the different levels in the hierarchy For instance one has variables describing the individual students but also variables describing their schools When analyzing such data one has to decide in
30. effects analysis of variance To illustrate how to run a random effects ANOVA using MLA we consider part of the Sesame Street data set The original data set from Glasnapp and Poggio 1985 is used in Stevens 1990 for an analysis of covariance The original data set included 12 background variables and 8 achievement variables for 240 children from 5 sites Here we only use the data from the first 3 sites and only consider the achievement variable measuring knowledge of numbers This variable was measured on two occasions In between these occasions the children watched a series of the TV program Sesame Street This series intended to teach pre school skills to 3 to 5 year old children We will now perform an analysis of variance on these data Here Level 2 j indicates the site and Level 1 7 the child The model to be estimated is Vig V U Ejj 4 1 where y is the overall mean on the posttest score u j is the Level 2 deviation from y or Level 2 error component and e j is the Level 1 deviation from y u p the average score of unit j also called the Level 1 error component Equation 4 1 can be divided into two separate equations one for each level ME ETT Bj ytu In this way the deviations or error components for the different levels are easily seen These equations are also the equations that are to be used in MLA to specify the model Along with the other statements the input file is as follows 55 TITLE analys
31. exploratory purposes The formula used cf Mood Graybill amp Boes 1974 p 509 is Pr Z 22 Y EDENE 2 7 j l where the series is truncated after convergence of the sum Note however that normality of the observed variables is by no means essential for the validity of the multilevel model specification even if FIML is used The latter only assumes that the distribution of the error terms is normal which implies that the conditional distribution of y given X and Z is normal This leaves consid erable freedom for the marginal distributions of the observed variables Finally MLA computes a number of quantiles of the variables These are the minimum and maximum and the 5th 25th 50th 75th and 95th percentile The k th percentile is defined as the value such that k of the observations on a variable are smaller than this value and 100 k larger The 25th and 75th percentile are the first and third quartile and the 50th percentile is the median 1 2 3 Centering In social science data the variables typically do not have a natural zero point and even if there is a natural zero it may still not be an important baseline value Therefore in regression analysis and other multivariate statistical analysis methods variables are often centered so that the zero point is the sample average which is an important baseline value This tends to ease the interpretation of the parameters especially the intercept and it sometimes has some
32. if so how to center is an intricate one and may lead to unexpected results It is much more complicated than with non hierarchical data This has led to extensive discussions in the multilevel modeling literature See section 2 3 for an introduction to this topic and references to this literature 3 2 7 level 2 centering optional Level 2 centering means centering within contexts i e subtracting the group mean Following the level 2 centering substatement the variable numbers of the variables that will be centered are given separated by commas These variables will be centered within groups directly after reading the data but before any analysis As mentioned above the choice to center may lead to unexpected results See section 2 3 for a discussion 3 3 MODEL required The MODEL statement is followed by a set of equations that specify the model that has to be estimated Every equation must be on a single line There is only one Level 1 equation but there may be one or more Level 2 equations The order in which the Level 1 and Level 2 equations appear is arbitrary The terms used in the Level 1 equation are e Vi variable i which is the i th variable in the data file Vi may be either indicating the outcome variable or a predictor variable e Bk beta component k i e jk the k th element of a typical p cf 2 1 At Level 1 these are the random regression coefficients which are the outcome variables at Level 2 cf
33. j and j are random coefficients the y s are fixed coefficients w j and w j are two Level 2 explanatory variables and u j and u j are random residuals with zero means and covari ance matrix O We will now try to rewrite this model in a form with x j centered around the grand mean Yj By jx Bo Xj x Ej pk Big t Poty tt Bi 3x YW j aw uy Xu j Y YjW j t YsWaj Ujjs where y y4 and baj is the same as in 2 126 The variance of uy is A 01 9 216 XO and the covariance of ui and u j is x s 65 O5 XO 13 This shows that after the centering a variable that was included in the equation for j but was not included in the equation for B j now is included in the equation for Pip Furthermore even if is restricted to be zero 65 is nonzero This illustrates two practical points for the specification The first is that as noted above the origin of many variables is arbitrary We now see that by a simple change of origin a variable that was not included in the equation for the random intercept becomes part of this equation The conclusion must thus be that all explanatory variables that are included in one or more of the equations for the random slopes must also be included in the equation for the random intercept From 2 3 this can also be interpreted as that for every interaction effect the main effect must also be inclu
34. j l 1 J BOUT 2 0 j Xf Dy 072 98 Zyv J J 1 ly 58 i gt O XZ 0G o 7G ZNO j l 82 J i 2 1 2 1 230 N Ja 50 20 Miti 1 6 lyy 50 Ys Zx OG Ziy Z Ab 1 6 27 50 2 7 ZXp 0G Z y Z j l F l l 1 50 7 N Jq 50 2 Miti y X j l y E X j l jx jn jx jn 1 Toc Sey ZX yy 6d G G 22 ZXy j l 2 j l J 1ZiV 0 XMklZVj 0 XMhi J P 2078 ZZ j l J 2076 Ziy ZX Milo 76 Zy j l and J J m c ly I 2pn l oz 35 73240 G ZiZa 5 2 e Gj Zi J These formulas are implemented in the program Note cf Bryk amp Raudenbush 1992 p 239 that terms 34 Y XX p Zyj p p and ZL j through G p the first of which is a scalar the second a p vector and the third a symmetric p x p matrix The last three are a q vector a q x p matrix and a symmetric q x q matrix for each Level 2 unit The symmetric matrices may be stored linearly the function and the derivatives depend on the data only through the thereby saving additional memory 83 X Mi A 4 The asymptotic covariance matrix of the estimators The asymptotic distribution of the maximum likelihood estimators under appropriate general conditions is given by see Magnus 1978 7 d 10 VN 6 gt N Jim A 33 where N is the sample size PL
35. let 0 be an estimator of 0 from a sample 6 0 z 25 zy OF x The idea of the bootstrap is now to simulate the 28 sampling and estimation process where samples are drawn from F y Which is completely known once the original sample is obtained In the simulation the distribution F N plays th the role of F and 0 plays the role of 6 Simulation samples zi gt 25 zy are drawn from Fy and is estimated by 0 in the same way 0 was estimated by 9 Because F N F it is assumed that the properties of the estimator based on the distri bution F y give information about the properties of based on the distribution F For example the bias of 6 based on the distribution F y is taken as an estimator of the bias of based on the distribution F It has been proved by many authors that this approach works in many cases that is that it leads to consistent estimators of the properties of 0 e 6 Putter 1994 The actual im plementation of the bootstrap is quite simple Drawing samples from F y i equivalent to drawing samples with replacement from z Z2 zy The bootstrap is now implemented as follows B bootstrap samples z5 255 zj y 9 1 B are drawn from F w gt that is these samples are drawn with replacement from z Z Zjy From each of the B samples the parameter 0 is estimated thereby obtaining B estimators 67 b 1 B Now the expectation of 6 given F y is estimated by the mean of
36. literature An incomplete list of text books is Kreft and De Leeuw 1998 Snijders and Bosker 1999 Hox 2002 Raudenbush and Bryk 2002 Goldstein 2003 and Langer 2004 The latter uses MLA in his empirical examples Small example A small imaginary example from education may clarify what is meant by a multilevel model Suppose we have data of students nested within schools and we want to predict the score on a math test from the amount of time spend on doing math homework Furthermore we expect smaller schools to be more effective than larger ones so we collect the school size as another variable Clearly at the student level math is the dependent variable and homework is the predictor variable At the school level size is the predictor variable Now the multilevel model for this example in this case a two level model is specified as follows At the student level Level 1 for each school a regression model is formulated with math as the dependent variable and homework as the predictor This reflects the intra class dependency of the observations the students within each school All models contain the same variables but we expect them to yield different intercept and slope estimates within each school At the school level Level 2 a regres sion model is formulated in which the intercepts and slopes of the first level models are dependent variables predicted by the second level variable size
37. met approximately To be able to get an indication of how severe the finite sample size and possible nonnormality influence the results the MLA program offers simulation options In this section the theory underlying these simulation options will be described This focus will be on the possible bias of the estimates and on the possibly incorrect standard errors More subtle information can however be extracted from the program by using a file to write the simulation results to The bias of an estimator 0 of some parameter 0 is defined as the difference between the ex pected value of the estimator and the true value of the parameter A desirable property of an estimator is unbiasedness which means that its bias is zero In the maximum likelihood theory discussed so far however it was only stated that the FIML estimators are consistent This means that as the sample size gets larger the mean of the estimator converges to the true parameter value and its variance decreases to zero Informally speaking the estimator comes closer to the true parameter value as sample size gets larger This is of course a highly desirable property but it does not ensure that the estimator is unbiased in finite samples In fact maximum likelihood estimators are in many models and situations biased in finite samples For a general class of re gression models including multilevel models however Magnus 1978 proved that the maximum likelihood estimators of the fixed
38. part Z jj t 8j in Equation 2 4 conditional on Z is expressed as _ 2 V ZjZ oly 2 5 A model for the complete data follows straightforwardly from stacking the J groups models in Equation 2 4 Its equation is y x d 5 uy Aye M X 0 0 Z uy EJ or y Xy Zu 2 6 The covariance matrix of the complete data conditional on X and Z is OVO O 0 2 oloo ollo z V f o2ly 0 Z 00 9 Do 0 0 0 V 0 0 c V The parameters of the model that have to be estimated are the fixed coefficients elements of the vector y the covariance matrix O of the random coefficients and the variance o of the errors The elements of y are called the fixed parameters and o and the elements of O are jointly called the random parameters In the following formulas are presented for the various parts of the output of MLA The order of this chapter is similar to the order of the output of MLA as will become clear later on 2 2 Descriptive statistics When requested MLA produces the following descriptive statistics mean standard deviation variance skewness and kurtosis Any statistical package will produce these statistics as well 10 Before looking at the other output it may be useful to inspect these statistics Their formulas are mean Mz E ll X I 9 IH z Mez standard deviation ae X i 1 1 variance Ve
39. population of schools There are several reasons why it may be useful to consider the school specific coefficients as random First the schools in the data set are usually a random sample from the popula tion of schools and scientists are usually interested in the population rather than the specific data set Second with a model that explains part of the variation in the random coefficients the effect of the school level variables on the student level relationships can be assessed and in par ticular the model can give guidance to schools that want to improve their effectiveness Third the relationships between the outcome variable and the student level predictors become clearer Between school variation that may blur these relationships is accounted for and consequently the estimates of the average coefficients are more precise School specific estimates of intercept and slope can however be obtained This will be dis cussed below under the heading of Random Level 1 coefficients Cross level interaction If a school level predictor variable like size is added to the Level 2 model in our imaginary example means and variances change to conditional means and variances It means that part of the variance of intercepts and slopes among schools is explained by size The contribution of this school level variable introduces a term to the model that specifies a relationship between both levels The relationship between size
40. regression coefficients are unbiased On the other hand Busing 1993 showed in a Monte Carlo simulation study that the maximum likelihood estimators of the random parameters in multilevel models are biased The standard errors of the maximum likelihood estimators that are reported by MLA are derived from asymptotic theory This means that they are based on the idea that as the sample size goes to infinity the distribution of the estimators will converge to a multivariate normal distribution with a certain covariance matrix see Appendix A The reported standard errors that are the square roots of the diagonal elements of this matrix The exceedance probabilities of the according t values that are reported are based on the approximation of the distribution of the estimators by the normal distribution In finite samples this approximation may not be very good The true standard errors may be quite different from the reported ones based on asymptotic theory and the distributions of the estimators may not be normal In fact Busing 1993 showed in his simulation study that the distributions of the random parameters can be severely skewed As mentioned above however the focus is on the bias and the standard errors and not on the specific distribution 24 2 10 4 The jackknife The jackknife was originally introduced by Quenouille 1949 1956 to estimate the bias of an estimator and to correct for it Tukey 1958 proposed an accompanying estim
41. the Level 2 residuals to their theoretical mean of zero are calculated by M WO lu b Now residuals are displayed for which M i is larger than the critical value corresponding to a possibly user specified p value of a chi square distribution with q degrees of freedom where q is the dimension of u This p value is the same as for the Level 1 outliers 2 10 Simulation The maximum likelihood theory discussed so far is based on a few assumptions the most impor tant of which are 23 e The model i e the conditional expectation Xy and covariance matrix V is correctly spec ified The standard errors t values exceedance probabilities and likelihood ratio tests were derived under the condition that at least the most general model that is being estimated is correct in the population e The Level 1 and Level 2 u random errors are normally distributed The likelihood function was derived under this assumption and therefore the FIML estimators and the estimators of their standard errors depend on it e The sample size is large More specifically the properties of the maximum likelihood estimators such as their consistency their asymptotic efficiency and their asymptotic normal distribution as well as the formulas for their standard errors were derived under the assumption that the sample size goes to infinity N 9 oo In practice these assumptions will not be completely satisfied One can only hope that they are
42. the error bootstrap If the X and W variables are considered random nonparametric bootstrap samples can be drawn by resampling complete cases This is however somewhat more complicated than in re gression analysis because the hierarchical structure of the data should be respected The bootstrap samples can be drawn in the following way First a sample of size J is drawn with replacement from the Level 2 units This gives a sample j k 1 J of Level 2 unit numbers and accom panying Level 2 variables Wp Then for each k a nonparametric bootstrap a of complete cases from the original nm j jy is drawn giving 7 X2 k L J i GN Pi This is called the cases bootstrap for both levels It is also possible to draw bootstrap samples from the Level 2 units only keeping all the y s X s and W s fixed once a Level 2 unit is drawn This is useful when the data within the unit can not be considered a simple random sample for example with repeated measures data or families Then a complete Level 2 unit is temporarily regarded as a single observation and bootstrap samples are drawn from these observations With repeated measures this implies that for each subject that is drawn in the bootstrap sample the data for all the timepoints are exactly the same as in the original sample For a family this means that the complete family is kept together and that once the family is drawn in the bootstrap sample mother father and children
43. variable The second and third variables are the score on the pretest and the posttest respectivily Formulas defining these descriptives can be found in section 2 2 Part 4 gives the optional OLS estimates which are requested by the olsquares yes option in the PRINT statement As described in section 2 4 ordinary least squares estimation yields two different estimates for the Level 1 variance component one by ignoring the hierarchical data structure and one using this structure These are both displayed in this part of the output with the one step estimate labeled E 1 and the two step estimate labeled E 2 U1 U1 gives the variance estimate for the Level 2 variance component U1 Ordinary least squares estimates Fixed parameters Label Estimate SE G1 31 016760 0 963540 Random parameters Label Estimate SE 1 166 185111 17 615587 U1 U1 29 469076 24 061400 E 2 136 503030 14 469292 E 1 one step estimate of sigma squared ignoring grouping E 2 two step estimate of sigma squared See documentation for further elaboration on these subjects 59 As can be seen the overall mean G1 equals the mean of Variable 3 the score on the posttest 31 02 Ignoring grouping will result in 166 19 for Using the two step procedure lowers the estimate to 136 50 and also gives an estimate of the variance of u j 29 47 Part 5 contains the FIML estimates This part is default and appears in all output Compared to the previous
44. what way the hierarchical structure of the data is taken into account Obviously the easiest approach is simply ignoring the structure and analyzing the data at the student level leaving all school information for what it is Generally however one s intention will be to use all information in the data and use it correctly Thus if one is also interested in school differences and in their possible interaction with effects measured at the student level one has to solve the unit of analysis problem This means that one has to decide whether to analyze the data at the student level incorporating disaggregated variables from the school level or to analyze the data at the school level incorporating aggregated variables from the student level Unfortunately both of these strategies are subject to serious disadvantages De Leeuw 2005b Hence traditional single level analyses fail in the presence of nested data Intra class dependency The basic problem with hierarchical data is that group membership may cause intra class depen dency People from the same group are more alike than people from different groups The reason for this phenomenon is that people within a group share the same environment have the same leader experiences and so forth In other words people within the same group have the same score on a number of variables most of them never measured and thus omitted from any possible model Hence if we fit a common si
45. 02 Fly 2 32 where r contains the total full information maximum likelihood residuals for group j i eT y X f where 7 is the estimator of y and and GO are the FIML estimators of and Ge respectively The formula is computationally rather inefficient Therefore the following more efficient formulas will be used One can write Z 0y and taking V Zj6Z Fily from 2 5 717 1 25 14 ZV O G Z from A 12 where A 7 Ar G I 2 6 2 33 then 217 1 TEA OY 2 167 Iyo OG UP Finally the shrunken residuals for Level 1 follow from 2 17 7 Zu 2 34 2 8 Posterior means The posterior means are the shrunken estimators of 6 f They are the expected values of the p given the data and the maximum likelihood estimates of y O and ge They are derived from the shrunken residuals and their formula is B Wirt 2 35 22 where is the estimate obtained by full information maximum likelihood and u is taken from 2 32 This can easily be shown to be equal to B Q 4 W 2 36 where is the within group OLS estimator 2 15 of B Aj XOX J and the notation is of 2 1 and 2 2 Thus from 2 36 B can be seen as a matrix weighted average of the within group estimator E former being unbiased and the latter being more efficiently estimated The more efficient p is estima
46. 06751 i 1 072559 Random parameters Label Estimate SE Confidence Interval U1 U1 0 611998 0 144656 0 372347 0 724392 U2 U1 0 060488 0 078446 0 060996 0 191503 U2 U2 0 236573 0 053698 0 136783 0 253574 E 1 208053 0 077424 1 075827 3 1 233065 Note confidence intervals computed by bias corrected percentile method Figure 4 1 gives an example of the Level 1 scatterplots that are output figure 4 2 gives an example of the Level 1 scatterplots that are output and figure 4 3 gives an example of the histograms that are output These are not state of the art high resolution graphics but they may be useful for diagnostic purposes 69 Scatterplots 11 1 11 152 1 1 1121 1 1 1 21 2 113 72 21 1 1 11 2213 1 13 31 1 1 112111 113 4211231 3 31232 1 12112 1 amp rt rn amp 0 H 11 1 1 1 1 21 11122 2111 13 231 2 11 1 432 122111 2 1 31151 11 31 1 12 1223 11111 1 213 1 2 122 111 1 1 1 c1 9893 qe Mee SaaS ies rcm Minim 3 5414 Scatterplot predict observed ed vs observed 2 2941 Sa a aaa gar a 1 1 1 1 2 1 11 1 12 11 1 1 1 1 11 11 1 1 1 11 1 21 2 211 21 1 11 r Te 23 22d 1 2 121 111 11 e 1 11311112 1311 1 11 21 s 1 1132112 1 14113 1 1 1 1 i 1 11 1 11 11 1212 113 11 d 113 2 11 113 21131 1122 1 1 1 1 u 1 1 2 11 1 11 11 1 a 1 1 2 11 02 1 1 1 1 111 111 12 1 1
47. 1 Consequently using equation A 10 det V det A det D CA7 B det D det A BD C det o I det I Xo I CZ 9 l o i det I Z Z 0 0 c det G A 11 The factor ZiV In the following the factor ZV will frequently pop up This factor can be written in a computationally more efficient form ziy Zio ly i c Z 0G Z from A 9 Jj a27 0 4 ly Z Z 2 0G Z a27 2 7 2 l gt o Z c ZZ8 o 6 Z 2 2 2 l o Z o U Z Z O a ET Z NEP 2 1z o G 1 G Z from A 7 a27 217 2p l7y o Z G Z v 2nm lu G Z From A 12 it follows that 17 1 1 7 ZV Z 0 G ZZ and vr 727 25 1074 y 1 ZV Z 0 G ZV Z 4 227 G ZZ The traces of y and V7 From equation A 9 we find trV tr c zeG z o0 N c t Z 6G Z 0 wZZ 0G c N o t Z Z 0 0 G o0 N o t ZZ 8 o 1 G o0 N o u G 0 N o tel Gj E oN oq 07 tr G 76 A 12 A 13 A 14 A 15 Similarly using A 9 A 15 and A 12 rV tr oy 0 Z 0G a y d 7 04 o 4 tr Z 0G 07G Z oN q o tG o tr Z Z 0 07 G G o AN g c tr a o N a o Tu Gy tc oce o0 N g o tG7 A 16 Diff
48. 1 02 12 89 166 19 0 07 1 13 1 37 0 05 Var Minimum P5 Q1 Median Q3 P95 Maximum 1 1 00 3 00 3 00 3 00 3 00 3 00 3 00 2 4 00 52 00 52 00 52 00 52 00 52 00 52 00 3 0 00 54 00 54 00 54 00 54 00 54 00 54 00 Data descriptives for level 2 unit 1 Level 1 units 60 Var Mean Stddev Variance Skewness Kurtosis K S Z Prob Z 1 1 00 0 00 0 00 0 00 0 00 3 87 0 00 2 22 22 13 26 175 90 0 60 0 85 1 28 0 08 3 30 08 13 93 194 04 0 09 1 38 1 13 0 16 Var Minimum P5 Q1 Median Q3 P95 Maximum 1 00 1 00 1 00 1 00 1 00 1 00 1 00 2 4 00 52 00 52 00 52 00 52 00 52 00 52 00 3 6 00 54 00 54 00 54 00 54 00 54 00 54 00 58 Data descriptives for level 2 unit 2 Level 1 units 55 Var Mean Stddev Variance Skewness Kurtosis K S Z Prob Z 1 2 00 0 00 0 00 0 00 0 00 3 71 0 00 2 26 04 9 50 90 26 0 51 0 61 0 97 0 30 3 38 53 10 65 113 48 0 77 0 15 0 89 0 41 Var Minimum P5 Q1 Median Q3 P95 Maximum 1 2 00 2 00 2 00 2 00 2 00 2 00 2 00 2 11 00 48 00 48 00 48 00 48 00 48 00 48 00 3 8 00 53 00 53 00 53 00 53 00 53 00 53 00 Data descriptives for level 2 unit 3 Level 1 units 64 Var Mean Stddev Variance Skewness Kurtosis K S Z Prob Z 1 3 00 0 00 0 00 0 00 0 00 4 00 0 00 2 16 56 7 19 51 65 9 55 0 10 1 05 0 22 3 25 44 10 43 108 85 0 12 0 78 0 79 0 56 Var Minimum P5 Q1 Median 03 P95 Maximum 1 3 00 3 00 3 00 3 00 3 00 3 00 3 00 2 4 00 37 00 37 00 37 00 37 00 37 00 37 00 3 0 00 46 00 46 00 46 00 46 00 46 00 46 00 The first variable is the Level 2 identifier
49. 2 which are also used for the function and gradient cf section A 3 so that io memo is required for data storage 92 Let be the matrix defined by these expressions Then H p 109 lm N Noo N where 7 0 is given by equation A 34 and 8 is the parameter vector that has to be estimated So H N is a consistent estimator of the asymptotic covariance matrix of VN 0 or is the estimator of the covariance matrix of 8 A 5 Reparametrization In the formulas of the previous sections all parameters were treated as completely free parameters But o should obviously be nonnegative because it is a variance Similarly 9 should be a positive semi definite matrix because it is a covariance matrix As discussed in chapter 3 MLA offers two options for reparameterization to impose these restrictions root and logarithmic Corresponding formulas will be derived in a next version of this manual Here an old description of a previous reparameterization method based on the Cholesky decomposition and used in earlier versions of MLA is still given instead The parameters can be written in the following way o oy A 59 0 CC A 60 where C is a lower triangular matrix i e with zero elements above the diagonal Equation A 59 states that o should be the parameter used by the program not Equation 60 expresses 6 in its Cholesky decomposition and the elements of C sho
50. 5 j Yj y 2 4 2 so that the contribution of Level 2 unit j to the minus log likelihood function is L log fy IX Z j N l 1 1 v 1 og 27 2108 det V 20 V O X y and the minus log likelihood function for the whole sample is simply the sum of all Level 2 units 11 L This is the function that has to be minimized with respect to the parameters to obtain maximum likelihood estimators Specifically it will produce a set of fixed parameter estimates y and a set of random parameter estimates for the second level and c for the first level Details can be found in Appendix A The asymptotic covariance matrix of the estimators is derived from the matrix of second derivatives of L the Hessian matrix This covariance matrix is used for the standard errors for both fixed and random parameters For a detailed description of the derivations used and an extensive discussion of the computa tional formulas used in the program Appendix A contains all information 20 2 6 Restricted Maximum Likelihood REML Restricted maximum likelihood REML estimators are the maximum likelihood estimators of a transformed version of 2 6 In particular let K be an N x N p matrix of rank N p such that K X 0 Then w K y K Zu K e does not depend on y Its distribution is given by w N 0 K VK and its 2 log likelihood equation is pp 1 L logdet K VK w K VK w Not
51. 54 j l and e OL J say vez do 00 kk E Ziv o X Y Zia j 1 J 1 par 5 DAV um A 55 J From A 45 A 46 and A 48 we have E VIZ ZV A Lu UV v 1 117 1 ZVP Zu ZV Zu PL VZ Vs 1 v 1 GVIZASVZD HNV ZA ZV Z i ZiV7 Z ZV5 Z Ju J D EVT zov zo j l ZVP Z YZV Z pul A 56 and analogously O PL E m and Vj Zw A 58 Usu 7 say tu A 58 91 As with the function and the gradient computationally more efficient formulas will be derived for the covariance matrix of the estimators Combining A 49 A 50 A 51 and A 52 with A 9 it is found that OL 2 4 17 E Sx Iy 0 Z 0G zx j l o X X j l J 4 ly X Z 0G ZX 2 0 L Oy o oL E 0 L E E 26 i Combining A 53 A 54 and A 55 with A 16 and A 14 it is found that E oL N ix 4G o ue 22 7 e and S m l 4 2 d N Jq 407 E oL at 27 x ju and aL b 2 E 2 z 7 fe 25 59 UG J pue Combining A 56 A 57 and A 58 with A 13 it is found that PL 1 lz E 35 55 7 te ZZ viz 1 E G6 ZZ ol ZZ OL ot 1 lz s e a LE AG LY E FL 5 1o zz y 90 00 2 eu Y These formulas are Hepes in the program Note that these expressions depend on the data only through the terms 2e XX Z X Pp and z
52. It uses an input file and an output file as parameters This means that the program can be started by the command MLA hHv input file name output file name where input file name is the name of the file that contains the input and output file name is the name of the file in which the output of the program will be saved Both files are simple text files ascii The output file will be explained in the next chapter The input file will be considered here The options are help h extended help H and verbosity v respectively The latter shows some information on the terminal about what the program is doing The input file consists of statements which are case insensitive Every statement begins with a slash and a keyword e g TITLE Every keyword may be abbreviated but it must be at least of length three to be recognized e g TIT Other text following the keyword and or leading spaces will be ignored The rest of the statements must follow on lines below the keyword and should precede the next statement These lines are called substatements and may also consist of one or more keywords e g file The last statement to be read is the END statement All other statements and corresponding substatements may appear in any order but before the END statement if they are to be executed A substatement may continue on the next line In this case the first line must be ended with two backslashes Finally comments preceded by a
53. Leeden R amp Busing F M T A 1994 First iteration versus final IGLS RIGLS esti mates in two level models A Monte Carlo study with ML3 Tech Rep No PRM 02 94 99 Leiden The Netherlands Leiden University Department of Psychometrics and Research Methodology Van Landeghem G Onghena P amp Van Damme J 2001 The effect of different forms of centering in hierarchical linear models re examined Tech Rep No 2001 04 Leuven Belgium Catholic University of Leuven University Centre for Statistics Wansbeek T amp Meijer E 2000 Measurement error and latent variables in econometrics Amsterdam North Holland White H L 1982 Maximum likelihood estimation of misspecified models Econometrica 50 1 25 Wolter K M 1985 Introduction to variance estimation New York Springer Wu C 1 1986 Jackknife bootstrap and other resampling methods in regression analysis The Annals of Statistics 14 1261 1350 with discussion 100
54. Level 2 random effect and be the variance of the Level 1 error term Then o o is the variance of y j and p O o is the intra class correlation If the sample sizes in all Level 2 units are equal i e N N J N say for all j then the variance of the sample mean of y is e Q De E G On the other hand the variance of the sample mean of an independently distributed sample with mean y and variance e is N Consequently the variance of y based on a multilevel sample of size N is equal to the variance of y based on an independent sample of size N N 1 N 1 p and this expression for is the effective sample size Hox 2002 p 5 also correctly gives this formula for the effective sample size but incorrectly states that the standard error is multiplied by the factor 1 N 1 p whereas our analysis shows that it is the variance that is multiplied by this factor Because p gt 0 and N gt 1 the denominator is larger than 1 and thus the effective sample size is smaller than N If not all sample sizes in the Level 2 units are equal then the variance of the sample mean y of y is J N2 j l o N SS 1 so if y is estimated by the sample mean then the effective sample size is N N2 f D j l However unlike the case of equal group sizes the unweighted sample mean is not the most efficient estimator anymo
55. Universiteit Leiden MLA Software for MultiLevel Analysis of Data with Two Levels User s Guide for Version 4 1 Frank M T A Busing Leiden University Department of Psychology P O Box 9555 2300 RB Leiden The Netherlands Erik Meijer University of Groningen Department of Econometrics Box 800 9700 AV Groningen The Netherlands Rien van der Leeden Leiden University Department of Psychology P O Box 9555 2300 RB Leiden The Netherlands December 2005 Department of Psychology H Faculty of Social and Behavioural Sciences gt E J 1994 2005 Frank M T A Busing Erik Meijer and Rien van der Leeden All rights reserved Permission is granted to copy this publication as a whole or parts of it provided that its origin is properly acknowledged This document can be referenced as follows Busing F M T A Meijer E amp Van der Leeden R 2005 MLA Software for multilevel analysis of data with two levels User s guide for version 4 1 Leiden The Netherlands Leiden University Department of Psychology Preface This manual describes MLA version 4 1 a computer program developed for multilevel analysis of data with two levels The first version of MLA was written in 1993 1994 to empirically evaluate the bootstrap and jackknife estimators for multilevel analysis that we had developed and contin ued to develop extend and improve This original project was supported by a grant S
56. Uy p PE where w is the centered Level 2 regressor and y y yw is the new intercept The analysis is completely analogous to the analysis of the effect of centering in an ordinary regression model if there are no restrictions on y then the centered and uncentered models are equivalent and the advantages and disadvantages of centering are analogous to those for the ordinary regression model It is immediately clear that this carries over to the slope equation as well Note that we have not properly defined w There are two sensible definitions one explicitly recognizing that w is a Level 2 variable and one that takes the simple average over all Level 1 units 2m Wj j l J N J J JN 1 1 1 j WDN DWR UNM EZ UW j 1 i l j l j l For balanced data 1 e if m N J for all j then these two definitions revert to the same average Otherwise Wq attaches proportionally more weight to Level 2 units with more observations Note that these are not necessarily larger units in the population Just as centered and uncentered models are equivalent the models with different forms of centering are statistically equivalent and preference for one or another are purely interpretational MLA uses as a basis for grand mean centering which conforms to centering of a Level 1 variable Users who prefer centering around Wo should compute the centered variable outside of MLA and include this in the data file within MLA this
57. VO project no 93713 from the Institute for Educational Research in the Netherlands SVO Therefore we thank SVO for their financial support In the years following its introduction MLA has been ex tended and improved gradually mainly by implementation of improved and extended resampling estimators that have been the results of our research but also by implementation of user requested features or features that we found useful ourselves The new versions have been accompanied by short readme files that briefly described the new features but a proper new version of the manual has been embarassingly lacking Here we finally provide an up to date manual The MLA program can still be characterized predominantly by our need to have a platform for our research on resampling methods Nevertheless from the feedback that we get we may conclude that it has proven to be useful for a wider audience We expect that this new manual and the latest version of the program 4 1 that it describes enhance its usefulness The preface of the first version of this manual Busing Meijer amp Van der Leeden 1994 gave the following description of the features of MLA The MLA program can be characterized by four major properties e User friendly interface e Extensive options for simulation in particular three options for bootstrapping multilevel models e Simple estimation methods providing an alternative for the complex iterative estimation procedu
58. X and or Z variables are missing the EM algorithm can not be used straightforwardly because it requires that the joint distribution of the exogenous and the endogenous output vari ables is known Standard multilevel modeling only assumes that the conditional distribution of the output variables given the exogenous variables is known This poses severe complications If the amount of data that is missing is relatively small standard ad hoc solutions to the missing data problem can be used such as listwise deletion deletion of cases with one or more missing values pairwise deletion computation of sufficient statistics such as covariances on the basis of all available information for the variables in question mean substitution substitution of the mean of the observed values of a variable for a missing value on that variable or other substitution methods All these methods have their advantages and drawbacks and none is fully satisfactory especially when the number of missing values is large In the current version of MLA no specific means of missing data handling are implemented Listwise deletion and several forms of substitution can be done by the user before the data set is processed by MLA Pairwise deletion can not be done because the program requires raw data In principle pairwise deletion could be done within the program but this is not implemented yet 40 Chapter 3 Commands MLA runs as a stand alone batch program
59. a discussion of the effective sample size The bootstrap estimates as printed as follows Bootstrap estimates unbalanced unlinked shrunken residuals Replications done 200 Replications used 200 Fixed parameters Label Estimate SE G1 0 920206 0 170909 G2 2 037206 0 312473 G3 0 958121 0 118839 G4 0 742056 0 196784 Random parameters Label Estimate SE U1 U1 0 611998 0 144656 U2 U1 0 060488 0 078446 U2 U2 0 236573 0 053698 E 1 208053 0 077424 The bootstrap confidence interval estimates are as follows Confidence interval estimates bias corrected percentile method Fixed parameters Label Estimate Mean Lower Upper G1 0 936989 0 953771 0 648489 1 213440 G2 2 058120 2 079033 1 553716 2 602604 G3 0 956199 0 954277 0 780768 1 163196 G4 0 737448 0 732839 0 406751 1 072559 Random parameters Label Estimate Mean Lower Upper U1 U1 0 467356 0 322714 0 372347 0 724392 U2 U1 0 050104 0 039720 0 060996 0 191503 U2 U2 0 160850 0 085127 0 136783 0 253574 E 1 096541 0 985029 1 075827 1 233065 Note mean refers to average over bootstrap replications See documentation for further elaboration on this subject 68 Bootstrap estimates unbalanced unlinked shrunken residuals Replications done 200 Replications used 200 Fixed parameters Label Estimate SE Confidence Interval G1 0 920206 0 170909 0 648489 1 213448 G2 2 037206 0 312473 1 553716 i 2 602604 G3 0 958121 0 118839 0 780768 1 163196 G4 0 742056 0 196784 0 4
60. ampling can be performed at two levels Consider two level data straightforward implementation of the ungrouped jackknife would be to eliminate one observation from one Level 2 unit at the time to obtain a jackknife sample This resampling scheme is exactly equivalent to the resampling scheme of the standard ungrouped jackknife of section 2 10 1 Another possibility is to implement the grouped jackknife With the grouped jackknife it is most logical to use the Level 2 units as groups The Level 2 units may have different sizes and therefore the grouped jackknife with unequal group sizes should be used The parametric bootstrap can be easily implemented in multilevel analysis If the X j and W variables are considered fixed in 2 1 and 2 2 bootstrap samples y can be obtained in the following way First for each j 1 J draw a bootstrap Level 2 error vector 7 from a normal distribution with mean zero and covariance matrix Then draw a bootstrap Level 1 error vector 8j from a normal distribution with mean zero and covariance matrix c x Finally J the bootstrap sample of y is obtained from B Wiy ur 2 57 31 7 X Bj 2 58 Then bias corrected bootstrap estimators and bootstrap estimators of the covariance matrix of the parameters are obtained in the usual way This is the parametric bootstrap that is implemented in MLA It is also possible to derive a parametric bootstrap estimator in case th
61. ap can be specified on this substatement In that case balancing balanced must be specified Default is balancing unbalanced 3 6 5 resample optional The substatement resample offers the user the choice at which level units will be resampled The default is for the bootstrap methods which means that at both levels units will be resampled If kind bootstrap and method cases the user may choose 1 or 2 which means that only Level 1 units or only Level 2 units will be resampled respectively With kind jackknife there is no default and resample 1 or resample 2 must be chosen by the user Characteristics of the data gathering process and the data structure will determine which choice is appropriate For instance with repeated measures Level 1 nested within individu als Level 2 it is probably not useful to resample Level 1 units with the cases bootstrap With multilevel data the intraclass dependency will typically imply that the jackknife must be applied at Level 2 3 6 6 linking optional The Level 1 and Level 2 residuals can be drawn linked or unlinked during simulation Linking the residuals means that the Level 1 residuals will be drawn from the same unit as where the Level 2 residual was drawn from This is specified with linking linked Specifying linking unlinked has the same result as not using the substatement at all This is the default 3 6 7 replications optional Using the substatement replications
62. ar models Applications and data analysis methods 2nd ed Thousand Oaks CA Sage Raudenbush S W Bryk A S Cheong Y amp Congdon R 2004 HLM 6 Hierarchical linear and nonlinear modeling Chicago Scientific Software International Schucany W R Gray H L amp Owen D B 1971 On bias reduction in estimation Journal of the American Statistical Association 66 524 533 Shao J amp Tu D 1995 The jackknife and bootstrap New York Springer Snijders T A B amp Bosker R J 1999 Multilevel analysis An introduction to basic and advanced multilevel modeling London Sage Stephens M A 1974 EDF statistics for goodness of fit Journal of the American Statistical Association 69 730 737 Stevens J P 1990 Intermediate statistics A modern approach Hillsdale NJ Erlbaum Strenio J E Weisberg H L amp Bryk A S 1983 Empirical Bayes estimation of individual growth curve parameters and their relationship to covariates Biometrics 39 71 86 Tu D amp Zhang L 1992 Jackknife approximations for some nonparametric confidence inter vals of functional parameters based on normalizing transformations Computational Statis tics 7 3 15 Tukey J W 1958 Bias and confidence in not quite large samples The Annals of Mathematical Statistics 29 614 Van der Leeden R 1998 Multilevel analysis of repeated measures data Quality amp Quantity 32 15 29 Van der
63. are all part of the bootstrap sample and for example the mother can not be drawn twice within the same Level 2 unit ik On the other hand it is also possible to keep the Level 2 units fixed and draw bootstrap sam ples only from the Level 1 units within each Level 2 unit This can be useful when the Level 2 units can not be considered a simple random sample for example when several prespecified countries are compared and people within each country are drawn randomly Then in the boot strap samples all countries are present once just as in the original sample Bootstrap samples are drawn from complete cases within each country 32 Once bootstrap samples are drawn bootstrap bias corrected estimators and bootstrap standard errors can be obtained straightforwardly 2 10 7 Bootstrap confidence intervals Up till now we have used the bootstrap only for bias correction and computation of standard errors However an important and nontrivial application of the bootstrap is the computation of confidence intervals We will now discuss a number of different types of bootstrap confidence intervals for a typical parameter 0 with true value 6 We will only discuss two sided intervals one sided intervals are defined analogously The intended nominal coverage of the confidence interval will be denoted by 1 a so that the probability that the interval contains the true parameter value should be approximately 1 o Notation Before we int
64. at is we must include all within group averages of the Level 1 explanatory variables as ad ditional explanatory variables at Level 1 If the parameters are not restricted then the resulting model satisfies the requirement that the models with and without Level 2 centering are equivalent However in many cases it is substantively highly questionable whether inclusion of the within group averages as additional explanatory variables makes any sense If they are excluded then as shown above Level 2 centering leads to a different special case different implicit restrictions on B than not centering and the models are different and centering is not innocuous As with grand mean centering we can also study Level 2 centering of the w j and y j Clearly Level 2 centering of a Level 2 variable like w f is not useful because this transformed variable 16 then becomes zero and drops out of the model Level 2 centering of the outcome variable gives y ij y y j Bo jXij amp j Bij BajXj 8 By Xj Xj Ej j Bo jXjj Eip with obvious notation If x j is replaced by xj in the second of these equations then the result is unaltered Therefore Level 2 centering of the outcome variable leads to a transformed model that is unaffected by Level 2 centering of the Level 1 explanatory variable If one is uncertain whether Level 2 centering of the explanatory variables should be done or not then Level 2 centering of
65. ator for the variance of the estimator and hence for its standard error The idea of the jackknife is as follows Consider an independently and identically distributed sample of size N from some distribution and an estimator Oy of a parameter 0 obtained from this sample Furthermore consider removing a group of m observations from the sample and let Cx be the estimator of the same parameter 0 based on this sample of size m The difference between Oy and Oym can then be used to estimate the bias of Oy and this estimate can be used to obtain the bias corrected jackknife estimator 6 It is known that the bias of 9 is generally of order N if m is relatively small to This is typically much smaller than the bias of Oy which is generally of order N Obviously there are many possibilities for selecting a group of observations of size m from the sample If m is equal for each group the simplest case is obtained for m 1 Now the sample is divided into N groups of size one i e the N observations In all other cases with m gt 1 and N a multiple of m the sample is divided into g mutually exclusive groups of size m with g N m We will first give the details of the standard jackknife procedures for these situations and then describe a grouped jackknife procedure that is more suitable for typical multilevel data Justifications can be found in the standard jackknife literature e g Shao amp Tu 1995 or as special cases of t
66. boration on this subject In the next part we can see that both G2 and G4 indicate that the mother s weight has a positive effect on the rat s weight The rat s weight starts higher and rises faster with a heavier mother Restricted maximum likelihood estimates EM Fixed parameters Label Estimate SE T Prob T G1 18 873668 21 002157 0 90 0 3688 G2 0 545101 0 128963 4 23 0 0000 G3 2 967709 11 848158 0 25 0 8022 G4 0 146866 0 072753 2 02 0 0435 Random parameters Label Estimate SE T Prob T U1 U1 27 819366 21 183863 1 31 0 1891 U2 U1 8 248183 8 639488 0 95 0 3397 U2 U2 5 518709 6 985030 0 79 0 4295 E 91 746606 23 688872 3 87 0 0001 Conditional intra class correlation 27 82 91 75 27 82 0 2327 iterations 315 2 Log L 384 359839 The formula used for the conditional intra class correlation is still vc 2 and in MLA notation Uru P UrUl E It is the intra class correlation for observations if all Level 1 variables except the constant are equal to zero In this case given the centering of the age variable this means the intra class correlation in the middle age period which may be an interesting value See section 2 11 for a further discussion of the conditional intra class correlation 64 Note that the EM algorithm needs a huge number of iterations whereas in the previous ex amples the BFGS algorithm converged after only a few iterations This slow convergence is a general characteristic
67. cially in large samples Nevertheless given these complications it seems easier not to center the outcome variable The interpretation of the model parameters will usually not be more difficult for an uncentered outcome variable 2 3 3 Level 2 centering Level 2 centering means centering within contexts i e subtracting the group mean Sometimes there is a theoretical rationale for assuming that the outcome variable does not depend on the value 15 of an explanatory variable by itself but on the relative value of this variable within the group Some examples of situations in which this may be the case are given in Kreft et al 1995 pp 17 20 and Snijders and Bosker 1999 p 81 Here we will first study the technical consequences of Level 2 centering and then make some remarks about substantive issues of model specification and their consequences Let us again consider 2 13 and see whether we can transform the model such that the Level 2 centered variable X p becomes the explanatory variable where x j denotes the sample mean of x within the j th Level 2 unit Thus we write o Rp M EN EET 9 mu Po t Bate ey bij By jx Bs Xj Eis 2 14 with 6 ja possibly random slope If it is assumed that 2 13 is the true model then B4 a whereas replacing x j by its Level 2 centered counterpart in 2 13 implicitly imposes the assumption B 0 The term B4 P is a random term that varies over Level 2 u
68. cide to resample both levels in the data or only the first or the second level This feature may be useful for instance in analyzing repeated measures data Alternative simple estimation methods Usually complicated iterative estimation procedures are used to estimate the parameters of mul tilevel models From a theoretical and technical point of view these procedures provide the best estimates that can be obtained However in practice some of the algorithms used may be rather slow under certain conditions In other cases serious computational difficulties may arise that are not easy to overcome De Leeuw and Kreft 1995 discuss alternative estimation procedures for both fixed and random parameters in multilevel models that are non iterative and relatively easy to implement Moreover in certain cases the quality of the parameter estimates is rather good Hence one could question the real gain of the complicated iterative procedures over these simpler alternatives Therefore in MLA we have implemented a one step and a two step OLS procedure A simple WLS procedure has still to be implemented Simple procedures can always be used as an addition to complex ones and vice versa Their results can always be compared with the results of the iterative methods It depends on the data which estimation procedures are to be preferred De Leeuw amp Kreft 1995 Kreft 1996 Fast Maximum Likelihood algorithm To maximize the likelihood function t
69. containing fixed coefficients and u j is a vector with error terms Equa tion 2 2 clearly illustrates the slopes as outcomes interpretation because it gives expresses the coefficients in 8 j as outcome variables in a separate Level 2 model However substition of Equation 2 2 into Equation 2 1 gives the total model equation y X Wyy Xu e 2 3 This is a mixed linear model Harville 1977 of the form y Xjy Zu E 2 4 in which X 2 X jWj and Z j Xp Several authors use different notations for the models presented in this chapter and in subsequent chapters We find the separate model equations 2 1 and 2 2 for the two levels most useful for interpretation of the model and its estimates and the program input is therefore based on them see chapter 3 For theoretical purposes we find the form 2 4 most useful where usually Xj will be simply written as X p Therefore in the following both representations will be used where appropriate and it will be clear from the context which form is used For now we will proceed with the form 2 4 Generally it is assumed that Bor N 0 c I N and u y N 0 O where oe the variance of the Level 1 error term is an unknown scalar parameter and 6 the covariance matrix of the Level 2 error terms is a symmetric matrix of unknown parameters The covariance matrix V of y j conditional on X j and Z p that is the matrix containing the variances and covariances of the random
70. d formulas will be introduced 74 A 2 Some useful formulas First we define the matrix M 2 G I ZjZjOla A 7 This matrix will be used frequently in the following The inverse of Vis Wansbeek and Meijer 2000 p 351 state the following formula A BCD Y 2A 5 BC DAB DAS where A and C are square nonsingular matrices and B and D are matrices of appropriate dimen sions This formula can also be written as A BCD A A BI D A BO C p 4A Ad A BCU A 8 By defining A o I B Z C 0O and D it follows that Vj can be written as A BCD J Consequently the inverse of V can be found from equation A 8 1 2 2 2 Se ly e ly zen Ze Iy Z 0 Zi o he 1 2 1 Z Z o z 2 4 ly Iy Z OG Z A N 96 j A 9 The determinant of Vis Based on Maddala 1977 pp 446 447 the following formula for the determinant of a partitioned matrix can be derived al De E PE A B C Dj C DJ 0 I A 0 der peris det A det D CA B where A and D are square nonsingular matrices and B and C are matrices of appropriate orders Similarly A B A B I 0 e t pja IE det D det A BD C Consequently det A det D CA B det D det A A 10 15 Now define A I B Z C Z 0 and D g ly The matrix V can now be written as J V D CA B and the determinant of A is
71. d in se quentially A similar formula holds for differentials of matrices In the following it will be clear how the chain rule can be applied 77 The formulas above can be used to derive some important differentials 2 dV d o L Z 9Z d oly Z d OZ A 17 d det V det V tr V dV A 18 1 1 d Vj 2 Vi dVVj A 19 1 d log det V ddet V og de Vj da V e Vj 1 1 day det V V dV tr V d Vj A 20 Combining equation A 19 with A 17 we find that 1__y l 2 ny l d V V do My Zi d OZ NV __y 24 2 1 y 1 V da Vj 6 Z V A 21 Consider a term of the form dT trAdO where is a symmetric q x q matrix This term can be written as 4 Ay Ay dO Ay d Orgs k 1 151 k 1 SO OT 80 Ay 9 22 Ay if A is symmetric A 23 and OT A A 24 gU where k Similarly consider a term of the form dS A d B where is a symmetric q x q matrix and A and B are matrices This term can be written as q 4 23 Aj d 0 By dS 1 1 4 Ay By Ay Bip d Ow gt Ay Bu d Bp 1 u 1 So M iM Il MN u v 78 5 95 30 A B A B A 25 and os 36 A Bip A 26 where u z v A 3 Computational formulas for the function and gradient The formula A 6 of the minus log likelihood function is computationally inefficient because a matrix of size N j has to be in
72. ded This is a very natural requirement that is also common in the analysis of variance The second practical point is that by a simple change of origin of an explanatory variable a zero covariance between the residuals of the random intercept and a random slope becomes nonzero Therefore the covariances of the residual of the random intercept with the rresiduals of the random slopes should not be restricted to zero or any other fixed value This last point was also stressed by Snijders and Bosker 1999 69 70 If we take these practical points into account then it follows that centering a Level 1 explana tory variable around the grand mean leads to an equivalent model and thus the choice whether or not to center can be confidently based on issues of interpretation Let us now study what happens when we center a Level 2 variable In the light of the previous results we consider the following two level random coefficients model Vij By jXjj 2 134 YW 9 130 Paj ys Y4Wj Us 2 13c i e 2 12 but with the same explanatory variable in both Level 2 equations We could add more regressors to the intercept equation but this does not affect the discussion We want to know what happens if we center w y Because the equation for y j does not explicitly contain w p We focus attention to the Level 2 equations The intercept equation can be rewritten as By yi Y2W yy w w y YW
73. del y at Pxte where is a normally distributed error term with mean zero and variance c Suppose that a sample x4 yy Xy is available Then parameter estimates a B and can be obtained Now if x is considered a random variable nonparametric bootstrap samples can be easily obtained by resampling complete cases Bootstrap samples y x1 yy xy consist of pairs y7 x7 that are also elements of the original sample that is for each i 1 N there exists a j 1 lt j lt N such that 7 37 y p p Then the parameters can be estimated from each bootstrap sample and bias corrected estimates can be obtained as well as an estimate of the covariance matrix of the estimator using the formulas from section 2 10 3 The implementation of the parametric bootstrap depends on whether a specific distribution of x 15 assumed If x is regarded as a random variable with an unspecified distribution the parametric bootstrap should start with drawing nonparametric bootstrap samples of x If on the other hand a specific distribution of x is assumed for example a normal distribution with mean u and variance UT then the parametric bootstrap starts with drawing parametric bootstrap samples of x for example samples from a normal distribution with mean x and variance 8 which are the estimates of u and from the original sample Given a bootstrap sample x of x the parametric bootstrap draws a
74. e or a group of complete cases is removed for each 6 p for the grouped jackknife The jackknife bias corrected estimators and the jackknife estimators of the covariance matrix of the parameters are obtained straightforwardly e g Efron 1982 pp 18 19 The bootstrap and jackknife methods discussed here for regression models are the standard implementations as for example discussed by Efron 1982 These have some drawbacks and therefore alternative resampling methods have been proposed that have some advantages for example that they are robust to heteroskedasticity A thorough discussion can be found in Wu 1986 2 10 6 Resampling multilevel models Because multilevel analysis is based on regression analysis resampling methods for multilevel models can be based on resampling methods for regression models The methods of section 2 10 5 can however not straightforwardly be applied to multilevel models because the usual jackknife and bootstrap theory requires that the different observations be independently distributed This is not the case with multilevel analysis where the observations within the same Level 2 unit are dependent Another difference between regression analysis and multilevel analysis is that in multilevel analysis there can be variables measured at all levels In the two level case for example there are variables describing the Level 1 units and possibly variables describing the Level 2 units This implies that res
75. e X and W variables are considered random This is analogous to 2 54 but it is not implemented in MLA For the nonparametric bootstrap several situations can be studied If the X and W variables can be considered fixed then analogously to regression analysis the errors have to be estimated As explained in section 2 7 the shrunken residuals 2 32 and 2 34 can be used as estimators of the Level 2 and Level 1 errors respectively A drawback of these errors may be that their variances are less than the variances in the population When however sample sizes at both levels increase this difference diminishes But alternatively the raw residuals 2 30 2 31 can be used instead of the shrunken residuals Unlike in regression analysis the estimated residuals in multilevel analysis do not necessarily have a zero mean Therefore the means are subtracted first Otherwise the possibly nonzero mean of the errors would necessarily lead to biased estimators of the constant Once centered estimates uj j L J and 5j j2bLh JAiz l N of the errors are obtained nonparametric bootstrap samples LE j 1 J and 1 j gt 1 N are drawn and nonparametric bootstrap samples of y are obtained from 2 57 and 2 58 Then estimators can be obtained in the usual way and bootstrap bias corrected estimators and standard errors can be obtained straightforwardly This bootstrap procedure of resampling from estimated errors is called
76. e that L does not depend on the fixed parameters y The REML estimators of the random parameters and o are obtained by minimizing L instead of the full 2 log likelihood L as given in section 2 5 It can be shown that the resulting estimators do not depend on the specific choice of K any N x N p matrix K of rank N p such that K X 0 will lead to the same estimators It turns out that REML estimators are typically less biased than FIML estimators This is one of the reasons why REML is often preferred Asymptotically however the difference between FIML and REML becomes negligible Because L does not depend on y REML estimators of y do not exist Usually however it is desirable to estimate y as well A good estimator of y can be obtained as follows In 2 18 the two level model was written as a linear regression model in which the covariance matrix V of the residuals r is not a scalar multiple of the identity matrix If V were known the best linear unbiased estimator of y would be the generalized least squares GLS estimator Vers UV IX V y which is also the maximum likelihood estimator if normality is assumed Because V is unknown this estimator cannot be computed However we can estimate V consistently by plugging in the REML estimators of and g giving the estimator po The resulting feasible generalized least squares FGLS estimator is Yras QU LX X This estimator can also be called two step
77. e variable number must be at least 1 and less than or equal to the number of variables indicated in the variables substatement If this substatement is omitted the order in which the Level 1 units are read from the data file is used as identification 3 2 4 id2 required One of the variables in the data file must contain a code number that identifies the Level 2 units This may be a group number or in case of repeated measurements a subject number The number is essential for a correct discrimination of the Level 2 units The variable number has to follow the keyword id2 and it must indicate the position of the identifier variable in the data file The variable number must be at least 1 and less than or equal to the number of variables indicated in the variables substatement 42 3 2 5 missing optional For every variable one missing value may be specified on this substatement After the equal sign first the variable is indicated followed by the missing value between parentheses More variables and values are separated by commas 3 2 6 centering optional If this substatement is given the grand mean of the variables is subtracted Following the centering substatement the variable numbers of the variables that will be centered are given separated by commas These variables will be centered ignoring grouping directly after reading the data but before any analysis A warning must be given here The choice whether to center and
78. echnical Appendix A which gives in painstaking detail the derivations of the formulae for the likelihood function its first and second derivatives and their expectations It illustrates the effect TEX has on the mind of an individual who has just escaped from the dungeons of WYSIWIG Although there is considerable truth in this observation the primary erason for including these detailed derivations in the first version of this manual was quite different When we started pro gramming most authors referred to the technical appendix of Longford 1987 for the computa tional details When we implemented his formulas however our program did not converge After we spent considerable time in deriving all the formulas it turned out that there was a typo in one of his formulas To save others who might want to do the same a lot of time and to prove that our formulas are correct we included this appendix Our escape from the dungeons of WYSIWIG has now been more than 10 years ago but we still think this technical appendix is useful for some users Apart from the argument above an other important reason is that it occurs frequently that different statistical programs give different answers estimates standard errors test statistics p values in cases where they are expected to do the same analysis Questions about how this can happen are often found on discussion lists and frequently asked questions lists of statistical packages Such qu
79. ectively To obtain estimates for the parameters several estimation procedures have been proposed These procedures are all versions in one way or another of full information FIML or restricted maximum likelihood REML FIML and REML estimators have several attractive properties such as consistency and efficiency A drawback of both approaches however is their relative complexity Generally parameter estimates must be obtained iteratively and serious computational difficulties may arise during such processes Software The flourishing of models and techniques for analyzing hierarchical data has been stimulated by the software widely available for estimating multilevel models Besides dedicated multilevel mod eling software like MLA HLM Raudenbush Bryk Cheong amp Congdon 2004 and MLwiN Ras bash Steele Browne amp Prosser 2004 software on structural equation modeling like LISREL EQS Mplus and Mx also contain options for multilevel modeling with varying degrees of general ity Of these Mx can be freely downloaded from http www vipbg vcu edu mxgui General purpose statistical packages like SAS and SPSS and statistical matrix programming environments like S plus and R also offer procedures for multilevel analysis These typically have names like MIXED GLMM generalized linear mixed modeling or NLME nonlinear mixed estima tion The R program can also be freely downloaded cran r project org The websi
80. ee section 2 10 1 It is however doubtful whether the jackknife for multilevel models will give a reasonable estimate of a third order moment A bootstrap analog of 2 71 would be Ble Me T S alo T B 3j It is still an open question whether this gives reasonable results 36 The BC interval is available in MLA but due to the problems with estimating the acceleration constant the BC is not yet available 2 11 Effective sample size and intra class correlation The intra class dependency in multilevel data generally affects the precision of the estimators of the fixed parameters This means that in comparison with an independent sample with the same variance of the total error term a larger or smaller sample is needed to achieve the same precision Equivalently the precision obtained from a sample containing N Level 1 units with multilevel data is equal to the precision from a sample of say N observations with independent data We can then say that the effective sample size of the multilevel dataset is only N whereas actual sample size is N This terminology is due to Kish 1965 p 259 Here we will analyze this in deeper detail First assume that the data are drawn from a two level random effects ANOVA model without explanatory variables Yj EY tU Ep where y is the grand mean which is equal to the intercept in this case and all other notation is obvious Let O be the variance of the
81. efinite and that the mini mization algorithm may thus be less stable but the advantage that constraints may be imposed on its elements see the discussion of the CONSTRAINTS command 3 5 4 warnings optional If the maximum number of warnings is reached the program terminates execution This substate ment can change the default value of 25 The value must be an integer between 1 and 32767 3 5 5 maxiter optional The maximum number of iterations in the minimization process The default value is 100 This number should be sufficient for reaching convergence if the sample size is large enough and or the number of parameters to be estimated is not too large Changing the minimization method or the convergence criterion see below can make it necessary to raise the maximum number of iterations The value must be an integer between 1 and 32767 3 5 6 convergence optional After each iteration the new function value is compared to the previous function value The obtained difference is compared to a convergence related value If IF 4 Fl lt convergence IF IF 1 2 convergence is said to have been reached In this formula F is the function value after the i th iteration The left hand side of the inequality represents the ratio between the difference of two successive function values and the mean of these values The default value of convergence is 1 0e 10 i e 10 10 and permitted values range from
82. eparate output file for intermediate results of the simulation file Example SIMULATION kind bootstrap use simulation method bootstrap method error resample from error vectors type raw use raw residuals as error vectors resample Sei only resample level 1 units replications 200 repeat simulation 200 times seed 1041245 start with random seed 1041245 file boot out write simulation results to boot out 3 6 1 kind required With this substatement the user can choose from three options namely bootstrap jackknife and permutation simulation All types of simulation work as follows e perform the analysis on the original data e obtain a new sample e repeat the analysis on the new sample e save the new estimates The last three steps together called a replication are repeated a number of times Afterwards bias corrected estimates of model parameters and nonparametric estimates of standard errors are 47 computed These estimates are computed from the set of saved estimates and the original max imum likelihood estimates Furthermore for the bootstrap confidence intervals are computed from the replications The bootstrap the jackknife and the permutation option differ in the way a new sample is obtained The choice between bootstrap jackknife or permutation resampling also determines the way the final simulation estimates are computed More details can be found in Chapter 2 3 6 2 method optional
83. eral diagnostics are printed First it prints the sample sizes at both levels the mean of the within group sample sizes and the effective sample size The latter is computed according to the formula ESS N 1 N J 1 p where p O 0O 07 is the conditional intraclass correlation and O is the diagonal element of corresponding to the Level 1 intercept See section 2 11 for a critical discussion about the usefulness of these statistics Second it prints four experimental pseudo R measures The properties of these are currently unknown They are implemented for experimental reasons only In a future version of MLA they will either be removed or more extensively documented 52 Finally the Level 1 and Level 2 outliers are printed See section 2 9 for the formulas of the statistics and a discussion of their properties Actually the term outlier is a little misleading here because they are printed for all Level 1 and Level 2 units whether they should be considered outliers or not 3 9 PLOT optional The PLOT statement gives the user control over some plot options The output of MLA consists only of ASCII files so these are rough plots but they may be useful for quick diagnostic purposes For high quality graphics the numerical output of MLA and its options for writing all kinds of results to files can be invoked This output can then be used as input to high quality graphics software 3 9 1 histograms optiona
84. erential formulas As was stated in the previous section the maximum likelihood esti mates are obtained by minimizing the minus log likelihood function by the BFGS method which uses the gradient of the function To find the gradient the differential notation of Magnus and Neudecker 1985 1988 will be used The key property of differentials is their relation with derivatives through the following equivalence Let f be a vector or scalar function of a vector or scalar variable x then of Ox The differential of a matrix is defined through the vector that stacks its columns vec dF d vec F Note that the differential of a scalar vector or matrix is a scalar vector or matrix of the A x e df A x dx same Size Some useful formulas are Magnus amp Neudecker 1985 1988 d c 0 d cg c dg d g h dg dh d gh dg h g dh 1 d log f il d det F det F tr F dF d tr F tr dF d F F dry where c is a scalar vector or matrix constant g and h may be scalars vectors or matrices pro vided the expression is a valid expression f is a scalar and F is a matrix There is also a chain rule If f is a function of x and g is a function of f then cf Magnus amp Neudecker 1988 p 91 _ Og Of Of Ax This means the following for the differentials If dg A d f and d f B dx then dg dx which illustrates that informally speaking the formulas for the differentials can be fille
85. estimators A 5 Reparametrization References Vil 55 55 61 62 65 73 74 75 79 84 93 97 viii Chapter 1 Introduction 1 4 Introduction to multilevel analysis Multilevel analysis comprises a set of techniques that explicitly take into account the hierarchical structure in the data In this section a brief introduction to the underlying ideas of multilevel anal ysis is given Several relevant topics such as hierachical data structures intra class correlation the formulation of a multilevel model and the estimation of the model parameters are discussed This introduction does not contain formulas Chapter 2 will discuss the main formulas and the Technical Appendix will give supplementary mathematical details Hierarchical data Hierarchically structured data arise in a variety of research areas Such data are characterized by so called nested membership relations among the units of observation Classical examples of hierarchically structured data are found in educational research where for instance students are nested within classes and classes are nested within schools But in many other instances in the social and behavioral sciences as well as in many other fields of science data are also hierarchically structured For instance in clinical psychology clients can be nested within therapy groups people can be nested within families and so forth A sociological example is given by a study
86. estions can only be answered if it can be assessed what each program does i e which default choices are made which formulas are used which algorithms are implemented This appendix provides this information In this appendix the minus log likelihood function and its gradient function are derived as well as computationally more efficient formulas of them The asymptotic covariance matrix of the maximum likelihood estimators and computationally efficient formulas for it are derived and the explicit imposition of implicit constraints in the model is discussed 73 A 1 The model and the likelihood function To find maximum likelihood estimates we start with the model 2 4 y XytZujt e A 1 E N 0 07ly A 2 J up N O O A 3 where y j is a vector with the endogenous variable for the N Level 1 units in Level 2 unit j X j is an N jXP matrix of exogenous variables for the Level 1 units in Level 2 unit j and Z j is an N matrix of exogenous variables for the Level 1 units in Level 2 unit j The p vector y is a vector of fixed regression coefficients the g vector u j is a vector of random regression coefficients in Level 2 unit j and the N j Vector is a vector of residuals of the Level 1 units in Level 2 unit j j Itis assumed that j andu j are independent of each other and independent of y and u 7 where J i From the model equations A 1 A 3 it is found that conditional on X j and Z pj 15 nor ma
87. f we would consider the x es to be random variables with some distribution x j would converge to X with an increasing sample size within Level 2 unit j and with moderately large within groups sample sizes we would not expect the term x X 10 have an important effect on the results However it leads to a couple of interesting questions from a theoretical perspective The difference between x j and X j leads to a measurement error problem for which it seems to matter whether the sub sample is assumed to be drawn from a finite sub population or from an infinite population However the statistical theory of the multilevel model assumes that the x es are given con stants This means that they are either chosen by an experimental researcher as in agricultural experiments or they are essentially random but the analysis is done conditional on their out comes The latter is common in the social sciences See De Leeuw and Kreft 1995 p 173 for a discussion of this issue The problem with conditioning on the x es is that x ra X j is a constant which is unknown typically nonzero and different for different Level 2 units Except for some special cases this would normally imply that according to the standard multilevel theory the es timators are inconsistent However this is also not the case here because although x m X j 15 considered a constant over repeated sampling with the same sample size it varies with increasing 17 sample size con
88. g quantiles of the normal distribution using bootstrap quantiles may give more accurate results To derive the necessary bootstrap quantiles let us rewrite 2 64 into the following form Pr 21 8 lt 0 9 lt O l a The estimated quantiles q lg cue 3 and ose 3 of the normal distribution have to be replaced by quantiles of the distribution of 6 0 These are estimated by quantiles q and lo of the bootstrap distribution of 0 6 From the definition Pr 0 lt q v 5a it follows that q om 6 H a 1a and the confidence interval for 6 becomes 0 ot qi Lgl which reduces to 20 H 1 a 20 H Ga 2 65 Note that the upper quantile of H ends up in reverse in the lower confidence point and vice versa This tends to give a small bias and skewness correction This percentile interval has not been implemented in MLA because the percentile t interval is generally considered to be an improvement of it Percentile t The percentile t also called bootstrap 7 is a combination of the ideas of the boot strap normal and Hall s percentile intervals It is derived by rewriting 2 64 into the following form 4 0 a lt 2 lt at ESL 2 66 seg 0 2 The quantiles of the normal distribution are now replaced by quantiles of the distribution of t 8 These are estimated by quantiles of the bootstrap distribution of f 0 6 se 0 Let G f be the bootst
89. he PRINT statement Random Level 1 coefficients ordinary least squares estimates per level 2 unit Parameter B1 Unit Size Estimate SE T Prob T 1 5 111 4008 4 9044 22 71 0 0000 2 5 120 2000 2 9967 40 11 0 0000 3 5 119 8000 6 7621 17 72 0 0000 4 5 103 4008 3 8018 27 28 0 0000 5 5 100 0000 3 2701 30 58 0 0000 6 5 99 0000 4 4505 22 24 0 0000 7 5 93 0000 5 5281 16 82 0 0000 8 5 113 6000 1 6391 69 31 0 0000 9 5 90 4000 4 5284 19 96 0 0000 10 5 121 0000 2 4549 49 29 0 0000 Mean 107 1800 Variance 132 6884 Parameter B2 Unit Size Estimate SE T Prob T 1 5 28 8000 3 4679 8 30 0 0000 2 5 28 1000 2 1190 13 26 0 0000 3 5 36 3000 4 7816 7 59 0 0000 4 5 27 2000 2 6882 10 12 0 0000 5 5 23 4000 2 3123 10 12 0 0000 6 5 29 3000 3 1470 9 31 0 0000 7 5 25 6000 3 9090 6 55 0 0000 8 5 19 7000 1 1590 17 00 0 0000 9 5 23 6000 3 2021 7 37 0 0000 10 5 25 6000 1 7359 14 75 0 0000 Mean 26 7600 Variance 19 7138 Parameter SIGMA 63 Unit Size Estimate SE T Prob T 1 5 120 2667 98 1973 22 0 2207 2 5 44 9000 36 6607 222 0 2207 3 5 228 6333 186 6783 522 0 2287 4 5 72 2667 59 0055 22 0 2207 5 5 53 4667 43 6554 22 0 2207 6 5 99 0333 80 8604 22 0 2287 7 5 152 8000 124 7607 22 0 2207 8 5 13 4333 10 9683 22 0 2287 9 5 102 5333 83 7181 C22 0 2287 18 5 30 1333 24 6038 22 0 2287 Mean 91 7467 Variance 64 4782 Note random level 1 coefficients are also referred to as level 2 outcomes See documentation for further ela
90. he MLA program uses the Broyden Fletcher Goldfarb Shanno BFGS method e g Nocedal amp Wright 1999 This is a fast and stable method to optimize arbi trary functions It requires that the function and the gradient the vector of first derivatives of the function with respect to the parameters be programmed It minimizes the function with respect to both fixed and random parameters simultaneously As such it resembles most the algorithm used by VARCL Longford 1990 although the BFGS method does not compute the inverse of the information matrix at each iteration The algorithms of MLwinN and HLM alternately update the fixed and the random parameters 1 3 Changes since version 1 0b The first publicly available version of MLA was version 1 0b released in December 1994 This was accompanied by an extensive manual Busing et al 1994 In the period 1995 2005 a few updates of MLA have been made and released These contained mainly some additional options but also a few improvements and bug fixes These updates have not been accompanied by updates of the complete manual Instead the changes were documented only very briefly in the readme files included in the MLA distribution Consequently the current manual is the first update of the full manual since the original manual Therefore we briefly list here the changes between MLA version 1 00 and MLA version 4 1 as reflected in the changes in the manuals e Two types of centering of variables
91. he data within clusters are dependent In multilevel data however groups are usually not of equal size m Therefore to make the jackknife suitable for multilevel data and models the delete m jackknife needs to be generalized to a grouped jackknife for unequal group sizes called the delete m j jackknife by Busing Meijer and Van der Leeden 1999 To apply the delete m jackknife with m gt 1 and N a multiple of m the sample is divided into g mutually exclusive groups of size m with g N m In multilevel analysis the sample is divided into J groups of usually varying size N P that is N j is not equal for each group and 26 j will not necessarily be equal to J As a result the formulas discussed earlier have to be adapted slightly Let 4 Pp be an estimator of 0 based on a sample from which group j with size N j is removed The delete m j jackknife estimator of 0 is now given by Oin Jy xf 2 42 The estimator 6 can be justified as follows Consider an estimator 6j of a parameter 0 obtained from a sample of size N from some distribution In general the expected value of such estimators can be written as the true value 6 plus a power series expansion in 1 N that is z b b rs 2 43 where b b are unknown constants independent of sample size and frequently not equal to zero see e g Quenouille 1956 Schucany Gray amp Owen 1971 If b 0 the bias in 2 43 is c
92. he discussion in section 2 10 1 below Delete 1 jackknife Suppose Oy is an estimator of 0 based on a sample of size N Now remove the i th observation from the sample and let 6 be the estimator of 0 based on a sample size of N 1 The delete 1 jackknife estimator of 0 is now given by Ora NOy N D g 2 38 where bn N7 6 i Ai The delete 1 jackknife variance estimator Tukey 1958 based on the pseudo values NOy N DO 1 is given by Tay si N 1 S 2 39 Nei iam 85 i As mentioned above and discussed in more detail in section 2 10 1 the bias of Ss is typically 2 whereas the bias of Oy is typically O N Furthermore 855 is a consistent estimator of the asymptotic variance of both Oy and 25 Delete m jackknife Suppose the sample is divided into g mutually exclusive and independent groups of equal size m m gt 1 where m N g Now remove the m observations of group j from the sample and let 9 p be the estimator of 0 based on the corresponding reduced sample of size N m The delete m jackknife or grouped jackknife estimator of 0 is now given by Oimn g y 6 1 8 m 2 40 with 0 g pt 9 jv Hence Tn is based on g estimators 9 of 0 each based on a subsample of size N m for m 1 2 40 reduces to 2 38 The delete m jackknife variance estimator is defined similarly to 2 39 It is based on the pseudo values
93. imulation output file Kind of CI estimation 0 CI alpha 0 025000 CI convergence le 010 CI replications 25 Print input 1 Print explore 1 ALL Print olsq 1 Print outcomes Compute residuals 0 Print residuals 57 Print posterior means Print diagnostics Print intervals Max equations Level 1 size Level 2 X size Z size Parameters Level 2 parameters Input file Output file Verbose Monte Carlo Monte Carlo file Plot histograms Plot scatterplots Response variable Explanatory variables Random level 2 vars Random level 1 coeffs Level 2 outcome 1 OS anova in anova out 0 0 61 01 61 1 1 The single equation shows the integration of the Level 2 equations and the Level 1 equation in the same way as in chapter 2 equations 2 1 2 2 and 2 4 It is displayed directly below the model specification Part 3 consists of the data descriptives optionally given by the use of the keyword descriptives all under the PRINT statement These statistics are displayed in two major blocks and are preceded by the number of Level 1 and Level 2 units Data descriptives Data descriptives for all units Level 1 units S179 missing Level 1 units 0 correct Level 1 units 179 correct Level 2 units 3 Var Mean Stddev Variance Skewness Kurtosis K S Z Prob Z 1 2 02 0 83 0 70 0 04 1 57 3 18 0 00 2 21 37 10 92 119 25 9 72 0 16 1 45 0 03 3 3
94. irical data sets however we think that extensive simulation options in particular options for bootstrapping are a very useful addition to a program for multilevel analysis Therefore four different simulation methods are implemented in MLA 1 A bootstrap method that uses the estimated parameters as true values of the parameters of a multivariate normal distribution from which new outcome variables are drawn This method is called the parametric bootstrap 2 A bootstrap method that uses the observed values of outcome and predictor variables for resampling Thus whole cases are resampled Therefore this is called the cases bootstrap 3 A bootstrap method that uses estimates of the error terms at both levels for resampling In contrast with the cases bootstrap this method leaves the regression design unaffected This is called error bootstrap or residual bootstrap Because the error terms at both levels must be estimated in order to be resampled we need the estimates for the separate Level 1 models random Level 1 coefficients As was explained earlier there are multiple choices for these coefficients These choices account for additional options that can be used when applying the error bootstrap 4 The jackknife With this method one entire group is deleted for each resample There are as many resamples as there are groups Depending on the type of simulation used in MLA and depending on the nature of the data the user can de
95. is Colorado State University Fort Collins Longford N T 1987 A fast scoring algorithm for maximum likelihood estimation in unbal anced mixed models with nested random effects Biometrika 74 817 827 Longford N T 1990 VARCL Software for variance component analysis of data with nested random effects maximum likelihood Princeton NJ Educational Testing Service Maddala G S 1977 Econometrics Singapore McGraw Hill Magnus J R 1978 Maximum likelihood estimation of the GLS model with unknown param eters in the disturbance covariance matrix Journal of Econometrics 7 281 312 Magnus J R amp Neudecker H 1985 Matrix differential calculus with applications to simple Hadamard and Kronecker products Journal of Mathematical Psychology 29 474 492 Magnus J R amp Neudecker H 1988 Matrix differential calculus with applications in statistics and econometrics Chichester Wiley Markus M T 1994 Bootstrap confidence regions in nonlinear multivariate analysis Leiden DSWO Press Mason W M Wong G Y amp Entwisle B 1983 Contextual analysis through the multilevel 98 linear model In S Leinhardt Ed Sociological methodology 1983 1984 pp 72 103 San Francisco Jossey Bass Matsumoto M amp Nishimura T 1998 Mersenne Twister A 623 dimensionally equidistributed uniform pseudo random number generator ACM Transactions on Modeling and Computer Simulation
96. is of variance DATA file sesame dat variables 3 id2 1 MODEL b1 ul v3 bl e PRINT descriptives all olsquares yes END The top of the output is the MLA title page It supplies information about the name and origin of the program and cannot be suppressed MMMM MMMMM LLLL AAAAAAAA MMMMM MMMMMM LLLL AAAAAAAAAA MMMM M MMMMMMM LLLL AAAA AAAA MMMM MM MMM MMMM LLLL AAAA AAAA MMMM MMMM MMMM LLLL AAAA AAAA MMMM MM MMMM LLLL AAAAAAAAAAAAAAAAAA MMMM M MMMM LLLL AAAAAAAAAAAAAAAAAAAA MMMM MMMM LLLL AAAA AAAA MMMM MMMM LLLL AAAA AAAA MMMM MMMM LLLL AAAA MMMM MMMM LLLLLLLLLLLLLLLLLLLLLLLLLLLL AAAA MMMM MMMM LLLLLLLLLLLLLLLLLLLLLLLLLLLLLL AAAA AAAA Multilevel Analysis for Two Level Data AAAA AAAA Version 4 1 AAAA AAAA Developed by AAAA Frank Busing AAAA Erik Meijer AAAA Rien van der Leeden AAAA AAAA Published by AAAA Leiden University AAAA Faculty of Social and Behavioural Sciences AAAA Department of Psychometrics and Research Methodology AAAA Wassenaarseweg 52 AAAA P O Box 9555 AAAA 2300 RB Leiden AAAA The Netherlands AAAA Phone 31 8 71 5273761 AAAA Fax 31 0 71 5273619 AAAA Except for the title page and the optional input part every part contains a header The header is always the same and is made of a few lines of standard text and the title of the analysis as supplied by the user For this first example it reads MLA R Multilevel Analysis for Two Level Data Version 4 1 12 15 2005 Copy
97. l This option is only in effect when SIMULATION is chosen If so all parameters may be specified and histograms will be displayed of the estimates of the different replications of the specified parameters 3 9 2 scatters optional Scatterplots can be obtained for prediction and residuals Specifying prediction produces a scat terplot of the response variable versus the predicted values based on the estimated fixed parame ters Specifying a variable produces a scatterplot of this variables versus all residuals associated with this variable 53 54 Chapter 4 Worked examples In this chapter a few examples will be discussed In each example the input file will be given and a relevant piece of the output file will be shown and discussed The input files and corresponding data files are included in the MLA distribution so these analyses can be repeated by the user The output of MLA consists of a single text file which is the second parameter in the statement that starts program execution see p 41 The output consists of several parts and each part starts with a brief heading The analyses presented here range from a simple analysis of variance to a bootstrap analysis for a complicated two level model It is not our intention to give extensive examples of case studies Rather the examples discussed here are intended to give insight in how to use MLA for different analyses and glance at specific parts of the output 4 1 Random
98. le shows the application of the bootstrap option of MLA as well as several other options TITLE Complete multilevel model DATA file multi dat variables 6 id1 3 id2 2 missing v4 0 6888 center v6 MODEL bl gl g2 v6 ul b2 g3 g4 v6 u2 1 b2 v5 e TECHNICAL estimation fiml minimization bfgs reparameterization root warnings 50 maxiter 500 seed 1041245 convergence 1 0E 12 SIMULATION kind bootstrap method residuals type shrunken balance unbalanced linking unlinked replications 200 INTERVAL kind bias corrected alpha 0 05 PRINT input yes descriptives v4 v5 v6 2 3 olsquares yes random level 1 coeffs all residuals ul u2 posterior means all diagnostics yes PLOT histograms g2 93 94 scatters predicted v6 v5 Note the declaration of the missing value on the dependent variable This removes one observation from the analysis The following part gives the FIML estimates of the parameters Full information maximum likelihood estimates BFGS 66 Fixed parameters Label Estimate SE T Prob T G1 0 936989 0 190592 4 92 0 0000 G2 2 058120 0 327704 6 28 0 0000 G3 0 956199 0 127996 7 47 0 0000 G4 0 737448 0 219790 3 36 0 0008 Random parameters Label Estimate SE T Prob T U1 U1 0 467356 0 198821 2 35 0 0187 U2 U1 0 050104 0 094816 0 53 0 5972 U2 U2 0 160850 0 088703 1 81 0 0698 E 1 096541 0 109506 10 01 0 0000 Co
99. learly of order N7 Let A gu N pP Then the total sample size can be written as N N jh j Hence n By b b E h Oy Oy d 2 44 2 2n73 N hN RN and b b DN a h 13NS ME Une UN M UA Combining 2 44 and 2 45 gives EO 0 2 45 E h Oy hj De by 1 1 b 1 1 SUS I iE 1 Jj J DNG j J b h b h 2h 1 6 t e 2 46 Finally to prevent loss of efficiency the weighted average of the J possible estimators is used cf Quenouille 1956 This gives J N EJ rn G ae hj B hj j J N ME j l The expectation of 2 47 is N J 225 06 fy hj 06 j i hj L 1 by 4 2h 1 Daj E h 1 J j l j l J E l M z so that the bias is of order N if b 0 and if N is relatively small compared to N The corresponding estimator of the variance of Oren gt based on the pseudo values J a 6 2 h Oy h E Doga J 214 is given by J 1 1 x 2 2 Er 2 hd i T ja 1 f gt 5 2 TEN a h 1 2 48 J J 2 NA JO x amp k 1 Note that when all groups are of equal size 2 40 follows from 2 47 that is the delete m j jackknife estimator reduces to the delete m jackknife estimator Analogously 2 48 reduces to the expression for the delete m jackknife variance estimator 2 41 2 10 2 Jackknife confidence intervals
100. lly distributed and the expectation and covariance matrix of y jare Ey Xyy A 4 Vi XN X yy oly ZOZ A 5 Consequently the probability density of y j is 1 10 X y V X y fO jJ e 20 Xjy j vj 2m det vue so that the contribution of Level 2 unit j to the minus log likelihood function is L hina log f y p a l l el ci log 27 2 log det V 20 xy V O z Xy and the minus log likelihood function for the whole sample is Leo j l N 1 5 log 27 z 2 logdet V Ts is XyyV o Xy A 6 where J is the number of Level 2 units and N is the total number of Level 1 units N where j is the number of Level 1 units in Level 2 unit j This is the function that has to be mini mized with respect to the parameters to obtain maximum likelihood estimators To minimize this function the program uses the Broyden Fletcher Goldfarb Shanno BFGS minimization method see e g Nocedal amp Wright 1999 which uses the gradient of the function to be minimized In section A 3 computationally efficient formulas for the function and the gradient will be derived In section A 4 the asymptotic covariance matrix of the estimators will be derived In section A 5 a reparametrization of the model will be discussed in which the restriction of positive semi definiteness of covariance matrices is explicitly imposed But first in the next section some useful notation matrices an
101. lysis Multilevel analysis can often be an appropriate analysis method for repeated measures data see e g Van der Leeden 1998 Here we illustrate the use of MLA for repeated measures data on the frequently analyzed Rat data set The first use of these data with multilevel analysis appeared in Strenio Weisberg amp Bryk 1983 The Rat data consist of the weights of 10 rats These rats were measured five times with four week intervals from birth Also included in the model is the weight of each rat s mother V2 Divided into two levels the equations are given by yg Bij t Posi t Bi YW Uy js Paj Ys YAW Up where X j V5 is the age in weeks divided by 4 minus 2 so that it is in deviation of the mean of the rat and W V2 represents the weight of the mother The input file for the repeated measures example is as follows 62 TITLE Rat Data Repeated Measures 5 timepoints DATA file rat dat variables 4 id2 3 center v4 MODEL b1 g1 g2 v2 ul b2 g3 g4 v2 u2 vl 1 b2 v4 e TECH estimation reml minimization em maxiter 512 PRINT olsquares yes random level 1 coeffs all residuals ul u2 e posterior means all diagnostics yes For each rat a multiple regression analysis is performed and displayed in the Random Level 1 coefficients part This part is optional and displayed through the use of the random level 1 coefficients keyword in t
102. maximum likelihood Murphy amp Topel 1985 because in the first step O and o are estimated by minimizing the restricted 2 log likelihood L and in the second step the likelihood of y is maximized with respect to y conditional on the first step estimators of O and ez In the terminology of Gong and Samaniego 1981 and Parke 1986 this is called pseudo maximum likelihood but this term is ambiguous because other authors use it for maximizing an incorrect likelihood such as maximizing L when the normality assumption is not met This is called quasi maximum likelihood by White 1982 Note that in the output of MLA the two step maximum likelihood estimates of y are somewhat incorrectly printed under the restricted maximum likelihood heading 2 7 Residuals The total residuals are given by Equation 2 18 T y Xy 21 In this equation estimated residuals r are based on the fixed parameter estimates y from the maximum likelihood estimation although other estimates of y could be used as well The raw residuals for the first level are taken from the within group model 2 1 aa E X By 2 30 where the estimates B are the OLS estimates from 2 15 Using the between group model from Equation 2 2 and the OLS estimates from Equation 2 15 the Level 2 raw residuals are u Bj Wy 2 31 where stems from Equation 2 19 The Level 2 shrunken residuals are given by AVI o7 2 i gt i ZO Z
103. me Street data set is used but now a random effects analysis of covariance is performed on these data The model to be estimated is Yj Y Xij Uj Ej 4 2 where y is the intercept X j is the covariate with slope y u j is the Level 2 error component and ej is the Level 1 error component Equation 4 2 can be divided into separate equations one equation for Level 1 and in this case two Level 2 equations yg Bij t Poyi t u p P gc Yos Along with the other statements the input file now becomes TITLE analysis of covariance DATA file sesame dat 3 1 variables id2 MODEL b1 g1 ul b2 g2 v3 bl b2 v2 e PRINT olsquares yes Note that in the equation where b2 is the outcome there is no error term This ensures that b2 is a fixed coefficient Here is the output of the OLS estimation part Ordinary least squares estimates Fixed parameters Label Estimate SE G1 14 672451 1 621040 G2 0 764871 0 067590 Random parameters Label Estimate SE 1 96 968087 10 307591 U1 U1 7 217027 5 892678 2 88 980063 9 458474 E 1 one step estimate of sigma squared ignoring grouping E 2 two step estimate of sigma squared See documentation for further elaboration on these subjects Compared to the previous example a fixed parameter G2 is added in the OLS estimates part This is the regression coefficient of the Level 1 covariate containing the pretest sco
104. minimization method is BFGS which tends to be both fast and stable However for REML estimation only EM is currently supported 45 3 5 3 reparameterization optional The Level 2 covariance matrix should be a positive semi definite matrix To impose this restric tion the parameters can be written in the following way 6 LDL where is the covariance matrix L is a lower triangular matrix with ones on the diagonal and D a diagonal matrix with nonnegative diagonal elements MLA offers two reparameterizations with which the latter can be accomplished The root reparameterization writes D Da The parameters actually used in the minimiza tion are the subdiagonal elements of L and the s i e the square roots of the diagonal elements of D This method is the default and it can be requested explicitly by reparameterization root The logarithm reparameterization writes D exp Then the parameters actually used in the minimization are the subdiagonal elements of L and the s which are now defined as the natural logarithms of the diagonal elements of D This method can be requested by reparameterization logarithm In the output however the reparameterization is reversed and the estimates of are pre sented with their corresponding standard errors By specifying reparameterization none no reparameterization is done This has the drawback that the estimated covariance matrix may not be positive d
105. mulation IP pda D vou wg EE vq duni bue aeg 23 2 10 1 Ehe jackEnife ee tette bed ae tan al 25 2 10 2 Jackknife confidence intervals llle 28 210 3 Fhe bootstrap a dre o eo ee ee A a 28 2 10 4 Balanced bootstrap eA 30 2 10 5 Resampling regression models 30 2 10 6 Resampling multilevel models 31 2 10 7 Bootstrap confidence intervals llle 33 2 11 Effective sample size and intra class correlation 37 2 12 Missin amp d ta exer eere OU d cee a oe s 39 3 Commands 41 3l TITLE optional week spe x9 wee cake 4 A 3 2 DATA required e aS ees ge ae Sa e Sue dede S sat fea sq RA 42 32 1 file required ss 42 3 2 2 variables required dae aec va o3 ba 42 3 3 3 4 3 5 3 6 3 7 3 8 3 9 3 2 3 Tdl op onal lt 4 4044 a 4e dues See habe qas em hes 42 3 244 required nass o io e hacks Phe ha 42 3 25 missing optional 43 3 2 06 centering topina ee 43 3 2 7 level 2 centering optional 43 MODEL required a is 43 CONSTRAINTS optional 2s les 44 TECHNICAL optional 4 45 3 5 1 estimation oe c sa
106. nd in those cases in which it is not the percentile t confidence interval performs less well A complicated extension that aims at transforming the parameter to a near pivotal quantity is the variance stabilized percentile t interval see e g Efron and Tibshirani 1993 section 12 6 Efron s percentile interval The idea behind this interval is quite different from the ideas behind the bootstrap normal interval and its extensions It was stated above that H 0 is a consistent estimator of the distribution function of 0 Therefore an asymptotic 1 a confidence interval can be obtained by taking the relevant quantiles from H which leads to the interval Ge H A 1o 2 68 Efron s percentile interval does not rely on the asymptotic normality of 0 Its coverage perfor mance in finite samples is however frequently not very well because the end points of the inter val tend to be a little biased Note the difference with Hall s percentile interval Here percentiles of the distribution of 6 are approximated by percentiles of the distribution of 6 whereas in Hall s percentile interval percentiles of the distribution of 8 are approximated by percentiles of the distribution of 6 6 Efron s percentile interval is simply called percentile in the input and output of MLA Bias corrected BC and bias corrected and accelerated BC percentile intervals The BC and BC intervals have been introduced to correct for s
107. nditional intra class correlation given the values z j and z j In the conditional intra class correlation is computed for the hypothetical case where all elements of z j and z j are Zero except for the constant Then the conditional intra class correlation has the familiar form a zm i 0 702 where is the diagonal element of O corresponding to the constant Whether p is an in teresting value depends on the average and variability of z It is most relevant if z is centered around the grand mean except for the constant of course and it does not vary much Goldstein Browne and Rasbash 2002 p 225 study p n for the case z j and call it variance partition coefficient VPC They show that it may vary strongly with z so a single value like py may not be very relevant The intra class correlation depends on the values of the explanatory variables and so does the variance of the total residual We cannot expect to find a simple expression for the effective sample size involving these characteristics if these do not themselves have a simple expression that is the same for all observations However we can compute a kind of effective sample size in the following way Let be the asymptotic variance of the most efficient estimator of a fixed parameter y if the residuals have the same variances as in the multilevel model but are all independent and let y be the asymptotic variance of the most efficient es
108. nditional intra class correlation 0 47 1 10 0 47 0 2988 iterations 10 2 Log L 720 520970 An extract of the diagnostics is given below Note the reduction of the sample size due to the missing value Diagnostics Level 2 sample size 15 Total sample size 230 Mean Level 1 sample size 15 Effective sample size 44 Squared correlation coefficients Norm based R squared 0 627456 Grand mean based R squared 0 614517 Context mean based R squared 0 240201 Trimmed mean based R squared 0 485713 Level 1 outliers sorted by Prob Level 1 Level 2 Level 1 Unit Unit Unit T Prob 86 6 5 0 013931 0 988885 160 11 9 0 010492 0 991629 45 3 9 0 009729 0 992238 26 2 6 0 000077 0 999938 155 11 4 0 000071 0 999943 147 10 10 0 000013 0 999990 Level 2 Mahalanobis distances sorted by Prob M Unit M Prob M 7 5 052856 0 080910 4 3 803597 0 149863 15 2 866620 0 238823 11 2 709150 0 258324 2 1 940617 0 379085 12 1 340144 0 511717 6 1 095992 0 578134 67 19 1 062185 0 587987 5 0 805649 0 668441 3 0 760907 0 683561 8 0 425112 0 808517 1 0 410341 0 814510 13 0 163562 0 921474 14 0 075244 0 963077 9 0 031498 0 984375 Effective sample size N 1 N J 1 intra class correlation Squared correlation coefficients R squared are highly speculative in nature Prob M probability area under the curve of the chi square distribution See documentation for further elaboration on this subject See section 2 11 for
109. ngle level model to such data intra class dependency has an effect on the error terms It causes the error terms to be correlated The result is that the usual assumption of independent observations is violated if the nested structure of the data is ignored The degree of intra class dependency is reflected in the intra class correlation Obviously this idea of intra class dependency applies to every hierarchical data set Their intra class correlations however may differ substantially Multilevel models For the analysis of hierarchical data hierarchical models or multilevel models have been de veloped Such models can be conceived as linear regression models specified separately for each level of the hierarchy but statistically connected Since each level of the hierarchy has its own regression equation predictor variables measured at either level can be included in the appropriate level model Because hierarchical data structures frequently arise in social and behavioral science research but also in many other scientific areas the application and development of multilevel analysis has in the last decade drawn a lot of attention from numerous researchers Below a brief introduction of some relevant topics concerning multilevel models will be given A more comprehensive intro duction of these topics is given by Kreft and Van der Leeden 1994 For extensive discussions on theory and application of multilevel analysis we refer to the
110. nit Estimates of the regression coefficients and estimates of the error variance including their standard errors t ratios and exceedance probabilities of the t ratios per Level 2 unit are displayed in separate blocks with their Level 2 unit number and Level 2 unit size After the keyword B s and sigma may be specified 3 8 4 olsquares optional This part contains the ordinary least squares estimates for the fixed Gk and random variances and covariances of Uk and E parameters A regression analysis is performed ignoring grouping to obtain the former For the error variance two estimates are displayed the one step EC1 and two step EC2 estimates corresponding to 2 20 and 2 28 respectively The estimate of the covariance matrix of U is obtained from 2 24 3 8 5 residuals optional After the keyword both Level 1 and Level 2 residuals may be specified U and E For the first level three different types of residuals are displayed namely the total raw and shrunken resid uals The Level 2 residuals are the raw and shrunken residuals for the specified Level 2 compo nents These estimates are based on the FIML or REML estimates Formulas are given in section 2 7 3 8 6 posterior means optional Displayed are the posterior means 2 35 which are specified following the keyword These esti mates are based on the FIML or REML estimates of the parameters 3 8 7 diagnostics optional If diagnostics yes is requested sev
111. nits but is constant within each Level 2 unit Therefore as with grand mean centering we may try to include it in the random intercept This gives _ en i By Bij Pat YF YQW uy Y AW Uy jX The penultimate term introduces a Level 2 interaction variable say v which was not present in the original model If the original and transformed model are to be eduivalent then this means that v j should also have been included in the original model Hence this would seem to give the advice that the intercept equation should always include the interactions i e products of all Level 2 variables with all group averages of Level 1 variables just like we argued above that all Level 2 explanatory variables shoud be included in the intercept equation However in the current situation there may be strong substantive reasons why including these products would not be included Moreover inclusion of these products still does not render both models equivalent because the term u je j introduces a Level 2 heteroskedastic error term which does not fit into the MLA model specification Consequently viewing B j as part of the random intercept does not make the uncentered and Level 2 centered model specifications equivalent To obtain a model specification that is equivalent with uncentered and Level 2 centered ex planatory variables we must take 2 14 as the basis with B3j ys Yow Us Th
112. o A Zjv 9 2 J p ZVO Xm ziv o xl do 1 j Ms ZV Zao J Il 17 1 17 1 x Zivj o XV Zh 1 SR Il J 2 ziv o Xy Z v 6 sl do v 1 v l 2 AZ X v zj J aexzy z S T2 ziv x ay Ziv o Xp d f Ziv o Xy Z iv X 47 44 J D Combining A 44 with A 25 and A 26 the partial derivatives are found PL 1 1 1 1 ZV ZG V Z GV ZV 2 00 00 2 k EN L RU SV 2m Mis 417 1 v 1 josi AEV Ziu ZV O xo XYY zj J Il MN 417 1 117 1 1 1 ZA Zivjlo X00 Xy v zj ul Mis y 1 v 1 v l ziv o xpo X v zj Z v Zo s ll y 1 v 1 117 1 ZV O XO XV Z o zv J 117 1 417 1 117 1 v 1 2 v zou zu Z V ZA GS Zal J l J 417 1 ty7 1 7 1 DAG zou ZV O Xo 7 j l 47 1 v 1l v 1 Vj ZA ZV Oj Xo XV Za v 1l v 1l v 1l ZVT Zu ZV O Xo X v zj v 1 v 1 1171 T Z V Z ju z v O m E Xiy V A A 45 88 and e dor Z V Z iZi Zu a IMs x Z Vi Zi Z v o Xjpo Xp v zj gt IZ Xo Xr v zj Z v z9 Z uZ 2 a ll V 1 117 1 yl AA Zi Z V O Xo XV Za m pm ll y 1 v 1 v 1 ZV O X MO XV Zia Vj Zu
113. o PRM 93 04 Leiden The Netherlands Leiden University Department of Psychometrics and Research Methodology Busing F M T A Meijer E amp Van der Leeden R 1994 MLA Software for multilevel analysis of data with two levels User s guide for version 1 0b Tech Rep No PRM 94 01 Leiden Leiden University Department of Psychology Busing F M T A Meijer E amp Van der Leeden R 1999 Delete m jackknife for unequal m Statistics and Computing 9 3 8 Davison A C amp Hinkley D V 1997 Bootstrap methods and their application Cambridge UK Cambridge University Press De Leeuw J 2005a Centering in multilevel analysis In B S Everitt amp D C Howell Eds Encyclopedia of statistics in behavioral science Vol 1 pp 247 249 New York Wiley De Leeuw J 2005b Linear multilevel analysis In B S Everitt amp D C Howell Eds Encyclopedia of statistics in behavioral science Vol 2 pp 1054 1061 New York Wiley De Leeuw J amp Kreft I G G 1986 Random coefficient models for multilevel analysis Journal of Educational Statistics 11 57 85 De Leeuw J amp Kreft I G G 1995 Questioning multilevel models Journal of Educational and Behavioral Statistics 20 171 189 De Leeuw J amp Kreft I G G 2001 Software for multilevel analysis In A H Leyland H Goldstein Eds Multilevel modelling of health statistics pp 187 204 Chichester Wile
114. of the EM algorithm and one of the reasons why MLA uses BFGS by default The posterior means may be compared with the Level 2 outcomes As can be seen the poste rior means tend to be shrunken towards the grand mean and therefore have less variance than the Level 2 outcomes Posterior means Parameter B1 Unit Estimate 1 111 2477 2 122 9878 3 118 7899 4 103 2974 5 102 0644 6 98 6570 7 92 8984 8 110 2970 9 94 9923 10 116 5688 Mean 107 1800 Variance 107 2725 Parameter B2 Unit Estimate 1 28 2171 2 30 9801 3 32 3522 4 26 3462 5 25 4355 6 26 5173 7 24 0949 8 22 4476 9 25 6699 10 25 5391 Mean 26 7600 Variance 9 0631 Note posterior means shrunken estimates of random level 1 coefficients See documentation for further elaboration on this subject 4 4 Multilevel analysis with bootstrap Now we discuss a complex input file for a two level analysis with bootstrap resampling The data file contains computer generated data It contains 231 Level 1 observations in total in 15 Level 2 units There are six variables a constant a Level 2 identifier a Level 1 identifier a Level 1 dependent variable a Level 1 explanatory variable and a Level 2 explanatory variable 65 The model equations are given by yg Big Pop tei Bij yi QW tun Bj Ya YAWj Ug where y j is the dependent variable V4 X j V5 is the Level 1 explanatory variable and W V6 is the Level 2 explanatory variable The following input fi
115. ome bias in the endpoints of Efron s per centile interval 2 68 Assume that there exists a monotonically increasing function g such that me amp 0 N Z 9 1 2 69 The constant z allows for some bias in the estimator g 0 of and the constant a called the acceleration constant expresses the speed at which the standard deviation of the estimator 35 increases with the parameter being estimated In typical estimation problems a 2 and Zo OW From the likelihood based on 2 69 it can now be derived that the exact confidence interval for 6 is equal up to order O N to the BC 4 interval given by a Pelia H Pel ja where Zo Zla z za zo T tz and z 1 5a similarly defined Note that this interval does not depend on the specific trans formation g which follows from the invariance property of H discussed earlier In practice the constants and a have to be estimated but this does not alter the results up to order O N ry Moreover even if 2 69 does not hold the BC endpoints are correct up to order O N whereas in many cases the endpoints of the intervals discussed previously are only correct up to order 132 A simple consistent estimator of z is Qo B The estimation of a is the most important problem with the BC method If it is assumed that a 0 we obtain the BC interval which is discussed e g in Efron 1982
116. ome of the arguments given here were more pressing when we wrote the first version of the program 1993 1994 than they are now because some of the facilities of MLA have also been adopted by other packages at least to some extent Simulation options Much research concerning multilevel analysis has been directed towards the extension and refine ment of multilevel theory including the development of multilevel software and to applications in other domains than educational research At the same time however several relevant questions of a statistical nature concerning this development are still not answered fully satisfactorily One major problem is that estimates of parameters and standard errors as well as hypothesis tests based on them rely on large sample properties of the estimates Unfortunately little is known about the behavior of the estimates when sample size is small Raudenbush 1988 An additional problem is that it is usually assumed that the error terms are normally distributed In practice this assumption will often be violated which has other undesirable consequences for using standard error estimates for hypothesis testing and construction of confidence intervals Fortunately there is an increasing number of simulation studies available which give insight into the quality of estimates of parameters and standard errors under various conditions Busing 1993 Van der Leeden amp Busing 1994 Kreft 1996 Concerning emp
117. on 3 9 Chapter 2 Theory In this chapter the theoretical background of the general two level model used in MLA will be discussed It gives the relevant formulas of the model equations and discusses the assumptions underlying the model and some model specification issues Furthermore it briefly discusses the elements of the output of MLA the estimators standard errors confidence intervals and model fit Statistics that are implemented in the program and their statistical properties A lot of additional output can be optionally requested by the user such as descriptive statistics residuals and diagnostic statistics This is also explained here 2 1 The general two level model In MLA the following general two level model is implemented Suppose data are obtained from N individuals nested within J groups with group j containing N f individuals Now for group j j 1 J y j is a vector containing values on an outcome variable X j isan N jX4 matrix with fixed explanatory variables usually including a constant 8 j is a vector of regression coefficients and j is a vector with random error terms vectors and matrices of appropriate dimensions Then for each group j the Level 1 or within group model can be written as dcos 2 1 The Level 2 or between group model can be written as B Wiytu 2 2 where W is aqX p matrix with explanatory variables usually including a constant obtained at the group level y is a vector
118. ons times The results of the simulation analysis are used to compute the final bootstrap and jackknife estimates The results of a replica tion are not taken into account when the algorithm did not converge or when the estimate or its standard error was fixed to zero because it reached the boundary of its parameter space Further elaboration concerning this subject can be found both in the previous and in the next chapter 3 7 INTERVAL optional Several options for bootstrap confidence interval estimation are available in MLA Hence this state ment only has effect when the bootstrap is selected in the SIMULATION command The specific choices are made through a number of substatements 3 7 1 kind required With this substatement the user can choose from four methods namely normal bootstrap normal interval percentile Efron s percentile bias corrected percentile and bootstrap t percentile t 3 7 2 alpha optional This is i e 1 minus the confidence level of the confidence intervals As usual the confidence intervals are two sided with an estimated probability mass of ja 100 on each side The default value of alpha is 0 05 which gives 9560 confidence intervals Note that in some earlier versions of MLA alpha was half the value it is now i e a 9596 confidence interval was obtained with alpha 0 025 but we think the current specification is more natural 50 3 7 3 weight optional This substatement has im
119. ordinary least squares estimates part T values and probabilities for T are given Here unlike for the OLS estimates these are theoretically justified Full information maximum likelihood estimates BFGS Fixed parameters Label Estimate SE T Prob T G1 31 322474 3 123584 10 03 0 0000 Random parameters Label Estimate SE T Prob T U1 U1 26 935248 23 900119 1 13 0 2597 E 138 833328 14 799679 9 38 0 0000 Intra class correlation 26 94 138 83426 94 0 1625 iterations 5 2 Log L 1398 626571 Note that the estimates are very close to the two step OLS estimates Whenever there are residuals associated with the grand mean correlation is computed and given just below the FIML estimates The formula for the intra class correlation is 2 and in MLA notation UPU P IU E If the technical keyword is omitted from the PRINT statement a short description of the final iteration results is given in the FIML part Here convergence is reached in 5 iterations and yields a 2 Log L value of 1398 63 The final part of the output contains some information about whether everything went well This can also not be suppressed As we can see here the program is terminated correctly in less than 0 01 seconds This job required lots of memory and took 0 00 seconds of processor time warning s issued 6 error s detected End of job 60 4 2 Random effects analysis of covariance For the next example the same Sesa
120. percent sign 26 may appear throughout the input file All text on a line after and including the percent sign will serve as comment and is ignored as program input In the following all statements and substatements implemented are discussed and illustrated with small examples In Chapter 4 where we focus on the program output complete examples are provided 3 1 TITLE optional Following the keyword TITLE the first non blank line contains the title for the analysis Al though the statement is optional it is highly recommended Moments after the analysis all may seem clear but after a few months you may have no idea what you have done The title may be your only clue You may also enrich your input file with comments In contrast to comments the title is repeated on top of every part of the output Example TITLE MLA example 1 analysis of variance 41 3 2 DATA required The DATA statement contains information about the data file This statement has seven sub statements three of which are required The file substatement gives the name of the data file variables the number of variables in the data file 141 the optional variable number of the Level 1 identifier variable and id2 the variable number of the Level 2 identifier variable The missing substatement specifies which value of a variable indicates a missing value and centering and level 2 centering are options for centering the data before further analysis
121. plications for the internal bootstrap performed on the bootstrap t con fidence interval estimation A balanced bootstrap can be specified on this substatement In that case weight balanced must be specified Default is weight unbalanced 3 7 4 replications optional As for the previous substatement this substatement has also implications for the bootstrap t method The number of internal bootstrap replications is specified It must be an integer value between 1 and 32767 The default value is 25 3 7 5 convergence optional See the TECHNICAL statement As for the previous two substatements this substatement has only implications for the internal bootstrap in the bootstrap t method 3 7 6 file optional Results of the interval estimation can be written to a file Using the substatement file a filename may be specified Filenames must satisfy the usual DOS conventions on filenames 3 8 PRINT optional The PRINT statement gives the user control over the output Not all output is optional The default output consists of a title page an echo of the input the maximum likelihood estimates FIML and system information Output for the simulation analysis is generated whenever the SIMULATION statement is used Additional output is controlled by substatements following the OUTPUT statement These will be briefly explained below A more profound elaboration follows in chapter 4 Most theory underlying the different parts of the ou
122. r y are given by y zqQqxylxy 2 19 Using the estimated residuals F y Xy the estimate of the variance of the elements of r can be obtained by N 1 2 5 oO T 2 20 xd 2 20 where p is the dimension of y This estimate on is the one step OLS estimate of the variance of the residuals The usual standard errors for y and are respectively SG f F2AK X 2 21 aa e N p 2 22 These are however only correct if the classical assumptions of i i d normally distributed residuals are correct which is generally not the case for multilevel data Again future versions of MLA will provide more robust standard errors 2 4 3 Two step OLS total model With the two step OLS the same estimates are used as with the one step OLS see 2 19 The total residuals for every group j can be divided into a Level 2 and a Level 1 part This was already done in Equation 2 17 Using ordinary least squares estimates for the Level 2 random components u can be obtained by o NE la uc Zr 2 23 The estimate for the covariance matrix of u becomes O TR 2 24 Wy SIR J j l Under normality the estimated covariances of the elements of can be obtained by Anderson 1958 p 161 cov 0 Onn Opn Pin OnO m 2 25 19 Consequently the estimated standard errors of the elements of O are given by SO 6 0 02 2 26 By first computing the residuals
123. rap estimated distribution function of this quantity i e 0 6 4b lt t sep 8 34 and let D and fia be the ja th and 1 1a th quantiles of G respectively that is D a G 5a and ig G 1 The percentile t interval is obtained by replacing z lo by D and Mle bY Tiie in 2 62 and is thus 8 1 5650 0 15 148 2 67 This confidence interval requires an estimate sep 0 of the standard deviation of 6 for each bootstrap resample b This is usually obtained by performing a small bootstrap within each boot strap resample Thus for example B 1000 bootstrap samples are drawn with replacement from the original sample and within each sample b 1 B B 25 samples are drawn with re placement from the bootstrap sample From the B samples seg 0 is obtained This means that B B bootstrap samples have to be drawn and B B times the estimator of 0 has to be computed In the example this amounts to 1000 25 25 000 bootstrap samples and 25 000 times computing the estimator The percentile t interval tends to perform better than the bootstrap normal and Hall s per centile interval because it uses the nonnormality of the distribution of the estimator as opposed to the former and t is more nearly pivotal than Oo in a number of important cases which means that its distribution depends less on the parameters that are being estimated The quantity 7 is not always nearly pivotal however a
124. re 61 The parameter estimate for the regression coefficient of the covariate is also added to the FIML output part The additional T value and Prob T indicate that the pretest variable explains a significant part of the variance of the posttest variable T 10 18 Prob T 0 0000 Full information maximum likelihood estimates BFGS Fixed parameters Label Estimate SE T Prob T G1 16 196937 2 226478 7 27 0 0000 G2 0 699891 0 068761 10 18 0 0000 Random parameters Label Estimate SE T Prob T U1 U1 6 766783 6 759617 1 00 0 3168 E 89 831170 9 576024 9 38 0 0000 Intra class correlation 6 77 89 83 6 77 0 0701 7 1318 217264 iterations 2 Log L Entering the covariate into the analysis is justified because it has a statistically significant non zero effect The same justification could be made with the use of the likelihood ratio test This test is based on the fact that the difference between minus two times the loglikelihood function value 2 Log L of two nested models follows a chi square distribution with the number of degrees of freedom equal to the difference in the number of free parameters The two models example 1 and example 2 are nested and the likelihood ratio test can be applied The difference between the function values is approximately 1399 1318 81 and the degrees of freedom is equal to 1 The likelihood ratio test corroborates that the effect is highly significant 4 3 Repeated measures ana
125. re If O and o are known the most efficient estimator is the GLS estimator y X V X X V ly where y is the column vector consisting of all yj j X is the matrix of explanatory variables corresponding to the fixed coefficients which in this case is a 37 column vector consisting only of ones and V is a diagonal block matrix consisting of blocks V Z jOZ c I Np where Z j is a matrix of explanatory variables corresponding to the random coefficients which in this case is also a column vector but of length N f instead of N The variance of y is 2 Qv txy y Ys i 1 N Dp and thus the effective sample size is J 1 N Dp Usually of course and are not known and must be estimated However this does not influence the asymptotic variance of the GLS estimator The FIML estimator and two step ML estimator based on the REML estimator are of this form In any case the effective sample size is smaller than the actual sample size Now consider the following simple random effects ANCOVA model Vig Vi YaXij uj which is the random effects ANOVA discussed above extended with the covariate x p which has a fixed coefficient The matrix X is now N x 2 with the first column consisting of ones and the second consisting of the observations x p The covariance matrix V is unaltered and the asymptotic covariance matrix of the GLS FIML two step ML estimators of the fixed coeffi cients y y
126. res that are commonly used to estimate the parameters of multilevel models e A fast algorithm using the Broyden Fletcher Goldfarb Shanno optimization method to obtain maximum likelihood estimates of all model parameters The first point must perhaps be amended a little in light of the graphical user interfaces common today although we still think that the command syntax used by MLA is very easy and intuitive The MLA program runs as a stand alone batch program in command windows on personal computers under Windows It uses simple ASCII text files as input and output The program is easy to use by means of a number of statements starting with a keyword Models are specified by simply formulating the model equations This manual provides the necessary information for the new user to fit multilevel models with two levels to a hierarchical data set It is expected that the user has basic knowledge of regression analysis A brief introduction to multilevel analysis and related concepts is given in the first chapter References to relevant literature can be found in the text iii The MLA program has been developed by Frank Busing Erik Meijer and Rien van der Leeden Although the research subjects that led us to write MLA are not our main subjects anymore and thus work on MLA does not have our highest priority anymore as well we would still appreciate hearing about your experiences using the program and this manual Please contact us by email bu
127. right 1993 2005 Leiden University All Rights Reserved Part 2 Thu Dec 15 10 15 27 2005 analysis of variance The first proper part Part 1 of the output contains an echo of the input file statements This part is always included in an output file 56 Inputfile statements 1 TITLE 2 analysis of variance 3 DATA 4 file sesame dat 5 variables 3 6 id2 mof 7 MODEL 8 b1 g1 ul 9 v3 bl e 10 PRINT 11 input yes 12 descriptives all 13 olsquares yes 14 END 14 lines read from anova in Part 2 is the first optional part of the output It is triggered by the input keyword under the PRINT statement It contains extra information about the input and the output Specifically the input statements are digested and re displayed and all used and unused options are spelled out Input information Required Name of datafile SESAME DAT Number of variables dog Level 2 id column 1 Equation 1 B1 G1 U1 Equation 2 V3 B1 E Single equation Optional Title of analysis V3 E0 G1 U1 analysis of variance Level 1 id column 0 Centering Level 2 centering Estimation method 1 Minimization method 1 Reparameterization 1 Maximum iterations 100 Convergence le 010 Warnings maximum 25 Kind of simulation 0 Simulation method 0 Simulation balance 0 Simulation linking 0 Residuals type 0 Resampling type 0 Initial random seed 0 Simulation convergence 1e 010 Number of replications 0 S
128. roduce the different bootstrap confidence intervals we will introduce some useful notation Let z be the standard normal distribution function Then z is the a th quantile of the standard normal distribution z Let the distribution function of the estimator 0 be H 0 that is H 0 0 A consistent estimator of this distribution function is obtained from the B bootstrap replications 07 b 1 B of 6 b OF 6j H 0 2 59 Note that H is invariant under monotonic transformation in the sense that if is a monotoni cally increasing function of 6 then the estimate of its distribution function is b 6 g 0 B This property has been used in the derivations of some of the confidence intervals described below g 0 H 0 Bootstrap normal confidence interval If the assumptions of the model including the normal ity assumptions hold then the estimators are asymptotically normally distributed with a certain covariance matrix derived from the likelihood function Hence for our typical parameter 0 we have VN NO v 2 60 say The distribution of o 0 can be approximated by the normal distribution with mean zero and variance Y N where v is a consistent estimator of y derived from the likelihood function The usual confidence intervals for 6 are therefore zio 86 0 0 211 58 yO 2 61 where vy Vain is the estimator of the asymp
129. s i IM N my M a Il Ms i M ll Therefore 1 J wae 24 and using A 22 A 23 and A 24 36 Mey J j J 1 E 32 eZ z del Xy V V do 10 17 2 XVX dy J 2 3 tr Z V Z j l XV Oj Xm do Joc Xy VP ZIA OZ V y 2 X yy V X dy J 72 4 j l 117 3 2 Xiy V Xiy do ls 0 Xy V z ae XyyYVj X dy A 40 v Y6 XVP A 41 j l Vi Zyu v 1 117 2 AV 0 Xp XjYV7 z v 1l 117 2 Ziv o XV Zi gh A 42 86 aL 1 j l J X Ziv O Xp Xp V Z A 43 j l From A 31 we have J J Z 117 1 B Zu 2 Xn ziv o x J ge Thus OL d E 1 ss y 52 ja J 1 1 117 1 2 zavo Xj Zjv J J 2 zivi o xy ZV 0 x an T i 7 gt ZiV X dy Ziv o Xj a Il 2 nn E Xy Zv x dy M t dc V Z dO Z V Ze Sa Il gt 21 do V Z d ZiV vio Xn a Il x Z v o J v 1l 2 zv o Xp 2 2 1 117 1 x ZIV do V Z d Z V lo J Yn Zro xp j l S 23 z An o Xy z v x dy using A 21 j l 87 J Wy 7 2 245 Zu j l J d
130. sample of from a normal distribution with mean zero and variance a where G is the estimate of o from the original sample Then a bootstrap sample yy of y is computed from the follow ing equation y Axi 2 54 where and B are the estimates of and f from the original sample The situation is different if x is regarded as a fixed design variable chosen by the exper imentor This happens for example if x is the dose of some drug administered to rats by the experimentor Then each bootstrap sample should have exactly the same x values that is x7 x for each i in each bootstrap sample The parametric bootstrap is in this case simply obtained by 2 54 with x The nonparametric bootstrap is in this case however completely differ ent from the nonparametric bootstrap with random x In this case first the errors are estimated 30 from the original sample by 8 y d Bx 2 55 Then bootstrap samples lej are drawn from fei ane and bootstrap samples of y are obtained analogously to 2 54 y Bx ef 2 56 Then bootstrap estimates of the parameters and bootstrap estimates of the covariance matrix of the parameters are obtained in the usual way e g Efron 1982 pp 35 36 The jackknife can also be implemented straightforwardly in regression models One com plete case is removed from the sample for each for the ungrouped jackknif
131. sing fsw leidenuniv nl e meijer eco rug nl or vanderleeden fsw leidenuniv nl We would like to thank Jan de Leeuw Ita Kreft Wolfgang Langer and Tom Wansbeek for helpful discussions comments references etc Frank M T A Busing Erik Meijer Rien van der Leeden Leiden and Groningen December 2005 Contents 1 Introduction 1 1 1 Introduction to multilevel analysis 1 1 2 ThepositionofMLA 2 0 0 ss 5 13 Changes since version 1 0b 0 000000 eee eee 7 2 Theory 9 2 1 The general two level model 000000000000 9 2 2 Descriptive Statistics azione UR RR Sh nee Sea ea a 10 2 3 uos soe heed Meg Veo EP Nee PRONUS A s 12 2 3 1 Centering in linear regression ele 12 2 3 2 Centering around the grand mean in a multilevel model 13 23 3 Rosen od S Repo reb RUP pue won 15 2 4 Ordinary least squares estimators 18 24 11 Within group models les 18 2 4 3 One stepOLS 1 19 2 4 3 Two step OLS total model 19 2 5 Full Information Maximum Likelihood 20 2 6 Restricted Maximum 1 1 4 21 s Geet ME Soc PEGE SN EV D ES 21 2 8 Posteriormeans ee 22 2 9 ODIaPIOSUCS t 3s To ovo em US Rd rea eu eium 23 2 10 Si
132. te of the Centre for Multilevel Modelling http multilevel ioc ac uk cur rently August 2005 contains the following list of programs that can be used for multilevel mod eling aML EGRET GENSTAT HLM LIMDEP LISREL MIXREG MLwiN Mplus R SAS S Plus SPSS STATA SYSTAT WINBUGS Remarkably they do not include MLA Terminology In the literature multilevel models are referred to under various names One may find the terms multilevel mixed effects models Goldstein 1986 multilevel linear models Mason Wong amp En twisle 1983 hierarchical linear models Raudenbush amp Bryk 2002 random coefficient regres sion models De Leeuw amp Kreft 1986 random parameter models Aitkin amp Longford 1986 contextual effects models Blalock 1984 full contextual models Kreft amp Van der Leeden 1994 and variance components models Aitkin amp Longford 1986 Although there are minor differ ences all these models are basically the same In one way or another they are versions of the multilevel model discussed here or straightforward extensions thereof 1 2 The position of MLA This manual describes the use and capabilities of MLA This program has been developed to ana lyze data with a two level hierarchical structure In this section we will explain the features that distinguish MLA from other programs mentioned above In other words we are concerned with the question What is special about MLA It should be noted that s
133. ted the more matrix weight it gets and the closer the posterior means are to the within group estimates A j can be called an estimated reliability matrix cf Bryk amp Raudenbush 1992 p 43 and the estimated prior expectation Wiy of B p the 2 9 Diagnostics Currently the only option for diagnostics performed by MLA apart from the descriptive statistics is outlier detection Although the term outlier seems to be unambiguous this is not completely true An outlier is considered to be a deviant observation in the data and not a deviant residual after model estimation However a procedure fitting outliers in the data as residual outliers is considered to be a robust procedure Outliers are in MLA detected using residuals So we expect MLA to be robust against data outliers and therefore look for residual outliers More research in the field of robustness for multilevel models would be useful however The detection of outliers differs for Level 1 and Level 2 outliers For both levels the shrunken residuals are considered For the first level the quotients 2 37 2 are calculated where 1 2AL z2 o N A Ejj ij is the variance of the Level 1 residuals Residuals will be displayed whenever the quotient 2 37 when compared to a standard normal distribution has a p value less than some possibly user specified value The default value is 0 1 For the Level 2 outliers the Mahalanobis distances of
134. the estimators 6 namely 6 De 16 8 The variance of 0 given F y is estimated by the variance of the estimators 6 var 0 gt gt 6 B The bias of 6 is estimated by the estimated bias of 6 bias bias 9 6 2 51 and the bias corrected estimator of 0 is therefore 0 6 bias 20 8 2 52 The variance of 0 is simply estimated by the variance of 67 2 53 Me mmt I cs EUN 1 var var 6 7 b 1 The bootstrap as described above can also be termed the nonparametric bootstrap because the distribution the bootstrap samples are drawn from is the nonparametric empirical distribution function Fy Frequently however it is assumed that F is a specific distribution F only de pending on a parameter or parameter vector which may or r may not be the same parameter as 0 Then if is estimated by F can also be estimated by F 9 instead of Fy If the distributional assumption about F is correct this parametric empirical distribution ee will generally be a better more efficient estimator of F The parametric bootstrap is defined exactly analogous to the nonparametric bootstrap except that bootstrap samples are drawn from F y instead of F y This means that no longer samples are drawn with replacement from the original data but from a generally more smooth distribution function Hence the values of the z in the bootstrap sample will usually not be values also enco
135. the mean of the estimates over all groups The amount of shrinkage depends on the reliability of the estimates from the separate groups The less precise the estimates are the more they are shrunken towards the mean over all groups Technically the shrunken estimators are the expectations of the random coefficients given the parameter estimates and the data of all groups Estimation Fitting a multilevel model amounts to fitting one combined model instead of separate models for each level It is the translation of the idea that although separate models for each level may be formulated they are statistically connected as was mentioned above The combined model contains all relevant parameters In the next chapter we will further clarify this subject Combined models or multilevel models can be viewed as special cases of the general mixed linear model cf Harville 1977 Such models are characterized by a set of fixed and a set of random regression coefficients The parameters that have to be estimated are the fixed coefficients and the variances and covariances of the random coefficients and random error terms The fixed coefficients are informally called fixed parameters and the variances and covariances of the ran dom coefficients and random error terms are informally called random parameters although all these parameters are technically nonrandom They are the parameters associated with the fixed and random parts of the model resp
136. the number of bootstrap replications is specified It must be an integer value between 1 and 32767 2P 1 The default value is 100 This number is usually considered sufficient for bias correction and computation of standard errors but for computing bootstrap confidence intervals a value of 1000 or more is needed This has also been concluded in another context by Markus 1994 3 6 8 convergence optional See the TECHNICAL statement Specifying the convergence substatement within the SIMULATION statement has only implications for the convergence during simulation 49 3 6 0 file optional Results of the simulation analysis can be written to a file Using the substatement file a filename may be specified Filenames must satisfy the ususal DOS conventions on filenames For each replication the following results are written to the file in ascii space separated 1 global information e replication number e luxury level obsolete e seed e 0 obsolete e 0 obsolete e number of iterations until convergence e the minimum of the 2 log likelihood function 2 estimation results triplets containing e parameter number e estimate e estimated variance of estimator square of standard error of each parameter The parameters are in the following order oe c Yp Op Oj 610 O4 O g Where p is the dimension of y and q is the dimension of each pb qq The estimation results are thus repeated replicati
137. the outcome variable solves the problem In stating this we have ignored the difference between ej and EF as usual The price of Level 2 centering of the outcome variable is that the intercept j has dropped out of the equation In many social science applications the intercept is not very interesting but in a multilevel model the Level 1 intercept is usually important if only because the main effects of the Level 2 explanatory variables are channeled through the random intercept Therefore in many cases one would not be willing to reduce the model by Level 2 centering of the outcome variable Then given the non equivalence of the resulting models the decision whether or not to use Level 2 centering must primarily be based on substantive arguments There are some additional issues with Level 2 centering that have not been stressed in the cited literature Even if there is a good substantive reason to assume that the group mean centered variable is the most relevant explanatory variable one would assume that the really relevant vari able would not be x j with the sample average subtracted but with the population mean X j Say subtracted Except for some kinds of laboratory experiments it does not make sense to assume that the outcome only depends on the average of x for the units in the sample and not on the units not included in the sample Thus the correct model would be Yy Bij X98 By By x X Ba I
138. timator of y with the intra class dependency present With these definitions the effective sample sizes in the cases studied above can all be written as ij 1k ESS 2 72 As stated above and proved in section A 4 Y is the k th diagonal element of the matrix X Similarly Yy is given by the same formula but with V replaced by the matrix that has the same diagonal elements as V but has all its off diagonal elements equal to zero With these values of WY and yng we define 2 72 as the effective sample size for the k th fixed parameter Note that we have only considered effective sample sizes for the fixed parameters If one is interested in the random parameters it does not make much sense to make comparisons with an independent sample because the random parameters are mainly substantively interesting for modeling the intra class dependencies Therefore effective sample sizes for these are not par ticularly interesting However most random parameters can be estimated consistently from an independent sample because they induce a certain parametric form of heteroskedasticity Only the Level 1 variance o and the Level 2 variance corresponding with the constant cannot be disentangled Only their sum can be estimated consistently from an independent sample So for all other random parameters an effective sample size could be computed but this is not done in MLA because it is not very interesting because it is considerabl
139. tion minimization algorithm minimization the repa rameterization of the parameters to ensure positive definiteness of estimated covariance matrices reparameterization the maximum number of warnings warnings the maximum number of iterations maxiter the convergence criterion convergence the random seed to be used for the simulations seed and the possibility of writing intermediate iteration results to disk file If this statement and subsequent substatements are not specified the program will run using default values Example TECHNICAL estimation fiml estimation method fiml maxiter 18 maximum number of iterations equals 10 convergence 0 00001 function convergence set to 0 00001 file tech out technical results will be written to tech out 3 5 1 estimation optional The substatement estimation provides the opportunity to set the estimation method One can choose between fiml and reml The default method is fiml which represents full information maximum likelihood estimation reml is restricted maximum likelihood estimation Both proce dures are described in chapter 2 3 5 2 minimization optional This substatement sets the minimization method One can choose between BFGS using the Broyden Fletcher Goldfarb Shanno variant of the quasi Newton minimization method e g No cedal amp Wright 1999 chap 8 and EM the Expectation Maximization algorithm Dempster Laird amp Rubin 1977 The default
140. totic standard deviation of 0 Under mild regularity conditions the estimators are asymptotically normally distributed even if the random terms in the model are not In that case S y may not be a consistent estimator of the standard deviation of the estimators of the variance components although it is still consistent for the fixed parameters This suggests replacing s in 2 61 by a bootstrap estimator This gives the boot strap normal confidence interval 0 21 58 0 O zi 1 585 0 2 62 33 in which e is the bootstrap estimator of the standard deviation of 9 Alternatively one might use 05 zia Seg 0 Og Zi 1o Seg 0 j 2 63 where 9 is the bootstrap bias corrected estimator of 0 The bootstrap normal confidence interval relaxes the assumption of normality of the data but still heavily relies on the asymptotic normality of the estimators In finite samples however the estimators may not be approximately normally distributed Busing 1993 Hall s percentile interval Hall s percentile interval Hall 1992 p 12 takes the bootstrap nor mal interval 2 62 as its starting point That interval is based on the idea that Pr 8 21 950g lt 0 21 4058 l a 2 64 because 0 is asymptotically normally distributed and s KO is a consistent estimator of its standard deviation In finite samples however the distribution of may not be approximately normal Busing 1993 Therefore instead of usin
141. tput can be found in chapter 2 Example PRINT input descriptives random level 1 coefficients olsquares residuals posterior means diagnostics yes display digested input statements V1 display variable descriptive statistics Bl sigma no U2 E B1 yes 3 8 1 input optional The subcommand input yes requests extra information about the input and the output Specif ically the input statements are digested and re displayed and a single equation form of the model as in 2 3 is displayed and all used and unused options are spelled out from which the input can be checked 51 3 8 2 descriptives optional If this is requested a few simple summary statistics are displayed After the keyword descriptives the user may specify both variables and Level 2 identification codes descriptives all means all variables and all Level 2 units For the total sample and for every Level 2 unit specified the following statistics are com puted and displayed mean standard deviation denoted by Stddev variance skewness kurto sis Kolmogorov Smirnov s Z denoted by K S significance level of Z denoted by Prob Z minimum 5th percentile P5 first quartile Q1 median third quartile Q3 95th percentile and the maximum Formulas have been given in section 2 2 3 8 3 random level 1 coefficients optional The random Level 1 coefficients or Level 2 outcomes consist of ordinary least squares estimates per Level 2 u
142. uations each term is followed by a number except for the Level 1 random term E For the Vi term this number is the variable number the position of the variable in the data file e g V4 the fourth variable in the data file The other terms only use a number for identification without any additional meaning e g G3 one of the fixed parameters The Bk terms have meaning in the equations of both levels Every equation consists of one term before and at least one term after the equals sign Example MODEL B1 G1 G2 V6 U1 random intercepts dependent on level 2 predictor B2 G3 G4 V6 U2 random slopes dependent on the same level 2 predictor V4 B1 B2 V5 E X level 1 equation dependent on level 1 predictor As shown above terms on the right hand side of the equations are connected by plus signs A variable and a corresponding parameter are connected by an asterisk This is used to connect a fixed parameter and an observed predictor variable in Level 2 equations and to connect a Level 1 regression coefficient and an observed predictor variable in the Level 1 equation In chapter 4 several variations of the two level model will be presented and discussed in more detail Because a Level 1 equation and at least one Level 2 equation are required the minimal spec ification of a model is MODEL Bl Gl fixed intercept V4 B1 E level 1 variation or MODEL B1 V4 random intercept B1 E level 1 variation
143. uld be the parameters used by the program This reparametrization may have some drawbacks cf Gill Murray amp Wright 1981 pp 268 269 but we think that it may generally be useful for multilevel analysis See also Longford 1987 who uses a similar reparametrization of a restricted model Note that the reparametrization A 60 cannot be easily used if some elements of are restricted In order to minimize the reparametrized function the gradient vector should be reparametrized accordingly This is done by using the chain rule of partial derivatives If the original parameter vector is denoted by 0 and the reparametrized parameter vector by then L _ aL 06 Ob 00 Ob A 61 Therefore the formulas from section A 3 have to be postmultiplied by 00 Og The relevant formula for is To form the relevant expression for C consider the k and k k elements of O where k gt I q 9 p Chu Cin u 1 l gt Cu Cu u 1 q 2 eu 2 Cy u 1 k 67 u 1 So 96 SSS tas 0 if u gt l add 00 aC C ifusl 0 if u gt l aC s 96 LI 20 ifuzkanduzl OC 00 2C ifuxk 00 ifu gt k 96 A Q ifu k Consequently if u gt v These formulas are implemented in the program It is possible to transform the second derivatives in a similar way to obtain an estimator of the covariance matrix of the estimators But in general the
144. untered in the original sample For example if it is assumed that F is a normal distribution function with mean and variance c then bootstrap samples are drawn from a normal distribution with mean x and variance s where x and s are the mean and variance of the original sample 29 2 10 4 Balanced bootstrap As discussed above the nonparametric bootstrap draws B samples of size N with replacement from the observed values z z5 zy Taken together these form NB drawings Let f be the number of times z is drawn among the NB drawings Clearly E f B However f will generally not be equal to B This difference will lead to a nonzero estimate of the bias of an estimator even if the estimator is unbiased The balanced bootstrap is a resampling method that ensures that all N values z are drawn exactly B times in the B samples A simple way to achieve this if memory is sufficient is to make a supersample consisting of B copies of the original sample and draw B samples of size N without replacement from this supersample However more efficient algorithms to achieve the same goal are implemented in MLA In many cases the balanced bootstrap is statistically somewhat more efficient than the or dinary bootstrap especially with bias estimation See e g Davison and Hinkley 1997 sec tion 9 2 for an extensive discussion of balanced bootstrapping 2 10 5 Resampling regression models Consider a simple linear regression mo
145. user will be more interested in the original parameters and therefore the estimates of the transformed parameters are retransformed to estimates of the original parameters and the covariance matrix of section A 4 is used This procedure is correct because the transformation is a one to one mapping from the feasible region of the original parameters to the domain of the transformed parameters except for some trivial equivalent solutions such as c and which lead to the same retransformed solution Only 94 when the estimates are near the boundary of the feasible region the asymptotic covariance matrix may not be correct but the usual statistical theory only applies to interior points so boundary solutions are a problem in any parametrization 95 96 References Aitkin M amp Longford N 1986 Statistical modelling issues in school effectiveness studies Journal of the Royal Statistical Society A 149 1 43 with discussion Anderson T W 1958 An introduction to multivariate statistical analysis New York Wiley Blalock H M 1984 Contextual effects models Theoretical and methodological issues Annual Review of Sociology 10 353 372 Bryk A S amp Raudenbush S W 1992 Hierarchical linear models Applications and data analysis methods Newbury Park CA Sage Busing F M T A 1993 Distribution characteristics of variance estimates in two level models A Monte Carlo study Tech Rep N
146. variable should then not be centered of course Centering of the outcome variable is more complicated Consider the model 2 13 Then Vij Vip 7 Y By By amp j By Pox E Bx By jx Ej pk Bij t Bopp ij where Bj Box amp j 8 and eee Box N Dy 2 i 1 The equation for the intercept in the transformed model becomes Bij Yi YoW Mg YoWay Hy V3 yY4Wx wx YW V3 Y4WX yaw j Uy Hy wx yi Yow Uy js where yy Yo YaX yAWx eet E ees Uy Uy ux and wx and u x are defined analogous to B2x and i is defined analogous to w If the fixed pa rameters are unrestricted the transformation leaves the fixed part unchanged We are not certain that the estimation results when treating Ej and uy as 1 1 4 random variables with zero means are equivalent to the estimation results using the uncentered outcome variable with the appropriate transformation of the fixed intercept y into y In the OLS regression case the results are equiv alent as stated above but in this model they may not be However even if they are different the differences will typically be very small because 8 and uy still have mean zero and thus their sample averages will also be close to zero so that the differences between e j and e p and the differences between uy and u j will be negligible espe
147. verging to zero at a rate of 1 VN So the estimators seem to be consistent after all but we are unaware of the fine details of its proof A more important issue associated with Level 2 centering is that it may well be the case that instead of x j mean x p the relevant variable is another relative measure such as x j median x p Xj mean x pi median x p x jT mean x j sd x p and so forth Most theory that suggests that a Level 2 centered variable should be used is not strong enough to rule out these alternative specifications Because it matters which one is taken one should ideally either come up with a good reason why a specific form should be chosen or perform a detailed specification analysis However MLA currently only supports centering with respect to the mean so other transformations must already have been made in the data file that is read by MLA 2 4 Ordinary least squares estimators 2 4 1 Within group models In this section we will use the notation 2 1 2 2 Based on 2 1 ordinary least squares esti mates for 8 j are given by B QCX 2 15 and the estimated standard errors of the elements of B are the square roots of the diagonal ele ments of the covariance matrix given by FXX 5 2 16 where T magee j regression estimators the usual regression theory applies within each Level 2 unit given that j y X 3 and q is the dimension of B Because these are standard is no
148. verted and its determinant calculated Therefore in this section a computationally efficient formula will be derived based on Longford 1987 and using formu las from the previous section Along the same lines computationally efficient formulas for the derivatives of this function with respect to the parameters will also be derived Combining A 6 A 9 and A 11 we find the following formula for L J 1 5 log 2z uo 2 log o2 det G 1 J yp 2 4 ly Xylo 1y o z 08 Zl Nie Ms Il gt log 2 X log o log detG J je 25 Oj 0 j l J EE NS 7597 15 6 Xjy Z 68 Zi Fl J N N 1 log 2 7 log o f log det G 1 2 7 To o a s Ty Sex j l j l j l J r TROP 25 DZ ZIX yyy 66 Ziy ZX jy A 27 Formula A 27 is a computationally efficient formula and this is the formula that is implemented in the program 79 To find the gradient of L we start with the differential of L J 1 dL 5 dlog det V j l 1 J 2420 XY V CX dy 1 J 1 5240 X AVDO A 28 a Il Combining A 28 with A 20 A 21 and A 17 we find o XjyyV X dy j Ms dL 2 j Il 1 58 Xyy V do 10 Xj tr ys oly Z d Z Il NI 9d E NI be T S Il J 2 0 X YVj X dy j l 1 J z 2 0
149. w regarded as a parameter vector instead of as a random vector In particular this means that under the usual assumptions B is unbiased with covariance matrix c QCX TS where o is consistently and unbiasedly estimated by These results do not require normality and as is clear from this discussion the variances of the Level 1 residuals are allowed to be different in different groups In typical multilevel datasets the sample size within a given Level 2 unit may be quite small compared to ordinary regression analyses Thus although unbiased and consistent these OLS estimators may not be very precise and the estimated covariance matrices may also not be very precise Although it is not very common in linear regression we can also compute standard errors for 95 Unlike the covariance matrix of B the variance of also depends on whether the normality assumption holds The standard error that is printed in MLA is 4 2 j N q se F which is only correct when the normality assumption holds Future versions of MLA will also compute robust standard errors which are correct under nonnormality 18 2 4 2 One step OLS total model From Equation 2 6 the term Zu can be considered the random part of the equation Taking the total residuals r Zurte 2 17 leaves after substitution y Xy r 2 18 Now y can be estimated consistently using ordinary least squares Notice that grouping is ignored Estimates fo
150. y Dempster A P Laird N M amp Rubin D B 1977 Maximum likelihood for incomplete data via the EM algorithm Journal of the Royal Statistical Society B 39 1 38 Durbin J 1973 Distribution theory for tests based on the sample distribution function Philadelphia SIAM Efron B 1979 Bootstrap methods Another look at the jackknife The Annals of Statistics 7 1 26 Efron 1982 The jackknife the bootstrap and other resampling plans Philadelphia SIAM Efron B 1987 Better bootstrap confidence intervals Journal of the American Statistical Association 2 171 200 with discussion 97 Efron B amp Tibshirani R J 1993 An introduction to the bootstrap New York Chapman and Hall Gill E Murray W amp Wright H 1981 Practical optimization London Academic Press Glasnapp D R amp Poggio 1 P 1985 Essentials of statistical analysis for the behavioral sciences Columbus OH Charles Merrill Goldstein H 1986 Multilevel mixed linear model analysis using iterative generalized least squares Biometrika 73 43 56 Goldstein H 2003 Multilevel statistical models 3rd ed London Arnold Goldstein H Browne W amp Rasbash J 2002 Partitioning variation in multilevel models Understanding Statistics 1 223 231 Gong G amp Samaniego F J 1981 Pseudo maximum likelihood estimation Theory and applications The Annals of Statistics 9 861
151. y more complicated and because it relies heavily on strong assumptions like normality of the residuals and homoskedasticity at Level 1 2 12 Missing data Missing data are a frequently occurring phenomenon For instance in repeated measures designs the points in time at which the different subjects are measured may not be the same or the number of points in time the subjects are measured may differ This situation leads to missing time points that is all time specific variables of a subject are missing at some point in time However the 39 time invariant variables such as sex are of course known This situation is easily handled by a multilevel model in which the subjects are the Level 2 units and the time points are the Level 1 units As was discussed for a usual multilevel model the number of Level 1 units may be different for different Level 2 units and so the missing timepoints give no problems An example of repeated measures is given in chapter 4 If however in a multilevel model be it an application in repeated measures or not for some Level 1 unit some Level 1 variables are measured but others are not or for some Level 2 unit some Level 2 variables are measured but others are not there are missing values that can not be handled by the standard model If only output variables are missing the EM algorithm provides a standard way of dealing with the missing values in a satisfactory way If however some ex ogenous

MLA. Software for multilevel analysis of data with two levels. User's

Contents

Download Pdf Manuals

Related Search

Related Contents