
MLA: Software for MultiLevel Analysis of Data with Two Levels. User's Guide for Version 1.0b



Since d(V_j^{-1}) = -V_j^{-1} (dV_j) V_j^{-1}, the derivatives of V_j^{-1} with respect to σ² and the elements of Θ follow directly.   (A.44)

Combining (A.44) with (A.25) and (A.26), the second-order partial derivatives are found. Write e_j = y_j - X_j γ and note that V_j is linear in σ² and in each element of Θ, so that all second derivatives of V_j vanish. Then, for any two variance parameters α and β,

∂²L/∂α∂β = ½ Σ_{j=1}^J [ -tr( V_j^{-1} (∂V_j/∂β) V_j^{-1} (∂V_j/∂α) ) + 2 e_j' V_j^{-1} (∂V_j/∂β) V_j^{-1} (∂V_j/∂α) V_j^{-1} e_j ].

Substituting ∂V_j/∂σ² = I_{N_j} and ∂V_j/∂θ_kl = z_jk z_jl' + z_jl z_jk' (a single term z_jk z_jk' when k = l) gives the mixed derivatives of L with respect to the elements of Θ and σ² (A.45) and the second derivatives with respect to the elements of Θ (A.46). Analogously, from (A.32), we have

∂²L/∂(σ²)² = ½ Σ_{j=1}^J [ -tr V_j^{-2} + 2 e_j' V_j^{-3} e_j ],

and the remaining cross derivatives involving σ² follow in the same way.
From (A.7), with G_j = I_q + σ^{-2} Z_j'Z_j Θ,

Z_j' V_j^{-1} = σ^{-2} G_j^{-1} Z_j',   (A.12)

which can be verified by postmultiplying both sides by V_j = σ² I_{N_j} + Z_j Θ Z_j'. From (A.12) it follows that

V_j^{-1} Z_j = σ^{-2} Z_j (G_j')^{-1}   (A.13)

and

Z_j' V_j^{-2} Z_j = σ^{-4} G_j^{-1} Z_j' Z_j (G_j')^{-1}.   (A.14)

The traces of V_j^{-1} and V_j^{-2}

From equation (A.9) we find

tr V_j^{-1} = tr[ σ^{-2} ( I_{N_j} - σ^{-2} Z_j Θ G_j^{-1} Z_j' ) ]
= σ^{-2} [ N_j - tr( G_j^{-1} σ^{-2} Z_j'Z_j Θ ) ]
= σ^{-2} [ N_j - tr( G_j^{-1} ( G_j - I_q ) ) ]
= σ^{-2} ( N_j - q + tr G_j^{-1} ).   (A.15)

Similarly, using (A.9), (A.15), and (A.12),

tr V_j^{-2} = σ^{-4} ( N_j - q + tr G_j^{-2} ).   (A.16)

Differential formulas

As was stated in the previous section, the maximum likelihood estimates are obtained by minimizing the minus log-likelihood function by the BFGS method, which uses the gradient of the function. To find the gradient, the differential notation of Magnus and Neudecker (1985, 1988) will be used. The key property of differentials is their relation with derivatives through the following equivalence. Let f be a vector or scalar function of a vector or scalar variable x; then

df = A dx  if and only if  ∂f/∂x' = A.

The differential of a matrix is defined as the matrix of differentials of its elements.
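The identities (A.12) and (A.15) are easy to check numerically. The following small sketch (not part of MLA; the simulated matrices, dimensions, and names are only assumptions of the example) verifies both for one randomly generated Level 2 unit.

import numpy as np

# Check Z_j' V_j^{-1} = sigma^{-2} G_j^{-1} Z_j'            (A.12)
# and   tr V_j^{-1}   = sigma^{-2} (N_j - q + tr G_j^{-1})   (A.15)
rng = np.random.default_rng(0)
N_j, q, sigma2 = 8, 2, 1.7
Z = rng.normal(size=(N_j, q))
A = rng.normal(size=(q, q))
Theta = A @ A.T                                   # a positive definite covariance matrix
V = sigma2 * np.eye(N_j) + Z @ Theta @ Z.T        # V_j = sigma^2 I + Z Theta Z'
G = np.eye(q) + (Z.T @ Z @ Theta) / sigma2        # G_j = I_q + sigma^{-2} Z'Z Theta

lhs = Z.T @ np.linalg.inv(V)
rhs = np.linalg.solve(G, Z.T) / sigma2
print(np.allclose(lhs, rhs))                      # True: (A.12)

tr_direct = np.trace(np.linalg.inv(V))
tr_formula = (N_j - q + np.trace(np.linalg.inv(G))) / sigma2
print(np.isclose(tr_direct, tr_formula))          # True: (A.15)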
3. The amount of shrinkage depends on the reliability of the estimates from the separate groups The less precise the estimates are the more they are shrunken towards the mean over all groups Technically the shrunken estimators are the expectations of the random coefficients given the parameter estimates and the data of all groups Estimation Fitting a multilevel model amounts to fitting one combined model instead of separate models for each level It is the translation of the idea that although separate models for each level may be formulated they are statistically connected as was mentioned in a previous subsection The combined model contains all relevant parameters In the next chapter we will further clarify this subject Combined models or multilevel models can be viewed as special cases of the general mixed linear model cf Harville 1977 Such models are characterized by a set of fixed and a set of random regression coefficients The parameters that have to be estimated are the fixed coefficients and the variances and covariances of the random coefficients and random error terms The fixed coefficients are informally called fixed parameters and the variances and covariances of the random coefficients and random error terms are informally called random parameters although all these parameters are technically nonrandom They are the parameters associated with the fixed and random parts of the model respectively To obta
j = 1, ..., J, and e*_ij, j = 1, ..., J, i = 1, ..., N_j, are drawn, and nonparametric bootstrap samples of y are obtained from (2.50) and (2.51). Then estimators can be obtained in the usual way, and bootstrap bias-corrected estimators and standard errors can be obtained straightforwardly. This bootstrap procedure of resampling from estimated errors is called the error bootstrap. Whether the shrunken or the raw residuals are to be preferred in bootstrapping multilevel models is as yet unclear; they are both implemented as options in MLA. It is also unclear whether these bootstrap methods are satisfactory or whether other bootstrap methods should be used instead (Wu, 1986). If the X and W variables are considered random, nonparametric bootstrap samples can be drawn by resampling complete cases. This is, however, somewhat more complicated than in regression analysis, because the hierarchical structure of the data should be respected. The bootstrap samples can be drawn in the following way. First, a sample of size J is drawn with replacement from the Level 2 units. This gives a sample j*_k, k = 1, ..., J, of Level 2 unit numbers and accompanying Level 2 variables W*_k. Then, for each k, a nonparametric bootstrap sample of complete cases is drawn from the original unit j*_k, giving (y*_ik, X*_ik), k = 1, ..., J, i = 1, ..., N_{j*_k}. This is called the cases bootstrap for both levels. It is also possible to draw bootstrap samples from the Level 2 units only, keeping all the y's, X's, and W's of each drawn Level 2 unit intact.
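The cases bootstrap for both levels can be sketched as follows. This is an illustration of the resampling scheme only, not MLA code; the data layout (lists of per-unit arrays) and the function name are assumptions made for the example.

import numpy as np

def cases_bootstrap(y_groups, X_groups, W_groups, rng):
    """Draw one cases-bootstrap sample from a two-level data set.

    y_groups and X_groups are lists of length J holding the Level 1 data of
    each Level 2 unit; W_groups holds the Level 2 covariates of each unit.
    """
    J = len(y_groups)
    y_star, X_star, W_star = [], [], []
    # Step 1: draw J Level 2 units with replacement.
    for j_star in rng.integers(0, J, size=J):
        y_j, X_j = y_groups[j_star], X_groups[j_star]
        n_j = len(y_j)
        # Step 2: within each drawn unit, draw complete Level 1 cases with replacement.
        idx = rng.integers(0, n_j, size=n_j)
        y_star.append(y_j[idx])
        X_star.append(X_j[idx])
        W_star.append(W_groups[j_star])
    return y_star, X_star, W_star

# Resampling Level 2 units only (useful for repeated measures) skips step 2 and
# copies y_j and X_j unchanged; resampling Level 1 units only skips step 1 and
# loops over the original units j = 1, ..., J.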
5. 2 j indicates the site and Level 1 i the children The model to be estimated is Yij yt uj ei 4 1 where y is the overall mean on the posttest score u is the Level 2 deviation from y or Level 2 error component and j is the Level 1 deviation from y u the average score of unit 7 also called the Level 1 error component Equation 4 1 can be divided into two separate equations one for each level Yi Bj ea bj 7 Uj In this way the deviations or error components for the different levels are easily seen These equations are also the equations that are to be used in MLA to specify the model Along with the other statements the input file is as follows TITLE 37 MLA example 1 analysis of variance DATA file sesame dat vars 3 id2 1 MODEL bi gi ui v3 bite OUTPUT inpu desc olsq END All output contains the MLA title page It is the first part of the output It only supplies information about the name and origin of the program It is not possible to leave this part out MMMM MMMMM LLLL AAAAAAAA MMMMM MMMMMM LLLL AAAAAAAAAA MMMM M MMMMMMM LLLL AAAA AAAA MMMM MM MMM MMMM LLLL AAAA AAAA MMMM MMMM MMMM LLLL AAAA AAAA MMMM MM MMMM LLLL AAAAAAAAAAAAAAAAAA MMMM M MMMM LLLL AAAAAAAAAAAAAAAAAAAA MMMM MMMM LLLL AAAA AAAA MMMM MMMM LLLL AAAA AAAA MMMM MMMM LLLL AAAA MMMM MMMM LLLLLLLLLLLLLLLLLLLLLLLLLLLL AAAA MMMM MMMM LLLLLLLLLLLLLLLLLLLLLLLLLLLLLL AAAA AAAA MULTILEVEL
(2.32)

where b1, b2, ... are constants that do not depend on N. Now consider removing a group of m observations from the sample and re-estimating θ based on this sample of size N - m. The resulting estimator may be called θ̂_{N-m}. The estimator θ̂_{N-m} is of the same sort as θ̂_N; the only difference is that it is based on a sample of size N - m instead of N. Therefore, the bias formula (2.32) also holds for this estimator, with N - m substituted for N, that is,

bias_{N-m} = E(θ̂_{N-m}) - θ = b1/(N - m) + b2/(N - m)² + O((N - m)^{-3}).   (2.33)

Now consider the difference between (2.33) and (2.32), given by

E(θ̂_{N-m} - θ̂_N) = b1 m/(N(N - m)) + O(N^{-2}).

From this equation it can be seen that an estimate of the leading term of the bias of θ̂_N can be obtained from

bias_N ≈ ((N - m)/m) (θ̂_{N-m} - θ̂_N).   (2.34)

Now a bias-corrected estimator of the parameter is

θ̂ = θ̂_N - ((N - m)/m)(θ̂_{N-m} - θ̂_N) = (N/m) θ̂_N - ((N - m)/m) θ̂_{N-m}.   (2.35)

From (2.32) and (2.33) it is found that the bias of this estimator is

E(θ̂) - θ = -b2/(N(N - m)),

which is of order 1/N² if m is relatively small compared to N. This is a much smaller order than the bias of θ̂_N, which is of order 1/N. The estimator θ̂_{N-m} was obtained by removing one group of size m from the sample. There are, however, many groups that can be used for this. Consider, for example, the case that m = 1. Then there are N groups of size 1 that could be removed,
7. ANALYSIS FOR TWO LEVEL DATA AAAA AAAA VERSION 1 0b AAAA AAAA DEVELOPED BY AAAA FRANK BUSING AAAA ERIK MEIJER AAAA RIEN VAN DER LEEDEN AAAA AAAA PUBLISHED BY AAAA LEIDEN UNIVERSITY AAAA FACULTY OF SOCIAL AND BEHAVIOURAL SCIENCES AAAA DEPARTMENT OF PSYCHOMETRICS AND RESEARCH METHODOLOGY AAAA WASSENAARSEWEG 52 AAAA P O BOX 9555 AAAA 2300 RB LEIDEN AAAA THE NETHERLANDS AAAA PHONE 31 0 71 273761 AAAA FAX 31 0 71 273619 AAAA Except for the title page and the optional input part every part contains a header The header is always the same and is made of two lines of standard text and the title of the analysis supplied by the user For this first example it reads MLA U COPYRIGHT 1993 1994 LEIDEN UNIVERSITY MULTILEVEL ANALYSIS FOR TWO LEVEL DATA VERSION 1 0b 09 10 1994 ALL RIGHTS RESERVED PART 2 MLA EXAMPLE 1 ANALYSIS OF VARIANCE The second part of the output contains an echo of the input file statements This part is always included in an output file INPUTFILE STATEMENTS 1 TITLE 2 MLA example 1 analysis of variance 3 DATA 38 4 file sesame dat 5 vars 3 6 id2 1 7 MODEL 8 bi gi ui 9 v3 bi e 10 OUTPUT 11 inpu desc olsq 12 END 12 LINES WERE READ FROM INPUTFILE EXAMPLE1 IIN The third part is the first optional part of the output It is triggered by the input keyword under the OUTPUT statement It contains extra information about the input and the output Specifically t
8. The substatement type is only required whenever the substatement kind bootstrap is used in combination with method error The type substatement specifies the type of estimation that is used to determine the Level 1 and Level 2 residuals One can choose between raw and shrunken More details can be found in Chapter 2 3 5 4 resample optional The substatement resample offers the user the choice at which level units will be re sampled The default is 0 which means that at both levels units will be resampled If kind jackknife or kind bootstrap and method cases the user may choose 1 or 2 which means that only Level 1 units or only Level 2 units will be resampled respec tively The kind of nested structure in the data will determine which choice is appropriate For instance with repeated measures Level 1 nested within individuals Level 2 it is probably not useful to resample Level 1 units with the cases bootstrap 3 5 5 replications optional Using the substatement replications the number of bootstrap replications is specified It must be an integer value between 1 and 32767 2 1 The default value is 300 and this number is usually considered sufficient although Markus 1994 suggests 1000 in another context 3 5 6 seed optional For diagnostic purposes one can provide an initial number seed for the random number generator This is specified by the substatement seed Using the same initial seed the simulation
Substituting (A.9) and log det V_j = N_j log σ² + log det G_j into the minus log-likelihood, and writing N = Σ_{j=1}^J N_j, we obtain

L = (N/2) log 2π + (N/2) log σ² + ½ Σ_{j=1}^J log det G_j + (1/(2σ²)) Σ_{j=1}^J (y_j - X_j γ)'(y_j - X_j γ) - (1/(2σ⁴)) Σ_{j=1}^J (Z_j'y_j - Z_j'X_j γ)' Θ G_j^{-1} (Z_j'y_j - Z_j'X_j γ).   (A.27)

Formula (A.27) is a computationally efficient formula, and this is the formula that is implemented in the program. To find the gradient of L, we start with the differential of L,

dL = ½ Σ_{j=1}^J [ d log det V_j - 2 (y_j - X_j γ)' V_j^{-1} X_j dγ + (y_j - X_j γ)' (dV_j^{-1}) (y_j - X_j γ) ].   (A.28)

Combining (A.28) with (A.20), (A.21), and (A.17), and writing e_j = y_j - X_j γ and dV_j = (dσ²) I_{N_j} + Z_j (dΘ) Z_j', we find

dL = Σ_{j=1}^J [ ½ tr( V_j^{-1} dV_j ) - e_j' V_j^{-1} X_j dγ - ½ e_j' V_j^{-1} (dV_j) V_j^{-1} e_j ].   (A.29)

From (A.29) the partial derivatives follow:

∂L/∂γ = - Σ_{j=1}^J X_j' V_j^{-1} (y_j - X_j γ),   (A.30)

∂L/∂Θ = ½ Σ_{j=1}^J [ Z_j' V_j^{-1} Z_j - Z_j' V_j^{-1} (y_j - X_j γ)(y_j - X_j γ)' V_j^{-1} Z_j ],   (A.31)

and

∂L/∂σ² = ½ Σ_{j=1}^J [ tr V_j^{-1} - (y_j - X_j γ)' V_j^{-2} (y_j - X_j γ) ].   (A.32)
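To make the gain from (A.27) concrete, the following sketch (an illustration only, not MLA code; the simulated data, dimensions, and function names are assumptions of the example) evaluates the minus log-likelihood both directly from the V_j and via (A.27). The two values agree, but the second form only requires the q x q matrices G_j.

import numpy as np

rng = np.random.default_rng(1)
J, p, q, sigma2 = 5, 2, 2, 1.3
gamma = rng.normal(size=p)
Theta = np.array([[1.0, 0.3], [0.3, 0.5]])

def minus_log_lik_direct(y, X, Z):
    # L = sum_j (N_j/2) log 2 pi + (1/2) log det V_j + (1/2) e_j' V_j^{-1} e_j
    L = 0.0
    for y_j, X_j, Z_j in zip(y, X, Z):
        n_j = len(y_j)
        V_j = sigma2 * np.eye(n_j) + Z_j @ Theta @ Z_j.T
        e_j = y_j - X_j @ gamma
        _, logdet = np.linalg.slogdet(V_j)
        L += 0.5 * (n_j * np.log(2 * np.pi) + logdet + e_j @ np.linalg.solve(V_j, e_j))
    return L

def minus_log_lik_A27(y, X, Z):
    # Efficient form (A.27): only q x q matrices G_j are formed and inverted.
    N = sum(len(y_j) for y_j in y)
    L = 0.5 * N * (np.log(2 * np.pi) + np.log(sigma2))
    for y_j, X_j, Z_j in zip(y, X, Z):
        G_j = np.eye(q) + (Z_j.T @ Z_j @ Theta) / sigma2
        e_j = y_j - X_j @ gamma
        a_j = Z_j.T @ e_j                     # = Z_j'y_j - Z_j'X_j gamma
        _, logdet = np.linalg.slogdet(G_j)
        L += 0.5 * logdet + e_j @ e_j / (2 * sigma2) \
             - a_j @ Theta @ np.linalg.solve(G_j, a_j) / (2 * sigma2 ** 2)
    return L

y, X, Z = [], [], []
for _ in range(J):
    n_j = int(rng.integers(3, 8))
    X_j = rng.normal(size=(n_j, p))
    Z_j = rng.normal(size=(n_j, q))
    u_j = rng.multivariate_normal(np.zeros(q), Theta)
    y.append(X_j @ gamma + Z_j @ u_j + rng.normal(0.0, np.sqrt(sigma2), n_j))
    X.append(X_j)
    Z.append(Z_j)

print(np.isclose(minus_log_lik_direct(y, X, Z), minus_log_lik_A27(y, X, Z)))  # True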
and the grouped jackknife estimator of the bias, bias̄, is the average of the estimators bias_(j). The grouped jackknife bias-corrected estimator of θ is θ̂_N - bias̄, which is equal to

θ̂^J = g θ̂_N - (g - 1) θ̄_(·),   (2.39)

which is completely analogous to (2.37). The corresponding grouped jackknife variance estimator is also completely similar to the ungrouped jackknife case (2.38). It is given by

σ̂²_J = (1/(g(g - 1))) Σ_{j=1}^g ( θ̂^J_(j) - θ̂^J )².   (2.40)

It is also possible to have g groups of possibly different sizes. In this case, let m_j be the size of group j, let θ̂_(j) be the estimator re-estimated from the sample from which group j was removed, and let bias_(j) be the accompanying estimator of the bias of θ̂_N from (2.34). An unweighted bias estimator is now

bias̄ = (1/g) Σ_{j=1}^g bias_(j).

The corresponding bias-corrected estimator of θ is θ̂^J = θ̂_N - bias̄, that is,

θ̂^J = θ̂_N - (1/g) Σ_{j=1}^g ((N - m_j)/m_j) (θ̂_(j) - θ̂_N),   (2.41)

which reduces to the standard grouped jackknife bias-corrected estimator if the group sizes are all equal. The unweighted estimator of the variance of θ̂^J is

σ̂²_J = (1/(g(g - 1))) Σ_{j=1}^g ( θ̂^J_(j) - θ̂^J )²,  with θ̂^J_(j) = θ̂_N - bias_(j).   (2.42)

The formulas for the grouped jackknife estimators in the case that the group sizes are unequal are experimental. The bias-corrected estimator (2.41) should be relatively unbiased, though possibly not optimally efficient. It is unclear whether the variance estimator (2.42) is approximately correct. More research is needed to shed light on these issues.
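The equal-size grouped jackknife described above (equations 2.34, 2.39, and 2.40) can be sketched as follows. This is an illustration only; the scalar estimator passed in as estimate and the data layout are assumptions of the example, not part of MLA.

import numpy as np

def grouped_jackknife(data, estimate, g):
    """Grouped jackknife bias correction and standard error for a scalar estimator.

    data: array of N observations (N divisible by g);
    estimate: function mapping a data array to a scalar estimate.
    """
    N = len(data)
    m = N // g
    theta_full = estimate(data)
    groups = np.arange(N).reshape(g, m)                  # g mutually exclusive groups of size m
    theta_del = np.array([estimate(np.delete(data, idx)) for idx in groups])
    bias_j = (N - m) / m * (theta_del - theta_full)      # (2.34) for each removed group
    theta_J = theta_full - bias_j.mean()                 # = g*theta_full - (g-1)*mean(theta_del), (2.39)
    corrected_j = theta_full - bias_j                    # bias-corrected estimate per removed group
    var_J = np.sum((corrected_j - theta_J) ** 2) / (g * (g - 1))   # (2.40)
    return theta_J, np.sqrt(var_J)

# Example: bias-correct the (downward-biased) maximum likelihood variance estimator.
rng = np.random.default_rng(2)
x = rng.normal(size=100)
est, se = grouped_jackknife(x, lambda d: np.mean((d - d.mean()) ** 2), g=10)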
11. bootstrap estimators and bootstrap estimators of the covariance matrix of the parameters are obtained in the usual way This is the parametric bootstrap that is implemented in MLA Note that this simulation option is also provided by ML3 Prosser et al 1991 It is also possible to derive a parametric bootstrap estimator in case the X and W variables are considered random This is analogous to 2 47 but it is not implemented in MLA For the nonparametric bootstrap several situations can be studied If the X and W variables can be considered fixed then analogously to regression analysis the errors have to be estimated As explained in section 2 5 the shrunken residuals 2 26 and 2 28 can be used as estimators of the Level 2 and Level 1 errors respectively A drawback of these errors may be that their variances are less than the variances in the population When however sample sizes at both levels increase this difference diminishes But alternatively the raw residuals 2 24 2 25 can be used instead of the shrunken residuals Unlike in regression analysis the estimated residuals in multilevel analysis do not necessarily have a zero mean Therefore the means are subtracted first Otherwise the possibly nonzero mean of the errors would necessarily lead to biased estimators of the constant Once centered estimates u j 1 J and amp j 1 J a 1 N of the errors are obtained nonparametric bootstrap samples u
12. can be viewed as an outcome variable e Gi gamma component i y These are the fixed parameters to be estimated in the multilevel model e Vi one of the variables from the data file as explained above In this case it is a Level 2 predictor variable It means that this variable is considered to have the same value for all Level 1 units within a particular Level 2 unit To be certain that this is the case for each Level 2 variable the average is computed over all Level 1 units within that particular Level 2 unit e Ui Level 2 random term i u the ith element of a typical u As with the first level this component is considered a residual or error term but now for the second level The second level may have more than one error term one for each Level 2 equation i e for each 8 element The variances and the covariances of these terms have to be estimated from the data Example MODEL Gi G2 V6 U1 random intercepts dependent on level 2 predictor G3 G4 V6 U2 random slopes dependent on the same level 2 predictor Bi B2 V5 E level 1 equation dependent on level 1 predictor w N nou ou In the equations each term is followed by a number except for the Level 1 random term E For the Vi term this number is the variable number the position of the variable in the data file e g V4 the fourth variable in the data file The other terms only use a number for identification without any additional me
13. different if z is regarded as a fixed design variable chosen by the experimentor This happens for example if z is the dose of some drug administered to 21 rats by the experimentor Then each bootstrap sample should have exactly the same zx values that is 7 x for each in each bootstrap sample The parametric bootstrap is in this case simply obtained by 2 47 with z7 z The nonparametric bootstrap is in this case however completely different from the nonparametric bootstrap with random x In this case first the errors are estimated from the original sample by amp y fiu 2 48 Then bootstrap samples e7 are drawn from 21 En and bootstrap samples of y are obtained analogously to 2 47 yf re 2 49 Then bootstrap estimates of the parameters and bootstrap estimates of the covariance matrix of the parameters are obtained in the usual way e g Efron 1982 pp 35 36 The jackknife can also be implemented straightforwardly in regression models One complete case is removed from the sample for each 8q for the ungrouped jackknife or a group of complete cases is removed for each 8 for the grouped jackknife The jackknife bias corrected estimators and the jackknife estimators of the covariance matrix of the parameters are obtained straightforwardly e g Efron 1982 pp 18 19 The bootstrap and jackknife methods discussed here for regression models are the standard
14. likelihood theory discussed so far is based on a few assumptions the most important of which are e The model i e the conditional expectation Xy and covariance matrix V is cor rectly specified The standard errors t values exceedance probabilities and likeli hood ratio tests were derived under the condition that at least the most general model that is being estimated is correct in the population e The Level 1 and Level 2 u random errors are normally distributed The likeli hood function was derived under this assumption and therefore the FIML estimators and the estimators of their standard errors depend on it e The sample size is large More specifically the properties of the maximum likeli hood estimators such as their consistency their asymptotic efficiency and their asymptotic normal distribution as well as the formulas for their standard errors were derived under the assumption that the sample size goes to infinity N oc 16 In practice these assumptions will not be completely satisfied One can only hope that they are met approximately To be able to get an indication of how severe the finite sample size and possible nonnormality influence the results the MLA program offers simulation options In this section the theory underlying these simulation options will be described This focus will be on the possible bias of the estimates and on the possibly incorrect standard errors More subtle informatio
15. of the Level 2 units Level 1 units are interchangeable within a Level 2 unit A Level 1 identifier variable is not necessary The variable number has to follow the keyword id2 and it must indicate the position of the identifier variable in the data file The variable number must be at least 1 and less than or equal to the number of variables indicated in the variables substatement 28 3 3 MODEL required The MODEL statement is followed by a set of equations that specify the model that has to be estimated Every equation must be on a single line There is only one Level 1 equation but there may be one or more Level 2 equations The order in which the Level 1 and Level 2 equations appear is arbitrary The terms used in the Level 1 equation are e Vi variable which is the i th variable in the data file Ve may be either indicating the outcome variable or a predictor variable e Bi beta component i ij the ith element of a typical 8 cf equation 2 1 At Level 1 these are the regression coefficients that seem to be outcome variables at Level 2 cf equation 2 2 e E the Level 1 random term This term is considered to be a residual or error term The variance of this term has to be estimated from the data The Level 2 equations partly consist of the same terms but also of specific Level 2 equa tion terms e Bi beta component i corresponding with the Level 1 regression coefficient At this level however B
16. samples will be identical The seed value must be an integer between 1 and 1 073 735 823 2 1 If results from bootstrap analyses are to be reported it is advised to save the seeds 3 5 7 file optional Results of the simulation analysis can be written to a file Using the substatement file a filename may be specified Filenames must satisfy the ususal DOS conventions on file names For each replication the following results are written to the file in ascii space separated 1 global information e replication number 32 e seed e number of iterations until convergence e the minimum of the 2 log likelihood function 2 estimation results pairs containing e estimate e standard error of each parameter The parameters are in the following order o2 y1 Yp O11 O21 O22 O31 Oq4 where p is the dimension of y and q is the dimension of each 6 The estimation results are thus repeated replications times and displayed with a maximum of eight values per line four estimates and their corresponding standard errors The results of the simulation analysis are used to compute the final bootstrap and jackknife estimates The results of a replication are not taken into account when the algorithm did not converge or when the estimate or its standard error was fixed to zero because it reached the edge of its parameter space Further elaboration concerning this subject can be found both in the previous and i
Writing e_j = y_j - X_j γ, differentiating (A.30) with respect to the elements of Θ gives the mixed second derivatives

∂²L/∂θ_kl ∂γ' = Σ_{j=1}^J e_j' V_j^{-1} (∂V_j/∂θ_kl) V_j^{-1} X_j.   (A.47)

Combining (A.47) with (A.26) gives the same result when the differentiation is carried out in the reverse order,

∂²L/∂γ ∂θ_kl = Σ_{j=1}^J X_j' V_j^{-1} (∂V_j/∂θ_kl) V_j^{-1} e_j,   (A.48)

because the matrices between brackets and parentheses in (A.48) are symmetric. Now, from (A.33) and (A.34), we have to take expectations of the second derivatives. Therefore, from (A.4) and (A.5), the following expectations will be used:

E(y_j - X_j γ) = 0,
E[(y_j - X_j γ)(y_j - X_j γ)'] = V_j.

From (A.36), (A.37), (A.38), and (A.39) we have

E(∂²L/∂γ ∂γ') = Σ_{j=1}^J X_j' V_j^{-1} X_j,   (A.49)

and, because the remaining blocks involving γ are linear in y_j - X_j γ,

E(∂²L/∂γ ∂σ²) = 0  and  E(∂²L/∂γ ∂θ_kl) = 0.   (A.50), (A.51), (A.52)

From (A.41), (A.42), and (A.43) we have

E(∂²L/∂(σ²)²) = ½ Σ_{j=1}^J tr V_j^{-2},   (A.53)

E(∂²L/∂σ² ∂Θ) = ½ Σ_{j=1}^J Z_j' V_j^{-2} Z_j,   (A.54)

and a corresponding expression for the Θ block in terms of the matrices Z_j' V_j^{-1} Z_j.   (A.55)

From (A.45), (A.46), and (A.48) we have, written out elementwise,

E(∂²L/∂θ_kl ∂θ_uv) = ½ Σ_{j=1}^J [ (Z_j'V_j^{-1}Z_j)_{ku} (Z_j'V_j^{-1}Z_j)_{lv} + (Z_j'V_j^{-1}Z_j)_{kv} (Z_j'V_j^{-1}Z_j)_{lu} ],   (A.56)

and analogously

E(∂²L/∂σ² ∂θ_kl) = ½ Σ_{j=1}^J tr( V_j^{-2} ∂V_j/∂θ_kl )   (A.57)

and

E(∂²L/∂(σ²)²) = ½ Σ_{j=1}^J tr V_j^{-2}.   (A.58)

As with the
variance:  s_X² = (1/N) Σ_{i=1}^N (X_i - X̄)²,

skewness:  g_1 = (1/N) Σ_{i=1}^N ((X_i - X̄)/s_X)³,

kurtosis:  g_2 = (1/N) Σ_{i=1}^N ((X_i - X̄)/s_X)⁴ - 3,

where X_i is the measurement of individual i on a typical variable X and N is the total sample size.

Another descriptive statistic that is provided is the Kolmogorov-Smirnov Z statistic. This is a measure of deviation from the normal distribution: it tests whether the observed variable has a normal distribution. It is defined as the maximum distance between the estimated empirical cumulative distribution function and the best-fitting cumulative normal distribution function. It is computed as follows (Stephens, 1974). First, sort the values of a given variable X such that X_(1) is the smallest value and X_(N) is the largest. Then compute

w_i = (X_(i) - X̄)/s_X,  i = 1, ..., N,  and  z_i = Φ(w_i),

where Φ is the cumulative distribution function of the standard normal distribution. Now Kolmogorov-Smirnov's Z is defined as

Z = max_{i=1,...,N} max( z_i - (i - 1)/N , i/N - z_i ).

The asymptotic distribution of Z was derived by Durbin (1973), but it is too complicated to be implemented in MLA (it requires numerical integration and Fourier transformation). Stephens (1974), however, provides a table of critical values of a transformed statistic,

(√N - 0.01 + 0.85/√N) Z,

which can be used to obtain a range of probability levels (p-values) indicating the significance of the deviation from normality. In MLA, p-values are reported that are based on this table.
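The computation just described can be sketched in a few lines. This is an illustration only; the critical values in Stephens' table are not reproduced here, and the function name and the N-divisor used for the standard deviation are assumptions of the example.

import numpy as np
from math import erf, sqrt

def ks_z(x):
    """Kolmogorov-Smirnov Z against the best-fitting normal distribution."""
    x = np.sort(np.asarray(x, dtype=float))
    N = len(x)
    w = (x - x.mean()) / x.std()                          # standardized, sorted values
    z = 0.5 * (1.0 + np.vectorize(erf)(w / sqrt(2.0)))    # z_i = Phi(w_i)
    i = np.arange(1, N + 1)
    Z = np.maximum(z - (i - 1) / N, i / N - z).max()      # maximum distance to the empirical cdf
    stephens = (sqrt(N) - 0.01 + 0.85 / sqrt(N)) * Z      # statistic referred to Stephens' table
    return Z, stephens

rng = np.random.default_rng(3)
print(ks_z(rng.normal(size=200)))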
19. 17027 5 892678 E 2 88 980063 9 458474 E 1 ONE STEP ESTIMATE OF SIGMA SQUARED IGNORING GROUPING E 2 TWO STEP ESTIMATE OF SIGMA SQUARED SEE DOCUMENTATION FOR FURTHER ELABORATION ON THESE SUBJECTS The parameter estimate for the regression coefficient of the covariate is also added to the FIML output part The additional T value and PROB T indicate that the pretest variable explains a significant part of the variance of the posttest variable T 10 18 PROB T 0 0000 FULL INFORMATION MAXIMUM LIKELIHOOD ESTIMATES FIXED PARAMETERS LABEL ESTIMATE SE T PROB T Gi 16 196937 2 226470 7 27 0 0000 G2 0 699891 0 068761 10 18 0 0000 RANDOM PARAMETERS LABEL ESTIMATE SE T PROB T U1 U1 6 766701 6 759616 1 00 0 3168 E 89 831188 9 576026 9 38 0 0000 INTRA CLASS CORRELATION 6 7667 89 8312 6 7667 0 0701 42 CONVERGENCE CRITERION REACHED ITERATIONS 2 LUG L 7 1318 217264 Entering the covariate into the analysis is justified because it has a statistically signif icant non zero effect The same justification could be made with the use of the likelihood ratio test This test is based on the fact that the difference between minus two times the loglikelihood function value 2 LOG L of two nested models follows a chi square distribution with the number of degrees of freedom equal to the difference in the num ber of free parameters The two models Example 1 and Example 2 are nested and the likelihood ratio te
20. 360 Raudenbush S W 1988 Educational applications of hierarchical linear models A review Journal of Educational Statistics 13 85 116 SAS Institute 1992 SAS STAT software Changes and enhancements Release 6 07 SAS Technical Report No P 229 Cary NC Author Schluchter M D 1988 BMDP5V unbalanced repeated measures models with struc tured covariance matrices Tech Rep No 86 Los Angeles BMDP Statistical 16 Software Stephens M A 1974 EDF statistics for goodness of fit Journal of the American Statistical Association 69 730 131 Stevens J P 1990 Intermediate statistics A modern approach Hillsdale NJ Lawrence Erlbaum Strenio J F Weisberg H IL amp Bryk A S 1983 Empirical Bayes estimation of individual growth curve parameters and their relationship to covariates Biometrics 39 71 86 Tukey J W 1958 Bias and confidence in not quite large samples Annals of Mathe matical Statistics 29 614 Van Der Leeden R amp Busing F M T A 1994 First iteration versus final IGLS RIGLS estimates in two level models A Monte Carlo study with ML3 Tech Rep No PRM 02 94 Leiden The Netherlands Leiden University Department of Psychometrics and Research Methodology Wu C F J 1986 Jackknife bootstrap and other resampling methods in regression analysis with discussion Annals of Statistics 14 1261 1350 77
21. 37 10 92 119 25 0 72 0 16 1 45 0 03 39 3 31 02 12 89 166 19 0 07 1 13 1 37 0 05 VAR MINIMUM P5 Qi MEDIAN Q3 P95 MAXIMUM 1 1 00 1 00 1 00 2 00 3 00 3 00 3 00 2 4 00 7 00 13 50 19 00 28 00 44 00 52 00 3 0 00 10 00 20 00 31 00 42 50 51 00 54 00 The first variable is the Level 2 identifier variable The second and third variables are the score on the pretest and the posttest respectivily Formulas can be found in Section 2 2 Part 5 gives OLS estimates This part is also optional The user must supply the keyword olsquares in the OUTPUT statement As described in Chapter 2 ordinary least squares estimation yields two different estimates for the Level 1 variance component c one by ignoring the hierarchical data structure and one using this structure These are both displayed in Part 5 of the output The one step estimate is labeled E 1 and the two step estimate is labeled E 2 U1 U1 gives the variance estimate for the Level 2 variance component U1 URDINARY LEAST SQUARES ESTIMATES FIXED PARAMETERS LABEL ESTIMATE SE G1 31 016760 0 963540 RANDOM PARAMETERS LABEL ESTIMATE SE E 1 166 185111 17 615587 U1 U1 29 469076 24 061400 EC2 136 503030 14 469292 E 1 ONE STEP ESTIMATE OF SIGMA SQUARED IGNORING GROUPING E 2 TWO STEP ESTIMATE OF SIGMA SQUARED SEE DOCUMENTATION FOR FURTHER ELABORATION ON THESE SUBJECTS As can be seen the overall mean G1 equals the mean of Variable 3 the score on the posttest 31 02 Ignori
22. 6 0 6335839 1 1931032 3 1749 6156005 0 0572362 0 7851259 4 1749 6110404 0 1167412 O 7 776501 5 1749 4998336 0 4502308 0 7366149 6 1749 4862306 0 0849216 0 7000794 7 1749 4491604 0 1256636 0 6790682 8 1749 4462601 0 1138294 0 0593836 9 1749 4441677 0 1133145 0 0583135 10 1749 4439035 0 0301898 0 0290881 11 1749 4439029 0 0014868 0 0020701 12 1749 4439027 0 0046646 0 0002624 13 1749 4439026 0 0005529 0 0002900 14 1749 4439026 0 0000003 0 0000056 CONVERGENCE CRITERION REACHED NORM dP LENGTH OF DIFFERENCE BETWEEN SUCCESSIVE PARAMETER VECTORS NORM G LENGTH OF GRADIENT VECTOR SEE DOCUMENTATION FOR FURTHER ELABORATION ON THESE SUBJECTS The following part gives the FIML estimates FULL INFORMATION MAXIMUM LIKELIHOOD ESTIMATES FIXED PARAMETERS LABEL ESTIMATE SE T PROB T Gi 59 098244 6 547975 9 03 0 0000 G2 15 827270 6 925261 2 29 0 0223 G3 1 108726 4 648499 0 24 0 8115 G4 0 922201 4 916968 0 19 0 8512 RANDOM PARAMETERS 46 LABEL U1 U1i U2 U1 U2 U2 E ESTIMATE 39 862888 28 697577 21 390296 42 782546 SE 20 314503 14 153063 10 257549 3 902927 10 T 96 03 09 96 PROB T 0 0497 0 0426 0 0370 0 0000 INTRA CLASS CORRELATION 39 8629 42 7825 39 8629 0 4823 As can be concluded from the output the interaction term G4 is not significant A model without this term might be preferred because it is a more parsimonious model It does however not alter the significant negativ
23. 7 281 312 Magnus J R amp Neudecker H 1985 Matrix differential calculus with applications to simple Hadamard and Kronecker products Journal of Mathematical Psychology 29 474 492 Magnus J R amp Neudecker H 1988 Matrix differential calculus with applications in statistics and econometrics Chichester Wiley Markus M T 1994 Bootstrap confidence regions in nonlinear multivariate analysis Leiden DSWO Press Mason W M Wong G M amp Entwistle B 1983 Contextual analysis through the multilevel linear model In S Leinhardt Ed Sociological methodology pp 12 103 San Francisco Jossey Bass Mood A M Graybill F A amp Boes D C 1974 Introduction to the theory of statistics 3rd ed Singapore McGraw Hill Press W H Flannery B P Teukolsky S A amp Vetterling W T 1986 Numerical recipes The art of scientific computing Cambridge UK Cambridge University Press Prosser R Rasbash J amp Goldstein H 1991 ML3 Software for three level analysis User s guide for V 2 London University of London Institute of Education Putter H 1994 Consistency of resampling methods Unpublished doctoral dissertation Leiden University Leiden The Netherlands Quenouille M H 1949 Approximate tests of correlation in time series Journal of the Royal Statistical Society B 11 18 84 Quenouille M H 1956 Notes on bias in estimation Biometrika 43 353
24. 8 4 73 0 0000 G3 2 967709 10 597305 0 28 0 7794 G4 0 146866 0 065072 2 26 0 0240 RANDOM PARAMETERS LABEL ESTIMATE SE T PROB T U1 U1 18 585604 17 183843 1 08 0 2794 U2 U1 6 598562 6 911578 0 95 0 3397 U2 U2 2 580007 5 765943 0 45 0 6545 E 91 746665 23 688887 3 87 0 0001 INTRA CLASS CORRELATION 18 5856 91 7467 18 5856 0 1685 CONVERGENCE CRITERION REACHED 44 ITERATIONS 2 LUG L 10 376 262296 The posterior means may be compared with the Level 2 outcomes As can be seen the posterior means tend to be shrunken towards the grand mean and therefore have less variance than the Level 2 outcomes POSTERIOR MEANS a H B1 111 2096 123 6838 118 5373 103 2717 102 5805 5713 92 8730 109 4713 96 1403 115 4610 o0 0 100450 Wa e m 1800 fo E zl H gt 4 m o A B2 28 0714 31 7001 31 3653 26 1328 25 9444 8216 23 7186 23 1345 26 1874 25 5239 oO 0 0 100450 tn N e zl gt N o 7600 4 4 Multilevel analysis In 1988 the National Center for Education Statistics of the U S Department of Education collected data on amount of homework done and scores on math tests from students of more than 1000 schools The subset from this National Education Longitudinal Study NELS of 1988 data used for this example consists of ten manually selected schools containing 260 students from Public coded 1 and Private coded 0 schools These data were also u
25. 9 2 1 The general two level model 2 2 Descriptive statistics 11 2 3 Ordinary Least Squares 12 24 Maximum Likelihood methods 14 2 5 Residuals 14 2 6 Posterior means 15 2 7 Diagnostics 16 2 8 Simulation 16 2 9 Missing data 24 3 Input 27 3 1 TITLE optional 27 3 2 DATA reguired 28 3 3 MODEL required 29 3 4 CONSTRAINTS optional 30 3 5 SIMULATION optional 30 3 6 TECHNICAL optional 33 3 7 OUTPUT optional 35 4 Output 37 4 1 Analysis of variance 3T 4 2 Analysis of covariance 41 4 3 Repeated measures analysis 43 4 4 Multilevel analysis 45 4 5 Simulation study 4T A Technical Appendix 49 A 1 The model and the likelihood function 49 A 2 Someusefulformulas 50 A 3 Computational formulas for the function and gradient 54 A 4 The
26. E B1 T PROB T 1 5 111 4000 4 9044 22 71 0 0000 2 5 120 2000 2 9967 40 11 0 0000 3 5 119 8000 6 7621 17 72 0 0000 4 5 103 4000 3 8018 27 20 0 0000 43 5 5 100 0000 3 2701 30 58 0 0000 6 5 99 0000 4 4505 22 24 0 0000 7 5 93 0000 5 5281 16 82 0 0000 8 5 113 6000 1 6391 69 31 0 0000 9 5 90 4000 4 5284 19 96 0 0000 10 5 121 0000 2 4549 49 29 0 0000 MEAN 107 1800 UNIT SIZE B2 SE B2 T PROB T 1 5 28 8000 3 4679 8 30 0 0000 2 5 28 1000 2 1190 13 26 0 0000 3 5 36 3000 4 7816 7 59 0 0000 4 5 27 2000 2 6882 10 12 0 0000 5 5 23 4000 2 3123 10 12 0 0000 6 5 29 3000 3 1470 9 31 0 0000 7 5 25 6000 3 9090 6 55 0 0000 8 5 19 7000 1 1590 17 00 0 0000 9 5 23 6000 3 2021 7 37 0 0000 10 5 25 6000 1 7359 14 75 0 0000 MEAN 26 7600 UNIT SIZE SIGMA2 SE SIGMA2 T PROB T 1 5 120 2667 98 1973 1 22 0 2207 2 5 44 9000 36 6607 1 22 0 2207 3 5 228 6333 186 6783 1 22 0 2207 4 5 72 2667 59 0055 1 22 0 2207 5 5 53 4667 43 6554 1 22 0 2207 6 5 99 0333 80 8604 1 22 0 2207 7 5 152 8000 124 7607 1 22 0 2207 8 5 13 4333 10 9683 1 22 0 2207 9 5 102 5333 83 7181 1 22 0 2207 10 5 30 1333 24 6038 1 22 0 2207 MEAN 91 7467 In the next part we can see that both G2 and G4 indicate that the mother s weight has a positive effect on the rat s weight The rat s weight starts higher and rises faster with a heavier mother FULL INFORMATION MAXIMUM LIKELIHOOD ESTIMATES FIXED PARAMETERS LABEL ESTIMATE SE T PROB T G1 18 873660 18 784897 1 00 0 3150 G2 0 545101 0 11534
27. Leeden 1994 For extensive discussions on theory and application of multilevel analysis we refer to the textbooks by Goldstein 1987 Bryk and Raudenbush 1992 and Longford 1993 Small example A small imaginary example from education may clarify what is meant by a multilevel model Suppose we have data of students nested within schools and we want to predict the score on a math test from the amount of time spend on doing math homework Fur thermore we expect smaller schools to be more effective than larger ones so we collect the school size as another variable Clearly at the student level math is the dependent variable and homework is the predictor variable At the school level size is the predic tor variable Now the multilevel model for this example in this case a two level model is specified as follows At the student level Level 1 for each school a regression model is formulated with math as the dependent variable and homework as the predictor This reflects the intra class dependency of the observations the students within each school All models contain the same variables but we expect them to yield different intercept and slope estimates within each school At the school level Level 2 a regression model is formulated in which the intercepts and slopes of the first level models are dependent variables predicted by the second level variable size This reflects the possible effect of sch
28. MLA software for MultiLevel Analysis of Data with Two Levels User s Guide for Version 1 0b Frank M T A Busing Erik Meijer Rien van der Leeden December 1994 Frank M T A Busing is Research Associate in the Department of Psychometrics and Research Methodology at Leiden University P O Box 9555 2300 RB Leiden The Netherlands Erik Meijer is Graduate Student in the Department of Psychometrics and Research Method ology at Leiden University Rien van der Leeden is Assistant Professor in the Department of Psychometrics and Research Methodology at Leiden University Preface This manual describes MLA Version 1 0b a computer program developed for multilevel analysis of data with two levels The MLA program can be characterized by four major properties e User friendly interface e Extensive options for simulation in particular three options for bootstrapping mul tilevel models e Simple estimation methods providing an alternative for the complex iterative esti mation procedures that are commonly used to estimate the parameters of multilevel models e A fast algorithm using the Broyden Fletcher Goldfarb Shanno optimization method to obtain maximum likelihood estimates of all model parameters The MLA program runs as a stand alone batch program on 286 386 and 486 based personal computers under DOS It uses simple ASCII text files as input and output The program is easy to use by means of a number of
29. ON REACHED ITERATIONS 2 LOG L 9 5527 578950 After 200 bootstrap replications there are no replications that are incorrect i e with inadmissible parameter values or non convergence The final bootstrap estimates that were computed are given below BOOTSTRAP ESTIMATES REPLICATIONS 200 CORRECT REPLICATIONS 200 FIXED PARAMETERS LABEL ESTIMATE SE Gi 15 254733 0 044857 G2 0 592006 0 007469 G3 1 271495 0 028049 RANDOM PARAMETERS LABEL ESTIMATE SE U1 U1 4 013912 0 300021 E 27 954714 0 385577 This bootstrap simulation took about half a minute on a 486 DX2 66MHz This amounts to 6 replications per second This means that there do not have to be any obstacles for using a high number of replications Literature suggests many different numbers ranging from 100 replications to 1000 replications taking 17 seconds to 3 minutes computer time 48 Appendix A Technical Appendix In this appendix the theory of maximum likelihood estimation used in the MLA program will be discussed in detail Other authors such as Bryk and Raudenbush 1992 and Longford 1987 give some technical detail as well but much is left to the reader We think however that it is useful to explain in much more detail what is actually done in the program and this appendix serves this purpose In this appendix the minus log likelihood function and its gradient function are derived as well as computationally more efficient formulas of them Th
2.8.2 The bootstrap

The bootstrap was introduced by Efron (1979) as an alternative to the jackknife. The idea of the bootstrap is that the empirical distribution function is a consistent estimator of the distribution function in the population. Let Z be a random variable with distribution function F, and let z_1, z_2, ..., z_N be a random sample of size N from F. Now the empirical distribution function F_N in some point z is the proportion of the z_i that are smaller than or equal to z,

F_N(z) = #{ i : z_i ≤ z } / N.

If Z has a multivariate distribution, this formula has an obvious generalization, and all subsequent formulas will also have obvious generalizations. It is known (e.g., Mood et al., 1974, p. 507) that, as N → ∞, F_N(z) → F(z). Let θ be a parameter associated with the distribution F, θ = θ(F), and let θ̂ be an estimator of θ from a sample, θ̂ = θ̂(z_1, ..., z_N) = θ(F_N). The idea of the bootstrap is now to simulate the sampling and estimation process, where samples are drawn from F_N, which is completely known once the original sample is obtained. In the simulation, the distribution F_N plays the role of F and θ̂ plays the role of θ. Simulation samples (z*_1, z*_2, ..., z*_N) are drawn from F_N, and θ̂ is estimated by θ̂*, in the same way θ was estimated by θ̂. Because F_N ≈ F, it is assumed that the properties of the estimator θ̂* based on the distribution F_N give information about the properties of θ̂ based on the distribution F. For example, the bias of θ̂* based on the distribution F_N is taken as an estimate of the bias of θ̂ based on the distribution F.
Now, using (A.9), (A.12), (A.13), and (A.15), we find computationally more efficient formulas for the derivatives. Writing e_j = y_j - X_j γ and a_j = Z_j' e_j = Z_j'y_j - Z_j'X_j γ, the substitutions

V_j^{-1} e_j = σ^{-2} ( e_j - σ^{-2} Z_j Θ G_j^{-1} a_j ),
Z_j' V_j^{-1} = σ^{-2} G_j^{-1} Z_j',
tr V_j^{-1} = σ^{-2} ( N_j - q + tr G_j^{-1} )

turn (A.30), (A.31), and (A.32) into expressions that involve only the q × q matrices G_j^{-1} and the cross products X_j'X_j, X_j'Z_j, Z_j'Z_j, X_j'y_j, Z_j'y_j, and y_j'y_j; for instance,

∂L/∂γ = - σ^{-2} Σ_{j=1}^J ( X_j' e_j - σ^{-2} X_j' Z_j Θ G_j^{-1} a_j ).

In this way no N_j × N_j matrix has to be formed or inverted, and the symmetric matrices may be stored linearly, thereby saving additional memory.

A.4 The asymptotic covariance matrix of the estimators

The asymptotic distribution of the maximum likelihood estimators, under appropriate regularity conditions, is normal, with covariance matrix equal to the inverse of the information matrix, that is, the inverse of the matrix of expected second derivatives (A.49) through (A.58). Using (A.12) and (A.13), the elements of the information matrix can likewise be written in terms of G_j^{-1} and the cross products listed above, so that these formulas, too, can be evaluated without forming any N_j × N_j matrix.
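As an illustration of how the inverse information matrix yields standard errors, the following sketch computes the γ block (A.49) and the asymptotic standard errors of the fixed coefficients. It is not MLA's actual code; the data layout and names are assumptions of the example.

import numpy as np

def se_gamma(X, Z, sigma2, Theta):
    """Asymptotic standard errors of gamma-hat from sum_j X_j' V_j^{-1} X_j (A.49).

    X and Z are lists of per-unit design matrices; sigma2 and Theta are the
    (estimated) variance parameters.
    """
    p = X[0].shape[1]
    info = np.zeros((p, p))
    for X_j, Z_j in zip(X, Z):
        V_j = sigma2 * np.eye(X_j.shape[0]) + Z_j @ Theta @ Z_j.T
        info += X_j.T @ np.linalg.solve(V_j, X_j)
    return np.sqrt(np.diag(np.linalg.inv(info)))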
32. alues must be specified as floating point numbers Covariances are specified by connect ing the appropriate Level 2 residual terms by an asterisk Example CONSTRAINTS Ui U2 0 0 fix level 2 covariance U1 U2 to 0 0 3 5 SIMULATION optional Several options for simulation are available in MLA These include the jackknife and three versions of the bootstrap Efron 1982 Theoretical details concerning the implementation of these resampling methods for the two level model can be found in Chapter 2 With the substatements provided with the SIMULATION statement one can choose between the different kinds of simulation using the keyword kind and specify special simulation features using the keywords method type and resample Additional features are the number of replications and the initial seed for the random number generator replications and seed Finally one can specify a separate output file for intermediate results of the simulation file 1041245 start with random seed 1041245 boot out write simulation results to boot out see fil Example SIMULATION kin bootstrap use simulation method bootstrap met error 4 resample from error vectors typ raw use raw residuals as error vectors res 1 only resample level 1 units rep 200 repeat simulation 200 times 30 3 5 1 kind required With this substatement the user can choose from two options namely bootstrap and jackknife simulation Both types of
33. aning e g G3 one of the fixed parameters The Bi terms have meaning in the equations of both levels Every equation consists of one term before and at least one term after the equals sign The minimal specification of a model is Note that this feature may be used to create an aggregated Level 1 variable serving as a Level 2 predictor variable simply by specifying a Level 1 variable as a Level 2 variable as well 29 Bi Gi fixed intercept V4 B1 E level 1 variation Or MODEL Bi Ui random intercept V4 B1 E level 1 variation As shown above terms on the right hand side of the equations are connected by plus signs A variable and a corresponding parameter are connected by an asterisk This is used to connect a fixed parameter and an observed predictor variable in Level 2 equations and to connect a Level 1 regression coefficient and an observed predictor variable in the Level 1 equation In Chapter 4 several variations of the two level model will be presented and discussed in more detail 3 4 CONSTRAINTS optional MLA has a limited option for imposing parameter constraints Parameters to be estimated may be constrained to a certain value Constraints are imposed as parameter value This feature is only implemented for the FIML estimation part It is simply ignored for the various OLS estimators Example CONSTRAINTS Gi 1 0 fix component Gi to 1 0 Ui 0 5 fix level 2 variance of Ui to 0 5 V
34. asons why it may be useful to consider the school specific coeffi cients as random First the schools in the data set are usually a random sample from the population of schools and scientists are usually interested in the population rather than the specific data set Second with a model that explains part of the variation in the random coefficients the effect of the school level variables on the student level relation ships can be assessed and in particular the model can give guidance to schools that want to improve their effectiveness Third the relationships between the outcome variable and the student level predictors become clearer Between school variation that may blur these relationships is accounted for and consequently the estimates of the average coefficients are more precise School specific estimates of intercept and slope can however be obtained This will be discussed below under the heading of Random Level 1 coefficients Cross level interaction If aschool level predictor variable like size is added to the Level 2 model in our imaginary example means and variances change to conditional means and variances It means that part of the variance of intercepts and slopes among schools is explained by size The contribution of this school level variable introduces a term to the model that specifies a relationship between both levels The relationship between size at the school level and the slope coefficie
35. asymptotic covariance matrix of the estimators 58 A 5 Reparametrization 68 B Read Me 71 ili References iv 75 Chapter 1 Introduction 1 1 Introduction to multilevel analysis Multilevel analysis comprises a set of techniques that explicitly take into account the hier archical structure in the data In this section a brief introduction to the underlying ideas of multilevel analysis is given Several relevant topics such as hierachical data structures intra class correlation the formulation of a multilevel model and the estimation of the model parameters are discussed This introduction does not contain formulas Chap ter 2 will discuss the main formulas and the Technical Appendix will give supplementary mathematical details Hierarchical data Hierarchically structured data arise in a variety of research areas Such data are character ized by so called nested membership relations among the units of observation Classical examples of hierarchically structured data are found in educational research where for instance students are nested within classes and classes are nested within schools But in many other instances in the social and behavioral sciences as well as in many other fields of science data are also hierarchically structured For instance in clinical psychology clients can be nested within therapy groups people can be nested within families and so forth A so
36. bles most of them never measured and thus omitted from any possible model Hence if we fit a common single level model to such data intra class dependency has an effect on the error terms It causes the error terms to be correlated The result is that the usual assumption of independent observations is violated if the nested structure of the data is ignored The degree of intra class dependency is reflected in the intra class correlation Obviously this idea of intra class dependency applies to every hierarchical data set Their intra class correlations however may differ substantially Multilevel models For the analysis of hierarchical data hierarchical models or multilevel models have been developed Such models can be conceived as linear regression models specified separately for each level of the hierarchy but statistically connected Since each level of the hierarchy has its own regression equation predictor variables measured at either level can be included in the appropriate level model Because hierarchical data structures frequently arise in social and behavioral science research but also in many other scientific areas the application and development of mul tilevel analysis has in the last decade drawn a lot of attention from numerous researchers Below a brief introduction of some relevant topics concerning multilevel models will be given A more comprehensive introduction of these topics is given by Kreft and Van Der
37. ciological example is given by a study concerning employees nested within industries It should be noted that nested structures naturally arise where explicit hierarchical sampling schemes are used This is often the case in large scale educational research where for instance a set of schools is sampled first followed by the sampling of a set of students within these schools However there are many other cases where data are not explicitly sampled in that way but where it appears to be a fruitful approach to treat them as having a hierarchical structure For instance in a medical study one could consider it to be important that patients can be viewed as nested within general practitioners Apart from this there are several types of data for which it proves to be very useful to apply the concept of hierarchy because it makes their analysis more easy and transparent One example is the hierarchical treatment of repeated measures data where measurements at different points in time are considered nested within individuals Another example is the analysis of data from meta analysis where say p values can be treated as being nested within studies providing a partial solution for the problem of comparing apples with oranges With hierarchical data it is common to have information obtained from the different levels in the hierarchy For instance one has variables describing the individual students but also variables describing their schools Wh
38. component estimation and to related problems Journal of the American Statistical Association 72 320 340 15 Kreft I G G 1994 Are multilevel techniques necessary An attempt at demystifi cation Unpublished manuscript California State University School of Education Los Angeles Kreft I G G amp De Leeuw J 1991 Model based ranking of schools International Journal of Educational Research 15 45 59 Kreft I G G De Leeuw J amp Van Der Leeden R 1994 A review of five multi level analysis programs BMDP 5V GENMOD HLM ML3 VARCL American Statistician 48 324 335 Kreft I G G amp Van Der Leeden R 1994 Random coefficient linear regression models Tech Rep No PRM 03 94 Leiden The Netherlands Leiden University Department of Psychometrics and Research Methodology Longford N T 1987 A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects Biometrika 74 817 827 Longford N T 1990 VARCL Software for variance component analysis of data with nested random effects maximum likelihood Princeton NJ Educational Testing Service Longford N T 1993 Random coefficient models Oxford Clarendon Press Maddala G S 1977 Econometrics Singapore McGraw Hill Magnus J R 1978 Maximum likelihood estimation of the GLS model with unknown parameters in the disturbance covariance matrix Journal of Econometrics
39. d a bootstrap sample is created by resampling the original data Thus complete cases are randomly drawn with replacement from the original cases The procedure follows the nested structure in the data by a nested resampling of cases Level 2 units are randomly drawn with replacement and cases within a particular drawn unit are resampled It is also possible to resample only complete Level 2 units where the Level 1 units within a sampled Level 2 units are the same as in the original data set which is useful for repeated measures data or to resample only Level 1 units within Level 2 units where the Level 2 units are the same as in the original sample but the Level 1 units within each Level 2 units are resampled useful when there are few Level 2 units and many Level 1 units in each Level 2 unit such in studies with many subjects from a few countries 31 parametric This method computes a new outcome or dependent variable using the original predictor variables their corresponding FIML parameter estimates and a set of random Level 1 and Level 2 error terms These terms are obtained as follows New Level 1 errors are drawn from a normal distribution with mean zero and variance 8 which is the FIML estimate of the Level 1 variance component New Level 2 errors are drawn from a multivariate normal distribution with zero mean vector and covariance matrix 6 which contains the FIML estimates of the Level 2 variance components 3 5 3 type
e asymptotic covariance matrix of the maximum likelihood estimators and computationally efficient formulas for it are derived, and the explicit imposition of implicit constraints in the model is discussed.

A.1 The model and the likelihood function

To find maximum likelihood estimates we start with the model (2.4),

y_j = X_j γ + Z_j u_j + e_j,   (A.1)
e_j ~ N(0, σ² I_{N_j}),   (A.2)
u_j ~ N(0, Θ),   (A.3)

where y_j is a vector with the endogenous variable for the N_j Level 1 units in Level 2 unit j, X_j is an N_j × p matrix of exogenous variables for the Level 1 units in Level 2 unit j, and Z_j is an N_j × q matrix of exogenous variables for the Level 1 units in Level 2 unit j. The p-vector γ is a vector of fixed regression coefficients, the q-vector u_j is a vector of random regression coefficients in Level 2 unit j, and the N_j-vector e_j is a vector of residuals of the Level 1 units in Level 2 unit j. It is assumed that e_j and u_j are independent of each other and independent of e_j' and u_j', where j' ≠ j. From the model equations (A.1)-(A.3) it is found that, conditional on X_j and Z_j, y_j is normally distributed, and the expectation and covariance matrix of y_j are

E(y_j) = X_j γ,   (A.4)
V_j ≡ E[(y_j - X_j γ)(y_j - X_j γ)'] = σ² I_{N_j} + Z_j Θ Z_j'.   (A.5)

Consequently, the probability density of y_j is

f(y_j) = (2π)^{-N_j/2} (det V_j)^{-1/2} exp( -½ (y_j - X_j γ)' V_j^{-1} (y_j - X_j γ) ),

so that the contribution of Level 2 unit j to the minus log-likelihood function is

L_j = -log f(y_j) = (N_j/2) log 2π + ½ log det V_j + ½ (y_j - X_j γ)' V_j^{-1} (y_j - X_j γ).
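As a small illustration of (A.1) through (A.5), the following sketch (not part of MLA; the simulated design and all names are assumptions of the example) repeatedly simulates one Level 2 unit and compares the empirical covariance matrix of y_j with σ² I + Z_j Θ Z_j'.

import numpy as np

rng = np.random.default_rng(4)
N_j, p, q, sigma2 = 4, 2, 2, 0.8
X_j = rng.normal(size=(N_j, p))
Z_j = rng.normal(size=(N_j, q))
gamma = np.array([1.0, -0.5])
Theta = np.array([[1.0, 0.4], [0.4, 0.9]])

M = 100_000
U = rng.multivariate_normal(np.zeros(q), Theta, size=M)        # u_j ~ N(0, Theta), (A.3)
E = rng.normal(0.0, np.sqrt(sigma2), size=(M, N_j))             # e_j ~ N(0, sigma^2 I), (A.2)
Y = X_j @ gamma + U @ Z_j.T + E                                 # y_j = X_j gamma + Z_j u_j + e_j, (A.1)

V_emp = np.cov(Y, rowvar=False)
V_theory = sigma2 * np.eye(N_j) + Z_j @ Theta @ Z_j.T           # (A.5)
print(np.round(V_emp - V_theory, 2))                            # differences are sampling noise only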
41. e effect of the school type 4 5 Simulation study In the Junior School Project Inner London Education Authority 1987 the following variables were collected Mathematics Achievement in Years 1 through 3 an ability mea sure score on the Ravens test in Year 1 and sex There are 48 classes present from 36 different schools with a total of 887 children The input file shown below represents a bootstrap study with resampling from the shrunken residuals Resampling from both levels is used with 200 replications Together with the other input the input file is as follows TITLE MLA example 5 simulation study DATA file jsp dat vars 7 id2 1 MODEL bi gi ui b2 g2 b3 g3 v5 bi b2 v4 b3 v3 e SIMULATION kind bootstrap method cases resample 1 replications 200 seed 1 END The FIML estimates are given below In fact this model is an analysis of covariance with two covariates at the first level G2 and G3 Thus only one random estimate for the second level is specified U1 FULL INFORMATION MAXIMUM LIKELIHOOD ESTIMATES FIXED PARAMETERS LABEL ESTIMATE SE T PROB T Gi 15 251835 0 896721 17 01 0 0000 G2 0 592560 0 032978 17 97 0 0000 G3 1 272573 0 443152 2 87 0 0041 RANDOM PARAMETERS LABEL ESTIMATE SE T PROB T AT U1 U1i 4 049940 1 184081 3 42 0 0006 E 27 852020 1 359013 20 49 0 0000 INTRA CLASS CORRELATION 4 0499 27 8520 4 0499 0 1269 CONVERGENCE CRITERI
42. e elapsed time is in STR seconds HH MM SS HH The program is terminated correctly in about a quarter of a second as can be seen in the seventh and final default part of the output SYSTEM INFORMATION START FINISH ELAPSED DATE 19 12 1994 19 12 1994 TIME 12 52 47 12 52 47 00 00 00 16 PROGRAM TERMINATED CORRECTLY 4 2 Analysis of covariance For the next example the same Sesame Street data set is used Now an analysis of covariance is performed on these data with MLA The model to be estimated is Yij Va V2Xij Uj Eijs 4 4 where 4 is the overall mean X is the covariate u is the Level 2 error component and is the Level 1 error component Equation 4 4 can be divided into separate equations one equation for Level 1 and in this case two Level 2 equations Vig Bij Bag Xig Eijs Pij V1 Uj Baj 72 Along with the other statements the input file is as follows 41 TITLE MLA example 2 analysis of covariance DATA file sesame dat vars 3 id2 1 MODEL bi gi ui b2 g2 v3 bi b2 v2 e OUTPUT olsq END Compared to the previous example a fixed parameter G2 is added in the OLS estimates part This is the regression coefficient of the Level 1 covariate containing the pretest score ORDINARY LEAST SQUARES ESTIMATES FIXED PARAMETERS LABEL ESTIMATE SE G1 14 672451 1 621040 G2 0 764871 0 067590 RANDOM PARAMETERS LABEL ESTIMATE SE EC1 96 968087 10 307591 U1 U1 7 2
e output of MLA, as will become clear later on. In section 2.2 the computational formulas are presented for the descriptive statistics. The next section (2.3) discusses various forms of ordinary least squares estimation, namely OLS estimates for each group separately (section 2.3.1) and one-step and two-step OLS for the fixed and random parameters of the total two-level model (sections 2.3.2 and 2.3.3, respectively). Maximum likelihood estimation is dealt with in the next section (2.4), subdivided into subsections about full information maximum likelihood (section 2.4.1) and restricted maximum likelihood (section 2.4.2). An extensive elaboration on the subjects of maximum likelihood estimation will follow in Appendix A. In section 2.5 several types of residuals will be discussed, namely total residuals, raw residuals, and shrunken residuals. Section 2.6 will introduce the posterior means, and section 2.7 will discuss diagnostics. The theory behind the simulation options in MLA is described in section 2.8. Finally, in section 2.9 some remarks will be made about missing data.

2.2 Descriptive statistics

MLA produces, if asked for, the following descriptive statistics: mean, standard deviation, variance, skewness, and kurtosis. Any statistical package will produce these statistics as well. Before looking at the other output, it may be useful to inspect these statistics. Their formulas are

mean:  X̄ = (1/N) Σ_{i=1}^N X_i,

standard deviation:  s_X = √( (1/N) Σ_{i=1}^N (X_i - X̄)² ),
one-step OLS estimate of the variance of the residuals. The usual standard errors for \hat{\gamma} and \hat{\sigma}_r^2 are, respectively,

se(\hat{\gamma}_l) = \hat{\sigma}_r \sqrt{[(X'X)^{-1}]_{ll}},   (2.15)

se(\hat{\sigma}_r^2) = \hat{\sigma}_r^2 \sqrt{2/(N-p)}.   (2.16)

2.3.3 Two-step OLS (total model)

With the two-step OLS the same estimates \hat{\gamma} are used as with the one-step OLS; see (2.13). The total residuals for every group j can be divided into a Level 2 and a Level 1 part. This was already done in Equation 2.11. Using ordinary least squares, estimates for the Level 2 random components u_j can be obtained by

\hat{u}_j = (Z_j'Z_j)^{-1} Z_j' \hat{r}_j.   (2.17)

The estimate for the covariance matrix \Theta of u becomes

\hat{\Theta} = \frac{1}{J} \sum_{j=1}^{J} \hat{u}_j \hat{u}_j'.   (2.18)

The estimated covariances of the elements of \hat{\Theta} can be obtained by (Anderson, 1958, p. 161)

\widehat{cov}(\hat{\theta}_{kl}, \hat{\theta}_{mn}) = (\hat{\theta}_{km}\hat{\theta}_{ln} + \hat{\theta}_{kn}\hat{\theta}_{lm}) / J.   (2.19)

Consequently, the estimated standard errors of the elements of \hat{\Theta} are given by

se(\hat{\theta}_{kl}) = \sqrt{(\hat{\theta}_{kk}\hat{\theta}_{ll} + \hat{\theta}_{kl}^2) / J}.   (2.20)

By first computing the residuals

\hat{e}_j = \hat{r}_j - Z_j \hat{u}_j,   (2.21)

the estimate for \sigma^2 is obtained from the average of the squared elements of the \hat{e}_j,

\hat{\sigma}^2 = \frac{1}{N} \sum_{j=1}^{J} \hat{e}_j' \hat{e}_j.   (2.22)

This estimate \hat{\sigma}^2 is the two-step OLS estimate of the variance of the elements of e. The estimated standard error for \hat{\sigma}^2 is obtained analogously to Equation 2.16 (2.23).

All the estimators in this section are consistent if J \to \infty and N_j \to \infty for each j. Although this may be unrealistic, these estimators may be good initial estimators (starting values) for maximum likelihood estimators. In some cases the differences between these esti-
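To make the two-step computation concrete, the following sketch in Python/NumPy follows the formulas above. It is an illustration only, not MLA code; the function name and the data layout (one array per Level 2 unit) are assumptions, and the Level 1 variance is computed as the plain average of the squared residuals.

import numpy as np

def two_step_ols(y_groups, X_groups, Z_groups, gamma_hat):
    # y_groups, X_groups, Z_groups: lists with one entry per Level 2 unit
    # gamma_hat: one-step OLS estimate of the fixed coefficients (2.13)
    u_hats, e_all = [], []
    for y_j, X_j, Z_j in zip(y_groups, X_groups, Z_groups):
        r_j = y_j - X_j @ gamma_hat                       # total residuals (2.11)
        u_j = np.linalg.solve(Z_j.T @ Z_j, Z_j.T @ r_j)   # Level 2 components (2.17)
        u_hats.append(u_j)
        e_all.append(r_j - Z_j @ u_j)                     # Level 1 residuals (2.21)
    U = np.column_stack(u_hats)                           # q x J matrix of the u_j
    Theta_hat = U @ U.T / U.shape[1]                      # covariance matrix (2.18)
    e = np.concatenate(e_all)
    sigma2_hat = np.mean(e ** 2)                          # average squared residual (2.22)
    return Theta_hat, sigma2_hat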
used, namely all N observations. Now call the estimator obtained by removing observation i from the sample \hat{\theta}_{(i)}. The corresponding estimator of the bias is called \widehat{bias}_{(i)}, and the corresponding bias-corrected estimator is called \hat{\theta}_{J(i)}. Now a more precise estimator of the bias can be obtained by averaging the different estimators of the bias,

\widehat{bias} = \frac{1}{N} \sum_{i=1}^{N} \widehat{bias}_{(i)}.   (2.36)

The corresponding bias-corrected estimator of \theta is

\hat{\theta}_J = \hat{\theta}_N - \widehat{bias} = N \hat{\theta}_N - (N-1) \bar{\theta}_{(\cdot)},   (2.37)

where \bar{\theta}_{(\cdot)} = \frac{1}{N} \sum_{i=1}^{N} \hat{\theta}_{(i)} is the average of the estimators \hat{\theta}_{(i)}. The estimator \hat{\theta}_J is called the ungrouped jackknife bias-corrected estimator of \theta, and \widehat{bias} is called the ungrouped jackknife bias estimator.

Tukey (1958) proposed to use the estimators \hat{\theta}_{(i)} to obtain an estimator of the variance of the estimator \hat{\theta}_N. Its formula is

\widehat{var}_J = \frac{N-1}{N} \sum_{i=1}^{N} (\hat{\theta}_{(i)} - \bar{\theta}_{(\cdot)})^2.   (2.38)

Although it was originally an estimator of the variance of \hat{\theta}_N, and Efron (1982, p. 13) states that it is a better estimator of the variance of \hat{\theta}_N than of the variance of \hat{\theta}_J, it can also be used as an estimator of the variance of \hat{\theta}_J. The standard error of \hat{\theta}_J is then estimated by \sqrt{\widehat{var}_J}.

If m > 1, the sample can be divided into g mutually exclusive groups of size m, where g = N/m. Of course, this is only possible when g is an integer. Now call the estimator based on the total sample from which group j is removed \hat{\theta}_{(j)}, and the according bias estimator (2.34) \widehat{bias}_{(j)}. The average of the estimators \hat{\theta}_{(j)}, j = 1, ..., g, is called
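The ungrouped jackknife described above is easily expressed in code. The sketch below, in Python/NumPy, is illustrative only: the estimator is passed in as a function, and the (biased) maximum likelihood variance estimator is used as an example.

import numpy as np

def ungrouped_jackknife(z, estimator):
    # Leave-one-out jackknife: bias-corrected estimate (2.37) and
    # Tukey's variance estimate (2.38).
    z = np.asarray(z)
    N = len(z)
    theta_full = estimator(z)
    theta_loo = np.array([estimator(np.delete(z, i)) for i in range(N)])
    theta_bar = theta_loo.mean()
    bias_hat = (N - 1) * (theta_bar - theta_full)
    theta_jack = N * theta_full - (N - 1) * theta_bar
    var_hat = (N - 1) / N * np.sum((theta_loo - theta_bar) ** 2)
    return theta_jack, bias_hat, np.sqrt(var_hat)

# Example: jackknife the maximum likelihood variance estimator, which is biased
rng = np.random.default_rng(1)
sample = rng.normal(size=25)
print(ungrouped_jackknife(sample, np.var))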
46. elihood function FUTURE PLANS check on equations large sample summary statistics implementation of weights restricted maximum likelihood estimation other resampling methods fit measures oo 009 08 0 DISTRIBUTION You can contact the authors by writing to the following address Leiden University Faculty of Social and Behavioural Sciences Department of Psychometrics and Research Methodology Wassenaarseweg 52 P O Box 9555 2300 RB Leiden The Netherlands Phone 31 0 71 273761 Fax 31 0 71 273619 74 References Aitkin M A amp Longford N T 1986 Statistical modeling issues in school effectiveness studies Journal of the Royal Statistical Society A 149 1 43 Anderson T W 1958 An introduction to multivariate statistical analysis New York Wiley Blalock H M 1984 Contextual effects models Theoretical and methodological issues Annual Review of Sociology 10 353 312 Bryk A S amp Raudenbush S W 1992 Hierarchical linear models Applications and data analysis methods Newbury Park CA Sage Bryk A S Raudenbush S W Seltzer M amp Congdon R T 1988 An introduction to HLM Computer program and user s guide University of Chicago Busing F M T A 1993 Distribution characteristics of variance estimates in two level models A Monte Carlo study Tech Rep No PRM 93 04 Leiden The Nether lands Leiden University Department of Psychometrics and Research Methodol
47. en analyzing such data one has to decide in what way the hierarchical structure of the data is taken into account Obviously the easiest approach is simply ignoring the structure and analyzing the data at the student level leaving all school information for what it is Generally however one s intention will be to use all information in the data and use it correctly Thus if one is also interested in school differences and in their possible interaction with effects measured at the student level one has to solve the unit of analysis problem This means that one has to decide whether to analyze the data at the student level incorporating disaggregated variables from the school level or to analyze the data at the school level incorporating aggregated variables from the student level Unfortunately according to De Leeuw in his introduction to the book of Bryk and Raudenbush 1992 both of these strategies are subject to serious disadvantages Hence traditional single level analyses fail in the presence of nested data Intra class dependency The basic problem with hierarchical data is that group membership may cause intra class dependency People from the same group are more alike than people from different groups The reason for this phenomenon is that people within a group share the same environment have the same leader experiences and so forth In other words people within the same group have the same score on a number of varia
48. f the form dT trAdO where O is a symmetric q x q matrix This term can be written as q q dT 3S And k 1 l 1 q k 1 q M Au An dOr M ArkdOkk k 1 1 k 1 So OT A A A 22 08 kl Alk 24x if A is symmetric A 23 and OT A A 24 JO uk kk 53 where k 1 Similarly consider a term of the form dS A d0 Blu where O is a symmetric q x q matrix and A and B are matrices This term can be written as q q dS 5 5 Aku dO Bui ucl v 1 q u l q 5 3 Abu Bul Ax Bu dOw 5 Apu Bu dO wi ucl v 1 ucl so Os Am By Am By A 25 08 ku Bul Ago Bui and Os Ap By A 26 Jo ku Bul A 26 where u v A 3 Computational formulas for the function and gradient The formula A 6 of the minus log likelihood function is computationally inefficient be cause a matrix of size N has to be inverted and its determinant calculated Therefore in this section a computationally efficient formula will be derived based on Longford 1987 and using formulas from the previous section Along the same lines computationally effi cient formulas for the derivatives of this function with respect to the parameters will also be derived 54 Combining A 6 A 9 and A 11 we find the following formula for L J L log2r 5 D log o det G 2 1 1 J 5 u y o Iy 0 Z 9G Zilly X77 j l EE 2 i c 53 det G 773 og 27 2 og a 22 og de 1 J 397 i Xii
49. grams is given in Kreft De Leeuw and Van Der Leeden 1994 Final remark In the literature multilevel models are referred to under various names One may find the terms random coefficient regression models De Leeuw amp Kreft 1986 Prosser et al 1991 contextual effects models Blalock 1984 multilevel mixed effects models Gold stein 1986 random parameter models Aitkin amp Longford 1986 full contextual models Kreft amp Van Der Leeden 1994 variance components models Aitkin amp Longford 1986 multilevel linear models Goldstein 1987 Mason et al 1983 and hierarchical linear models Bryk amp Raudenbush 1992 Although there are minor differences all these models are basically the same In one way or another they are versions of the multilevel model discussed here or straightforward extensions thereof 1 2 Why another program for multilevel analysis This manual describes the use and capabilities of a new program for multilevel analysis called MLA T his program has been developed to analyze data with a two level hierarchical structure In this section we will explain why we think it is useful to add a new program for multilevel analysis to the existing ones mentioned above In other words we are concerned with the question What is special about MLA Simulation options Much research concerning multilevel analysis has been directed to the extension and refinement of multilevel theory including the de
50. he estimator of the covariance matrix of 6 A 5 Reparametrization In the formulas of the previous sections all parameters were treated as free parameters But c should obviously be nonnegative because it is a variance Similarly O should be a positive semi definite matrix because it is a covariance matrix To impose these restrictions the parameters can be written in the following way c a A 59 Oo CC A 60 where C is a lower triangular matrix i e with zero elements above the diagonal Equa tion A 59 states that amp should be the parameter used by the program not 0 Equation A 60 expresses in its Cholesky decomposition and the elements of C should be the parameters used by the program This reparametrization may have some drawbacks cf Gill Murray amp Wright 1981 pp 268 269 but we think that it may generally be useful for multilevel analysis See also Longford 1987 who uses a similar reparametrization of a restricted model Note that the reparametrization A 60 cannot be easily used if some elements of O are restricted In order to minimize the reparametrized function the gradient vector should be reparametrized accordingly This is done by using the chain rule of partial derivatives If the original parameter vector is denoted by 0 and the reparametrized parameter vector by then oL OL 00 Therefore the formulas from section A 3 have to be postmultiplied by ae 09 The releva
51. he function and the gradient computationally more efficient formulas will be derived for the covariance matrix of the estimators Combining A 49 A 50 A 51 and A 52 with A 9 it is found that PL que v iz ocz Fx XM eI o7 Z 0G Z1 X j l J J TSO XIX o XIZjOG Zi X j l j l aL EL zz L EIL J 4 0 L Fe Combining A 53 A 54 and A 55 with A 16 and A 14 it is found that L 4 2 E do da LA q DL trG 2 1 and l 4 l 4 2 39 N nq 56 Yu aL J E 45 G Zi cras z 26 Li and OL 4 4 2 Z dp EE 2 Gj 4f j l Combining A 56 A 57 and A 58 with A 13 it is found that aL li E seas o c L523 ku G5 Z Zj ui JE G7 Zi Zi e G7 ZIZI OL 4 J ji ly E e gt G ZZ u G ZjZi us ze won j l Me i OL AN eZ vi 2 Gj 452 j l These formulas are implemented in the program Note that these expressions depend on the data only through the terms Dli XiXj Zi X and ZiZ which are also used for 67 the function and gradient cf section A 3 so that no additional memory is required for data storage Let H be the matrix defined by these expressions Then H p I 8 o Jm 02 where 0 is given by equation A 34 and is the parameter vector that has to be estimated So H N is a consistent estimator of the asymptotic covariance matrix of v N 8 0 or H7 is t
52. he input statements are digested and re displayed and a short table of contents of the output is given INPUT INFORMATION REQUIRED NAME OF DATAFILE NUMBER OF VARIABLES LEVEL 2 ID COLUMN MODEL SPECIFICATION SINGLE EQUATION OPTIONAL TITLE OF ANALYSIS ESTIMATION METHOD OUTPUT INFORMATION PART CONTENTS TITLE PAGE NOOR WNE SESAME DAT 3 1 BizG1 U1 V3 B1 E V3 E G1 U1 MLA EXAMPLE 1 ANALYSIS OF VARIANCE FULL INFORMATION MAXIMUM LIKELIHOOD INPUTFILE STATEMENTS INPUT INFORMATION DATA DESCRIPTIVES ORDINARY LEAST SQUARES ESTIMATES FULL INFORMATION MAXIMUM LIKELIHOOD ESTIMATES SYSTEM INFORMATION The single equation shows the integration of the Level 2 equations and the Level 1 equation in the same way as in Chapter 2 Equations 2 1 2 2 and 2 4 It is displayed directly below the model specification The output information displays the different parts in the output Default as well as optional output parts are mentioned in two columns one for the part number and one for the contents The fourth part consists of the data descriptives optionally given by the use of the keyword descriptives under the OUTPUT statement These statistics are displayed in two major blocks and are preceded by the number of Level 1 and Level 2 units DATA DESCRIPTIVES LEVEL 1 UNITS 179 LEVEL 2 UNITS 3 VAR MEAN STDDEV VARIANCE SKEWNESS KURTOSIS K S Z PROB Z 1 2 02 0 83 0 70 0 04 1 57 3 18 0 00 2 21
53. iate and it will be clear from the context which form is used For now we will proceed with the form 2 4 Generally it is assumed that e N 0 o2IN and u N 0 where 07 the vari ance of the Level 1 error term is an unknown scalar parameter and O the covariance matrix of the Level 2 error terms is a symmetric matrix of unknown parameters The covariance matrix V of y conditional on X and Z that is the matrix containing the variances and covariances of the random part Z u in Equation 2 4 conditional on Zi is expressed as V Z OZ o Iu 2 5 A model for the complete data follows straightforwardly from stacking the J groups models in Equation 2 4 Its equation is Z 0 0 yi X 0 Z0 U1 1 y T YJ Xj 0 0 e Zy us EJ or y Xyt Zute 2 6 The covariance matrix of the complete data conditional on X and Z is Z 0 0 Oo 0 0 Z 0 0 0 Z 0 0 O 0 0 Z0 V toe E 0 0 ss Z 0 0 0 0 Z 021 Wy 0 s 0 0 Vo 0 0 0 e V The parameters of the model that have to be estimated are the fixed coefficients elements of the vector y the covariance matrix O of the random coefficients and the variance c2 of the errors The elements of y are called the fired parameters and 0 and the elements of O are called the random parameters In the following formulas are presented for the various parts of the output of MLA The order of this chapter is similar to the order of th
54. ical theory only applies to interior points so boundary solutions are a problem in any parametrization 69 70 Appendix B Read Me MMMM MMMMM LLLL AAAAAAAA MMMMM MMMMMM LLLL AAAAAAAAAA MMMM M MMMMMMM LLLL AAAA AAAA MMMM MM MMM MMMM LLLL AAAA AAAA MMMM MMMM MMMM LLLL AAAA AAAA MMMM MM MMMM LLLL AAAAAAAAAAAAAAAAAA MMMM M MMMM LLLL AAAAAAAAAAAAAAAAAAAA MMMM MMMM LLLL AAAA AAAA MMMM MMMM LLLL AAAA AAAA MMMM MMMM LLLL AAAA MMMM MMMM LLLLLLLLLLLLLLLLLLLLLLLLLLLL AAAA MMMM MMMM LLLLLLLLLLLLLLLLLLLLLLLLLLLLLL AAAA AAAA MULTILEVEL ANALYSIS FOR TWO LEVEL DATA AAAA AAAA VERSION 1 0b AAAA AAAA DEVELOPED BY AAAA FRANK BUSING AAAA ERIK MEIJER AAAA RIEN VAN DER LEEDEN AAAA AAAA PUBLISHED BY AAAA LEIDEN UNIVERSITY AAAA FACULTY OF SOCIAL AND BEHAVIOURAL SCIENCES AAAA DEPARTMENT OF PSYCHOMETRICS AND RESEARCH METHODOLOGY AAAA WASSENAARSEWEG 52 AAAA P O BOX 9555 AAAA 2300 RB LEIDEN AAAA THE NETHERLANDS AAAA PHONE 31 0 71 273761 AAAA FAX 31 0 71 273619 AAAA THIS FILE CONTAINS INFORMATION ABOUT THE FOLLOWING TOPICS FILES ON THE MLA DISTRIBUTION DISK INSTALLATION NOTES PROGRAM S MAIN FEATURES OTHER FEATURES INPUT AND OUTPUT CAPABILITY SYSTEM REQUIREMENTS CREDITS DOCUMENTATION FUTURE PLANS DISTRIBUTION FILES ON THE MLA DISTRIBUTION DISK T1 MLA EXE multilevel analysis executable MLAE EXE extended memory implementation of MLA READ ME the file you re reading
55. implementations as for example discussed by Efron 1982 These have some drawbacks and therefore alternative resampling methods have been proposed that have some advantages for example that they are robust to heteroskedasticity A thorough discussion can be found in Wu 1986 2 8 4 Resampling multilevel models Because multilevel analysis is based on regression analysis resampling methods for mul tilevel models can be based on resampling methods for regression models The methods of section 2 8 3 can however not straightforwardly be applied to multilevel models be cause the usual jackknife and bootstrap theory requires that the different observations be independently distributed This is not the case with multilevel analysis where the observations within the same Level 2 unit are dependent Another difference between regression analysis and multilevel analysis is that in mul tilevel analysis there can be variables measured at all levels In the two level case for example there are variables describing the Level 1 units and possibly variables describ ing the Level 2 units This implies that resampling can be performed at two levels Consider two level data A straightforward implementation of the ungrouped jack knife would be to eliminate one observation from one Level 2 unit at the time to obtain a jackknife sample This resampling scheme is exactly equivalent to the resampling scheme of the standard ungrouped jackknife of secti
56. in estimates for the parameters several estimation procedures have been pro posed These procedures are all versions in one way or another of full information FIML or restricted maximum likelihood REML FIML and REML estimators have several attractive properties such as consistency and efficiency A drawback of both approaches however is their relative complexity Generally parameter estimates must be obtained iteratively and serious computational difficulties may arise during such processes Software The flourishing of models and techniques for analyzing hierarchical data has been stimu lated by the software widely available for estimating multilevel models The three major packages are ML3 Prosser Rasbash amp Goldstein 1991 VARCL Longford 1990 and HLM Bryk Raudenbush Seltzer amp Congdon 1988 although multilevel models can also be estimated with BMDP BMDP 5V procedure Schluchter 1988 SAS MIXED procedure SAS Institute 1992 and GENMOD a program based on the work of Mason Wong amp Entwistle 1983 The three major packages use different methods for maximizing the likelihood In ML3 an Iterative Generalized Least Squares IGLS procedure is implemented Goldstein 1986 and a restricted version of IGLS RIGLS Goldstein 1989 VARCL uses Fisher scor ing Longford 1987 and HLM uses the EM algorithm Dempster Laird amp Rubin 1977 Bryk amp Raudenbush 1992 A comparative study of several of these pro
57. ize goes to infinity the distribution of the estimators will converge to a multivariate normal distribution with a certain covariance matrix see Appendix A The reported standard errors that are the square roots of the diagonal elements of this matrix The exceedance probabilities of the according t values that are reported are based on the approximation of the distribution of the estimators by the normal distribution In finite samples this approximation may not be very good The true standard errors may be quite different from the reported ones based on asymptotic theory and the distributions of the estimators may not be normal In fact Busing 1993 showed in his simulation study that the distributions of the random parameters can be severely skewed As mentioned above however the focus is on the bias and the standard errors and not on the specific distribution 2 8 1 The jackknife The jackknife was introduced by Quenouille 1949 1956 to estimate the bias of an esti mator from one sample and to correct for it Tukey 1958 proposed an accompanying estimator for the variance of the estimator and hence for the standard error The idea of the jackknife is as follows Consider an independently and identically distributed sample of size N from some distribution and an estimator On of a parameter 0 obtained from this sample Many estimators based on a sample of size N have a bias that can be written as b b biasy E n 0 4 Z 4
taken as an estimator of the bias of \hat{\theta} based on the distribution F. It has been proved by many authors that this approach works in many cases, that is, that it leads to consistent estimators of the properties of \hat{\theta} (e.g., Putter, 1994).

The actual implementation of the bootstrap is quite simple. Drawing samples from \hat{F}_N is equivalent to drawing samples with replacement from z_1, z_2, ..., z_N. The bootstrap is now implemented as follows. B bootstrap samples (z^*_{1b}, z^*_{2b}, ..., z^*_{Nb}), b = 1, ..., B, are drawn from \hat{F}_N, that is, these samples are drawn with replacement from z_1, z_2, ..., z_N. From each of the B samples the parameter \theta is estimated, thereby obtaining B estimators \hat{\theta}^*_b, b = 1, ..., B. Now the expectation of \hat{\theta} given \hat{F}_N is estimated by the mean of the estimators \hat{\theta}^*_b, namely

\bar{\theta}^* = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}^*_b.

The variance of \hat{\theta} given \hat{F}_N is estimated by the variance of the estimators \hat{\theta}^*_b. The bias of \hat{\theta} is estimated by the estimated bias of \hat{\theta},

\widehat{bias}_B = \bar{\theta}^* - \hat{\theta},   (2.44)

and the bias-corrected estimator of \theta is therefore

\hat{\theta}_B = \hat{\theta} - \widehat{bias}_B = 2\hat{\theta} - \bar{\theta}^*.   (2.45)

The variance of \hat{\theta} is simply estimated by the variance of the \hat{\theta}^*_b, b = 1, ..., B,

\widehat{Var}_B(\hat{\theta}) = var(\hat{\theta}^*_1, ..., \hat{\theta}^*_B).   (2.46)

The parametric bootstrap

The bootstrap as described above can also be termed the nonparametric bootstrap, because the distribution the bootstrap samples are drawn from is the nonparametric empirical distribution function \hat{F}_N. Frequently, however,
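The bootstrap bias correction and standard error can be illustrated with a few lines of Python/NumPy. This is a sketch, not MLA's implementation; the sample variance of the replicates (divisor B-1) is used for the variance in (2.46), and the estimator is again passed in as a function.

import numpy as np

def nonparametric_bootstrap(z, estimator, B=200, seed=1):
    # Draw B samples with replacement, estimate theta in each, and return the
    # bias-corrected estimate (2.45), the bias estimate (2.44) and a standard error.
    rng = np.random.default_rng(seed)
    z = np.asarray(z)
    N = len(z)
    theta_hat = estimator(z)
    theta_star = np.array([estimator(z[rng.integers(0, N, size=N)])
                           for _ in range(B)])
    bias_b = theta_star.mean() - theta_hat
    theta_corrected = 2 * theta_hat - theta_star.mean()
    se_b = theta_star.std(ddof=1)
    return theta_corrected, bias_b, se_b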
59. l See chapter 2 for details 36 Chapter 4 Output The output of MLA consists of a single text file the second parameter in the statement that starts program execution The file is divided into parts One part may take more than one page Before each new part a pagebreak is inserted This chapter will elaborate on the MLA output file We will illustrate the output using several example analyses These will stretch from a simple analysis of variance to a bootstrap analysis for a complicated two level model It is not our intention to give extensive examples of case studies The examples will give insight in how to use MLA for different analyses and glance at specific parts of the output 4 1 Analysis of variance To illustrate how to run an ANOVA using MLA we consider part of the Sesame Street data set The original set from Glasnapp and Poggio 1985 is used in Stevens 1990 for an analysis of covariance In the first example with this data set we only use two variables from the set which originally included 12 background variables and 8 achievement vari ables for 240 subjects The first 3 sites of the original 5 sites are used on both pretest and posttest Only the achievement variable measuring knowledge of numbers is considered here The series was viewed in between the pretest and posttest The series was meant to teach pre school skills to 3 to 5 year old children An analysis of variance is performed on these data with MLA Level
60. ldfarb Shanno BFGS method Press Flannery Teukolsky amp Vetterling 1986 This is a fast and stable method to optimize arbitrary functions It requires that the function and the gradient the vector of first derivatives of the function with respect to the parameters be programmed It minimizes the function with respect to both fixed and random parameters simultaneously As such it resembles most the algorithm used by VARCL although the BFGS method does not compute the inverse of the information matrix at each iteration The algorithms of ML3 and HLM alternately update the fixed and the random parameters Chapter 2 Theory In this chapter the theoretical background of the general two level model will be discussed It will give the relevant formulas of the model equations and it will give the theory and formulas of the descriptive statistics and estimators that are implemented in the program Additionally it will discuss the theory of the residuals the estimators of the group specific coefficients and the diagnostic statistics that the MLA program provides and the simulation options that can be chosen 2 1 The general two level model In MLA the following general two level model is implemented Suppose data are obtained from N individuals nested within J groups with group 7 containing N individuals Now for group j j 1 yj is a vector containing values on an outcome variable X is an N x q matrix with fixed explana
61. leg 5 log det V 5 yj Xi Vj u X57 and the minus log likelihood function for the whole sample is J L L j l N J F log 27 5 lon det V 2 1 Ie J 2 ui Xy V7 y X57 A A 6 where J is the number of Level 2 units and N is the total number of Level 1 units N DL N where N is the number of Level 1 units in Level 2 unit 7 This is the function that has to be minimized with respect to the parameters to obtain maximum likelihood estimators To minimize this function the program uses the Broyden Fletcher Goldfarb Shanno BFGS minimization method see e g Press et al 1986 which uses the gradient of the function to be minimized In section A 3 computationally efficient formulas for the function and the gradient will be derived In section A 4 the asymptotic covariance matrix of the estimators will be derived In section A 5 a reparametrization of the model will be discussed in which the restriction of positive semi definiteness of covariance matrices is explicitly imposed But first in the next section some useful notation matrices and formulas will be introduced A 2 Some useful formulas First we define the matrix Gj h ZiZjOJo A T This matrix will be used frequently in the following The inverse of V Maddala 1977 p 446 states the following formula A BDB A A B D D A B BAT where A and D are square nonsingular matrices and B is a matrix of appr
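The displayed matrix results above are damaged in this copy. As far as they can be reconstructed, (A.7) defines G_j = I + Z_j'Z_j\Theta/\sigma^2 and (A.9) is the corresponding inversion identity V_j^{-1} = \sigma^{-2}(I - Z_j\Theta G_j^{-1}Z_j'/\sigma^2); together with the determinant identity det V_j = (\sigma^2)^{N_j} det G_j they avoid inverting an N_j x N_j matrix. The NumPy fragment below simply checks these two identities numerically under that reading (all dimensions and values are arbitrary).

import numpy as np

rng = np.random.default_rng(0)
n_j, q = 8, 2                                  # Level 1 size and number of random terms
Z = rng.normal(size=(n_j, q))
sigma2 = 1.7
A = rng.normal(size=(q, q))
Theta = A @ A.T + 0.1 * np.eye(q)              # a positive definite covariance matrix

V = sigma2 * np.eye(n_j) + Z @ Theta @ Z.T     # V_j = sigma^2 I + Z Theta Z'
G = np.eye(q) + Z.T @ Z @ Theta / sigma2       # G_j as reconstructed from (A.7)
V_inv = (np.eye(n_j) - Z @ Theta @ np.linalg.solve(G, Z.T) / sigma2) / sigma2
print(np.allclose(V_inv, np.linalg.inv(V)))                             # (A.9)
print(np.allclose(np.linalg.det(V), sigma2 ** n_j * np.linalg.det(G)))  # (A.11)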
62. mators and the maximum likelihood estimators is small and therefore these estimators can be used as well Kreft 1994 Van Der Leeden amp Busing 1994 13 2 4 Maximum Likelihood methods 2 4 1 Full Information Maximum Likelihood FIML One of the most important parts of the program consists of the maximum likelihood estimation This estimation method was chosen for its desirable properties such as con sistency and efficiency In maximum likelihood estimation given the observations pa rameters are found that maximize the likelihood function Mood et al 1974 This is the same as minimizing the minus log likelihood function Assuming normally distributed errors the density of y given X and Z is _ 1 i yj X9 V yj X59 fui AA so that the contribution of Level 2 unit j to the minus log likelihood function is Lj log f yj Xj Zi N 1 1 7 1 gt log 27 5 log det Vj z yj Xjy Vj yj X57 and the minus log likelihood function for the whole sample is simply the sum of all Level 2 units j L Ya L This is the function that has to be minimized with respect to the parameters to obtain maximum likelihood estimators Specifically it will produce a set of fixed parameter estimates 7 and a set of random parameter estimates for the second level and 82 for the first level Details can be found in Appendix A The asymptotic covariance matrix of the estimators is derived from the matrix of second derivative
63. mpling complete cases Bootstrap samples lt zi UN vN consist of pairs y z7 that are also elements of the original sam ple that is for each i 1 N there exists a j 1 lt j N such that y z7 yj 2 Then the parameters can be estimated from each bootstrap sample and bias corrected estimates can be obtained as well as an estimate of the covariance matrix of the estimator using the formulas from section 2 8 2 The implementation of the parametric bootstrap depends on whether a specific dis tribution of x is assumed If x is regarded as a random variable with an unspecified distribution the parametric bootstrap should start with drawing nonparametric boot strap samples of z If on the other hand a specific distribution of x is assumed for example a normal distribution with mean jz and variance c2 then the parametric boot strap starts with drawing parametric bootstrap samples of 7 for example samples from a normal distribution with mean T and variance s2 which are the estimates of u and c2 from the original sample Given a bootstrap sample z1 zy of z the parametric bootstrap draws a sample et 8x of from a normal distribution with mean zero and variance 8 where 6 is the estimate of c from the original sample Then a bootstrap sample yf yx of y is computed from the following equation yf cxi 2 47 where and 8 are the estimates of a and D from the original sample The situation is
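Both resampling schemes for the simple regression model can be written down in a few lines. The Python/NumPy sketch below is purely illustrative: np.polyfit stands in for the OLS fit, x* is drawn from a normal distribution fitted to x (the example mentioned in the text), and y* follows Equation 2.47.

import numpy as np

def cases_bootstrap(y, x, rng):
    # Resample complete cases (y_i, x_i) with replacement.
    idx = rng.integers(0, len(y), size=len(y))
    return y[idx], x[idx]

def parametric_bootstrap(y, x, rng):
    # Parametric bootstrap along the lines of Equation 2.47:
    # draw x* from a fitted normal, e* from N(0, sigma2_hat),
    # and set y* = alpha_hat + beta_hat * x* + e*.
    beta_hat, alpha_hat = np.polyfit(x, y, 1)
    sigma2_hat = np.var(y - (alpha_hat + beta_hat * x), ddof=2)
    x_star = rng.normal(x.mean(), x.std(ddof=1), size=len(x))
    e_star = rng.normal(0.0, np.sqrt(sigma2_hat), size=len(x))
    return alpha_hat + beta_hat * x_star + e_star, x_star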
64. n can however be extracted from the program by using a file to write the simulation results to The bias of an estimator 0 of some parameter is defined as the difference between the expected value of the estimator and the true value of the parameter A desirable property of an estimator is unbtasedness which means that its bias is zero In the maximum likelihood theory discussed so far however it was only stated that the FIML estimators are consistent This means that as the sample size gets larger the mean of the estimator converges to the true parameter value and its variance decreases to zero Informally speaking the estimator comes closer to the true parameter value as sample size gets larger This is of course a highly desirable property but it does not ensure that the estimator is unbiased in finite samples In fact maximum likelihood estimators are in many models and situations biased in finite samples For a general class of regression models including multilevel models however Magnus 1978 proved that the maximum likelihood estimators of the fired regression coefficients are unbiased On the other hand Busing 1993 showed in a Monte Carlo simulation study that the maximum likelihood estimators of the random parameters in multilevel models are biased The standard errors of the maximum likelihood estimators that are reported by MLA are derived from asymptotic theory This means that they are based on the idea that as the sample s
on the assumption that the normal distribution is completely specified beforehand (not estimated). This is not entirely correct, because \mu and \sigma are estimated, but it is sufficient for descriptive exploratory purposes. The formula used (cf. Mood, Graybill & Boes, 1974, p. 509) is

\Pr(Z \geq z) = 2 \sum_{i=1}^{\infty} (-1)^{i-1} e^{-2 i^2 z^2},   (2.7)

where the series is truncated after convergence of the sum.

2.3 Ordinary Least Squares

2.3.1 Within-group models

In this section we will use the notation of (2.1) and (2.2). Consider Equation 2.1. Ordinary least squares estimates for \beta_j are given by

\hat{\beta}_j = (X_j'X_j)^{-1} X_j' y_j,   (2.8)

and the estimated standard errors of the elements of \hat{\beta}_j are the square roots of the diagonal elements of the covariance matrix given by

\widehat{cov}(\hat{\beta}_j) = \hat{\sigma}_j^2 (X_j'X_j)^{-1},   (2.9)

where

\hat{\sigma}_j^2 = \frac{1}{N_j - q} (y_j - X_j\hat{\beta}_j)'(y_j - X_j\hat{\beta}_j)   (2.10)

and q is the dimension of \beta_j.

2.3.2 One-step OLS (total model)

From Equation 2.6, the term Zu can be considered the random part of the equation. Taking the total residuals

r = Zu + e   (2.11)

leaves, after substitution,

y = X\gamma + r.   (2.12)

Now \gamma can be estimated using ordinary least squares. Notice that grouping is ignored. Estimates for \gamma are given by

\hat{\gamma} = (X'X)^{-1} X' y.   (2.13)

Using the estimated residuals \hat{r} = y - X\hat{\gamma}, the estimate of the variance of the elements of r can be obtained by

\hat{\sigma}_r^2 = \frac{1}{N - p} \sum_{i=1}^{N} \hat{r}_i^2,   (2.14)

where p is the dimension of \gamma. This estimate \hat{\sigma}_r^2 is the one-
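A compact Python/NumPy sketch of the within-group and one-step OLS computations may clarify the formulas; it is illustrative only, and the function and variable names are not part of MLA.

import numpy as np

def within_group_ols(y_j, X_j):
    # OLS per Level 2 unit: beta_j (2.8), its covariance matrix (2.9) and sigma_j^2 (2.10)
    N_j, q = X_j.shape
    beta_j = np.linalg.solve(X_j.T @ X_j, X_j.T @ y_j)
    resid = y_j - X_j @ beta_j
    sigma2_j = resid @ resid / (N_j - q)
    cov_beta_j = sigma2_j * np.linalg.inv(X_j.T @ X_j)
    return beta_j, cov_beta_j, sigma2_j

def one_step_ols(y, X):
    # OLS on the stacked data, ignoring grouping: gamma (2.13) and sigma_r^2 (2.14)
    N, p = X.shape
    gamma_hat = np.linalg.solve(X.T @ X, X.T @ y)
    r = y - X @ gamma_hat
    sigma2_r = r @ r / (N - p)
    return gamma_hat, sigma2_r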
66. n the next chapter 3 6 TECHNICAL optional The TECHNICAL statement provides useful possibilities to alter the estimation process It concerns the estimation method method minimization stop criteria like the maxi mum number of iterations naximum number of iterations and two convergence cri teria fconvergence and pconvergence the critical p value for the display of outliers and the possibility of writing intermediate iteration results to disk file If this state ment and subsequent substatements are not specified the program will run using default values Example TECHNICAL met fiml estimation method fiml max 10 maximum number of iterations equals 10 fco 0 00001 function convergence set to 0 00001 pco 0 0001 parameter convergence set to 0 0001 out 0 01 critical p value for outlier display fil tech out technical results will be written to tech out 3 6 1 method optional The substatement method provides the opportunity to set the estimation method One can choose between FIML and REML FIML is the default method and represents full information maximum likelihood estimation REML is restricted maximum likelihood estimation Both procedures are described in Chapter 2 and in Appendix A not implemented yet 33 3 6 2 maximum number of iterations optional The default value of max is 20 This number should be sufficient for reaching convergence if the sample size is large enough and or the n
67. nd W s fixed once a Level 2 unit is drawn This is useful when the data within the unit can not be considered a simple random sample for example with repeated measures data or families Then a complete Level 2 unit is temporarily regarded as a single observation and bootstrap samples are drawn from these observations With repeated measures this implies that for each subject that is drawn in the bootstrap sample the data for all the timepoints are exactly the same as in the original sample For a family this means that the complete family is kept together and that once the family is drawn in the bootstrap sample mother father and children are all part of the bootstrap sample and for example the mother can not be drawn twice within the same Level 2 unit On the other hand it is also possible to keep the Level 2 units fixed and draw boot strap samples only from the Level 1 units within each Level 2 unit This can be useful when the Level 2 units can not be considered a simple random sample for example when several prespecified countries are compared and people within each country are drawn randomly Then in the bootstrap samples all countries are present once just as in the original sample Bootstrap samples are drawn from complete cases within each country Once bootstrap samples are drawn bootstrap bias corrected estimators and bootstrap standard errors can be obtained straightforwardly The three possible methods for drawing bo
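In code, the three cases-resampling schemes can be sketched as follows (Python/NumPy). The representation of the data as a list of index arrays, one per Level 2 unit, and all function names are assumptions made for this illustration.

import numpy as np

def resample_level2(groups, rng):
    # Draw whole Level 2 units with replacement; each unit's cases stay intact.
    J = len(groups)
    return [groups[j] for j in rng.integers(0, J, size=J)]

def resample_level1(groups, rng):
    # Keep the Level 2 units fixed; resample cases within each unit.
    return [g[rng.integers(0, len(g), size=len(g))] for g in groups]

def resample_both(groups, rng):
    # Resample Level 2 units, then resample cases within each drawn unit.
    return resample_level1(resample_level2(groups, rng), rng)

rng = np.random.default_rng(1)
groups = [np.arange(0, 10), np.arange(10, 25), np.arange(25, 30)]  # row indices per unit
bootstrap_sample = resample_both(groups, rng)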
Ignoring grouping will result in 166.19 for \hat{\sigma}_r^2. Using the two-step procedure lowers the estimate to 136.50 and also gives an estimate of the variance of u: 29.47.

Part 6 contains the FIML estimates. This part is default and appears in all output. Compared to the previous ordinary least squares estimates part, T values and probabilities for T are given. Here, unlike for the OLS estimates, these are theoretically justified.

FULL INFORMATION MAXIMUM LIKELIHOOD ESTIMATES

FIXED PARAMETERS
LABEL        ESTIMATE        SE            T       PROB(T)
G1           31.322433       3.123586      10.03   0.0000

RANDOM PARAMETERS
LABEL        ESTIMATE        SE            T       PROB(T)
U1*U1        26.935304      23.900162       1.13   0.2597
E           138.833164      14.799662       9.38   0.0000

INTRA-CLASS CORRELATION: 26.9353 / (138.8332 + 26.9353) = 0.1625

CONVERGENCE CRITERION REACHED
ITERATIONS          5
-2 LOG(L)           1398.626571

Whenever there are residuals associated with the grand mean, the intra-class correlation is computed and given just below the FIML estimates. The formula for the intra-class correlation is

\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma^2},   (4.2)

and in MLA notation

\rho = \frac{U1 \times U1}{U1 \times U1 + E}.   (4.3)

If the technical keyword is omitted from the OUTPUT statement, a short description of the final iteration results is given in the FIML part. Here, convergence is reached in 5 iterations and yields a -2 LOG(L) value of 1398.63. The final part of the output contains some system information. The format of the date is DD-MM-YYYY and for the time HH:MM:SS. The
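The reported intra-class correlation can be reproduced directly from the two random-parameter estimates, for example in Python:

u1u1, e = 26.935304, 138.833164   # estimates from the output above
icc = u1u1 / (u1u1 + e)
print(round(icc, 4))              # 0.1625, matching the reported value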
69. nly the first or the second level This feature may be useful for instance in analyzing repeated measures data Alternative simple estimation methods Usually complicated iterative estimation procedures are used to estimate the parameters of multilevel models From a theoretical and technical point of view these procedures provide the best estimates that can be obtained However in practice some of the algorithms used may be rather slow under certain conditions In other cases serious com putational difficulties may arise that are not easy to overcome De Leeuw and Kreft 1993 discuss alternative estimation procedures for both fixed and random parameters in multilevel models that are non iterative and relatively easy to implement Moreover in certain cases the quality of the parameter estimates is rather good Hence one could question the real gain of the complicated iterative procedures over these simpler alterna tives Therefore in MLA we have implemented a one step and a two step OLS procedure A simple WLS procedure has still to be implemented Simple procedures can always be used as an addition to complex ones and vice versa Their results can always be compared with the results of the iterative methods It depends on the data which estimation procedures are to be preferred De Leeuw amp Kreft 1993 Kreft 1994 Fast Maximum Likelihood algorithm To maximize the likelihood function the MLA program uses the Broyden Fletcher Go
70. now EXAMPLE IN MLA input examples EXAMPLE 0UT MLA output examples SESAME DAT sesame street data set RAT DAT rat data set NELS DAT nels data set JSP DAT junior school project data set INSTALLATION NOTES To install the program simply copy all the files to the destination drive and or directory and put the drive and or directory in your PATH statement Don t forget to make the PATH effective For example installing MLA on your C drive in the directory C MILA md c mla gt make the directory on your C drive c gt change to the C drive cd mla gt and make MLA the current directory copy a gt copy all files from a path path c mla gt add the directory to your path statement If you want the path to be effective from computer startup change the path statement in your autoexec bat To run the program type MLA lt inputfile gt lt outputfile gt where lt inputfile gt should be replaced by the name of the input file and lt outputfile gt replaced by the name of the output file PROGRAMS MAIN FEATURES MLA is a batch driven statistical program that provides several types of estimates for a multilevel model with two levels including o Summary statistics mean variance standard deviation etc o Ordinary least squares estimates one step OLS two steps OLS OLS per level 2 unit o Full information maximum likelihood estimates including Standard errors Test statistic
71. nput display digested input statements des display variable descriptive statistics out ols res pos dia display all other output input The input information is digested and displayed in two parts A required and an optional part Here a single equation is displayed similar to 2 3 and input can be checked After the input information a short table of contents of the output is displayed It explains which part of the output gives which information descriptives All data variables are used to obtain simple summary statistics The sample sizes of the different levels are displayed followed by two blocks of information The first block displays mean stddev variance skewness kurtosis and K S Z the latter with its significance level denoted by respectively Computational formulas are given in Chapter 2 The second block contains seven percentiles of the variables These are the Oth minimum 5th P5 25th Q1 50th median 75th Q3 95th P95 and 100th maximum percentiles respectively outcomes The Level 2 outcomes consist of ordinary least squares estimates per Level 2 unit Es timates of the regression coefficients and estimates of the error variance including their standard errors t ratios and exceedance probabilities of the t ratios per Level 2 unit are displayed in separate blocks with their Level 2 unit number and Level 2 unit size olsquares This part contains the ordinary least squares estimates f
72. nt for each separate school which is part of the model at the student level As was said above this term refers to the expected influence of size on the regression of math on homework In the terminology of multilevel analysis this term is called a cross level interaction For some researchers this interaction term provides the main attraction to multilevel analysis It is the cross level interaction parameter that leads to the interpretation of slopes as outcomes cf Aitkin amp Longford 1986 The number of levels Theoretically we can model as many levels as we know the hierarchy has or as we think it will have In practice however most applications of multilevel analysis concern problems with two or three levels Data sets with more than three levels are rare In fact a majority of applications just concerns two level data and can be viewed as within and between analysis problems It should be noted that models with more than three levels show a rapid increase in complexity especially where interpretation is concerned If such models are necessary they should be limited to rather simple cases that is to cases with only a few predictor variables Random Level 1 coefficients In multilevel modeling we are usually not looking for estimates of the regression coef ficients within each separate group but for their variances and covariances However there can be circumstances in which we still want to ob
73. nt formula for o is o 20 Oo To form the relevant expression for C consider the k l and k k elements of O where 68 k gt l q On X ChuChu ucl l M ChuChu u i g Ork 3 Chu u i k Yo ci ucl So aoe Cy ifu lt l A 0 if ul Tm Chu ifu lt l m 0 if ul sat 0 if u Z k and uz l Tm 2Cpu ifu lt k Tm 0 if u gt k Tm 0 ifu k Consequently if u gt v aL PC L 004 SOL 004 0 7 222 364 00 t 2 Onn Wun k 1 1 u 1 q aL aL aL v UA v 2 Cw 2 96 Ion ae These formulas are implemented in the program It is possible to transform the second derivatives in a similar way to obtain an estimator of the covariance matrix of the estimators But in general the user will be more interested in the original parameters and therefore the estimates of the transformed parameters are retransformed to estimates of the original parameters and the covariance matrix of section A 4 is used This procedure is correct because the transformation is a one to one mapping from the feasible region of the original parameters to the domain of the transformed parameters except for some trivial equivalent solutions such as c and o which lead to the same retransformed solution Only when the estimates are near the boundary of the feasible region the asymptotic covariance matrix may not be correct but the usual statist
74. o be executed Finally comments preceded by a percent sign 4 may appear throughout the input file All text on a line after and including the percent sign will serve as comment and is ignored as program input In the following all statements and substatements implemented are discussed and illustrated with small examples In Chapter 4 where we focus on the program output complete examples are provided 3 1 TITLE optional Following the keyword TITLE the first non blank line contains the title for the analysis Although the statement is optional it is highly recommended Moments after the analysis all may seem clear but after a few months you may have no idea what you have done The title may be your only clue You may also enrich your input file with comments In contrast to comments the title is repeated on top of every part of the output Example 27 TITLE MLA example 1 analysis of variance 3 2 DATA required The DATA statement contains information about the data file This statement has four substatements three of which are required The file substatement gives the name of the data file variables the number of variables in the data file idi the optional variable number of the Level 1 identifier variable and id2 the variable number of the Level 2 identifier variable Example DATA file sesame dat data set from Glasnapp and Poggio 1985 vars 3 total of three variables id2 1 Level 2 identifica
75. ogy De Leeuw J amp Kreft I G G 1986 Random coefficient models for multilevel analysis Journal of Educational Statistics 11 57 86 De Leeuw J amp Kreft I G G 1993 October Questioning multilevel models Paper presented at the Multilevel Modeling Workshop at the Rand Corporation Santa Monica CA Dempster A P Laird N M amp Rubin D B 1977 Maximum likelihood for incomplete data via the EM algorithm Journal of the Royal Statistical Society B 39 1 38 Durbin J 1973 Distribution theory for tests based on the sample distribution function Philadelphia SIAM Efron B 1979 Bootstrap methods Another look at the jackknife Annals of Statistics 7 1 26 Efron B 1982 The jackknife the bootstrap and other resampling plans Philadelphia SIAM Gill P E Murray W amp Wright M H 1981 Practical optimization London Academic Press Glasnapp D R amp Poggio J P 1985 Essentials of statistical analysis for the behavioral sciences Columbus OH Charles Merrill Goldstein H 1986 Multilevel mixed linear model analysis using iterative generalized least squares Biometrika 73 43 56 Goldstein H 1987 Multilevel models in educational and social research London Oxford University Press Goldstein H 1989 Restricted unbiased iterative generalized least squares estimation Biometrika 76 622 623 Harville D A 1977 Maximum likelihood approaches to variance
76. on 2 8 1 Another possibility is to implement the grouped jackknife With the grouped jackknife it is most logical to use the Level 2 units as groups The Level 2 units may have different sizes and therefore the grouped jackknife with unequal group sizes 2 41 2 42 should be used The theoretical proper ties of these estimators are currently not known Moreover the grouped and ungrouped jackknife are based on the assumption that the observations are independent which is not the case in multilevel analysis Furthermore the jackknife estimators for regression anal ysis may also be nonoptimal Wu 1986 Therefore the jackknife estimators in multilevel analysis are experimental and may not be consistent Further research will be needed to 22 obtain information about the properties of these estimators For that purpose they are implemented as options in MLA The parametric bootstrap can be easily implemented in multilevel analysis If the X and W variables are considered fixed in 2 1 and 2 2 bootstrap samples y5 yey can be obtained in the following way First for each 7 1 J draw a bootstrap Level 2 error vector u from a normal distribution with mean zero and covariance matrix 6 Then draw a bootstrap Level 1 error vector 7 from a normal distribution with mean zero and covariance matrix GIN Finally the bootstrap sample of y is obtained from B Wiy c u 2 50 and y XjBt 2 51 Then bias corrected
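A single replication of this parametric bootstrap for the two-level model takes only a few lines in Python/NumPy. The sketch below assumes the data are stored per Level 2 unit and that gamma_hat, Theta_hat and sigma2_hat are the estimates from the original sample; it is an illustration, not MLA code.

import numpy as np

def parametric_bootstrap_sample(X_groups, W_groups, gamma_hat, Theta_hat,
                                sigma2_hat, rng):
    # One bootstrap replication following (2.50) and (2.51).
    q = Theta_hat.shape[0]
    y_star = []
    for X_j, W_j in zip(X_groups, W_groups):
        u_j = rng.multivariate_normal(np.zeros(q), Theta_hat)          # u*_j ~ N(0, Theta_hat)
        e_j = rng.normal(0.0, np.sqrt(sigma2_hat), size=X_j.shape[0])  # e*_j ~ N(0, sigma2_hat I)
        beta_j = W_j @ gamma_hat + u_j                                 # (2.50)
        y_star.append(X_j @ beta_j + e_j)                              # (2.51)
    return y_star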
77. ool size on school effectiveness School size may influence the estimated relationship between math and homework At first glance the model presented above seems to lead to a hierarchically structured regression procedure which proceeds in two steps First the models for all schools are estimated and then the intercept and slope estimates are used as the dependent variables in the Level 2 model which is then estimated Although such procedures have been proposed in the past this is not what will be discussed here under the heading of multilevel models because there is no statistical connection between the Level 1 and Level 2 models In multilevel models separate regression equations for each level are only formulated because they facilitate insight and understanding The statistical linkage of both levels is created by the Level 2 model which states that Level 1 regression coefficients intercepts and slopes are treated as random variables at the second level The Level 2 model models intercept and slope estimates as a mean value over all schools plus a school specific deviation or residual It follows that we are not primarily looking for intercept and slope estimates for each separate school but for their means and variances and their covariance over all schools In this way just as students are considered a sample from a population of students schools are considered a sample from a population of schools There are several re
78. opriate dimen sions This formula can also be written as A BDB A AT B D BD A B p A A7 AT B I B AT BD DH B AT A A BD I B A BD B At A 8 By defining A c IN B Z and D O it follows that V can be written as A BDB Consequently the inverse of V can be found from equation A 8 Vi oy o7 TA 20 fy Zi o 1 zie zi 01n o ly o7 0 I ZZ Jo Z o Iy 0 ZjOGT Zi A 9 50 The determinant of V Based on Maddala 1977 pp 446 447 the following formula for the determinant of a partitioned matrix can be derived aul 4 BL aal A BY 1 47B lc pj 0 D 0 I A 0 u 2 E det A det D CA 1B where A and D are square nonsingular matrices and B and C are matrices of appropriate orders Similarly A B A B I 0 sf 5 ew 5 pic r det D det A BD C Consequently det A det D CA B 2 det D det A BD7 C A 10 Now define A I B Zi C Z 0 and D c IN The matrix V can now be written as V D CA B and the determinant of A is 1 Consequently using equation A 10 det V det Adet D CA B det D det A BD C det o Iw det I Zi o Iw Z 0 o det I Z Z 0 a o det Gj A 11 The factor ZIV In the following the factor ZV will frequently pop up This factor can be written in a computationally more efficient form ZVi Z e Iw o Z 0G Z from A 9
79. or the fixed Gi and random variances and covariances of Ui and E parameters A regression analysis is performed ignoring grouping For the error variance two estimates are displayed the one step EC1 and two step E 2 estimates corresponding to 2 14 and 2 22 respectively technical The technical information output consists of four columns The iteration number is in the first column In the second column the 2 log likelihood is displayed The third column 35 shows the norm of the difference of the parameter vector at the current iteration and the parameter vector at the previous iteration That is the differences between the current and previous values of the parameters are squared and summed and the square root of the resulting summation is reported The last column shows the norm of the gradient vector When the technical keyword is not used only the final information is displayed as part of the maximum likelihood information part residuals For the first level three different types of residuals are displayed namely the total raw and shrunken residuals The Level 2 residuals are the raw and shrunken residuals for every random Level 2 component Computational formulas are given in Chapter 2 posterior Displayed are the posterior means 2 29 based on the full information maximum likelihood estimates diagnostics For diagnostic purposes outliers are reported There are two kinds of outliers one for each leve
80. ormulas will be used One can write Z 0 OZ and taking V Zj Z 4 In from 2 5 ZV 26D from A 12 where Gj l 2 2 0 02 2 27 then OZ V7 y X53 6 766 Ziu Z X 5j Finally the shrunken residuals for Level 1 follow from 2 11 e Zi 2 28 2 6 Posterior means The posterior means are the shrunken estimators of 8 They are the expected values of the 8 given the data and the maximum likelihood estimates of y O and o2 They are derived from the shrunken residuals and their formula is HEWA 2 29 where is the estimate obtained by full information maximum likelihood and wu is taken from 2 26 This can easily be shown to be equal to B l A W 3 A895 2 30 where pus is the within group OLS estimator 2 8 of 5 A X V X V Z N X X and the notation is of 2 1 and 2 2 Thus from 2 30 8 can be seen as a matrix weighted average of the within group estimator pus and the estimated prior expectation W of B the former being unbiased and the latter being more efficiently estimated The more efficient fous is estimated the more matrix weight it gets and the closer the posterior means are to the within group estimates A can be called an estimated reliability matrix cf Bryk amp Raudenbush 1992 p 43 15 2 7 Diagnostics Currently the only option for diagnostics performed by MLA apart from the descriptive statistics is ou
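The shrunken Level 2 residuals and the posterior means can also be computed directly from the definition, without the computationally efficient form used by MLA. The Python/NumPy sketch below does exactly that for one group; the explicit expression with V_j^{-1} is the standard form that Equation 2.26 appears to give in this damaged copy, so treat it as an assumption.

import numpy as np

def shrunken_residuals_and_posterior_means(y_j, X_j, Z_j, W_j,
                                           gamma_hat, Theta_hat, sigma2_hat):
    r_j = y_j - X_j @ gamma_hat                                # total FIML residuals
    V_j = Z_j @ Theta_hat @ Z_j.T + sigma2_hat * np.eye(len(y_j))
    u_tilde = Theta_hat @ Z_j.T @ np.linalg.solve(V_j, r_j)    # shrunken Level 2 residuals (2.26)
    e_tilde = r_j - Z_j @ u_tilde                              # shrunken Level 1 residuals (2.28)
    beta_post = W_j @ gamma_hat + u_tilde                      # posterior means (2.29)
    return u_tilde, e_tilde, beta_post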
81. otstrap samples from complete cases dis cussed above are implemented as options in MLA as well It will depend on the nature of the data which one is most fit for a particular application 2 9 Missing data Missing data are a frequently occurring phenomenon For instance in repeated measures designs the points in time at which the different subjects are measured may not be the same or the number of points in time the subjects are measured may differ This situation leads to missing time points that is all time specific variables of a subject are missing at some point in time However the time invariant variables such as sex are of course known This situation is easily handled by a multilevel model in which the subjects are the Level 2 units and the time points are the Level 1 units As was discussed for a usual multilevel model the number of Level 1 units may be different for different Level 2 units and so the missing timepoints give no problems An example of repeated measures is given in chapter 4 If however in a multilevel model be it an application in repeated measures or not for some Level 1 unit some Level 1 variables are measured but others are not or for some Level 2 unit some Level 2 variables are measured but others are not there are missing values that can not be handled by the standard model If only output variables are missing the EM algorithm provides a standard way of dealing with the missing values in a sati
82. r it is assumed that F is a specific distribution F only depending on a parameter or parameter vector which may or may not be the same parameter as 0 Then if is estimated by 6 F can also be estimated by Fy F 9 instead of Fy If the distributional assumption about F is correct this para metric empirical distribution function will generally be a better more efficient estimator of F The parametric bootstrap is defined exactly analogous to the nonparametric bootstrap except that bootstrap samples are drawn from Fw instead of Fy This means that no longer samples are drawn with replacement from the original data but from a generally more smooth distribution function Hence the values of the z7 in the bootstrap sample will usually not be values also encountered in the original sample For example if it is assumed that F is a normal distribution function with mean pu and variance 07 then bootstrap samples are drawn from a normal distribution with mean and variance s where 7 and s are the mean and variance of the original sample 2 8 3 Resampling regression models Consider a simple linear regression model y e Bz Ee where is a normally distributed error term with mean zero and variance c Suppose that a sample yi 21 yv v N is available Then parameter estimates B and 8 can be obtained Now if x is considered a random variable nonparametric boot strap samples can be easily obtained by resa
83. r 3 Input In this chapter the input of the MLA program is explained A simple introduction to multilevel models is given in Chapter 1 A discussion of estimation and other relevant theory concerning the multilevel model implemented in MLA can be found in Chapter 2 MLA Version 1 0b runs as a stand alone batch program lt uses an input file and an output file as parameters This means that in DOS the program can be started by the command MLA input file name output file name where input file name is the name of the file that contains the input and output file name is the name of the file in which the output of the program will be saved Both files are simple text files ascii The output file will be explained in the next chapter The input file will be considered here The input file consists of statements which are case insensitive Every statement begins with a slash and a keyword e g TITLE Every keyword may be abbreviated but it must be at least of length three to be recognized e g TIT Other text following the keyword and or leading spaces will be ignored The rest of the statements must follow on lines below the keyword and should precede the next statement These lines are called substatements and may also consist of one or more keywords e g file The last statement to be read is the END statement All other statements and corresponding substatements may appear in any order but before the END statement if they are t
84. re implemented in the program Note cf Bryk amp Raudenbush 1992 p 239 that the function and the derivatives depend on the data only through the J J J terms 5572 YY jar XU Di1 AUX Zi ZX j and Zi Z of which is a scalar the second a p vector and the third a symmetric pxp matrix The last three are a q vector a q x p matrix and a symmetric q x q matrix for each Level 2 general conditions is given by see Magnus 1978 N V in 0 E x 04 lim 2 58 through G the first where N is the sample size 1 8 E aa A 34 and L is the minus log likelihood function Therefore the asymptotic covariance matrix of the estimators is derived from the matrix of second derivatives of L the Hessian matrix From A 29 we have a cecal j9 j l Thus aL J J d gt X AV u Xi JA XG dy j 1 j l J SONGS Ado Vj Z dO ZV yj X7 j l J ONIX dy using A 21 j l J OP XIV yj Xin do j l J DOGV ZOZ ys Xo j l J J XIVj X dy A 35 j l Therefore PL J Fae LXV A 36 ay ay 2 jj o A 36 PL J ay 0a XN ui Xi A 37 j and using A 25 and A 26 aL J B a 5 06 AON Zi lZ V i Xi tOGVE Zoe Vy Gi Xi A 38 aL Jd a OIO i JGV Z5 eulZiVi ui Xie A 39 j From A 30 we have EPU 73 24 Xy Vr Xi 59 Thus aL 14 1 3 5X dV 73 24 dV ys Xj J So yj XV X dy j l 1 J 32
85. s Probability values o Restricted maximum likelihood estimates including not implemented yet Standard errors Test statistics Probability values UTHER FEATURES o Simulation analysis including Two kinds of simulation Bootstrap Jackknife Three different methods of bootstrap simulation Resampling from cases Resampling with multivariate normal distribution Resampling with residuals Two different types of residual estimation for resampling raw residuals 72 shrunken residuals Three different cases resampling schemes resample only level 1 units resample only level 2 units resample both level units o Constraints for estimates of the form parameter value o Technical settings options o Special output providing Input and output contents Technical information Raw and shrunken residuals Posterior means Simple diagnostics INPUT AND OUTPUT The input data and output files are all ASCII files The input file contains statements about the data the model and other input requirements The data file is a free field formatted numbers only ASCII file The output file is also an ASCII file If a file with the same name as the name of the output file already exists it will be overwritten See documentation for further elaboration on these subjects CAPABILITY The program can handle up to 16 equations with 32 terms each The table below gives the maximum values of the different inpu
…of L, the Hessian matrix. This covariance matrix is used for the standard errors of both fixed and random parameters. For a detailed description of the derivations used and an extensive discussion of the computational formulas used in the program, see Appendix A.

2.4.2 Restricted Maximum Likelihood (REML)
This is an important alternative estimation procedure, which will be implemented in subsequent versions of the program.

2.5 Residuals
The total residuals are given by Equation 2.12,
$$\tilde r = y - X\hat\gamma.$$
In this equation the estimated residuals are based on the fixed parameter estimates from the maximum likelihood estimation, although other estimates of $\gamma$ could be used as well. The raw residuals for the first level are taken from the within-group model (2.1),
$$\hat\varepsilon_j = y_j - X_j\hat\beta_j, \qquad (2.24)$$
where the estimates $\hat\beta_j$ are the OLS estimates from (2.8). Using the between-group model from Equation 2.2 and the OLS estimates from Equation 2.8, the Level 2 raw residuals are
$$\hat u_j = \hat\beta_j - W_j\hat\gamma, \qquad (2.25)$$
where $\hat\gamma$ stems from Equation 2.13. The Level 2 shrunken residuals are given by
$$\tilde u_j = \hat\Omega Z_j'\bigl(Z_j\hat\Omega Z_j' + \hat\sigma^2 I_{n_j}\bigr)^{-1}\tilde r_j, \qquad (2.26)$$
where $\tilde r_j$ contains the total full information maximum likelihood residuals for group $j$, i.e., $\tilde r_j = y_j - X_j\hat\gamma$, where $\hat\gamma$ is the FIML estimator of $\gamma$, and $\hat\Omega$ and $\hat\sigma^2$ are the FIML estimators of $\Omega$ and $\sigma^2$, respectively. This formula is computationally rather inefficient. Therefore, the following more efficient f…
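As an illustration of Equation 2.26 in its direct (computationally less efficient) form, the following sketch computes the Level 2 shrunken residuals for one group. The code and variable names are ours; it assumes the FIML estimates and the group's data are already available as NumPy arrays.

import numpy as np

def shrunken_level2_residuals(y_j, Xstar_j, Z_j, gamma_hat, Omega_hat, sigma2_hat):
    # Total FIML residuals for group j: r_j = y_j - X_j gamma_hat (gamma_hat = FIML estimate)
    r_j = y_j - Xstar_j @ gamma_hat
    # V_j = Z_j Omega Z_j' + sigma^2 I, the covariance matrix of y_j
    n_j = len(y_j)
    V_j = Z_j @ Omega_hat @ Z_j.T + sigma2_hat * np.eye(n_j)
    # Equation 2.26: shrunken residual = Omega Z_j' V_j^{-1} r_j
    return Omega_hat @ Z_j.T @ np.linalg.solve(V_j, r_j)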
…used as an example analysis in Kreft and Van der Leeden (1994). The model equations are given by
$$y_{ij} = \beta_{1j} + \beta_{2j}x_{ij} + \varepsilon_{ij},$$
$$\beta_{1j} = \gamma_1 + \gamma_2 w_j + u_{1j},$$
$$\beta_{2j} = \gamma_3 + \gamma_4 w_j + u_{2j},$$
where $y$ (V9) represents the score on the math test, $x$ (V5) is the amount of homework done, and $w$ (V17) is the variable indicating the type of school (Public or Private). The following input file also shows the application of the /TECHNICAL statement, where the maximum number of iterations is raised to 100 and both convergence criteria are set to 0.00001.

/TITLE
MLA example 4: multilevel analysis
/DATA
file = nels.dat
vars = 17
id2 = 1
/MODEL
b1 = g1 + g2*v17 + u1
b2 = g3 + g4*v17 + u2
v9 = b1 + b2*v5 + e
/TECHNICAL
maxiter = 100
fconv = 0.00001
pconv = 0.00001
/OUTPUT
tech
/END

The special output given contains the intermediate iteration results. These are the results during the full maximum likelihood estimation. Starting with the values from the two-step OLS estimation, we can see that the first value is already rather good. More about the difference between first-iteration and final estimates in two-level models can be found in Van der Leeden and Busing (1994). The estimation process stopped after 14 iterations: the difference between the last two -2*LOG(L) values was less than the user-provided fconvergence value of 0.00001.

TECHNICAL ITERATION INFORMATION
ITER    -2*LOG(L)        NORM CdP      NORM CG
1       1749.8685307     1.9340571     11.8087864
2       1749.632754…
…satisfactory way. If, however, some exogenous X and/or Z variables are missing, the EM algorithm cannot be used straightforwardly, because it requires that the joint distribution of the exogenous and the endogenous (output) variables is known. Standard multilevel modeling only assumes that the conditional distribution of the output variables given the exogenous variables is known. This poses severe complications.

If the amount of missing data is relatively small, standard ad hoc solutions to the missing data problem can be used, such as listwise deletion (deletion of cases with one or more missing values), pairwise deletion (computation of sufficient statistics, such as covariances, on the basis of all available information for the variables in question), mean substitution (substitution of the mean of the observed values of a variable for a missing value on that variable), or other substitution methods. All these methods have their advantages and drawbacks, and none is fully satisfactory, especially when the number of missing values is large.

In the current version of MLA no specific means of missing data handling are implemented. Listwise deletion and several forms of substitution can be done by the user before the data set is processed by MLA. Pairwise deletion cannot be done, because the program requires raw data. In principle, pairwise deletion could be done within the program, but this is not implemented yet.
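For users who prepare the data file themselves, listwise deletion and mean substitution as mentioned above can be sketched as follows. This code is ours and is not part of MLA; it assumes the raw data are held in a NumPy array with missing values coded as NaN.

import numpy as np

def listwise_deletion(data):
    # Drop every case (row) that has at least one missing value.
    return data[~np.isnan(data).any(axis=1)]

def mean_substitution(data):
    # Replace each missing value by the mean of the observed values of that variable (column).
    data = data.copy()
    col_means = np.nanmean(data, axis=0)
    rows, cols = np.where(np.isnan(data))
    data[rows, cols] = col_means[cols]
    return data

The cleaned array would then be written to a free-field ASCII file before it is processed by MLA.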
…simulation work as follows:
- obtain a new sample;
- repeat the analysis;
- save the new estimates.

These three steps, together called a replication, are repeated a number of times. Afterwards, bias-corrected estimates of the model parameters and nonparametric estimates of the standard errors are computed. These estimates are computed from the set of saved bootstrap or jackknife estimates and the original maximum likelihood estimates.

The bootstrap, introduced by Efron (1979), differs from the jackknife, the nonparametric technique proposed by Quenouille (1949), in the way a new sample is obtained. The choice between bootstrap and jackknife resampling also determines the way the final simulation estimates are computed. More details can be found in Chapter 2.

3.5.2 method
This substatement specifies the method of bootstrap to be performed. It is required whenever kind = bootstrap. One can choose between three different methods: error, cases, and parametric. The three methods differ in the way the bootstrap sample is obtained.

error: This method resamples the elements of the Level 1 and Level 2 error vectors. Subsequently, a new outcome (dependent) variable is computed using these error vectors, the original predictor (independent) variables, and their corresponding FIML parameter estimates. The way in which the Level 1 and Level 2 error terms are estimated from the total FIML residuals is discussed in Chapter 2.

cases: Using this metho…
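To give an impression of the resampling step of the cases method (resampling whole Level 2 units with replacement), a sketch is given below. The code is ours and only illustrates the idea; it assumes the data are stored with one row per Level 1 unit, together with a vector of Level 2 identifiers.

import numpy as np

def cases_bootstrap_sample(data, group_ids, rng=None):
    # Resample Level 2 units (groups) with replacement; a group drawn twice appears twice.
    rng = np.random.default_rng() if rng is None else rng
    group_ids = np.asarray(group_ids)
    groups = np.unique(group_ids)
    drawn = rng.choice(groups, size=len(groups), replace=True)
    rows = [data[group_ids == g] for g in drawn]
    return np.vstack(rows)

When units at both levels are resampled, the Level 1 units within each drawn group would be resampled with replacement as well.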
…test can be applied. The difference between the function values is approximately 1399 - 1318 = 81, and the number of degrees of freedom is equal to 1. The likelihood ratio test indicates that the effect is highly significant.

4.3 Repeated measures analysis
The rat data set used for the repeated measures example has been analyzed by a number of investigators. The first use of these data with multilevel analysis appeared in Strenio, Weisberg, and Bryk (1983). The rat data consist of the weights of ten rats. These rats were measured five times, at four-week intervals, from birth. Also included in the model is the weight of each rat's mother (V2). Divided into two levels, the equations are given by
$$y_{ij} = \beta_{1j} + \beta_{2j}x_{ij} + \varepsilon_{ij},$$
$$\beta_{1j} = \gamma_1 + \gamma_2 w_j + u_{1j},$$
$$\beta_{2j} = \gamma_3 + \gamma_4 w_j + u_{2j},$$
where $x$ (V4) is the age in weeks divided by 4, minus 2, so that it is expressed in deviation from the mean, and $w$ (V2) represents the weight of the mother. The input file for the repeated measures example is as follows:

/TITLE
MLA example 3: repeated measures analysis
/DATA
file = rat.dat
vars = 4
/MODEL
b1 = g1 + g2*v2 + u1
b2 = g3 + g4*v2 + u2
v1 = b1 + b2*v4 + e
/OUTPUT
outc post
/END

For each rat a multiple regression analysis is performed and displayed in the Level 2 outcomes part. This part is optional and is displayed through the use of the outcomes keyword in the /OUTPUT statement.

LEVEL 2 OUTCOMES
ORDINARY LEAST SQUARES ESTIMATES PER LEVEL 2 UNIT
UNIT   SIZE   B1   S…
…statement, the critical p-value for outlier display can be set. The keyword is outliers, and an example is given below.

/TECHNICAL
outliers = 0.25

At the extremes, outliers = 0.0 will show no outliers at all, while outliers = 1.0 will show all available cases, both Level 1 and Level 2 units.

3.6.6 file (optional)
The technical output can be written to a separate file. The file is specified after the file substatement under the /TECHNICAL statement and must satisfy the usual DOS conventions on filenames. The file will contain the iteration numbers and the parameter estimates, in the same order as in Section 3.5.7, after each iteration. Depending on the number of parameters, multiple lines of parameter estimates will be displayed.

3.7 OUTPUT (optional)
The /OUTPUT statement gives the user control over the output. Not all output is optional: the default output consists of a title page, an echo of the input, the maximum likelihood (FIML) estimates, and system information. Output for the simulation analysis is generated whenever the /SIMULATION statement is used. Additional output is controlled by keywords following the /OUTPUT statement. The keywords must be separated by spaces or commas and may take up more than one line. The keywords are briefly explained below; a more profound elaboration follows in the chapter on output. Most theory underlying the different parts of the output can be found in Chapter 2. Example: /OUTPUT i…
…statements starting with a keyword. Models are specified by simply formulating the model equations.

This manual provides the necessary information for the new user to fit multilevel models with two levels to a hierarchical data set. It is expected that the user has basic knowledge of regression analysis. A brief introduction to multilevel analysis and related concepts is given in the first chapter. References to three major textbooks on multilevel analysis can be found in the text.

The MLA program was developed, and is being further developed, by Frank Busing, Erik Meijer, and Rien van der Leeden. As the version number indicates, this manual describes a beta version. We are still doing research to polish and improve certain simulation options. We would very much appreciate hearing about any of your experiences using the program and this manual. Please contact us by email: busing@rulfsw.leidenuniv.nl or vanderleeden@rulfsw.leidenuniv.nl.

We would like to thank Jan de Leeuw and Ita Kreft for helpful discussions, comments, and references. The Institute for Educational Research in the Netherlands (SVO) is gratefully acknowledged for supporting this project by a grant (SVO project no. 93713).

Frank M. T. A. Busing
Erik Meijer
Rien van der Leeden
Leiden, December 1994

Contents
1 Introduction
1.1 Introduction to multilevel analysis
1.2 Why another program for multilevel analysis
2 Theory
…input variables.

input variable            maximum
equations                 16
parameters                32 per equation
Level 1 units             8000 per Level 2 unit
Level 2 units             16000
variables                 16000
bootstrap replications    16000
constraints               64

These limitations are the absolute maxima and can be somewhat lower, depending on the amount of memory available.

SYSTEM REQUIREMENTS
MLA will run on any IBM PC, AT, PS/2, or compatible under MS-DOS, PC-DOS, DR-DOS, or OS/2. A minimum of 256K of free RAM is necessary. MLA will also run in a DOS environment under WINDOWS or OS/2. The program DOES need a numeric coprocessor; however, non-coprocessor implementations are available from the authors. A coprocessor is highly recommended for extensive simulations or computations on large samples.

CREDITS
The development of this program has been supported by a grant from SVO (project number 93713). IBM PC, AT, and PS/2, and PC-DOS and OS/2 are trademarks of International Business Machines. MS-DOS and MS-WINDOWS are registered trademarks of Microsoft Corporation. DR-DOS is a registered trademark of Digital Research.

DOCUMENTATION
From the same authors, an extensive manual was written for the MLA program. The manual contains an introduction to multilevel analysis, information about the estimation procedures used in the program, a description of the input statements for MLA, and many different examples. A technical appendix describes the reparametrization and the minimization of the lik…
…obtain the best estimates for these coefficients, also called random Level 1 coefficients. Such questions may arise, for example, in education, when schools are to be ranked in terms of effectiveness using their estimated slope coefficients (Kreft & De Leeuw, 1991). The first thing that comes to mind is to simply estimate them by a separate OLS regression for each school. However, this procedure has the serious disadvantage that the coefficients will not be estimated with the same precision for each school. For instance, in one school we could have, say, 45 students, whereas in another school we only have 7 students. This will definitely influence the accuracy of the results.

Within the framework of multilevel analysis there is a way to obtain best estimates of these coefficients by a method called shrinkage estimation. The underlying idea of this estimation is that there are basically two sources of information: the estimates from each group separately, and the estimates that could be obtained from the total sample, ignoring any grouping. Shrinkage estimation consists of a weighted combination of these two sources. The more reliable the estimates are within the separate groups, the more weight is put on them. Vice versa, the less reliable these estimates are, that is, the less precise, the more weight is put on the estimates obtained from the total sample. The result is that estimates are shrunken towards the mean of the estimates over all groups.
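One common way to write such a shrinkage (empirical Bayes) estimator — the notation below is ours and not necessarily the exact expression used internally by MLA — is as a matrix-weighted combination of the two sources of information:
$$\hat\beta_j^{\,*} = \Lambda_j\,\hat\beta_j + (I - \Lambda_j)\,W_j\hat\gamma,$$
where $\hat\beta_j$ is the OLS estimate based on group $j$ alone, $W_j\hat\gamma$ is the prediction from the overall Level 2 model, and the weight matrix $\Lambda_j$ measures the reliability (precision) of $\hat\beta_j$. The closer $\Lambda_j$ is to the identity matrix, the less the group's own estimate is shrunken towards the overall prediction.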
…through the vector that stacks its columns:
$$\operatorname{vec}(dF) = d(\operatorname{vec} F).$$
Note that the differential of a scalar, vector, or matrix is a scalar, vector, or matrix of the same size. Some useful formulas are (Magnus & Neudecker, 1985, 1988):
$$\frac{\partial f}{\partial x'} = A(x) \iff df = A(x)\,dx,$$
$$dc = 0,$$
$$d(cg) = c\,dg,$$
$$d(g + h) = dg + dh,$$
$$d(gh) = (dg)h + g\,(dh),$$
$$d\log f = f^{-1}\,df,$$
$$d\det F = \det F\,\operatorname{tr}\bigl(F^{-1}dF\bigr),$$
$$d\operatorname{tr} F = \operatorname{tr}(dF),$$
$$dF^{-1} = -F^{-1}(dF)F^{-1},$$
where $c$ is a scalar, vector, or matrix constant; $g$ and $h$ may be scalars, vectors, or matrices, provided the expression is valid; $f$ is a scalar; and $F$ is a matrix.

There is also a chain rule. If $f$ is a function of $x$ and $g$ is a function of $f$, then (cf. Magnus & Neudecker, 1988, p. 91)
$$\frac{\partial g}{\partial x'} = \frac{\partial g}{\partial f'}\,\frac{\partial f}{\partial x'}.$$
This means the following for the differentials: if $dg = A\,df$ and $df = B\,dx$, then $dg = AB\,dx$, which illustrates that, informally speaking, the formulas for the differentials can be filled in sequentially. A similar formula holds for differentials of matrices. In the following it will be clear how the chain rule can be applied.

The formulas above can be used to derive some important differentials:
$$dV_j = d\bigl(\sigma^2 I_{n_j} + Z_j\Omega Z_j'\bigr) = (d\sigma^2)\,I_{n_j} + Z_j(d\Omega)Z_j', \qquad (A.17)$$
$$d\det V_j = \det V_j\,\operatorname{tr}\bigl(V_j^{-1}dV_j\bigr), \qquad (A.18)$$
$$dV_j^{-1} = -V_j^{-1}(dV_j)V_j^{-1}, \qquad (A.19)$$
$$d\log\det V_j = \frac{1}{\det V_j}\,d\det V_j = \operatorname{tr}\bigl(V_j^{-1}dV_j\bigr). \qquad (A.20)$$
Combining equation (A.19) with (A.17), we find that
$$dV_j^{-1} = -V_j^{-1}\bigl[(d\sigma^2)I_{n_j} + Z_j(d\Omega)Z_j'\bigr]V_j^{-1} = -(d\sigma^2)\,V_j^{-2} - V_j^{-1}Z_j(d\Omega)Z_j'V_j^{-1}. \qquad (A.21)$$
Consider a term o…
…given by the first variable.

3.2.1 file (required)
This substatement indicates the name of the data file. The name is given after the equals sign and must satisfy the usual DOS conventions on filenames. If the file is in the current directory, the complete pathname is not necessary. The file itself is a free-field formatted, numbers-only ASCII file. This means that values of variables must be separated by at least one blank. A case may consist of more than one line. Cases must be sorted by the Level 2 identifier variable (see below).

3.2.2 variables (required)
The variables substatement specifies the number of variables in the data file. Because the data file is a free-field formatted file and one case may consist of more than one line, this information is necessary for the program to determine when to start a new case.

3.2.3 id1 (optional)
With this substatement a case number variable can be given. This can be useful in those situations where the output gives specific information about cases at the first level. The substatement is otherwise equal to the id2 substatement (see below). If omitted, the order in which the Level 1 units are read from the data file is used as identification.

3.2.4 id2 (required)
One of the variables in the data file must contain a code number that identifies the Level 2 units. This may be a group number or, in the case of repeated measurements, a subject number. The number is essential for a correct discrimination…
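Putting these substatements together, a /DATA statement could look like the following sketch; the file name and the variable numbers are hypothetical.

/DATA
file = example.dat
vars = 6
id1 = 2
id2 = 1

Here the first variable would hold the Level 2 (group) code and the second an optional Level 1 case number; the file example.dat would then contain six values per case, separated by blanks, with the cases sorted by the Level 2 identifier.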
Outlier detection

Although the term outlier seems to be unambiguous, this is not completely true. An outlier is considered to be a deviant observation in the data, and not a deviant residual after model estimation. However, a procedure fitting outliers in the data as residual outliers is considered to be a robust procedure. In MLA, outliers are detected using residuals. So we expect MLA to be robust against data outliers and therefore look for residual outliers. More research in the field of robustness for multilevel models would be useful, however.

The detection of outliers differs for Level 1 and Level 2 outliers. For both levels the shrunken residuals are considered. For the first level, the quotients
$$\frac{\tilde\varepsilon_{ij}}{\hat\sigma_{\tilde\varepsilon}} \qquad (2.31)$$
are calculated, where $\hat\sigma_{\tilde\varepsilon}^2$ is the variance of the Level 1 residuals. Residuals will be displayed whenever the quotient (2.31), when compared to a standard normal distribution, has a p-value less than some (possibly user-specified) value. The default value is 0.1.

For the Level 2 outliers, the Mahalanobis distances of the Level 2 residuals to their theoretical mean of zero are calculated by
$$M_j = \tilde u_j'\,\hat\Omega^{-1}\tilde u_j.$$
Residuals are now displayed for which $M_j$ is larger than the critical value corresponding to a (possibly user-specified) p-value of a chi-square distribution with $q$ degrees of freedom, where $q$ is the dimension of $u$. This p-value is the same as for the Level 1 outliers.

2.8 Simulation
The maximum…
…explanatory variables including the constant, $\beta_j$ is a vector of regression coefficients, and $\varepsilon_j$ is a vector with random error terms, all vectors and matrices of appropriate dimensions. Then for each group $j$ the Level 1 or within-group model can be written as
$$y_j = X_j\beta_j + \varepsilon_j. \qquad (2.1)$$
The Level 2 or between-group model can be written as
$$\beta_j = W_j\gamma + u_j, \qquad (2.2)$$
where $W_j$ is a $q \times p$ matrix with explanatory variables (including the constant) obtained at the group level, $\gamma$ is a vector containing fixed coefficients, and $u_j$ is a vector with error terms. Equation 2.2 clearly illustrates the "slopes as outcomes" interpretation, because it gives the illusion that the coefficients in $\beta_j$ are outcome variables in a separate Level 2 model. However, substitution of Equation 2.2 into Equation 2.1 gives the total model equation
$$y_j = X_jW_j\gamma + X_ju_j + \varepsilon_j. \qquad (2.3)$$
This is a mixed linear model (Harville, 1977) of the form
$$y_j = X_j^{*}\gamma + Z_ju_j + \varepsilon_j, \qquad (2.4)$$
in which $X_j^{*} = X_jW_j$ and $Z_j = X_j$.

Several authors use different notations for the models presented in this chapter and in subsequent chapters. We find the separate model equations 2.1 and 2.2 for the two levels most useful for interpretation of the model and its estimates, and the program input is therefore based on them (see Chapter 3). For theoretical purposes we find the form 2.4 most useful, where usually $X_j^{*}$ will be simply written as $X_j$. Therefore, in the following, both representations will be used where appropriate.
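As a concrete illustration (ours, not taken verbatim from the manual), consider a model with a random intercept, one Level 1 predictor $x_{ij}$ with a random slope, and a single Level 2 variable $w_j$. Writing out Equations 2.1 and 2.2 for this case gives
$$y_{ij} = \beta_{1j} + \beta_{2j}x_{ij} + \varepsilon_{ij}, \qquad \beta_{1j} = \gamma_1 + \gamma_2 w_j + u_{1j}, \qquad \beta_{2j} = \gamma_3 + \gamma_4 w_j + u_{2j},$$
so that the combined form of Equations 2.3 and 2.4 becomes
$$y_{ij} = \gamma_1 + \gamma_2 w_j + \gamma_3 x_{ij} + \gamma_4 w_j x_{ij} + u_{1j} + u_{2j}x_{ij} + \varepsilon_{ij}.$$
The fixed part $X_j^{*}\gamma$ thus contains the constant, $w_j$, $x_{ij}$, and the cross-level product $w_jx_{ij}$, while the random part $Z_ju_j$ contains the constant and $x_{ij}$.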
…Substituting (A.21) for $dV_j^{-1}$ and collecting the terms in $d\gamma$, $d\sigma^2$, and $d\Omega$ gives (A.40). Therefore the second-order derivative $\partial^2 L/\partial\sigma^2\,\partial\sigma^2$ follows as (A.41), and, using (A.22), (A.23), and (A.24), the mixed second-order derivatives of $L$ with respect to $\sigma^2$ and the distinct elements $\omega_{kk}$ and $\omega_{kl}$ of $\Omega$ follow as (A.42) and (A.43).

From (A.31) we have the corresponding first derivative of $L$ with respect to the elements of $\Omega$, which involves the terms $Z_j'V_j^{-1}Z_j$, $Z_j'V_j^{-1}(y_j - X_j^{*}\gamma)$, and $Z_j'V_j^{-1}X_j^{*}$. Thus, taking its differential, substituting (A.21), and collecting terms, …
…number of parameters to be estimated is not too large. Changing the convergence criteria (see below) can make it necessary to raise the maximum number of iterations. The value must be an integer between 1 and 32767 (2^15 - 1).

3.6.3 fconvergence (optional)
The substatement fconvergence refers to function convergence. After each iteration the new function value is compared to the previous function value, and the obtained difference is compared to an fconvergence-related value. If
$$\frac{|F_i - F_{i-1}|}{\tfrac{1}{2}\,|F_i + F_{i-1}|} \le \text{fconvergence},$$
convergence is said to have been reached. In this formula, $F_i$ is the function value after the $i$th iteration. The first part of the formula represents the ratio between the difference of two successive function values and the mean of these values. The default value of fconvergence is 0.001, and permitted values range from 1.0 to 1.0E-16.

3.6.4 pconvergence (optional)
The substatement pconvergence refers to parameter convergence. After each iteration the parameter vector is compared with its predecessor by computing a vector of differences. Using $v_i$ as the norm of this vector after the $i$th iteration, convergence is said to have been reached when
$$v_i < \text{pconvergence}.$$
The default value of pconvergence is 0.001, and permitted values range from 1.0 to 1.0E-16. The use of this substatement has no influence on the estimation process during simulation, because of the loss of speed that would result from its use.

3.6.5 outliers (optional)
With this sub…
…uses the estimated parameters as true values of the parameters of a multivariate normal distribution, from which new outcome variables are drawn. This method is implemented in the ML3 program as well; as far as we know, this is the only multilevel analysis program so far that has some form of simulation option built in. It is called the parametric bootstrap.

2. A bootstrap method that uses the observed values of the outcome and predictor variables for resampling. Thus, whole cases are resampled; therefore we call it the cases bootstrap.

3. A bootstrap method that uses estimates of the error terms at both levels for resampling. In contrast with the cases bootstrap, this method leaves the regression design unaffected. We call it the error bootstrap. Because the error terms at both levels must be estimated in order to be resampled, we need the estimates for the separate Level 1 models (random Level 1 coefficients). As was explained earlier, there are two choices for these coefficients: OLS estimates for each group separately, or shrinkage estimates based on the whole sample. These two choices account for two additional options that can be used when applying the error bootstrap.

4. The jackknife. With this method one entire case is deleted for each resample; there are as many resamples as there are cases.

Depending on the type of simulation used in MLA, and depending on the nature of the data, the user can decide to resample both levels in the data or o…
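To illustrate the idea behind the parametric bootstrap of method 1, a short sketch follows. The code and notation are ours; it draws a new outcome vector for a single group from the estimated model, with covariance matrix $Z_j\hat\Omega Z_j' + \hat\sigma^2 I$.

import numpy as np

def parametric_bootstrap_y(Xstar_j, Z_j, gamma_hat, Omega_hat, sigma2_hat, rng=None):
    # Draw y*_j ~ N(X*_j gamma_hat, Z_j Omega_hat Z_j' + sigma2_hat I) for one group.
    rng = np.random.default_rng() if rng is None else rng
    n_j = Xstar_j.shape[0]
    mean = Xstar_j @ gamma_hat
    cov = Z_j @ Omega_hat @ Z_j.T + sigma2_hat * np.eye(n_j)
    return rng.multivariate_normal(mean, cov)

Repeating this for every group yields one bootstrap replication of the outcome variable, after which the model is re-estimated and the estimates are saved.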
…development of multilevel software, and to applications in other domains than educational research. At the same time, however, several relevant questions of a statistical nature concerning this development are still not answered fully satisfactorily. One major problem is that estimates of parameters and standard errors, as well as hypothesis tests based on them, rely on large-sample properties of the estimates. Unfortunately, little is known about the behavior of the estimates when the sample size is small (Raudenbush, 1988). Goldstein (1987) even suggests optimizing the design of a multilevel study by the use of pilot or simulation studies. An additional problem is that it is usually assumed that the error terms are normally distributed. In practice this assumption will often be violated, which has further undesirable consequences for using standard error estimates for hypothesis testing and the construction of confidence intervals.

Fortunately, there is an increasing number of simulation studies available which give insight into the quality of estimates of parameters and standard errors under various conditions (Busing, 1993; Van der Leeden & Busing, 1994; Kreft, 1994). Concerning empirical data sets, however, we think that extensive simulation options, in particular options for bootstrapping, would be a very useful addition to a program for multilevel analysis. Therefore, four different simulation methods are implemented in MLA:

1. A bootstrap method that…
