ASReml User Guide - VSN International

1. 4.2 The data file
      Free format data files
      Fixed format data files
      Preparing data files in Excel
      Binary format data files
   5 Command file: Reading the data
   5.1 Introduction
   5.2 Important rules
   5.3 Title line
   5.4 Specifying and reading the data
      Data field definition syntax
      Storage of alphabetic factor labels
      Reordering the factor levels
      Skipping input fields
   5.5 Transforming the data
      Transformation syntax
      QTL marker transformations
      Other rules and examples
      Special note on covariates
   5.6 Datafile line
      Data line syntax
   5.7 Data file qualifiers
      Combining rows from separate files
   5.8 Job control qualifiers
   6 Command file: Specifying the terms in the mixed model
   6.1 Introduction
   6.2 Specifying model formulae in ASReml
      General rules
2. • d is the number of variance models, and hence direct product matrices, involved in the G structure; the following lines define the d variance models,
   • order is either the number of levels in the term or the name of a factor that has the same number of levels as the component,
   • key is usually zero, but for power models (EXP, GAU, ...) it provides the distance data needed to construct the model,
   • model is the ASReml variance model identifier (acronym) selected for the term; variance models are listed in Table 7.3, and these models have associated variance parameters,
   • initial_values are initial or starting values for the variance parameters; the values for initial_values are as described above for R structure definition lines,
   • qualifier tells ASReml to modify the variance model in some way; the qualifiers are described in Table 7.4.

   7.5 Variance model description

   Table 7.3 presents the full range of variance models, that is, correlation, homogeneous variance and heterogeneous variance models, available in ASReml. The table contains the model identifier, a brief description, its algebraic form and the number of parameters. The first section defines BASE correlation models, and in the next section we show how to extend them to form variance models. The second section defines some models parameterized as variance/covariance matrices rather than as correlation ma
3. 13 Description of output files
      Introduction
      An example
      Key output files
         The .asr file
         The .sln file
         The .yht file
      Other ASReml output files
         The .aov file
         The .res file
         The .vrb file
         The .vvp file
         The .rsv file
         The .dpr file
         The .pvc file
         The .pvs file
         The .tab file
      ASReml output objects and where to find them

   13.1 Introduction

   With each ASReml run a number of output files are produced. ASReml generates the output files by appending various filename extensions to basename. A brief description of the filename extensions is presented in Table 13.1.

   Table 13.1: Summary of ASReml output files

   file   description
   Key output files
   .asr   contains a summary of the data and analysis results.
   .pvc   contains the report produced with the P option.
   .pvs   contains predictions formed by the predict directive.
   .res   contains information from using the pol(), spl() and fac() functions, the iteration sequence for the variance components and some statistics derived from the residuals.
   .rsv   contains the final parameter values for reading back if the !CONTINUE qualifier is invoked, see Table 5.4.
   .sln   contains the estimates of the fixed and random effects and their corresponding standard errors.
   .tab   contains tables formed by the tabulate directive.
   .yht   contains the predicted values, residuals and diagonal elements of the hat matrix for each data p
4. 14.2 List of warning messages and likely meaning(s)
   14.3 Alphabetical list of error messages and probable cause(s)/remedies
   15.1 A split plot field trial of oat varieties and nitrogen application
   15.2 Rat data: AOV decomposition
   15.3 REML log-likelihood ratio for the variance components in the voltage data
   15.4 Summary of variance models fitted to the plant data
   15.5 Summary of Wald test for fixed effects for variance models fitted to the plant data
   15.6 Field layout of Slate Hall Farm experiment
   15.7 Summary of models for the Slate Hall data
   15.8 Estimated variance components from univariate analyses of bloodworm data: (a) Model with homogeneous variance for all terms and (b) Model with heterogeneous variance for interactions involving tmt
   15.9 Equivalence of random effects in bivariate and univariate analyses
   15.10 Estimated variance parameters from bivariate analysis of bloodworm data
   15.11 Orange data: AOV decomposition
   15.12 Sequence of models fitted to the Orange data
   15.13 REML estimates of a subset of the variance parameters for each trait for the genetic example, expressed as a ratio to their as
5. $\Sigma = \sigma_1^2 J + \sigma_2^2 I$ (units), $\Sigma = \sigma_e^2 I + \sigma_e^2 \rho (J - I)$ (CORU).

   It follows that

   $\sigma_e^2 = \sigma_1^2 + \sigma_2^2, \qquad \rho = \dfrac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}$   (15.4)

   Portions of the two outputs are given below. The REML log-likelihoods for the two models are the same, and it is easy to verify that the REML estimates of the variance parameters satisfy (15.4), viz. $\hat\sigma_e^2 = 286.310 \approx 159.858 + 126.528 = 286.386$ and $159.858 / 286.386 = 0.558191$.

   # units
    LogL=-204.593  S2= 224.61   60 df  0.1000  1.000
    LogL=-201.233  S2= 186.52   60 df  0.2339  1.000
    LogL=-198.453  S2= 155.09   60 df  0.4870  1.000
    LogL=-197.041  S2= 133.85   60 df  0.9339  1.000
    LogL=-196.881  S2= 127.56   60 df  1.204   1.000
    LogL=-196.877  S2= 126.53   60 df  1.261   1.000
    Final parameter values             1.2634  1.0000

    Source     Model  terms  Gamma     Component  Comp/SE  % C
    units         14     14  1.26342   159.858    2.11     0 P
    Variance      70     60  1.00000   126.528    4.90     0 P

   # CORU
    LogL=-196.975  S2= 264.10   60 df  1.000  0.5000
    LogL=-196.924  S2= 270.14   60 df  1.000  0.5178
    LogL=-196.886  S2= 278.58   60 df  1.000  0.5400
    LogL=-196.877  S2= 286.23   60 df  1.000  0.5580
    LogL=-196.877  S2= 286.31   60 df  1.000  0.5582
    Final parameter values             1.0000  0.55819

    Source     Model  terms  Gamma     Component  Comp/SE  % C
    Variance      70     60  1.00000   286.310    3.65     0 P
    Residual   CORRelat   5  0.558191  0.558191   4.28     0 U

   A more realistic model for repeated measures data would allow the correlations to decrease as the lag increases, such as occurs with the first order autoregressive model. However, since the heights are not measured at equally spaced
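The arithmetic behind (15.4) can be checked directly. The sketch below is only a check of the numbers quoted in the excerpt (units component 159.858, residual variance 126.528); the variable names are illustrative and not ASReml output fields.

```python
# Check of relation (15.4) using the REML estimates quoted in the excerpt.
# Names are illustrative; the values are copied from the "units" model output.
sigma1_sq = 159.858   # variance component for units
sigma2_sq = 126.528   # residual variance in the units model

# Implied total variance and uniform correlation under the CORU parameterisation.
sigma_e_sq = sigma1_sq + sigma2_sq
rho = sigma1_sq / sigma_e_sq

print(f"implied total variance: {sigma_e_sq:.3f}")
print(f"implied correlation:    {rho:.6f}")
```

The implied total variance can then be compared with the 286.310 reported by the CORU fit; the two agree to within the convergence tolerance of the iterations.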
6. • regular or irregular spatial data.

   The engine of ASReml forms the basis of the REML procedure in GENSTAT and the asreml class of S language functions (Butler et al., 2007) available for S-Plus (ASReml-S) and R (ASReml-R). Both of these have good data manipulation and graphical facilities and will be adequate for many analyses; some large problems will need to use ASReml. The ASReml user interface is terse. Most effort has been directed towards efficiency of the engine. It normally operates in a batch mode.

   Problem size depends on the sparsity of the mixed model equations and the size of your computer. However, models with 500 000 effects have been fitted successfully. The computational efficiency of ASReml arises from using the Average Information REML procedure (giving quadratic convergence) and sparse matrix operations. ASReml has been operational since March 1996 and is updated periodically.

   Installation

   Installation instructions are distributed with the program. If you require help with installation, please email support@asreml.co.uk. These are available from VSN (http://www.vsni.co.uk).

   1.3 User Interface

   ASReml is essentially a batch program with some optional interactive features. The typical sequence of operations when using ASReml is:
   • Prepare the data (typically using a spreadsheet or database program).
   • Export that data as an ASCII file (for example export it as a .csv (comma
7.    Tests of hypotheses: variance parameters
      Diagnostics
   2.6 Inference: Fixed effects
      Introduction
      Incremental and Conditional Wald Statistics
      Kenward and Roger Adjustments
      Approximate stratum variances
   3 A guided tour
   3.1 Introduction
   3.2 Nebraska Intrastate Nursery (NIN) field experiment
   3.3 The ASReml data file
   3.4 The ASReml command file
      The title line
      Reading the data
      The data file line
      Tabulation
      Specifying the terms in the mixed model
      Prediction
      Variance structures
   3.5 Running the job
      Forming a job template
   3.6 Description of output files
      The .asr file
      The .sln file
      The .yht file
   3.7 Tabulation, predicted values and functions of the variance components
   4 Data file preparation
   4.1 Introduction
8. term      wwt    ywt    gfw    fdm    fat
   sire      3.68   3.57   3.95   1.92   1.92
   dam       6.25   4.93   2.78   0.37   0.05
   litter    8.79   0.99   2.23   1.91   0.00
   age.grp   2.29   1.39   0.31   1.15   1.74
   sex.grp   2.90   3.43   3.70   -      1.83

   Table 15.14: Wald tests of the fixed effects for each trait for the genetic example

   term      wwt     ywt     gfw    fdm   fat
   age       331.3   67.1    52.4   2.6   7.5
   brr       554.6   73.4    14.9   0.3   13.9
   sex       196.1   123.3   0.2    2.9   0.6
   age.sex   10.3    1.7     1.9    -     5.0

    age wwt !M0 ywt !M0   # !M0 recodes zeros as missing values
    gfw !M0 fdm !M0 fat !M0
   coop.fmt
   wwt ~ mu age brr sex age.sex !r sire dam lit age.grp sex.grp !f grp

   Tables 15.13 and 15.14 present the summary of these analyses. Fibre diameter was measured on only 2 female lambs and so interactions with sex were not fitted. The dam variance component was quite small for both fibre diameter and fat. The REML estimate of the variance component associated with litters was effectively zero for fat. Thus in the multivariate analysis we consider fitting the following models to the sire, dam and litter effects:

   $\mathrm{var}(u_s) = \Sigma_s \otimes I_{92}$
   $\mathrm{var}(u_d) = \Sigma_d \otimes I_{3561}$
   $\mathrm{var}(u_l) = \Sigma_l \otimes I_{4891}$

   where $\Sigma_s$ ($5 \times 5$), $\Sigma_d$ ($3 \times 3$) and $\Sigma_l$ ($4 \times 4$) are positive definite symmetric matrices corresponding to the between traits variance matrices for sires, dams and litters respectively. The variance matrix for dams does not involve fibre diameter and fat depth, while the variance matrix for litters does not involve fat depth. The effects in each of the above vectors are o
9. [ASReml output for the unstructured (US) variance model: table of variance components (Gamma, Component, Comp/SE) for the Residual US=UnStr terms, followed by the Covariance/Variance/Correlation Matrix US=UnStructured]

   The antedependence model of order 1 is clearly more parsimonious than the unstructured model. Table 15.5 presents the incremental Wald tests for each of the variance models. There is a surprising level of discrepancy between models for the Wald tests. The main effect of treatment is significant for the uniform, power and antedependence models.

   Table 15.5: Summary of Wald test for fixed effects for variance models fitted to the plant data

   model      treatment (df=1)   treatment.time (df=4)
   Uniform    9.41               5.10
   Power      6.86               6.13
   Heterogeneou
10. An extract of the ASReml input file is

    circ ~ mu age !r Tree 4.6 Tree.age .000094 spl(age,7) .1 spl(age,7).Tree 2.3 fac(age) 13.9
    0 0 2
    Tree 2
    2 0 US 4.6 .00001 .000094
    5 0 0
    predict age Tree !IGNORE fac(age)

    We stress the importance of model building in these settings, where we generally commence with relatively simple variance models and update to more complex variance models if appropriate. Table 15.12 presents the sequence of fitted models we have used. Note that the REML log-likelihoods for models 1 and 2 are comparable, and likewise for models 3 to 6. The REML log-likelihoods are not comparable between these groups due to the inclusion of the fixed season term in the second set of models.

    We begin by modelling the variance matrix for the intercept and slope for each tree, $\Sigma$, as a diagonal matrix, as there is no point including a covariance component between the intercept and slope if the variance component(s) for one (or both) is zero. Model 1 also does not include a non-smooth component at the overall level (that is, fac(age)). Abbreviated output is shown below.

    Table 15.12: Sequence of models fitted to the Orange data

    term                   model 1   2        3        4        5        6
    tree                   y         y        y        y        y        y
    age.tree               y         y        y        y        y        y
    covariance             n         n        n        n        n        y
    spl(age,7)             y         y        y        y        n        y
    tree.spl(age,7)        y         y        y        n        y        y
    fac(age)               n         y        y        n        n        n
    season                 n         n        y        y        y        y
    REML log-likelihood    -97.78    -94.07   -87.95   -91.22   -90.18   -87.43

     12 LogL=-97.7788  S2= 6.3550   33 df
     So
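Because models 1 and 2 differ only by the random fac(age) term, their REML log-likelihoods can be compared with a REML likelihood ratio test. The following is a minimal sketch, not ASReml's own procedure. It assumes the tabulated values are -97.78 and -94.07 (the minus signs are lost in this extract), that exactly one variance component is being tested on its boundary, and that the usual 50:50 mixture of chi-squared distributions on 0 and 1 df applies. The function name `remlrt` is hypothetical.

```python
import math

def remlrt(logl_reduced, logl_full):
    """REML likelihood ratio statistic and a boundary-adjusted p-value.

    Assumes the two models share the same fixed effects and differ by a
    single variance component tested at zero, so the reference
    distribution is taken as 0.5*chi2(0) + 0.5*chi2(1).
    """
    d = 2.0 * (logl_full - logl_reduced)
    # Survival function of chi-squared with 1 df: P(X > d) = erfc(sqrt(d/2)).
    p_chi2_1 = math.erfc(math.sqrt(max(d, 0.0) / 2.0))
    return d, 0.5 * p_chi2_1

# Model 1 (without fac(age)) against model 2 (with fac(age)).
d, p = remlrt(-97.78, -94.07)
print(f"REMLRT statistic = {d:.2f}, approximate p-value = {p:.4f}")
```

The same function should not be used across the two groups of models, since the excerpt notes that their log-likelihoods are not comparable once the fixed season term is added.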
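11. ... $Z'R^{-1}y$. These can be written as

    $C\tilde\beta = W'R^{-1}y$

    where $C = W'R^{-1}W + G^*$, $\tilde\beta = [\hat\tau' \;\; \tilde u']'$ and

    $G^* = \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} \end{bmatrix}.$

    The solution of (2.11) requires values for $\gamma$ and $\phi$. In practice we replace $\gamma$ and $\phi$ by their REML estimates $\hat\gamma$ and $\hat\phi$. Note that $\hat\tau$ is the best linear unbiased estimator (BLUE) of $\tau$, while $\tilde u$ is the best linear unbiased predictor (BLUP) of $u$ for known $\gamma$ and $\phi$. We also note that

    $\tilde\beta - \beta = \begin{bmatrix} \hat\tau - \tau \\ \tilde u - u \end{bmatrix} \sim N\left(0,\; C^{-1}\right).$

    2.3 What are BLUPs?

    Consider a balanced one-way classification. For data records ordered $r$ repeats within $b$ treatments regarded as random effects, the linear mixed model is $y = X\tau + Zu + e$, where $X = 1_b \otimes 1_r$ is the design matrix for $\tau$ (the overall mean), $Z = I_b \otimes 1_r$ is the design matrix for the $b$ (random) treatment effects $u_i$, and $e$ is the error vector. Assuming that the treatment effects are random implies that $u \sim N(A\psi, \sigma_b^2 I_b)$ for some design matrix $A$ and parameter vector $\psi$. It can be shown that

    $\tilde u = \dfrac{r\sigma_b^2}{r\sigma_b^2 + \sigma^2}\,(\bar y - 1\bar y_{..}) + \dfrac{\sigma^2}{r\sigma_b^2 + \sigma^2}\,A\psi$   (2.12)

    where $\bar y$ is the vector of treatment means and $\bar y_{..}$ is the grand mean. The differences of the treatment means and the grand mean are the estimates of treatment effects if treatment effects are fixed. The BLUP is therefore a weighted mean of the data based estimate and the 'prior' mean $A\psi$. If $\psi = 0$, the BLUP in (2.12) becomes

    $\tilde u = \dfrac{r\sigma_b^2}{r\sigma_b^2 + \sigma^2}\,(\bar y - 1\bar y_{..})$   (2.13)

    and the BLUP is a so-called shrinkage estimate. As $r\sigma_b^2$ becomes large relative to $\sigma^2$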
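The shrinkage form (2.13) is easy to illustrate numerically. The sketch below assumes a balanced layout, a zero prior mean, and known variance parameters (in practice these would be replaced by REML estimates). The function name `blup_one_way` and the toy data are illustrative only.

```python
def blup_one_way(groups, sigma2_b, sigma2):
    """Shrinkage BLUPs of treatment effects for a balanced one-way layout.

    groups   : list of equal-length lists, one list of records per treatment
    sigma2_b : between-treatment variance (assumed known here)
    sigma2   : residual variance (assumed known here)
    """
    r = len(groups[0])
    if any(len(g) != r for g in groups):
        raise ValueError("the layout must be balanced")
    means = [sum(g) / r for g in groups]
    grand_mean = sum(means) / len(means)
    # Weight given to the data-based estimate; the remainder goes to the prior mean (zero).
    shrink = r * sigma2_b / (r * sigma2_b + sigma2)
    return [shrink * (m - grand_mean) for m in means], shrink

# Three treatments, two records each (toy data).
groups = [[10.0, 12.0], [14.0, 16.0], [18.0, 20.0]]
blups, shrink = blup_one_way(groups, sigma2_b=4.0, sigma2=8.0)
print("shrinkage factor:", shrink)
print("BLUPs:", blups)
```

With these toy values the fixed-effect estimates (treatment mean minus grand mean) are halved; increasing `sigma2_b` or the number of repeats moves the factor towards 1, so the BLUPs approach the fixed-effect estimates.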
12. ASReml syntax, with the G structure (term: model) and R structure (term: model) it implies:

    yield ~ mu variety repl
       R structure   error: ID

    yield ~ mu variety !r repl
       G structure   repl: IDV
       R structure   error: ID

    yield ~ mu variety !r repl
    0 0 1
    repl 1
    4 0 IDV 0.1
       G structure   repl: IDV
       R structure   error: ID

    yield ~ mu variety !f mv
    1 2 0
    11 column ID
    22 row AR1 0.3
       R structure   column: ID, row: AR1

    yield ~ mu variety !f mv
    1 2 0
    11 column AR1 0.3
    22 row AR1 0.3
       R structure   column: AR1, row: AR1

    yield ~ mu variety !r units !f mv
    1 2 0
    11 column AR1 0.3
    22 row AR1 0.3
       G structure   units: IDV
       R structure   column: AR1, row: AR1

    yield ~ mu variety !r repl !f mv
    1 2 1
    11 column AR1 0.3
    22 row AR1 0.3
    repl 1
    4 0 IDV 0.1
       G structure   repl: IDV
       R structure   column: AR1, row: AR1

    yield ~ mu variety !r column.row
    0 0 1
    column.row 2
    column 0 AR1 0.3
    row 0 AR1V 0.3 0.1
       G structure   column.row: AR1, AR1V
       R structure   error: ID

    7.4 Variance structures

    The previous sections have introduced variance modelling in ASReml using the NIN data for demonstration. In this and the remaining sections the syntax is described formally. However, where appropriate we continue to reference the example.

    General syntax

    Variance model specification in ASReml has the following general form:

    [variance header line
    [R structure definition lines]
    [G structure header and definition lines]
    [variance parameter constraints]]

    • variance header line specifies the number of R and G structures,
    • R structure definition lines define the R
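The separable spatial specifications in this excerpt (for example `11 column AR1 0.3` with `22 row AR1 0.3`) describe a residual correlation matrix formed as a direct (Kronecker) product of two first order autoregressive correlation matrices. The sketch below shows that construction in plain Python. It assumes the records are ordered rows within columns, and the helper names `ar1` and `kron` are illustrative, not ASReml functions.

```python
def ar1(n, rho):
    """AR1 correlation matrix of order n: element (i, j) is rho**|i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

def kron(a, b):
    """Kronecker (direct) product of two matrices given as lists of lists."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]

# 11 columns and 22 rows, both with starting correlation 0.3 as in the excerpt.
col_corr = ar1(11, 0.3)
row_corr = ar1(22, 0.3)
r_mat = kron(col_corr, row_corr)   # 242 x 242 correlation matrix

print(len(r_mat), len(r_mat[0]))
print(r_mat[0][1], r_mat[0][22], r_mat[0][23])
```

The three printed entries are the correlations between the first plot and its neighbour in the same column, its neighbour in the same row, and its diagonal neighbour; the last is the product of the first two, which is what separability means.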
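13. F phenvar 1 + 2   # pheno var
    F genvar 1 * 4    # geno var
    H herit 4 3       # heritability

    ... .pin file beginning with an H. The specific form of the directive in this case is

    H label n d

    This calculates $\sigma_n^2/\sigma_d^2$ and $\mathrm{se}[\sigma_n^2/\sigma_d^2]$, where n and d are integers pointing to components $v_n$ and $v_d$ that are to be used as the numerator and denominator respectively in the heritability calculation.

    $\mathrm{var}\!\left(\dfrac{\sigma_n^2}{\sigma_d^2}\right) = \left(\dfrac{\sigma_n^2}{\sigma_d^2}\right)^{2} \left( \dfrac{\mathrm{var}(\sigma_n^2)}{\sigma_n^4} + \dfrac{\mathrm{var}(\sigma_d^2)}{\sigma_d^4} - \dfrac{2\,\mathrm{cov}(\sigma_n^2, \sigma_d^2)}{\sigma_n^2 \sigma_d^2} \right)$

    In the example, H herit 4 3 calculates the heritability by calculating component 4 (from second line of .pin) / component 3 (from first line of .pin), that is, genetic variance / phenotypic variance.

    Correlation

    F phenvar 1:3 + 4:6
    R phencorr 7 8 9
    R gencorr 4:6

    Correlations are requested by lines in the .pin file beginning with an R. The specific form of the directive is

    R label a ab b

    This calculates the correlation $r = \sigma_{ab}/\sqrt{\sigma_a^2 \sigma_b^2}$ and the associated standard error. a, b and ab are integers indicating the position of the components to be used. Alternatively,

    R label a:n

    calculates the correlation $r = \sigma_{ab}/\sqrt{\sigma_a^2 \sigma_b^2}$ for all correlations in the lower triangular row-wise matrix represented by components a to n, and the associated standard errors.

    $\mathrm{var}(r) = r^2 \left[ \dfrac{\mathrm{var}(\sigma_a^2)}{4\sigma_a^4} + \dfrac{\mathrm{var}(\sigma_b^2)}{4\sigma_b^4} + \dfrac{\mathrm{var}(\sigma_{ab})}{\sigma_{ab}^2} + \dfrac{2\,\mathrm{cov}(\sigma_a^2, \sigma_b^2)}{4\sigma_a^2\sigma_b^2} - \dfrac{2\,\mathrm{cov}(\sigma_a^2, \sigma_{ab})}{2\sigma_a^2\sigma_{ab}} - \dfrac{2\,\mathrm{cov}(\sigma_{ab}, \sigma_b^2)}{2\sigma_{ab}\sigma_b^2} \right]$

    In the example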
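The standard errors for ratios and correlations described in this excerpt are first order (delta method) approximations. The following sketch evaluates them from a set of variance component estimates and their sampling (co)variances. The function names, argument names and numeric inputs are hypothetical, and the code is not ASReml's implementation; it only evaluates the two approximations.

```python
import math

def ratio_with_se(vn, vd, var_n, var_d, cov_nd):
    """Ratio vn/vd (e.g. a heritability) and its approximate standard error."""
    ratio = vn / vd
    var_ratio = ratio ** 2 * (
        var_n / vn ** 2 + var_d / vd ** 2 - 2.0 * cov_nd / (vn * vd)
    )
    return ratio, math.sqrt(var_ratio)

def corr_with_se(va, vb, cab, var_a, var_b, var_ab, cov_a_b, cov_a_ab, cov_ab_b):
    """Correlation cab/sqrt(va*vb) and its approximate standard error."""
    r = cab / math.sqrt(va * vb)
    var_r = r ** 2 * (
        var_a / (4.0 * va ** 2)
        + var_b / (4.0 * vb ** 2)
        + var_ab / cab ** 2
        + 2.0 * cov_a_b / (4.0 * va * vb)
        - 2.0 * cov_a_ab / (2.0 * va * cab)
        - 2.0 * cov_ab_b / (2.0 * cab * vb)
    )
    return r, math.sqrt(var_r)

# Made-up inputs purely to exercise the functions.
h2, h2_se = ratio_with_se(vn=2.0, vd=8.0, var_n=0.25, var_d=1.0, cov_nd=0.1)
r, r_se = corr_with_se(4.0, 9.0, 3.0, 0.0, 0.0, 0.36, 0.0, 0.0, 0.0)
print(h2, h2_se)
print(r, r_se)
```

Because the approximation is linear, it can behave poorly when the denominator component is small relative to its standard error.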
14. F phenvar 1 + 2   # pheno var
    F genvar 1 * 4    # geno var
    H herit 4 3       # heritability

    components are either simple variances or are variances and covariances in an unstructured matrix. The functions covered are linear combinations of the variance components (for example, phenotypic variance), a ratio of two components (for example, heritabilities) and the correlation based on three components (for example, genetic correlation).

    The user must prepare a .pin file. A simple sample .pin file is shown in the ASReml code box above. The .pin file specifies the functions to be calculated. The user re-runs ASReml with the P command line option, specifying the .pin file as the input file. ASReml reads the model information from the .asr and .vvp files and calculates the requested functions. These are reported in the .pvc file.

    11.2 Syntax

    Functions of the variance components are specified in the .pin file in lines of the form

    letter label coefficients

    • letter (either F, H or R) must occur in column 1:
      F is for linear combinations of variance components,
      H is for forming the ratio of two components,
      R is for forming the correlation based on three components,
    • label names the result,
    • coefficients is the list of coefficients for the linear function.

    Linear combinations of components

    First ASReml extracts the variance components from the .asr file and their variance matrix from the .vvp file. Each linear func
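A linear function of variance components and its standard error follow from the component estimates and their variance matrix. The sketch below assumes the function is c'v with variance c'Vc, which is the standard result for a linear combination. The function name and the numeric values are hypothetical and are not read from any .asr or .vvp file.

```python
import math

def linear_function(coeffs, components, vmat):
    """Value and standard error of sum_i coeffs[i] * components[i].

    vmat is the (symmetric) sampling variance matrix of the components.
    """
    n = len(components)
    value = sum(c * v for c, v in zip(coeffs, components))
    variance = sum(
        coeffs[i] * vmat[i][j] * coeffs[j] for i in range(n) for j in range(n)
    )
    return value, math.sqrt(variance)

# Two made-up components (say a sire variance and a residual variance).
components = [10.0, 30.0]
vmat = [[4.0, 1.0],
        [1.0, 9.0]]

phenvar, phenvar_se = linear_function([1.0, 1.0], components, vmat)  # like "1 + 2"
genvar, genvar_se = linear_function([4.0, 0.0], components, vmat)    # like "1 * 4"
print(phenvar, phenvar_se)
print(genvar, genvar_se)
```

The covariance between the two components enters the standard error of the sum, which is why the full variance matrix is needed and not only the individual standard errors.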
15. warning message
    Warning: term in the predict !USE list
    Warning: term is ignored for prediction
    Warning: Check if you need the !RECODE qualifier
    Warning: Code B - fixed at a boundary (!GP)
    Warning: Eigen analysis check of US matrix skipped
    WARNING: Extra lines on the end of the input file
    Warning: Fewer levels found in term
    Warning: FIELD DEFINITION lines should be INDENTED

    likely meaning
    data values should be positive.
    usually means variance model is overparameterized; the structures are probably at the boundary of the parameter space.
    either use !MVINCLUDE or delete the records.
    it is better to avoid negative weights unless you can check ASReml is doing the correct thing with them.
    check you have the intended number of fields per line.
    You have probably mis-specified the number of levels in the factor or omitted the !I qualifier (see Section 5.4 on data field definition syntax). ASReml corrects the number of levels.
    the term did not appear in the model.
    the term did not appear in the model.
    terms like units and mv cannot be included in prediction.
    !RECODE may be needed if the binary file was not prepared with ASReml.
    suggest drop the term and refit the model.
    matrix is probably OK.
    this indicates that there are some lines on the end of the .as file that were not used. The first extra line is displayed. This is only a problem if you intended ASReml to read these lines.
    ASReml increases to the correct
16. as given in the following table. <pass> is inserted when the job is repeated (!RENAME or !CYCLE) to ensure filenames are unique across repeats. <section> is inserted to distinguish files produced from different sections of data (for example from multisite spatial analysis) and <ext> indicates the file graphics format.

    <type>   file contents
    R        marginal means of residuals from spatial analysis of a section
    V        variogram of residuals from spatial analysis for a section
    S        residuals in field plan for a section
    H        histogram of residuals for a section
    RvE      residuals plotted against expected values
    XYGi     figure produced by !X, !Y and !G qualifiers
    PV_i     predicted values plotted for PREDICT directive i

    The graphics file format is specified by following the G or H option by a number g, or by specifying the appropriate qualifier on the top job control line, as follows:

    g    qualifier   description              <ext>
    1    !HPGL       HP-GL                    pgl
         !PS         Postscript (default)     ps
    6    !BMP        BMP                      bmp
    10   !WPM        Windows Print Manager
    11   !WMF        Windows Meta File        wmf
    12   !HPGL 2     HP-GL2                   hgl
    22   !EPS        EncapsulatedPostScript   eps

    Job control command line options (C, F, O, R)

    C (CONTINUE) indicates that the job is to continue iterating from the values in the .rsv file. This is equivalent to setting !CONTINUE on the datafile line, see Table 5.4 for details.

    F (FINAL) indicates that the job is to co
17. asr 35 189 190 dbr 189 dpr 189 199 pvc 189 pvs 189 199 res 189 200 rsv 189 205 sln 37 189 193 spr 189 tab 189 207 veo 189 vrb 207 vvp 189 208 yht 38 189 195 overspecified 16 own models 132 OWN variance structure 131 IF2 132 IT 132 parameter scale 7 variance 7 Path DOPATH 187 PATH 187 PC environment 177 pedigree 149 file 150 power 129 Predict ITP 164 TURNINGPOINTS 164 TP 95 PLOT suboptions 165 PRWTS 167 predicted values 38 prediction 33 157 qualifiers 162 predictions estimable 39 prior mean 15 product direct 9 qualifier UpArrow 53 lt 53 lt 53 lt gt 53 Index 316 I 53 I gt 53 I gt 53 Ix 53 1 53 1 53 1 7 53 s 133 I 53 ABS 53 ADJUST 73 ATLOADINGS 79 ATSINGULARITIES 72 ALPHA 152 AOD Analysis of Deviance 98 ARCSIN 53 ARGS 180 ASK 180 ASMV 67 ASUV 67 1A 48 BINOMIAL distribution 98 BLUP 72 BMP 72 IBRIEF 72 180 ICINV 79 COLFAC 67 COMPLOGLOG 98 CONTINUE 64 139 180 CONTRAST 64 COS 53 ICSV 60 ICYCLE 186 DATAFILE 73 IDDF 68 DEBUG 180 IDEC 164 DENSE 73 DEVIANCE residuals 99 IDF 73 IDIAG 152 DISPLAY 68 IDISP dispersion 98 DOM dominance 57 DOPART 187 DOPATH 187 ID 53 IEMFLAG 74 IEPS 68 EXP 54 EXTRA 75 FACPOINTS 80 FCON 65 FILTER 60 FINAL 1
18. Summary of reserved words, operators and functions

    model term   brief description
    and(t,r)     adds r times the design matrix for model term t to the previous design matrix; r has a default value of 1. If t is complex, it may be necessary to predefine it by saying t and(t,r).
    c(f)         factor f is fitted with sum to zero constraints.
    cos(v,r)     forms cosine from v with period r.
    ge(f)        condition on factor/variable f >= r.
    giv(f,n)     associates the nth .giv G-inverse with the factor f.
    gt(f)        condition on factor/variable f > r.
    h(f)         factor f is fitted with Helmert constraints.
    ide(f)       fits pedigree factor f without relationship matrix.
    inv(v,r)     forms reciprocal of v + r.
    le(f)        condition on factor/variable f <= r.
    leg(v,n)     forms n+1 Legendre polynomials of order 0 (intercept), 1 (linear), ..., n from the values in v; the intercept polynomial is omitted if v is preceded by the negative sign.
    lt(f)        condition on factor/variable f < r.
    log(v,r)     forms natural logarithm of v + r.
    ma1(f)       constructs MA1 design matrix for factor f.
    ma1          forms an MA1 design matrix from plot numbers.
    out(n)       condition on observation n.
    out(n,t)     condition on record n, trait t.
    pol(v,n)     forms n+1 orthogonal polynomials of order 0 (intercept)

    Further model terms listed: pow(x,p,o), qtl(f,p), sin(v,r), sqrt(v,r), uni(f), uni(f,n), xfa(f,k)
19. matrix or array of plots must be present. To achieve this we augmented the data with the 18 records for the missing yields, as shown on page 30. In the augmented data file the yield data for the missing plots have all been made NA (one of the missing value indicators in ASReml) and variety has been arbitrarily coded LANCER for all of the missing plots (any of the variety names could have been used).

    • !f mv is now included in the model specification. This tells ASReml to estimate the missing values. The !f before mv indicates that the missing values are fixed effects in the sparse set of terms.

    • Unlike the case with G structures, ASReml automatically includes and estimates a scale parameter for R structures ($\sigma_e^2$ for $V = \sigma_e^2\,(I_c \otimes \Sigma_r(\rho_r))$ in this case). This is why the variance models specified for row (AR1) and column (ID) are correlation models. The user could specify a non-correlation model (diagonal elements $\ne 1$) in the R structure definition; for example, ID could be replaced by IDV to represent $V = \sigma_e^2\,(\sigma_c^2 I_c \otimes \Sigma_r(\rho_r))$. However, IDV would then need to be followed by !S2==1 to fix $\sigma_e^2$ at 1 and prevent ASReml trying (unsuccessfully) to estimate both parameters as they are confounded: the scale parameter associated with IDV and the implicit error variance parameter, see Section 2.1 under Combi

    (Margin references: see Chapter 13; see Sections 6.3 and 6.11; see Sections 2.1 and 7.5; see Section 7.7.)
20. P - positive definite, C - Constrained by user (!VCC), U - unbounded, S - Singular Information matrix

    The convergence criterion has been satisfied after six iterations. A warning message is printed below the summary of the variance components because the variance component for the setstat.teststat term has been fixed near the boundary. The default constraint for variance components (!GP) is to ensure that the REML estimate remains positive. Under this constraint, if an update for any variance component results in a negative value, then ASReml sets that variance component to a small positive value. If this occurs in subsequent iterations, the parameter is fixed to a small positive value and the code B replaces P in the C column of the summary table. The default constraint can be overridden using the !GU qualifier, but it is not generally recommended for standard analyses.

    Figure 15.2 presents the residual plot, which indicates two unusual data values. These values are successive observations, namely observations 210 and 211, being testing stations 2 and 3 for setting station 9 (J), regulator 2. These observations will not be dropped from the following analyses, for consistency with other analyses conducted by Cox and Snell (1981) and in the GENSTAT manual.

    The REML log-likelihood from the model without the setstat.teststat term was 203.242, the same as the REML log-likelihood for the previous model. Table 15.3 presents a summary of the REML log-likeliho
21. !r blocks blocks.wplots

    The !FCON qualifier requests conditional F statistics. As this is a small example, denominator degrees of freedom are reported by default. An extract from the .asr file is followed by the contents of the .aov file.

     Degrees of Freedom and Stratum Variances
        5.00    3175.06    12.0   4.0   1.0
       10.00    601.331     0.0   4.0   1.0
       45.00    177.083     0.0   0.0   1.0

     Source          Model  terms  Gamma      Component  Comp/SE  % C
     blocks              6      6  1.21116    214.477    1.27     0 P
     blocks.wplots      18     18  0.598937   106.062    1.56     0 P
     Variance           72     60  1.00000    177.083    4.74     0 P

     Analysis of Variance      NumDF  DenDF_con   F_inc    F_con   M  P_con
       8 mu                        1       6.0   245.14   138.14   .  <.001
       4 variety                   2      10.0     1.49     1.49   A  0.272
       7 linNitr                   1      45.0   110.32   110.32   a  <.001
       2 nitrogen                  2      45.0     1.37     1.37   A  0.265
       9 variety.linNitr           2      45.0     0.48     0.48   b  0.625
      10 variety.nitrogen          4      45.0     0.22     0.22   B  0.928

    The analysis shows that there is a significant linear response to nitrogen level, but the lack of fit term and the interactions with variety are not significant. In this example, the conditional F statistic is the same as the incremental one because the contrast must appear before the lack of fit and the main effect before the interaction, and otherwise it is a balanced analysis.

    The first part of the .aov file, the FMAP table, only appears if the job is run in DEBUG mode. There is a line for each model term showing the number of non-singular effects in the terms before the cur
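The "Degrees of Freedom and Stratum Variances" block can be cross-checked against the variance component table: each stratum variance should equal the listed coefficients applied to the blocks, blocks.wplots and residual components. The sketch below performs that check with the values quoted in the excerpt. It assumes the three trailing numbers on each stratum line are those coefficients (the last coefficient of the third line is garbled in the extract and is taken to be 1.0). The names are illustrative.

```python
# Variance components from the "Source ... Component" table in the excerpt.
components = (214.477, 106.062, 177.083)   # blocks, blocks.wplots, residual

# (degrees of freedom, reported stratum variance, coefficients of the components)
strata = [
    (5, 3175.06, (12.0, 4.0, 1.0)),
    (10, 601.331, (0.0, 4.0, 1.0)),
    (45, 177.083, (0.0, 0.0, 1.0)),   # last coefficient assumed to be 1.0
]

def stratum_variance(coefs, comps):
    """Linear combination of variance components for one stratum."""
    return sum(c * v for c, v in zip(coefs, comps))

rebuilt = [stratum_variance(coefs, components) for _, _, coefs in strata]
for (df, reported, _), value in zip(strata, rebuilt):
    print(f"{df:3d} df  reported {reported:10.3f}  rebuilt {value:10.3f}")
```

The rebuilt values agree with the reported stratum variances to the precision printed, which supports reading the trailing columns as component coefficients.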
22. identifier   description                        algebraic form                                             number of parameters †
    FAk          k factor analytic                  $C = FF' + E$; $F$ contains k correlation factors,          $k\omega + \omega$
                                                    $E$ diagonal, $DD = \mathrm{diag}(\Sigma)$
    FACV1        1st order factor analytic,         $\Sigma = \Gamma\Gamma' + \Psi$                             $\omega + \omega$
                 covariance form
    FACVk        k factor analytic,                 $\Gamma$ contains covariance factors,                       $k\omega + \omega$
                 covariance form                    $\Psi$ contains specific variance
    XFA1         1st order extended factor          $\Sigma = \Gamma\Gamma' + \Psi$                             $\omega + \omega$
                 analytic, covariance form
    XFAk         k extended factor analytic,        $\Gamma$ contains covariance factors,                       $k\omega + \omega$
                 covariance form                    $\Psi$ contains specific variance

    Inverse relationship matrices ‡
    AINV         inverse relationship matrix derived from pedigree                                             0   1
    GIV1         generalized inverse number 1                                                                  0   1
    ...
    GIV6         generalized inverse number 6                                                                  0   1

    † This is the number of values the user must supply as initial values, where $\omega$ is the dimension of the matrix. The homogeneous variance form is specified by appending V to the correlation basename; the heterogeneous variance form is specified by appending H to the correlation basename.
    ‡ These must be associated with 1 variance parameter unless used in direct product with another structure which provides the variance.

    Forming variance models from correlation models

    The base identifiers presented in the first part of Table 7.3 are used to specify the correlation models. The corresponding homogeneous and heterogeneous variance models are specified by appending V and H to the base identifiers respectively. This convention holds for most models. However, no V or H should be appended to the base identifiers for the heter
23. ... $1, \ldots, n$. The quadratic form is said to be nonnegative definite if $x'Ax \ge 0$ for all $x \in R^n$. If $x'Ax$ is nonnegative definite and, in addition, the null vector 0 is the only value of $x$ for which $x'Ax = 0$, then the quadratic form is said to be positive definite. Hence the matrix $A$ is said to be positive definite if $x'Ax$ is positive definite, see Harville (1997), pp 211.

    7.2 Variance model specification in ASReml

    NIN Alliance Trial 1989
     variety !A
      ...
     column 11
    nin89.asd !skip 1
    yield ~ mu variety !r repl
    0 0 1
    repl 1
    repl 0 IDV 0.1

    The variance models are specified in the ASReml command file after the model line, as shown in the code box. In this case just one variance model is specified, for replicates (see model 2b below for details). predict and tabulate lines may appear after the model line and before the first variance structure line. These are described in Chapter 10.

    Table 7.3 presents the full range of variance models available in ASReml. The identifiers for specifying the individual variance models in the command file are described in Section 7.5 under Specifying variance models in ASReml. Many of the models are correlation models. However, these are generalized to homogeneous variance models by appending V to the base identifier. They are generalized to heterogeneous variance models by appending H to the base identifier.

    7.3 A sequ
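The definition of positive definiteness at the start of this excerpt can be tested numerically for a symmetric matrix by attempting a Cholesky factorisation, which succeeds with strictly positive pivots exactly when the matrix is positive definite. This is a minimal plain-Python sketch, assuming a symmetric input and ignoring floating-point tolerance issues for near-singular matrices. The function name is illustrative and has nothing to do with ASReml's internal checks.

```python
import math

def is_positive_definite(a):
    """Return True if the symmetric matrix a (list of lists) is positive definite.

    Attempts a Cholesky factorisation a = L L'; a non-positive pivot means
    some non-null x gives x'Ax <= 0, so the matrix is not positive definite.
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = a[i][i] - s
                if pivot <= 0.0:
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return True

print(is_positive_definite([[2.0, 1.0], [1.0, 2.0]]))   # a valid variance matrix
print(is_positive_definite([[1.0, 1.0], [1.0, 1.0]]))   # nonnegative definite only
print(is_positive_definite([[1.0, 2.0], [2.0, 1.0]]))   # indefinite
```

The second matrix illustrates the distinction drawn in the text: x'Ax is never negative, but it is zero for x = (1, -1)', so the matrix is nonnegative definite without being positive definite.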
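24. Details of the variance models available in ASReml

    base         description                algebraic form                                                  number of parameters
    identifier                                                                                              corr / homogeneous variance / heterogeneous variance

    …            …                          …, $|\cdot| < 1$                                                …
    ARMA         autoregressive             $C_{ii} = 1$, …, $|\theta| < 1$, $|\phi| < 1$                    2 / 3 / $2 + \omega$
                 moving average
    CORU         uniform correlation        $C_{ii} = 1$, $C_{ij} = \phi$, $i \ne j$                         1 / 2 / $1 + \omega$
    CORB         banded correlation         $C_{ii} = 1$, $C_{i+j,i} = \phi_j$,                              $\omega - 1$ / $\omega$ / $2\omega - 1$
                                            $1 \le j \le \omega - 1$, $|\phi_j| < 1$
    CORG         general correlation        $C_{ii} = 1$, $C_{ij} = \phi_{ij}$, $i \ne j$,                   $\omega(\omega-1)/2$ / $\omega(\omega-1)/2 + 1$ / $\omega(\omega-1)/2 + \omega$
                 (CORGH = US)               $|\phi_{ij}| < 1$

    One-dimensional unequally spaced
    EXP          exponential                $C_{ii} = 1$, $C_{ij} = \phi^{|x_i - x_j|}$, $i \ne j$;          1 / 2 / $1 + \omega$
                                            $x_i$ are coordinates, $0 < \phi < 1$
    GAU          gaussian                   $C_{ii} = 1$, $C_{ij} = \phi^{(x_i - x_j)^2}$, $i \ne j$;        1 / 2 / $1 + \omega$
                                            $x_i$ are coordinates, $0 < \phi < 1$

    Two-dimensional irregularly spaced ($x$ and $y$ vectors of coordinates; $\theta_{ij} = \min(d_{ij}/\phi_1, 1)$; $d_{ij}$ is euclidean distance)
    IEXP         isotropic exponential      $C_{ii} = 1$, $C_{ij} = \phi^{|x_i - x_j| + |y_i - y_j|}$,       1 / 2 / $1 + \omega$
                                            $i \ne j$, $0 < \phi < 1$
    IGAU         isotropic gaussian         $C_{ii} = 1$, $C_{ij} = \phi^{(x_i - x_j)^2 + (y_i - y_j)^2}$,   1 / 2 / $1 + \omega$
                                            $i \ne j$, $0 < \phi < 1$
    IEUC         isotropic euclidean        $C_{ii} = 1$, $C_{ij} = \phi^{\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}}$,   1 / 2 / $1 + \omega$
                                            $i \ne j$, $0 < \phi < 1$
    LVR          linear variance            $C_{ij} = 1 - \theta_{ij}$, $0 < \phi_1$                         1 / 2 / $1 + \omega$
    SPH          spherical                  $C_{ij} = 1 - \tfrac{3}{2}\theta_{ij} + \tfrac{1}{2}\theta_{ij}^3$, $0 < \phi_1$   1 / 2 / $1 + \omega$
    CIR          circular, Webster &        $C_{ij} = 1 - \tfrac{2}{\pi}\left(\theta_{ij}\sqrt{1 - \theta_{ij}^2} + \sin^{-1}\theta_{ij}\right)$, $0 < \phi_1$   1 / 2 / $1 + \omega$
                 Oliver (2001)
    AEXP         anisotropic exponential    $C_{ii} = 1$, $C$                                                2 / 3 / $2 + \omega$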
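To make the algebraic forms in this part of Table 7.3 concrete, the sketch below evaluates a few of the distance-based correlation functions. The formulas are my reading of the garbled table entries (EXP and GAU as powers of phi, and LVR, SPH and CIR as functions of theta = min(d / phi1, 1)) and should be treated as assumptions to be checked against the printed table. The function names are illustrative and are not ASReml identifiers.

```python
import math

def exp_corr(phi, xi, xj):
    """EXP: correlation phi**|xi - xj| for one-dimensional coordinates."""
    return phi ** abs(xi - xj)

def gau_corr(phi, xi, xj):
    """GAU: correlation phi**((xi - xj)**2)."""
    return phi ** ((xi - xj) ** 2)

def theta(d, phi1):
    """Scaled distance used by LVR, SPH and CIR: min(d / phi1, 1)."""
    return min(d / phi1, 1.0)

def lvr_corr(t):
    """LVR (linear variance): 1 - theta."""
    return 1.0 - t

def sph_corr(t):
    """SPH (spherical): 1 - 1.5*theta + 0.5*theta**3."""
    return 1.0 - 1.5 * t + 0.5 * t ** 3

def cir_corr(t):
    """CIR (circular): 1 - (2/pi) * (theta*sqrt(1 - theta**2) + asin(theta))."""
    return 1.0 - (2.0 / math.pi) * (t * math.sqrt(1.0 - t * t) + math.asin(t))

# Correlation between two points 2 units apart on a line, phi = 0.5.
print(exp_corr(0.5, 0.0, 2.0), gau_corr(0.5, 0.0, 2.0))
# Two-dimensional models at euclidean distance 1 with range parameter phi1 = 4.
t = theta(1.0, 4.0)
print(lvr_corr(t), sph_corr(t), cir_corr(t))
```

For LVR, SPH and CIR the correlation is exactly zero once the distance reaches phi1, which is why theta is capped at 1.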
25. 1 (linear), ..., n from the values in v; the intercept polynomial is omitted if n is preceded by the negative sign.

pow(x,p,o): defines the covariable $(x+o)^p$ for use in the model, where x is a variable in the data, p is a power and o is an offset.
qtl: imputes a covariable from marker map information at position p.
sin(v,r): forms sine from v with period r.
sqrt(v,r): forms square root of v + r.
uni(f): forms a factor with a level for each record where factor f is non-zero.
uni(f,n): forms a factor with a level for each record where factor f has level n.
xfa(f,k): is formally a copy of factor f with k extra levels. This is used when fitting extended factor analytic models (XFA, Table 7.3) of order k.

6 Command file: Specifying the terms in the mixed model 88

Examples

ASReml code                                action
yield ~ mu variety                         fits a model with a constant and fixed variety effects
yield ~ mu variety !r block                fits a model with a constant term, fixed variety effects and random block effects
yield ~ mu time variety time.variety       fits a saturated model with fixed time and variety main effects and time by variety interaction effects
livewt ~ mu breed sex breed.sex !r sire    fits a model with fixed breed, sex and breed by sex interaction effects and random sire effects

6.3 Fixed terms in the model

Primary fixed terms

The fixed list in the model formula
• describes the fixed covariates, factors and interactions including special functions to be i
26. 77 SLOW 81 I SMX 77 I SORT 153 I SPATIAL 77 ISPLINE 71 I SQRT link 98 Index 318 ISTEP 71 SUBSET 71 SUB 55 SUM 66 TABFORM 77 TOLERANCE 81 TOTAL 98 99 TWOSTAGEWEIGHTS 164 TWOWAY 78 TXTFORM 77 UNIFORM 55 VCC 78 VGSECTORS 78 VPV 165 VRB 81 IV 56 IWMF 78 WORKSPACE 180 WORK residuals 99 1X 66 YHTFORM 78 LYSS 73 78 YVAR 180 1Y 66 ITDIFF 164 qualifiers datafile line 60 genetic 149 job control 63 variance model 133 R structure 106 definition 118 definition lines 115 random effects 7 correlated 15 regressions model 11 terms multivariate 143 random regressions 136 random terms 83 89 RCB 31 analysis 107 design 28 reading the data 31 47 REML i 2 11 17 REMLRT 17 repeated measures 2 253 reserved terms 85 Trait 85 95 a t r 91 and t r 86 91 at 92 at f n 85 92 cos v 7 86 92 fac v y 85 92 fac v 85 92 e f n 92 giv f n 86 92 h 93 i f 93 ide f 86 93 inv v 7 86 93 1 f 93 leg v n 86 93 lin f 85 93 log v r 86 93 mai f 86 93 mai 86 93 mu 85 93 mv 85 94 out 94 p v n 94 pol v n 87 94 pow z p o 95 qt1 95 s v k 95 sin v 7 87 95 spl v k 85 95 Index 319 sqrt v r 87 95 uni f k 96 uni f n 87 uni f 87 units 85 95 xfa f k 87 96 reserved words AEXP 123 AGAU 12
27. F_inc 98 95 116 72 59 78 4 90 0 103 0 318 0 0027 0 0327 0 9974 0 0633 0 0039 0 010 0 444 0 0087 0 0700 0 3098 0 2612 0 9115 for at Tr 1 dam res file is reported an eigen analysis of these four variance structures 15 Examples 300 Eigen values 4 382 0 010 0 025 Percentage 100 352 0 225 0 577 1 0 7041 0 2321 0 6711 2 0 7081 0 1585 0 6881 3 0 0533 0 9597 0 2760 Eigen Analysis of UnStructured matrix for at Tr 1 1lit Eigen values 4 795 1 827 0 482 0 016 Percentage 67 345 25 664 6 769 0 221 1 0 7752 0 5928 0 2178 0 0133 2 0 6159 0 6328 0 4691 0 0106 3 0 0016 0 0340 0 0255 0 9991 4 0 1403 0 4969 0 8555 0 0390 The REML estimates of all the variance matrices except for the dam components are positive definite Heritabilities for each trait can be calculated using the pin file facility of ASReml The heritability is given by _ oA 2 h 2 P where o2 is the phenotypic variance and is given by 2 2 2 2 2 op 0 04 0 0 recalling that i L O TOA 4 2 _ 12 2 Og eA Om In the half sib analysis we only use the estimate of additive genetic variance from the sire variance component The ASReml pin file is presented below along with the output from the following command asreml p mt3 24 29 39 44 45 50 30 33 51 54 34 38 4 24 29 55 60 61 64 65 69 70 84 85 90 phenWYG 9 14 defines phenD 15 18 phenF 19 23 Direct 24 38 Maternal 39 44 WWTh2 70 5
28. Factor analytic models are discussed in Chapter 7. There are three forms: FAk, FACVk and XFAk, where k is the number of factors. The XFAk form is a sparse formulation that requires an extra k levels to be inserted into the mixed model equations for the k factors. This is achieved by the xfa(f,k) model function, which defines a design matrix based on the design matrix for f augmented with k columns of zeros for the k factors.

6.7 Weights

Weighted analyses are achieved by using !WT weight as a qualifier to the response variable. An example of this is

    y !WT wt ~ mu A X

where y is the name of the response variable and wt is the name of a variate in the data containing weights. If these are relative weights (to be scaled by the units variance) then this is all that is required. If they are absolute weights, that is, the reciprocal of known variances, use the !S2==1 qualifier described in Table 7.4 to fix the unit variance. When a structure is present in the residuals the weights are applied as a matrix product. If $\Sigma$ is the structure and W is the diagonal matrix constructed from the square root of the values of the variate weight, then $R^{-1} = W \Sigma^{-1} W$. Negative weights are treated as zeros.

6.8 Generalized Linear Models

ASReml includes facilities for fitting the family of Generalized Linear Models (GLMs; McCullagh and Nelder, 1994). GLMs are specified by qualifiers after the name of the dependent variable but before the ~ character. Table 6
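The matrix product used for weights with a structured residual can be illustrated outside ASReml. The sketch below is only an illustration under stated assumptions: it takes the inverse of the residual structure as given (a list of lists), builds W from the square roots of the weights, and treats negative weights as zeros, as described above. The function name `weighted_precision` is hypothetical.

```python
import math


def weighted_precision(sigma_inv, weights):
    """Return W * sigma_inv * W, where W = diag(sqrt(weight)).

    `sigma_inv` is the inverse of the residual structure; negative
    weights are replaced by zero before taking square roots.
    """
    root_w = [math.sqrt(max(w, 0.0)) for w in weights]
    n = len(weights)
    return [[root_w[i] * sigma_inv[i][j] * root_w[j] for j in range(n)]
            for i in range(n)]
```

With an identity structure the result reduces to a diagonal matrix holding the weights themselves, which is the ordinary weighted analysis.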
29. M. M., Fox, K. M., Warren, G. N., Cullis, B. R., Coombes, N. E. and Lewin, L. G. (1999). An image analysis technique for assessing resistance in rice cultivars to root-feeding chironomid midge larvae (Diptera: Chironomidae), Field Crops Research 66, 25-26.

Stroup, W. W., Baenziger, P. S. and Mulitze, D. K. (1994). Removing spatial variation from wheat yield trials: a comparison of methods, Crop Science 86, 62-66.

Thompson, R., Cullis, B. R., Smith, A. and Gilmour, A. R. (2003). A sparse implementation of the average information algorithm for factor analytic and reduced rank variance models, Australian and New Zealand Journal of Statistics 45, 445-459.

Bibliography 311

Verbyla, A. P., Cullis, B. R., Kenward, M. G. and Welham, S. J. (1999). The analysis of designed experiments and longitudinal data by using smoothing splines (with discussion), Applied Statistics 48, 269-311.

Waddington, D., Welham, S. J., Gilmour, A. R. and Thompson, R. (1994). Comparisons of some GLMM estimators for a simple binomial model, Genstat Newsletter 30, 13-24.

Webster, R. and Oliver, M. A. (2001). Geostatistics for Environmental Scientists, John Wiley and Sons, Chichester.

Welham, S. J. (1993). Genstat 5 Procedure Library manual, in R. W. Payne, G. M. Arnold and G. W. Morgan (eds), release 2[3] edn, Numerical Algorithms Group, Oxford.

Welham, S. J. (2005). Glmm fits a generalized linear mixed model, in R. Payne and P. Lane (eds), GenStat Reference Ma
30. !PRESENT qualifier on the predict line when !PRWTS is specified. The order of factors in the tables of weights must correspond to the order in the !PRESENT list, with later factors nested within preceding factors. Check the output to ensure that the values in the tables of weights are applied in the correct order. ASReml may transpose the table of weights to match the order it needs for processing.

10 Tabulation of the data and prediction from the model 168

Consider a rather complicated example from a rotation experiment conducted over several years. This particular analysis followed the analysis of the daily live weight gain per hectare of the sheep grazing the plots. There were periods when no sheep grazed. Different flocks grazed in the different years. Daily liveweight gain was assessed between 5 and 8 times in the various years. To obtain a measure of total productivity in terms of sheep liveweight, we need to weight the daily per sheep figures by the number of sheep grazing days per month. To obtain treatment effects for each year, the experimenter used

    predict year 1 crop 1 pasture lime !AVE month 56 55 56 53 57 63 6*0
    predict year 2 crop 1 pasture lime !AVE month 36 0 0 53 23 24 54 54 43 35 0 0
    predict year 3 crop 1 pasture lime !AVE month 70 0 21 17 0 0 0 70 0 0 53 0
    predict year 4 crop 1 pasture lime !AVE month 53 56 22 92 19 44 0 0 36 0 0 49
    predict year 5 crop 1 pasture lime !AVE month 0 22 0 53 70 22 0 51 16 51 0 0

but then wanted to av
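The effect of supplying a weight per month when averaging predictions can be sketched as a plain weighted mean. This is an illustration only, assuming that the weights are relative (rescaled to sum to one) and that months with weight zero simply drop out; it does not reproduce ASReml's internal computation. The weights shown are the twelve values from the year 2 predict line above, while the monthly predictions are made-up placeholder numbers.

```python
def weighted_average(predictions, weights):
    """Weighted mean of `predictions`, with `weights` rescaled to sum to one."""
    if len(predictions) != len(weights):
        raise ValueError("one weight is needed per prediction")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(p * w for p, w in zip(predictions, weights)) / total


# Sheep grazing days per month, taken from the year 2 predict line.
year2_weights = [36, 0, 0, 53, 23, 24, 54, 54, 43, 35, 0, 0]
# Hypothetical monthly predictions of daily gain (placeholder values).
monthly_gain = [2.0] * 12
year2_mean = weighted_average(monthly_gain, year2_weights)
```

Months in which no sheep grazed carry zero weight and therefore do not influence the yearly figure.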
31. Q 0 Q l i 2 1507 2581 3084 1350 0841 7044 0 O 0 2966 0136 2028 1227 1115 2703 5726 0048 6333 5168 5285 1251 wre Oo OC Oo OO OO OO oO OO 2 2 6 0396 0624 0716 OTA 0402 1025 slit 20 1810 3513 3247 3868 2724 lt 2022 2653 sor TO TESI 1561 7985 15 Examples 302 Animal model In this section we will illustrate the use of ped files to define the genetic pedigree structure between animals This is an alternate method of estimating additive genetic variance for these data The variance matrix of the animals sires dams and lambs for which we only have data on lambs is given by var ua 54 8 A where A is the inverse of the genetic relationship matrix There are a total of 10696 92 3561 7043 animals in the pedigree The ASReml input file is presented below Note that this model is not equivalent to the sire dam litter model with respect to the animal litter components for gfw fd and fat Multivariate Animal model tag IP sire dam P grp 49 sex brr 4 litter 4871 age wwt mO ywt mO MO identifies missing values giw mO fdm mO fat mO coop fmt read pedigree from first three fields coop fmt DOPATH 1 CONTINUE MAXIT 20 STEP 0 01 1 allows selection of PATH as a command line argument PATH 3 EXTRA 4 Force 4 more iterations after convergence criterion met PATH wwt ywt gfw fdm fat Trait Tr age Tr brr Tr sex Tr age sex lr Tr tag
32. and B is

$$A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1p}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mp}B \end{bmatrix}$$

Direct products in R structures

Consider a vector of common errors associated with an experiment. The usual least squares assumption (and the default in ASReml) is that these are independently and identically distributed (IID). However, if e was from a field experiment laid out in a rectangular array of r rows by c columns, we could arrange the residuals as a matrix and might consider that they were autocorrelated within rows and columns. Writing the residuals as a vector in field order, that is, by sorting the residuals rows within columns (plots within blocks), the variance of the residuals might then be

$$\sigma^2_e \, \Sigma_c(\rho_c) \otimes \Sigma_r(\rho_r)$$

where $\Sigma_r(\rho_r)$ and $\Sigma_c(\rho_c)$ are correlation matrices for the row model (order r, autocorrelation parameter $\rho_r$) and column model (order c, autocorrelation parameter $\rho_c$) respectively.

2 Some theory 8

More specifically, a two-dimensional separable autoregressive spatial structure (AR1 $\otimes$ AR1) is sometimes assumed for the common errors in a field trial analysis (see Gogel (1997) and Cullis et al. (1998) for examples). In this case

$$\Sigma_r = \begin{bmatrix} 1 & & & & \\ \rho_r & 1 & & & \\ \rho_r^2 & \rho_r & 1 & & \\ \vdots & \vdots & \ddots & \ddots & \\ \rho_r^{r-1} & \rho_r^{r-2} & \cdots & \rho_r & 1 \end{bmatrix} \quad \text{and} \quad \Sigma_c = \begin{bmatrix} 1 & & & & \\ \rho_c & 1 & & & \\ \rho_c^2 & \rho_c & 1 & & \\ \vdots & \vdots & \ddots & \ddots & \\ \rho_c^{c-1} & \rho_c^{c-2} & \cdots & \rho_c & 1 \end{bmatrix}$$

(both matrices are symmetric; only the lower triangles are shown).

Alternatively, the residuals might relate to a multivariate analysis with t traits and n units and be ordered traits within units (see Chapter 8 for further details). In this case an appropriate variance structure might be $I_n \otimes \Sigma$ where $\Sigma$ i
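The separable AR1 by AR1 residual variance can be built explicitly for a small layout. The sketch below is illustrative and not ASReml code: it assumes residuals sorted rows within columns, so that the column correlation matrix is the left factor of the direct product, and it uses plain nested lists rather than any matrix library. The function names are hypothetical.

```python
def ar1(order, rho):
    """AR1 correlation matrix of the given order: element (i, j) is rho^|i-j|."""
    return [[rho ** abs(i - j) for j in range(order)] for i in range(order)]


def kron(a, b):
    """Direct (Kronecker) product of matrices a and b stored as lists of lists."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]


def separable_ar1_variance(sigma2, n_rows, n_cols, rho_r, rho_c):
    """sigma2 * (Sigma_c kron Sigma_r) for residuals ordered rows within columns."""
    product = kron(ar1(n_cols, rho_c), ar1(n_rows, rho_r))
    return [[sigma2 * value for value in row] for row in product]


# A 2-row by 2-column layout gives a 4 x 4 variance matrix.
v = separable_ar1_variance(1.0, n_rows=2, n_cols=2, rho_r=0.5, rho_c=0.2)
```

In this small case the correlation between plots in adjacent rows of the same column is 0.5, between adjacent columns of the same row is 0.2, and between diagonal neighbours is the product 0.1.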
33. indication that a linear drift from column 1 to column 10 is present. We include a linear regression coefficient pol(column,-1) in the model to account for this. Note we use the '-1' option in the pol term to exclude the overall constant in the regression, as it is already fitted. The linear regression of column number on yield is significant (t = 2.96). The sample variogram (Figure 15.7) is more satisfactory, though interpretation of variograms is often difficult, particularly for unreplicated trials. This is an issue for further research. The abbreviated output for this model and the final model, in which a nugget effect has been included, is

[Figure 15.6: Sample variogram of the residuals from the AR1xAR1 model for the Tullibigeal data. Axes: outer displacement, inner displacement.]

[Figure 15.7: Sample variogram of the residuals from the AR1xAR1 + pol(column,-1) model for the Tullibigeal data. Axes: outer displacement, inner displacement.]

    AR1xAR1 + pol(column,-1)
     1 LogL=-4270.99 S2= 0.12730E+06 665 df
     2 LogL=-4258.95 S2= 0.11961E+06 665 df
     3 LogL=-4245.27 S2= 0.10545E+06 665 df
     4 LogL=-4229.50 S2= 78387. 665 df
     5 LogL=-4226.02 S2= 75375. 665 df
     6 LogL=-4225.64 S2= 77373. 665 df
     7 LogL=-4225.60 S2= 77710. 665 df
     8 LogL=-4225.60 S2= 77786. 665 df
     9 LogL=-4225.60 S2= 77806. 665 df

    Source     Model  terms  Gamma    Component  Comp/SE  % C
    variety    532    532    1.14370  88986.3    9.91     0 P
    Variance   670    665    1.00000  77806.0    8.79     0 P
    Residu
34. lt factor gt lt levi gt lt facg gt lt levg gt lt fac3 gt lt lev3 gt qualifier EQORDER o New EXTRA n New Difficult OWN f limits the order in which equations are solved in ASReml by forcing equations in the sparse partition involving the first lt lev gt equations of lt factor gt to be solved after all other equations in the sparse partition Is intended for use when there are multiple fixed terms in the sparse equations so that ASReml will be consistent in which effects are identified as singular The test example had Ir Anim Litter f HYS where genetic groups were included in the definition of Anim Consequently there were 5 singularities in Anim The default reordering allows those singularities to appear anywhere in the Anim and HYS terms Since 29 genetic groups were defined in Anim LAST Anim 29 forces the genetic group equations to be absorbed last and therefore incorporate any singularities In the more general model fitting Ir Tr Anim Tr Lit f Tr HYS without LAST the location of singularities will almost surely change if the G structures for Tr Anim or Tr Lit are changed invalidating Likelihood Ratio tests between the models supplies the name of a program supplied by the user in asso ciation with the OWN variance model page 131 5 Command file Reading the data 76 List of rarely used job control qualifiers qualifier action PRINT n PVSFORM n New R
35. ... 115
General syntax 115
Variance header line 117
R structure definition 118
G structure header and definition lines 120
7.5 Variance model description 120
Forming variance models from correlation models 126
Notes on the variance models 127
7.6 Variance structure qualifiers 133
7.7 Rules for combining variance models 134
7.8 G structures involving more than one random term 135
7.9 Constraining variance parameters 137
Parameter equality within and between variance structures 137
Constraints between and within variance models 138
7.10 Model building using the CONTINUE qualifier 139
Command file: Multivariate analysis 141
8.1 Introduction 142
Repeated measures on rats 142
Wether trial data 142
8.2 Model specification 143
8.3 Variance structures 144
Specifying multivariate variance structures in ASReml 144
8.4 The output for a multivariate analysis 145
Command file: Genetic analysis 148
9.1 Introduction 149
9.2 The command file 149
9.3 The pedigree file 150
9.4 Reading in the pedigree file 151
36. 0 390635E 01 Covariance Variance Correlation Matrix ANTE UDU 37 20 0 5946 0 3549 0 3114 0 3040 23 38 41 55 0 5968 05237 0 5112 34 83 61 89 258 9 0 8775 0 8565 44 58 T922 331 4 550 8 0 9761 43 14 76 67 320 7 533 0 541 4 Analysis of Variance DF F_inc 8 Trait 5 188 84 Comp SE 44 55 41 54 43 s19 44 40 45 x Ee l Se oS aa E qaqaqacqaqaqagaaqaa 15 Examples 259 1 tmt 9 Tr tmt 1 4 14 4 S91 The iterative sequence converged and the antedependence parameter estimates are printed columnwise by time the column of U and the element of D Le 0 0269 0 0373 D diag 0 0060 0 0079 0 0391 1 0 j0 0 0 0 6 1 S amp S 284 0 1 4911 1 0 0 0 0 1 2804 1 0 0 eFPwoonoeo 9 678 Finally the input and output files for the unstructured model are presented below The REML estimate of X from the ANTE model is used to provide starting values RSE Ee input FE RSE yi y3 y5 y7 yi0O Trait tmt Tr tmt 120 14 S2 Tr 0 US 37 20 23 38 41 55 34 83 61 89 44 58 79 22 43 14 76 67 TEREE EEn gee output EHH EER 1 LogL 160 368 2 LogL 159 027 3 LogL 158 247 4 LogL 158 040 5 LogL 158 036 Source Model Residual US UnStr Residual US UnStr Residual US UnStr Residual US UnStr Residual US UnStr Residual US UnStr Residual US UnStr S2 S2 S2 S2 258 9 331 4 320 7 PRPRPRP PR terms Fane Ne 0000 0000 0000 0000
37. 0-59 and ss is seconds (0 to 59). The separator must be present.

for a group of m variates
!G m is used when m contiguous data fields are to be treated as a factor or group of variates. For example

    longhand                   shorthand
    X1 X2 X3 X4 X5 y           X !G 5 y
    data.dat                   data.dat
    y ~ mu X1 X2 X3 X4 X5      y ~ mu X

so that the 5 variates can be referred to in the model as X by using X !G 5.

• transformations (see below).

Storage of alphabetic factor labels

Space is allocated dynamically for the storage of alphabetic factor labels, with a default allocation being 2000 labels of 16 characters long. If there are large !A factors (so that the total across all factors will exceed 2000), you must specify the anticipated size (within say 5%). If some labels are longer than 16 characters and the extra characters are significant, you must lengthen the space for each label by specifying !LL c, e.g.

    cross !A 2300 !LL 48

indicates the factor cross has about 2300 levels and needs 48 characters to hold the level names, although only the first 20 characters of the names are ever printed.

!PRUNE on a field definition line means that if fewer levels are actually present in the factor than were declared, ASReml will reduce the factor size to the actual number of levels. Use !PRUNEALL for this action to be taken on the current and subsequent factors up to (but not including) a factor with the !PRUNEOFF qualifier. The use
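The relationship between the shorthand group definition and the longhand list of variates can be sketched as a simple name expansion. This is a hypothetical helper written for illustration, assuming the group members are named by appending 1 to m to the group name as in the X1 to X5 example above; it is not how ASReml stores groups internally.

```python
def expand_group(name, m):
    """Names of the m contiguous fields covered by a group called `name`."""
    if m < 1:
        raise ValueError("a group needs at least one field")
    return [f"{name}{i}" for i in range(1, m + 1)]


def longhand_model(response, name, m):
    """Longhand model line equivalent to fitting the whole group `name`."""
    return f"{response} ~ mu " + " ".join(expand_group(name, m))
```

Calling `longhand_model("y", "X", 5)` gives the longhand model line from the table above.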
38. 100000 12 nk 242 224 8000
    Finished: 27 Jul 2005 15:41:57.153 Structure/Factor mismatch

8. A misspelt factor name in the predict statement

The final error in the job is that a factor name is misspelt in the predict statement. This is a non-fatal error. The faulty statement is simply ignored by ASReml and no .pvs file is produced (see Chapter 13). To rectify this statement, correct voriety to variety.

9. Forgetting mv in a spatial analysis

The first error message from running part 2 of the job is

    R structures imply 0 + 242 records: only 224 exist

Checking the seventh line of the output below, we see that there were 242 records read but only 224 were retained for analysis. There are three reasons records are dropped:

1. the !FILTER qualifier has been specified,
2. the !D transformation qualifier has been specified, and
3. there are missing values in the response variable and the user has not specified that they be estimated.

The last applies here, so we must change the model line to read yield ~ mu variety mv.

14 Error messages 226

    Folder: C:\data\ex\manex
     variety !A
    QUALIFIERS: !SKIP 1
    QUALIFIER: !DOPART 2 is active
    Reading nin.asd FREE FORMAT skipping 1 lines
    Univariate analysis of yield
    Using 224 records of 242 read
    Model term        Size  miss  zero  MinNon0  Mean     MaxNon0
     1 variety          56     0     0        1  28.5000       56
    11 column           11     0     0        1   6.3304       11
    12 mu                1
    11 AR=AutoReg 0.1000
    22 AR=AutoReg 0.1000
    Maybe you need to include mv in
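The mismatch behind this error is simple arithmetic: the R structure dimensions (11 columns by 22 rows) imply 242 records, while only 224 records were retained. A minimal sketch of that check follows. The function name is hypothetical and the message text merely imitates the wording quoted above; it is not produced by ASReml itself.

```python
def check_r_structure(orders, records_used):
    """Compare the record count implied by the R structure with the records kept.

    `orders` holds the order of each direct-product component (e.g. 11 and 22).
    Returns None when the counts agree, otherwise a diagnostic string.
    """
    implied = 1
    for order in orders:
        implied *= order
    if implied != records_used:
        return f"R structures imply {implied} records: only {records_used} exist"
    return None
```

With `mv` in the model the missing responses are estimated, all 242 records are retained, and the check passes.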
39. 15 10 model 15 7 trait variety Uy 1 9 u u trait run Ur l 8 U U trait pair e 1 u e In addition to the assumptions in the models for individual traits 15 9 the bi variate analysis involves the assumptions cov ty Uy Ova L44 COV Ur Uy Or 166 and cov ec e o 1132 Thus random effects and errors are correlated between traits So for example the variance matrix for the variety effects for each trait is given by 2 o o var w v t Iu Ovet ou This unstructured form for trait variety in the bivariate analysis is equiv alent to the variety main effect plus heterogeneous tmt variety interaction variance structure 15 8 in the univariate analysis Similarly the unstructured form for trait run is equivalent to the run main effect plus heterogeneous tmt run interaction variance structure The unstructured form for the errors trait pair in the bivariate analysis is equivalent to the pair plus heteroge neous error tmt pair variance in the univariate analysis This bivariate analysis is achieved in ASReml as follows noting that the tmt factor here is equivalent to traits this is for the paired data id pair 132 run 66 variety 44 A yc ye ricem asd skip 1 X syc Y sye sqrt yc sqrt ye Trait r Tr variety Tr run 122 132 52 Tr 0 US 2 21 1 1 2 427 Tr variety 2 20 US 1 401 1 1 477 4400 15 Examples 280 Erin 2 20 US 79 6 2 887 66 0 0 predict variety A portion of
40. 162 6 074 17 27 074 4 476 6 222 18 28 795 6 255 6 282 19 23 775 6 325 6 235 20 27 042 6 008 5 962 240 24 695 1 855 6 114 241 25 452 0 1475 6 158 242 22 465 4 435 6 604 13 4 Other ASReml output files The aov file This file reports details of the ANOVA calculations particularly as relating to the conditional F statistics not computed in this run In the following table relating to the incremental F statistic the columns are e model term e columns in design matrix e numerator degrees of freedom e simple F statistic e F statistic scaled by A e as defined in Kenward amp Roger e denominater degrees of freedom mu 1 1 331 8483 331 8483 variety 56 55 2 2259 25 0082 110 8419 A more useful example is obtained by adding a linear nitrogen contrast to the oats example Section 15 2 13 Description of output files 197 The basic design is six replicates of three gpiit plot analysis oat whole plots to which variety was randomised blocks and four subplots which received 4 rates of a TA nitrogen A CONTRAST qualifier defines the mee eg 2 model term linNitr as the linear covariate wplots representing ntrogen applied Fitting this be yield oats asd skip 2 5 CONTRAST linNitr nitrogen 6 latter term represents lack of fit from a linear 6 4 0 5 0 0 fore the model term nitrogen means that this response FCON yield mu variety linNitr nitrogen variety linNitr er a v ariety nitrogen
41. (1994). Generalized Linear Models, 2 edn, Chapman and Hall, London.

McCulloch, C. and Searle, S. R. (2001). Generalized Linear and Mixed Models, Wiley.

Millar, R. and Willis, T. (1999). Estimating the relative density of snapper in and around a marine reserve using a log-linear mixed-effects model, Australian and New Zealand Journal of Statistics 41, 383-394.

Nelder, J. A. and Wedderburn, R. W. M. (1972). Generalised linear models, Journal of the Royal Statistical Society, Series A 135, 370-384.

Patterson, H. D. and Nabugoomu, F. (1992). REML and the analysis of series of crop variety trials, Proceedings from the 25th International Biometric Conference, pp. 77-93.

Patterson, H. D. and Thompson, R. (1971). Recovery of interblock information when block sizes are unequal, Biometrika 58, 545-54.

Piepho, H. P., Denis, J. B. and van Eeuwijk, F. A. (1998). Mixed biadditive models, Proceedings of the 28th International Biometrics Conference.

Pinheiro, J. C. and Bates, D. M. (2000). Mixed Effects Models in S and S-PLUS, Berlin: Springer-Verlag.

R Development Core Team (2005). R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.

Robinson, G. K. (1991). That BLUP is a good thing: The estimation of random effects, Statistical Science 6, 15-51.

Rodriguez, G. and Goldman, N. (2001). Improved estimation procedures for multilevel models with binary response: A cas
42. 3 lists the link function qualifiers which relate the linear predictor ($\eta$ scale) to the observation ($\mu = E[y]$) scale. Table 6.4 lists the distributions and other qualifiers.

A second dependent variable may be specified if a bivariate analysis is required, but it will always be treated as a normal variate (no syntax is provided for specifying GLM attributes for it). The !ASUV qualifier is required in this situation for the GLM weights to be utilized.

Table 6.3: Link qualifiers and functions

Qualifier      Link                            Inverse link                      Available with
!IDENTITY      $\eta = \mu$                    $\mu = \eta$                      All
!SQRT          $\eta = \sqrt{\mu}$             $\mu = \eta^2$                    Poisson
!LOGARITHM     $\eta = \ln(\mu)$               $\mu = \exp(\eta)$                Normal, Poisson, Negative Binomial, Gamma
!INVERSE       $\eta = 1/\mu$                  $\mu = 1/\eta$                    Normal, Gamma, Negative Binomial
!LOGIT         $\eta = \ln(\mu/(1-\mu))$       $\mu = 1/(1+e^{-\eta})$           Binomial
!PROBIT        $\eta = \Phi^{-1}(\mu)$         $\mu = \Phi(\eta)$                Binomial
!COMPLOGLOG    $\eta = \ln(-\ln(1-\mu))$       $\mu = 1 - e^{-e^{\eta}}$         Binomial

where $\mu$ is the mean on the data scale and $\eta = X\tau$ is the linear predictor on the underlying scale.

Table 6.4: GLM qualifiers

qualifiers    action

Distributions
where $\mu$ is the mean on the data scale calculated from $\eta = X\tau$, n is the count specified by the !TOTAL qualifier, v is the variance expression for the distribution, d is the deviance expression for the distribution, y is the observation and $\phi$ is a parameter set with the !PHI qualifier. The default link is listed first, followed by permitted alternatives.

!NORMAL !LOGARITHM !IN
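The standard link functions and their inverses can be written down directly, which makes it easy to confirm that each pair round-trips between the data scale and the linear predictor scale. The sketch below uses the usual textbook forms of these links; the dictionary keys are labels chosen for this illustration, and nothing here calls ASReml.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the probit link

# Each entry maps a label to (link, inverse link):
# link takes the mean mu to eta, the inverse takes eta back to mu.
LINKS = {
    "IDENTITY": (lambda mu: mu, lambda eta: eta),
    "SQRT": (math.sqrt, lambda eta: eta ** 2),
    "LOGARITHM": (math.log, math.exp),
    "INVERSE": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "LOGIT": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "PROBIT": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "COMPLOGLOG": (lambda mu: math.log(-math.log(1.0 - mu)),
                   lambda eta: 1.0 - math.exp(-math.exp(eta))),
}


def round_trip(name, mu):
    """Apply the named link and then its inverse; should return mu."""
    link, inverse = LINKS[name]
    return inverse(link(mu))
```

For a mean of 0.5 the logit and probit links both give a linear predictor of zero, which is a convenient sanity check.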
43. 5 but not to include random effects The matrix is lower triangular row wise in the order that the parameters are printed in the s1n file It can be thought of as a partitioned lower triangular matrix o2 Bp oC DD 13 Description of output files 208 where 33 p is the dense portion of 8 and C is the dense portion of C This is the first 20 rows of nin89a vrb Note that the first element is the estimated error variance that is 48 6802 see the variance component estimates in the asr output 0 486802E 02 0 807551E 01 0 313123E 00 0 295404E 01 0 743616E 01 0 472519E 01 0 330076E 01 0 768275E 01 0 395693E 01 0 226478E 01 0 402503E 01 0 508553E 01 0 428826E 01 0 855241E 01 0 384055E 01 0 392097E 01 0 000000E 00 0 470711E 01 0 000000E 00 0 000000E 00 0 163302E 01 0 402696E 01 0 347471E 01 0 310018E 00 0 383429E 01 0 000000E 00 0 440539E 01 0 000000E 00 0 417864E 01 0 243687E 01 0 330171E 01 0 406762E 01 0 000000E 4 0 000000E 4 0 410031E 4 0 343331E 4 0 000000E 4 0 837281E 4 0 357605E 4 0 000000E 4 0 458492E 4 0 379286E 4 0 362391E 4 0 393626E 4 0 363341E 4 0 000000E 4 0 362019E 4 00 0O O1 O1 00 O41 EOL 00 O1 O01 EOL O1 01 FOO O1 0 801579E 4 O1 0 298660E 01 0 456648E 01 0 476546E 01 0 389620E 01 0 377176E 01 0 129013E 01 0 316915E 01 0 376637E 01 0
44. 5000 56 2 id 0 0 1 000 28 50 56 00 3 pid 0 o 1101 2628 4156 4 raw 0 O 21 00 510 5 840 0 5 repl 4 0 0 1 2 5000 4 6 nloc 0 0 4 000 4 000 4 000 7 yield Variate 0 QO 1 050 25 535 42 00 8 lat 0 0 4 300 27 22 47 30 9 long 0 O 1 200 14 08 26 40 10 row 22 0 0 11 7321 az 11 column 1i 0 0 1 6 3304 11 12 mu 1 Fault O R header SECTIONS DIMNS GSTRUCT Last line read was Ir Repl 0000 ninerr4 variety id pid raw rep nloc yield lat Model specification TERM LEVELS GAMMAS variety 56 mu 1 12 factors defined max 500 O variance parameters max1500 2 special structures Final parameter values 14 Error messages 222 Last line read was Ir Repl 0000 12 0 242 224 8000 Finished 28 Jul 2005 09 53 16 775 R header SECTIONS DIMNS GSTRUCT Inserting a comma on the end of the first line of the model to give yield mu variety Ir Repl solves that problem but produces the error message Error reading model factor list because Repl should have been spelt rep1 Portion of the output is displayed Since the model line is parsed before the data is read this run failed before reading the data ASReml 1 99a 01 Aug 2005 nin alliance trial Build d 27 Jul 2005 32 bit 28 Jul 2005 10 06 48 042 64 00 Mbyte Windows ninerrd Licensed to Arthur Gilmour Folder C data ex manex variety A QUALIFIERS SKIP 1 QUALIFIER DOPART 1 is active Reading nin asd FREE FORMAT skipping 1 lines Fault O Error reading model factor list Last line rea
45. 510 5 2 4132 4 000 25 53 25 80 13 80 11 5000 6 0000 MaxNonO 56 56 00 4156 840 0 4 4 000 42 00 47 30 26 40 22 11 13 Description of output files 192 12 mu 1 13 mv_estimates 18 22 AR AutoReg 0 5000 11 AR AutoReg 0 5000 Forming 75 equations 57 dense Initial updates will be shrunk by factor 0 316 NOTICE 1 singularities detected in design matrix iterations 1 LogL 401 827 S2 42 467 168 df 1 000 0 5000 0 5000 2 LogL 400 780 S2 43 301 168 df 1 000 0 5388 0 4876 3 LogL 399 807 S2 45 066 168 df 1 000 0 5895 0 4698 4 LogL 399 353 S2 47 745 168 df 1 000 0 6395 0 4489 5 LogL 399 326 S2 48 466 168 df 1 000 0 6514 0 4409 6 LogL 399 324 S2 48 649 168 df 1 000 0 6544 0 4384 7 LogL 399 324 S2 48 696 168 df 1 000 0 6552 0 4377 8 LogL 399 324 S2 48 708 168 df 1 000 0 6554 0 4375 Final parameter values 1 0000 0 65550 0 43748 Source Model terms Gamma Component Comp SE C parameter Variance 242 168 1 00000 48 7085 6 81 OP estimates Residual AR AutoR 22 0 655505 0 655505 11 63 OU Residual AR AutoR 11 0 437483 0 437483 5 43 OU ANOVA Analysis of Variance NumDF DenDF F_inc Prob 12 mu 1 25 0 331 85 lt 001 1 variety 55 110 3 2 22 lt 001 Notice The DenDF values are calculated ignoring fixed boundary singular variance parameters using algebraic derivatives 13 mv_estimates 18 effects fitted outliers 6 possible outliers in section 1 see res file Finished 14 Jul 2005 12 41 26 862 LogL Converged Finally we display a p
46. 53 129 88 135 19 o N eO 44 120 53 55 126 67 12 OOO e aN OQQON OGOOGO w oa 81 22 oo ou 74 48 97 58 99 109 91 101 36 33 57 ad DOG w pe ao OOQ 64 110 83 117 123 153 133 129 96 63 89 90 228 67 113 47 23 126 113 oOo 2 N ww A N foe 91 49 68 109 119 16 45 62 57 116 0 28 0 26 0 18 86 65 131 20 141 69 63 57 181 101 50 66 57 30 92 70 48 27 15 24 a7 2 141 40 25 104 111 70 198 29 73 64 13 Description of output files 204 a Residual Residual Residual Residual Residual Residual section section section section section section 6 possible 23 0311757288330 column 8 11 column 9 11 colma 9 41 column 10 11 column 10 11 column 11 11 outliers in section wOoOPRWWN BS FRN AAR R 22 1 22 i 22 i 22 i 22 i 22 is 3 test value Figures 13 2 to 13 5 show the graphics derived from the residuals when the IDISPLAY 15 qualifier is specified and which are written to eps files by run ning ASReml g22 nin89a as The graphs are a variogram of the residuals from the spatial analysis for site 1 Figure 13 2 a plot of the residuals in field plan order Figure 13 3 plots of the marginal means of the residuals Figure 13 4 and a histogram of the resid uals Figure 13 5 The selection of which plots are displayed is controlled by the DISPLA
47. 8 80 0 df df df df af 000 000 000 000 000 000 000 PRPrPrPRPRP PE Component 38756 6 0 683767 0 458607 F_inc 850 88 13 04 oqo oo Oo 2 2 0 000 0 68377 4049 0 1870 s5737 0 3122 6789 0 4320 6838 0 4542 6838 0 4579 6838 0 4585 6838 0 4586 0 45861 Comp SE C 5 00 0 P 10 80 Q Uy 5 55 0y Prob lt 001 lt 001 2 components constrained 1 components constrained 15 Examples 265 6 LogL 696 823 7 LogL 696 823 Source units Variance Residual Residual Analysis of Variance 8 mu 6 variety S2 45753 S2 45796 Model terms 150 150 150 125 AR AutoR 15 AR AutoR 10 0 1 ds 125 125 Gamma 06154 00000 0 843795 0 6 NumDF 1 24 82686 df df DenDF 3 5 foal Component 4861 48 45796 3 0 843795 0 682686 F_inc 259 261 10 21 Comp SE 212 2 74 12 33 6 68 oo OP OP y oU Prob lt 001 lt 001 The lattice analysis with recovery of between block information is presented below This variance model is not competitive with the preceding spatial models The models can be formally compared using the BIC values for example IB NOOR WNEK analysis LogL 734 LogL 720 LogL 711 LogL 707 LogL 707 LogL 707 LogL 707 Source Rep RowBlk ColB1lk Variance 184 060 119 937 786 786 786 S2 26778 S2 16591 S2 11173 S2 8562 4 S2 8091 2 S2 8061 8 S2 8061 8 Model
48. !A
     ...
     column 11
    nin89.asd !skip 1
    tabulate yield ~ variety
    yield ~ mu variety !r repl
    predict variety
    0 0 1
    repl 1
    repl 0 IDV 0.1

3 A guided tour 33

cept, variety fits a fixed variety effect and repl fits a random replicate effect. The !r qualifier tells ASReml to fit the terms that appear after this qualifier as random effects.

Prediction

Prediction statements appear after the model statement and before any variance structure lines. In this case the 56 variety means for yield would be formed and returned in the .pvs output file. See Chapter 10 for a detailed discussion of prediction in ASReml.

    NIN Alliance trial 1989
     variety !A
     ...
     column 11
    nin89.asd !skip 1
    tabulate yield ~ variety
    yield ~ mu variety !r repl
    predict variety
    0 0 1
    repl 1
    repl 0 IDV 0.1

Variance structures

The last three lines are included for expository purposes and are not actually needed for this analysis. An extensive range of variance structures can be fitted in ASReml. See Chapter 7 for a lengthy discussion of variance modelling in ASReml. In this case independent and identically distributed random replicate effects are specified using the identifier IDV in a G structure. G structures are described in Section 2.1 and the list of available variance

    NIN Alliance trial 1989
     variety !A
     ...
     column 11
    nin89.asd !skip 1
    tabulate yield ~ variety
    yield ~ mu variet
49. AN V Ida L90L80N NIMSMONG 6L9LSAN 909985N L102 S i4 EZ9IS8AN 99998AN 60tL8AN LSTLSAN NVMUON 0PL8aN L0S980N STIL8AN 8998GIN 99LQOOS 5 Z8P98AN NIMsMONnd 90TESAN I9LSAN LOTESAN LOvEsaN E9PL8AN ET9L8AN LZSISAN VUNLNAO lt DLETESSH ZT9L8AN NISHONA AdOO SOPLSEIN ZISL8aN VNOA LOLSAN NIMSMONE LOVESN I II OL 6 8 L 9 G Vv G T Mor wumnyoo jeun PJ NIN 242 Ul SJOjd 0 salqalUeA Jo UO JedO Je pue node jel T E aIGeL 3 A guided tour 30 see Chapter 2 the data file must first be augmented to specify the complete 22 row x 11 column array of plots These are the first 20 lines of the augmented data file nin89aug asd with 242 data rows variety id pid raw repl LANCER 1 NA NA 1 4 NA 4 31 2 11 LANCER 1 NA NA 1 4 NA 4 3 2 42 1 LANCER 1 NA NA 1 4 NA 4 3 3 6 3 1 LANCER 1 NA NA 1 4 NA 4 3 4 8 4 1 LANCER 1 NA NA 1 4 NA 4 3651 LANCER 1 NA NA 1 4 NA 4 3 7 2 6 1 LANCER 1 NA NA 1 4 NA 4 3 8 47 1 LANCER 1 NA NA 1 4 NA 4 39 6 8 1 LANCER 1 NA NA 1 4 NA 4 3 10 89 1 LANCER 1 NA NA 1 4 NA 4 3 12 10 1 LANCER 1 NA NA 1 4 NA 4 3 13 2 11 1 LANCER 1 NA NA 1 4 NA 4 3 14 4 12 1 LANCER 1 NA NA 1 4 NA 4 3 15 6 13 1 LANCER 1 NA NA 1 4 NA 4 3 16 8 14 1 LANCER 1 NA NA 1 4 NA 4 3 18 15 1 LANCER 1 NA NA 2 4 NA 17 2 7 2 6 4 LANCER 1 NA NA 3 4 NA 25 8 22 8 19 6 LANCER 1 NA NA 4 4 NA 38 7 12 0 10 9 LANCER 1 1101 585 1 4 29 25 4 3 19 2 16 1 BRULE 2 1102 631 1 4 31 55 4 3 20 4 17 1 REDLAND 3 1103 701 1 4 35 05 4 3 21 6 18 1 CODY 4 1104 602
50. ASReml border.pin
will perform the pinfile calculations defined in border.pin on the results in files border.asr and border.vvp.

ASReml -Pborderwwt border.pin
will perform the pinfile calculations defined in border.pin on the results in files borderwwt.asr and borderwwt.vvp.

Forming a job template from data file

The facility to generate a template .as file has been moved to the command line and extended. Normally the name of a .as command file is specified on the command line. If a .as file does not exist and a file with file extension .asd, .csv, .dat, .gsh, .txt or .xls is specified, ASReml assumes the data file has field labels in the first row and generates a .as file template. First, it seeks to convert the .gsh (Genstat) or .xls (Excel, see page 44) file to .csv format using the ASRemload.dll utility provided by VSN. In generating the .as template, ASReml takes the first line of the .csv (or other) file as providing column headings and generates field definition lines from them. If some labels have !A appended, these are defined as factors; otherwise ASReml attempts to identify factors from the field contents. The template needs further editing before it is ready to run but does have the field names copied across.

12.3 Command line options

Command line options and arguments may be specified on the command line or on the top job control line. This is an optional first line of the .as file
51. ... column 12 mu ...
ninerr2 nin.asd
Model specification: TERM LEVELS GAMMAS
mu 0
variety 0
12 factors defined (max 500)
0 variance parameters (max 1500), 2 special structures
Last line read was: variety id pid raw rep nloc yield lat long row column
Finished: 27 Jul 2005 15:41:40.068 Missing/faulty !SKIP or !A needed for variety

Fixing the error by changing !slip to !skip, however, still produces the fault message

14 Error messages

Missing/faulty !SKIP or !A needed for variety

The portion of output given below shows that ASReml has baulked at the name LANCER in the first field on the first data line. This alphabetic data field is not declared as alphabetic. The correct data field definition for variety is
variety !A
to indicate that variety is a character field.

Folder: C:\asr\ex\manex
QUALIFIERS: !SKIP 1
Reading nin89.asd FREE FORMAT skipping 1 lines
Univariate analysis of yield
Field 1 (LANCER) of record 1 (line 1) is not valid.
Since this is the first data record, you may need to skip some header lines (see !SKIP) or append the !A qualifier to the definition of factor variety.
Fault 0 Missing/faulty !SKIP or !A needed for variety
Last line read was: LANCER 1 NA NA 1 4 NA 4.3 1.2 1 1
ninerr3 variety id pid raw rep nloc yield lat
Model specification: TERM LEVELS GAMMAS
mu 0 0.000
varie
52. ... Of these, the conditional Wald statistic for the 1, B.C and A.B.C terms would be the same as the incremental Wald statistics produced using the linear model

y ~ 1 + A + B + C + A.B + A.C + B.C + A.B.C

The preceding table includes a so-called M (marginality) code reported by ASReml when conditional Wald statistics are presented. All terms with the highest M code letter are tested conditionally on all other terms in the model, i.e. by dropping the term from the maximum model. All terms with the preceding M code letter are marginal to at least one term in a higher group, and so forth. For example, in the table, model term A.B has M code B because it is marginal to model term A.B.C, and model term A has M code A because it is marginal to A.B, A.C and A.B.C. Model term mu (M code .) is a special case in that its test is conditional on all covariates but no factors.

Following is some ASReml output from the .aov table which reports the terms in the conditional statistics. Only the I and C entries of the pattern survive; their column positions are not preserved.

Marginality pattern for F-con calculation
Model Term DF, Model terms 1 2 3 4 5 6 7 8
1 mu 1
2 water 1 I C C C
3 variety 7 I I C C
4 sow 2 I I I
5 water.variety 7 I I I I C C
6 water.sow 2 I I I I I C
7 variety.sow 14 I I I I I I
8 water.variety.sow 14 I I I I I I I

F-inc tests the additional variation explained when the term is added to a model consisting of the I terms. F-con tests the add
53. Folder: C:\data\asr\UG2\manex
TAG !I
BloodLine !I
QUALIFIERS: !SKIP 1
Reading wether.dat FREE FORMAT skipping 1 lines
Bivariate analysis of GFW and FDIAM
Using 1485 records of 1485 read
Model term Size #miss #zero MinNon0 Mean MaxNon0
1 TAG 521 0 0 1 261.0956 521
2 TRIAL 0 0 3.000 3.000 3.000
3 BloodLine 27 0 0 1 13.4323 27
4 TEAM 35 0 0 1 18.0067 35
5 YEAR 3 0 0 1 2.0391 3
6 GFW Variate 0 0 4.100 7.478 11.20
7 YLD 0 0 60.30 75.11 88.60
8 FDIAM Variate 0 0 15.90 22.29 30.60
9 Trait
10 Trait.YEAR 9 Trait 5 YEAR 3
11 Trait.TEAM 9 Trait 4 TEAM 35
12 Trait.TAG 1042 9 Trait 1 TAG 521
1485 identity
2 UnStructure 0.2000 0.2000 0.4000
2970 records assumed sorted 2 within 1485
2 UnStructure 0.4000 0.3000 1.3000
35 identity
Structure for Trait.TEAM has 70 levels defined
2 UnStructure 0.2000 0.2000 2.0000
521 identity
Structure for Trait.TAG has 1042 levels defined
Forming 1120 equations: 8 dense
Initial updates will be shrunk by factor 0.316
Notice: Algebraic ANOVA Denominator DF calculation is not available. Empirical derivatives will be used.
NOTICE: 2 singularities detected in design matrix
1 LogL=-886.521 S2= 1.0000 2964 df
2 LogL=-818.508 S2= 1.0000 2964 df
3 LogL=-755.911 S2= 1.0000 2964 df
4 LogL=-725.374 S2= 1.0000 2964 df
5 LogL=-723.475 S2= 1.0000 2964 df
6 LogL=-723.462 S2= 1.0000 2964 df
7 LogL=-723.462 S2= 1.0000 2964 df
8 LogL=-723.462
54. Contents

9.5 Genetic groups
9.6 Reading a user defined inverse relationship matrix
    The example continued
10 Tabulation of the data and prediction from the model
10.1 Introduction
10.2 Tabulation
10.3 Prediction
    Underlying principles
    Predict syntax
    Examples
11 Functions of variance components
11.1 Introduction
11.2 Syntax
    Linear combinations of components
    Heritability
    Correlation
    A more detailed example
12 Command file: Running the job
12.1 Introduction
12.2 The command line
    Normal run
    Processing a .pin file
    Forming a job template from data file
12.3 Command line options
    Prompt for arguments (A)
    Output control (B, J)
    Debug command line options (D, E)
    Graphics command line options (G, H, I, N, Q)
    Job control comman
55. ... 10 possible outliers: see .res file
Finished: 13 Jul 2005 09:38:05.725 LogL Converged

9 Command file: Genetic analysis

Introduction
The command file
The pedigree file
Reading in the pedigree file
Genetic groups
GIV files
The example
Multivariate genetic analysis

9.1 Introduction

In an 'animal model' or 'sire model' genetic analysis we have data on a set of animals that are genetically linked via a pedigree. The genetic effects are therefore correlated and, assuming normal modes of inheritance, the correlation expected from additive genetic effects can be derived from the pedigree, provided all the genetic links are in the pedigree. The additive genetic relationship matrix (sometimes called the numerator relationship matrix) can be calculated from the pedigree. It is actually the inverse relationship matrix that is formed by ASReml for analysis. Users new to this subject might find notes by Julius van der Werf helpful:
http://www-personal.une.edu.au/~jvanderw/Mixed_Models_for_Genetic_analysis.pdf

For the more general situation where the pedigree based inverse relationship matrix is not the appropriate or required matrix, the user can provide a particular general inverse variance (GIV) matrix explicitly in a .giv file.

In this chapter we consider data presented in Harvey (1977) using the command fi
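As an illustration of how a relationship matrix follows from a pedigree, here is a minimal Python sketch of the standard tabular method. It is not ASReml's algorithm (ASReml forms the inverse directly); the function name, the `(animal, sire, dam)` tuple layout and the four-animal pedigree are assumptions made for this example only.

```python
def relationship_matrix(pedigree):
    """Additive (numerator) relationship matrix by the tabular method.

    pedigree: list of (animal, sire, dam) tuples, hypothetical layout, with
    parents listed before their offspring and None for an unknown parent.
    """
    index = {animal: i for i, (animal, _sire, _dam) in enumerate(pedigree)}
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (_animal, sire, dam) in enumerate(pedigree):
        s = index.get(sire)  # None when the sire is unknown
        d = index.get(dam)   # None when the dam is unknown
        for j in range(i):
            # Relationship with an earlier animal is the mean of that
            # animal's relationships with the two parents.
            a_js = A[j][s] if s is not None else 0.0
            a_jd = A[j][d] if d is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_js + a_jd)
        # Diagonal is 1 plus the inbreeding coefficient (half the
        # relationship between the parents, when both are known).
        both_known = s is not None and d is not None
        A[i][i] = 1.0 + (0.5 * A[s][d] if both_known else 0.0)
    return A


# Illustrative pedigree: D is the offspring of A and A's own daughter C.
pedigree = [("A", None, None), ("B", None, None),
            ("C", "A", "B"), ("D", "A", "C")]
A = relationship_matrix(pedigree)
```

The sketch requires parents to appear before offspring; a real pedigree file would need sorting first.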
56. Multi Environment Trials analyses where, say, Column effects are to be fitted to a subset of environments. It may also be used on the intrinsic factor Trait in a multivariate analysis provided it correctly identifies the number of levels of Trait, either by including the last trait number or appending sufficient zeros. Thus, if the analysis involves 5 traits,
!SUBSET Trewe Trait 1 3 4 0 0

5 Command file: Reading the data

Table 5.5: List of rarely used job control qualifiers

qualifiers listed: !AISINGULARITIES, !BMP, !BRIEF n, !BLUP n

!AISINGULARITIES can be specified to force a job to continue even though a singularity was detected in the Average Information (AI) matrix. The AI matrix is used to give updates to the variance parameter estimates. In release 1, if singularities were present in the AI matrix, a generalized inverse was used which effectively conditioned on whichever parameters were identified as singular. ASReml now aborts processing if such singularities appear unless the !AISINGULARITIES qualifier is set. Which particular parameter is singular is reported in the variance component table printed in the .asr file. The most common reason for singularities is that the user has overspecified the model and is likely to misinterpret the results if not fully aware of the situation. Overspecification will occur in a direct product of two unconstrained variance matrices (see Section 2.4), when a random
57. Otherwise you will have to convert a factor with alphanumeric labels to numeric sequential codes external to ASReml so that an !A option can be avoided. The data file may need to be rewritten with some factors recoded as sequential integers.

This is an internal limit. Reduce the number of response variables.

14 Error messages

Alphabetical list of error messages and probable cause(s)/remedies

error messages listed: Unable to invert R or G [US?] matrix; Unable to invert R or G [CORR?] matrix; Variance structure is not positive definite; XFA model not permitted in R structures

XFA model not permitted in R structures: XFA may not be used as an R structure.

Unable to invert R or G matrix: this message occurs when there is an error forming the inverse of a variance structure. The probable cause is a non positive definite (initial) variance structure (US, CHOL and ANTE models). It may also occur if an identity by unstructured (ID US) error variance model is not specified in a multivariate analysis (including !ASMV), see Chapter 8. If the failure is on the first iteration, the problem is with the starting values. If on a subsequent iteration, the updates have caused the problem. You could try reducing the updates by using the !STEP qualifier. Otherwise, you could try fitting an alternative parameterisation. The CORGH model may be more stable than the US model.

... generally refers to a problem setting up the mixed model equations. Most commonly it is c
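Since a non positive definite set of starting values is named as the probable cause, it can help to check initial values before running a job. The following Python sketch is one way to do that outside ASReml. It assumes the unstructured (US) values are supplied as the lower triangle, row by row; the function names and the numeric values are hypothetical.

```python
import math


def us_from_lower_triangle(values):
    """Expand lower-triangle values (row by row) into a full symmetric matrix."""
    n = (math.isqrt(8 * len(values) + 1) - 1) // 2
    if n * (n + 1) // 2 != len(values):
        raise ValueError("number of values is not n(n+1)/2 for any n")
    M = [[0.0] * n for _ in range(n)]
    it = iter(values)
    for i in range(n):
        for j in range(i + 1):
            M[i][j] = M[j][i] = float(next(it))
    return M


def is_positive_definite(M):
    """Attempt a Cholesky factorisation; it succeeds only for positive definite M."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False  # non-positive pivot: not positive definite
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


# Illustrative starting values for a 2 x 2 unstructured matrix.
good_start = is_positive_definite(us_from_lower_triangle([0.2, 0.2, 0.4]))
bad_start = is_positive_definite(us_from_lower_triangle([1.0, 2.0, 1.0]))
```

In the second case the covariance (2.0) exceeds what the two variances (1.0 and 1.0) allow, so the check fails.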
58. !PRESENT v is used when averaging is to be based only on cells with data. v is a list of variables and may include variables in the classify set. v may not include variables with an explicit !AVERAGE qualifier. The variable names in v may optionally be followed by a list of levels for inclusion if such a list has not been supplied in the specification of the classify set. ASReml works out what combinations are present from the design matrix. A second !PRESENT qualifier is allowed on a predict statement (but not with !PRWTS). This is needed when there are two nested factors, such as sites within regions and genotype within family. The two lists must not overlap.

!PRWTS v is used in conjunction with the first !PRESENT factors to specify the weights that ASReml will use for averaging that !PRESENT table. More details are given below.

Controlling inclusion of model terms

!EXCEPT t causes the prediction to include all fitted model terms not in t.

!IGNORE t causes ASReml to set up a prediction model based on the default rules and then removes the terms in t. This might be used to omit the spline Lack of fit term (!IGNORE fac(x)) from predictions, as in
yield ~ mu x variety !r spl(x) fac(x)
predict x !IGNORE fac(x)
which would predict points on the spline curve averaging over variety.

10 Tabulation of the data and prediction from the model

List of prediction qualifiers

!ONLYUSE t causes the pred
59. The focus in developing ASReml has been on the core engine, and it is freely acknowledged that its user interface is not to the level of these other packages. Nevertheless, as the developers' interface, it is functional, it gives access to everything that the core can do, and it is especially suited to batch processing and running of large models without the overheads of other systems. Feedback from users is welcome, and attempts will be made to rectify identified problems in ASReml.

The guide has 15 chapters. Chapter 1 introduces ASReml and describes the conventions used in this guide. Chapter 2 outlines some basic theory, while Chapter 3 presents an overview of the syntax of ASReml through a simple example. Data file preparation is described in Chapter 4, and Chapter 5 describes how to input data into ASReml. Chapters 6 and 7 are key chapters which present the syntax for specifying the linear model and the variance models for the random effects in the linear mixed model. Chapters 8 and 9 describe special commands for multivariate and genetic analyses respectively. Chapter 10 deals with prediction of linear functions of fixed and random effects in the linear mixed model, and Chapter 11 presents the syntax for forming functions of variance components. Chapter 12 demonstrates the features available for running an ASReml job, and Chapter 13 gives a detailed explanation of the output files. Chapter 14 gives an overview of the error messages generated in ASReml and some g
60. They may be abbreviated (truncated) if they are referred to again, provided no ambiguity is introduced.

Important: It is often clearer if labels are not abbreviated. If abbreviations are used, then they need to be chosen to avoid confusion.

- if the model is written over several lines, all but the final line must end with a comma to indicate that the list is continued.

In Tables 6.1 and 6.2, the arguments in model term functions are represented by the following symbols:
f: the label of a data variable defined as a model factor,
k, n: an integer number,
r: a real number,
t: a model term label (includes data variables),
v, y: the label of a data variable.

Parsing of model terms in ASReml is not very sophisticated. Where a model term takes another model term as an argument, the argument must be predefined. If necessary, include the argument in the model line with a leading '-', which will cause the term to be defined but not fitted. For example,
Trait.male -Trait.female and(Trait.female)
Also, dens is an abbreviation for density, but spl(dens,7) is a different model term (albeit probably equivalent) to spl(density,7) because it does not represent a simple truncation.

6 Command file: Specifying the terms in the mixed model

Table 6.1: Summary of reserved words, operators and functions

model term | brief description | common usage (fixed, random)
sections: reserved terms, operators, commonly used functions
mu mv T
61. This file contains a summary of the data, the iteration sequence, estimates of the variance parameters and an analysis of variance (ANOVA) table. The estimates of all the fixed and random effects are written to nin89.sln. The residuals, predicted values of the observations and the diagonal elements of the hat matrix (see Chapter 2) are returned in nin89.yht, see Section 13.3. Other files produced by this job include the .aov, .pvs, .res, .tab, .vvp and .veo files, see Section 13.4.

The .asr file

Below is nin89.asr with pointers to the main sections. The first line gives the version of ASReml used (in square brackets) and the title of the job. The second line gives the build date for the program and indicates whether it is a 32bit or 64bit version. The third line gives the date and time that the job was run and reports the size of the workspace. The general announcements box (outlined in asterisks) at the top of the file notifies the user of current release features. The remaining lines report a data summary, the iteration sequence, the estimated variance parameters and an ANOVA table. The final line gives the date and time that the job was completed and a statement about convergence.

ASReml 1.630 [01 Jun 2005] NIN alliance trial 1989
Build: j [01 Jul 2005] 32 bit
11 Jul 2005 13:55:21.504 32.00 Mbyte Windows nin89
Licensed to: Arthur Gilmour
**********

3 A guided tour

margin labels: data summary, iteration sequ
62. Usually no report is produced unless the algorithm has at least produced estimates for the fixed and random effects in the model. Note that residuals are not included in the output forced by this qualifier. This option is primarily intended to help debugging a job that is not converging properly.

5 Command file: Reading the data

List of very rarely used job control qualifiers

!SCALE 1: When forming a design matrix for the spl() model term, ASReml uses a standardized scale (independent of the actual scale of the variable). The qualifier !SCALE 1 forces ASReml to use the scale of the variable. The default standardised scale is appropriate in most circumstances.

!SCORE requests ASReml write the SCORE vector and the Average Information matrix to files basename.SCO and basename.AIM. The values written are from the last iteration.

!SLOW n reduces the update step sizes of the variance parameters more persistently than the !STEP r qualifier. If specified, ASReml looks at the potential size of the updates and, if any are large, it reduces the size of r. If n is greater than 10, ASReml also modifies the Information matrix by multiplying the diagonal elements by n. This has the effect of further reducing the updates. This option may help when you do not have good starting values, especially in multivariate analyses.

!TOLERANCE [s1 [s2]] modifies the ability of ASReml to detect si

!VRB
63. a factor is independent of any other factors in the term. Multivariate data and repeated measures data usually satisfy the assumption of separability. In particular, if the data are indexed by factors units and traits (for multivariate data) or times (for repeated measures data), then the R structure may be written as units.traits or units.times. This assumption is sometimes required to make the estimation process computationally feasible, though it can be relaxed for certain applications, for example fitting isotropic covariance models to irregularly spaced spatial data.

Variance structures for the random effects: G structures

The q x 1 vector of random effects is often composed of b subvectors, $u = [u_1' \; u_2' \; \ldots \; u_b']'$, where the subvectors $u_i$ are of length $q_i$ and these subvectors are usually assumed independent, normally distributed with variance matrices $\theta G_i$. Thus, just like R, we have

\[
G = \bigoplus_{i=1}^{b} G_i =
\begin{bmatrix}
G_1 & 0 & \cdots & 0 \\
0 & G_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & G_b
\end{bmatrix}
\]

There is a corresponding partition in Z, $Z = [Z_1 \; Z_2 \; \ldots \; Z_b]$. As before, each submatrix $G_i$ is assumed to be the direct product of one, two or three component matrices. These matrices are indexed for each of the factors constituting the term in the linear model. For example, the term site.genotype has two factors and so the matrix $G_i$ is comprised of two component matrices defining the variance structure for each factor in the term.

Models for the component matrices $G_i$ include the standard
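The block-diagonal structure of G, with each block a direct product of component matrices, can be sketched in a few lines of Python. This is an illustration under assumptions, not ASReml code: the 2 x 2 site matrix, the identity genotype matrix with 3 levels, and the second random term with variance 0.5 are all invented values.

```python
def identity(n):
    """n x n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def kron(A, B):
    """Direct (Kronecker) product of two matrices given as nested lists."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def block_diag(*blocks):
    """Direct sum: place square blocks down the diagonal, zeros elsewhere."""
    n = sum(len(b) for b in blocks)
    G = [[0.0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, value in enumerate(row):
                G[offset + i][offset + j] = value
        offset += len(b)
    return G


# Hypothetical term site.genotype: 2 sites (unstructured) by 3 genotypes (identity).
site = [[1.0, 0.3],
        [0.3, 2.0]]
G1 = kron(site, identity(3))
# Hypothetical second random term with 2 independent levels and variance 0.5.
G2 = [[0.5 * v for v in row] for row in identity(2)]
G = block_diag(G1, G2)  # 8 x 8 matrix
```

Effects from different terms are uncorrelated (the off-diagonal blocks are zero), while within site.genotype the same genotype at two sites has covariance 0.3.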
64. a missing value in binary files.

- Fortran binary in the above means all real (.bin) or all double precision (.dbl) variables; mixed types, that is, integer and alphabetic binary representation of variables, is not allowed in binary files.
- binary files can only be used in conjunction with a pedigree file if the pedigree fields are coded in the binary file so that they correspond with the pedigree file (this can be done using the !SAVE option in ASReml to form the binary file, see Table 5.5), or the identifiers are whole numbers less than 9,999,999 and the !RECODE qualifier is specified (see Table 5.5).

5 Command file: Reading the data

Introduction
Important rules
Title line
Specifying and reading the data
    Data field definition syntax
Transforming the data
    Transformation syntax
    Other rules and examples
    Special note on covariates
    Other examples
Datafile line
    Datafile line syntax
Datafile qualifiers
Job control qualifiers

5.1 Introduction

In the code box to the right is the ASReml command file nin89a.as for a spatial analysis of the Nebraska Intrastate Nursery (NIN) field experiment introduced in Chapter 3. The lines that are highlighted in bold/blue type relate to reading in the data. In this chapter we use this example to discuss reading in the data in detail.

Code box (partially recovered): NIN Alliance Trial 1989, variety !A, id, pid, ..., repl 4, ..., yield, ..., long, row 22, c

5.2
65. analysis only estimates the maternal variance component. It is only significant for the weaning and yearling weights. The litter variation remains unchanged. The ASReml input file again consists of several parts, which progressively build up to fitting unstructured variance models to Tr.tag, Tr.dam, Tr.litter and error. A portion of the output file is

tag !P
dam !P
age wwt !m0 ywt !m0 gfw !m0 fdm !m0 fat !m0
A-inverse retrieved from ainverse.bin
PEDIGREE [pcoop.fmt] has 10696 identities
QUALIFIERS: !CONTINUE !MAXIT 20 !STEP 0.01
QUALIFIERS: !EXTRA 4
QUALIFIER: !DOPATH 3 is active
29474 Non zero elements
Reading pcoop.fmt FREE FORMAT skipping 0 lines
Multivariate analysis of wwt ywt gfw fdm
Multivariate analysis of fat
Using 7043 records of 7043 read
Model term Size #miss #zero MinNon0 Mean MaxNon0
1 tag !P 10696 0 0 3.000 5380. 0.1070E+05
2 sire 0 0 1.000 48.06 92.00
3 dam !P 10696 0 0 1.000 5197. 0.1070E+05
Forming 95033 equations: 40 dense
Initial updates will be shrunk by factor 0.010
Restarting iteration from previous solution
Notice: LogL values are reported relative to a base of -20000.00
NOTICE: 76 singularities detected in design matrix
1 LogL=-1437.10 S2= 1.0000 35006 df : 2 components constrained
2 LogL=-1436.87 S2= 1.0000 35006 df : 3 components constrained
3 LogL=-1434.97 S2= 1.0000 35006 df : 2 components constrained
4 LogL=-1430.73 S2= 1.0000 35006 df : 2 components constrained
5 LogL=-1424.71 S2= 1.000
66. and Nelder (1982), who consider fixed effects models. They form fitted values for all combinations of the explanatory variables in the model, then take marginal means across the explanatory variables not relevant to the current prediction. Our case is more general in that random effects can be fitted in our mixed models. A full description can be found in Gilmour et al. (2004) and Welham et al. (2004).

Random factor terms may contribute to predictions in several ways. They may be evaluated at values specified by the user, they may be averaged over, or they may be omitted from the fitted values used to form the prediction. Averaging over the set of random effects gives a prediction specific to the random effects observed. We call this a 'conditional' prediction. Omitting the term from the model produces a prediction at the population average (zero), that is, substituting the assumed population mean for an unknown random effect. We call this a 'marginal' prediction. Note that in any prediction, some terms may be evaluated as conditional and others at marginal values, depending on the aim of prediction.

For fixed factors there is no pre-defined population average, so there is no natural interpretation for a prediction derived by omitting a fixed term from the fitted values. Averages must therefore be taken over all the levels present to give a sample specific average, or prediction must be at specified levels.
67. Index
68. axes: $h_x = x_i - x_j$, $h_y = y_i - y_j$, with $s_x$ and $s_y$ formed from $h_x$ and $h_y$ by rotation through the angle $\alpha$, and the distance $d$ computed from $s_x$ and $s_y$ [...].

For a given $\nu$, the range parameter $\phi$ affects the rate of decay of $\rho(\cdot)$ with increasing $d$. The parameter $\nu > 0$ controls the analytic smoothness of the underlying process $u_s$, the process being $\lceil \nu \rceil - 1$ times mean-square differentiable, where $\lceil \nu \rceil$ is the smallest integer greater than or equal to $\nu$ (Stein, 1999, page 31). Larger $\nu$ correspond to smoother processes. ASReml uses numerical derivatives for $\nu$ when its current value is outside the interval [0.2, 5].

When $\nu = m + \frac{1}{2}$ with $m$ a non-negative integer, $\rho_M(\cdot)$ is the product of $\exp(-d/\phi)$ and a polynomial of degree $m$ in $d$. Thus $\nu = \frac{1}{2}$ yields the exponential correlation function, $\rho_M(d; \phi, \frac{1}{2}) = \exp(-d/\phi)$, and $\nu = 1$ yields Whittle's elementary correlation function, $\rho_M(d; \phi, 1) = (d/\phi) K_1(d/\phi)$ (Webster and Oliver, 2001). When $\nu = 1.5$, then

\[
\rho_M(d; \phi, 1.5) = \exp(-d/\phi)\,(1 + d/\phi)
\]

7 Command file: Specifying variance structures

which is the correlation function of a random field which is continuous and once differentiable. This has been used recently by Kammann and Wand (2003). As $\nu \to \infty$, $\rho_M(\cdot)$ tends to the gaussian correlation function.

The metric parameter $\lambda$ is not estimated by ASReml; it is usually set to 2 for Euclidean distance. Setting $\lambda = 1$ provides the cityblock metric, which together with $\nu = 0.5$ models a separable AR1xAR1 process. Cityblock metric
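The half-integer case described above (an exponential term multiplied by a polynomial of degree m) can be checked numerically. The Python sketch below uses the standard closed form for the Matérn correlation at ν = m + 1/2, with the distance scaled by a range parameter `phi`. The function name and argument names are assumptions for this example, and the scaling convention (distance divided by `phi`, with no further factor) is assumed to match the exponential and ν = 1.5 cases quoted in the text.

```python
import math


def matern_half_integer(d, m, phi=1.0):
    """Matern correlation for smoothness nu = m + 1/2 (m = 0, 1, 2, ...).

    Evaluates exp(-x) * m!/(2m)! * sum_k (m+k)!/(k!(m-k)!) * (2x)^(m-k),
    where x = d / phi is the scaled distance.
    """
    x = d / phi
    poly = sum(
        math.factorial(m + k) / (math.factorial(k) * math.factorial(m - k))
        * (2.0 * x) ** (m - k)
        for k in range(m + 1)
    )
    return math.exp(-x) * math.factorial(m) / math.factorial(2 * m) * poly


rho_exponential = matern_half_integer(1.0, m=0)        # nu = 0.5
rho_once_diff = matern_half_integer(2.0, m=1)          # nu = 1.5
rho_scaled = matern_half_integer(3.0, m=1, phi=1.5)    # same scaled distance as above
```

With m = 0 this reduces to exp(-d/phi), and with m = 1 to exp(-d/phi)(1 + d/phi), which are the two special cases named in the text.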
69. be many $v_{ij}$ with the same displacements. ASReml calculates the means for each displacement pair $(l_{ij1}, l_{ij2})$, either ignoring the signs (default) or separately for same sign and opposite sign (!TWOWAY), after grouping the larger displacements: 9-10, 11-14, 15-20, .... The result is displayed as a perspective plot (see page 205) of the one or two surfaces indexed by absolute displacement group. In this case, the two directions may be on different scales.

Otherwise ASReml forms a variogram based on polar coordinates. It calculates the distance between points, $d_{ij} = \sqrt{l_{ij1}^2 + l_{ij2}^2}$, and an angle $\theta_{ij}$ ($-180 < \theta_{ij} < 180$) subtended by the line from (0, 0) to $(l_{ij1}, l_{ij2})$ with the x-axis. The angle can be calculated as $\theta_{ij} = \tan^{-1}(l_{ij1}/l_{ij2})$, choosing $0 < \theta_{ij} < 180$ if $l_{ij2} > 0$ and $-180 < \theta_{ij} < 0$ if $l_{ij2} < 0$. Note that the variogram has angular symmetry in that $v_{ij} = v_{ji}$, $d_{ij} = d_{ji}$ and $|\theta_{ij} - \theta_{ji}| = 180$. The variogram presented averages the $v_{ij}$ within 12 distance classes and 4, 6 or 8 sectors (selected using a !VGSECTORS qualifier) centred on an angle of $(i-1) \cdot 180/s$ $(i = 1, \ldots, s)$. A figure is produced which reports the trends in $v_{ij}$ with increasing distance for each sector.

ASReml also computes the variogram from predictors of random effects which appear to have a variance structure defined in terms of distance. The variogram details are reported in the .res file.

Inference: Fixed effects

Introduction

Inference for fixed effects
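The polar-coordinate averaging can be sketched in Python as follows. This is an illustration only, not ASReml's implementation. It assumes the usual sample semivariance, half the squared difference of two residuals, for each pair (the definition of the pairwise values is not visible in this excerpt), measures the angle with `atan2` from the x-axis, folds it into the range 0 to 180 degrees using the angular symmetry noted above, and uses equal-width distance classes. The function name, arguments and data are invented.

```python
import math
from collections import defaultdict


def polar_variogram(points, n_sectors=4, class_width=1.0):
    """Average pairwise semivariances by (sector, distance class).

    points: list of (x, y, residual) tuples (hypothetical layout).
    Sectors are centred on angles k * 180 / n_sectors, k = 0 .. n_sectors - 1.
    """
    sector_width = 180.0 / n_sectors
    cells = defaultdict(lambda: [0.0, 0])  # (sector, class) -> [sum, count]
    for i in range(len(points)):
        xi, yi, ei = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, ej = points[j]
            l1, l2 = xj - xi, yj - yi
            v = 0.5 * (ei - ej) ** 2            # assumed semivariance definition
            d = math.hypot(l1, l2)              # distance between the two points
            # Angle from the x-axis, folded into [0, 180) by symmetry.
            theta = math.degrees(math.atan2(l2, l1)) % 180.0
            sector = int(((theta + sector_width / 2.0) % 180.0) // sector_width)
            dclass = int(d // class_width)
            cell = cells[(sector, dclass)]
            cell[0] += v
            cell[1] += 1
    return {key: total / count for key, (total, count) in cells.items()}


# Three illustrative plots with residuals 1.0, 3.0 and 1.0.
points = [(0.0, 0.0, 1.0), (1.0, 0.0, 3.0), (0.0, 1.0, 1.0)]
variogram = polar_variogram(points)
```

Each key of the result is a (sector, distance class) pair; plotting the means against distance class, one line per sector, gives the kind of directional trend display described in the text.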
70. column numbering is typically within replicates, and so the terms specified in the linear model to account for the lattice row and lattice column effects would be Rep.latticerow Rep.latticecolumn. However, in this example lattice rows and columns are both numbered from 1 to 30 across replicates (see Table 15.6). The terms in the linear model are therefore simply RowBlk ColBlk. Additional fields row and column indicate the spatial layout of the plots.

The ASReml input file is presented below. Three models have been fitted to these data. The lattice analysis is included for comparison in PATH 3. In PATH 1 we use the separable first order autoregressive model to model the variance structure of the plot errors. Gilmour et al. (1997) suggest this is often a useful model to commence the spatial modelling process. The form of the variance matrix for the plot errors (R structure) is given by

\[
\sigma^2 \Sigma = \sigma^2 (\Sigma_c \otimes \Sigma_r) \qquad (15.5)
\]

where $\Sigma_c$ and $\Sigma_r$ are 15 x 15 and 10 x 10 matrix functions of the column and row autoregressive parameters respectively.

Gilmour et al. (1997) recommend revision of the current spatial model based on the use of diagnostics such as the sample variogram of the residuals from the current model. This diagnostic and a summary of row and column residual trends are produced by default with graphical versions of ASReml when a spatial model has been fitted to the errors. It can be suppressed by the use of the n opt
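To make the separable structure concrete, the Python sketch below builds the 150 x 150 error variance matrix for a 15 column by 10 row array without forming the Kronecker product explicitly: the entry for two plots is the variance times a column correlation raised to the column lag times a row correlation raised to the row lag. The function name, the ordering of plots (rows within columns) and the parameter values 0.5, 0.4 and 2.0 are assumptions for this example, not estimates from the data.

```python
def ar1_separable_variance(n_col, n_row, rho_col, rho_row, sigma2=1.0):
    """Variance matrix sigma2 * (AR1(rho_col) kron AR1(rho_row)).

    Plots are assumed to be ordered rows within columns, so the plot at
    (column c, row r) has index c * n_row + r.
    """
    n = n_col * n_row
    R = [[0.0] * n for _ in range(n)]
    for c1 in range(n_col):
        for r1 in range(n_row):
            i = c1 * n_row + r1
            for c2 in range(n_col):
                for r2 in range(n_row):
                    j = c2 * n_row + r2
                    # AR1 correlation decays geometrically with the lag
                    # in each direction; separability multiplies the two.
                    R[i][j] = (sigma2
                               * rho_col ** abs(c1 - c2)
                               * rho_row ** abs(r1 - r2))
    return R


R = ar1_separable_variance(15, 10, rho_col=0.5, rho_row=0.4, sigma2=2.0)
```

Neighbouring plots in the same column have covariance 2.0 x 0.4, neighbours in the same row 2.0 x 0.5, and diagonal neighbours the product of both correlations.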
71. command to capture the screen output rather than using the L option, as the .asl file is not properly closed after a crash.

Graphics command line options (G, H, I, N, Q)

Graphics are produced in the PC, Linux and SUN 32bit versions of ASReml using the Winteracter graphics library.

The I (INTERACTIVE) option permits the variogram and residual graphics to be displayed. This is the default unless the L option is specified.

The N (NOGRAPHICS) option prevents any graphics from being displayed. This is also the default when the L option is specified.

The Gg (GRAPHICS g) option sets the file type for hard copy versions of the graphics. Hard copy is formed for all the graphics that are displayed.

12 Command file: Running the job

Hg (HARDCOPY g) replaces the G option when graphics are to be written to file but not displayed on the screen. The H may be followed by a format code, e.g. H22 for .eps.

Q (QUIET) is used when running under the control of an editor such as ASReml-W to suppress any POPUPs/PAUSES from ASReml.

ASReml writes the graphics to files whose names are built up as <basename><args><type><pass><section>.<ext>, where some elements might be omitted:
<basename> is the name portion of the .as file,
<args> is any argument strings built into the output names by use of the !RENAME qualifier,
<type> indicates the contents of the figure
72. control tray (no bloodworms) and a treated tray (bloodworms added) were grown in a controlled environment room for the duration of the experiment. At the end of this time, rice plants were carefully extracted, the root system washed and root area determined for the tray using an image analysis system described by Stevens et al. (1999). Two pairs of trays, each pair corresponding to a different variety, were included in each run. A new batch of bloodworm larvae was used for each run. A total of 44 varieties was investigated, with three replicates of each. Unfortunately, the variety concurrence within runs was less than optimal. Eight varieties occurred with only one other variety, 22 with two other varieties and the remaining 14 with three different varieties.

In the next three sections we present an exhaustive analysis of these data using equivalent univariate and multivariate techniques. It is convenient to use two data files, one for each approach. The univariate data file consists of factors pair, run, variety, tmt, unit and variate rootwt. The factor unit labels the individual trays, pair labels pairs of trays (to which varieties are allocated) and tmt is the two level bloodworm treatment factor (control/treated). The multivariate data file consists of factors variety and run, and variates for root weight of both the control and exposed treatments (labelled yc and ye respectively).

Preliminary analyses indicated variance heteroge
73. data field labels for the fields being modified/created; additional data fields can be created by transformation qualifiers.

5 Command file: Reading the data

Data field definition syntax

Data field definitions appear in the ASReml command file in the form

SPACE label [field_type] [transformations]

- SPACE is a required space,
- label is an alphanumeric string to identify the field:
  - has a maximum of 31 characters, of which only 20 are printed; the remaining characters are not displayed,
  - must begin with a letter,
  - must not contain the special characters ...,
  - reserved words (Table 6.1 and Table 7.3) must not be used,
- field_type defines how a variable is interpreted if specified in the linear model:
  - for a variate, leave field_type blank or specify 1,
  - for a model factor, various qualifiers are required depending on the form of the factor coding, where n is the number of levels of the factor and s is a list of labels to be assigned to the levels:
    * or n is used when the data field has values 1...n directly coding for the factor, unless the levels are to be labelled (see !L),
    !A [n] is required if the data field is alphanumeric,
    !I [n] is required if the data is numeric but not 1...n; !I must be followed by n if more than 1000 codes are present,
    !L s is used when the data field is numeric with values 1...n and labels are to be assigned to the n levels, for example
    Sex !L Male Female
    If there
74. from a fitted line for possible turning points and, if found, report them and save them internally in a vector which can be accessed by subsequent parts of the same job using TPn. This was added to facilitate location of putative QTL.
!TWOSTAGEWEIGHTS is intended for use with variety trials which will subsequently be combined in a meta analysis. It forms the variance matrix for the predictions, inverts it and writes the predicted variety means, with the corresponding diagonal elements of this matrix, to the .pvs file. These values are used in some variety testing programs in Australia for a subsequent second stage analysis across many trials. A data base is used to collect the results from the individual trials and write out the combined data set. The diagonal elements are used as weights in the combined analysis.
!VPV requests that the variance matrix of predicted values be printed to the .pvs file.
PLOT graphic control qualifiers
This functionality was developed and this section was written by Damian Collins.
The !PLOT qualifier produces a graphic of the predictions. Where there is more than one prediction factor, a multi-panel trellis arrangement may be used. Alternatively, one or more factors can be superimposed on the one panel. The data can be added to the plot to assist informal examinatio
75. included to document the file. In this case there are 11 space separated data fields (variety ... column) and the complete file has 224 data lines, one for each variety in each replicate. The first line holds the optional field labels; each following line is a data line for one sampling unit.
  variety     id  pid   raw  repl nloc yield  lat  long  row column
  LANCER       1  1101  585  1    4    29.25  4.3  19.2  16  1
  BRULE        2  1102  631  1    4    31.55  4.3  20.4  17  1
  REDLAND      3  1103  701  1    4    35.05  4.3  21.6  18  1
  CODY         4  1104  602  1    4    30.1   4.3  22.8  19
  ARAPAHOE     5  1105  661  1    4    33.05  4.3  24    20
  NE83404      6  1106  605  1    4    30.25  4.3  25.2  21  1
  NE83406      7  1107  704  1    4    35.2   4.3  26.4  22  1
  NE83407      8  1108  388  1    4    19.4   8.6  1.2   1   2
  CENTURA      9  1109  487  1    4    24.35  8.6  2.4   2   2
  SCOUT66     10  1110  511  1    4    25.55  8.6  3.6   3   2
  COLT        11  1111  502  1    4    25.1   8.6  4.8   4   2
  NE83498     12  1112  492  1    4    24.6   8.6  6     5   2
  NE84557     13  1113  509  1    4    25.45  8.6  7.2   6   2
  NE83432     14  1114  268  1    4    13.4   8.6  8.4   7   2
  NE85556     15  1115  633  1    4    31.65  8.6  9.6   8   2
  NE85623     16  1116  513  1    4    25.65  8.6  10.8  9   2
  CENTURAK78  17  1117  632  1    4    31.6   8.6  12    10  2
  NORKAN      18  1118  446  1    4    22.3   8.6  13.2  11  2
  KS831374    19  1119  684  1    4    34.2   8.6  14.4  12  2
Note that in Chapter 7 these data are analysed using spatial methods of analysis, see model 3a in Section 7.3. For spatial analysis using a separable error structure
76. is being used.
The response variable nominated by the !YVAR command line qualifier is not in the data.
The data values are out of the expected range for binary/binomial data.
There is a problem with forming one of the generated factors. The most probable cause is that an interaction cannot be formed.
You must either use the US error structure or use the !ASUV qualifier (and maybe include mv in the model).
A term in the model has no levels.
A term in the model specification is not among the terms that have been defined. Check the spelling.
There is a problem with the named variable.
The second field in the R structure line does not refer to a variate in the data.
The weight and filter columns must be data fields. Check the data summary.
See the discussion of !AISINGULARITIES.
Maybe increase workspace or restructure/simplify the model.
Special structures are weights, the A inverse and GIV structures. The limit is 98 and so no more than 96 GIV structures can be defined.
The limit is 1500. It may be possible to restructure the job so the limit is not exceeded, assuming that the actual number of parameters to be estimated is less.
Alphabetical list of error messages and probable cause(s)/remedies
error message: probable cause/remedy
Missing/faulty SKIP or A needed for
Missing values in design variables/factors
Missing Value Miscount forming design
Missing
77. it could be used by another For tran program or package Factors will have level codes if they were coded using A or I All the data from the run plus an extra column of residuals is in the file Records omitted from the analysis are omitted from the file The pvc file The pvc file contains functions of the variance components produced by running a pin file on the results of an ASReml run as described in Chapter 11 The pin and pvc files for a half sib analysis of the Coopworth data are presented in Section 15 10 The pvs file The pvs file contains the predicted values formed when a predict statement is included in the job Below is an edited version of nin89a pvs See Section 3 6 for the pvs file for the simple RCB analysis of the NIN data considered in that chapter nin alliance trial 14 Jul 2005 12 41 18 nin89a 13 Description of output files 200 predicted variety means SED summary Ecode is E for Estimable for Not Estimable Warning mv_estimates is ignored for prediction Predicted values of yield variety Predicted_Value Standard_Error Ecode LANCER 24 0894 2 4645 E BRULE 27 0728 2 4944 E REDLAND 28 7954 2 5064 E CODY 23 7728 2 4970 E ARAPAHOE 27 0431 2 4417 E NE83404 25 7197 2 4424 E NE83406 25 3797 2 5028 E NE83407 24 3982 2 6882 E CENTURA 26 3532 2 4763 E SCOUT66 29 1743 2 4361 E NE87615 25 1238 2 4434 E NE87619 30 0267 2 4666 E NE87627 19 7126 2 4833 E SED Overall Standard Error of Differen
78. joined by lines; by default they are only joined if the x axis variable is numeric.
Predictions involving two or more factors
If these arguments are used, all prediction factors (except for those specified with only one prediction level) must be listed once and only once, otherwise these arguments are ignored.
  xaxis factor         specifies the prediction factor to be plotted on the x axis
  superimpose factors  specifies the prediction factors to be superimposed on the one panel
  condition factors    specifies the conditioning factors which define the panels. These should be listed in the order that they will be used.
Layout
  goto n               specifies the page to start at, for multi-page predictions
  saveplot filename    specifies the name of the file to save the plot to
  layout rows cols     specifies the panel layout on each page
  bycols               specifies that the panels be arranged by columns (default is by rows)
  blankpanels n        specifies that each page contains n blank panels. This sub-option can only be used in combination with the layout sub-option.
  extrablanks n and extraspan p  specify that an additional n blank panels be used every p pages. These can only be used with the layout sub-option.
Improving the graphical appearance and readability
  labcharsize n        specifies the relative size of the data points/labels (default 0.4)
  panelcharsize n      specifies the relative size of the labels used for the panels (default 1.0)
  ve
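79. likelihood
$$\ell_R = -\tfrac{1}{2}\left\{\log\det L_2' H L_2 + y' L_2 (L_2' H L_2)^{-1} L_2' y\right\} = -\tfrac{1}{2}\left\{\log\det X' H^{-1} X + \log\det H + y' P y\right\} \qquad (2.4)$$
where $P = H^{-1} - H^{-1} X (X' H^{-1} X)^{-1} X' H^{-1}$. Note that $y' P y = (y - X\hat{\tau})' H^{-1} (y - X\hat{\tau})$. The log-likelihood (2.4) depends on $X$ and not on the particular non-unique transformation defined by $L$.
The log residual likelihood (ignoring constants) can be written in terms of the mixed model equations (see equation 2.11), with $W = [X \; Z]$, as
$$\ell_R = -\tfrac{1}{2}\left\{\log\det C + \log\det R + \log\det G + y' P y\right\} \qquad (2.5)$$
where $C = W' R^{-1} W + G^{*}$, $G^{*} = \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} \end{bmatrix}$ and $P = R^{-1} - R^{-1} W C^{-1} W' R^{-1}$.
Letting $\kappa = (\gamma, \phi)$, the REML estimates of $\kappa_i$ are found by calculating the score
$$U(\kappa_i) = \frac{\partial \ell_R}{\partial \kappa_i} = -\tfrac{1}{2}\left\{\operatorname{tr}(P H_i) - y' P H_i P y\right\} \qquad (2.6)$$
and equating to zero. Note that $H_i = \partial H / \partial \kappa_i$.
The elements of the observed information matrix are
$$-\frac{\partial^2 \ell_R}{\partial \kappa_i \, \partial \kappa_j} = \tfrac{1}{2}\operatorname{tr}(P H_{ij}) - \tfrac{1}{2}\operatorname{tr}(P H_i P H_j) + y' P H_i P H_j P y - \tfrac{1}{2}\, y' P H_{ij} P y \qquad (2.7)$$
where $H_{ij} = \partial^2 H / \partial \kappa_i \, \partial \kappa_j$.
The elements of the expected information matrix are
$$E\left(-\frac{\partial^2 \ell_R}{\partial \kappa_i \, \partial \kappa_j}\right) = \tfrac{1}{2}\operatorname{tr}(P H_i P H_j) \qquad (2.8)$$
Given an initial estimate $\kappa^{(0)}$, an update of $\kappa$, $\kappa^{(1)}$, using the Fisher-scoring (FS) algorithm is
$$\kappa^{(1)} = \kappa^{(0)} + I(\kappa^{(0)}, \kappa^{(0)})^{-1}\, U(\kappa^{(0)}) \qquad (2.9)$$
where $U(\kappa^{(0)})$ is the score vector (2.6) and $I(\kappa^{(0)}, \kappa^{(0)})$ is the expected information matrix (2.8) of $\kappa$ evaluated at $\kappa^{(0)}$.
For large models or large data sets, the evaluation of the trace terms in either (2.7) or (2.8) is either not feasible or is very computer intensive. To over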
80. lines in the MERGE file. It is proposed to extend this so the orders do not need to agree and that multiple lines in the primary file could be merged with the same line of the MERGE file.
For example, assuming the field definitions define 10 fields,
  PRIMARY.DAT !skip 1 !MERGE 6 SECOND.DAT !SKIP 1 !MATCH 1 6
would obtain the first five fields from PRIMARY.DAT and the next five from SECOND.DAT, checking that the first field in each file has the same value. Thus each input record is obtained by combining information from each file before any transformations are performed.
Qualifiers relating to data input and output
qualifier: action
!READ n formally instructs ASReml to read n data fields from the data file. It is needed when there are extra columns in the data file that must be read but are only required for combination into earlier fields in transformations, or when ASReml attempts to read more fields than it needs to.
!RECODE is required when reading a binary data file with pedigree identifiers that have not been recoded according to the pedigree file. It is not needed when the file was formed using the !SAVE option but will be needed if formed in some other way (see Section 4.2).
!RREC [n] causes ASReml to read n records, or to read up to a data reading error if n is omitted, and then process the records it has. This allows data to be extracted fro
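To make the PRIMARY.DAT / SECOND.DAT merge example concrete, the record-by-record combination can be sketched in Python. This is only an illustration of the behaviour described in the text, not ASReml code: the function name `merge_records`, its parameters and the toy data are all hypothetical, and the files are assumed to be whitespace separated with the same number of lines in the same order.

```python
def merge_records(primary_lines, second_lines, first_merged_field=6,
                  match=(1, 6), skip=1, n_fields=10):
    """Combine two files line by line (hypothetical helper, not ASReml).

    Fields before `first_merged_field` come from the primary file and the
    remaining fields from the second file. `match` holds two 1-based
    positions in the combined record that must contain the same value.
    """
    merged = []
    pairs = zip(primary_lines[skip:], second_lines[skip:])
    for line_no, (a, b) in enumerate(pairs, start=1):
        record = (a.split()[:first_merged_field - 1] + b.split())[:n_fields]
        left, right = record[match[0] - 1], record[match[1] - 1]
        if left != right:
            raise ValueError(f"record {line_no}: {left!r} != {right!r}")
        merged.append(record)
    return merged


# Toy data: one heading line (skipped) and two records per file.
primary = ["id a b c d", "1 10 11 12 13", "2 20 21 22 23"]
second = ["id e f g h", "1 14 15 16 17", "2 24 25 26 27"]
rows = merge_records(primary, second)
```

Each combined record has ten fields, five from each file, and a mismatch between the two matched fields raises an error instead of silently misaligning the data.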
81. multivariate animal breeding and genetics data and the analysis of regular or irregular spatial data. ASReml provides a stable platform for delivering well established procedures while also delivering current research in the application of linear mixed models. The strength of ASReml is the use of the Average Information (AI) algorithm and sparse matrix methods for fitting the linear mixed model. This enables it to analyse large and complex data sets quite efficiently.
One of the strengths of ASReml is the wide range of variance models for the random effects in the linear mixed model that are available. There is a potential cost for this wide choice. Users should be aware of the dangers of either overfitting or attempting to fit inappropriate variance models to small or highly unbalanced data sets. We stress the importance of using data driven diagnostics and encourage the user to read the examples chapter, in which we have attempted to not only present the syntax of ASReml in the context of real analyses but also to indicate some of the modelling approaches we have found useful.
ASReml is one of several user interfaces to the underlying computational engine. Genstat (in its REML directive) and the asreml class of S language functions (Butler et al., 2007), available for S-PLUS (ASReml-S) and R (ASReml-R), use the same engine. These are available from VSN (http://www.vsni.co.uk) and have good data manipulation and graphical facilities
82. ol pl 0 lt lt 10 lt lt 1 New 7 Command file Specifying variance structures 124 Details of the variance models available in ASReml base description algebraic number of parameters identifier form corr homo s hetero s variance variance AGAU anisotropic C 1 2 3 2 w gaussian ecm eau Ci 3 os va 0 lt lt 10 lt lt 1 MATk Mat rn with Ci Mat rn see text k k 1 k w first 1 lt k lt 5 gt 0 range v shape 0 5 parameters specified by th 6 gt 0 anisotropy ratio 1 iser a anisotropy angle 0 A 1 2 metric 2 Additional heterogeneous variance models DIAG diagonal IDH Ze 9 x 0 14 9 w US unstructured Xij Qi wer general covari ance matrix OWNk user explicitly z k forms V and OV ANTEL1 1 k order UDU eeki ANTEk k antede Di d D 0 445 pendence a ee Uy 1 Uy ty 1S 9 4 lt k Ui 0 i gt j CHOL 1 1 k order LDL EL CHOLk k cholesky D 4 D 0 145 banded L 1 L l 1 lt i j lt k orea L 0 i j gt k tiks ui k te CHOL 1 C 1 k order LDL ww CHOLKC cholesky D 4 D 0 5 column _ ia ee form L 1 Ly lija 1 lt j lt i 1 lt k lt w 1 Ly 0 k 1 lt 5 lt i 7 Command file Specifying variance structures 125 Details of the variance models available in ASReml base description algebraic number of parameters identifier form corr homo s hetero s variance variance FA 1 i k order DCD
83. previous record by extracting a value for LagA from working variable V4 before loading V4 with the current value of A.
Transformation syntax
Transformation qualifiers have one of six forms, namely
  !operator             to perform an operation on the current field, for example !ABS to take absolute values
  !operator value       to perform an operation involving an argument on the current field, for example !+3 to add 3 to all elements in the field
  !operator V field     to perform an operation on the current field using the data in another field, for example !-V2 to subtract field 2 from the current field
  !V target             to reset the focus for subsequent transformations to field number target
  !V target = value     to change all of the data in a target field to a given value
  !V target = V field   to overwrite the data in a target field by the data values of another field; a special case is when field is 0, instructing ASReml to put the record number into the target field
• ! flags the presence of the transformation
• operator is one of the symbols defined in Table 5.1
• value is the argument required by the transformation
• V is the literal character and is followed by the number (target or field) of a data field; the data field is used or modified depending on the context
• Vfield may be replaced by the label of the field if it already has a label
• in the first three forms the operation is performed on the current field; this will be the fie
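The first three transformation forms (an operator alone, an operator with a value, and an operator with another field) can be mimicked on a single record with a few lines of Python. This is a hedged sketch for intuition only: `apply_transform` is a hypothetical helper, it covers just the ABS, + and - operators mentioned in the examples, and it ignores missing-value handling.

```python
def apply_transform(record, current, operator, value=None, field=None):
    """Apply one transformation to `record` (a list of numbers) in place.

    `current` and `field` are 1-based field numbers, as in the V notation.
    Hypothetical helper: only ABS, + and - are covered by this sketch.
    """
    # The argument is either a literal value or the content of another field.
    arg = record[field - 1] if field is not None else value
    x = record[current - 1]
    if operator == "ABS":
        x = abs(x)
    elif operator == "+":
        x = x + arg
    elif operator == "-":
        x = x - arg
    else:
        raise ValueError(f"operator {operator!r} not covered by this sketch")
    record[current - 1] = x
    return record


rec = [-4.0, 1.5]
apply_transform(rec, 1, "ABS")            # operator only
apply_transform(rec, 1, "+", value=3)     # operator with a value
apply_transform(rec, 1, "-", field=2)     # operator with another field
```

After the three calls the first field holds |-4| + 3 - 1.5 and the second field is unchanged, which mirrors the rule that these forms always act on the current field.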
84. required. In this example, !skip 1 tells ASReml to read the data from nin89.asd but to ignore the first line in this file, the line containing the field labels. The data file line can also contain qualifiers that control other aspects of the analysis. These qualifiers are presented in Section 5.8.
Tabulation
Tabulate statements provide a simple way of exploring the structure of a data file. They appear immediately before or after the model line. In this case the 56 simple variety means for yield are formed and written to a .tab output file. See Chapter 10 for a discussion of tabulation.
  NIN Alliance trial 1989
   variety !A
   id
   pid
   ...
   row 22
   column 11
  nin89.asd !skip 1
  tabulate yield ~ variety
  yield ~ mu variety !r repl
  predict variety
  0 0 1
  repl 1
  repl 0 IDV 0.1
Specifying the terms in the mixed model
The linear mixed model is specified as a list of model terms and qualifiers. All elements must be space separated. ASReml accommodates a wide range of analyses. See Section 2.1 for a brief discussion and general algebraic formulation of the linear mixed model. The model specified here for the NIN data is a simple random effects RCB model including fixed variety effects and random replicate effects. The reserved word mu fits a constant term inter
85. run are derived from the same batch of larvae whereas between runs the bloodworms come from different sources This defines a block structure of the form run tmt variety run run tmt run tmt variety run run tmt pair tmt Combining the two provides the full block structure for the design namely run run variety run tmt run tmt variety run run variety run tmt units run pair run tmt pair tmt In line with the aims of the experiment the treatment structure comprises va riety and treatment main effects and treatment by variety interactions In the traditional approach the terms in the block structure are regarded as random and the treatment terms as fixed The choice of treatment terms as fixed or random depends largely on the aims of the experiment The aim of this example is to select the best varieties The definition of best is somewhat more com plex since it does not involve the single trait sqrt rootwt but rather two traits 15 Examples 275 namely sqrt rootwt in the presence absence of bloodworms Thus to minimise selection bias the variety main effects and thence the tmt variety interactions are taken as random The main effect of treatment is fitted as fixed to allow for the likely scenario that rather than a single population of treatment by variety effects there are in fact two populations control and treated with a different mean for each There is evidence of this prior to analysis wi
86. separated values file from Excel
• Prepare a job file with filename extension .as
• Run the job file with ASReml
• Review the various output files, revise the job and re-run it, or extract pertinent results for your report.
So you need an ASCII editor to prepare input files and review and print output files. We directly provide two options.
ASReml-W
ASReml-W is a graphical tool distributed by VSN (http://www.vsni.co.uk) allowing the user to edit and run ASReml programs and then view the output. It is available on the following platforms:
• Windows 32-bit
• Windows 64-bit
• Linux 32-bit
• Linux 64-bit and
• Sun/Solaris 32-bit
ASReml-W has a built-in help system explaining its use.
ConTEXT
ConTEXT is a third-party freeware text editor with programming extensions which make it a suitable environment for running ASReml under Windows. The ConTEXT directory on the CD-ROM includes installation files and instructions for configuring it for use in ASReml. Full details of ConTEXT are available from http://www.context.cx.
1.4 How to use this guide
The guide consists of 15 chapters. Chapter 1 introduces ASReml and describes the conventions used in the guide. Chapter 2 outlines some basic theory which you may need to come back to. New ASReml users are advised to read Chapte
87. the filename does not end in .csv or the !CSV qualifier is not set, commas are treated as white space
• characters following # on a line are ignored, so this character may not be used in alphanumeric fields
• blank spaces, tabs and commas must not be used (embedded) in alphanumeric fields unless the label is enclosed in quotes; for example, the name Willow Creek would need to appear in the data file as 'Willow Creek' to avoid error
• the $ symbol must not be used in the data file
• alphanumeric fields have a default size of 16 characters. Use the !LL qualifier to extend the size of factor labels stored
• extra data fields on a line are ignored
• if there are fewer data items on a line than ASReml expects, the remainder are taken from the following line(s), except in .csv files where they are taken as missing. If you end up with half the number of records you expected, this is probably the reason
• all lines beginning with ! followed by a blank are copied to the .asr file as comments for the output; their contents are ignored
• a data file line may not exceed 2000 characters; if the data fields will not fit in 2000 characters, put some on the next line.
Fixed format data files
The format must be supplied with the !FORMAT qualifier, which is described in Table 5.5. However, if all fields are present and are separated, the file can be read free format.
Preparing data files in Excel
Many users find it
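The free-format rule about short lines (missing items are taken from the following line, except in .csv files) is the one most likely to surprise, so here is a small Python sketch of it. It is an illustration under simplifying assumptions, not a description of ASReml internals: `read_free_format` is a hypothetical function, quoted labels and comment characters are not handled, and anything left over on a line once a record is complete is simply discarded.

```python
def read_free_format(lines, n_fields, csv=False):
    """Split free-format data lines into records of n_fields items.

    Hypothetical helper illustrating the rules in the text only.
    """
    records, pending = [], []
    for line in lines:
        if csv:
            # In .csv files a short line is padded with missing values.
            items = [item.strip() or None for item in line.split(",")]
            items += [None] * (n_fields - len(items))
            records.append(items[:n_fields])
            continue
        # Otherwise commas count as white space, like blanks and tabs.
        pending.extend(line.replace(",", " ").split())
        if len(pending) >= n_fields:
            records.append(pending[:n_fields])  # extra fields are ignored
            pending = []
    return records


lines = ["A 1 2", "B 3 4", "C 5 6", "D 7 8"]
as_expected = read_free_format(lines, 3)   # three fields defined
too_many = read_free_format(lines, 4)      # four fields defined by mistake
```

With three fields defined the four lines give four records. With four fields defined each record borrows from the next line, so only two records result, which is the "half the number of records" symptom described above.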
88. the in ner dimension less 10 ASReml prints an observed variance matrix calculated from the BLUPs The observed correlations are printed in the upper tri angle Since this matrix is not well scaled as an es timate of the underlying variance component ma trix a rescaled version is also printed scaled ac cording to the fitted variance parameters The primary purpose for this output is to provide rea sonable starting values for fitting more complex variance structure The correlations may also be of interest After a multivariate analysis a sim ilar matrix is also provided calculated from the residuals placed in the pvc file when postprocessing with a pin file these are residuals that are more than 3 5 stan dard deviations in magnitude these in the are printed in the second column given if a predict statement is supplied in the as file the REML log likelihood is given for each iteration The REML log likelihood should have converged 13 Description of output files 211 Table of output objects and where to find them ASReml output object found in comment residuals yht file and in binary form in dpr file these are printed in column 3 Furthermore for multivariate analyses the residuals will be in data order traits within records However in a univariate analysis with missing values that are not fitted there will be fewer residuals than data records there will be no residual where the data was missi
89. the title for the job and is used to identify the analysis for future reference.
Reading the data
The data fields are defined before the data file name is specified. Field definitions must be given for all fields in the data file and in the order in which they appear in the data file. Data field definitions must be indented. In this case there are 11 data fields (variety ... column) in nin89.asd, see Section 3.3. The !A after variety tells ASReml that the first field is alphanumeric and the 4 after repl tells ASReml that the field called repl (the fifth field read) is a numeric factor with 4 levels. Similarly for row and column. The other data fields include variates (yield) and various other variables.
  NIN Alliance trial 1989
   variety !A
   id
   pid
   raw
   repl 4
   nloc
   yield
   lat
   long
   row 22
   column 11
  nin89.asd !skip 1
  tabulate yield ~ variety
  yield ~ mu variety !r repl
  predict variety
  0 0 1
  repl 1
  repl 0 IDV 0.1
See Section 5.7, Section 5.8, Chapter 10 and Chapter 6.
The data file line
The data file name is specified immediately after the last data field definition. Data file qualifiers that relate to data input and output are also placed on this line if they are
90. updating of the variance parameters. The exact action of these codes in setting bounds for parameters depends on the particular model.
!GP (the default in most cases) attempts to keep the parameter in the theoretical parameter space and is activated when the update of a parameter would take it outside its space. For example, if an update would make a variance negative, the negative value is replaced by a small positive value. Under the !GP condition, repeated attempts to make a variance negative are detected and the value is then fixed at a small positive value. This is shown in the output in that the parameter will have the code B rather than P appended to the value in the variance component table.
!GU (unrestricted) does not limit the updates to the parameter. This allows variance parameters to go negative and correlation parameters to exceed ±1. Negative variance components may lead to problems; the mixed model coefficient matrix may become non-positive definite. In this case the sequence of REML log-likelihoods may be erratic and you may need to experiment with starting values.
!GF fixes the parameter at its starting value.
!GZ only applies to FA and FACV models and fixes the corresponding parameter to zero (0.00).
For multiple parameters the form GXXX
91. use of the !AVERAGE and/or !PRESENT qualifiers will usually resolve the problem. The !PRESENT qualifier enables the construction of means by averaging only the estimable cells of the hyper-table. It is regularly used for nested factors, for example locations nested in regions.
Table 10.1 is a list of the prediction qualifiers with the following syntax:
• f is an explanatory variable which is a factor
• t is a list of terms in the fitted model
• v is a list of explanatory variables
Table 10.1: List of prediction qualifiers
qualifier: action
Controlling formation of tables
!AVERAGE f [weights] is used to formally include a variable in the averaging set and to explicitly set the weights for averaging. Variables that only appear in random model terms are not included in the averaging set unless included with this qualifier. The default for weights is equal weights. weights can be expressed like {3*1 0 2*1}/5 to represent the sequence 0.2 0.2 0.2 0 0.2 0.2. The string inside the curly brace is expanded first and the expression n*v means n occurrences of v. A separate !AVERAGE qualifier is required for each variable requiring explicit weights or to be added to the default averaging set.
!PARALLEL [v] without arguments means all classify variables are expanded in parallel. Otherwise list the variables from the classify set whose levels are to be taken in parallel
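The weights shorthand for the AVERAGE qualifier can be checked with a short Python sketch. It assumes the expression has the form `{3*1 0 2*1}/5`, that is, a brace-enclosed list in which n*v stands for n occurrences of v, optionally followed by a divisor; that reading is inferred from the sequence quoted in the text. The function `expand_weights` is hypothetical and is not part of ASReml.

```python
import re


def expand_weights(expr):
    """Expand an expression such as '{3*1 0 2*1}/5' into a list of floats.

    Hypothetical helper: the braces are expanded first, then every weight
    is divided by the optional divisor that follows the closing brace.
    """
    m = re.fullmatch(r"\s*\{(.*)\}\s*(?:/\s*([0-9.]+))?\s*", expr)
    if m:
        body, divisor = m.group(1), float(m.group(2) or 1)
    else:
        body, divisor = expr, 1.0
    weights = []
    for token in body.split():
        if "*" in token:
            count, value = token.split("*")
            weights.extend([float(value)] * int(count))  # n*v: n copies of v
        else:
            weights.append(float(token))
    return [w / divisor for w in weights]


weights = expand_weights("{3*1 0 2*1}/5")
```

The result has six entries, one per level of the factor, with a zero weight for the fourth level and equal weights summing to one over the remaining five.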
92. v leg v k spl v k and pol v k The points are specified here so that they can be included in the appropriate design matrices vis the name of a data field p is the list of values at which prediction is required See GKRIGE for special conditions per taining to fac x y prediction is used to read predict_points for several variables from a file f vlist is the names of the variables having values defined If the file contains unwanted fields put the pseudo variate label skip in the appropriate position in vlist to ignore them The file should only have numeric values predict_points cannot be specified for design factors is used with SECTION v and COLFAC v to instruct ASReml to setup the R structures for multi environment spatial analysis v is the name of a factor or variate containing row numbers 1 where n is the number of rows on which the data is to be sorted See SECTION for more detail specifies the factor in the data that defines the data sections This qualifier enables ASReml to check that sections have been correctly dimensioned but does not cause ASReml to sort the data unless ROWFAC and COLFAC are also specified Data is assumed to be presorted by section but will be sorted on row and column within section The following is a basic example assuming 5 sites sections When ROWFAC v and COLFAC v are both specified ASReml generates the R structures for a standard AR amp AR spatial analysis The R structu
93. value indent them to avert this message 14 Error messages 230 List of warning messages and likely meaning s warning message likely meaning Warning Fixed levels for factor Warning Initial gamma value is Zero Warning Invalid argument Warning It is usual to include Trait Warning LogL Converged Parameters Not Converged Warning LogL not converged Warning Missing cells in table Warning More levels found in term Warning PREDICT LINE IGNORED TOO MANY Warning PREDICT statement is being ignored Warning Second occurrence of term dropped Warning Spatial mapping information for side Warning Standard errors Warning SYNTAX CHANGE text may be invalid Warning The A qualifier ignored when reading BINARY data Warning The SPLINE qualifier has been redefined Warning The X Y G qualifiers are ignored There is no data to plot Warning The estimation was ABORTED Warning The labels for predictions are erroneous user nominated more levels than are permit ted constraint parameter is probably wrongly as signed fix the argument in multivariate analysis model you may need more iterations restart to do more iterations see CONTINUE missing cells are normally not reported consider setting levels correctly limit is 100 because it contains errors if you really want to fit this term twice create a copy with another name gives details so you can
94. value from the previous non-missing record is assumed in that position.
supplies a Fortran-like FORMAT statement for reading fixed format files. A simple example is !FORMAT(3I4,5F6.2), which reads 3 integer fields and 5 floating point fields from the first 42 characters of each data line. A format statement is enclosed in parentheses and may include 1 level of nested parentheses, for example !FORMAT(4x,3(I4,f8.2)). Field descriptors are
• rX to skip r character positions
• rAw to define r consecutive fields of w characters width
• rIw to define r consecutive fields of w characters width, and
• rFw.d to define r consecutive fields of w characters width; d indicates where to insert the decimal point if it is not explicitly present in the field
where r is an optional repeat count. In ASReml, the A and I field descriptors are treated identically and simply set the field width. Whether the field is interpreted alphabetically or as a number is controlled by the !A qualifier.
Other legal components of a format statement are
• the , character, required to separate fields; blanks are not permitted in the format
• the / character, which indicates the next field is to be read from the next line. However, a / on the end of a format to skip a line is not honoured
• BZ; the default action is to read blank
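The field descriptors are easiest to follow on a concrete line. The Python sketch below parses a flat format such as (3I4,5F6.2) and slices one fixed-width line with it. It is an illustration only: `parse_format` and `read_fixed` are hypothetical names, nested parentheses, / and BZ are not handled, and returning None for a blank F field is an assumption made for the sketch.

```python
import re

DESCRIPTOR = re.compile(r"(\d*)([XAIF])(\d*)(?:\.(\d+))?", re.IGNORECASE)


def parse_format(fmt):
    """Turn '(3I4,5F6.2)' into a list of (kind, width, decimals) tuples."""
    fields = []
    for item in fmt.strip().strip("()").split(","):
        m = DESCRIPTOR.fullmatch(item.strip())
        if m is None:
            raise ValueError(f"descriptor {item!r} not covered by this sketch")
        repeat, kind, width, dec = m.groups()
        kind = kind.upper()
        if kind == "X":
            # rX skips r character positions.
            fields.append(("X", int(repeat or 1), 0))
        elif not width:
            raise ValueError(f"descriptor {item!r} needs a width")
        else:
            fields.extend([(kind, int(width), int(dec or 0))] * int(repeat or 1))
    return fields


def read_fixed(line, fields):
    """Slice one data line according to the parsed descriptors."""
    values, pos = [], 0
    for kind, width, dec in fields:
        text = line[pos:pos + width].strip()
        pos += width
        if kind == "X":
            continue
        if kind == "F" and text and "." not in text and dec > 0:
            # No explicit decimal point: the last `dec` digits are the fraction.
            values.append(int(text) / 10 ** dec)
        elif kind == "F":
            values.append(float(text) if text else None)
        else:
            values.append(text)  # A and I only set the field width
    return values


fields = parse_format("(3I4,5F6.2)")
line = ("".join(f"{s:>4}" for s in ("1", "2", "3"))
        + "".join(f"{s:>6}" for s in ("1.50", "250", "3.25", "4.00", "575")))
values = read_fixed(line, fields)
```

The parsed widths add up to the 42 characters quoted in the text, and the two F fields written without a decimal point (250 and 575) are read as 2.5 and 5.75 because d = 2.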
95. values in variable v. MBF stands for My Basis Function and uses the same mechanism as the leg(), pol() and spl() model functions but with covariates supplied by the user. The file f should contain 1+n fields, where the first field contains the values which are in the data variable or at which prediction is required, and the remaining n fields define the corresponding covariate values. !SKIP k is an optional qualifier which requests the first k lines of the file be ignored.
When missing values occur in the design, ASReml will report this fact and abort the job unless !MVINCLUDE is specified (see Section 6.10), in which case missing values are treated as zeros. Use the !D transformation to drop the records with the missing values.
instructs ASReml to discard records which have missing values in the design matrix (see Section 6.10).
suppresses the graphic display of the variogram and residuals which is otherwise produced for spatial analyses in the PC and SUN versions. This option is usually set on the command line using the option letter N (see Section 12.3 on graphics). The text version of the graphics is still written to the .res file.
sets hardcopy graphics file type to .ps.
List of occasionally used job control qualifiers
qualifier: action
!PVAL v p is a mechanism for specifying the particular points to be predicted for covariates modelled using fac
96. var u2 o3Igg we assume gt var u2 fi 2 Q La4 where o and c are the tmt variety interaction variances for control and treated respectively This model can be achieved using a diagonal variance struc ture for the treatment part of the interaction We also fit a separate run variance for each level of tmt and heterogeneity at the residual level by including the uni tmt 2 term We have chosen level 2 of tmt as we expect more variation for the exposed treatment and thus the extra variance component for this term should be positive Had we mistakenly specified level 1 then ASReml would have estimated a negative component by setting the GU option for this term The portion of the ASReml output for this analysis is 6 LogL 343 428 52 1 1498 262 df i 1 components constrained 7 LogL 343 234 S2 1 1531 262 df 8 LogL 343 228 S2 1 1572 262 df 9 LogL 343 228 S2 1 1563 262 df Source Model terms Gamma Component Comp SE C variety 44 44 2 01903 2 33451 3 01 QP run 66 66 0 276045 0 319178 0 59 OP pair 132 132 0 853941 0 987372 2 59 QP uni tmt 2 264 264 0 176158 0 203684 032 QP Variance 264 262 1 00000 1 15625 2 17 OF tmt variety DIAGonal 1 1 30142 1 50477 2 26 0 tmt variety DIAGonal 2 0 321901 0 372199 0 82 OU tmt run DIAGonal 1 1 20098 1 38864 2 18 OY tmt run DIAGonal 2 1 92457 2 22530 3 07 OU Analysis of Variance NumDF DenDF F_inc Prob 7 mu 1 56 5 1276 73 lt 001 4 tmt 1 60 6 448 83 lt 001 The estimated variance com
97. we cannot recommend the use of this technique for general use. It is included in the current version of ASReml for advanced users. It is highly recommended that its use be accompanied by some form of cross-validatory assessment for the specific dataset concerned. For instance, one way of doing this would be by simulating data using the same design and using parameter values similar to the parameter estimates achieved, such as used in Millar and Willis (1999).
The standard GLM Analysis of Deviance (!AOD) should not be used when there are random terms in the model, as the variance components are reestimated for each submodel.
6.10 Missing values
Missing values in the response
It is sometimes computationally convenient to estimate missing values, for example in spatial analysis of regular arrays, see example 3a in Section 7.3. Missing values are estimated if the model term mv is included in the model. Formally, mv creates a factor with a covariate for each missing value. The covariates are coded 0 except in the record where the particular missing value occurs, where it is coded 1.
  NIN Alliance Trial 1989
   variety
   row 22
   column 11
  nin89.asd !skip 1
  yield ~ mu variety !r repl !f mv
  1 2
  11 column AR1 0.424
  22 row AR1 0.904
The action when mv is omitted from the model depends on whether a univariate or multivariate analysis is being performe
98. weights Check the data Either the field has alphanumeric values but has not been declared using the A qualifier or there is not enough space to hold the levels of the factor To increase the levels insert the expected number of levels after the A or I qualifier in the field definition Use WORKSPACE s to increase the workspace available to ASReml If the data set is not extremely big check the data summary Maybe the response variable is all missing there must be at least 3 distinct data values for a spline term 14 Error messages 235 Alphabetical list of error messages and probable cause s remedies error message probable cause remedy Insufficient workspace invalid analysis trait number Invalid binary data Invalid Binomial Variable Invalid definition of factor Invalid error structure for Multivariate Analysis Invalid factor definition Invalid factor in model Invalid model factor Invalid SOURCE in R structure definition Invalid weight filter column number Iteration aborted because of singularities Iteration failed Maximum number of special structures exceeded Maximum number of variance parameters exceeded If ASReml has not obtained the maximum available workspace then use WORKSPACE to increase it The problem could be with the way the model is specified Try fitting a sim pler model or using a reduced data set to dis cover where the workspace
99. which sets command line options and arguments from within the job. If the first line of the .as file contains a qualifier other than !DOPATH, it is interpreted as setting command line options, and the title is taken as the next line. The option string actually used by ASReml is the combination of what is on the command line and what is on the job control line, with options set in both places taking arguments from the command line. Arguments on the top job control line are ignored if there are arguments on the command line. This section defines the options. Arguments are discussed in detail in a following section. Command line options are not case sensitive and are combined in a single string preceded by a minus sign, for example -LNW128. The options can be set on the command line or on the first line of the job, either as a concatenated string in the same format as for the command line or as a list of qualifiers. For example, the command line
  ASReml -h22r jobname 1 2 3
could be replaced with
  ASReml jobname
if the first line of jobname.as was either
  !-h22r 1 2 3
or
  !HARDCOPY !EPS !RENAME !ARGS 1 2 3
Table 12.1 presents the command line options available in ASReml with brief descriptions. It also specifies the equivalent qualifier name used on the top job control line. Detailed descriptions follow.
Table 12.1: Command line options
option qualifier type action
Frequently use
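The precedence rule described above (options are combined from both places, the command line wins where an option is set twice, and job control line arguments are ignored when the command line supplies arguments) can be sketched in Python. This is only an illustration of the rule as stated in the text, not ASReml's own parser; the function name and the dictionary representation of options are assumptions.

```python
def combine_job_settings(cmd_options, cmd_args, job_options, job_args):
    """Combine command line settings with those on the top job control line.

    Options are given as dicts mapping an option letter to its argument
    (None when the option takes no argument). This representation is a
    hypothetical one chosen for the sketch.
    """
    # Option letters are not case sensitive, so normalise to upper case.
    combined = {letter.upper(): value for letter, value in job_options.items()}
    # An option set in both places takes its argument from the command line.
    combined.update({letter.upper(): value for letter, value in cmd_options.items()})
    # Job control line arguments are ignored if the command line has arguments.
    args = list(cmd_args) if cmd_args else list(job_args)
    return combined, args


# Workspace set in both places; arguments given only on the job control line.
options, args = combine_job_settings({"w": 128}, [], {"W": 64, "N": None}, ["1", "2", "3"])
print(options, args)
```

Here the workspace argument 128 from the command line replaces the 64 on the job control line, while the arguments 1 2 3 are still taken from the job control line because the command line supplied none.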
100. [Figure 15.11: Estimated difference between control and treated for each variety plotted against estimate for control. Horizontal axis: control BLUP.]
The independence of the deviations from regression and the control BLUPs, and the dependence between the control-treated differences and the control BLUPs, are clearly illustrated in Figures 15.10 and 15.11. In this example the two measures have provided very different rankings of the varieties. The choice of tolerance measure depends on the aim of the experiment. In this experiment the aim was to identify tolerance which is independent of inherent vigour, so the deviations from regression measure is preferred.
15.9 Balanced longitudinal data - Random coefficients and cubic smoothing splines - Oranges
We now illustrate the use of random coefficients and cubic smoothing splines for the analysis of balanced longitudinal data. The implementation of cubic smoothing splines in ASReml was originally based on the mixed model formulation presented by Verbyla et al. (1999). More recently the technology has been enhanced so that the user can specify knot points; in the original approach the knot points were taken to be the ordered set of unique values of the explanatory variable. The specification of knot points is particularly useful if the number of unique values in the explanatory variable is large or if units are measured at different times. The data we use was originally reported by Draper and Smith (1998, ex24N, p559) and has r
101. 0.000
!DIAG causes the pedigree identifiers, the diagonal elements of the Inverse of the Relationship and the inbreeding coefficients for the individuals (calculated as the diagonal of A - I) to be written to AINVERSE.DIA.
!GIV instructs ASReml to write out the A-inverse in the format of .giv files.
List of pedigree file qualifiers
qualifier description
!GROUPS g includes genetic groups in the pedigree. The first g lines of the pedigree identify genetic groups (with zero in both the sire and dam fields). All other lines must specify one of the genetic groups as sire or dam if the actual parent is unknown.
!INBRED generates pedigree for inbred lines. Each cross is assumed to be selfed several times to stabilize as an inbred line, as is usual for cereals, before being evaluated or crossed with another line. Since inbreeding is usually associated with strong selection, it is not obvious that a pedigree assumption of covariance of 0.5 between parent and offspring actually holds. Do not use the !INBRED qualifier with the !MGS or !SELF qualifiers.
!MAKE tells ASReml to make the A-inverse rather than trying to retrieve it from the ainverse.bin file.
!MGS indicates that the third identity is the sire of the dam rather than the dam.
!REPEAT tells ASReml to ignore repeat occurrences of lines in the pedigree file. Warning: use of this option will avoid the check that animals occur in chron
102. 0 35006 df 1 components constrained 6 LogL 1417 98 S2 1 0000 35006 df 1 components constrained 7 LogL 1417 77 52 1 0000 35006 df 1 components constrained 8 LogL 1417 62 52 1 0000 35006 df 1 components constrained 9 LogL 1417 28 52 1 0000 35006 df 10 LogL 1417 23 52 1 0000 35006 df 16 LogL 1417 23 S2 1 0000 35006 df Source Model terms Gamma Component Comp SE C at Trait 1 age grp 49 49 0 132682E 02 0 132682E 02 2 02 0P at Trait 2 age grp 49 49 0 908220E 03 0 908220E 03 1 15 OP at Trait 4 age grp 49 49 0 175614E 02 0 175614E 02 1 13 OP at Trait 5 age grp 49 49 0 223617E 03 0 223617E 03 1 73 OP 15 Examples 305 at Treat 1 sex at Trait 2 sex ailirait 3 8ex at Trait 5 sex Residual grp grp grp grp 49 49 0 902586 49 49 15 3623 49 49 0 280673 49 49 1 42136 Unstru 1 1 7 47555 0 902586 15 3623 0 280673 1 42136 7 47555 Covariance Variance Correlation Matrix UnStructured Residual 7 476 0 4918 4 768 1257 0 1139 0 5049 0 9377 2 221 0 4208 1 612 0 1339 0 1875 0 4381 0 3425 0 1056 0 4864 0 2691 3 345 0 4869E 01 0 2473 0 1333 0 3938 0 1298 0 1174 1 333 Covariance Variance Correlation Matrix UnStructured Tr tag 3 898 0 8164 4 877 9 154 0 3029 0 2971 0 6021E 01 0 4375 0 6154 1 107 Covariance Variance Correlation Matrix UnStructured at Tr 1 dam 0 9988 0 5881 0 7024 0 7018 Covariance Variance Correlation Matrix UnStructured 0 6157E 01 3 714 0 5511 2
103. 0000 0 164524 12 13 O P Analysis of Variance NumDF DenDF_con F_inc F_con M P_con 7 mu 1 32 0 8981 48 1093 05 lt 001 3 littersize 1 31 4 27 85 46 43 A lt 001 1 dose 2 24 0 12 05 11 42 A lt 001 2 sex i B01 7 58 27 58 27 A lt 001 Part 4 shows what happens if we wrongly drop dam from this model Even if a random term is not significant it should not be dropped from the model if it represents a strata of the design as in this case 15 Examples 250 Source Model terms Gamma Component Comp SE C Variance 322 317 1 00000 0 253182 12 59 OP Analysis of Variance NumDF DenDF_con F_inc F_con M P_con 7 mu 1 317 0 47077 31 3309 42 lt 001 3 littersize 1 317 0 68 48 146 50 A lt 001 1 dose 2 21120 60 99 58 43 A lt 001 2 sex 1 317 0 24 52 24 52 A lt 001 15 4 Source of variability in unbalanced data Volts In this example we illustrate an analysis of unbalanced data in which the main aim is to determine the sources of variation rather than assess the significance of imposed treatments The data are taken from Cox and Snell 1981 and involve an experiment to examine the variability in the production of car voltage regulators Standard production of regulators involves two steps Regulators are taken from the production line to a setting station and adjusted to operate within a specified voltage range From the setting station the regulator is then passed to a testing station where it is tested and returned if outside the requ
104. 019 3 614 0 4506E 01 0 1407 0 1021 0 7166 Analysis of Variance 15 Tr 16 Tr 17 Tr 19 Tr age sex age brr sex 0 5763 0 3689 0 1849 0 3899E 01 0 6148 0 7217 0 7085E 01 0 2415E 01 0 3041 Q b027E 02 0 6117 0 4104E 01 0 1853 0 1635 0 5176 0 4380 0 2045E 01 0 3338 0 4108E 01 0 7407 NumDF 5 15 5 4 0 4672 0 2570 ereTe i Lit F ine 99 16 116252 59 94 5 10 2 89 3 50 eNA 1 80 13 86 OP OP OP Q P OU There is no guarantee that unstructured variance component matrices will be positive definite unless GP qualifier is set This example highlights this issue We used the GU qualifier on the maternal component to obtain the matrix 0 9988 0 5881 0 5881 0 7018 ASReml reports the correlation as 0 7024 which it obtains by ignoring the sign in 0 7018 This is the maternal component for ywt Since it is entirely reasonable to expect maternal influences on growth to have dissipated at 12 months of age it would be reasonable to refit the model omitting at Tr 2 dam and changing the dimension of the G structure Bibliography Abramowitz M and Stegun I A eds 1965 Handbook of Mathematical Func tions Dover Publications New York Breslow N 2003 Whither PQL Technical Report 192 UW Biostatistics Working Paper Series University of Washington Breslow N and Lin X 1995 Bias correction in generalised linear mixed models with a single component of dis
105. 05 178.7688 E
    3.0000  28172 7615  176.9880 E
    4.0000  2986.4725   178.7424 E
  522.0000  2784.7683   179.1541 E
  523.0000  2904.9421   179.5383 E
  524.0000  2740.0330   178.8465 E
  525.0000  2669.9565   179.2444 E
  526.0000  2385.9806    44.2159 E
  527.0000  2697.0670   133.4406 E
  528.0000  2727.0324   112.2650 E
  529.0000  2699.8243   103.9062 E
  530.0000  3010 S907   112.3080 E
  531.0000  3020.0720   112.2553 E
  532.0000  3067.4479   112.6645 E
  SED: Overall Standard Error of Difference 245.8
Note that the replicated check lines have lower SE than the unreplicated test lines. There will also be large differences in SEDs. Rather than obtaining the large table of all SEDs, you could do the prediction in parts
  predict var 1 525 column 5 5
  predict var 526 532 column 5 5 SED
to examine the matrix of pairwise prediction errors of variety differences.
15.8 Paired Case-Control study - Rice
These data come from an experiment conducted to investigate the tolerance of rice varieties to attack by the larvae of bloodworms. The data have been kindly provided by Dr Mark Stevens, Yanco Agricultural Institute. A full description of the experiment is given by Stevens et al. (1999). Bloodworms are a significant pest of rice in the Murray and Murrumbidgee irrigation areas, where they can cause poor establishment and substantial yield loss. The experiment commenced with the transplanting of rice seedlings into trays. Each tray contained 32 seedlings and the trays were paired so that a
106. 1 2 5.5 75.5 19.9
  0103 3 21 1 3 7.0 76.6 21.9
  4013 3 43 35 1 7.9 75.9 22.6
  4013 3 43 35 2 7.8 70.3 23.9
  4013 3 43 35 3 9.0 76.2 25.4
  4014 3 43 35 1 8.3 66.5 22.2
  4014 3 43 35 2 7.8 63.9 23.3
  4014 3 43 35 3 9.9 69.8 25.5
  4015 3 43 35 1 6.9 75.1 20.0
  4015 3 43 35 2 7.6 71.2 20.3
  4015 3 43 35 3 8.5 78.1 21.7
8.2 Model specification
The syntax for specifying a multivariate linear model in ASReml is
  Y-variates ~ fixed !r random !f sparse_fixed
• Y-variates is a list of traits,
• fixed, random and sparse_fixed are as in the univariate case (see Chapter 6) but involve the special term Trait and interactions with Trait.
The design matrix for Trait has a level (column) for each trait.
• Trait by itself fits the mean for each variate,
• in an interaction, Trait.Fac fits the factor Fac for each variate and Trait.Cov fits the covariate Cov for each variate.
ASReml internally rearranges the data so that n data records containing t traits each become n sets of t analysis records indexed by the internal factor Trait, that is, nt analysis records ordered Trait within data record. If the data is already in this long form, use the !ASMV t qualifier to indicate that a multivariate analysis is required.
8.3 Variance structures
Using the notation of Chapter 7, consider a multivariate analysis with t traits and n units in which the data are ordered traits within units. An algebraic express
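The rearrangement described in Section 8.2 above, in which n data records of t traits become nt analysis records ordered Trait within data record, can be sketched in Python. This is an illustration of the ordering only, not ASReml's internal code; the function name, the dictionary layout and the trait names gfw and fd are assumptions made for the sketch.

```python
def to_trait_within_record(records, traits):
    """Expand n wide records into n*t long records, Trait within record.

    records: list of dicts, one per data record, keyed by trait name.
    traits:  list of trait names in the order they appear in the model.
    """
    long_rows = []
    for record_number, record in enumerate(records, start=1):
        # All t traits of one data record are emitted before moving on,
        # which gives the "Trait within data record" ordering.
        for trait_level, trait_name in enumerate(traits, start=1):
            long_rows.append(
                {
                    "record": record_number,
                    "Trait": trait_level,  # level of the internal factor Trait
                    "y": record[trait_name],
                }
            )
    return long_rows


# Two data records with two traits each give four analysis records.
wide = [{"gfw": 7.9, "fd": 22.6}, {"gfw": 8.3, "fd": 22.2}]
for row in to_trait_within_record(wide, ["gfw", "fd"]):
    print(row)
```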
107. 1 4 30 1 4 3 22 8 19 1
Annotations on the data listing: optional field labels (nloc yield lat long row column); file augmented by missing values for first 15 plots and 3 buffer plots and variety coded LANCER to complete 22x11 array; buffer plots between reps; original data.
Note that
• the pid, raw, repl and yield data for the missing plots have all been made NA (one of the three missing value indicators in ASReml, see Section 4.2),
• variety is coded LANCER for all missing plots; one of the variety names must be used but the particular choice is arbitrary.
3.4 The ASReml command file
By convention an ASReml command file has a .as extension. See Chapters 5, 6 and 7 for details. The file defines
• a title line to describe the job,
• labels for the data fields in the data file and the name of the data file,
• the linear mixed model and the variance model(s) if required,
• output options including directives for tabulation and prediction.
Below is the ASReml command file for an RCB analysis of the NIN field trial data, highlighting the main sections. Note the order of the main sections:
  title line
  data field definition
  data file name and qualifiers
  tabulate statement
  linear mixed model definition
  predict statement
  variance model specification
The title line
The first non-blank line in an ASReml command file is taken as
108. 11i 1 SLOPES FOR LOG ABS RES on LOG PV for Section 1 0 15 x xk kkk xk OK KOK OK xk kkk kkk kkk OK 2K KK KOK KOK KOK kk OK KOK 2 2 2 K K k kK k k k kk K kK kK k OK k x x DK KK k k k k 2K kK kk kk k k k k x ORK xk kk kk RK k k kk 2 2 2K kkk k kk k k k k k k 2k k 2K k k K k xk Min Mean Max 24 873 0 27954 16 915 omitting 18 zeros 13 Description of output files 203 0 0 T2 29 87 1 44 11 18 18 40 29 29 75 99 114 257 333 227 167 183 277 Spatial diagnostic statistics of Residuals Residual Plot and Autocorrelations lt LOo xXH gt se 0 077 xxx X O x x x gt X lo XxXXxX X x xxxt Oo xxtxHxx xxx xXx x xX o XxxXXx xXXK ooL lt Oo x x xXx x H lt lt lt lt lt 00 xX lt O lt lt LLLoo L lt lt lt lt 0 OL o 1 0 28 0 38 0 50 2 0 47 6 27 0 39 3 0 05 0 11 0 19 Residuals Percentage 0 0 52 20 32 14 0 3 2 84 87 103 43 24 218 332 352 319 356 335 352 323 ie 64 26 6 a9 81 90 174 253 183 288 ie 11 30 0 x 6 t Ete ee 0 65 0 77 0 51 0 56 0 28 0 35 of sigma 6 979 0 0 0 0 0 132 26 0 63 15 87 8 4 23 Pa 41 15 51 25 45 18 30 56 9 81 130 94 10 55 23 64 130 84 122 19 38 29 58 63 162 52 26 0 97 189 118 124 14 52 56 130 188 29 78 oOo0Or e O u 99 32 32 12
109. 180D 02 452 0 2445 0 09648 0 09049 10 1 100 SE SK 3 3 0 1843D 01 452 0 2666 0 01507 0 00853 0 11 1 0 0 4 3 0 1095D 02 452 0 2464 0 08957 0 08353 1 1 0 1 0 0 5 3 0 1271D 02 452 0 2425 0 10390 0 09795 10 0411 0 SE SK 6 3 0 9291D 01 452 0 2501 0 07594 0 06981 9 ipii 7 3 0 9362D 01 452 0 2499 0 07652 0 07039 oO OO tt 2 8 8 3 0 1357D 02 452 0 2406 0 11091 0 10501 O e he OG Bh oD SE SRK 9 3 0 9404D 01 452 0 2498 0 07687 0 07074 OC 2 tt Oo 2 10 3 0 1266D 02 452 0 2426 0 10350 0 09755 t a Op 2 Dp 11 3 0 1261D 02 452 0 2427 0 10313 0 09717 4 O70 2 i 12 3 0 9672D 01 452 0 2492 0 07906 0 07295 Ct Oo 2 3 13 3 0 9579D 01 452 0 2494 0 07830 0 07218 Oo o gt m idi i 14 3 0 9540D 01 452 0 2495 0 07797 0 07185 oO 0 Oo 2 2 15 3 0 1089D 02 452 0 2465 0 08907 0 08302 tO Go 2 16 3 0 2917D 01 452 0 2642 0 02384 0 01736 0 Bade wd ot 17 3 0 2248D 01 452 0 2657 0 01838 0 01187 O O 2 gt 2 ok i 18 3 0O 1111D 02 452 0 2460 0 09088 0 08484 fo oO 2 19 3 0 1746D 01 452 0 2668 0 01427 0 00773 opit ogi 20 3 0 1030D 02 452 0 2478 0 08423 0 07815 1 be 8 E 21 3 0 1279D 02 452 0 2423 0 10454 0 09860 io 0 oO 2 2 22 3 O 8086D 01 452 0 2527 0 06609 0 05989 oO 1 oO oO 2 23 3 0O 7437D 01 452 0 2542 0 06079 0 05456 oo 2 2 24 3 0 1071D 02 452 0 2469 0 08755 0 08149 fo Oo 2 2 25 3 0 1370D 02 452 0 2403 0 11200 0 10611 oO oO O BD i i SRE KKK 26 3 0 1511D 02 452 0 2372 0 12351 0 11770 1 0 001 0 SE SK 27 3 0 1353D 02 452 0 2407 0 11064 0 10473 0 1001 0
110. 247 ricem asd 279 voltage asd 250 wether dat 145 wheat asd 267 debug options 181 Denominator Degrees of Freedom 20 dense 101 design factors 101 diagnostics 18 diallal analysis 91 direct product 7 9 106 direct sum 9 discussion list 4 Dispersion parameter 98 distribution conditional 12 marginal 12 Ecode 39 Eigen analysis 209 Eigen analysis example 299 EM update 134 environment variable job control 63 equations mixed model 14 error variance heterogeneity 9 errors 213 Excel 44 execution time 209 expected information matrix 13 F statistics 20 Factor qualifier DATE 48 DMY 49 LL Label Length 49 MDY 49 PRUNE 50 SKIP fields 50 SORT 50 SORTALL 50 TIME 49 factors 42 file GIV 154 pedigree 150 Fisher scoring algorithm 13 fixed effects 7 Fixed format files 60 fixed terms 83 88 multivariate 143 primary 88 sparse 89 Forming a job template 34 Index 314 free format 42 functions of variance components 38 170 correlation 173 heritability 172 linear combinations 171 syntax 171 G structure 106 definition lines 115 120 header 120 more than one term 135 Gamma distribution 98 Generalized Linear Models 96 genetic data 2 groups 152 links 149 models 149 qualifiers 149 relationships 150 GIV 154 GLMM 99 graphics options 181 half sib analysis 293 help via email 4 heritability 172 209 heterogeneity error variance 9 identi
111. 3 List of commonly used job control qualifiers
qualifier action
!CONTINUE is used to restart/resume iterations from the point reached in a previous run. This qualifier can alternately be set from the command line using the option letters C (continue) or F (final); see Section 12.3 on command line options. After each iteration, ASReml writes the current values of the variance parameters to a file with extension .rsv (re-start values), with information to identify individual variance parameters. The !CONTINUE qualifier causes ASReml to scan the .rsv file for parameter values related to the current model, replacing the values obtained from the .as file before iteration resumes. If the model has changed, ASReml will pick up the values it recognises as being for the same terms. Furthermore, ASReml will use estimates in the .rsv file for certain models to provide starting values for certain more general models, inserting reasonable defaults where necessary. The transitions recognised are listed and discussed in Section 7.10:
  DIAG to FA1
  DIAG to CORUH (uniform heterogeneous)
  CORUH to FA1
  FAi to FAi+1
  FAi to CORGH (full heterogeneous)
  FAi to US (full heterogeneous)
  CORGH (heterogeneous) to US
!CONTRAST s t p provides a convenient way to define contrasts among treatment levels. !CONTRAST lines occur as separate lines between the datafile line and the model line. s is the name of the model term being defined, t is
112. 3 24 25 26 27 28 29 30 10 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Slate Hall example Rep 6 Six replicates of 5x5 plots in 2x3 arrangement RowBlk 30 Rows within replicates numbered across replicates ColBlk 30 Columns within replicates numbered across replicates row 10 Field row column 15 Field column 15 Examples 263 variety 25 yield barley asd skip 1 DOPATH 1 PATH 1 AR1 x AR1 y mu var predict var 12 15 column AR1 0 1 Second field is specified so ASReml can sort 10 row AR1 0 1 records properly into field order PATH 2 AR1 x AR1 units y mu var r units predict var 12 15 column AR1 0 1 10 row AR1 0 1 PATH 3 incomplete blocks y mu var r Rep Rowblk Colblk predict var Abbreviated ASReml output file is presented below The iterative sequence has converged to column and row correlation parameters of 68377 45859 respec tively The plot size and orientation is not known and so it is not possible to ascertain whether these values are spatially sensible It is generally found that the closer the plot centroids the higher the spatial correlation This is not always the case and if the highest between plot correlation relates to the larger spatial distance then this may suggest the presence of extraneous variation see Gilmour et al 1997 for example Figure 15 5 presents a plot of the sample variogram of the residuals from this model The plot appears in reasonable agreement with the
113. 315 df df df df df df df df oo oOo of 0 2 Ou 0 58667 Approximate stratum variance decomposition Component Coefficients Stratum Degrees Freedom dam 22 56 Residual Variance 292 44 Source Model dam 27 Variance 322 Analysis of Variance 7 mu 3 littersize 1 dose 2 sex 8 dose sex Variance 127762 0 165300 terms Gamma 27 0 586674 315 t NumDF 1 1 2 1 2 00000 1 1 5 0 0 1 0 1 0 Component O 969770E 01 2 92 Q 165300 DenDF_con F_inc 32 0 9049 48 31 5 23 9 299 302 8 oo 27 99 12 15 57 96 0 40 1000 1 000 1488 1 000 2446 1 000 4254 1 000 5521 1 000 5854 1 000 5867 1 000 5867 1 000 1 0000 Comp SE 12 09 Yo OP OP F_con M P _ con 1099 20 b 46 25 B 11 51 A 57 96 A 0 40 B ES 0 lt Ss lt 001 001 001 001 673 Notice The DenDF values are calculated ignoring fixed boundary singular variance parameters using algebraic derivatives 27 effects fitted 4 dam SLOPES FOR LOG ABS RES on LOG PV for Section Zit 3 possible outliers see res file 1 The iterative sequence has converged and the variance component parameter for dam hasn t changed for the last three iterations indicate that the interaction between dose and sex is not significant The F_con column helps us to assess the significance of the other terms in the model It confirms littersize is significant after the other ter
114. 378585E 01 0 442457E 01 0 502071E 01 0 430512E 01 0 444776E 01 0 351386E 01 0 352370E 01 0 475935E 01 0 000000E 00 0 886687E 01 0 876708E 01 0 416124E 01 0 428109E 01 0 000000E 00 0 412130E 01 0 419780E 01 0 985202E 01 0 439485E 01 0 901191E 01 0 423753E 01 0 527289E 01 0 369983E 01 0 359516E O1 0 000000E 00 The first 5 rows of the lower triangular matrix in this case are 48 6802 0 2 98660 4 70711 0 313123 O oo The vvp file 8 07551 4 56648 8 86687 4 10031 4 76546 8 76708 The vvp file contains the inverse of the average information matrix on the com ponents scale The file is formatted for reading back under the control of the pin file described in Chapter 11 The matrix is lower triangular row wise in the order the parameters are printed in the asr file This is nin89a vvp with the parameter estimates in the order error variance spatial row correlation spatial column correlation Variance of Variance components 3 51 0852 0 217089 0 318058E 02 0 677748E 01 0 201181E 02 0 649355E 02 13 Description of output files 209 13 5 ASReml output objects and where to find them Table 13 2 presents a list of objects produced with each ASReml run and where to find them in the output files Table 13 2 ASReml output objects and where to find them output object found in comment analysis of variance asr file the analysis of vari
115. 4 at Trait 5 at Trait 1 at Trait 2 Residual LogL 1424 LogL 1421 LogL 1420 LogL 1419 LogL 1419 LogL 1419 LogL 1419 LogL 1419 LogL 1419 LogL 1419 age age age age sex BEX at Trait 3 at Trait 5 sex sex singularities 37 s2 1 58 o2 1 or 52 1 ii 52s f 93 52 1 92 52 1 92 S2 1 92 S2 1 92 52 1 92 52 f 92 52 1 Model grp 49 grp 49 grp 49 grp 49 grp 49 grp 49 grp 49 grp 49 UnStru detected in design matrix 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 terms 49 49 49 49 49 49 49 49 1 1 35006 35006 35006 35006 35006 35006 35006 35006 35006 35006 35006 df df df df df df df df df df df Gamma 0 135360E 02 0 101561E 02 0 176505E 02 0 209279E 03 0 919610 15 3912 0 279496 1 44032 9 46220 20000 00 2 components constrained 1 components constrained Component 0 135360E 02 0 101561E 02 0 176505E 02 0 209279E 03 0 919610 15 3912 0 279496 1 44032 9 46220 Comp SE 2 1 24 1213 1 68 Zs 3 3 1 03 89 oO Sy gil 80 33 30 x o o o Co Oo Oo 6 So Oo hg ht he hg a A el 15 Examples 299 Covariance Variance Correlation Matrix UnStructured Residual 9 462 0 5691 0 2356 0 1640 0 2183 7 3382 17 54 0 4241 0 2494 0 4639 0 2728 0 6686 0 1417 0 3994 0 1679 0 9625 1 994 0 2870 3 642 0 4875E 01 0 8336 2 412 0 7846E 01 0 1155 1 541
116. 4 AINV 125 ANTE 1 124 AR2 121 AR3 121 ARMA 122 AR 1 121 CHOL 1 C 124 CHOL 1 124 CIR 123 CORB 122 CORGB 122 CORGH 122 CORU 122 DIAG 124 EXP 122 FACV 1 125 FA 1 125 GAU 123 GIV 125 IDH 124 ID 121 IEUC 123 IEXP 123 IGAU 123 LVR 123 MA2 122 MAT 124 MA 1 122 OWN 124 SAR2 121 SAR 121 SPH 123 US 124 XFA 1 125 residual error 7 likelihood 12 response 83 running the job 33 scale parameter 7 score 13 Score test 66 section 9 separability 10 separable 111 singularities 102 sparse 101 sparse fixed 83 spatial analysis 261 data 2 model 110 specifying the data 47 split plot design 242 tabulation 32 qualifiers 157 syntax 157 tests of hypotheses 19 title line 31 47 trait 42 142 transformation 50 syntax 52 typographic conventions 5 unbalanced data 250 nested design 246 UNIX 177 unreplicated trial 267 variance Index 320 parameter 7 variance components functions of 170 variance header line 115 117 variance model combining 16 134 description 120 forming from correlation models 126 qualifiers 133 specification 106 specifying 107 variance parameters 11 constraining 115 137 between structures 138 equality 137 variance structures 33 115 multivariate 144 Wald F tests 20 weight 83 96 weights 42 workspace options 183
117. 4, the second argument would be 304. For a G structure relating to the model term fac(x,y), use fac(x,y). For example
  y ~ mu !r fac(x,y)
  fac(x,y) 1
  fac(x,y) fac(x,y) IEUCV .7 1.3
• FAk, FACVk and XFAk are different parameterizations of the factor analytic model in which $\Sigma$ ($w \times w$) is modelled as $\Sigma = \Gamma\Gamma' + \Psi$, where $\Gamma$ is a matrix of loadings on the covariance scale and $\Psi$ is a diagonal matrix of specific variances. See Smith et al. (2001) and Thompson et al. (2003) for examples of factor analytic models in multi-environment trials. The general limitations are that
  - $\Psi$ may not include zeros except in the XFAk formulation,
  - constraints are required in $\Gamma$ for $k > 1$ for identifiability. Typically, one zero is placed in the second column, two zeros in the third column, etc.,
  - the total number of parameters fitted, $kw + w - k(k-1)/2$, may not exceed $w(w+1)/2$.
• In FAk models the variance-covariance matrix $\Sigma$ is modelled on the correlation scale as $\Sigma = DCD$, where
  - $D$ is diagonal such that $DD = \mathrm{diag}(\Sigma)$,
  - $C$ is a correlation matrix of the form $FF' + E$, where $F$ is a matrix of loadings on the correlation scale and $E$ is diagonal and is defined by difference,
  - the parameters are specified in the order loadings for each factor ($F$) followed by the variances ($\mathrm{diag}(\Sigma)$); when k is greater than 1, constraints on the elements of F ar
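The parameter limit for factor analytic models quoted above can be checked with a few lines of Python. The sketch assumes the count is kw + w - k(k-1)/2 for k factors over w levels and that it must not exceed the w(w+1)/2 parameters of an unstructured matrix; the function names are hypothetical and are not part of ASReml.

```python
def fa_parameter_count(w, k):
    """Parameters in a k-factor model over w levels: k*w loadings plus
    w specific variances, less k(k-1)/2 identifiability constraints."""
    # k*(k-1) is always even, so integer division is exact here.
    return k * w + w - k * (k - 1) // 2


def fa_order_allowed(w, k):
    """True when the k-factor model has no more parameters than an
    unstructured w x w variance matrix, which has w(w+1)/2."""
    return fa_parameter_count(w, k) <= w * (w + 1) // 2


# With w = 5 levels, compare each order against the unstructured limit of 15.
for k in (1, 2, 3):
    print(k, fa_parameter_count(5, k), fa_order_allowed(5, k))
```

For five levels, one and two factors stay within the limit (10 and 14 parameters) while three factors (17 parameters) exceed it.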
118. 5 YWTh2 72 57 GFWh2 75 60 FDMh2 79 64 FATh2 84 69 GenCor 24 38 MatCor 85 90 defines defines defines defines F oo e ee e e e e e e e ea eo a a 15 Examples 301 55 56 57 70 T1 T T3 85 86 87 WWTh2 YWTh2 GFWh2 FDMh2 FATh2 GenCor GenCor GenCor GenCor GenCor GenCor GenCor GenCor GenCor GenCor MatCor MatCor MatCor phenWYG phenWYG i phenWYG i 9 0 i Direct 24 Direct 25 Direct 26 Direct 27 Maternal Maternal Maternal wuu naonna A A BWWD NPrPRPWNHRWNRFPNFR KB 39 40 41 0 0 Direct Direct Direct Direct Direct Te Trs Tr Tr Tr Tr Tr Te Tr Tr Mater Mater Si si si si si gi si si si si Mater 15 76 11 76 23 92 2 376 2 698 6 174 2120 1 567 1 521 6419 w NNN 70 phenWYG 72 phenWYG 75 phenWYG 79 phenD 18 64 0 3130 0 3749 0 6313 0 6458 0 8487 1 585 0 7330E 0 3788 0 4368 0 7797 01 55 57 60 3 84 phenF 23 69 25 SQR Tr 27 SQR Tr 28 SQR Tr 30 SQR Tr 31 SQR Tr 32 SQR Tr 34 SQR Tr 35 SQR Tr 36 SQR Tr 37 SQR Tr 86 SQR Mater 88 SQR Mater 89 SQR Mater si 24 Tr si 24 Tr si 26 Tr si 24 Tr si 26 Tr si 29 Tr si 24 Tr si 26 Tr si 29 Tr si 33 Tr 85 Mater 85 Mater 87 Mater si si si si si si si si si gi 0 0 0 0 0 26 0 29 29 33 33 0 33 38 38 38 0 38 0 87 90 90
119. 5 9.71503 9.71503 7.68295 9.71503 9.71503 80.0000 9.71503 9.71503 7.68295 9.71503 9.71503 7.68295 9.71503 9.71503 7.68295 9.71503 9.71503
SED: Standard Error of Difference: Min 7.6830 Mean 9.1608 Max 9.7150
15.3 Unbalanced nested design - Rats
The second example we consider is a data set which illustrates some further aspects of testing fixed effects in linear mixed models. This example differs from the split plot example as it is unbalanced, and so more care is required in assessing the significance of fixed effects. The experiment was reported by Dempster et al. (1984) and was designed to compare the effect of three doses of an experimental compound (control, low and high) on the maternal performance of rats. Thirty female rats (dams) were randomly split into three groups of 10 and each group was randomly assigned to one of the three doses. All pups in each litter were weighed. The litters differed in total size and in the numbers of males and females. Thus the additional covariate littersize was included in the analysis. The differential effect of the compound on male and female pups was also of interest. Three litters had to be dropped from the experiment, which meant that one dose had only 7 dams. The analysis must account for the presence of between-dam variation, but must also recognise the stratification of the experimental units (pups within litters) and that doses and littersize belong to the dam stratum. Table 15.2
120. 5918 1.0000 0.67205
  Source             Model  terms  Gamma     Component  Comp/SE  % C
  variety              532    532  0.959184  82758.6     8.98    0 P
  Variance             670    666  1.00000   86280.2     Sale    0 P
  Residual AR=AutoR     67         0.672052  0.672052   16.04    1 U
  Analysis of Variance  NumDF  DenDF    F_inc    Prob
   7 mu                     1   83.6  9799.18   <.001
   3 weed                   1  477.0   109.33   <.001
The iterative sequence converged, with the REML estimate of the autoregressive parameter indicating substantial within-column heterogeneity. The abbreviated output from the two-dimensional AR1xAR1 spatial model is
   1 LogL=-4277.99  S2= 0.12850E+06  666 df
   2 LogL=-4266.13  S2= 0.12097E+06  666 df
   3 LogL=-4253.05  S2= 0.10777E+06  666 df
   4 LogL=-4238.72  S2= 83156.       666 df
   5 LogL=-4234.53  S2= 79868.       666 df
   6 LogL=-4233.78  S2= 82024.       666 df
   7 LogL=-4233.67  S2= 82725.       666 df
   8 LogL=-4233.65  S2= 82975.       666 df
   9 LogL=-4233.65  S2= 83065.       666 df
  10 LogL=-4233.65  S2= 83100.       666 df
  Source             Model  terms  Gamma     Component  Comp/SE  % C
  variety              532    532  1.06038   88117.5     2.92    0 P
  Variance             670    666  1.00000   83100.1     8.90    0 P
  Residual AR=AutoR     67         0.685387  0.685387   16.65    0 U
  Residual AR=AutoR     10         0.265909  0.285909    3.87    0 U
  Analysis of Variance  NumDF  DenDF    F_inc    Prob
   7 mu                     1   41.7  6248.65   <.001
   3 weed                   1  491.2    85.84   <.001
The change in REML log-likelihood is significant ($\chi^2_1 = 12.46$, p < .001) with the inclusion of the autoregressive parameter for columns. Figure 15.6 presents the sample variogram of the residuals for the AR1xAR1 model. There is an
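The significance statement for the column autoregressive parameter can be checked numerically. The sketch below assumes the reported statistic of 12.46 is referred to a chi-square distribution on 1 degree of freedom, as the text indicates, and uses the identity that the upper tail probability of that distribution is erfc(sqrt(x/2)); the function name is hypothetical.

```python
import math


def chi_square_1df_upper_tail(statistic):
    """Upper tail probability of a chi-square variate on 1 degree of freedom.

    If Z is standard normal then Z**2 is chi-square on 1 df, so
    P(Z**2 > x) = 2 * (1 - Phi(sqrt(x))) = erfc(sqrt(x / 2)).
    """
    if statistic < 0:
        raise ValueError("a likelihood ratio statistic cannot be negative")
    return math.erfc(math.sqrt(statistic / 2.0))


# Statistic reported in the text for adding the column autoregressive parameter.
p_value = chi_square_1df_upper_tail(12.46)
print(f"p = {p_value:.5f}")
print("significant at .001:", p_value < 0.001)
```

Only the standard library is used, so the check does not depend on a statistics package being installed.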
121. 63 OU 14 Error messages ANOVA outliers Residual AR AutoR 11 0 437483 0 437483 5 43 0U Analysis of Variance NumDF DenDF F_inc Prob 12 mu 1 25 0 331 85 lt 001 1 variety 55 110 8 2222 lt 001 Notice The DenDF values are calculated ignoring fixed boundary singular variance parameters using algebraic derivatives 13 mv_estimates 6 possible outliers in section Finished 14 Jul 2005 12 41 26 862 14 4 An example See 2a in Sec This is the command file for a simple RCB tion 7 3 analysis of the NIN variety trial data in the first part However this file contains eight common mistakes in coding ASReml We also show two common mistakes associated with spatial analyses in the second part The errors are highlighted and the numbers indicate the order in which they are detected Each error is discussed with reference to the output written to the asr file A summary of the errors is as follows 1 data file not found 2 unrecognised qualifier 3 incorrectly defined alphanumeric factor 4 comma missing from first line of model 5 misspelt variable label in linear model 6 misspelt variable label in G structure header line 7 wrong levels declared in G structure model line 18 effects fitted 1 see res file LogL Converged nin alliance trial variety 56 3 id pid raw repl 4 nloc yield lat long row 22 column 11 nine asd slip 1 dopart 1 1 amp 2 part 1 yield mu
122. 680 3 0 1057D 02 452 0 2472 0 08641 0 08035 1 1 0000

The sln file

The sln file contains estimates of the fixed and random effects with their standard errors in an array with four columns ordered as

  factor_name  level  estimate  standard_error

Note that the error presented for the estimate of a random effect is the square root of the prediction error variance. In a genetic context, for example, where a relationship matrix A is involved, the accuracy is $\sqrt{1 - s_i^2 / ((1 + f_i)\sigma_A^2)}$, where $s_i$ is the standard error reported with the BLUP ($u_i$) for the ith individual, $f_i$ is the inbreeding coefficient report
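As a hedged illustration of the accuracy calculation mentioned above, the Python sketch below assumes the usual form sqrt(1 - s^2 / ((1 + f) * sigma_A^2)), where s is the standard error from the sln file, f is the inbreeding coefficient and sigma_A^2 is the genetic variance. The function name and the input values are hypothetical.

```python
import math


def blup_accuracy(std_error, inbreeding, genetic_variance):
    """Accuracy of a BLUP from its prediction standard error (assumed formula)."""
    ratio = std_error ** 2 / ((1.0 + inbreeding) * genetic_variance)
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("prediction error variance exceeds (1 + f) * genetic variance")
    return math.sqrt(1.0 - ratio)


# Hypothetical values: standard error 0.5, no inbreeding, genetic variance 1.0.
print(round(blup_accuracy(0.5, 0.0, 1.0), 4))
```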
123. 80 FORMAT 60 I GAMMA distribution 98 IGF 133 IGIV 152 GKRIGE 69 IGP 133 GRAPHICS 180 GROUPS 153 GU 133 1GZ 133 1G 49 66 68 HARDCOPY 180 HPGL 69 IDENTITY link 98 INBRED 153 INCLUDE 63 INTERACTIVE 180 IT 48 JOIN 66 69 180 Jddm 54 Jmmd 54 Jyyd 54 KNOTS 80 ILAST 75 LOGARITHM 98 Index 317 LOGFILE 180 LOGIT 98 LOGIT link 98 ILOG link 97 IL 48 IMAKE 153 MATCH 61 IMAXIT 65 IMAX 54 MBF 69 MERGE 61 IMGS 153 IMIN 54 IMM transformation 54 57 MOD 54 MVREMOVE 69 IM 54 INA 54 NEGBIN distribution 98 NOCHECK 80 NOGRAPHS 180 NOREORDER 80 NORMAL 54 I NORMAL distribution 97 NOSCRATCH 80 OFFSET variable 99 ONERUN 180 OWN 75 PEARSON residuals 99 PLOT 164 POISSON distribution 98 POLPOINTS 80 PPOINTS 80 PRINTALL 164 PRINT 76 PROBIT 98 IPS 69 PVAL 70 IPVR GLM fitted values 99 PVSFORM 76 1PVW GLM fitted values 99 IP 48 QUIET 180 READ 62 RECODE 62 IRENAME 180 REPEAT 153 REPLACE 54 REPORT 80 RESCALE 54 RESIDUALS 76 RESPONSE residuals 99 ROWFAC 67 70 IRREC 62 I RSKIP 62 1 2 1 134 1S2 r 134 1S 2 134 ISAVE 76 I SCALE 81 SCORE 81 SCREEN 77 SECTION 70 ISED 164 I SEED 55 I SELECT 60 ISELF 153 ISEQ 55 ISETN 55 I SETU 55 ISET 55 ISIN 53 ISKIP 60 153 SLNFORM
124. 9 grass.asd !skip 1 !ASUV

The focus is modelling of the error variance for the data. Specifically, we fit the multivariate regression model given by

  $Y = DT + E$    (15.1)

where $Y$ (14 x 5) is the matrix of heights, $D$ is the design matrix, $T$ is the matrix of fixed effects and $E$ is the matrix of errors. The heights taken on the same plants will be correlated and so we assume that

  $\mathrm{var}(\mathrm{vec}(E)) = I_{14} \otimes \Sigma$    (15.2)

where $\Sigma$ (5 x 5) is a symmetric positive definite matrix. The variance models used for $\Sigma$ are given in Table 15.4. These represent some commonly used models for the analysis of repeated measures data (see Wolfinger, 1986). The variance models are fitted by changing the last four lines of the input file. The sequence of commands for the first model fitted is

  y1 y3 y5 y7 y10 ~ Trait tmt Tr.tmt !r units
  1 2 0
  14
  Trait

Table 15.4: Summary of variance models fitted to the plant data

  model                      number of    REML             BIC
                             parameters   log-likelihood
  Uniform                        2          -196.88        401.95
  Power                          2          -182.98        374.15
  Heterogeneous Power            6          -171.50        367.57
  Antedependence (order 1)       9          -160.37        357.51
  Unstructured                  15          -158.04        377.50

The split plot in time model can be fitted in two ways: either by fitting a units term plus an independent residual as above, or by specifying a CORU variance model for the R structure as follows

  y1 y3 y5 y7 y10 ~ Trait tmt Tr.tmt
  1 2 0
  14
  Trait 0 CORU .5

The two forms for $\Sigma$ are given by
125. ARNING Unrecognised qualifier at character 9 slip 1 followed by the fault message ERROR Reading the data The warning does not cause the job to terminate immediately but arises because slip is not a recognised data file line qualifier the correct qualifier is skip The job terminates when reading the header line of the nin asd file which is alphabetic when it is expecting numeric values The following output displays the error message produced ASReml 1 99a 01 Aug 2005 nin alliance trial Build c 26 Jul 2005 32 bit 27 Jul 2005 15 41 38 987 64 00 Mbyte Windows ninerr2 14 Error messages 219 error hint give away Licensed to Arthur Gilmour Folder C data ex manex QUALIFIERS SLIP 1 Warning Unrecognised qualifier at character 9 SLIP 1 QUALIFIER DOPART 1 is active Reading nin asd FREE FORMAT skipping O lines Univariate analysis of yield Error at field 1 wariety of record 1 line 1 Since this is the first data record you may need to skip some header lines see SKIP or append the A qualifier to the definition of factor variety Fault O Missing faulty SKIP or A needed for variety Last line read was variety id pid raw rep nloc yield lat long row column Currently defined structures COLS and LEVELS oi o gt 1 variety 1 56 id pid raw repl nloc yield lat Oo AON Do FPF UDN e e PP BP BP RP RB BP e rere BP BP BP eB long o AN DT A WN KF OO jo N N N N row m RB 11
126. ASReml User Guide Release 2.0

A R Gilmour, NSW Department of Primary Industries, Orange, Australia
B J Gogel, Department of Primary Industries, Brisbane, Australia
B R Cullis, NSW Department of Primary Industries, Wagga Wagga, Australia
R Thompson, Rothamsted Research, Harpenden, United Kingdom

ASReml is a statistical package that fits linear mixed models using Residual Maximum Likelihood (REML). It is a joint venture between the Biometrics Program of NSW Department of Primary Industries and the Biomathematics Unit of Rothamsted Research. Statisticians in Britain and Australia have collaborated in its development.

Main authors: A R Gilmour, B J Gogel, B R Cullis and R Thompson.

Other contributors: D Butler, M Cherry, D Collins, G Dutkowski, S A Harding, K Haskard, A Kelly, S G Nielsen, A Smith, A P Verbyla, S J Welham and I M S White.

Author email addresses: Arthur.Gilmour@dpi.nsw.gov.au, Beverley.Gogel@dpi.qld.gov.au, Brian.Cullis@dpi.nsw.gov.au, Robin.Thompson@bbsrc.ac.uk

Copyright Notice: Copyright 2006, NSW Department of Primary Industries. All rights reserved. Except as permitted under the Copyright Act 1968 (Commonwealth of Australia), no part of the publication may be reproduced by any process, electronic or otherwise, without specific written permission of the copyright owner. Neither may information be stored electronically in any form whatever wi
127. Brian Cullis and Arthur Gilmour wish to thank the NSW Department of Primary Industries for providing a stimulating and exciting environment for applied biometrical research and consulting. Rothamsted Research receives grant-aided support from the Biotechnology and Biological Sciences Research Council of the United Kingdom.

We sincerely thank Ari Verbyla, Sue Welham, Dave Butler and Alison Smith, the other members of the ASReml team. Ari contributed the cubic smoothing splines technology, information for the marker map imputation, on-going testing of the software and numerous helpful discussions and insight. Sue Welham has overseen the incorporation of the core into Genstat and contributed to the predict functionality. Dave Butler has developed the asreml class of functions for S-PLUS and R. Alison contributed to the development of many of the approaches for the analysis of multi-section trials. We also thank Ian White for his contribution to the spline methodology and Simon Harding for the licensing and installation software and for his development of the ASReml-W environment for running ASReml. The Matérn function material was developed with Kathy Haskard, a PhD student with Brian Cullis, and the denominator degrees of freedom material was developed with Sharon Nielsen, a Masters student with Brian Cullis. Damian Collins contributed the PREDICT PLOT material. Greg Dutkowski has contributed to the extended pedigree options. The asreml
128. Covariance Variance Correlation Matrix UnStructured Tr sire 0 5941 0 7044 0 2966 0 2032 0 2703 0 6745 1 544 0 1364E 01 0 1224 0 5726 0 2800E 01 0 2076E 02 0 1500E 01 0 1121 0 4818E 02 0 6238E 01 0 6056E 01 0 5469E 02 0 1586 0 6331 0 3789E 01 0 1294 Covariance Variance Correlation Matrix UnStructured at Tr 1 dam Covariance Variance Correlation Matrix UnStructured at Tr 1 1lit 0 4096E 01 0 1073E 03 0 4586E 01 0 3308E 01 2 161 1 010 0 7663 2 196 2 186 0 8301 0 1577 0 1718 0 1959E 01 3 547 0 5065 0 1099 1 555 2 657 0 1740 0 5150 0 2787E 01 0 3821E 01 0 1915E 01 0 3282 0 7312E 01 0 7957 0 4191E 01 0 8984 Analysis of Variance 15 Tr de Te 1 Tes 19 Tr In the age brr sex age sex NumDF 5 15 5 4 Eigen Analysis of UnStructured matrix for Residual Eigen values Percentage 1 op WN oo o S Eigen Analysis of Eigen values Percentage 1 arP WD Eigen Analysis of 22 458 69 474 O 8509 0335 1168 1187 4970 5 210 16 118 0 8663 0 4765 0 0230 0 0871 0 1196 3 395 10 502 0 0141 0 1316 0 0585 0 9843 O 1010 1 160 3 588 0 0470 0 1746 0 0048 0 0769 0 9805 UnStructured matrix for Tr sire 1 904 81 199 4578 8860 sQOOTT 0163 Or 10 UnStructured matrix 0 304 12 963 0 7476 0 3646 0 0798 0 5260 0 1587 0 114 4 859 0 4695 0 2766 0 0826 0 8015 0 2320 0 013 0 535 0 1052 0 0248 0 9438 0 1116 0 2918
129. EGION cannot be adjusted for LOCATION when locations are actually nested in regions although they are coded independently.

!MAXIT n sets the maximum number of iterations; the default is 10. ASReml iterates for n iterations unless convergence is achieved first. Convergence is presumed when the REML log-likelihood changes less than 0.002 x current iteration number and the individual variance parameter estimates change less than 1%. If the job has not converged in n iterations, use the !CONTINUE qualifier to resume iterating from the current point.

To abort the job at the end of the current iteration, create a file named ABORTASR.NOW in the directory in which the job is running. At the end of each iteration, ASReml checks for this file and, if present, stops the job, producing the usual output but not producing predicted values since these are calculated in the last iteration. Creating FINALASR.NOW will stop ASReml after one more iteration (during which predictions will be formed). On case sensitive operating systems (e.g. Unix), the filename (ABORTASR.NOW or FINALASR.NOW) must be upper case. Note that the ABORTASR.NOW file is deleted, so nothing of importance should be in it.

If you perform a system level abort (CTRL C or close the program window), output files other than the .rsv file will be incomplete. The .rsv file should still be functional for resuming iteration at the most recent parameter estimates (see !CONTINUE).

Use !MAXIT 1 where you want est
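The convergence rule described above can be sketched in Python as follows. This is an illustration only, not ASReml's implementation: it assumes the parameter threshold is 1 percent (the percent sign appears to be lost in the extracted text) and that the change is measured relative to the previous estimate. All names are hypothetical.

```python
def relative_change(previous, current):
    """Absolute change relative to the previous value, guarding against zero."""
    if previous == 0.0:
        return 0.0 if current == 0.0 else float("inf")
    return abs(current - previous) / abs(previous)


def has_converged(logl_prev, logl_curr, params_prev, params_curr, iteration,
                  logl_tol=0.002, rel_tol=0.01):
    """Apply both criteria: a log-likelihood change below logl_tol * iteration
    and every variance parameter changing by less than rel_tol (assumed 1%)."""
    logl_ok = abs(logl_curr - logl_prev) < logl_tol * iteration
    params_ok = all(relative_change(p, c) < rel_tol
                    for p, c in zip(params_prev, params_curr))
    return logl_ok and params_ok


# Hypothetical check between two late iterations of a fit.
print(has_converged(-4233.65, -4233.65, [83065.0], [83100.0], iteration=10))
```

Scaling the log-likelihood tolerance by the iteration number makes the test progressively more lenient for jobs that are creeping slowly towards the maximum.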
130. !PRINT n causes ASReml to print the transformed data file to basename.asp. If
  n < 0, data fields 1 to mod(n) are written to the file,
  n = 0, nothing is written,
  n = 1, all data fields are written to the file if it does not exist,
  n = 2, all data fields are written to the file, overwriting any previous contents,
  n > 2, data fields n to t are written to the file, where t is the last defined column.

!PVSFORM n modifies the format of the tables in the .pvs file and changes the file extension of the file to reflect the format.
  !PVSFORM 1 is TAB separated: .pvs becomes _pvs.txt
  !PVSFORM 2 is COMMA separated: .pvs becomes _pvs.csv
  !PVSFORM 3 is Ampersand separated: .pvs becomes _pvs.tex
See !TXTFORM for more detail.

!RESIDUALS [2] instructs ASReml to write the transformed data and the residuals to a binary file. The residual is the last field. The file basename.srs is written in single precision unless the argument is 2, in which case basename.drs is written in double precision. The file will not be written from a spatial analysis (two-dimensional error) when the data records have been sorted into field order, because the residuals are not in the same order that the data is stored. The residual from a spatial analysis will have the units part added to it when units is also fitted. The .drs file could be renamed (with extension .dbl) and used for input in a subsequent run.

!SAVE n instructs ASReml to write the data to a binary file. The file asrdata.bin is writ
131. R phencorr 7 8 9 calculates the phenotypic covariance by calculating component 8 component 7 x component 9 where components 7 8 and 9 are created with the first line of the pin file and R gencorr 4 6 calculates the genotypic covariance by calculating component 5 component 4 x component 6 where components 4 5 and 6 are variance components from the analysis 11 Functions of variance components 174 A more detailed example For convenience in preparing the pin file some users copy the variance compo nent lines from the asr file and insert them at the top of the pin file being careful to retain one leading space so that the lines will be just copied and oth erwise ignored as the pin file is processed The following sample pin file is a little more Bivariate sire model complicated It relates to the bivariate sire sire I f t Tai model in bsiremod as shown in the code box eae bsiremod asd to the right ywt fat Trait r Trait sire 12 1 The first six lines of the pin file contain the 0 ASReml will count units i Trait 0 US variance component table on which the pin 3 5 file is based copied from the the asr file for Trait sire 2 converience in coding the rest of the file Trait 0 US 3 0 sire Residual UnStruct 1 26 2197 26 2197 18 01 0 U Residual UnStruct 1 2 85090 2 85090 9 55 0y Residual UnStruct 2 1 71556 1 71556 18 00 0 y Tr sire UnStruct 1 16 5262 16 5262 2 69 OU Tr
132. R structure is required DATEER TEA i o yield mu variety r repl no G structures are being explicitly defined 1f my and there are no parameter constraints see 1 2 1 VCC and examples 1 and 2a 22 TOW ARL Og 11 column AR1 0 3 s is used to code the number of independent repl 1 sections in the error term repl 0 IDV 0 1 if s 0 the default IID R structure is assumed and no R structure definition lines are required as in examples 2b and 5 if s gt 0 s R structure definitions are required one for each of the s sections as in examples 3a 3b 3c and 4 for the analysis of multi section data s can be replaced by the name of a factor with the appropriate number of levels one for each section c is the number of component variance models involved in the variance struc ture for the error term for each section for example 3a 3b and 3c have column row as the error term and the variance structure for column row in volves 2 variance models the first for column and the second for row c has a default value of 2 when s is not specified as zero g is the number of variance structures G structures that will be explicitly specified for the random terms in the model R and G structures are now discussed with reference to s c and g As already noted each variance structure may involve several variance models which relate to the individual terms involved in the random effect or error For example a two fa
133. S2= 1.0000 2964 df

 Source        Model   terms   Gamma      Component   Comp/SE   % C    (final components)
 Residual      UnStru   1 1    0.198351   0.198351     21.94    0 U
 Residual      UnStru   2 1    0.128890   0.128890     12.40    0 U
 Residual      UnStru   2 2    0.440601   0.440601     21.93    0 U
 Trait.TEAM    UnStru   1 1    0.374493   0.374493      3.88    0 U
 Trait.TEAM    UnStru   2 1    0.388740   0.388740      2.60    0 U
 Trait.TEAM    UnStru   2 2    1.36533    1.36533       3.74    0 U
 Trait.TAG     UnStru   1 1    0.257159   0.257159     12.08    0 U
 Trait.TAG     UnStru   2 1    0.219557   0.219557      5.55    0 U
 Trait.TAG     UnStru   2 2    1.92082    1.92082      14.35    0 U

 Covariance/Variance/Correlation Matrix UnStructured   (Residual; 0.4360 is the correlation)
   0.1984   0.4360
   0.1289   0.4406
 Covariance/Variance/Correlation Matrix UnStructured   (Trait.TEAM)
   0.3745   0.5436
   0.3887   1.365
 Covariance/Variance/Correlation Matrix UnStructured   (Trait.TAG)
   0.2572   0.3124
   0.2196   1.921

 Analysis of Variance   NumDF   DenDF     F_inc     Prob
  9 Trait                 2      33.0    5761.58    <.001
 10 Trait.YEAR            4    1162.2    1094.90    <.001
 Notice: The DenDF values are calculated ignoring fixed/boundary/singular variance parameters using empirical derivatives.

                          Estimate    Standard Error   T-value
 10 Trait.YEAR
                    2     0.102262    0.290190E-01       3.52
                    3     1.06636     0.290831E-01      36.67
                    5     1.17407     0.433905E-01      27.06
                    6     2.53439     0.434880E-01      58.28
  9 Trait
                    1     7.13717     0.107933          66.13
                    2    21.0569      0.209095         100.71
 11 Trait.TEAM           70 effects fitted
 12 Trait.TAG          1042 effects fitted
 SLOPES FOR LOG(ABS(RES)) on
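The correlations printed above the diagonal of each Covariance/Variance/Correlation matrix can be checked from the reported components. The Python sketch below assumes the usual definition, covariance divided by the square root of the product of the two variances; the function name is illustrative.

```python
import math


def correlation(var_1, cov_21, var_2):
    """Correlation implied by a 2 x 2 unstructured covariance matrix."""
    return cov_21 / math.sqrt(var_1 * var_2)


# Components (1 1), (2 1), (2 2) as reported in the output above.
residual = correlation(0.198351, 0.128890, 0.440601)
team = correlation(0.374493, 0.388740, 1.36533)
tag = correlation(0.257159, 0.219557, 1.92082)

for label, value in (("Residual", residual), ("Trait.TEAM", team), ("Trait.TAG", tag)):
    print(f"{label:12s} {value:.4f}")
```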
134. !TXTFORM 3 replaces multiple spaces with & (ampersand), appends a double backslash to each line and changes the file extension to, say, _sln.tex (LaTeX style). Additional significant digits are reported with these formats. Omitting the qualifier means the standard fixed field format is used. For .yht and .sln files, setting n to -1 means the file is not formed.

The next qualifier modifies the appearance of the variogram calculated from the residuals obtained when the sampling coordinates of the spatial process are defined on a lattice. The default form is based on absolute distance in each direction. This form distinguishes same sign and different sign distances and plots the variances separately as two layers in the same figure.

!VCC n specifies that n constraints are to be applied to the variance parameters. The constraint lines occur after the G structures are defined. The constraints are described in Section 7.9. The variance header line (Section 7.4) must be present, even if only 0 0 0 indicating there are no explicit R or G structures (see Section 7.9).

!VGSECTORS s requests that the variogram formed with radial coordinates (see page 18) be based on s = 4, 6 or 8 sectors of size 180/s degrees. The default is 4 sectors if !VGSECTORS is omitted and 6 sectors if it is specified without an argument. The first sector is centred on the X direction. Figure 5.1 is the variogram using radial coordinates obtained us
135. VERSE The model is fitted on the log inverse scale but the residuals are on the natural scale 6 Command file Specifying the terms in the mixed model 98 GLM qualifiers qualifier action BINOMIAL LOGIT IDENTITY PROBIT COMPLOGLOG TOTAL n v p 1 p n d 2n yln y p 1 y In POISSON LOGARITHM v yp d 2 yln y 1 y p GAMMA INVERSE v u n d 2n p1n A INEGBIN LOGARITHM v ptp o d 2 y in 443 yln 4 Proportions or counts r are indicated if TOTAL specifies the variate containing the binomial totals Proportions are assumed if no response value exceeds 1 A binary variate 0 1 is indicated if TOTAL is unspecified The expression for d on the left applies when y is proportions or binary The logit is the default link function The variance on the underlying scale is 12 3 3 3 underlying logistic distribution for the logit link IDENTITY SQRT Natural logarithms are the default link function ASReml assumes the Poisson variable is not negative IDENTITY LOGARITHM PHI TOTAL n The inverse is the default link function n is defined with the TOTAL qualifier and would be degrees of freedom in the typical application to mean squares The default value of is 1 IDENTITY INVERSE PHI fits the Negative Binomial distribution Natural logarithms are the default link funct
136. [Field plan residue: a rotated layout of variety names by row and column (for example ARAPAHOE, CENTURA, CHEYENNE, NORKAN, SIOUXLAND and NE-numbered lines). The plot-by-plot allocation is not recoverable from the extracted text.]
137. X can be used to specify F, P, U or Z for the parameters individually. A shorthand notation allows a repeat count before a code letter. Thus !GPPPPPPPPPPPPPPZPPPZP could be written as !G14PZ3PZP.

For a US model, !GP makes ASReml attempt to keep the matrix positive definite. After each AI update, it extracts the eigenvalues of the updated matrix. If any are negative or zero, the AI update is discarded and an EM update is performed. Notice that the EM update is applied to all of the variance parameters in the particular US model and cannot be applied to only a subset of them.

!S2=r sets the initial value of the error variance within the section to r. The !S2= qualifier may be used in R structure definitions to represent the residual variance associated with the particular section. There is always an error variance parameter associated with a section R structure. In multiple section analyses, the !S2= qualifier is used on the first of the c lines for each section to set the initial error variance for that section. If this is not supplied, ASReml calculates an initial value that is half the simple variance of the data in the site. To fix the variance, !S2==r is used.

!S2==r is similar to !S2= except that the variance is fixed at r.

!S2==1 is used in the analysis of multiple section data and in multivariate analyses and when variance parameters are included in the R structure.

7.7 Rules for combining variance models

As noted in Section 2.1 under Combining varia
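The repeat-count shorthand for constraint codes can be illustrated with a small Python helper that expands a compact string into one letter per parameter. This is a sketch of the notation only; the function name is hypothetical and ASReml performs the expansion internally.

```python
import re


def expand_codes(shorthand):
    """Expand repeat counts, e.g. '14P' becomes fourteen 'P' characters."""
    pieces = re.findall(r"(\d*)([A-Za-z])", shorthand)
    return "".join(letter * int(count or 1) for count, letter in pieces)


expanded = expand_codes("G14PZ3PZP")
print(expanded)
print(len(expanded) - 1, "parameter codes after the leading G")
```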
138. Y qualifier (Table 5.4). By default, the variogram and field plan are displayed.

[Figure 13.2: Variogram of residuals. Panel title: NIN alliance, Variogram; axes: outer displacement, inner displacement.]

The sample variogram is a plot of the semi-variances of differences of residuals at particular distances. The (0,0) position is zero because the difference is identically zero. ASReml displays the plot for distances 0, 1, 2, ..., 8, 9-10, 11-14, 15-20.

The plot of residuals in field plan order (Figure 13.3) contains in its top and right margins a diamond showing the minimum, mean and maximum residual for that row or column. Note that a gap identifies where the missing values occur. The plot of marginal means of residuals shows residuals for each row/column as well as the trend in their means.

[Figure 13.3: Plot of residuals in field plan order.]

The rsv file

The .rsv file contains the variance parameters from the most recent iteration of a model. The primary use of the .rsv file is to supply the values for the !CONTINUE qualifier (see Table 5.4) and the C command line option (see Table 12.1). It contains sufficient inform
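To make the definition of the sample variogram concrete, the following Python sketch computes the semi-variance at a single (row, column) displacement for residuals laid out on a grid. It is a simplified illustration: it handles only non-negative displacements, does not pool distances into the bins listed above, and represents missing residuals as None. The function name and the toy grid are hypothetical.

```python
def semivariance(resid, d_row, d_col):
    """Half the mean squared difference of residuals separated by (d_row, d_col)."""
    n_row, n_col = len(resid), len(resid[0])
    total, count = 0.0, 0
    for r in range(n_row - d_row):
        for c in range(n_col - d_col):
            a, b = resid[r][c], resid[r + d_row][c + d_col]
            if a is None or b is None:
                continue  # skip pairs involving a missing residual
            total += (a - b) ** 2
            count += 1
    return 0.5 * total / count if count else None


grid = [[1.0, 2.0],
        [3.0, 4.0]]
print(semivariance(grid, 0, 0), semivariance(grid, 0, 1), semivariance(grid, 1, 0))
```

At displacement (0, 0) every residual is compared with itself, which is why that position of the variogram is identically zero.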
139. _inc Prob 12 mu 1 3 0 242 05 lt 001 1 variety 55 165 0 0 88 0 708 Notice The DenDF values are calculated ignoring fixed boundary singular variance parameters using algebraic derivatives 5 repl 4 effects fitted Finished 11 Jul 2005 13 55 25 309 LogL Converged The sln file The following is an extract from nin89 sln containing the estimated variety effects intercept and random replicate effects in this order column 3 with stan dard errors column 4 Note that the variety effects are returned in the order of their first appearance in the data file see replicate 1 in Table 3 1 variety LANCER 0 000 0 000 variety BRULE 2 487 4 979 variety REDLAND 1 938 4 979 variety CODY Z 990 4 979 variety ARAPAHOE 0 8750 4 979 variety NE83404 i 175 4 979 variety NE83406 4 287 4 979 variety NE83407 5 875 4 979 variety CENTURA 6 912 4 979 variety SCOUT66 037 4 979 variety COLT 1 562 4 979 variety NE83498 1 563 4 979 variety NE84557 8 037 4 979 variety NE83432 8 830 4 979 variety NE87615 2 B19 4 979 variety NE87619 2 700 4 979 variety NE87627 5 300 4 979 mu 1 28 56 3 856 repl 1 1 880 1 755 repl 2 2 843 1 755 3 A guided tour 38 repl 3 0 8713 1 755 repl 4 3 852 1 755 The yht file The following is an extract from nin89 yht containing the predicted values of the observations column 2 the residuals column 3 and the diagonal elements of the hat matrix This final column can be used in tests involving the residu
140. ach term is easily determined as the number of non singular equations involved in the term However in general calculation of the denominator degrees of freedom is not trivial ASReml will by default attempt the calculation for small analyses by one of two methods In larger analyses users can request the calculation be attempted using the DDF qualifier page 68 Use DDF 1 to prevent the calculation T Command file Specifying variance structures Introduction Non singular variance matrices Variance model specification in ASReml A sequence of structures for the NIN data Variance structures General syntax Variance header line R structure definition G structure header and definition lines Variance model description Forming variance models from correlation models Additional notes of variance models Variance structure qualifiers Rules for combining variance models G structures involving more than one random term Constraining variance parameters Parameter constraint within a variance model Constraints between and within variance models Model building using the CONTINUE qualifier 105 7 Command file Specifying variance structures 106 7 1 Introduction The subject of this chapter is variance model specification in ASReml ASReml allows a wide range of models to be fitted The key concepts you need to under stand are e the mixed linear model y XT Zu e has a residual term e N 0 R and random eff
141. al AR AutoR 67 0 671436 0 671436 15 66 0U Residual AR AutoR 10 0 266088 0 266088 3 53 OU Analysis of Variance NumDF DenDF F_inc Prob 7 mu 1 42 5 7073 70 lt 001 3 weed 1 457 4 91 91 lt 001 8 pol column 1 1 50 8 8 73 0 005 H AR1xAR1 units pol column 1 1 LogL 4272 74 S2 0 11683E 06 665 df 1 components constrained 2 LogL 4266 07 S2 50207 665 df z 1 components constrained 3 LogL 4228 96 S2 76724 665 df 4 LogL 4220 63 S2 55858 665 df 5 LogL 4220 19 S2 54431 665 df 6 LogL 4220 18 S2 54732 665 df 7 LogL 4220 18 S2 54717 665 df 8 LogL 4220 18 S2 54715 665 df Source Model terms Gamma Component Comp SE C variety 532 532 1 34824 73769 0 7 08 OP units 670 670 0 556400 30443 6 23 77 QF Variance 670 665 1 00000 54715 2 B15 0 P Residual AR AutoR 67 0 837503 0 837503 18 67 OV Residual AR AutoR 10 0 375382 0 375382 3 26 OU Analysis of Variance NumDF DenDF F_inc Prob 7 mu 1 13 6 4241 53 lt 001 3 weed 1 469 0 86 39 lt 001 8 pol column 1 1 18 5 4 84 0 040 The increase in REML log likelihood is significant The predicted means for the varieties can be produced and printed in the pvs file as Warning mv_estimates is ignored for prediction Warning units is ignored for prediction 15 Examples 272 column evaluated at 5 5000 weed is evaluated at average value of 0 4597 Predicted values of yield variety Predicted_Value Standard_Error Ecode 1 0000 2917 1782 179 2881 E 2 0000 2957 74
142. ally if the model up to this point has p effects and t has a effects the a columns of the design matrix for t are multiplied by the scalar r default value 1 0 and added to the last a of the p columns already defined The overlaid term must agree in size with the term it overlays This can be used to force a correlation of 1 between two terms as in a diallel analysis male and female assuming the ith male is the same individual as the ith female Note that if the overlaid term is complex it must be predefined e g Tr male Tr female and Tr female 6 Command file Specifying the terms in the mixed model 92 New Alphabetic list of model functions and descriptions model function action at f n OC f n at f f at f m n OC f m n cos v 7 con f c f fac v fac v y giv f n g f n defines a binary variable which is 1 if the factor f has level n for the record For example to fit a row factor only for site 3 use the expression at site 3 row The string is equivalent to at for this function at f is expanded to a series of terms like at f 7 where i takes the values 01 to the number of levels of factor f Since this command is interpreted before the data is read it is necessary to declare the number of levels correctly in the field definition This extended form may only be used as the first term in an interaction at f i j k is expanded to a series of terms at f i at f 7 at f k S
143. als see Section 2 5 under Diagnostics Record Yhat Residual Hat 1 30 442 1 192 1301 2 27 955 3 595 13 01 3 32 380 2 670 13 01 4 23 092 7 008 13 01 5 21 317 1 763 13 01 6 29 267 0 9829 13 01 7 26 155 9 045 13 01 8 24 567 5 167 13 01 9 23 530 0 8204 13 91 222 16 673 9 877 13 01 223 24 548 1 052 139i 224 23 786 3 114 13 01 3 7 Tabulation predicted values and functions of the variance com ponents It may take several runs of ASReml to determine an appropriate model for the data that is the fixed and random effects that are important During this process you may wish to explore the data by simple tabulation Having identified an appropriate model you may then wish to form predicted values or functions of the variance components The facilities in ASReml to form predicted values and functions of the variance components are described in Chapters 10 and 11 respectively Our example only includes tabulation and prediction The statement tabulate yield variety 3 A guided tour 39 in nin89 as results in nin89 tab as follows NIN alliance trial 1989 11 Jul 2005 13 55 21 Simple tabulation of yield variety LANCER 28 56 BRULE 26 07 REDLAND 30 50 CODY 21 21 ARAPAHOE 29 44 NE83404 27 939 NE83406 24 28 NE83407 22 69 CENTURA 21 65 SCOUT66 27 52 COLT 2700 NE87522 25 00 NE87612 21 80 NE87613 29 40 NE87615 25 69 NE87619 81 26 NE87627 29 A3 The predict variety statement after the model statement in n
144. ance table contains F statistics table for each term in the fixed part of the model These provide for an incremental or optionally a condi tional test of significance see Section 6 12 data summary asr file includes the number of records read and retained ass file for analysis the minimum mean maximum number of zeros number of missing values per data field factor variate field distinction An extended report of the data is written to the ass file if the SUM qualifier is specified It in cludes cell counts for factors histograms of vari ates and simple correlations among variates eigen analysis res file When ASReml reports a variance matrix to the asl file asr file it also reports an eigen analysis of the matrix eigen values and eigen vectors to the res file elapsed time asr file this can be determined by comparing the start asl file time with the finishing time The execution times for parts of the Iteration pro cess are written to the asl file if the DEBUG LOGFILE command line qualifiers are invoked fixed and random ef sln file if BRIEF 1 is invoked the effects that were in fects cluded in the dense portion of the solution are also printed in the asr file with their standard error a t statistic for testing that effect and a t statistic for testing it against the preceding effect in that factor heritability pvc file placed in the pvc file when postprocessing with a pin file histogram o
145. and the diameter of that wool Much of the wool is produced from wethers and most major producers have traditionally used a particular strain or bloodline To as sess the importance of bloodline differences many wether trials were conducted One trial was conducted from 1984 to 1988 at Borenore near Orange It involved 35 teams The file wether dat shown below contains greasy fleece weight kg yield percentage of clean fleece weight to greasy fleece weight and fibre of wethers representing 27 bloodlines diameter microns The code wether as to the right performs a basic bivariate analysis of this data Wolfinger rat data treat A wtO wti wt2 wt3 wt4 rat dat wtO wti wt2 wt3 wt4 Trait treat Trait treat 120 27 O ID error variance Trait 0 US 15 0 Orange Wether Trial 1984 8 SheepID I TRIAL BloodLine I TEAM YEAR GFW YLD FDIAM wether dat skip 1 GFW FDIAM Trait Trait YEAR lr Trait TEAM Trait SheepID 122 1485 0O ID Trait 0 US 2 124 Trait TEAM 2 Trait 0 US 0 4 0 3 1 3 TEAM O ID Trait SheepID 2 Trait 0 US 0 2 0 2 2 SheepID 0 ID predict YEAR Trait 8 Command file Multivariate analysis 143 SheepID Site Bloodline Team Year GFW Yield FD 0101 3 21 1 1 5 6 74 3 18 5 0101 3 21 1 2 6 0 71 2 19 6 0101 3 21 13 8 0 75 7 21 5 0102 3 21 f 1 5 3 70 9 20 8 0102 3 21 1 2 6 7 66 1 20 9 0102 3 21 1 3 6 8 70 3 22 1 0103 3 21 1 1 5 0 80 7 18 9 0103 3 21
146. are many labels, they may be written over several lines by using a trailing comma to indicate continuation of the list.

!P indicates the special case of a pedigree factor; ASReml will determine the levels from the pedigree file (see Section 9.3). A warning is printed if the nominated value for n does not agree with the actual number of levels found in the data and, if the nominated value is too small, the correct value is used.

!DATE specifies the field has one of the date formats dd/mm/yy, dd/mm/ccyy, dd-Mon-yy, dd-Mon-ccyy and is to be converted into a Julian day, where dd is a 1 or 2 digit day of the month, mm is a 1 or 2 digit month of the year, Mon is a three-letter month name (Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec), yy is the year within the century (00 to 99) and cc is the century (19 or 20). The separators '/' and '-' must be present as indicated. The dates are converted to days starting 1 Jan 1900. When the century is not specified, yy of 0-32 is taken as 2000-2032 and 33-99 is taken as 1933-1999.

!DMY specifies the field has one of the date formats dd/mm/yy or dd/mm/ccyy and is to be converted into a Julian day.

!MDY specifies the field has one of the date formats mm/dd/yy or mm/dd/ccyy and is to be converted into a Julian day.

!TIME specifies the field has the time format hh:mm:ss and is to be converted to seconds past midnight, where hh is hours (0 to 23), mm is minutes
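The date and time conversions described above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not ASReml's own code: the function names are hypothetical, the fields are passed in as already-parsed integers, and it assumes that 1 Jan 1900 is counted as day 1 (the text says only that days start at 1 Jan 1900).

```python
from datetime import date


def expand_two_digit_year(yy: int) -> int:
    """Apply the stated century rule: 0-32 -> 2000-2032, 33-99 -> 1933-1999."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 2000 + yy if yy <= 32 else 1900 + yy


def days_from_1900(day: int, month: int, year: int) -> int:
    """Day count for a date; ASSUMPTION: 1 Jan 1900 is day 1."""
    if year < 100:  # century not specified
        year = expand_two_digit_year(year)
    return (date(year, month, day) - date(1900, 1, 1)).days + 1


def seconds_past_midnight(hh: int, mm: int, ss: int) -> int:
    """Convert an hh, mm, ss time to seconds past midnight."""
    return hh * 3600 + mm * 60 + ss
```

For example, a two-digit year of 32 expands to 2032 while 33 expands to 1933, which is the boundary most likely to surprise when reading birth dates.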
147. at Tr 1 dam at Tr 2 dam lt at Tr 3 dam 003 attTr 1 2it abtir 2 1ity at Tr 3 lit at ir 4 lit at Trait 1 age grp 0024 at Trait 2 age grp 0019 at Trait 4 age grp 0020 at Trait 5 age grp 00026 at Trait 1 sex grp 93 at Trait 2 sex grp 16 0 at Trait 3 sex grp 28 at Trait 5 sex grp 1 18 i Tr grp 12 3 One multivariate R structure 3 G structures 000 No structure across lamb records First zero lets ASReml count te number of records Tr 0 US General structure across traits 66 15 Examples 303 5 33 13 18 66 10 278 2 2 27 B22 73 2 02 08 20 1 44 Tr tag 2 Dire PATH 2 Tr 0 FA1 IGP 0 5 0 5 01 lt 01 0 2 2 45 2 0 06 8 14 PATH 3 Tr O US 2 4800 2 8 6 4 0 0128 0 03 0 06 Sige ee OO Le Oe 0 24 0 55 0 0026 0 0202 0 PATH tag at Tr 1 dam 2 PATH 2 2 0 CORGH GFU 99 1 6 2 54 IPATH 3 2 0 US GU 1 1 59 31 PATH dam stiTr 1 lait 2 IPATH 2 4 0 FA1 GP 5 4 8 04 2 01 4 95 4 63 0 037 0 941 0 102 IPATH 3 40 US 5 073 3 545 3 914 0 1274 0 08909 0 02865 0 07277 0 05090 0 001829 1 019 PATH lit The term Tr tag now replaces ct animal effects 14 Maternal effects Litter effects Factor Analytic Unstructured the sire and part of dam terms in the half sib analysis This analysis uses information from both sires and dams to estimate 15 Examples 304 additive genetic variance The dam variance component is this
148. ation to match terms so that it can be used when the variance model has been changed This is nin89a rsv 76 6 1690 120 0 000000 0 000000 0 000000 1 000000 0 6555046 0 4374830 RSTRUCTURE I 2 VARIANCE I i 0 1 00000 STRUCTURE 22 1 1 0 655505 N W oO O Bw Residuals V ki ana CF inh pasii Range 8 Figure 13 4 Plot of the marginal means of the residuals Hisense a daaa d RA 2005 1 18 Peak Count 17 Range 24 87 15 91 Figure 13 5 Histogram of residuals 13 Description of output files 207 STRUCTURE if 1 1 0 437483 The tab file The tab file contains the simple variety means and cell frequencies Below is an edited version of nin89 tab nin alliance trial 10 Sep 2002 04 20 15 Simple tabulation of yield variety LANCER 28 56 4 BRULE 26 07 4 REDLAND 30 50 4 CODY 21 21 4 ARAPAHOE 29 44 4 NE83404 27 39 4 NE83406 24 28 4 NE83407 22 69 4 CENTURA 21 65 4 SCOUT66 27 52 4 COLT 27 00 4 NE87615 25 69 4 NE87619 31 26 NE87627 23 23 The vrb file The vrb file contains the estimates of the effects together with their approxi mate prediction variance matrix corresponding to the dense portion The file is formatted for reading back for post processing The number of equations in the dense portion can be increased to a maximum of 800 using the DENSE option Table 5
149. ations for multilevel models with binary response. Journal of the Royal Statistical Society A (General) 159, 505-518.

Goldstein, H., Rasbash, J., Plewis, I., Draper, D., Browne, W., Yang, M., Woodhouse, G. and Healy, M. (1998). A user's guide to MLwiN. Institute of Education, London.

Green, P. J. and Silverman, B. W. (1994). Nonparametric regression and generalized linear models. London: Chapman and Hall.

Harvey, W. R. (1977). User's guide to LSML76. The Ohio State University, Columbus.

Harville, D. A. (1997). Matrix algebra from a statistician's perspective. Springer-Verlag, New York.

Harville, D. and Mee, R. (1984). A mixed model procedure for analysing ordered categorical data. Biometrics 40, 393-408.

Haskard, K. A. (2006). Anisotropic Matérn correlation and other issues in model-based geostatistics. PhD thesis, BiometricsSA, University of Adelaide.

Kammann, E. E. and Wand, M. P. (2003). Geoadditive models. Applied Statistics 52(1), 1-18.

Keen, A. (1994). Procedure IRREML. GLW-DLO Procedure Library Manual, Agricultural Mathematics Group, Wageningen, The Netherlands, pp. Report LWA-94-16.

Kenward, M. G. and Roger, J. H. (1997). The precision of fixed effects estimates from restricted maximum likelihood. Biometrics 53, 983-997.

Lane, P. W. and Nelder, J. A. (1982). Analysis of covariance and standardisation as instances of prediction. Biometrics 38, 613-621.

McCullagh, P. and Nelder, J. A.
150. aused by a non positive definite matrix. Use better initial values or a structured variance matrix that is positive definite. You may use FA or FACV. The R structure must be positive definite.

15 Examples

Introduction
Split plot design - Oats
Unbalanced nested design - Rats
Source of variability in unbalanced data - Volts
Balanced repeated measures - Height
Spatial analysis of a field experiment - Barley
Unreplicated early generation variety trial - Wheat
Paired Case-Control study - Rice
    Standard analysis
    A multivariate approach
    Interpretation of results
Balanced longitudinal data - Oranges
    Initial analyses
    Random coefficients and cubic smoothing splines
Multivariate animal genetics data - Sheep
    Half-sib analysis
    Animal model

15.1 Introduction

In this chapter we present the analysis of a variety of examples. The primary aim is to illustrate the capabilities of ASReml in the context of analysing real data sets. We also discuss the output produced by ASReml and indicate when problems may occur. Statistical concepts and issues are discussed as necessary, but we stress that the analyses are illustrative, not prescriptive.

15.2 Split plot design - Oats

The first example involves the analysis of a split plot design originally presented by Yates (1935). The experiment was conducted to assess the effects on yield of three oat varieties (Golden Rain, Marvellous and Victory) with four
151. b with increased workspace. If the system has already allocated all available memory, the job will stop.

Examples

ASReml code: action
asreml -LW64 rat.as: increase workspace to 64 Mbyte, send screen output to rat.asl and suppress interactive graphics
asreml -IL rat.as: send screen output to rat.asl but display interactive graphics
asreml -N rat.as: allow screen output but suppress interactive graphics
asreml -ILW512 rat.as: increase workspace to 512 Mbyte, send screen output to rat.asl but display interactive graphics
asreml -rs3 coop wwt ywt: runs coop.as twice, writing results to coopwwt.as and coopywt.as, using 64Mb workspace and substituting wwt and ywt for $1 in the two runs

12.4 Advanced processing arguments

Standard use of arguments

Command line arguments are intended to facilitate the running of a sequence of jobs that require small changes to the command file between runs. The output file name is modified by the use of this feature if the -R option is specified. This use is demonstrated in the Coopworth example of Section 15.10 (see page 297).

Command line arguments are strings listed on the command line after basename (the command file name), or specified on the top job control line after the !ARGS qualifier. These strings are inserted into the command file at run time. When the input routine finds a $n in the command file, it substitutes the nth argument string. n may take th
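The argument substitution described above can be illustrated with a short Python sketch. It is not ASReml's implementation. It assumes the placeholder is written `$n` with a single digit n (the marker character is not legible in this extract), it leaves a placeholder untouched when no matching argument was supplied, and the function names and the model line in the usage example are hypothetical.

```python
import re


def substitute_arguments(command_text: str, args: list[str]) -> str:
    """Replace each $n in command_text with the nth argument string."""

    def replace(match: re.Match) -> str:
        n = int(match.group(1))
        if 1 <= n <= len(args):
            return args[n - 1]
        return match.group(0)  # ASSUMPTION: unmatched placeholders are kept

    return re.sub(r"\$(\d)", replace, command_text)


def renamed_basename(basename: str, arg: str) -> str:
    """ASSUMPTION: with renaming on, the argument is appended to the basename."""
    return basename + arg


# Two runs of one template, in the spirit of the coop/wwt/ywt example.
template = "$1 ~ mu age sex"  # hypothetical model line
for trait in ["wwt", "ywt"]:
    print(renamed_basename("coop", trait), "->", substitute_arguments(template, [trait]))
```

The point of the design is that one command file serves several closely related jobs, with the differences carried entirely by the argument strings.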
152. backcross or 1 0 1 F2 design s length n 1 should be the n marker positions relative to a left telomere position of zero and an extra value being the length of the linkage group the position of the right telomere The length right telomere may be omitted in which case the last marker is taken as the end of the linkage group The positions may be given in Morgans or centiMorgans if the length is greater than 10 it will be divided by 100 to convert to Morgans The recombination rate between markers at sy and sp L is left and R is right of some putative QTL at Q is Orr 1 gaan 9 Consequently for 3 markers L Q R OLR Org QR 20LQOQR The expected value of a missing marker at Q between L and R depends on the marker states at L and R E q 1 1 1 Oro 9gr 1 OLR E q 1 1 99r 61Q Otr E q 1 1 OL 9Qr PrR and E q 1 1 1 819 09x 1 ra 8 1 90 1 20 Let Az E ql1 1 B ql1 1 2 2220 2080 2919 5 Command file Reading the data 57 New and Az E ql 1 1 Bla 1 1 2 29G e Han Then E q rL R ALTE ARXR Where there is no marker on one side E q zr 1 0gr R PgR R ER 1 26QR IDOM A is used to form dominance covariables from a set of additive marker covariables previously declared with the MM marker map qualifier It assumes the argument A is an existing group of marker variables relati
153. bels have been defined and are in the correct case;
• some errors arise from conflicting information; the error may point to something that appears valid but is inconsistent with something earlier in the file;
• reduce to a simpler model and gradually build up to the desired analysis; this should help to identify the exact location of the problem.

If the problem is not resolved from the above list, you may need to email Customer Support at support@asreml.co.uk. Please send the .as file, a sample of the data, the .asr file and the .asl file produced by running

asreml -dl basename.as

The -dl command line option invokes debug mode and sends all non-graphics screen output to the .asl file.

In this chapter we show some of the common typographical problems. Errors arising from attempts to fit an inappropriate model are often harder to resolve. Following is an example of output when the data file is not correctly named or is not present. ASReml tries to interpret the filename as a variable name when the file is not found.

ASReml 1.99a 01 Aug 2005 nin alliance trial
Build c 26 Jul 2005 32 bit
27 Jul 2005 15:41:25.267 64.00 Mbyte Windows ninerri
Licensed to Arthur Gilmour
Folder C data ex manex
Warning FIELD DEFINITION lines should be INDENTED
There is no file called nine.asd
Invalid label for data field nine.asd contains a reserved character or may get confused with a previous label or res
154. ble Error setting MBF design matrix IMBF mbf x k filename Error structures are wrong size Error when reading knot point values Failed forming R G scores Failed ordering Level labels Failed to parse R G structure line Failed to read R G structure line Failed to process MYOWNGDG files Failed when sorting pedigree Failed when processing pedigree file Failed while ordering equations the data file could not be interpreted al phanumeric fields need the A qualifier data file name may be wrong the model specification line is in error a vari able is probably misnamed The VCC constraints are specified last of all and require knowing the position of each pa rameter in the parameter vector the specified dependent variable name is not recognised It is likely that the covariate values to not match the values supplied in the file The val ues in the file should be in sorted order the declared size of the error structures does not match the actual number of data records There is some problem on the SPLINE line It could be a wrong variable name or the wrong number of knot points Knot points should be in increasing order Try increasing workspace The problem may be due to the use of the ISORT qualifier May be an unrecognised factor model term name or variance structure name or wrong count of initial values possible on an earlier line May be insufficient lines in the job Check yo
155. bmodels evaluated. Use the !DENSE qualifier to control which terms are screened. The screen is conditional on all other terms (those in the SPARSE equations) being present.

modifies the format of the sln file.
!SLNFORM -1 prevents the sln file from being written.
!SLNFORM 1 is TAB separated: .sln becomes _sln.txt
!SLNFORM 2 is COMMA separated: .sln becomes _sln.csv
!SLNFORM 3 is Ampersand separated: .sln becomes _sln.tex
See !TXTFORM for more detail.

increases the amount of information reported on the residuals obtained from the analysis of a two dimensional regular grid field trial. The information is written to the res file.

controls form of the tab file.
!TABFORM 1 is TAB separated: .tab becomes _tab.txt
!TABFORM 2 is COMMA separated: .tab becomes _tab.csv
!TABFORM 3 is Ampersand separated: .tab becomes _tab.tex
See !TXTFORM for more detail.

sets the default argument for !PVSFORM, !SLNFORM, !TABFORM and !YHTFORM if these are not explicitly set.
!TXTFORM (or !TXTFORM 1) replaces multiple spaces with TAB and changes the file extension to, say, _sln.txt. This makes it easier to load the solutions into Excel.
!TXTFORM 2 replaces multiple spaces with COMMA and changes the file extension to, say, _sln.csv. However, since factor labels sometimes contain COMMAS, this form is not so convenient.

List of rarely used job control qualifiers

qualifier: action
!TWOWAY !VCC n VG
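The effect of these output-format settings can be mimicked with a small Python sketch. It only illustrates the rule stated above (runs of multiple spaces become one delimiter, and the file name gains a suffix); it is not how ASReml writes its files. The function names are hypothetical, and stripping leading and trailing padding is an assumption.

```python
import re

# form number -> (delimiter, file extension), as listed in the text
FORMS = {1: ("\t", "txt"), 2: (",", "csv"), 3: ("&", "tex")}


def reformat_line(line: str, form: int = 1) -> str:
    """Replace each run of two or more spaces with the delimiter for `form`."""
    delimiter, _ = FORMS[form]
    return re.sub(r" {2,}", delimiter, line.strip())  # ASSUMPTION: padding dropped


def output_name(basename: str, kind: str, form: int = 1) -> str:
    """Build a name such as nin89_sln.txt from basename 'nin89' and kind 'sln'."""
    _, extension = FORMS[form]
    return f"{basename}_{kind}.{extension}"


print(output_name("nin89", "sln", 1))
print(reformat_line("variety   LANCER   28.56", 2))
```

The caveat about commas is easy to reproduce: a label such as `a,b` followed by a value becomes indistinguishable from two separate fields once the comma delimiter is inserted.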
156. cale parameter 0 has been fitted univariate single site analysis it becomes the scale for G This parameterisation is bizarre and is not recommended Mod els 7 9 have too many variance parameters and ASReml will arbitrarily fix one of the variance parameters leading to possible confusion for the user If you fix the variance parameter to a particular value then it does not count for the purposes of applying the principle that there be only one scaling variance parameter That is models 7 9 can be made identifiable by fixing all but one of the nonidentifiable scaling parameters in each of G and R to a particular value Table 2 1 Combination of models for G and R structures model G Go Ry Ro 0 comment 1 VC C C y valid 2 V C V C n valid 3 C C V y valid but not recommended 4 n inappropriate as R is a correlation model 5 C C C C y inappropriate same scale for R and G 6 C C V C n inappropriate no scaling parameter for G 7 V V nonidentifiable 2 scaling parameters for G 8 V C V y nonidentifiable scale for R and overall scale 9 V V nonidentifiable 2 scaling parameters for R indicates the entry is not relevant in this case Note that G1 and G are interchangeable in this table as are R and R 2 Some theory 17 2 5 Inference Random effects Tests of hypotheses variance parameters Inference concerning variance parameters of a linear mixed effects model usu ally relies on approximate dist
157. ce 2 925

The res file

The res file contains miscellaneous supplementary information including
• a list of unique values of x formed by using the fac() model term,
• a list of unique (x, y) combinations formed by using the fac(x,y) model term,
• Legendre polynomials produced by the leg() model term,
• orthogonal polynomials produced by the pol() model term,
• the design matrix formed for the spl() model term,
• predicted values of the curvature component of cubic smoothing splines,
• the empirical variance covariance matrix based on the BLUPs when a Σ ⊗ I or I ⊗ Σ structure is used; this may be used to obtain starting values for another run of ASReml,
• a table showing the variance components for each iteration,
• some statistics derived from the residuals from two-dimensional data (multivariate, repeated measures or spatial):
  - the residuals from a spatial analysis will have the units part added to them (defined as the combined residual) unless the data records were sorted within ASReml, in which case the units and the correlated residuals are in different orders (data file order and field order respectively),
  - the residuals are printed in the yht file but the statistics in the res file are calculated from the combined residual,
  - the Covariance/Variance/Correlation (C/V/C) matrix calculated directly from the residuals; it contains the covariance below the diagonals, the variances on the diagonal a
158. ch as the AI algorithm can be unreliable when fitting complex variance structures unless good starting values are available. Poor starting values may result in divergence of the algorithm or slow convergence. A particular problem with fitting unstructured variance models is keeping the estimated variance matrix positive definite. These are not simple issues, and in the following we present a pragmatic approach to them.

The data are taken from a large genetic study on Coopworth lambs. A total of 5 traits, namely weaning weight (wwt), yearling weight (ywt), greasy fleece weight (gfw), fibre diameter (fdm) and ultrasound fat depth at the C site (fat), were measured on 7043 lambs. The lambs were the progeny of 92 sires and 3561 dams, produced from 4871 litters over 49 flock-year combinations. Not all traits were measured on each group. No pedigree data was available for either sires or dams.

The aim of the analysis is to estimate heritability (h²) of each trait and to estimate the genetic correlations between the five traits. We will present two approaches: a half-sib analysis and an analysis based on the use of an animal model, which directly defines the genetic covariance between the progeny and sires and dams.

The data fields included factors defining sire, dam and lamb (tag), and covariates such as age (the age of the lamb at a set time), brr (the birth rearing rank: 1 = born single raised single, 2 = born twin raised single, 3 = born twi
159. check ASReml is doing what you intend that is these standard errors are approximate use the correct syntax the A fields will be treated as factors but are not encoded use correct syntax revise the qualifier arguments do not accept the estimates printed the labels for predicted terms are probably out of kilter Try a simpler predict statement If the problem persists send for help 14 Error messages 231 List of warning messages and likely meaning s warning message likely meaning Warning This US structure is not positive definite Warning Unrecognised qualifier at character Warning US matrix was not positive definite MODIFIED Warning User specified spline points Warning Variance parameters were modified by BENDing WARNING Likelihood decreased Check gammas and singularities check the initial values the qualifier either is misspelt or is in the wrong place the initial values were modified the points have been rescaled to suit the data values ASReml may not have converged to the best estimate a common reason is that some constraints have restricted the gammas Add the GU qualifier to any factor definition whose gamma value is approaching zero or the correlation is ap proaching 1 Alternatively more singular ities may have been detected You should identify where the singularities are expected and modify the data so that they are omitted or consistentl
160. come this problem, ASReml uses the AI algorithm (Gilmour, Thompson and Cullis, 1995). The matrix denoted by $\mathcal{I}_A$ is obtained by averaging (2.7) and (2.8) and approximating $y'PH_{ij}Py$ by its expectation, $\mathrm{tr}(PH_{ij})$, in those cases when $H_{ij} \neq 0$. For variance components models (that is, those linear with respect to variances in $H$), the terms in $\mathcal{I}_A$ are exact averages of those in (2.7) and (2.8). The basic idea is to use $\mathcal{I}_A(\kappa_i, \kappa_j)$ in place of the expected information matrix in (2.9) to update $\kappa$. The elements of $\mathcal{I}_A$ are

$$\mathcal{I}_A(\kappa_i, \kappa_j) = \tfrac{1}{2}\, y' P H_i P H_j P y. \qquad (2.10)$$

The $\mathcal{I}_A$ matrix is the (scaled) residual sums of squares and products matrix of $y = [y_1, \ldots, y_k]$, where $y_i$ is the working variate for $\kappa_i$ and is given by

$$y_i = H_i P y = H_i R^{-1} \tilde{e} = \begin{cases} R_i R^{-1} \tilde{e}, & \kappa_i \in \phi, \\ Z G_i G^{-1} \tilde{u}, & \kappa_i \in \gamma, \end{cases}$$

where $\tilde{e} = y - X\hat{\tau} - Z\tilde{u}$, and $\hat{\tau}$ and $\tilde{u}$ are solutions to (2.11). In this form the AI matrix is relatively straightforward to calculate. The combination of the AI algorithm with sparse matrix methods, in which only non-zero values are stored, gives an efficient algorithm in terms of both computing time and workspace.

Estimation/prediction of the fixed and random effects

To estimate $\tau$ and predict $u$, the objective function

$$\log f_Y(y \mid u;\, \tau, R) + \log f_U(u;\, G)$$

is used. This is the log-joint distribution of $(Y, u)$. Differentiating with respect to $\tau$ and $u$ leads to the mixed model equations (Robinson, 1991), which are given by

$$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix} \begin{bmatrix} \hat{\tau} \\ \tilde{u} \end{bmatrix} = \begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}. \qquad (2.11)$$
161. control treated variety 2 378 2 334 tmt variety 0 492 1 505 0 372 run 0 321 0 319 tmt run 1 748 1 388 2 223 variety run pair 0 976 0 987 tmt pair 1 315 1 156 1 359 REML log likelihood 345 256 343 22 The two paths in the input file define the two univariate analyses we will conduct We consider the results from the analysis defined in PATH 1 first A portion of the output file is 5 LogL 345 306 S52 1 3216 262 df 6 LogL 345 267 S52 1 3155 262 df 7 LogL 345 264 S52 1 3149 262 df 8 LogL 345 263 S52 1 3149 262 df Source Model terms Gamma Component Comp SE C variety 44 44 1 80947 2 37920 3501 0 P run 66 66 0 244243 0 321144 0 59 OP variety tmt 88 88 0 374220 0 492047 1 79 OP pair 132 132 0 742328 0 976057 2 51 OP run tmt 132 132 1 32973 1 74841 3 65 OP Variance 264 262 1 00000 1 31486 4 42 OP Analysis of Variance NumDF DenDF F_inc Prob 7 mu 1 53 5 1484 27 lt 001 4 tmt 1 60 4 469 36 lt 001 The estimated variance components from this analysis are given in column a of table 15 8 The variance component for the variety main effects is large There is evidence of tmt variety interactions so we may expect some discrimination between varieties in terms of tolerance to bloodworms 15 Examples 277 Given the large difference p lt 0 001 between tmt means we may wish to allow for heterogeneity of variance associated with tmt Thus we fit a separate variety variance for each level of tmt so that instead of assuming
162. convenient to prepare their data in Excel or Access. However, the data must be exported from these programs into either .csv (comma separated values) or .txt (TAB separated values) form for ASReml to read it. ASReml can convert an .xls file to a .csv file. When ASReml is invoked with an .xls file as the filename argument and there is no .csv file or .as file with the same basename, it exports the first sheet as a .csv file and then generates a template .as command file from any column headings it finds (see page 178). It will also convert a Genstat .gsh spreadsheet file to .csv format.

The data extracted from the .xls file are labels, numerical values and the results from formulae. Empty rows at the start and end of a block are trimmed, but empty rows in the middle of a block are kept. Empty columns are ignored. A single row of labels as the first non-empty row in the block will be taken as column names. Empty cells in this row will have default names C1, C2 etc. assigned. Missing values are commonly represented in ASReml data files by NA, * or . and ASReml will also recognise empty fields as missing values in .csv/.xls files.

Binary format data files

Conventions for binary files are as follows:
• binary files are read as unformatted Fortran binary in single precision if the filename has a .bin or .BIN extension;
• Fortran binary data files are read in double precision if the filename has a .dbl or .DBL extension;
• ASReml recognises the value 1e37 as
163. cross repl to produce variety predictions GFW Fdiam Trait Trait Year r Trait Team predict Trait Team forms the hyper table for each trait based on Year and Team with each linear combination in each cell of the hyper table for each trait using Team and Year effects Team predictions are produced by averaging over years yield variety r site variety predict variety will ignore the site variety term in forming the predictions while predict variety AVERAGE site forms the hyper table based on site and variety with each linear combination in each cell using variety and site variety effects and then forms averages across sites to produce variety predictions yield site variety r site variety at site block predict variety puts variety in the classify set site in the averaging set and block in the ignore set Consequently it forms the sitexvariety hyper table from model terms site variety and site variety but ignoring all terms in at site block and then forms averages across sites to produce variety predictions 11 Functions of variance components Introduction Syntax Linear combinations of components Heritability Correlation A more detailed example 170 11 Functions of variance components 171 11 1 Introduction ASReml includes a post analysis procedure to F phenvar 1 2 pheno var calculate functions of variance components F genvar 1 4 geno var Its intended use is when the variance compo
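The kind of calculation this chapter introduces (linear combinations of variance components, then a ratio such as heritability) can be sketched in Python. The operators in the two `F` lines quoted above are not legible in this extract, so the sketch assumes the usual half-sib reading: phenotypic variance is component 1 plus component 2, and genetic variance is 4 times component 1. The component values and function names are hypothetical.

```python
def linear_combination(components: dict[int, float], weights: dict[int, float]) -> float:
    """Weighted sum of numbered variance components."""
    return sum(weight * components[index] for index, weight in weights.items())


# Hypothetical estimates: 1 = sire variance, 2 = residual variance.
components = {1: 0.25, 2: 1.75}

phenvar = linear_combination(components, {1: 1.0, 2: 1.0})  # ASSUMED: 1 + 2
genvar = linear_combination(components, {1: 4.0})           # ASSUMED: 1 * 4
heritability = genvar / phenvar

print(phenvar, genvar, heritability)
```

A real post-analysis would also report approximate standard errors for these functions; that step is omitted here.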
164. ction problems and then forms averages across sites to produce variety predictions.

There are often situations in which the fixed effects design matrix X is not of full column rank. These can be classified according to the cause of aliasing:

1. linear dependencies among the model terms due to over-parameterisation of the model,
2. no data present for some factor combinations, so that the corresponding effects cannot be estimated,
3. linear dependencies due to other, usually unexpected, structure in the data.

The first type of aliasing is imposed by the parameterisation chosen and can be determined from the model. The second type of aliasing can be detected when setting up the design matrix for parameter estimation (which may require revision of imposed constraints). The third type can then be detected during the absorption of the mixed model equations. Dependencies (aliasing) can be dealt with in several ways, and ASReml checks that predictions are of estimable functions in the sense defined by Searle (1971, p. 160) and are invariant to the constraint method used.

ASReml doesn't print predictions of non-estimable functions unless the !PRINTALL qualifier is specified. However, using !PRINTALL is rarely a satisfactory solution. Failure to report predicted values normally means that the predict statement is averaging over some cells of the hyper-table that have no information and therefore cannot be averaged in a meaningful way. Appropriate
165. ctor interaction may have a variance model for each of the two factors involved in the interaction Variance models are listed in Table 7 3 As indicated in the discussion of 2b care must be taken with respect to scale parameters when combining variance models see also Section 7 7 7 Command file Specifying variance structures 118 R structure definition For each of the s sections there must be c R structure definitions Each definition may take several lines Each R structure definition specifies a variance model and has the form order field model initial_values qualifiers NIN Alliance Trial 1989 additional_initial_values variety A e order is either the number of levels in the oy 55 corresponding term or the name of a factor column 11 that has the same number of levels as the im89aug asd skip 1 ield mu variety r repl term for example y y P If mv 11 column AR1 0 5 121 11 column ARi 0 3 is equivalent to 22 row AR1 0 3 repl 1 column column AR1 0 5 repl 0 IDV 0 1 when column is a factor with 11 levels field is the name of the data field variate or factor that corresponds to the term and therefore indexes the levels of the term ASReml uses this field to sort the units so they match the R structure in the example the data will be sorted internally rows within columns for the analysis but the residuals will be printed in the yht file in the original order which is actually ro
166. cture As discussed in 2b e all but one of the models supplied when the G structure involves more than one variance model must be correlation models e the other model must not be a correlation model that is the other model must be either an homogeneous or a heterogeneous variance model and an initial value for the scale parameter must be supplied NIN Alliance Trial 1989 variety A id row 22 column 11 nin89 asd skip 1 yield mu variety Ir row column 001 row column 2 row O AR1V 0 3 0 1 column O AR1 0 3 For this reason the model for rows is now AR1V and an initial value of 0 1 has been supplied for the scale parameter In this case V 0 YreNe Pc 8 Ur Pr L4 Use of row column as a G structure is a useful approach for analysing incomplete spatial arrays This approach will often run faster for large trials but requires more memory Note that we have used the original version of the data and f mv is omitted from this analysis since row column is fitted as a G structure If we had used the augmented data nin89aug asd we would still omit mv and ASReml would discard the records with missing yield 7 Command file Specifying variance structures 114 Table 7 1 Sequence of variance structures for the NIN field trial data ASReml syntax extra random terms term G structure models 1 2 residual error term term R structure models 1 2a 2b 3a 3b 3c
167. ctures 127 ARIV 0 3 0 5 0 3 is the initial spatial correlation parameter and 0 5 is the initial variance parameter value Similarly if pom then is the heterogeneous variance matrix corresponding to C x DCD where D diag o In this case there are an additional w parameters For example the heterogeneous variance model corresponding to ID is specified 2 2 IDH in the ASReml command file see below involves the w parameters of 0 and is the variance matrix o 0 0 0 o 0 Un 0 0 a Notes on the variance models These notes provide additional information on the variance models defined in Table 7 3 e the IDH and DIAG models fit the same diagonal variance structure the CORGH and US models fit the same completely general variance structure parameterized differently e in CHOLk models LDL where L is lower triangular with ones on the diagonal D is diagonal and kis the number of non zero off diagonals in L in CHOLKC models LDL where L is lower triangular with ones on the diagonal D is diagonal and k is the number of non zero sub diagonal columns in L This is somewhat similar to the factor analytic model e in ANTEk models Xt U DU where U is upper triangular with ones on the diagonal D is diagonal and k is the number of non zero off diagonals in U the CHOLk CHOLAC and ANTEK models are equivalent to the US structure that is the full variance structure when k
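The step from a correlation model to a variance model described above can be made concrete with a small Python sketch. It builds an AR1 correlation matrix C, scales it by a single variance (the AR1V case, with 0.3 and 0.5 as in the text), and forms the heterogeneous matrix D C D with D = diag(σ_i). The function names are hypothetical, the heterogeneous parameters are taken to be variances σ_i² (so D holds their square roots), and plain nested lists are used to keep the sketch dependency-free.

```python
import math

Matrix = list[list[float]]


def identity(n: int) -> Matrix:
    """ID correlation model: the n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def ar1_correlation(n: int, rho: float) -> Matrix:
    """AR1 correlation model: element (i, j) is rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def homogeneous(corr: Matrix, variance: float) -> Matrix:
    """Homogeneous variance model: sigma^2 * C."""
    return [[variance * c for c in row] for row in corr]


def heterogeneous(corr: Matrix, variances: list[float]) -> Matrix:
    """Heterogeneous variance model: D C D, D = diag(sqrt(variances))."""
    sd = [math.sqrt(v) for v in variances]
    n = len(corr)
    return [[sd[i] * corr[i][j] * sd[j] for j in range(n)] for i in range(n)]


ar1v = homogeneous(ar1_correlation(3, 0.3), 0.5)  # AR1V 0.3 0.5
idh = heterogeneous(identity(3), [1.0, 4.0, 9.0])  # IDH: diagonal variances
print(ar1v[0], idh[2])
```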
168. d For a univariate analysis, ASReml discards records which have a missing response. In multivariate analyses, all records are retained and the R matrix is modified to reflect the missing value pattern.

Missing values in the explanatory variables

ASReml will abort the analysis if it finds missing values in the design matrix unless !MVINCLUDE or !MVREMOVE is specified (see Section 5.8). !MVINCLUDE causes the missing value to be treated as a zero. !MVREMOVE causes ASReml to discard the whole record. Records with missing values in particular fields can be explicitly dropped using the !D transformation (Table 5.1).

Covariates: treating missing values as zero in covariates is usually only sensible if the covariate is centred (has mean of zero).

Design factors: where the factor level is zero (or missing) and the !MVINCLUDE qualifier is specified, no level is assigned to the factor for that record.

6.11 Some technical details about model fitting in ASReml

Sparse versus dense

ASReml partitions the terms in the linear model into two parts: a dense set and a sparse set. The partition is at the !r point unless explicitly set with the !DENSE data line qualifier or mv is included before !r (see Table 5.5). The special term mv is always included in sparse. Thus random and sparse terms are estimated using sparse matrix methods, which result in faster processing. The inverse coefficient matrix
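The three behaviours for missing values in explanatory variables can be summarised in a short Python sketch. It is an illustration of the rules as stated, not ASReml code: records are plain dictionaries with `None` marking a missing value, and the function and policy names are hypothetical stand-ins for the default behaviour, MVINCLUDE and MVREMOVE.

```python
def apply_missing_policy(records, design_fields, policy="abort"):
    """Return the records to analyse under the chosen missing-value policy.

    policy = "abort"   : stop if any design field is missing (the default)
    policy = "include" : treat the missing value as zero
    policy = "remove"  : discard the whole record
    """
    kept = []
    for record in records:
        missing = [f for f in design_fields if record.get(f) is None]
        if not missing:
            kept.append(dict(record))
        elif policy == "abort":
            raise ValueError(f"missing value in design field(s): {missing}")
        elif policy == "include":
            filled = dict(record)
            for field in missing:
                filled[field] = 0.0
            kept.append(filled)
        elif policy == "remove":
            continue
        else:
            raise ValueError(f"unknown policy: {policy}")
    return kept


data = [{"yield": 5.1, "age": 0.4}, {"yield": 4.8, "age": None}]
print(apply_missing_policy(data, ["age"], "include"))
print(apply_missing_policy(data, ["age"], "remove"))
```

The "include" branch makes the note about centring visible: a zero only stands in sensibly for a missing covariate when zero is the covariate's mean.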
169. d 6 gt 0 or by considering 0 lt a lt m and either 0 lt lt 1 or gt 1 With A 2 isotropy occurs when 6 1 and then the rotation angle a is irrelevant correlation contours are circles compared with ellipses in general With A 1 correlation contours are diamonds e power models rely on the definition of distance for the associated term for example the distance between time points in a one dimensional longitudinal analysis the spatial distance between plot coordinates in a two dimensional field trial analysis Information for determining distances is supplied by the key argument on the structure line For one dimensional cases key may be 7 Command file Specifying variance structures 130 the name of a data field containing the coordinate values when it relates to an R structure 0 in which case a vector of coordinates of length order must be supplied after all R and G structure lines fac x when it relates to model term fac x In two directions IEXP IGAU IEUC AEXP AGAU MATn the key argument also depends on whether it relates to an R or G structure For an R structure use the form rrcc where rr is the number of a data field containing the coordinates for the first dimension and cc is the number of a data field containing the coordinates for the second direction For example in the analysis of spatial data if the x coordinate was in field 3 and the y coordinate was in field
170. d G is the residual variance.

The default output for testing fixed effects used by ASReml is a table of so-called incremental F-statistics. These F-statistics are described in Section 6.12. The statistics are simply the appropriate Wald test statistics divided by the number of estimable effects for that term. In this example there are four terms included in the summary. The overall mean (denoted by mu) is of no interest for these data. The tests are sequential; that is, the effect of each term is assessed by the change in sums of squares achieved by adding the term to the current model, defined by the model which includes those terms appearing above the current term, given the variance parameters. For example, the test of nitrogen is calculated from the change in sums of squares for the two models mu variety nitrogen and mu variety. No refitting occurs; that is, the variance parameters are held constant at the REML estimates obtained from the currently specified fixed model.

The incremental Wald statistics have an asymptotic $\chi^2$ distribution with degrees of freedom (df) given by the number of estimable effects (the number in the DF column). In this example the incremental F-statistics are numerically the same as the ANOVA F-statistics, and ASReml has calculated the appropriate denominator df for testing fixed effects. This is a simple problem for balanced designs, such as the split plot design, but it is not straightforward to determine the relevant deno
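The sequential nature of the tests can be sketched in a few lines of Python. The helper names are hypothetical and the Wald value used at the end is a made-up number; the sketch only shows which pair of models each incremental test compares and how a Wald statistic is turned into an incremental F-statistic.

```python
def sequential_comparisons(terms):
    """For each term, return (model including the term, model with only the terms above it)."""
    return [(terms[: i + 1], terms[:i]) for i in range(len(terms))]


def incremental_f(wald_statistic, estimable_effects):
    """Incremental F-statistic: the Wald statistic divided by its number of estimable effects."""
    if estimable_effects <= 0:
        raise ValueError("the term must have at least one estimable effect")
    return wald_statistic / estimable_effects


for larger, smaller in sequential_comparisons(["mu", "variety", "nitrogen"]):
    print(" ".join(larger), "versus", " ".join(smaller) or "(empty model)")

# Hypothetical Wald statistic of 12.0 on 3 estimable effects.
print(incremental_f(12.0, 3))
```

The last comparison printed is the one quoted in the text for nitrogen: mu variety nitrogen against mu variety.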
171. d Nelder (1982) is the choice of whether to include or exclude model terms when forming predictions. In linear models, since all terms are fixed, terms not in the classify set must be in the averaging set.

Predict syntax

The first step is to specify the classify set of explanatory variables after the predict directive. The predict statement(s) may appear immediately after the model line (before or after any tabulate statements) or after the R and G structure lines. The example command file shown alongside this text is

    NIN Alliance trial 1989
     variety !A
     .
     .
     column 11
    nin89.asd !skip 1
    yield ~ mu variety !r repl
    0 0 1
    repl 1
    repl 0 IDV 0.1

The syntax is

    predict factors [qualifiers]

• predict must be the first element of the predict statement, commencing in column 1 in upper or lower case,
• factors is a list of the variables defining a multiway table to be predicted; each variable may be followed by a list of specific values to be predicted,
• the qualifiers, listed in Table 10.1, instruct ASReml to modify the predictions in some way,
• a predict statement may be continued on subsequent lines by terminating the current line with a comma,
• several predict statements may be specified.

ASReml parses the predict statement before fitting the model. If any syntax problems are encountered, these are reported in the .pvs file, after which the statement is ignored: the job is completed as if the erroneous prediction statement did not exist. The predictions are fo
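Two of the layout rules above (predict starts in column 1 in either case, and a trailing comma continues the statement on the next line) can be illustrated with a small Python sketch. This is not ASReml's parser; the function name and the sample lines are hypothetical, and dropping the continuation comma when joining is an assumption made for readability.

```python
def collect_predict_statements(lines):
    """Join predict statements that are continued with a trailing comma."""
    statements = []
    pending = None
    for raw in lines:
        line = raw.rstrip()
        if pending is None:
            # A statement only starts when "predict" begins in column 1.
            if not line.lower().startswith("predict"):
                continue
            pending = line
        else:
            pending = pending + " " + line.strip()
        if pending.endswith(","):
            # Trailing comma: the statement continues on the next line.
            pending = pending[:-1].rstrip()
            continue
        statements.append(pending)
        pending = None
    if pending is not None:
        statements.append(pending)
    return statements


job = [
    "yield ~ mu variety !r repl",
    "predict variety,",
    "  nitrogen",
    "PREDICT variety",
    " predict not_in_column_1",
]
print(collect_predict_statements(job))
```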
172. d command line options

    option  qualifier      type             action
    Cc      !CONTINUE      job control      continue iterations using previous estimates as initial values
            !FINAL         job control      continue for one more iteration using previous estimates as initial values
            !LOGFILE       screen output    copy screen output to basename.asl
    N       !NOGRAPHS      graphics         suppress interactive graphics
    Ww      !WORKSPACE w   workspace        set workspace size to w Mbyte

Other command line options

    option  qualifier      type             action
            !ARGS a        job control      to set arguments (a) in job rather than on command line
            !ASK           job control      prompt for options and arguments
    Bb      !BRIEF b       output control   reduce output to .asr file
            !DEBUG         debug            invoke debug mode
            !DEBUG 2       debug            invoke extended debug mode
    Gg      !GRAPHICS g    graphics         set interactive graphics device
    Hg      !HARDCOPY g    graphics         set interactive graphics device, graphics screens not displayed
            !INTERACTIVE   graphics         display graphics screen
            !JOIN          output control   concatenate !CYCLE output files
            !ONERUN        job control      override rerunning requested by !RENAME
            NA             post processing  calculation of functions of variance components
            !QUIET         graphics         suppress screen output
    Rr      !RENAME        job control      repeat run for each argument renaming output filenames
    Ss      NA             workspace        set workspace size
    Yv      !YVAR v        job control      override y-variate specified in the command file with variate number v
            NA             license          reports c
173. d line options (C, F, O, R)
    Workspace command line options (S, W)
    Examples
12.4 Advanced processing arguments
    Standard use of arguments
    Prompting for input
    Paths and Loops
13 Description of output files
13.1 Introduction
13.2 An example
13.3 Key output files
    The .asr file
    The .sln file
    The .yht file
13.4 Other ASReml output files
    The .aov file
    The .dpr file
    The .pvc file
    The .pvs file
    The .res file
    The .rsv file
    The .tab file
    The .vrb file
    The .vvp file
13.5 ASReml output objects and where to find them
14 Error messages
14.1 Introduction
14.2 Common problems
14.3 Things to check in the a
174. d was Repl Currently defined structures COLS and LEVELS 1 variety 2 id pid raw repl ann F W ND e e re e Be Be e A e FP be BP A e re re N O O O OQO O a e WN KF O o o Oo O So O nloc Finished 28 Jul 2005 10 06 49 173 Error reading model factor list 14 Error messages 223 6 Misspelt factor name and 7 Wrong levels declaration in the G struc ture definition lines The next fault ASReml detects is nin alliance trial G structure header Factor order nin89 asd skip 1 indicating that there is something wrong in yield mu variety the G structure definition lines In this case 7 Repl j 001 the replicate term in the first G structure def Repl 1 inition line has been spelt incorrectly To cor 2 0 IDV 0 1 rect this error replace Repl with repl ASReml 1 99a 01 Aug 2005 nin alliance trial Build c 26 Jul 2005 32 bit 27 Jul 2005 15 41 52 606 64 00 Mbyte Windows ninerr6 Licensed to Arthur Gilmour Folder C data ex manex variety A QUALIFIERS SKIP 1 QUALIFIER DOPART 1 is active Reading nin asd FREE FORMAT skipping 1 lines Univariate analysis of yield Using 224 records of 242 read Model term Size miss zero MinNonO Mean MaxNonO 1 variety 56 0 0 1 28 5000 56 2 id 0 O 1 000 28 50 56 00 3 pid 0 1101 2628 4156 4 raw 0 0 21 00 510 65 840 0 5 repl 4 0 0 1 2 5000 4 6 nloc 0 0 4 000 4 000 4 000 7 yield Variate 0 0 1 050 25 53 42 00 8 lat 0 0 4 300 27 22 47 30 9 lo
175. data values in a field by 10, deriving new forms of the data for analysis (for example, summing the data in two fields), or creating temporary data (for example, a test variable used immediately to discard some records from analysis). Occasional users may find it easier to use a spreadsheet to calculate derived variables than to modify variables using ASReml transformations.

Transformation qualifiers are listed after data field labels. They define an operation, often involving an argument (a constant or another variable), which is performed on a target variable. The target is usually implicit but can be changed to a new variable.

Note that
• there may be up to 1000 variables and these are internally labelled V1, V2, ..., V1000,
• values from the data file are read into the leading variables,
• alpha (!A), integer (!I), pedigree (!P) and date (!DATE) fields are converted to real numbers (level codes) as they are read and before any transformations are applied,
• transformations may be applied to any variable (since every variable is numeric), but it may not be sensible to change factor level codes,
• transformations are performed in the order of appearance for each record in turn,
• after completing the transformations for each record, the values in the record for variables associated with a label are held for analysis, or the record (all values) is discarded (see !D transformation and Sectio
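The processing order described in the notes (every transformation applied in order of appearance to one record, then the record is either held or discarded, then the next record) can be sketched in Python. All names here are hypothetical and the three example transformations simply mirror the kinds of use listed above: scaling a field by 10, summing two fields, and a test variable that discards records.

```python
def run_transformations(records, transformations):
    """Apply each transformation in order to each record in turn.

    A transformation returns the updated list of values, or None to
    discard the whole record.
    """
    kept = []
    for record in records:
        values = list(record)
        for transform in transformations:
            values = transform(values)
            if values is None:
                break  # record discarded; later transformations are skipped
        if values is not None:
            kept.append(values)
    return kept


def scale_first_by_10(values):
    return [values[0] * 10] + values[1:]


def append_sum_of_first_two(values):
    return values + [values[0] + values[1]]


def discard_if_sum_negative(values):
    return None if values[-1] < 0 else values


steps = [scale_first_by_10, append_sum_of_first_two, discard_if_sum_negative]
print(run_transformations([[1, 2], [-1, 3]], steps))
```

Because the steps run in order, the sum is formed from the already scaled first field, which is why the second record is dropped.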
176. Examples
6.3 Fixed terms in the model
    Primary fixed terms
    Sparse fixed terms
6.4 Random terms in the model
6.5 Interactions and conditional factors
    Interactions
    Conditional factors
6.6 Alphabetic list of model functions
6.7 Weights
6.8 Generalized Linear Models
6.9 Generalized Linear Mixed Models
6.10 Missing values
    Missing values in the response
    Missing values in the explanatory variables
6.11 Some technical details about model fitting in ASReml
    Sparse versus dense
    Ordering of terms in ASReml
    Aliassing and singularities
    Examples of aliassing
6.12 Analysis of variance table
7 Command file: Specifying variance structures
7.1 Introduction
    Non singular variance matrices
7.2 Variance model specification in ASReml
7.3 A sequence of structures for the NIN data
7.4 Variance structures
177. del. If a linear mixed model is not supplied, tabulation is based on all records. The tabulate statement has the form

    tabulate response_variables [!WT weight !COUNT !DECIMALS [d] !SD !RANGE !STATS !FILTER filter !SELECT value] ~ factors

• tabulate is the directive name and must begin in column 1,
• response_variables is a list of variates for which means are required,
• !WT weight nominates a variable containing weights,
• !COUNT requests counts as well as means to be reported,
• !DECIMALS [d] ($1 < d < 7$) requests means be reported with d decimal places. If omitted, ASReml reports 5 significant digits; if specified without an argument, 2 decimal places are reported,
• !RANGE requests the minimum and maximum of each cell be reported,
• !SD requests the standard deviation within each cell be reported,
• !STATS is shorthand for !COUNT !SD !RANGE,
• !FILTER filter nominates a factor for selecting a portion of the data,
• !SELECT value indicates that only records with value in the filter column are to be included,
• factors identifies the factors to be used for classifying the data. Only factors (not covariates) may be nominated and no more than six may be nominated.

ASReml prints the multiway table of means (omitting empty cells) to a file with extension .tab.

10.3 Prediction

Underlying principles

Our approach to prediction is a generalization of that of Lane
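What a tabulation produces can be illustrated with a minimal Python sketch: cell means, counts and ranges classified by one or more factors, with an optional filter and empty cells simply absent from the result. The function name, the dictionary-based records and the field names are hypothetical, and skipping records whose response is missing is an assumption made for this sketch.

```python
from collections import defaultdict


def tabulate(records, response, factors, filter_field=None, select=None):
    """Return {cell: summary} where a cell is a tuple of factor levels."""
    cells = defaultdict(list)
    for record in records:
        if filter_field is not None and record[filter_field] != select:
            continue  # record is outside the selected portion of the data
        value = record[response]
        if value is None:
            continue  # assumption: a missing response contributes nothing
        cells[tuple(record[f] for f in factors)].append(value)
    return {
        cell: {
            "count": len(values),
            "mean": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        }
        for cell, values in sorted(cells.items())
    }


data = [
    {"yield": 10.0, "variety": "A", "site": 1},
    {"yield": 14.0, "variety": "A", "site": 1},
    {"yield": 8.0, "variety": "B", "site": 1},
    {"yield": None, "variety": "B", "site": 1},
    {"yield": 99.0, "variety": "B", "site": 2},
]
for cell, summary in tabulate(data, "yield", ["variety"], "site", 1).items():
    print(cell, summary)
```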
178. dels, the Average Information algorithm can have difficulty maximising the REML log-likelihood when starting values are not reasonably close to the REML solution. ASReml has several internal strategies to cope with this problem, but these are not always successful. When the user needs to provide better starting values, one method is to fit a simpler variance model. For example, it can be difficult to guess reasonable starting values for an unstructured variance matrix. A first step might be to assume independence and just estimate the variances. If all the variances are not positive, there is little point proceeding to try and estimate the covariances.

The !CONTINUE qualifier instructs ASReml to retrieve variance parameters from the .rsv file if it exists, rather than using the values in the .asr file. When reading the .rsv file, it will take results from some matrices as supplying starting values for other matrices. The transitions recognised are

    DIAG to CORUH
    DIAG to FA1
    CORUH to FA1
    FAi to FAi+1
    FAi to CORGH
    FAi to US
    CORGH to US

The use of the .rsv file with !CONTINUE in this way reduces the need for the user to type in the updated starting values. The various models may be written in various !PARTs of the job and controlled by the !DOPART qualifier. When used with the r qualifier on the command line (see Chapter 12), the output from the various parts has the partnumber appended
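The list of recognised transitions amounts to a small lookup rule, which the following Python sketch encodes. It is purely illustrative: the function name is hypothetical, the entries written "FAz" and "FAt" in the extracted text are read here as a generic FAi, and nothing is implied about how ASReml itself maps the parameters.

```python
import re

# Transitions between fixed model names, as listed in the text.
FIXED_TRANSITIONS = {
    ("DIAG", "CORUH"),
    ("DIAG", "FA1"),
    ("CORUH", "FA1"),
    ("CORGH", "US"),
}


def recognised_transition(source, target):
    """True if estimates from `source` can seed starting values for `target`."""
    if (source, target) in FIXED_TRANSITIONS:
        return True
    match = re.fullmatch(r"FA(\d+)", source)
    if match:
        order = int(match.group(1))
        # FAi may seed FAi+1, CORGH or US.
        return target in {"FA" + str(order + 1), "CORGH", "US"}
    return False


# One possible route from a simple model up to an unstructured matrix.
route = ["DIAG", "CORUH", "FA1", "FA2", "US"]
print(all(recognised_transition(a, b) for a, b in zip(route, route[1:])))
```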
179. dels we fit are the antedependence model of order 1 and the unstructured model These require as starting values the lower triangle of the full variance matrix We use the REML estimate of from the heterogeneous 15 Examples 258 power model shown in the previous output The antedependence model models X by the inverse cholesky decomposition 5 UDU where D is a diagonal matrix and U is a unit upper triangular matrix For an antedependence model of order q then u 0 for j gt i q 1 The antedependence model of order 1 has 9 parameters for these data 5 in D and 4 in U The input is given by yl y3 y5 y7 yiO Trait tmt Tr tmt 120 14 2 Tr O ANTE 60 16 54 65 12265 91 560 1233 306 4 89 17 120 2 298 6 431 8 62 24 83 85 208 3 301 2 379 8 The abbreviated output file is 1 LogL 171 501 S2 1 0000 60 df 2 LogL 170 097 S2 1 0000 60 df 3 LogL 166 085 S2 1 0000 60 df 4 LogL 161 335 S2 1 0000 60 df 5 LogL 160 407 S2 1 0000 60 df 6 LogL 160 370 S2 1 0000 60 df 7 LogL 160 369 S2 1 0000 60 df Source Model terms Gamma Component Residual ANTE UDU 1 O 268657E 01 0 26S657E 01 Residual ANTE UDU 1 0 628413 0 628413 Residual ANTE UDU 2 Q 372801E 01 0 372801E 01 Residual ANTE UDU 2 1 49108 1 49108 Residual ANTE UDU 3 0 599632E 02 0 599632E 02 Residual ANTE UDU 3 1 28041 1 28041 Residual ANTE UDU 4 0 789713E 02 0 789713E 02 Residual ANTE UDU 4 0 967815 0 967815 Residual ANTE UDU 5 0 390635E 01
180. derlying principles
    Syntax
    Examples

10 Tabulation of the data and prediction from the model

10.1 Introduction

This chapter describes the tabulate directive and the predict directive introduced in Section 3.4 under Prediction. Tabulation is the process of forming simple tables of averages and counts from the data. Such tables are useful for looking at the structure of the data and numbers of observations associated with factor combinations. Multiple tabulate directives may be specified in a job.

Prediction is the process of forming a linear function of the vector of fixed and random effects in the linear model to obtain an estimated or predicted value for a quantity of interest. It is primarily used for predicting tables of adjusted means. If a table is based on a subset of the explanatory variables, then the other variables need to be accounted for. It is usual to form a predicted value either at specified values of the remaining variables, or averaging over them in some way.

10.2 Tabulation

A tabulate directive is provided to enable simple summaries of the data to be formed for the purpose of checking the structure of the data. The summaries are based on the same records as are used in the analysis of the model (i.e. leaving out records eliminated from the analysis because of missing values in variates and factors in the model). Multiple tabulate statements are permitted either immediately before or after the linear mo
181. dicate that some of the original code is omitted from the display.

Data examples are displayed in larger boxes in the body of the text, see, for example, page 42. Other conventions are as follows:
• keyboard key names appear in SMALLCAPS, for example, TAB and ESC,
• example code within the body of the text is in this size and font and is highlighted in bold type, see pages 33 and 49,
• in the presentation of general ASReml syntax, for example
      [path] asreml basename[.as] [arguments]
  - typewriter font is used for text that must be typed verbatim, for example, asreml and .as after basename in the example,
  - italic font is used to name information to be supplied by the user, for example, basename stands for the name of a file with an .as filename extension,
  - square brackets indicate that the enclosed text and/or arguments are not always required,
• ASReml output is in this size and font, see page 35,
• this font is used for all other code.

Some theory
    The linear mixed model
        Introduction
        Direct product structures
        Variance structures for the errors: R structures
        Variance structures for the random effects: G structures
    Estimation
        Estimation of the variance parameters
        Estimation/prediction of fixed and random effects
    What are BLUPs?
    Combining variance models
    Inference: Random effects
        Tests of hypotheses: variance parameters
        Diagnostics
    Inference: Fixed effects
        Introduction
        Incremental and Conditional Wald S
182. e 6.2), for example, at(site,1).row will fit row as a factor only for site 1,
• a complete set of conditional terms are specified by omitting the level specification in the at(f) function, provided the correct number of levels of f is specified in the field definitions. Otherwise, a list of levels may be specified,
• at(f).b creates a series of model terms representing b nested within f for any model term b. A model term is created for each level of f; each has the size of b. For example, if site and geno are factors with 3 and 10 levels respectively, then for at(site).geno ASReml constructs 3 model terms, at(site,1).geno, at(site,2).geno and at(site,3).geno, each with 10 levels,
  - this is similar to forming an interaction except that a separate model term is created for each level of the first factor; this is useful for random terms when each component can have a different variance. The same effect is achieved by using an interaction (e.g. site.geno) and associating a DIAG variance structure with the first component (see Section 7.5).

6.6 Alphabetic list of model functions

Table 6.2 presents detailed descriptions of the model functions discussed above. Note that some three-letter function names may be abbreviated to the first letter.

Table 6.2: Alphabetic list of model functions and descriptions

    model function          action
    and(t,r), a(t,r)        overlays (adds) r times the design matrix for model term t to the existing design matrix. Specific
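The expansion of a conditional term into one model term per level can be written out mechanically. The Python sketch below only generates the term labels; the function name is hypothetical, and the `at(site,1).geno` spelling with parentheses, comma and dot is the assumed form of the notation, since the punctuation is missing from the extracted text.

```python
def expand_at(factor, levels, term):
    """List the model terms created for `term` nested within `factor`.

    levels: either the number of levels (all are used) or an explicit
            list of the levels wanted.
    """
    if isinstance(levels, int):
        levels = range(1, levels + 1)
    return ["at({},{}).{}".format(factor, level, term) for level in levels]


# site has 3 levels, so three separate geno terms are formed.
print(expand_at("site", 3, "geno"))
# Only site 1: row is fitted as a factor for that site alone.
print(expand_at("site", [1], "row"))
```

Each generated term has the size of the nested term (10 levels for geno in the example), so each can carry its own variance component.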
183. e 0 0 1 line is called the variance header line. In general, the first two elements of this line refer to the R structures and the third element is the number of G structures. In this case, 0 0 tells ASReml that there are no explicit R structures but there is one G structure (1). The next two lines define the G structure. The first line, a G structure header line, links the structure that follows to a term in the linear model (repl) and indicates that it involves one variance model (1); a 2 would mean that the structure was the direct product of two variance models. The second line tells ASReml that the variance model for replicates is IDV of order 4 ($\sigma_r^2 I_4$). The 0.1 is a starting value for $\gamma_r = \sigma_r^2 / \sigma^2$; a starting value must be specified. Finally, the second element (0) on the last line of the file indicates that the effects are in standard order. There is almost always a 0 (no sorting) in this position for G structures.

    NIN Alliance Trial 1989
     variety !A
     id
     pid
     raw
     repl 4
     .
     row 22
     column 11
    nin89.asd !skip 1
    yield ~ mu variety !r repl
    0 0 1
    repl 1
    4 0 IDV 0.1

The following points should be noted:
• the 4 on the final line could have been written as repl to give
      repl 0 IDV 0.1
  This would tell ASReml that the order or dimension of the IDV variance model is equal to the number of levels in repl (4 in this case),
• when specifying G structures, the user should ensure that one scale parameter is present. ASReml does
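To show how the three lines fit together, here is a minimal Python sketch that splits a variance header line and one G structure definition into named pieces. It is not ASReml's reader: the function names and dictionary keys are hypothetical, the field layout (order, key, model, initial values, qualifiers) follows the description above, and treating tokens that begin with "!" as qualifiers is an assumption.

```python
def parse_variance_header(line):
    """Split a header such as '0 0 1' into its R part and the G structure count."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError("expected three elements on the variance header line")
    first, second, g_count = (int(f) for f in fields)
    return {"r_fields": (first, second), "g_structures": g_count}


def parse_g_structure(header_line, model_lines):
    """Parse a G structure header ('repl 1') and its variance model lines."""
    term, count = header_line.split()
    if len(model_lines) != int(count):
        raise ValueError("number of variance model lines does not match the header")
    models = []
    for line in model_lines:
        order, key, model, *rest = line.split()
        models.append({
            "order": order,  # a number of levels or the name of a factor
            "key": key,      # 0 means effects are in standard order
            "model": model,
            "initial": [float(t) for t in rest if not t.startswith("!")],
            "qualifiers": [t for t in rest if t.startswith("!")],
        })
    return {"term": term, "models": models}


print(parse_variance_header("0 0 1"))
print(parse_g_structure("repl 1", ["4 0 IDV 0.1"]))
```

Keeping `order` as text lets the same sketch accept either `4` or `repl` in the first position, matching the first point noted above.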
184. e 3 traits, $\log(y_1)$, $y_2^{-1}$ and $y_3^{-3}$ are indicated. This diagnostic strategy works better when based on grouped data regressing log(standard deviation) on log(mean). Also,

    STND RES 16 2.35 6.58 5.64

indicates that for the 16th data record, the residuals are 2.35, 6.58 and 5.64 times the respective standard deviations. The standard deviation used in this test is calculated directly from the residuals rather than from the analysis. They are intended to flag the records with large residuals rather than to precisely quantify their relative size. They are not studentised residuals and are generally not relevant when the user has fitted heterogeneous variances.

This is nin89a.res.

    Convergence sequence of variance parameters
    Iteration        1         2         3         4         5         6
    LogL       401.827   400.780   399.807   399.353   399.326   399.324
    Change          59        80        83        21         5         1
    Adjusted         0         0         0         0         0         0
    StepSz       0.316     0.562     1.000     1.000     1.000     1.000
    5         0.500000  0.538787  0.589519  0.639457  0.651397  0.654445  0.5
    6         0.500000  0.487564  0.469768  0.448895  0.440861  0.438406  0.6

    Plot of Residuals [24.8730 15.9145] vs Fitted values [16.7724 35.9355]
    [character plot of residuals against fitted values]
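The idea of flagging records whose residual is large relative to a standard deviation computed directly from the residuals can be sketched as follows. This is a hedged illustration only: the function name, the use of the root mean square of the residuals as the standard deviation, and the threshold are all assumptions, since the chunk does not say how ASReml computes the scale or which cut-off it applies.

```python
import math


def flag_large_residuals(residuals, threshold):
    """Return (record number, standardised residual) for records beyond the threshold.

    The scale is the root mean square of the residuals themselves
    (an assumption; it is not taken from the fitted model).
    Record numbers start at 1.
    """
    scale = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    flagged = []
    for number, residual in enumerate(residuals, start=1):
        standardised = residual / scale
        if abs(standardised) > threshold:
            flagged.append((number, standardised))
    return flagged


# Hypothetical residuals; the fifth record stands out.
print(flag_large_residuals([1.0, -1.0, 1.0, -1.0, 10.0], threshold=2.0))
```

As the text notes for the real diagnostic, such a flag points at records worth inspecting; it is not a studentised residual.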
185. e default for error.

    NIN Alliance Trial 1989
     variety !A
     id
     pid
     raw
     repl 4
     .
     row 22
     column 11
    nin89.asd !skip 1
    yield ~ mu variety repl

Important: The error term is always present in the model but its variance structure does not need to be formally declared when it has the default IID structure.

2a Random effects RCB analysis

The random effects RCB model has 2 random terms to indicate that the total variation in the data is comprised of 2 components, a random replicate effect $u_r \sim N(0, \gamma_r \sigma^2 I_4)$ where $\gamma_r = \sigma_r^2 / \sigma^2$, and error as in 1. This model involves both the original implicit IID R structure and an implicit IID G structure for the random replicates. In ASReml
• IID variance structure is the default for random terms in the model.

    NIN Alliance Trial 1989
     variety !A
     id
     pid
     raw
     repl 4
     .
     row 22
     column 11
    nin89.asd !skip 1
    yield ~ mu variety !r repl

For this reason the only change to the former command file is the insertion of !r before repl.

Important: All random terms, other than error which is implicit, must be written after !r in the model specification line(s).

See Section 7.4. See page 120. See Sections 2.1 and 7.5.

2b Random effects RCB analysis with a G structure specified

This model is equivalent to 2a but we explicitly specify the G structure for repl, that is, $u_r \sim N(0, \gamma_r \sigma^2 I_4)$, to introduce the syntax. Th
186. e.giv !SKIP n

• the named file must have a .giv or .grm extension,
• the G inverse files must be specified on the line(s) immediately prior to the data file line, after any pedigree file,
• up to 98 G inverse matrices may be defined,
• the file must be free format with three numbers per line, namely
      row column value
  defining the lower triangle row-wise of the matrix,
• the file must be sorted column within row,
• every diagonal element must be represented; missing off-diagonal elements are assumed to be zero cells,
• the file is used by associating it with a factor in the model. The number and order of the rows must agree with the size and order of the associated factor,
• the !SKIP n qualifier tells ASReml to skip n header lines in the file.

The .giv file presented in the code box (final rows shown)

     .
     .
    10  9  0.2666667
    10 10  1.0666667
    11 11  1.0666667
    12 11  0.2666667
    12 12  1.0666667

gives the following G inverse matrix

$$ I_6 \otimes \begin{bmatrix} 1.067 & 0.267 \\ 0.267 & 1.067 \end{bmatrix} $$

The .giv file can be associated with a factor in two ways:
• the first is to declare a G structure for the model term and to refer to the .giv file with the corresponding identifier GIV1, GIV2, GIV3, ...; for example, animal 1 for a one-dimensional structure put the scal
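The file rules in the list above (three numbers per line, lower triangle only, sorted column within row, every diagonal element present, missing off-diagonals taken as zero, optional header lines to skip) translate directly into a small reader. The following Python sketch is an illustration under those stated rules, not ASReml code; the function name is hypothetical and the sample values are made up.

```python
def read_giv(lines, skip=0):
    """Build a full symmetric matrix from 'row column value' lines."""
    entries = {}
    previous = None
    for line in lines[skip:]:
        parts = line.split()
        if not parts:
            continue
        row, col, value = int(parts[0]), int(parts[1]), float(parts[2])
        if col > row:
            raise ValueError("only the lower triangle may be given")
        if previous is not None and (row, col) <= previous:
            raise ValueError("entries must be sorted column within row")
        previous = (row, col)
        entries[(row, col)] = value
    size = max(row for row, _ in entries)
    for i in range(1, size + 1):
        if (i, i) not in entries:
            raise ValueError("diagonal element {} is missing".format(i))
    # Off-diagonal cells that were not listed stay at zero.
    matrix = [[0.0] * size for _ in range(size)]
    for (row, col), value in entries.items():
        matrix[row - 1][col - 1] = value
        matrix[col - 1][row - 1] = value
    return matrix


sample = ["header line", "1 1 2.0", "2 1 0.5", "2 2 2.0", "3 3 1.0"]
for row in read_giv(sample, skip=1):
    print(row)
```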
187. e heterogeneity is associated with the two-level factor tmt, the analysis is equivalent to a bivariate analysis in which the two traits correspond to the two levels of tmt, namely sqrt(rootwt) for control and treated. The model for each trait is given by

$$ y_j = X\tau_j + Z_v u_{v_j} + Z_r u_{r_j} + e_j \quad (j = c, t) \qquad (15.9) $$

where $y_j$ is a vector of length $n = 132$ containing the sqrtroot values for variate $j$ ($j = c$ for control and $j = t$ for treated), $\tau_j$ corresponds to a constant term and $u_{v_j}$ and $u_{r_j}$ correspond to random variety and run effects. The design matrices are the same for both traits. The random effects and error are assumed to be independent Gaussian variables with zero means and variance structures $\mathrm{var}(u_{v_j}) = \sigma^2_{v_j} I_{44}$, $\mathrm{var}(u_{r_j}) = \sigma^2_{r_j} I_{66}$ and $\mathrm{var}(e_j) = \sigma^2_j I_{132}$. The bivariate model can be written as a direct extension of (15.9), namely

$$ y = (I_2 \otimes X)\tau + (I_2 \otimes Z_v)u_v + (I_2 \otimes Z_r)u_r + e^* \qquad (15.10) $$

where $y = (y_c', y_t')'$, $u_v = (u_{v_c}', u_{v_t}')'$, $u_r = (u_{r_c}', u_{r_t}')'$ and $e^* = (e_c', e_t')'$.

There is an equivalence between the effects in this bivariate model and the univariate model of (15.7). The variety effects for each trait ($u_v$ in the bivariate model) are partitioned in (15.7) into variety main effects and tmt.variety interactions, so that $u_v = 1_2 \otimes u_1 + u_2$. There is a similar partitioning for the run effects and the errors (see Table 15.9).

Table 15.9: Equivalence of random effects in bivariate and univariate analyses

    bivariate    univariate
    effects      model
188. e parameter (0.12 in this case) after the GIVg identifier,

      animal 1
      animal 0 GIV1 0.12

  for a two-dimensional structure,

      site.variety 2
      site 0 CORUH 0.5 8*1.5
      variety 0 GIV1

• the second is for one-dimensional structures; in this case the .giv structure can be directly associated with the term using the giv(f,i) model function, which associates the ith .giv file with factor f; for example,

      giv(animal,1) 0.12

  is equivalent to the first of the preceding examples.

The example continued

Below is an extension of harvey.as to use harvey.giv, which is partly shown to the right. This G inverse matrix is an identity matrix of order 74 scaled by 0.5, that is, $0.5I$. This model is simply an example which is easy to verify. Note that harvey.giv is specified on the line immediately preceding harvey.dat.

Model term specification associating the harvey.giv structure to the coding of sire takes precedence over the relationship matrix structure implied by the !P qualifier for sire. In this case, the !P is being used to amalgamate animals and sires into a single list.

    command file                                 .giv file
    GIV file example                             01 01 .5
     animal !P                                   02 02 .5
     sire !P                                     03 03 .5
     dam                                         04 04 .5
     lines 2                                     05 05 .5
     damage                                      .
     adailygain                                  .
    harvey.ped !ALPHA                            .
    harvey.giv   # giv structure file            72 72 .5
    harvey.dat                                   73 73 .5
    adailygain ~ mu line !r giv(sire,1) .25      74 74 .5

10 Tabulation of the data and prediction from the model
    Introduction
    Tabulation
    Prediction
        Un
189. e required, see Table 7.5.

• FACVk models (CV for covariance) are an alternative formulation of FA models in which $\Sigma$ is modelled as $\Sigma = \Gamma\Gamma' + \Psi$, where $\Gamma$ is a matrix of loadings on the covariance scale and $\Psi$ is diagonal. The parameters in FACV
  - are specified in the order loadings ($\Gamma$) followed by variances ($\Psi$); when k is greater than 1, constraints on the elements of $\Gamma$ are required, see Table 7.5,
  - are related to those in FA by $\Gamma = DF$ and $\Psi = DED$.

• XFAk (X for extended) is the third form of the factor analytic model and has the same parameterisation as for FACV, that is, $\Sigma = \Gamma\Gamma' + \Psi$. However, XFA models
  - have parameters specified in the order diag($\Psi$) and vec($\Gamma$); when k is greater than 1, constraints on the elements of $\Gamma$ are required, see Table 7.5,
  - may not be used in R structures,
  - are used in G structures in combination with the xfa(f,k) model term,
  - return the factors as well as the effects,
  - permit some elements of $\Psi$ to be fixed to zero,
  - are computationally faster than the FACV formulation for large problems when k is much smaller than $\omega$.

Special consideration is required when using the XFAk model. The SSP must be expanded to have room to hold the k factors. This is achieved by using the xfa(f,k) model term in place of f in the model. For example,

    y ~ site !r geno.xfa(site,2)
    0 0 1
    geno.xfa(site,2) 2
    geno
    xfa(site,2) 0 XFA2

• the OWN variance structure is a facility whereby users may specify their ow
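The relationship between the FA and FACV parameterisations can be checked numerically. The Python sketch below assumes the reading of the garbled formulas as Sigma = Gamma Gamma' + Psi for FACV, with Gamma = D F and Psi = D E D, where the FA form is Sigma = D (F F' + E) D with D diagonal (standard deviations) and E diagonal (specific variances on the correlation scale). The function names and the numbers are hypothetical.

```python
def facv_sigma(loadings, specific):
    """Sigma = Gamma Gamma' + Psi for a w x k loading matrix and w specific variances."""
    w = len(loadings)
    return [
        [
            sum(a * b for a, b in zip(loadings[i], loadings[j]))
            + (specific[i] if i == j else 0.0)
            for j in range(w)
        ]
        for i in range(w)
    ]


def fa_to_facv(std_devs, fa_loadings, fa_specific):
    """Convert FA parameters (D, F, E) to FACV parameters (Gamma = DF, Psi = DED)."""
    gamma = [[d * f for f in row] for d, row in zip(std_devs, fa_loadings)]
    psi = [d * d * e for d, e in zip(std_devs, fa_specific)]
    return gamma, psi


# One factor, two sites (made-up values).
d = [2.0, 1.0]
f = [[0.5], [0.5]]
e = [0.75, 0.75]
gamma, psi = fa_to_facv(d, f, e)
print(gamma, psi)
print(facv_sigma(gamma, psi))

# Direct FA computation D (F F' + E) D for comparison.
c = facv_sigma(f, e)
print([[d[i] * c[i][j] * d[j] for j in range(2)] for i in range(2)])
```

Both routes give the same variance matrix, which is the sense in which FACV is an alternative formulation of FA rather than a different model.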
190. e study. Journal of the Royal Statistical Society A (General) 164(2), 339-355.

Schall, R. (1991). Estimation in generalized linear models with random effects. Biometrika 78(4), 719-27.

Searle, S. R. (1971). Linear Models. New York: John Wiley and Sons, Inc.

Searle, S. R. (1982). Matrix algebra useful for statistics. New York: John Wiley and Sons, Inc.

Searle, S. R., Casella, G. and McCulloch, C. E. (1992). Variance Components. New York: John Wiley and Sons, Inc.

Self, S. C. and Liang, K. Y. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under non-standard conditions. Journal of the American Statistical Association 82, 605-610.

Smith, A. B., Cullis, B. R., Gilmour, A. R. and Thompson, R. (1998). Multiplicative models for interaction in spatial mixed model analyses of multi-environment trial data. Proceedings of the 28th International Biometrics Conference.

Smith, A., Cullis, B. R. and Thompson, R. (2001). Analysing variety by environment data using multiplicative mixed models and adjustments for spatial field trend. Biometrics 57, 1138-1147.

Smith, A., Cullis, B. R. and Thompson, R. (2005). The analysis of crop cultivar breeding and evaluation trials: an overview of current mixed model approaches (review). Journal of Agricultural Science 143, 449-462.

Stein, M. L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. Springer-Verlag, New York.

Stevens
191. e values $1 to $9 to indicate up to 9 strings after the command file name. If the argument has 1 character, a trailing blank is attached to the character and inserted into the command file. If no argument exists, a zero is inserted. For example,

    asreml rat.as alpha beta

tells ASReml to process the job in rat.as as if it read alpha wherever $1 appears in the command file, beta wherever $2 appears and 0 wherever $3 appears.

Table 12.2: The use of arguments in ASReml

    in command file           on command line          becomes in ASReml run
    abc$1def                  no argument              abc0 def
    abc$1def                  with argument X          abcX def
    abc$1def                  with argument XY         abcXYdef
    abc$1def                  with argument XYZ        abcXYZdef
    abc$1 def                 with argument XX         abcXX def
    abc$1 def                 with argument XXX        abcXXX def
    abc$1 def                 with argument XXX        abcXXX def
    (multiple spaces)

Prompting for input

Another way to gain some interactive control of a job in the PC environment is to insert a prompt qualifier followed by text in the .as file where you want to specify the rest of the line at run time. ASReml prompts with text and waits for a response, which is used to complete the line. The qualifier may be used anywhere in the job and the line is modified from that point.

Warning: Unfortunately the prompt may not appear on the top screen under some Windows operating systems, in which case it may not be obvious that ASReml is waiting for a keyboard response.

Paths and Loops

ASReml is designed t
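The substitution rules stated in this chunk (a missing argument becomes a zero, and a one-character value gets a trailing blank) can be reproduced with a short Python sketch. It assumes the placeholders are written `$1` to `$9`, which is an assumption because the marker character is missing from the extracted text; the function name is hypothetical, and the "multiple spaces" row of the table is not modelled.

```python
import re


def substitute_arguments(line, arguments):
    """Replace $1..$9 in a command-file line with command-line arguments."""

    def replacement(match):
        index = int(match.group(1)) - 1
        # A missing argument is replaced by a zero.
        value = arguments[index] if index < len(arguments) else "0"
        # A one-character value gets a trailing blank attached.
        if len(value) == 1:
            value += " "
        return value

    return re.sub(r"\$([1-9])", replacement, line)


print(repr(substitute_arguments("abc$1def", [])))
print(repr(substitute_arguments("abc$1def", ["X"])))
print(repr(substitute_arguments("abc$1def", ["XY"])))
print(repr(substitute_arguments("$1 $2 $3", ["alpha", "beta"])))
```

The first three results match the corresponding rows of the table above; the last mirrors the rat.as example with two arguments supplied and a third placeholder left unfilled.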
192. e variances

    xfa(site,2) 0 XFA2 !=VVVV0 !4P4PZ3P
    4*0.3      # initial specific variances
    4*1.2      # initial loadings for 1st factor
    0 3*0.3    # initial loadings for 2nd factor

a 2 factor Factor Analytic model for 4 sites with equal variance is specified using this syntax. The first loading in the second factor is constrained equal to 0 for identifiability. P places restrictions on the magnitude of the loadings and the variances to be positive.

a 2 factor Factor Analytic model in which the specific variances are all equal.

Constraints between and within variance models

More general relationships between variance parameters can be defined using the !VCC c qualifier placed on the data file definition line.
• !VCC c specifies that there are c constraint lines defining constraints to be applied,
• the constraint lines occur after the variance header line and any R and G structure lines, that is, there must be a variance header line,
• each constraint is specified in a separate line in the form
      $P_1 * V_1 \; P_2 * V_2 \; \ldots$
  - $P_i$ is the name of a random model term or the number of a parameter and $V_i$ is a coefficient,
  - $P_1$ is the primary parameter number,
  - $*$ indicates that the next value ($V_i$) is a weighting coefficient,
  - if the coefficient is 1 you may omit the $* 1$,
  - if the coefficient is $-1$ you may write $-P_i$ instead of $P_i * -1$,
• the meaning of the coefficients is as follows: $P_i * V_i = P_1 * V_1$, typically $V_1 = 1$,
• a variance
193. e variety concurrence within runs.

    Assuming Power transformation was (Y + 0.000)^0.500
    run is ignored in the prediction (except where specifically included)

    Trait      variety    Power_value  Stand_Error  Ecode  Retransformed  approx_SE
    sqrt(yc)   AliCombo       14.9532       0.9181    E         223.5982    27.4571
    sqrt(ye)   AliCombo        7.9941       0.7993    E          63.9054    12.7790
    sqrt(yc)   Bluebelle      13.1033       0.9310    E         171.6969    24.3980
    sqrt(ye)   Bluebelle       6.6299       0.8062    E          43.9559    10.6901
    sqrt(yc)   C22            16.6679       0.9181    E         277.8192    30.6057
    sqrt(ye)   C22             8.9543       0.7993    E          80.1798    14.3140
    sqrt(yc)   YRK1           15.1859       0.9549    E         230.6103    29.0012
    sqrt(ye)   YRK1            8.3356       0.8190    E          69.4817    13.6534
    sqrt(yc)   YRK3           13.3057       0.9549    E         177.0428    25.4106
    sqrt(ye)   YRK3            8.1133       0.8190    E          65.8264    13.2894
    SED: Overall Standard Error of Difference 1.215

[Figure 15.9: BLUPs for treated for each variety plotted against BLUPs for control. Axes: control BLUP (horizontal), exposed BLUP (vertical).]

Interpretation of results

Recall that the researcher is interested in varietal tolerance to bloodworms. This could be defined in various ways. One option is to consider the regression implicit in the variance structure for the trait by variety effects. The variance structure can arise from a regression of treated variety effects on control effects, namely

$$ u_{v_t} = \beta u_{v_c} + \epsilon $$

where the slope $\beta = \sigma_{v_{ct}} / \sigma^2_{v_c}$. Tolerance can be defined in terms of the deviations from regression. Varieties with large positive d
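The Retransformed and approx_SE columns of the prediction table are numerically consistent with squaring the predicted value on the square-root scale and applying a first-order (delta method) approximation to its standard error. The Python sketch below reproduces two rows under that assumption; the function name is hypothetical and the delta method is an inference from the numbers, not something this chunk states about how ASReml computes them.

```python
def back_transform_sqrt(power_value, standard_error):
    """Back-transform a prediction made on the square-root scale.

    value     = power_value ** 2
    approx SE = |d(value)/d(power_value)| * SE = 2 * power_value * SE
    """
    retransformed = power_value ** 2
    approx_se = 2.0 * power_value * standard_error
    return retransformed, approx_se


# sqrt(yc) and sqrt(ye) for the first variety in the table.
for power_value, se in [(14.9532, 0.9181), (7.9941, 0.7993)]:
    value, approx_se = back_transform_sqrt(power_value, se)
    print(round(value, 3), round(approx_se, 3))
```

Small differences in the last printed digit are to be expected because the table values were themselves rounded before being squared here.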
194. e with the maximal conditional model (MCM) under which the conditional F statistic is calculated. The MCM omits terms fitted after any terms ignored for the conditional test (I after in marginality pattern). In the example above, MCM ignores variety.sow when calculating DenDF for the test of water and ignores water.sow when calculating DenDF for the test of variety. When DenDF is not available, it is often possible, though anti-conservative, to use the residual degrees of freedom for the denominator.

Kenward and Roger (1997) pursued the concept of construction of Wald-type test statistics through an adjusted variance matrix of $\hat{\tau}$. They argued that it is useful to consider an improved estimator of the variance matrix of $\hat{\tau}$ which has less bias and accounts for the variability in estimation of the variance parameters. There are two reasons for this. Firstly, the small sample distribution of Wald tests is simplified when the adjusted variance matrix is used. Secondly, if measures of precision are required for $\hat{\tau}$ or effects therein, those obtained from the adjusted variance matrix will generally be preferred. Unfortunately the Wald statistics are currently computed using an unadjusted variance matrix.

Approximate stratum variances

ASReml reports approximate stratum variances and degrees of freedom for simple variance components models. For the linear mixed effects model with variance components setting oF 1 where G oat bj it is oft
195. ecently been reanalysed by Pinheiro and Bates (2000, p338). The data are displayed in Figure 15.12 and are the trunk circumferences (in millimetres) of each of 5 trees taken at 7 times. All trees were measured at the same time so that the data are balanced. The aim of the study is unclear, though both previous analyses involved modelling the overall growth curve, accounting for the obvious variation in both level and shape between trees. Pinheiro and Bates (2000) used a nonlinear mixed effects modelling approach, in which they modelled the growth curves by a three-parameter logistic function of age, given by

$$y = \frac{\phi_1}{1 + \exp[-(x - \phi_2)/\phi_3]} \qquad (15.11)$$

where $y$ is the trunk circumference, $x$ is the tree age in days since December 31 1968, $\phi_1$ is the asymptotic height, $\phi_2$ is the inflection point or the time at which the tree reaches $0.5\phi_1$, and $\phi_3$ is the time elapsed between trees reaching half and about 3/4 of $\phi_1$.

The datafile consists of 5 columns viz Tree, a factor with 5 levels; age, tree age in days since 31st December 1968; circ, the trunk circumference; and season. The last column, season, was added after noting that tree age spans several years and, if converted to day of year, measurements were taken in either Spring (April/May) or Autumn (September/October).

Figure 15.12: Trellis plot of trunk circumference for each tree (axes: age, circ; panels: Tree 1 to 5).

First we demonstrate the fit
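The interpretation of the three logistic parameters given above can be checked numerically. The sketch below evaluates the curve at the inflection point and one $\phi_3$ later; the parameter values are made up purely for illustration and are not estimates from the orange data.

```python
import math


def logistic_growth(age, phi1, phi2, phi3):
    """Three-parameter logistic curve: phi1 / (1 + exp(-(age - phi2) / phi3))."""
    return phi1 / (1.0 + math.exp(-(age - phi2) / phi3))


# Hypothetical parameter values, chosen only for illustration.
PHI1, PHI2, PHI3 = 200.0, 700.0, 350.0

# At the inflection point the curve is at half the asymptote.
half = logistic_growth(PHI2, PHI1, PHI2, PHI3)
# One phi3 later it has reached 1 / (1 + exp(-1)), roughly 0.73 of the asymptote.
later = logistic_growth(PHI2 + PHI3, PHI1, PHI2, PHI3)

print(half / PHI1, later / PHI1)
```

The second ratio is about 0.73, which is what the text describes as "about 3/4" of the asymptote.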
196. ecommended that covariates be centred and scaled to have a mean of zero and a variance of approximately one, to avoid failure to detect singularities. This can be achieved either
• externally to ASReml in data file preparation,
• using RESCALE mean scale, where mean and scale are user supplied values, for example age rescale 200 10

5.6 Datafile line

The purpose of the datafile line is to
• nominate the data file,
• specify qualifiers to modify
  - the reading of the data,
  - the output produced,
  - the operation of ASReml.

Data line syntax

NIN Alliance Trial 1989
variety A
row 22
column 11
nin89aug asd skip 1
yield mu variety

The datafile line appears in the ASReml command file in the form

datafile qualifiers

• datafile is the path name of the file that contains the variates, factors, covariates, traits (response variates) and weight variables represented as data fields, see Chapter 4; enclose the path name in quotes if it contains embedded blanks,
• the qualifiers tell ASReml to modify either
  - the reading of the data and/or the output produced, see Table 5.2 below for a list of data file related qualifiers,
  - the operation of ASReml, see Tables 5.3 to 5.6 for a list of job control qualifiers,
• the data file related qualifiers must appear on the data file line,
• the job control qualifiers may appear on the data file line or on following
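The recommendation above to centre and scale covariates can be carried out externally to ASReml during data file preparation. The following is a minimal sketch of that external route only; it does not reproduce the arguments of the RESCALE qualifier, and the function name and the ages used are hypothetical.

```python
from statistics import mean, pstdev


def centre_and_scale(values):
    """Return the values shifted to mean zero and scaled to unit variance."""
    centre = mean(values)
    spread = pstdev(values)  # population standard deviation
    if spread == 0:
        raise ValueError("a constant covariate cannot be scaled")
    return [(v - centre) / spread for v in values]


# Made-up covariate values, for illustration only.
ages = [100.0, 200.0, 300.0, 400.0, 500.0]
scaled = centre_and_scale(ages)
print(scaled)
```

The scaled column would then be written to the data file in place of (or alongside) the raw covariate.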
197. ects $u \sim N(0, G)$,
• we use the terms R structure and G structure to refer to the independent blocks of R and G respectively,
• R and G structures are typically formed as a direct product of particular variance models,
• the order of terms in a direct product must agree with the order of effects in the corresponding model term,
• variance models may be correlation matrices or variance matrices with equal or unequal variances on the diagonal. A model for a correlation matrix (eg. AR1) can be converted to an equal variance form (eg. AR1V) and to a heterogeneous variance form (eg. AR1H),
• variances are sometimes estimated as variance ratios (relative to the residual variance).

These issues are fully discussed in Chapter 2. In this chapter we begin by considering an ordered sequence of variance structures for the NIN variety trial (see Section 7.3). This is to introduce variance modelling in practice. We then present the topics in detail.

Non singular variance matrices

When undertaking the REML estimation, ASReml needs to invert each variance matrix. For this it requires that the matrices be negative definite or positive definite. They must not be singular. Negative definite matrices will have negative elements on the diagonal of the matrix and/or its inverse. The exception is the XFA model, which has been specifically designed to fit singular matrices (Thompson et al., 2003). Let $x'Ax$ represent an arbitrary quadratic form for x
198. ed. They are performed within the engine of ASReml. One process involves estimation of $\tau$ and prediction of $u$ (although the latter may not always be of interest) for given variance parameters. The other process involves estimation of these variance parameters. Note that in the following sections we have set $\theta = 1$ to simplify the presentation of results.

Estimation of the variance parameters

Estimation of the variance parameters is carried out using residual or restricted maximum likelihood (REML), developed by Patterson and Thompson (1971). An historical development of the theory can be found in Searle et al. (1992). Note firstly that

$$y \sim N(X\tau, H) \qquad (2.3)$$

where $H = R + ZGZ'$. REML does not use (2.3) for estimation of variance parameters, but rather uses a distribution free of $\tau$, essentially based on error contrasts or residuals. The derivation given below is presented in Verbyla (1990). We transform $y$ using a non-singular matrix $L = [L_1 \; L_2]$ such that

$$L_1'X = I, \qquad L_2'X = 0.$$

If $y_j = L_j'y$, $j = 1, 2$, then

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \sim N\left( \begin{bmatrix} \tau \\ 0 \end{bmatrix}, \begin{bmatrix} L_1'HL_1 & L_1'HL_2 \\ L_2'HL_1 & L_2'HL_2 \end{bmatrix} \right)$$

The full distribution of $L'y$ can be partitioned into a conditional distribution, namely $y_1 \mid y_2$, for estimation of $\tau$, and a marginal distribution based on $y_2$ for estimation of $\gamma$ and $\phi$. The latter is the basis of the residual likelihood. The estimate of $\tau$ is found by equating $y_1$ to its conditional expectation, and after some algebra we find

$$\hat{\tau} = (X'H^{-1}X)^{-1}X'H^{-1}y$$

Estimation of $\kappa = [\gamma' \; \phi']'$ is based on the log residual
199. ed when DIAG qualifier is given on a pedigree file line 1 f is the diagonal element of A and oc is the genetic variance The sln file can easily be read into a GENSTAT spreadsheet or an S PLUS data frame Below is a truncated copy of nin89a sln Note that e the order of some terms may differ from the order in which those terms were specified in the model statement e the missing value estimates appear at the end of the file in this example variety LANCER 0 000 0 000 variety BRULE 2 987 2 842 variety REDLAND 4 707 2 978 variety CODY 0 3131 2 961 variety ARAPAHOE 2 954 2 727 variety NE87615 1 035 2 934 variety NE87619 5 939 2 850 variety NE87627 4 376 2 998 mu 1 24 09 2 465 mv_estimates i 21 91 6 729 mv_estimates 2 20328 Delal mv_estimates 3 22 52 6 708 mv_estimates 4 23 49 6 676 mv_estimates 5 22 26 6 698 mv_estimates 6 24 47 OLOT mv_estimates ti 20 14 6 697 mv_estimates 8 25 01 6 691 mv_estimates 9 24 29 6 676 mv_estimates 10 26 30 6 658 mv_estimates 11 24 99 6 590 mv_estimates 12 27 78 6 492 13 Description of output files 195 The yht file The yht file contains the predicted values of the data in the original order this is not changed by supplying row column order in spatial analyses the residuals and the diagonal elements of the hat matrix Figure 13 1 shows the residuals plotted against the fitted values Yhat and a line printer version of this figure is written to the res file Where an observatio
200. ediction from the model

For covariate terms (fixed or random), the associated effect represents the coefficient of a linear trend in the data with respect to the covariate values. These terms should be evaluated at a given value of the covariate, or averaged over several given values. Omission of a covariate from the predictive model is equivalent to predicting at a zero covariate value, which is often inappropriate.

Interaction terms constructed from factors generate an effect for each combination of the factor levels, and behave like single factor terms in prediction. Interactions constructed from covariates fit a linear trend for the product of the covariate values and behave like a single covariate term. An interaction of a factor and a covariate fits a linear trend for the covariate for each level of the factor. For both fixed and random terms, a value for the covariate must be given, but the factor may be evaluated at a given level, averaged over or (for random terms) omitted.

Before considering some examples in detail, it is useful to consider the conceptual steps involved in the prediction process. Given the explanatory variables used to define the linear mixed model, the four main steps are
(a) Choose the explanatory variable(s) and their respective value(s) for which predictions are required; the variables involved will be referred to as the classify set and together define the multiway table to be predicted.
(b) Determine wh
201. ee file and see any messages in the output Check that identifiers and pedi grees are in chronological order the A inverse factors are not the same size as the A inverse Delete the ainverse bin file and rerun the job program will need to be recompiled if file is correct Check the details for the distance based vari ance structure Check the distances specified for the distance based variance structure Try increasing workspace Otherwise send problem to VSN Try increasing the memory simplifying the model and changing starting values for the gammas If this fails send the problem to the VSN mailto support asreml co uk for investigation Check the argument 14 Error messages 238 Alphabetical list of error messages and probable cause s remedies error message probable cause remedy Reading distances for POWER structure Reading factor names reading Overdispersion factor READING OWN structures Reading the data Reading Update step size Residual Variance is Zero R header SECTIONS DIMNS GSTRUCT R structure header SITE DIM GSTRUCT Variance header SEC DIM GSTRUCT R structure error ORDER SORTCOL MODEL GAMMAS R structures are larger than number of records REQUIRE ASUV qualifier for this R structure REQUIRE I x E R structure POWER structures are the spatial variance models which require a list of distances Dis tances should be in increasing order If the distanc
202. een implemented in ASReml for examining the adequacy of the assumed variance matrix for either R or G structures, or for examining the distributional assumptions regarding e or u. Firstly we note that the BLUP of the residual vector is given by

$$\tilde{e} = y - W\tilde{\beta} = RPy \qquad (2.16)$$

It follows that $E(\tilde{e}) = 0$ and $\mathrm{var}(\tilde{e}) = R - WC^{-1}W'$. The matrix $WC^{-1}W'$ is the so-called extended hat matrix. It is the linear mixed effects model analogue of $X(X'X)^{-1}X'$ for ordinary linear models. The diagonal elements are returned in the yht file by ASReml.

The variogram has been suggested as a useful diagnostic for assisting with the identification of appropriate variance models for spatial data (Cressie, 1991). Gilmour et al. (1997) demonstrate its usefulness for the identification of the sources of variation in the analysis of field experiments. If the elements of the data vector (and hence the residual vector) are indexed by a vector of spatial coordinates, $s_i$, $i = 1, \ldots, n$, then the ordinates of the sample variogram are given by

$$v_{ij} = \tfrac{1}{2}\,[\tilde{e}_i(s_i) - \tilde{e}_j(s_j)]^2, \qquad i, j = 1, \ldots, n;\ i \neq j$$

The sample variogram reported by ASReml has two forms depending on whether the spatial coordinates represent a complete rectangular lattice (as typical of a field trial) or not. In the lattice case the sample variogram is calculated from the triple $(l_{ij1}, l_{ij2}, v_{ij})$ where $l_{ij1} = s_{i1} - s_{j1}$ and $l_{ij2} = s_{i2} - s_{j2}$ are the displacements. As there will
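The variogram ordinates defined above, half the squared difference between each pair of residuals, can be sketched in a few lines for the lattice case. The grouping step is an assumption made here for illustration: ordinates are averaged over pairs that share the same absolute (row, column) displacement. The excerpt is cut off before it says how ASReml summarises them, and the function name is hypothetical.

```python
from collections import defaultdict


def sample_variogram(coords, residuals):
    """Average 0.5 * (e_i - e_j)**2 over pairs with equal absolute displacement.

    coords is a list of (row, column) positions and residuals the matching
    list of residuals. Averaging by absolute displacement is an assumption
    of this sketch, not a documented ASReml rule.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(residuals)
    for i in range(n):
        for j in range(i + 1, n):
            lag = (abs(coords[i][0] - coords[j][0]),
                   abs(coords[i][1] - coords[j][1]))
            sums[lag] += 0.5 * (residuals[i] - residuals[j]) ** 2
            counts[lag] += 1
    return {lag: sums[lag] / counts[lag] for lag in sums}


# A made-up 2 x 2 lattice with invented residuals.
coords = [(1, 1), (1, 2), (2, 1), (2, 2)]
residuals = [1.0, 3.0, 2.0, 6.0]
variogram = sample_variogram(coords, residuals)
print(variogram)
```

Each key of the result is a displacement pair and each value the mean ordinate at that displacement.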
203. eg 0 1000 22 AR AutoReg 0 1000 Warning Spatial mapping information for side 1 of order 11 ranges from 1 0 to 22 0 Warning Spatial mapping information for side 2 of order 22 ranges from 1 0 to Fault Last line read was 11 0 2 Sorting data into field order 22 column ARI 0 100000 ninerriO variety id pid raw rep nloc yield lat Model specification variety mu mv_estimates SECTIONS 242 4 1 STRUCT ii 1 1 22 1 i 13 factors defined max 500 6 variance parameters max1500 Final parameter values 0 10000 Last line read was 13 2 242 242 8000 Finished 27 Jul 2005 15 42 31 733 TERM LEVELS GAMMAS 56 1 18 5 a 1 10 6 i 1 11 2 special structures 0 0000 10000E 360 10000 22 column ARI 0 100000 Sorting data into field order 14 Error messages 228 14 5 Information Warning and Error messages ASReml prints information warning and error messages in the asr file The major information messages are in Table 14 2 A list of warning messages together with the likely meaning s is presented in Table 14 1 Error messages with their probable cause s is presented in Table 14 3 Table 14 1 Some information messages and comments information message comment Logl converged BLUP run done JOB ABORTED by USER Logl converged parameters not converged Logl not converged Warning Only one iteration performed Parameters unchanged after one iteration the REML log like
204. eighted analysis is discussed in Section 6.7,
• ~ separates response from the list of fixed and random terms,
• fixed represents the list of primary fixed explanatory terms, that is, variates, factors, interactions and special terms for which analysis of variance (ANOVA) type tests are required. See Table 6.1 for a brief definition of reserved model terms, operators and commonly used functions. The full definition is in Section 6.6,
• random represents the list of explanatory terms to be fitted as random effects, see Table 6.1,
• sparse_fixed are additional fixed terms not included in the ANOVA table.

General rules

The following general rules apply in specifying the linear mixed model:
• all elements in the model must be space separated,
• the character ~ separates the response variable(s) from the explanatory variables in the model,
• data fields are identified in the model by their labels
  - labels are case sensitive,
  - labels may be abbreviated (truncated) when used in the model line, but care must be taken that the truncated form is not ambiguous. If the truncated form matches more than one label, the term associated with the first match is assumed,
• model terms may only appear once in the model line; repeated occurrences are ignored,
• model terms other than the original data fields are defined the first time they appear on the model line
205. en possible to consider a natural ordering of the variance component parameters including $\sigma^2$. Based on an idea due to Thompson (1980), ASReml computes approximate stratum degrees of freedom and stratum variances by a modified Cholesky diagonalisation of the average information matrix. That is, if $F$ is the average information matrix for $\sigma$, let $U$ be an upper triangular matrix such that $F = U'U$. We define $U_c = D_c U$ where $D_c$ is a diagonal matrix whose elements are given by the inverse elements of the last column of $U$, ie $d_{cii} = 1/u_{ir}$, $i = 1, \ldots, r$. The matrix $U_c$ is therefore upper triangular with the elements in the last column equal to one. If the vector $\sigma$ is ordered in the natural way, with $\sigma^2$ being the last element, then we can define the vector of so-called pseudo stratum variance components by $\xi = U_c \sigma$. Thence $\mathrm{var}(\xi) = D_c^2$. The diagonal elements can be manipulated to produce effective stratum degrees of freedom (Thompson, 1980), viz $\nu_i = 2\xi_i^2 / d_{cii}^2$. In this way the closeness to an orthogonal block structure can be assessed.

3 A guided tour

Introduction
Nebraska Intrastate Nursery (NIN) field experiment
The ASReml data file
The ASReml command file
  The title line
  Reading the data
  The data file line
  Specifying the terms in the mixed model
  Tabulation
  Prediction
  Variance structures
Running the job
Description of output files
  The asr file
  The sln file
  The yht file
Tabulatio
206. ence SYNTAX change A B now means A A B Contact support asreml co uk for licensing and support FEO OO OR BRR ARG Folder C data asr UG2 manex variety A QUALIFIERS SKIP 1 Reading nin89 asd FREE FORMAT skipping 1 lines Univariate analysis of yield Using 224 records of 224 read Model term Size miss zero MinNonO Mean MaxNonO 1 variety 56 0 0 1 28 5000 56 2 id 0 1 000 28 50 56 00 3 pid 0 o 1101 2628 4156 4 raw 0 O 21 00 510 5 840 0 5 repl 4 0 0 1 2 5000 4 6 nloc 0 O 4 000 4 000 4 000 7 yield Variate 0 O 1 050 25 53 42 00 8 lat 0 O 4 300 27 22 47 30 9 long 0 1 200 14 08 26 40 10 row 22 0 0 4 11 7321 22 11 column 14 0 0 1 6 3304 11 12 mu 1 4 identity 0 1000 Structure for repl has 4 levels defined Forming 61 equations 57 dense Initial updates will be shrunk by factor 0 316 NOTICE 1 singularities detected in design matrix 1 LogL 454 807 S2 50 329 168 df 1 000 0 1000 2 LogL 454 663 S2 50 120 168 df 1 000 0 1173 3 LogL 454 532 S2 49 868 168 df 1 000 0 1463 4 LogL 454 472 S2 49 637 168 df 1 000 0 1866 5 LogL 454 469 S2 49 585 168 df 1 000 0 1986 6 LogL 454 469 S2 49 582 168 df 1 000 0 1993 7 LogL 454 469 S2 49 582 168 df 1 000 0 1993 Final parameter values 1 0000 0 19932 Source Model terms Gamma Component Comp SE a 3 A guided tour 37 parameter Variance 224 168 1 00000 49 5824 9 08 0 P estimates repl identity 4 0 199323 9 88291 1 12 oU ANOVA Analysis of Variance NumDF DenDF F
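In the variance component summary at the end of the listing above, the Gamma and Component columns appear to be related by Component = Gamma × residual variance. That relationship is inferred from the printed figures (and from the description elsewhere of variances estimated as ratios relative to the residual variance), so the sketch below is illustrative only and the function name is hypothetical.

```python
def component_from_gamma(gamma, residual_variance):
    """Convert a variance ratio (gamma) to a variance component."""
    return gamma * residual_variance


# Values read from the listing: residual variance and the repl gamma.
residual_variance = 49.5824
repl_gamma = 0.199323

repl_component = component_from_gamma(repl_gamma, residual_variance)
print(round(repl_component, 4))
```

The result agrees with the Component of 9.88291 printed for repl to the precision shown.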
207. ence of structures for the NIN data

Eight variance structures of increasing complexity are now considered for the NIN field trial data (see Chapter 3 for an introduction to these data). This is to give a feel for variance modelling in ASReml and some of the models that are possible. Before proceeding, it is useful to link this section to the algebra of Chapter 2 (see Section 2.1). In this case the mixed linear model is

$$y = X\tau + Zu + e$$

where $y$ is the vector of yield data, $\tau$ is a vector of fixed variety effects but would also include fixed replicate effects in a simple RCB analysis and might also include fixed missing value effects when spatial models are considered, $u \sim N(0, G)$ is a vector of random effects (for example, random replicate effects) and the errors are in $e \sim N(0, R)$. The focus of this discussion is on
• changes to u and e and the assumptions about these terms,
• the impact this has on the specification of the G structures for u and the R structures for e.

1 Traditional randomised complete block (RCB) analysis

The only random term in a traditional RCB analysis of these data is the (residual) error term $e \sim N(0, \sigma^2_e I)$. The model therefore involves just one R structure and no G structures ($u = 0$). In ASReml (see Section 6.4)
• the error term is implicit in the model and is not formally specified on the model line,
• the IID variance structure ($R = \sigma^2_e I$) is th
208. er parameters conditional on these values. To proceed with further iterations without fixing the matrix values would ultimately make the matrix such that it would be judged singular, resulting in the analysis being aborted.

List of rarely used job control qualifiers (action)

modifies the algorithm used for choosing the order for solving the mixed model equations. A new algorithm devised for release 2 is now the default and is formally selected by EQORDER 3. The algorithm used for release 1 is essentially that selected by EQORDER 1. The new order is generally superior. EQORDER -1 instructs ASReml to process the equations in the order they are specified in the model. Generally this will make a job much slower, if it can run at all. It is useful if the model has a suitable order, as in the IBD model Y m r giv id id, where giv id invokes a dense inverse of an IBD matrix and id has a sparse structured inverse of an additive relationship matrix. While EQORDER 3 generates a more sparse solution, EQORDER -1 runs faster.

forces another mod(n, 10) rounds of iteration after apparent convergence. The default for n is 1. This qualifier has lower priority than MAXIT and ABORTASR NOW; see MAXIT for details. Convergence is judged by changes in the REML log-likelihood value and variance parameters. However, sometimes the variance parameter convergence criterion has not been satisfied. ILAST
209. erage over years as well Both of the following predict statements produce the required values predict crop 1 pasture lime PRES year month PRWTS 56 55 56 53 57 63 0 0 O 000 36 0 0 53 23 24 54 54 43 350 0700 21170 0 0 700 0 530 53 56 22 92 19 440 0 360 0 490 220 53 70 220 5116 510 0 5 predict crop 1 pasture lime PRES month year PRWTS 56 36 70 53 0 55 0 0 56 22 56 0 21 22 0 53 53 17 92 53 57 23019 70 63 24 0 44 22 0540 00 05470051 043 0 3616 0350 051 O 0 53 5 0 49 0 5 We have presented both sets of predict statements to show how the weights were derived and presented Notice that the order in PRESENT year month implies that the weight coefficients are presented in standard order with the levels for months cycling within levels for years There is a check which reports if non zero weights are associated with cells that have no data The weights are reported in the pvs file PRESENT counts are reported in the res file Examples Examples are as follows yield mu variety r repl predict variety is used to predict variety means in the NIN field trial analysis Random rep1 is ignored in the prediction yield mu x variety r repl predict variety predicts variety means at the average of x ignoring random repl yield mu x variety repl predict variety x 2 forms the hyper table based on variety and rep1 at the covariate value of 2 and 10 Tabulation of the data and prediction from the model 169 then averages a
210. erms it is often necessary to shorten the names of the component factors in a systematic way, for example, if Time and Treatment are defined in this order, the interaction between Time and Treatment could be specified in the model as Time.Treat; remember that the first match is taken, so that if the label of each field begins with a different letter, the first letter is sufficient to identify the term,
• interactions can involve model functions,
• * indicates factorial expansion (up to 5 way)
  - a*b is expanded to a b a.b
  - a*b*c*d is expanded to a b c d a.b a.c a.d b.c b.d c.d a.b.c a.b.d a.c.d b.c.d a.b.c.d
• / indicates nested expansion
  - a/b is expanded to a a.b
• a.(b c d) e is expanded to a.b a.c a.d e
  This syntax is detected by the string '.(' and the closing parenthesis must occur on the same line and before any comma indicating continuation. Any number of terms may be enclosed. Each may have '-' prepended to suppress it from the model. Each enclosed term may have initial values and qualifiers following. For example
  yield site site lin row r variety at site 1 row 3 col 2
  expands to
  yield site site lin row r site variety at site 1 row 3 at site 1 col 2

Conditional factors

A conditional factor is a factor that is present only when another factor has a particular level.
• individual components can be specified using the at(f,n) function (see Tabl
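The factorial and nested expansions described above can be mimicked outside ASReml to check what a compact model term stands for. The sketch below assumes the operators are written `*` and `/` and that interaction terms are joined with `.`; the function names are hypothetical, and the extension of nesting beyond two factors is an assumption, since the text only gives the two-factor case.

```python
from itertools import combinations


def expand_factorial(term):
    """Expand 'a*b*...' into main effects followed by all interactions."""
    factors = term.split("*")
    if len(factors) > 5:
        raise ValueError("factorial expansion is described for up to 5 factors")
    expanded = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            expanded.append(".".join(combo))
    return expanded


def expand_nested(term):
    """Expand 'a/b' into 'a' and 'a.b'.

    Deeper nesting (a/b/c -> a a.b a.b.c) is an assumption of this sketch.
    """
    factors = term.split("/")
    return [".".join(factors[:k]) for k in range(1, len(factors) + 1)]


print(" ".join(expand_factorial("a*b")))
print(" ".join(expand_factorial("a*b*c*d")))
print(" ".join(expand_nested("a/b")))
```

Generating combinations in order of increasing size reproduces the ordering of terms listed in the text for the four-factor case.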
211. ers control the form of the residuals returned in the yht file. The predicted values returned in the yht file will be on the linear predictor scale if the WORK or PVW qualifiers are used. They will be on the observation scale if the DEVIANCE, PEARSON, RESPONSE or PVR qualifiers are used.

DEVIANCE   produces deviance residuals, the signed square root of d/h from Table 6.4, where h is the dispersion parameter controlled by the DISP qualifier. This is the default.
PEARSON    writes Pearson residuals in the yht file.
PVR        writes fitted values on the response scale in the yht file. This is the default.
PVW        writes fitted values on the linear predictor scale in the yht file.
RESPONSE   produces simple residuals, $y - \mu$.
WORK       produces residuals on the linear predictor scale.

6.9 Generalized Linear Mixed Models

This section was written by Damian Collins.

There is the capacity to fit a wider class of models which include additional random effects for non-normal error distributions. The inclusion of random terms in a GLM is usually referred to as a Generalized Linear Mixed Model (GLMM). For GLMMs, ASReml uses what is commonly referred to as penalized quasi-likelihood or PQL (Breslow and Clayton, 1993). The technique is also known by other names, including Schall's technique (Schall, 1991), pseudo-likelihood (Wo
212. erved word Fault 12 FORMAT error reading data structures Last line read was nine asd slip 1 Currently defined structures COLS and LEVELS 1 variety 1 56 0 0 0 0 2 id a 1 0 0 0 0 3 pad 1 1 0 0 0 0 4 raw 1 1 0 0 0 0 5 repl 1 4 0 0 0 0 6 nloc I 1 0 0 0 0 7 yield 1 si 0 0 0 0 8 lat an 1 0 0 0 0 9 long 1 1 0 0 0 0 10 row 1 22 0 0 0 0 11 column 1 11 0 0 0 0 filename 12 nine asd 0 0 0 0 0 0 ninerri C data ex manex 12 factors defined max 500 O variance parameters max1500 2 special structures last line read Last line read was nine asd slip 1 12 12 f 0 8000 fault message Finished 27 Jul 2005 15 41 26 379 FORMAT error reading data structures 14 2 Common problems Common problems in coding ASReml are as follows e a variable name has been misspelt variable names are case sensitive e a model term has been misspelt model term functions and reserved words mu Trait mv units are case sensitive e the data file name is misspelt or the wrong path has been given enclose the pathname in quotes if it includes embedded blanks e a qualifier has been misspelt or is in the wrong place e there is an inconsistency between the variance header line and the structure definition lines presented e failure to use commas appropriately in model definition lines e there is an error in the R structure definition lines 14 Error messages 215 e there is an error in the G structure definition lines there is a factor
213. es yield var r rep 0 rep var yield mu var r rep 1 rep mu var first level of var is aliassed and set to zero yield var trt r rep 1 rep var trt var fully fitted first level of trt is aliassed and set to zero yield mu var trt 8 rep mu var trt var trt var trt r rep first levels of both var and trt are aliassed and set to zero together with subsequent interactions yield mu var trt r rep 8 var trt rep mu var trt If var trt var trt fitted before mu var and trt var trt fully fitted mu var and trt are completely singular and set to zero The order within var trt rep is determined internally

6.12 Analysis of variance table

The ANOVA table has 4 forms:

Source DF F_inc
Source DF F_inc F_con M
Source DF DDF_inc F_inc P_inc
Source DF DDF_con F_inc F_con M P_con

depending on whether conditional F statistics are reported (requested by the FCON qualifier) and whether the denominator degrees of freedom are reported. ASReml always reports incremental F statistics (F_inc) for the fixed model terms in the DENSE partition, conditional on the order the terms were nominated in the model. Users should study Section 2.6 to understand the contents of this table. The conditional maximum model used as the basis for the conditional F statistic is spelt out in the aov file described in section. The numerator degrees of freedom for e
214. es 1 525 check lines 526 532 15 Examples 268 wheat asd skip 1 DOPATH 1 PATH 1 y mu weed mv r variety i 2 67 row AR1 0 1 10 column I 0 ARI x I PATH 2 y mu weed mv r variety 12 67 row AR1 0 1 10 column AR1 0 1 AR1 x AR1 IPATH 3 AR1 x AR1 column trend y mu weed pol column 1 mv r variety 12 67 row AR1 0 1 10 column AR1 0 1 PATH 4 AR1 x AR1 Nugget column trend y mu weed pol column 1 mv r variety units 12 67 row AR1 0 1 10 column AR1 0 1 predict var The data fields represent the factors variety row and column a covariate weed and the plot yield yield There are three paths in the ASReml file We begin with the one dimensional spatial model which assumes the variance model for the plot effects within columns is described by a first order autoregressive process The abbreviated output file is 1 LogL 4280 75 S2 0 12850E 06 666 df 0 1000 1 000 0 1000 2 LogL 4268 57 S2 0 12138E 06 666 df 0 1516 1 000 0 1798 3 LogL 4255 89 S2 0 10968E 06 666 df 0 2977 1 000 0 2980 4 LogL 4243 76 S2 88033 666 df 0 7398 1 000 0 4939 5 LogL 4240 59 S2 84420 666 df 0 9125 1 000 0 6016 6 LogL 4240 01 S2 85617 666 df 0 9344 1 000 0 6428 7 LogL 4239 91 S2 86032 666 df 0 9474 1 000 0 6596 8 LogL 4239 88 S2 86189 666 df 0 9540 1 000 0 6668 9 LogL 4239 88 S2 86253 666 df 0 9571 1 000 0 6700 10 LogL 4239 88 S2 86280 666 df 0 9585 1 000 0 6714 Final parameter values 0 9
215. es a two-dimensional spatial structure for error but with spatial correlation in the row direction only, that is, $e \sim N(0, \sigma^2_e\,(I_{11} \otimes \Sigma_r(\rho_r)))$.

NIN Alliance Trial 1989
variety A
id
pid
raw
repl 4
row 22
column 11
nin89aug asd skip 1
yield mu variety f mv
1 2 0
11 column ID
22 row AR1 0.3

The variance header line tells ASReml that there is one R structure (1) which is a direct product of two variance models (2); there are no G structures (0). The next two lines define the components of the R structure. A structure definition line must be specified for each component. For $V = \sigma^2_e\,(I_{11} \otimes \Sigma_r(\rho_r))$, the first matrix is an identity matrix of order 11 for columns (ID), the second matrix is a first order autoregressive correlation matrix of order 22 for rows (AR1), and the variance scale parameter $\sigma^2_e$ is implicit. Note the following:
• placing column and row in the second position on lines 1 and 2 respectively tells ASReml to internally sort the data rows within columns before processing the job. This is to ensure that the data matches the direct product structure specified. If column and row were replaced with 0 in these two lines, ASReml would assume that the data were already sorted in this order (which is not true in this case),
• the 0.3 on line 2 is a starting value for the autoregressive row correlation.

Note that for spatial analysis in two dimensions using a separable model, a complete
216. es are not obtained from variables the SORT field is zero and the distances are pre sented after all the R and G structures are defined something is wrong in the terms definitions It could also be that the data file is misnamed Check the argument There is probably a problem with the output from MYOWNGDG Check the files including the time stamps to check the gdg file is being formed properly if you read less data than you expect there are two likely explanations First the data file has less fields than implied by the data structure definitions you will probably read half the expected number Second there is an alphanumeric field where a numeric field is expected check the STEP qualifier argument either all data is deleted or the model fully fits the data error with the variance header line Often some other error has meant that the wrong line is being interpreted as the variance header line Commonly the model is written over sev eral lines but the incomplete lines do not all end with a comma an error reading the error model Maybe you need to include mv in the model to stop ASReml discarding records with missing values in the response variable Without the ASUV qualifier the multivariate error variance MUST be specified as US 14 Error messages 239 Alphabetical list of error messages and probable cause s remedies error message probable cause remedy Scratch Segmen
217. es nin89 asd and nin89aug asd commenced with a line of column headings Since these headings do not contain embedded blanks we can use ASReml to make a template for the as file by running ASReml with the datafile as the command argument see Chapter 12 For example running the command asreml nin89aug asd writes a file nin89aug as if it does not already exist which looks like Title nin89aug variety id pid raw rep nloc yield lat long row column LANCER 1 NA NA 1 4 NA 4 31 2 11 LANCER 1 NA NA 1 4 NA 4 3 2 421 LANCER 1 NA NA 1 4 NA 4 3 3 631 LANCER 1 NA NA 1 4 NA 4 3 4 841 variety A id pid raw rep nloc yield lat long row column Check Correct these field definitions nin89aug asd SKIP 1 3 A guided tour 35 column mu Specify fixed model lr Specify random model 120 column column AR1 0 1 row row AR1 0 1 This is a template in that it needs editing it has nominated an inappropriate response variable but it displays the first few lines of the data and infers whether fields are factors or variates as follows Missing fields and those with decimal points in the data value are taken as covariates integer fields are taken as simple factors and alphanumeric fields are taken as A factors 3 6 Description of output files job heading notices A series of output files are produced with each ASReml run Running the exam ple as above the primary output is written to nin89 asr
218. eviations have greatest tolerance to bloodworms. Note that this is similar to the researcher's original intentions, except that the regression has been conducted at the genotypic rather than the phenotypic level. In Figure 15.9 the BLUPs for treated have been plotted against the BLUPs for control for each variety, and the fitted regression line (slope = 0.61) has been drawn. Varieties with large positive deviations from the regression line include YRK3, Calrose, HR19 and WC1403.

[Figure 15.10: Estimated deviations from regression of treated on control for each variety, plotted against estimate for control. Horizontal axis: control BLUP.]

An alternative definition of tolerance is the simple difference between treated and control BLUPs for each variety, namely $\delta = u_{vt} - u_{vc}$. Unless $\beta = 1$ the two measures $\epsilon$ and $\delta$ have very different interpretations. The key difference is that $\epsilon$ is a measure which is independent of inherent vigour whereas $\delta$ is not. To see this consider

$$\mathrm{cov}(\epsilon, u_{vc}) = \mathrm{cov}(u_{vt} - \beta u_{vc},\, u_{vc}) = \left(\sigma_{vct} - \frac{\sigma_{vct}}{\sigma^2_{vc}}\,\sigma^2_{vc}\right) I_{44} = 0$$

whereas

$$\mathrm{cov}(\delta, u_{vc}) = \mathrm{cov}(u_{vt} - u_{vc},\, u_{vc}) = \left(\sigma_{vct} - \sigma^2_{vc}\right) I_{44}$$
219. example since it simply nominates the 7 ages in the data file The same analysis would result if the SPLINE line was omitted and spl age 7 in the model was replaced with spl age An extract of the output file is 1 LogL 20 9043 S2 48 470 5 df 0 1000 1 000 2 LogL 20 9017 S2 49 022 5 df 0 9266E 01 1 000 3 LogL 20 8999 S2 49 774 5 df 0 8356E 01 1 000 4 LogL 20 8996 S2 50 148 5 df 0 7937E 01 1 000 5 LogL 20 8996 52 50 213 5 df 0 7866E 01 1 000 Final parameter values 0 78798E 01 1 0000 Approximate stratum variance decomposition Stratum Degrees Freedom Variance Component Coefficients spl age 7 1 49 97 4813 12 9 1 0 Residual Variance 3 51 50 1888 0 0 1 0 Source Model terms Gamma Component Comp SE C spl age 7 5 5 0 787457E 01 3 95215 0 40 OP Variance vi 5 1 00000 50 1888 1 33 OF Analysis of Variance NumDF DenDF F_inc Prob 7 mu 1 3 5 1382 80 004 3 age 1 3 5 217 60 lt 00L Notice The DenDF values are calculated ignoring fixed boundary singular variance parameters using algebraic derivatives Estimate Standard Error T value T prev 3 age 1 0 814772E 01 0 552336E 02 14 75 7 mu il 24 4378 5 75429 4 25 6 spl age 7 5 effects fitted Finished 19 Aug 2005 10 08 11 980 LogL Converged The REML estimate of the smoothing constant indicates that there is some non linearity The fitted cubic smoothing spline is presented in Figure 15 13 The fitted values were obtained from the pvs file The four points below the line were the spr
220. ey should be in increasing order and adequately cover the range of the data, or ASReml will modify them before they are applied. If you choose to spread them over several lines, use a comma at the end of incomplete lines so that ASReml will continue reading values from the next line of input. If the explicit points do not adequately cover the range, a message is printed and the values are rescaled unless NOCHECK is also specified. Inadequate coverage is when the explicit range does not cover the midpoint of the actual range. See KNOTS, PVAL and SCALE.

STEP r reduces the update step sizes of the variance parameters. The default value is the reciprocal of the square root of MAXIT. It may be set between 0.01 and 1.0. The step size is increased towards 1 each iteration. Starting at 0.1, the sequence would be 0.1, 0.32, 0.56, 1. This option is useful when you do not have good starting values, especially in multivariate analyses.

SUBSET t v p forms a new factor (t) derived from an existing factor (v) by selecting a subset (p) of its levels. Missing values are transmitted as missing, and records whose level is zero are transmitted as zero. The qualifier occupies its own line, after the datafile line but before the linear model, e.g.

SUBSET EnvC Env 3 5 8 9 15 21 33

defines a reduced form of the factor Env, just selecting the environments listed. It might then be used in the model in an interaction. The intention is to simplify the model specification in MET
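The coverage rule for explicit points described in excerpt 220 (increasing order, and a range that covers the midpoint of the data range) can be sketched as a small check. This is only an illustration of the stated rule: the function name `check_knot_points` is hypothetical, and how ASReml actually rescales inadequate points is not shown in the text, so it is not modelled here.

```python
def check_knot_points(points, data):
    """Check user-supplied points against the two rules quoted in the text.

    Returns (is_increasing, covers_midpoint).
    Hypothetical helper for illustration; not part of ASReml.
    """
    # Rule 1: the points should be in increasing order.
    is_increasing = all(a < b for a, b in zip(points, points[1:]))
    # Rule 2: coverage is inadequate when the explicit range does not
    # cover the midpoint of the actual data range.
    midpoint = (min(data) + max(data)) / 2.0
    covers_midpoint = min(points) <= midpoint <= max(points)
    return is_increasing, covers_midpoint


if __name__ == "__main__":
    print(check_knot_points([1, 3, 5], [0, 10]))
    print(check_knot_points([1, 2, 3], [0, 10]))
```

A check like this could be run on a proposed list of points before putting them in the command file, so that the rescaling message is not a surprise.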
221. f residuals: res file and graphics file.

intermediate results: asl file. Given if the DL command line option is used.

Table of output objects and where to find them (ASReml output object: found in; comment)

mean-variance relationship: res file. For non-spatial analyses, ASReml prints the slope of the regression of log(abs(residual)) against log(predicted value). This regression is expected to be near zero if the variance is independent of the mean. A power-of-the-mean data transformation might be indicated otherwise. The suggested power is approximately (1 - b), where b is the slope. A slope of 1 suggests a log transformation. This is indicative only and should not be blindly applied. Weighted analysis or identifying the cause of the heterogeneity should also be considered. This statistic is not reliable in genetic animal models or when units is included in the linear model, because then the predicted value includes some of the residual.

phenotypic variance: pvc file.

plot of residuals against field position: graphics file.

possible outliers: res file.

predicted (fitted) values at the data points: yht file.

predicted values: pvs file.

REML log-likelihood: asr file.

observed variance/covariance matrix formed from BLUPs and residuals: res file. For an interaction fitted as random effects, when the first outer dimension is smaller than
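The mean-variance diagnostic described in excerpt 221 (slope b of log|residual| on log(predicted value), suggested power roughly 1 - b) can be reproduced outside ASReml as a rough check. The sketch below uses ordinary least squares in plain Python; the function names are hypothetical, and skipping zero residuals and non-positive fitted values is an assumption made here so that the logarithms are defined.

```python
import math


def mean_variance_slope(residuals, fitted):
    """Least-squares slope of log|residual| against log(fitted value)."""
    xs, ys = [], []
    for r, f in zip(residuals, fitted):
        # Assumption: drop points where a logarithm is undefined.
        if r != 0 and f > 0:
            xs.append(math.log(f))
            ys.append(math.log(abs(r)))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def suggested_power(slope):
    """Power-of-the-mean transformation suggested by the slope (1 - b)."""
    return 1.0 - slope


if __name__ == "__main__":
    fitted = [1.0, 2.0, 4.0, 8.0]
    residuals = [0.5 * f for f in fitted]  # spread proportional to the mean
    b = mean_variance_slope(residuals, fitted)
    print(b, suggested_power(b))
```

In the toy data the residual size is proportional to the fitted value, so the slope is 1 and the suggested power is 0, which corresponds to the log transformation mentioned in the text. As the guide says, this is indicative only.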
222. f the symmetric matrix speci fied row wise finding reasonable initial values can be a problem If initial values are written on the next line in the form q O where q is 1 2 and t is the number of traits ASReml will take half of the phenotypic variance matrix of the data as an initial value see as file in code box for example 8 Command file Multivariate analysis 145 e the special qualifiers relating to multivariate analysis are ASUV and ASMV t see Table 5 4 for detail to use an error structure other than US for the residual stratum you must also specify ASUV see Table 5 4 and include mv in the model if there are missing values to perform a multivariate analysis when the data have already been ex panded use ASMV t see Table 5 4 tis the number of traits that ASReml should expect the data file must have t records for each multivariate record although some may be coded missing 8 4 The output for a multivariate analysis Below is the output returned in the asr file for this analysis ASReml 1 630 01 Jun 2005 Orange Wether Trial 1984 88 Build 7 01 Jul 2005 32 bit 13 Jul 2005 09 38 00 928 32 00 Mbyte Windows wether Licensed to Arthur Gilmour Fakk ak k k ak ak ak 3k 3k 3k ak ak 3k 3k 3k 3k ak ak ak 3k 3k 3K k CCI GIGI I a I A ICICI KK 3K 3 3k K ak 21 21 21 21 K a Ak kk ak K K SYNTAX change A B now means A A B Contact support asreml co uk for licensing and support FEO OO
223. fiable 16 IID 7 inbreeding coefficients 152 194 Incremental Wald F Statistics 20 information matrix expected 13 observed 13 initial values 119 input file extension BIN 44 DBL 44 bin 42 44 csv 43 dbl 42 44 pin 171 interactions 90 Introduction 19 isotropic covariance model 10 job control options 183 qualifiers 63 key output files 190 likelihood log residual 12 residual 12 longitudinal data 2 balanced example 284 marginal distribution 12 Mat rn variance structure 129 measurement error 112 MET 9 meta analysis 2 9 missing values 43 94 101 195 NA 43 in explanatory variables 101 in response 101 mixed effects 7 model 7 mixed model 7 equations 14 multivariate 143 specifying 32 model animal 149 302 correlation 10 covariance 11 Index 315 formulae 83 random regression 11 sire 149 model building 139 moving average 93 multi environment trial 2 9 multivariate analysis 142 278 example 292 half sib analysis 293 Nebraska Intrastate Nursery 27 Negative binomial distribution 98 non singular matrices 106 nonidentifiable 16 objective function 14 observed information matrix 13 operators 85 options command line 179 ordering of terms 102 orthogonal polynomials 94 outliers 210 output files 35 multivariate analysis 145 objects 209 output file extension aov 189 196 apj 189 asl 189 asp 189
224. field In the side example for two existing fields Germ and Total containing counts we form the ArcSin for their ratio ASG by copying the Germ field and applying the ArcSin transformation using the Total field as sample size ICOS SIN s takes cosine and sine of the data vari Day able with period s having default 2m CosDay Day COS 365 omit s if data is in radians set s to 360 if data is in degrees ID D lt gt v Dv discards records which have v or yield D lt 0 ID lt D lt v missing value in the field if Dis used yield D lt 1 D gt 100 ID gt D gt v after A or I v should refer to the en coded factor level rather than the value in the data file see also Section 4 2 Use D to discard just those records with a missing value in the field InitialWt D New New New New New 5 Command file Reading the data 54 List of transformation qualifiers and their actions with examples qualifier argument action examples DOM EXP Jddm Jmmd Jyyd IM M lt gt IM lt M lt IM gt M gt MAX MIN MOD MM INA NORMAL REPLACE RESCALE copies and converts additive marker covariables 1 0 1 to dominance marker covariables see below takes antilog base e no argument re quired Jddm converts a number representing a date in the form ddmmccyy ddmmyy or ddmm into days Jmmd converts a date in the form ccyymmdd yymmdd or
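The COS and SIN transformations in excerpt 224 take a period s, with the data treated as radians when s is omitted and s set to 360 for degrees. A minimal sketch of that arithmetic, assuming the transformation is cos(2*pi*x/s) and sin(2*pi*x/s) (the text states the period convention but not the formula, so the formula is an assumption, and the function name is hypothetical):

```python
import math


def cos_sin_transform(value, period=None):
    """Return (cos, sin) of value, interpreting it on a cycle of length period.

    period=None means the value is already in radians (default period 2*pi).
    """
    if period is None:
        angle = value
    else:
        angle = 2.0 * math.pi * value / period
    return math.cos(angle), math.sin(angle)


if __name__ == "__main__":
    # Day of year on an annual cycle, as in the example "CosDay = Day COS 365".
    print(cos_sin_transform(91.25, 365))
    # Degrees: a quarter turn.
    print(cos_sin_transform(90, 360))
```

Fitting both the cosine and sine columns as covariates is the usual way to model a seasonal effect with unknown phase.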
225. fields as missing values, and NA are also honoured as missing values. If you wish to read blank fields as zeros, include the string BZ.
• The string BM switches back to 'blank missing' mode.
• The string Tc moves the 'last character read' pointer to line position c, so that the next field starts at position c + 1. For example, T0 goes back to the beginning of the line.
• The string D invokes debug mode.

A format showing these components is

FORMAT(D,3I4,8X,A6,3(2x,F5.2)/4x,BZ,20I1)

and is suitable for reading 27 fields from 2 data records such as

111122223333xxxxxxxxALPHAFxx 4.12xx 5.32xx 6.32
xxxx123 567 901 345 7890

MERGE c f SKIP n MATCH a b may be specified on a line following the datafile line. The purpose is to combine data fields from the primary data file with data fields from a secondary file (f). The effect is to open the named file, skip n lines, and then insert the columns from the new file into field positions starting at position c. If MATCH a b is specified, ASReml checks that the field a (0 < a < c) has the same value as field b. If not, it is assumed that the merged file has some missing records, and missing values are inserted into the data record and the line from the MERGE file is kept for comparison with the next record. At this stage it is expected that the lines in the MERGE file are in the same order as the corresponding lines occur in the primary data file, and that there are no extraneous
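To make the field count in the format example of excerpt 225 concrete, the sketch below reads the two sample records by hand, assuming the format is (3I4, 8X, A6, 3(2X,F5.2) / 4X, BZ, 20I1) and that the real values carry decimal points (4.12, 5.32, 6.32). It hard-codes this one layout and is not a general Fortran format interpreter; the function name is hypothetical.

```python
def read_example_records(rec1, rec2):
    """Parse the two sample records into the 27 fields the format describes."""
    fields = []
    # 3I4: three integers, four characters each.
    for i in range(3):
        fields.append(int(rec1[4 * i:4 * i + 4]))
    pos = 12 + 8                      # 8X: skip eight characters
    fields.append(rec1[pos:pos + 6])  # A6: six-character alphanumeric field
    pos += 6
    # 3(2X,F5.2): skip two characters, then read a five-character real.
    for _ in range(3):
        pos += 2
        fields.append(float(rec1[pos:pos + 5]))
        pos += 5
    # Second record: 4X skips four characters; under BZ a blank reads as
    # zero; 20I1 reads twenty single-digit integers.
    for ch in rec2[4:24]:
        fields.append(0 if ch == " " else int(ch))
    return fields


if __name__ == "__main__":
    r1 = "111122223333xxxxxxxxALPHAFxx 4.12xx 5.32xx 6.32"
    r2 = "xxxx123 567 901 345 7890"
    parsed = read_example_records(r1, r2)
    print(len(parsed), parsed)
```

The count works out as 3 + 1 + 3 = 7 fields from the first record and 20 from the second, giving the 27 fields quoted in the text.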
226. gel B J Verbyla A P and Thompson R 1998 Spatial analysis of multi environment early generation trials Biometrics 54 1 18 Cullis B R Smith A B and Thompson R 2004 Perspectives of anova reml and a general linear mixed model in N M Adams M J Crowder D J Hand and D A Stephens eds Methods and Models in Statistics in honour of Professor John Nelder FRS pp 53 94 Dempster A P Selwyn M R Patel C M and Roth A J 1984 Statisti cal and computational aspects of mixed model analysis Applied Statistics 33 203 214 Diggle P J Ribeiro P J J and Christensen O F 2003 An introduction to model based geostatistics in J Moller ed Spatial Statistics and Compu tational Methods Springer Verlag pp 43 86 Draper N R and Smith H 1998 Applied Regression Analysis John Wiley and Sons New York 3rd Edition Dutkowski G and Gilmour A R 2001 Modification of the additive rela tionship matrix for open pollinated trials Developing the Eucalypt of the Future 10 15 September Valdivia Chile p 71 Engel B 1998 A simple illustration of the failure of PQL IRREML and APHL as approximate ml methods for mixed models for binary data Biometrical Journal 2 141 154 Engel B and Buist W 1998 Bias reduction of approximate maximum like lihood estimates for heritability in threshold models Biometrics 54 1155 1164 Engel B and Keen A 1994 A simple app
227. gree_file qualifiers
• the qualifiers are listed in Table 9.1,
• the identities (individual, male_parent, female_parent) are merged into a single list and the inverse relationship is formed before the data file is read,
• when the data file is read, data fields with the P qualifier are recoded according to the combined identity list,
• the inverse relationship matrix is automatically associated with factors coded from the pedigree file unless some other covariance structure is specified. The inverse relationship matrix is specified with the variance model name AINV,
• the inverse relationship matrix is written to ainverse.bin,
• if ainverse.bin already exists, ASReml assumes it was formed in a previous run and has the correct inverse,
• ainverse.bin is read, rather than the inverse being reformed, unless MAKE is specified; this saves time when performing repeated analyses based on a particular pedigree,
• delete ainverse.bin or specify MAKE if the pedigree is changed between runs,
• identities are printed in the .sln file,
• identities should be whole numbers less than 200 000 000 unless ALPHA is specified,
• pedigree lines for parents must precede their progeny,
• unknown parents should be given the identity number 0,
• if an individual appearing as a parent does not appear in the first column, it is assumed to have unknown parents; that is, parents with unknown parentage do not need their own line in the file,
• identities may ap
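The ordering rules for pedigree lines in excerpt 227 (parents with their own line must precede their progeny; identity 0 is an unknown parent; a parent with no line of its own is assumed to have unknown parents) can be checked before running a job. The sketch below is a hypothetical pre-check written for illustration, not an ASReml facility; rows are (individual, male_parent, female_parent) tuples of strings.

```python
def pedigree_order_problems(rows, unknown=("0",)):
    """List (individual, parent) pairs where the parent's line comes too late."""
    has_own_line = {individual for individual, _, _ in rows}
    seen = set()
    problems = []
    for individual, sire, dam in rows:
        for parent in (sire, dam):
            if parent in unknown:
                continue  # unknown parent, nothing to check
            # A parent without its own line is acceptable (unknown parentage);
            # a parent whose line appears later violates the ordering rule.
            if parent in has_own_line and parent not in seen:
                problems.append((individual, parent))
        seen.add(individual)
    return problems


if __name__ == "__main__":
    good = [("SIRE_1", "0", "0"), ("101", "SIRE_1", "0")]
    bad = [("101", "SIRE_1", "0"), ("SIRE_1", "0", "0")]
    print(pedigree_order_problems(good))
    print(pedigree_order_problems(bad))
```

An empty result means the file satisfies the ordering rule; any pair returned names a progeny line that appears before its parent's line.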
228. h models by y D 5CD where D is a diagonal matrix of variances and C is a correlation matrix with elements given by cj l 7tl The coding for this is yl y y5 y7 yiO Trait tmt Tr tmt 129 14 S2 Tr O EXPH 5 100 200 300 300 300 13 5 7 10 Note that it is necessary to fix the scale parameter to 1 S2 1 to ensure that the elements of D are identifiable Abbreviated output from this analysis is 1 LogL 195 598 S2 1 0000 60 2 LogL 179 036 S2 1 0000 60 3 LogL 175 483 S2 1 0000 60 4 LogL 173 128 S2 1 0000 60 5 LogL 171 980 S2 1 0000 60 6 LogL 171 615 2 1 0000 60 7 LogL 171 527 S2 1 0000 60 8 LogL 171 504 S2 1 0000 60 9 LogL 171 498 2 1 0000 60 10 LogL 171 496 S2 1 0000 60 Source Model terms Gamma Residual POW EXP 5 0 906917 Residual POW EXP 5 60 9599 Residual POW EXP 5 72 9904 Residual POW EXP 5 309 259 Residual POW EXP 5 436 380 Residual POW EXP 5 382 369 Covariance Variance Correlation Matrix POWER 61 11 0 8227 0 6769 0 5569 54 88 72 80 0 8227 0 6769 93 12 123 5 309 7 0 8227 91 02 120 T 302 7 437 1 63 57 84 34 211 4 305 3 Analysis of Variance DF 8 Trait 1 tmt 9 Tr tot PeO 1 components constrained df df df df df df df df df df Component 0 906917 60 9599 72 9904 309 259 436 380 382 369 0 4156 0 5051 0 6140 0 7462 382 9 F_inc 127 95 0 00 4 75 Comp SE 21 Be 1 99 2 22 2 2 74 89 12 52 oOo o 6 o Oo aa a eas Sa Ss The last two mo
229. he data file and the corresponding primary output file along with a description of the problem There is an ASReml discussion list If you would like to join change your email ad dress or be removed from the list email arthur gilmour dpi nsw gov au with your request The address for messages to the list is ASRem1 L dpi nsw gov au 1 Introduction 5 There is a User Area on the website http www VSNi co uk select ASRem1l and then User Area which contains contributed material that may be of assistance It includes an ASReml tutorial in the form of sixteen sets of slides with audio mp3 discussion The sessions last about 20 minutes each 1 6 Typographic conventions A hands on approach is the best way to develop a working understanding of a new computing package We therefore begin by presenting a guided tour of ASReml using a sample data set for demonstration see Chapter 3 Throughout the guide new concepts are demonstrated by example wherever possible In this guide you will find framed sample An example ASReml code box boxes to the right of the page as shown here bola bpe hiskiigite dactions These contain ASReml command file sample of code currently under code Note that discussion remaining code is not the code under discussion is highlighted in T highlighted bold type for easy identification indicates that some of the the continuation symbol is used to original code is omitted from in
230. here for units the default IDV NIN Alliance Trial 1989 variety A id row 22 column i1 nin89aug asd skip 1 yield mu variety r units If mv 120 11 column AR1 0 3 22 row AR1 0 3 structure is assumed The units term is often fitted in spatial models for field trial data to allow for a nugget effect 4 Two dimensional separable autoregressive spatial model with random replicate effects This is essentially a combination of 2b and 3c to demonstrate specifying an R structure and a G structure in the same model The variance header line 1 2 1 indicates that there is one R structure 1 that involves two variance mod els 2 and is therefore the direct product of two matrices and there is one G structure 1 The R structures are defined first so the next two lines are the R structure definition lines for e as in 3b The last two lines are the G structure definition lines for repl as in 2b In this case V 02 yrl Ec Pc Ur pr NIN Alliance Trial 1989 variety A id row 22 column 11 nin89aug asd skip 1 yield mu variety r repl if my ye 11 column AR1 0 3 22 row AR1 0 3 repl i repl O IDV 0 1 7 Command file Specifying variance structures 113 See Section 7 7 Important 5 Two dimensional separable autore gressive spatial model defined as a G structure This model is equivalent to 3c but with the spatial model defined as a G structure rather than an R stru
231. hich are different from A Z so that 61 equalities can be specified 0 and mean unconstrained A colon generates a sequence viz a e is the same as abcde New e Putting as the first characterin s makes the interpretation of codes absolute so that they apply across structures New e Putting as the first characterin s indicates that numbers are repeat counts A Z are equality codes and only is unconstrained Thus 3A2 is equiv alent to OAAA00 or Oaaa00 Examples are presented in Table 7 5 Table 7 5 Examples of constraining variance parameters in ASReml ASReml code action ABACBAOCBA constrain all parameters corresponding to A to be equal similarly for B and C The 7th parameter would be left uncon strained This sequence applied to an unstructured 4 x 4 matrix would make it banded that is A BA CBA OCBA site gen 2 G header line this example defines a structure for the site 0 US 3 O0AOAAO GPUPUUP genotype by site interaction effects in a TE oh ol xe MET in which the genotypes are inde gen pendent random effects within sites but are correlated across sites with equal co variance 7 Command file Specifying variance structures 138 difficult Examples of constraining variance parameters in ASReml ASReml code action site 0 FA2 G4PZ3P4P Q0000000VVVV 4 9 initial values for Ist factor 0 3 1 initial values for 2nd factor first fixed at 0 4 2 init values for sit
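The banded example in Table 7.5 of excerpt 231 relies on the constraint codes being read row-wise against the lower triangle of the unstructured matrix. The sketch below lays a code string out that way so the banding can be seen; it assumes the seventh code in the example is the digit 0 (unconstrained) and that parameters are ordered row-wise, as the text's "A BA CBA 0CBA" indicates. The function name is hypothetical.

```python
def lower_triangle_rows(codes, n):
    """Split a constraint-code string into the rows of an n x n lower triangle."""
    expected = n * (n + 1) // 2
    if len(codes) != expected:
        raise ValueError(f"need {expected} codes for a {n} x {n} matrix")
    rows, pos = [], 0
    for row_length in range(1, n + 1):
        rows.append(codes[pos:pos + row_length])
        pos += row_length
    return rows


if __name__ == "__main__":
    for row in lower_triangle_rows("ABACBA0CBA", 4):
        print(row)
```

Reading the rows from right to left shows the bands: every diagonal element carries code A, the first off-diagonal B, the second C, and the corner element is left unconstrained.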
232. ial process see Other ex amples below Missing values remain missing replaces the variate with uniform ran dom variables having range 0 v ISEED 848586 treat L CAB CYR treat SET 1 1 1 group treat SET 1 22334 Anorm A SETN 2 5 10 Aeff A SETU 5 10 year 3 SUB 66 67 68 plot Udat V3 SEQ 0 Uniform 4 5 is equivalent to Udat Uniform 4 5 5 Command file Reading the data 56 New List of transformation qualifiers and their actions with examples qualifier argument action examples Vtarget value assigns value to data field target over V3 2 5 writing previous contents subsequent transformation qualifiers will operate on data field target Vfield assigns the contents of data field field V10 V3 to data field target overwriting previ V1i1 block ous contents subsequent transforma V12 VO tion qualifiers will operate on data field target If field is O the number of the data record is inserted QTL marker transformations IMM s associates marker positions in the vector s based on the Haldane mapping function with marker variables and replaces missing values in a vector of marker states with expected values calculated using distances to non missing flanking markers This transformation will normally be used on a G n factor where the n variables are the marker states for n markers in a linkage group in map order and coded 1 1
233. ich variables should be averaged over to form predictions. The values to be averaged over must also be defined for each variable; the variables involved will be referred to as the averaging set. The combination of the classify set with these averaging variables defines a multiway hyper-table. Note that variables evaluated at only one value, for example a covariate at its mean value, can be formally introduced as part of the classifying or averaging set.

(c) Determine which terms from the linear mixed model are to be used in forming predictions for each cell in the multiway hyper-table, in order to give appropriate conditional or marginal prediction.

(d) Choose the weights to be used when averaging cells in the hyper-table to produce the multiway table to be reported.

Note that after steps (a) and (b) there may be some explanatory variables in the fitted model that do not classify the hyper-table. These variables occur in terms that are ignored when forming the predicted values. It was concluded above that fixed terms could not sensibly be ignored in forming predictions, so that variables should only be omitted from the hyper-table when they only appear in random terms. Whether terms derived from these variables should be used when forming predictions depends on the application and aim of prediction.

The main difference in this prediction process compared to that described by Lane an
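The final step of the prediction process in excerpt 233, averaging the cells of the hyper-table over the averaging set with chosen weights, can be sketched as follows. The data layout (a dictionary keyed by classify level, then by averaging level) and the function name are assumptions for illustration; equal weights are used when none are given, which is only one possible choice.

```python
def average_hyper_table(cells, weights=None):
    """Collapse {classify_level: {averaging_level: prediction}} to one value
    per classify level, using normalised weights over the averaging levels."""
    table = {}
    for classify_level, row in cells.items():
        if weights is None:
            w = {level: 1.0 for level in row}  # equal weighting
        else:
            w = {level: weights[level] for level in row}
        total = sum(w.values())
        table[classify_level] = sum(row[k] * w[k] for k in row) / total
    return table


if __name__ == "__main__":
    cells = {
        "varietyA": {"rep1": 10.0, "rep2": 20.0},
        "varietyB": {"rep1": 12.0, "rep2": 16.0},
    }
    print(average_hyper_table(cells))
    print(average_hyper_table(cells, {"rep1": 3.0, "rep2": 1.0}))
```

Changing the weights changes the reported table without refitting the model, which is why the choice of weights is listed as a separate step.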
234. iction to include only model terms in t It can be used for example to form a table of slopes as in HI mu X variety X variety predict variety X 1 onlyuse X X variety IUSE t causes ASReml to set up a prediction model based on the default rules and then adds the terms listed in t Printing IDEC n gives the user control of the number of decimal places reported New in the table of predicted values where n is 0 9 The default is 4 G15 9 format is used if n exceeds 9 When VVP or SED are used the values are displayed with 6 significant digits unless n is specified and even then the values are displayed with 9 significant digits PLOT a instructs ASReml to attempt a plot of the predicted values This New qualifier is only applicable in versions of ASReml linked with the Winteracter Graphics library If there is no argument ASReml produces a figure of the predicted values as best it can The user can modify the appearance by typing lt Esc gt to expose a menu or with the plot arguments listed in Table 10 2 PRINTALL instructs ASReml to print the predicted value even if it is not of an estimable function By default ASReml only prints predic tions that are of estimable functions SED requests all standard errors of difference be printed Normally only an average value is printed TDIFF requests t statistics be printed for all combinations of predicted values ARANGE AEN O requests ASReml to scan the predicted values
235. ile Reading the data 73 List of rarely used job control qualifiers qualifier action DATAFILE f DENSE n IDF n ASReml prints its standard reports as if it had completed the iteration normally but since it has not completed it some of the information printed will be incorrect In particular variance information on the variance parameters will always be wrong Standard errors on the estimates will be wrong unless n 3 Residuals are not available if n 1 Use of n 3 or n 2 will halve the processing time when compared to the alternative of using MAXIT 1 However MAXIT 1 does result in a complete and correct output report specifies the datafile name replacing the one obtained from the datafile line It is required when different PATHS see DOPATH in Table 12 3 of a job must read different files The SKIP qualifier if specified will be applied when reading the file sets the number of equations solved densely up to a maximum of 5000 By default sparse matrix methods are applied to the random effects and any fixed effects listed after random fac tors or whose equation numbers exceed 800 Use DENSE nto apply sparse methods to effects listed before the r reduc ing the size of the DENSE block or if you have large fixed model terms and want them included in the ANOVA table Individual model terms will not be split so that only part is in the dense section n should be kept small lt 100 for faster processi
236. iles in Excel Binary format 41 4 Data file preparation 42 4 1 Introduction The first step in an ASReml analysis is to prepare the data file Data file prepara tion is described in this chapter using the NIN example of Chapter 3 for demon stration The first 25 lines of the data file are as follows ARAPAHOE 5 1105 661 1 SCOUT66 10 1110 511 1 NE83498 12 1112 492 NE84557 13 1113 509 NE83432 14 1114 268 NE85556 15 1115 633 NE85623 16 1116 513 CENTURK78 17 1117 632 PRPRPRPPR Boop Bop NE86482 21 1121 560 1 HOMESTEAD 22 1122 566 LANCOTA 23 1123 514 1 NE86501 24 1124 635 1 NE86503 25 1125 840 1 variety id pid raw repl nloc yield lat long row column BRULE 2 1102 631 1 4 31 55 4 3 20 4 17 1 REDLAND 3 1103 701 1 4 35 05 4 3 21 6 18 1 CODY 4 1104 602 1 4 30 1 4 3 22 8 19 1 4 33 05 4 3 24 20 1 NE83404 6 1106 605 1 4 30 25 4 3 25 2 21 1 NE83406 7 1107 704 1 4 35 2 4 3 26 4 22 1 NE83407 8 1108 388 1 4 19 4 8 6 1 2 1 2 CENTURA 9 1109 487 1 4 24 35 8 6 2 4 2 2 4 25 55 8 6 3 6 3 2 COLT 11 1111 502 1 4 25 1 8 6 4 8 4 2 24 6 8 6 6 5 2 25 45 8 6 7 2 6 2 13 4 8 6 8 4 7 2 31 65 6 6 9 6 3 2 25 65 8 6 10 8 9 2 1 4 31 6 8 6 12 10 2 NORKAN 18 1118 446 1 4 22 3 8 6 13 2 11 2 KS8831374 19 1119 684 1 4 34 2 8 6 14 4 12 2 TAM200 20 1120 422 1 4 21 1 8 6 15 6 13 2 4 28 8 6 16 8 14 2 1 4 28 3 8 6 18 15 2 4 25 7 8 6 19 2 16 2 4 31 75 8 6 20 4 17 2 4 42 8 6 21 6 18 2 4 2 The data file The standard fo
237. imary data file. Typically the primary data file will just contain INCLUDE statements identifying the subfiles to include. For example, you may have data from a series of related experiments in separate data files for individual analysis. The primary data file for the subsequent combined analysis would then just contain a set of INCLUDE statements to specify which experiments were being combined. If the subfiles have CSV format, they should all have it and the CSV file should be declared on the primary datafile line. This option is not available in combination with MERGE.

5.8 Job control qualifiers

The following tables list the job control qualifiers. These change or control various aspects of the analysis. Job control qualifiers may be placed on the datafile line and following lines. They may also be defined using an environment variable called ASREML_QUAL. The environment variable is processed immediately after the datafile line is processed. All qualifier settings are reported in the asr file. Use the Index to check for examples or further discussion of these qualifiers.

Important: Many of these are only required in very special circumstances, and new users should not attempt to understand all of them. You do need to understand that all general qualifiers are specified here. Many of these qualifiers are referenced in other chapters where their purpose will be more evident.

Table 5
238. imates of fixed effects and predictions of random effects for the particular set of variance parameters supplied as initial values Otherwise the estimates and predictions will be for the updated variance parameters see the BLUP qualifier below 5 Command file Reading the data 66 New List of commonly used job control qualifiers action qualifier SUM Xv lY vu IG vu JOIN If MAXIT 1 is used and an Unstructured Variance model is fit ted ASReml will perform a Score test of the US matrix Thus assume the variance structure is modelled with reduced pa rameters if that modelled structure is then processed as the initial values of a US structure ASReml tests the adequacy of the reduced parameterization causes ASReml to report a general description of the distribu tion of the data variables and factors and simple correlations among the variables for those records included in the anal ysis This summary will ignore data records for which the variable being analysed is missing unless a multivariate anal ysis is requested or missing values are being estimated The information is written to the ass file is used to plot the transformed data Use X to specify the x variable Y to specify the y variable and G to specify a grouping variable JOIN joins the points when the x value increases between consecutive records The grouping variable may be omitted for a simple scatter plot Omit Y y produce a his
239. imilarly at f i X at f j X at f k X can be written as at f i j k X provided at f i j k is written as the first component of the interaction Any number of levels may be listed forms cosine from v with period r Omit r if v is radians If v is degrees r is 360 apply sum to zero constraints to factor f It is not appropriate for random factors and fixed factors with missing cells ASReml assumes you specify the correct number of levels for each factor The formal effect of the con function is to form a model term with the highest level formally equal to minus the sum of the preceding terms With sum to zero constraints a missing treatment level will generate a singularity but in the first coefficient rather than in the coefficient corre sponding to the missing treatment In this case the coefficients will not be readily interpretable When interacting constrained factors all cells in the cross tabulation should have data fac v forms a factor with a level for each value of x and any addi tional points inserted as discussed with the qualifiers PPOINTS and PVAL fac v y forms a factor with a level for each combination of values from vand y The values are reported in the res file associates the nth giv G inverse with the factor This is used when there is a known except for scale G structure other than the additive inverse genetic relationship matrix The G inverse is supplied in a file whose name has the file extension gi
240. in file Forming a job template Command line options Prompt for arguments Output control command line options Debug command line options Graphics command line options Job control command line options Workspace command line options Menu command line options Non graphics command line options Examples Advanced processing arguments Standard use of arguments Prompting for input Paths and Loops 176 12 Command file Running the job 177 12 1 Introduction The command line its options and arguments are discussed in this chapter Com mand line options enable more workspace to be accessed to run the job control some graphics output and control advanced processing options Command line arguments are substituted into the job at run time As Windows likes to hide the command line most command line options can be set on an optional initial line of the as file we call the top job control line to distinguish it from the other job control lines discussed in Chapter 6 If the first line of the as file contains a qualifier other than DOPATH it is interpreted as setting command line options and the Title is taken as the next line 12 2 The command line Normal run The basic command to run ASReml is path ASRem1 basename as c e path provides the path to the ASReml program usually called asreml exe in a PC environment In a UNIX environment ASReml is usually run through a shell script called ASRem1 if the ASReml progra
241. in linear mixed models introduces some difficulties. In general, the methods used to construct F-tests in analysis of variance and regression cannot be used for the diversity of applications of the general linear mixed model available in ASReml. One approach would be to use likelihood ratio methods (see Welham and Thompson, 1997), although their approach is not easily implemented. Wald-type test procedures are generally favoured for conducting tests concerning $\tau$. The traditional Wald statistic to test the hypothesis $H_0: L\tau = l$ for given $L$ ($r \times p$) and $l$ ($r \times 1$) is given by

$$W = (L\hat{\tau} - l)' \{ L (X' H^{-1} X)^{-1} L' \}^{-1} (L\hat{\tau} - l) \qquad (2.17)$$

and asymptotically this statistic has a chi-square distribution on r degrees of freedom. These are marginal tests, so that there is an adjustment for all other terms in the fixed part of the model. It is also anti-conservative if p-values are constructed, because it assumes the variance parameters are known. The small sample behaviour of such statistics has been considered by Kenward and Roger (1997) in some detail. They presented a scaled Wald statistic, together with an F-approximation to its sampling distribution, which they showed performed well in a range (though limited in terms of the range of variance models available in ASReml) of settings. In the following we describe the facilities now available in ASReml for conducting inference concerning terms which are in the dense fixed effects model component
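For a single contrast (r = 1) the Wald statistic of excerpt 241 reduces to a squared estimate divided by its variance, which is easy to compute by hand. The sketch below assumes the matrix C = (X'H^{-1}X)^{-1} is already available as a list of lists and treats only the one-row case; the function name and the toy numbers are illustrative assumptions, not ASReml output.

```python
def wald_single_contrast(L, tau_hat, C, l=0.0):
    """Wald statistic for H0: L tau = l when L is a single row (r = 1).

    L       : contrast coefficients (length p)
    tau_hat : estimated fixed effects (length p)
    C       : p x p matrix standing in for (X' H^-1 X)^-1
    """
    p = len(L)
    # Numerator: (L tau_hat - l)
    estimate = sum(L[i] * tau_hat[i] for i in range(p)) - l
    # Denominator: L C L'
    variance = sum(L[i] * C[i][j] * L[j] for i in range(p) for j in range(p))
    return estimate * estimate / variance


if __name__ == "__main__":
    # Difference between two effects with unit, uncorrelated variances.
    print(wald_single_contrast([1.0, -1.0], [5.0, 3.0], [[1.0, 0.0], [0.0, 1.0]]))
```

Under the asymptotic result quoted in the text, this value would be referred to a chi-square distribution on 1 degree of freedom, with the caveat the text gives about treating the variance parameters as known.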
242. in89.as results in the nin89.pvs file displayed below (some output omitted), containing the 56 predicted variety means, also in the order in which they first appear in the data file (column 2), together with standard errors (column 3). An average standard error of difference among the predicted variety means is displayed immediately after the list of predicted values. As in the .asr file, date, time and trial information are given in the title line. The Ecode for each prediction (column 4) is usually E, indicating the prediction is of an estimable function. Predictions of non-estimable functions are usually not printed (see Chapter 10).

  nin alliance trial   11 Jul 2005 13:55:21                  [title line]
  nin893
  Ecode is E for Estimable, * for Not Estimable

  Predicted values of yield
  repl is ignored in the prediction (except where specifically included)

  variety     Predicted_Value  Standard_Error  Ecode         [predicted variety effects]
  LANCER          28.5625          3.8557        E
  BRULE           26.0750          3.8557        E
  REDLAND         30.5000          3.8557        E
  CODY            21.2125          3.8557        E
  ARAPAHOE        29.4375          3.8557        E
  NE83404         27.3875          3.8557        E
  NE83406         24.2750          3.8557        E
  NE83407         22.6875          3.8557        E
  CENTURA         21.6500          3.8557        E
  SCOUT66         27.5250          3.8557        E
  COLT            27.0000          3.8557        E
  NE87613         29.4000          3.8557        E
  NE87615         25.6875          3.8557        E
  NE87619         31.2625          3.8557        E
  NE87627         23.2250          3.8557        E

  SED: Overall Standard Error of Difference 4.979

Data file preparation
Introduction
The data file
Free format
Fixed format
Preparing data f
243. ine giving the pedigree of an individual appears before any line where that individual appears as a parent,
• is read free format; it may be the same file as the data file if the data file is free format and has the necessary identities in the first three fields (see below),
• is specified on the line immediately preceding the data file line in the command file,
• use identity 0 or * for unknown parents.

  harvey.ped        harvey.dat
  101 SIRE_1 0      101 SIRE_1 0 1 3 192 390 2241
  102 SIRE_1 0      102 SIRE_1 0 1 3 154 403 2651
  103 SIRE_1 0      103 SIRE_1 0 1 4 185 432 2411
  104 SIRE_1 0      104 SIRE_1 0 1 4 183 457 2251
  105 SIRE_1 0      105 SIRE_1 0 1 5 186 483 2581
  106 SIRE_1 0      106 SIRE_1 0 1 5 177 469 2671
  107 SIRE_1 0      107 SIRE_1 0 1 5 177 428 2711
  108 SIRE_1 0      108 SIRE_1 0 1 5 163 439 2471
  109 SIRE_2 0      109 SIRE_2 0 1 4 188 439 2292
  110 SIRE_2 0      110 SIRE_2 0 1 4 178 407 2262
  111 SIRE_2 0      111 SIRE_2 0 1 5 198 498 1972
  112 SIRE_2 0      112 SIRE_2 0 1 5 193 459 2142
  113 SIRE_2 0      113 SIRE_2 0 1 5 186 459 2442
  114 SIRE_2 0      114 SIRE_2 0 1 5 175 375 2522
  115 SIRE_2 0      115 SIRE_2 0 1 5 171 382 1722
  116 SIRE_2 0      116 SIRE_2 0 1 5 168 417 2752
  117 SIRE_3 0      117 SIRE_3 0 1 3 154 389 2383
  118 SIRE_3 0      118 SIRE_3 0 1 4 184 414 2463
  119 SIRE_3 0      119 SIRE_3 0 1 5 174 483 2293
  120 SIRE_3 0      120 SIRE_3 0 1 5 170 430 2303

9.4 Reading in the pedigree file

The syntax for specifying a pedigree file in the ASReml command file is pedi
244. ined may not, in general, be positive definite. Care should be taken when using this option for incomplete multivariate data. The command to run PATH 1 is

  asreml -nrw64 mt 1

The Loglikelihood from this run is -20000-1444.93. When the job runs, the message

  Non positive definite G matrix: 0 singularities, 1 negative pivots, order 3

appears to the screen. This refers to the 3 x 3 dam matrix, which is estimated as

  Covariance/Variance/Correlation Matrix CORRelation
  2 573 3 024 0 1526 1 0 3 3 0 20 25 0 6568 82 0 7830 86 0 2098E 01

Note the correlation between wwt and ywt is estimated at 1.025. The results from this analysis can be automatically used by ASReml for the next part if the .rsv file is copied prior to running the next part. That is, we add the PATH 2 coding to the job, copy mt1.rsv to mt2.rsv so that when we run PATH 2 it starts from where PATH 1 finished, and run the job using

  asreml -cnrw64 mt 2

The Loglikelihood from this run is -20000-1427.37. Finally, we use the PATH 3 coding to obtain the final analysis: copy mt2.rsv to mt3.rsv and run the final stage starting from the stage 2 results. Note that we are using the automatic updating associated with !CONTINUE. A portion of the final output file is

  Notice: LogL values are reported relative to a base of
  NOTICE 76 1 LogL 1427
  11 Source at Trait 1 at Trait 2 at Trait
245. ing constants are significant (P < 0.05). Lastly, we add the covariance parameter between the intercept and slope for each tree in model 6. This ensures that the covariance model will be translation invariant. A portion of the output file for model 6 is

  8 LogL 87.4291   S2 5.6303   32 df

  Source              Model  terms  Gamma          Component      Comp/SE
  spl(age,7)              5      5  2.17239        12.2311         1.09
  spl(age,7).Tree        25     25  1.38565        7.80160         1.47
  Variance               35     32  1.00000        5.63028         1.72
  Tree UnStru 1 1                   5.62219        31.6545         1.26
  Tree UnStru 2 1                   0.124202E-01   0.699290E-01    0.85
  Tree UnStru 2 2                   0.108377E-03   0.610192E-03    1.40

  Covariance/Variance/Correlation Matrix UnStructured
  31.65        0.5032
  0.6993E-01   0.6102E-03

  Analysis of Variance   NumDF  DenDF   F_inc   Prob
  7 mu                       1    4.0  169.87  <.001
  3 age                      1    4.0   92.78  <.001
  5 Season                   1    8.9  108.60  <.001

Figure 15.15: Trellis plot of trunk circumference for each tree at sample dates (adjusted for season effects), with fitted profiles across time and confidence intervals. Panels: individual trees and Marginal; x-axis: time since December 31 1968 (days); y-axis: trunk circumference (mm).

Figure 15.15 presents the predicted growth over time for individual trees and a margina
246. ing measurements

Figure 15.13: Fitted cubic smoothing spline for tree 1 (x-axis: age; y-axis: circumference).

We now consider the analysis of the full dataset. Following Verbyla et al. (1999), we consider the analysis of variance decomposition (see Table 15.11), which models the overall and individual curves. An overall spline is fitted, as well as tree deviation splines. We note, however, that the intercept and slope for the tree deviation splines are assumed to be random effects. This is consistent with Verbyla et al. (1999). In this sense, the tree deviation splines play a role in modelling the conditional curves for each tree and variance modelling. The intercept and slope for each tree are included as random coefficients (denoted by RC in Table 15.11). Thus, if $U$ is the matrix of intercepts (column 1) and slopes (column 2) for each tree, then we assume that

$$\mathrm{var}(\mathrm{vec}(U)) = \Sigma \otimes I$$

where $\Sigma$ is a 2 x 2 symmetric positive definite matrix. Non-smooth variation can be modelled at the overall mean (across trees) level, and this is achieved in ASReml by inclusion of fac(age) as a random term.

Table 15.11: Orange data: AOV decomposition

  stratum    decomposition      type  df or ne
  constant   1                  F     1
  age        age                F     1
             spl(age,7)         R     5
             fac(age)           R     7
  tree       tree               RC    5
  age.tree   x.tree             RC    5
             spl(age,7).tree    R     25
  error                         R
247. ing predictors of random effects fitted as fac(xsca,ysca). It shows low semivariance in the xsca direction, high semivariance in the ysca direction, with intermediate values in the 45 and 135 degrees directions.

  sets hardcopy graphics file type to .wmf.

  controls the form of the .yht file:
    !YHTFORM -1 suppresses formation of the .yht file,
    !YHTFORM 1 is TAB separated: .yht becomes _yht.txt,
    !YHTFORM 2 is COMMA separated: .yht becomes _yht.csv,
    !YHTFORM 3 is Ampersand separated: .yht becomes _yht.tex.

  adds r to the total Sum of Squares. This might be used with !DF to add some variance to the analysis when analysing summarised data.

Figure 5.1: Variogram in 4 sectors for Cashmore data (plot title: "this is a test of matern: Variogram of fac(xsca,ysca) predictors"; x-axis: Distance; sectors at 0, 45, 90 and 135 degrees).

Table 5.6: List of very rarely used job control qualifiers

  qualifier        action
  !AILOADINGS 2    controls modification to AI updates of loadings in factor analytic variance models. After ASReml calculates updates for variance parameters, it checks whether the updates are reasonable and sometimes reduces them. For factor loadings, the default behaviour is to shrink the loadings only in the first iteration, if they appear large. This qualifier gives some user control. If it is specified
248. ings can be used interchangeably.

!DOPATH allows several analyses to be coded and run sequentially without having to edit the .as file between runs. Which particular lines in the .as file are honoured is controlled by the argument (n) of the !DOPATH qualifier in conjunction with !PATH (or !PART) statements. The argument (n) is often given as $1, indicating that the actual path to use is specified as the first argument on the command line (see Section 12.4). See Sections 15.7 and 15.10 for examples. The default value of n is 1. !DOPATH n can be located anywhere in the job, but if placed on the top job control line, it cannot have the form !DOPATH $1 unless the arguments are on the command line, as the !DOPATH qualifier will be parsed before any job arguments on the same line are parsed.

!PATH pathlist: The !PATH (or !PART) control statement may list multiple path numbers, so that the following lines are honoured if any one of the listed path numbers is active. The !PATH qualifier must appear at the beginning of its own line, after the !DOPATH qualifier. A sequence of path numbers can be written using a:b notation. For example

  mydata.asd !DOPATH 4
  !PATH 2 4 6:10

One situation where this might be useful is where it is necessary to run simpler models to get reasonable starting values for more complex variance models. The more complex models are specified in later parts, and the !CONTINUE command is used to pick up the previous estimates
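The path-selection rule described above (the lines following a PATH statement are honoured if any listed path number is active) can be sketched as follows. This is an illustration only, not ASReml's parser: the function names are hypothetical, and it assumes ranges in a path list are written with a colon, as in 6:10.

```python
def expand_pathlist(pathlist):
    """Expand a path list such as "2 4 6:10" into a sorted list of path numbers."""
    paths = set()
    for token in pathlist.split():
        if ":" in token:
            # Assumed range notation a:b, inclusive at both ends.
            first, last = (int(part) for part in token.split(":"))
            paths.update(range(first, last + 1))
        else:
            paths.add(int(token))
    return sorted(paths)


def is_honoured(pathlist, active_path):
    """True if the lines guarded by this path list apply to the active path."""
    return active_path in expand_pathlist(pathlist)
```

With the example from the text, the path list "2 4 6:10" is honoured when the active path is 4 (as set by DOPATH 4), but not when it is 5.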
249. ion  action

[function column of this part of the table: pow(x,p[,o]), qtl(f,r), sin(v,r), spl(v[,k]), s(v[,k]), sqrt(v[,r]), Trait, units, uni(f[,0[,n]])]

pow(x,p[,o])
  defines the covariable (x + o)^p for use in the model, where x is a variable in the data, p is a power and o is an offset. pow(x,0.5[,o]) is equivalent to sqr(x[,o]); pow(x,0[,o]) is equivalent to log(x[,o]); pow(x,-1[,o]) is equivalent to inv(x[,o]).

qtl(f,r)
  calculates an expected marker state from flanking marker information at position r of the linkage group f (see !MM to define marker locations). r may be specified as TPn, where TPn has been previously internally defined with a predict statement (see page 164). r should be given in Morgans.

sin(v,r)
  forms sine from v with period r. Omit r if v is radians. If v is degrees, r is 360.

spl(v[,k]), s(v[,k])
  In order to fit spline models associated with a variate v and k knot points in ASReml, v is included as a covariate in the model and spl(v,k) as a random term. The knot points can be explicitly specified using the !SPLINE qualifier (Table 5.4). If k is specified but !SPLINE is not specified, equally spaced points are used. If k is not specified and there are fewer than 50 unique data values, they are used as knot points. If there are more than 50 unique points, then 50 equally spaced points will be used. The spline design matrix formed is written to the .res file. An example of the use of spl() is
    price ~ mu week !r spl(week)

sqrt(v[,r])
  forms the square root of v + r. This may also be used to transform the
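The pow() equivalences and the default knot-point rule for spl() described above can be mirrored in a short sketch. It is illustrative only: the function names are hypothetical, "equally spaced" is assumed to mean equally spaced between the smallest and largest data values, and the behaviour with exactly 50 unique values (not stated in the text) is assumed to be "use the values themselves".

```python
import math


def pow_transform(x, p, o=0.0):
    """Covariable (x + o)^p, with p = 0 read as the natural log, as listed for pow()."""
    if p == 0:
        return math.log(x + o)
    return (x + o) ** p


def default_knots(values, k=None, limit=50):
    """Knot points when no explicit knot list is supplied.

    k given     -> k equally spaced points (spacing over the data range is an assumption)
    k not given -> the unique data values if there are at most `limit` of them,
                   otherwise `limit` equally spaced points
    """
    uniq = sorted(set(values))
    if k is None:
        if len(uniq) <= limit:
            return uniq
        k = limit
    lo, hi = uniq[0], uniq[-1]
    step = (hi - lo) / (k - 1)
    return [lo + i * step for i in range(k)]
```

For instance, pow_transform(9.0, 0.5) gives the square root 3.0, and default_knots(range(100)) returns 50 equally spaced points spanning 0 to 99.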
250. ion. The default value of is 1.

General qualifiers

!AOD
  requests an Analysis of Deviance table be generated. This is formed by fitting a series of sub-models for terms in the DENSE part, building up to the full model, and comparing the deviances. An example of its use is
    LS !BIN !TOT COUNT !AOD ~ mu SEX GROUP
  Caution: !AOD may not be used in association with PREDICT.

!DISP h
  includes an overdispersion scaling parameter (h) in the weights. If !DISP is specified with no argument, ASReml estimates it as the residual variance of the working variable. Traditionally it is estimated from the deviance residuals, reported by ASReml as Variance heterogeneity. An example of its use is
    count !POIS !DISP ~ mu group

!OFFSET o
  is used especially with binomial data to include an offset in the model, where o is the number or name of a variable in the data. The offset is only included in binomial and Poisson models (for Normal models, just subtract the offset variable from the response variable), for example
    count !POIS !OFFSET base !DISP ~ mu group
  The offset is included in the model as $\eta = X\tau + o$. The offset will often be something like ln(n).

!TOTAL n
  is used especially with binomial data, where n is the field containing the total counts for each sample. If omitted, count is taken as 1.

Residual qualifi
251. ion for the variance matrix in this case is $I \otimes \Sigma$, where $\Sigma$ is an unstructured variance matrix.

Specifying multivariate variance structures in ASReml

A more sophisticated error structure is required for multivariate analysis. For a standard multivariate analysis:

• the error structure for the residual must be specified as two-dimensional, with independent records and an unstructured variance matrix across traits; records may have observations missing in different patterns and these are handled internally during analysis,
• the R structure must be ordered traits within units, that is, the R structure definition line for units must be specified before the line for Trait,
• variance parameters are variances, not variance ratios,
• the R structure definition line for units, that is, 1485 0 ID, could be replaced by 0 or 0 0 ID; this tells ASReml to fill in the number of units and is a useful option when the exact number of units in the data is not known to the user,
• the error variance matrix is specified by the model Trait 0 US; the initial values are for the lower triangle o

  Orange Wether Trial 1984-8
   SheepID !I
   TRIAL
   BloodLine !I
   TEAM
   YEAR
   GFW YLD FDIAM
  wether.dat !skip 1
  GFW FDIAM ~ Trait Trait.YEAR !r Trait.TEAM Trait.SheepID
  1 2 2
  1485 0 ID
  Trait 0 US 3*0
  Trait.TEAM 2
  Trait 0 US 3*0
  TEAM 0 ID
  Trait.SheepID 2
  Trait 0 US 3*0
  SheepID 0 ID
  predict YEAR Trait
252. ion on the command line. We have produced the following plots by use of the -g22 option.

Table 15.6: Field layout of Slate Hall Farm experiment

Replicate levels
         Column
  Row    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
    1    1   1   1   1   1   2   2   2   2   2   3   3   3   3   3
    2    1   1   1   1   1   2   2   2   2   2   3   3   3   3   3
    3    1   1   1   1   1   2   2   2   2   2   3   3   3   3   3
    4    1   1   1   1   1   2   2   2   2   2   3   3   3   3   3
    5    1   1   1   1   1   2   2   2   2   2   3   3   3   3   3
    6    4   4   4   4   4   5   5   5   5   5   6   6   6   6   6
    7    4   4   4   4   4   5   5   5   5   5   6   6   6   6   6
    8    4   4   4   4   4   5   5   5   5   5   6   6   6   6   6
    9    4   4   4   4   4   5   5   5   5   5   6   6   6   6   6
   10    4   4   4   4   4   5   5   5   5   5   6   6   6   6   6

Rowblk levels
         Column
  Row    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
    1    1   1   1   1   1  11  11  11  11  11  21  21  21  21  21
    2    2   2   2   2   2  12  12  12  12  12  22  22  22  22  22
    3    3   3   3   3   3  13  13  13  13  13  23  23  23  23  23
    4    4   4   4   4   4  14  14  14  14  14  24  24  24  24  24
    5    5   5   5   5   5  15  15  15  15  15  25  25  25  25  25
    6    6   6   6   6   6  16  16  16  16  16  26  26  26  26  26
    7    7   7   7   7   7  17  17  17  17  17  27  27  27  27  27
    8    8   8   8   8   8  18  18  18  18  18  28  28  28  28  28
    9    9   9   9   9   9  19  19  19  19  19  29  29  29  29  29
   10   10  10  10  10  10  20  20  20  20  20  30  30  30  30  30

Colblk levels
         Column
  Row    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
    1    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
    2    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
    3    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
    4    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
    5    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
    6   16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
    7   16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
    8   16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
    9   16  17  18  19  20  21  22  2
253. ions. A structure for the residual variance for the spatial analysis of multi-environment trials (Cullis et al., 1998) is given by

$$R_j = R_j(\phi_j) = \sigma_j^2 \Sigma_j(\rho_j)$$

Each section represents a trial, and this model accounts for between-trial error variance heterogeneity ($\sigma_j^2$) and possibly a different spatial variance model for each trial. In the simplest case, the matrix $R$ could be known and proportional to an identity matrix. Each component matrix $R_j$ (or $R$ itself for one section) is assumed to be the direct product (see Searle, 1982) of one, two or three component matrices. The component matrices are related to the underlying structure of the data. If the structure is defined by factors, for example replicates, rows and columns, then the matrix $R$ can be constructed as a direct product of three matrices describing the nature of the correlation across replicates, rows and columns. These factors must completely describe the structure of the data, which means that

1. the number of combined levels of the factors must equal the number of data points,
2. each factor combination must uniquely specify a single data point.

These conditions are necessary to ensure the expression $\mathrm{var}(e) = \theta R$ is valid. The assumption that the overall variance structure can be constructed as a direct product of matrices corresponding to underlying factors is called the assumption of separability, and assumes that any correlation process across levels of
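The direct-product construction described above can be made concrete with a small sketch. It is not ASReml code: the helper names are hypothetical, and the first-order autoregressive correlation matrices for columns and rows, with the parameter values shown, are chosen purely for illustration.

```python
def ar1(n, rho):
    """n x n first-order autoregressive correlation matrix: rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def kron(a, b):
    """Direct (Kronecker) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


# Hypothetical layout: 2 columns by 3 rows, data ordered rows within columns.
R = kron(ar1(2, 0.5), ar1(3, 0.2))

# Separability conditions from the text: the combined factor levels (2 x 3)
# must equal the number of data points, one data point per combination.
assert len(R) == 2 * 3
```

Under separability, the correlation between two plots is simply the product of their column correlation and their row correlation; for example, plots one column and one row apart have correlation 0.5 x 0.2 = 0.1 in this sketch.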
254. ired range. The voltage of 64 regulators was set at 10 setting stations (setstat); between 4 and 8 regulators were set at each station. The regulators were each tested at four testing stations (teststat). The ASReml input file is presented below.

  Voltage data
   teststat 4   # 4 testing stations tested each regulator
   setstat !A   # 10 setting stations each set 4-8 regulators
   regulatr 8   # regulators numbered within setting stations
   voltage
  voltage.asd !skip 1
  voltage ~ mu !r setstat setstat.regulatr teststat setstat.teststat
  0 0 0

The factor regulatr numbers the regulators within each setting station. Thus the term setstat.regulatr allows for differential effects of each regulator, while the other terms examine the effects of the setting and testing stations and possible interaction. The abbreviated output is given below.

  LogL 188.604   S2 0.67074E-01   255 df
  LogL 199.530   S2 0.59303E-01   255 df
  LogL 203.007   S2 0.52814E-01   255 df
  LogL 203.240   S2 0.51278E-01   255 df
  LogL 203.242   S2 0.51141E-01   255 df
  LogL 203.242   S2 0.51140E-01   255 df

  Source             Model  terms  Gamma          Component      Comp/SE  % C
  setstat               10     10  0.233418       0.119371E-01     1.35   0 P
  setstat.regulatr      80     64  0.601817       0.307771E-01     3.64   0 P
  teststat               4      4  0.642752E-01   0.328706E-02     0.98   0 P
  setstat.teststat      40     40  0.100000E-08   0.511404E-10     0.00   0 B
  Variance             256    255  1.00000        0.511404E-01     9.72   0 P

  Warning: Code B - fixed at a boundary (!GP)   F - fixed by user   liable to change from P to B   P
255. is fully formed for the terms in the dense set. The inverse coefficient matrix is only partially formed for terms in the sparse set. Typically the sparse set is large, and sparse storage results in savings in memory and computing. A consequence is that the variance matrix for estimates is only available for equations in the dense portion.

Ordering of terms in ASReml

The order in which estimates for the fixed and random effects in the linear mixed model are reported will usually differ from the order in which the model terms are specified. Solutions to the mixed model equations are obtained using the methods outlined in Gilmour et al. (1995). ASReml orders the equations in the sparse part to maintain as much sparsity as it can during the solution. After absorbing them, it absorbs the model terms associated with the dense equations in the order specified.

Aliasing and singularities

A singularity is reported in ASReml when the diagonal element of the mixed model equations is effectively zero (see the !TOLERANCE qualifier) during absorption. It indicates there is either
• no data for that fixed effect, or
• a linear dependence in the design matrix, meaning there is no information left to estimate the effect.

ASReml handles singularities by using a generalized inverse in which the singular row/column is zero and the associated fixed effect is zero. Which equations are singular depends on the order in which the equations are processed. This is controlled by ASReml for the
256. is w 1

• initial values for US, CHOLk, CHOLkC and ANTEk structures are given in the form of a US matrix, which is specified lower triangle row-wise, viz.

$$\begin{array}{lll} \sigma_{11} & & \\ \sigma_{21} & \sigma_{22} & \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{array}$$

that is, initial values are given in the order $\sigma_{11}, \sigma_{21}, \sigma_{22}, \sigma_{31}, \ldots$,
• the US model is associated with several special features of ASReml. When used in the R structure for multivariate data, ASReml automatically recognises patterns of missing values in the responses (see Chapter 8). Also, there is an option to update its values by EM rather than AI when its AI updates make the matrix non-positive definite.

The Matérn class of isotropic covariance models is now described. ASReml uses an extended Matérn class which accommodates geometric anisotropy and a choice of metrics for random fields observed in two dimensions. This extension, described in detail in Haskard (2006), is given by

$$\rho(h; \phi) = \rho_M(d(h; \delta, \alpha, \lambda); \phi, \nu)$$

where $h = (h_x, h_y)'$ is the spatial separation vector, $(\delta, \alpha)$ governs geometric anisotropy, $(\lambda)$ specifies the choice of metric and $(\phi, \nu)$ are the parameters of the Matérn correlation function. The function is

$$\rho_M(d; \phi, \nu) = \{2^{\nu - 1} \Gamma(\nu)\}^{-1} \left(\frac{d}{\phi}\right)^{\nu} K_\nu\!\left(\frac{d}{\phi}\right) \qquad (7.1)$$

where $\phi > 0$ is a range parameter, $\nu > 0$ is a smoothness parameter, $\Gamma(\cdot)$ is the gamma function, $K_\nu(\cdot)$ is the modified Bessel function of the third kind of order $\nu$ (Abramowitz and Stegun, 1965, section 9.6) and d is the distance defined in terms of X and Y
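The lower-triangle row-wise ordering of US initial values described above is easy to get wrong by hand. The following is a minimal sketch (the helper name is hypothetical, not an ASReml facility) that expands such a list into the full symmetric matrix, so the ordering can be checked before the values are typed into a command file.

```python
import math


def us_matrix(values):
    """Expand lower-triangle row-wise values into a full symmetric matrix.

    The list is read in the order s11, s21, s22, s31, s32, s33, ...
    """
    # Solve n(n + 1)/2 = len(values) for the matrix order n.
    n = (math.isqrt(8 * len(values) + 1) - 1) // 2
    if n * (n + 1) // 2 != len(values):
        raise ValueError("number of values is not a triangular number")
    matrix = [[0.0] * n for _ in range(n)]
    it = iter(values)
    for i in range(n):
        for j in range(i + 1):
            matrix[i][j] = matrix[j][i] = next(it)
    return matrix
```

For example, the six values 1, 2, 3, 4, 5, 6 give a 3 x 3 matrix with diagonal 1, 3, 6 and off-diagonal elements 2 (rows 2 and 1), 4 (rows 3 and 1) and 5 (rows 3 and 2).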
257. it 4 3 heritability

tion formed by an F line is added to the list of components. Thus, the number of coefficients increases by one each line. We seek to calculate $k + c'v$, $\mathrm{cov}(c'v, v)$ and $\mathrm{var}(c'v)$, where $v$ is the vector of existing variance components, $c$ is the vector of coefficients for the linear function and $k$ is an optional offset which is usually omitted but would be 1 to represent the residual variance in a probit analysis and 3.289 to represent the residual variance in a logit analysis. The general form of the directive is

  F label a + b*cb + c*cc + d + m*k

where a, b, c and d are subscripts to existing components $v_a$, $v_b$, $v_c$ and $v_d$, and cb is a multiplier for $v_b$. m is a number bigger than the current length of $v$, to flag the special case of adding the offset k. Where matrices are to be combined, the form

  F label a:b * k + c:d

can be used, as in the Coopworth data example (see page 300). Assuming that the .pin file in the ASReml code box corresponds to a simple sire model, and that variance component 1 is the sire variance and variance component 2 is the residual variance, then

  F phenvar 1 + 2

gives a third component which is the sum of the variance components, that is, the phenotypic variance, and

  F genvar 1 * 4

gives a fourth component which is the sire variance component multiplied by 4, that is, the genotypic variance.

Heritability

Heritabilities are requested by lines in the
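The way each linear function extends the list of components and their covariance matrix can be sketched as follows. This is only an illustration of the quantities named above (k + c'v, cov(c'v, v) and var(c'v)); the function name, the component values and the covariance matrix are hypothetical, not ASReml output.

```python
def add_linear_function(v, V, coeffs, k=0.0):
    """Append k + c'v to the component list and extend its covariance matrix.

    v      : list of existing components
    V      : covariance matrix of v (list of rows)
    coeffs : {1-based component number: multiplier}, e.g. {1: 1, 2: 1}
    k      : optional offset (a constant, so it adds no variance)
    """
    c = [coeffs.get(i + 1, 0.0) for i in range(len(v))]
    new_value = k + sum(ci * vi for ci, vi in zip(c, v))
    # cov(c'v, v_i) is the i-th element of V c; var(c'v) = c' V c.
    cov = [sum(cj * V[i][j] for j, cj in enumerate(c)) for i in range(len(v))]
    var = sum(ci * x for ci, x in zip(c, cov))
    new_V = [row + [cv] for row, cv in zip(V, cov)] + [cov + [var]]
    return v + [new_value], new_V


# Hypothetical sire model: component 1 = sire variance, component 2 = residual.
v = [2.0, 10.0]
V = [[1.0, 0.5], [0.5, 4.0]]
v, V = add_linear_function(v, V, {1: 1, 2: 1})   # like "F phenvar 1 + 2"
v, V = add_linear_function(v, V, {1: 4})         # like "F genvar 1 * 4"
```

After the two calls, v holds four components (sire, residual, phenotypic and genotypic variance), matching the statement that the number of components increases by one with each F line.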
258. itional variation explained when the term is added to a model consisting of the I and C/c terms. Any c terms are ignored in calculating DenDF for F-con using numerical derivatives, for computational reasons. The terms are ignored for both F-inc and F-con tests.

Consider now a nested model which might be represented symbolically by

  y ~ 1 REGION REGION.SITE

For this model, the incremental and conditional Wald tests will be the same. However, it is not uncommon for this model to be presented to ASReml as

  y ~ 1 REGION SITE

with SITE identified across REGION rather than within REGION. Then the nested structure is hidden, but ASReml will still detect the structure and produce a valid conditional Wald F statistic. This situation will be flagged in the M code field by changing the letter to lower case. Thus, in the nested model, the three M codes would be ., A and B, because REGION.SITE is obviously an interaction dependent on REGION. In the second model, REGION and SITE appear to be independent factors, so the initial M codes are ., A and A. However, they are not independent, because REGION removes additional degrees of freedom from SITE, so the M codes are changed from ., A and A to ., a and A.

We strongly recommend, if you are in any doubt about the maximal conditional model (MCM) for the conditional Wald F statistic, that you consult the .aov file, which spells out the maximal conditional model for each term. We also advise users
259. kspace requirements depend on problem size and may be quite large. An initial workspace allocation may be requested on the command line with the -S or -W options; if neither is specified, 32 Mbyte (4 million double precision words) is allocated.

-Wm (!WORKSPACE m) sets the initial size of the workspace in Mbytes. For example, -W1600 requests 1600 Mbytes of workspace, the maximum typically available under Windows. -W2000 is the maximum available on 32bit Unix (Linux) systems. On 64bit systems, the argument, if less than 32, is taken as Gbyte.

Alternatively, -Ss can be used to set the initial workspace allocation, where s is a digit. The workspace allocated is $2^s \times 8$ Mbyte: -S3 is 64Mb, -S4 is 128Mb, -S5 is 256Mb, -S6 is 512Mb, -S7 is 1024Mb, -S8 is 2048Mb, -S9 is 4096Mb. This option was in Release 1.0; the more flexible option -Wm has been introduced in Release 2.0. The -W option is ignored if the -S option is also specified.

Otherwise, additional workspace may be requested with the -Ss or -Wm options, or the !WORKSPACE m qualifier on the top job control line if not specified on the control line. If your system cannot provide the requested workspace, the request will be diminished until it can be satisfied. On multi-user systems, do not unnecessarily request the maximum or other users may complain. Having started with an initial allocation, if ASReml realises more space is required as it is running, it will attempt to restart the jo
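The workspace rules described above amount to a small piece of arithmetic, sketched below. The function is hypothetical and only mirrors the stated rules (S takes precedence over W, S allocates 2^s x 8 Mbyte, the default is 32 Mbyte); treating 1 Gbyte as 1024 Mbyte is an assumption.

```python
def initial_workspace_mbytes(s=None, w=None, is_64bit=False):
    """Initial workspace in Mbyte implied by the S and W command line options."""
    if s is not None:
        # The W option is ignored if the S option is also specified.
        return (2 ** s) * 8
    if w is not None:
        # On 64bit systems an argument below 32 is read as Gbyte
        # (1 Gbyte = 1024 Mbyte is assumed here).
        if is_64bit and w < 32:
            return w * 1024
        return w
    return 32  # default when neither option is given
```

For example, initial_workspace_mbytes(s=3) gives 64 and initial_workspace_mbytes(s=9) gives 4096, matching the table of S values in the text.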
260. l Correlation factors Variances Unstructured variance model Lower triangle row wise 0 003219 0 007424 0 01509 0 02532 0 05840 0 0002709 0 1807 0 06013 0 1387 PATH sire 0 0006433 0 005061 0 03487 Maternal structure covers the 3 model terms at Tr 1 dam 2 IPATH 1 3 0 CORGH GU T ak 2 2 4 14 0 018 IPATH 2 3 0 CORGH GU it att 2 2 4 14 0 018 PATH 3 3 0 US GU wal ott 2 2 4 14 0 018 PATH dam at Tr 1 dam at Tr 2 dam at Tr 3 dam Maternal effects Equivalent to Unstructured Litter structure covers the 4 model terms at Tr 1 lit at Tr 2 1lit ae Tr 3 1k St Te 4 it ae Tr 2 lit 2 PATH 1 4 0 DIAG Litter effects Diagonal structure 3 74 0 97 0 019 0 941 PATH 2 4 0 FA1 GP Factor Analytic 1 5 2S 0d 2 4 95 4 63 0 037 0 941 PATH 3 4 0 US Unstructured B OTs 3 545 3 914 0 1274 0 08909 0 02865 0 07277 0 05090 0 001829 1 019 PATH lit

Table 15.15: Variance models fitted for each part of the ASReml job in the analysis of the genetic example

  term     matrix       PATH 1   PATH 2   PATH 3
  sire     $\Sigma_s$   DIAG     FA1      US
  dam      $\Sigma_d$   CORGH    CORGH    US
  litter   $\Sigma_l$   DIAG     FA1      US
  error    $\Sigma_e$   US       US       US

In PATH 1, the error variance model is taken to be unstructured, but the starting values are set to zero. This instructs ASReml to obtain starting values from the sample covariance matrix of the data. For incomplete data, the matrix so obta
261. l functions of the variance components,
• an analysis of variance (ANOVA) table (Section 6.12). The table contains the numerator degrees of freedom for the terms and incremental F-statistics for approximate testing of effects. It may also contain denominator degrees of freedom, a conditional F-statistic and a significance probability,
• estimated effects, their standard errors and t values for equations in the DENSE portion of the SSP matrix are reported if !BRIEF -1 is invoked; the T-prev column tests difference between successive coefficients in the same factor.

The following is a copy of nin89a.asr.

  ASReml 1 630 01 Jun 2005
  Build j 01 Jul 2005 128
  14 Jul 2005 12:41:18.360
  Licensed to Arthur Gilmour
  SYNTAX change: A/B now means A A.B
  Contact support@asreml.co.uk for licensing and support
  32.00 Mbyte Windows
  NIN alliance trial 1989 nin89a
  Folder: C:\data\asr\UG2\manex
  variety !A
  QUALIFIERS: !SKIP 1 !DISPLAY 15
  Reading nin89aug.asd FREE FORMAT skipping 1 lines
  Univariate analysis of yield
  Using 242 records of 242 read
  Model term: variety id pid raw repl nloc yield lat long row column
  Size miss zero 56 0 0 0 0 18 0 18 0 4 0 0 0 0 Variate 18 0 0 0 0 0 22 0 0 11 0
  MinNon0 1 1 000 1101 21 00 1 4 000 1 050 4 300 1 200 1 1
  Mean 26 4545 26 45 2628
262. l prediction for trees with approximate confidence intervals (2 x standard error of prediction). Within this figure, the data is adjusted to remove the estimated seasonal effect. The conclusions from this analysis are quite different from those obtained by the nonlinear mixed effects analysis. The individual curves for each tree are not convincingly modelled by a logistic function. Figure 15.16 presents a plot of the residuals from the nonlinear model fitted on p340 of Pinheiro and Bates (2000). The distinct pattern in the residuals, which is the same for all trees, is taken up in our analysis by the season term.

Figure 15.16: Plot of the residuals from the nonlinear model of Pinheiro and Bates (x-axis: age; y-axis: Residual).

15.10 Multivariate animal genetics data - Sheep

The analysis of incomplete or unbalanced multivariate data often presents computational difficulties. These difficulties are exacerbated by either the number of random effects in the linear mixed model, the number of traits, the complexity of the variance models being fitted to the random effects or the size of the problem. In this section we illustrate two approaches to the analysis of a complex set of incomplete multivariate data. Much of the difficulty in conducting such analyses in ASReml centres on obtaining good starting values. Derivative based algorithms su
263. ld associated with the label, unless the focus has been reset by specifying a new target in a preceding transformation,
• the next three forms change the focus for subsequent transformations to the field target,
• in the last two forms, a value is assigned to the target field. For example, !V22=V11 copies existing field 11 into field 22. Such a statement would typically be followed by more transformations. If there are fewer than 22 variables labelled, then V22 is used in the transformation stage but not kept for analysis.

Table 5.1: List of transformation qualifiers and their actions with examples

  qualifier            argument  action                                              examples
  !=, !+, !-, !*, !/   v         usual arithmetic meaning; note that 0/0 gives 0     yield !*10
                                 but v/0 gives a missing value where v is not 0      half !=0.5
                                                                                     zero !=0
  !^                   v         raises the data (which must be positive)            yield
                                 to the power v                                      SQRyld !=yield !^0.5
  !^0                            takes natural logarithms of the data                yield
                                 (which must be positive)                            LNyield !=yield !^0
  !^-1                           takes reciprocal of data (data must be positive)    yield
                                                                                     INVyield !=yield !^-1
  !>, !<, !<>, ...     v         logical operators forming 1 if true, 0 if false     yield
                                                                                     high !=yield !>10
  !ABS                           takes absolute values (no argument required)        yield
                                                                                     ABSyield !=yield !ABS
  !ARCSIN              v         forms an ArcSin transformation using the sample     Germ Total
                                 size specified in the argument, a number or another ASG !=Germ !ARCSIN Total
264. le 7.4,
• additional_initial_values are read from the following lines if there are not enough initial values on the model line. Each variance model has a certain number of parameters. If insufficient non-zero values are found on the model line, ASReml expects to find them on the following line(s),
  - initial values of 0.0 will be ignored if they are on the model line but are accepted on subsequent lines,
  - the notation n*v (for example 5*0.1) is permitted on subsequent lines (but not the model line) when there are n repeats of a particular initial value v,
  - only in a few specified cases is 0 permitted as an initial value of a non-zero parameter.

G structure header and definition lines

There are g sets of G structure definition lines and each set is of the form

  model_term d
  order key model initial_values qualifier
  additional_initial_values
  order key model initial_values qualifier
  additional_initial_values
  order key model initial_values qualifier
  additional_initial_values

  NIN Alliance Trial 1989
   variety !A
   id
   k ae 22
   column 11
  nin89aug.asd !skip 1
  yield ~ mu variety !r repl !f mv
  9 4
  22 row AR1 0.3
  11 column AR1 0.3
  repl 1
  repl 0 IDV 0.1

• model_term is the term from the linear model to which the variance structure applies; the variance structure may cover additional terms in the linear model, see Section 7.8
265. le harvey.as

9.2 The command file

In ASReml the !P data field qualifier indicates that the corresponding data field has an associated pedigree. The file containing the pedigree (harvey.ped in the example) for animal is specified after all field definitions and before the datafile definition. See below for the first 20 lines of harvey.ped together with the corresponding lines of the data file harvey.dat. All individuals appearing in the data file must appear in the pedigree file. When all the pedigree information (individual, male parent, female parent) appears as the first three fields of the data file, the data file can double as the pedigree file. In this example the line harvey.ped !ALPHA could be replaced with harvey.dat !ALPHA. Typically additional individuals providing additional genetic links are present in the pedigree file.

  Pedigree file example
   animal !P
   i I
   ep ji
   iisas 3
   damage
   adailygain
  harvey.ped !ALPHA
  harvey.dat
  adailygain ~ mu lines !r animal 0.25

9.3 The pedigree file

The pedigree file is used to define the genetic relationships for fitting a genetic animal model and is required if the !P qualifier is associated with a data field. The pedigree file
• has three fields, the identities of an individual, its sire and its dam (or maternal grand sire if the !MGS qualifier (Table 9.1) is specified), in that order,
• is sorted so that the l
266. le such as ASReml-W and ConText described in Section 1.3. 3.2 Nebraska Intrastate Nursery (NIN) field experiment. The yield data from an advanced Nebraska Intrastate Nursery (NIN) breeding trial conducted at Alliance in 1988/89 will be used for demonstration; see Stroup et al. (1994) for details. Four replicates of 19 released cultivars, 35 experimental wheat lines and 2 additional triticale lines were laid out in a 22 row by 11 column rectangular array of plots; the varieties were allocated to the plots using a randomised complete block (RCB) design. (3 A guided tour, page 28) In field trials, complete replicates are typically allocated to consecutive groups of whole columns or rows. In this trial the replicates were not allocated to groups of whole columns, but rather overlapped columns. Table 3.1 gives the allocation of varieties to plots in field plan order with replicates 1 and 3 in ITALICS and replicates 2 and 4 in BOLD. 3.3 The ASReml data file (see Chapter 4 for details). The standard format of an ASReml data file is to have the data arranged in space, TAB or comma separated columns/fields with a line for each sampling unit. The columns contain covariates, factors, response variates (traits) and weight variables in any convenient order. This is the first 30 lines of the file nin89.asd containing the data for the NIN variety trial. The data are in field order (rows within columns) and an optional heading (first line of the file) has been
267. levels of nitrogen application (0, 0.2, 0.4 and 0.6 cwt/acre). The field layout consisted of six blocks (labelled I, II, III, IV, V and VI) with three whole-plots per block, each split into four sub-plots. The three varieties were randomly allocated to the three whole-plots while the four levels of nitrogen application were randomly assigned to the four sub-plots within each whole-plot. The data is presented in Table 15.1.
    Table 15.1 A split-plot field trial of oat varieties and nitrogen application
                            nitrogen
    block  variety     0.0cwt  0.2cwt  0.4cwt  0.6cwt
    I      Victory        111     130     157     174
           GoldenRain     117     114     161     141
           Marvellous     105     140     118     156
    II     Victory         61      91      97     100
           GoldenRain      70     108     126     149
           Marvellous      96     124     121     144
    III    Victory         68      64     112      86
           GoldenRain      60     102      89      96
           Marvellous      89     129     132     124
    IV     Victory         74      89      81     122
           GoldenRain      64     103     132     133
           Marvellous      70      89     104     117
    V      Victory         62      90     100     116
           GoldenRain      80      82      94     126
           Marvellous      63      70     109      99
    VI     Victory         53      74     118     113
           GoldenRain      89      82      86     104
           Marvellous      97      99     119     121
A standard analysis of these data recognises the two basic elements inherent in the experiment. These two aspects are firstly the stratification of the experiment units, that is the blocks, whole-plots and sub-plots, and secondly the treatment structure that is superimposed on the experimental material. (15 Examples, page 243) The latter is of prime interest, in the presence of stratification. Thus the aim
268. lfinger and O'Connell, 1993) and joint maximisation (Harville and Mee, 1984; Gilmour et al., 1985). It is implemented in many statistical packages, for instance in the GLMM procedure (Welham, 2005) and the IRREML procedure of Genstat (Keen, 1994), in MLwiN (Goldstein et al., 1998), in the GLMMIXED macro in SAS and in the GLMMPQL function in R, to name a few. The PQL technique is based on a first order Taylor series approximation to the likelihood. It has been shown to perform poorly for certain types of GLMMs. In particular, for binary GLMMs where the number of random effects is large compared to the number of observations, it can underestimate the variance components severely (50%; e.g. Breslow and Lin, 1995; Goldstein and Rasbash, 1996; Rodriguez and Goldman, 2001; Waddington et al., 1994). For other types of GLMMs, such as Poisson data with many observations per random effect, it has been reported to perform quite well (e.g. Breslow, 2003). As well as the above references, users can consult McCulloch and Searle (2001) for more information about GLMMs. Most studies investigating PQL have focussed on estimation bias. Much less attention has been given to the wider inferential issues such as hypothesis testing. In addition, the performance of this technique has only been assessed on a small set of relatively simple GLMMs. Anecdotal evidence from users suggests that this technique can give very misleading results in certain situations. Therefore
269. lihood changes between iterations were less than 0.002 * iteration number and variance parameter values appear stable. A full iteration has not been completed: see discussion of !BLUP. See discussion of ABORTASR.NOW. The change in REML log-likelihood was small and convergence was assumed, but the parameters are in fact still changing. The maximum number of iterations was reached before the REML log-likelihood converged: examine the sequence of estimates in the .res file. You may need more iterations, in which case restart with the CONTINUE command line option (see Section 12.3 on job control). Otherwise restart with more appropriate initial variances. It may be necessary to simplify the model and estimate the dominant components before estimating other terms. Parameter values are not at the REML solution. Parameters appear to be at the REML solution in that the parameter values are stable. (14 Error messages, page 229)
    Table 14.2 List of warning messages and likely meaning(s)
    warning message (the likely meaning column is not part of this fragment)
    Warning e missing values generated by transformation
    Warning 7 singularities in AI matrix
    Warning m variance structures were modified
    Warning n missing values were detected in the design
    Warning n negative weights
    Warning r records were read from multiple lines
    WARNING term has more levels than expected
    Warning term in the predict !IGNORE list
270. lines
    - the arguments to qualifiers are represented by the following symbols:
        f      a filename,
        n      an integer number, typically a count,
        p      a vector of real numbers, typically in increasing order,
        r      a real number,
        s      a character string,
        t      a model term label,
        v      the number or label of a data variable,
        vlist  a list of variable labels.
(5 Command file: Reading the data, page 60) 5.7 Data file qualifiers. Table 5.2 lists the qualifiers relating to data input. Use the Index to check for examples or further discussion of these qualifiers.
    Table 5.2 Qualifiers relating to data input and output
    qualifier    action
    Frequently used data file qualifier
    !SKIP n      causes the first n records of the (non-binary) data file to be ignored. Typically these lines contain column headings for the data fields.
    Other data file qualifiers
    !CSV         used to make consecutive commas imply a missing value; this is automatically set if the file name ends with .csv or .CSV (see Section 4.2).
    !FILTER v    enables a subset of the data to be analysed; v is the number or name of a data field. When reading data, the value in field v is checked after any transformations are performed. If !SELECT is omitted, records with zero in field v are omitted from the analysis. Otherwise, records with n in field v are retained and all other records are omitted. Warning: If the filter column contains a missing value, the
    !FORMAT s
271. ly included
    variety        Predicted_Value  Standard_Error  Ecode
    Marvellous        109.7917         7.7975        E
    Victory            97.6250         7.7975        E
    Golden_rain       104.5000         7.7975        E
    SED: Overall Standard Error of Difference 7.079
    ---- 3 ----
    Predicted values of yield
    blocks is ignored in the prediction except where specifically included
    wplots is ignored in the prediction except where specifically included
    nitrogen  variety      Predicted_Value  Standard_Error  Ecode
    0.6_cwt   Marvellous      126.8333         9.1070        E
    0.6_cwt   Victory         118.5000         9.1070        E
    0.6_cwt   Golden_rain     124.8333         9.1070        E
    0.4_cwt   Marvellous      117.1667         9.1070        E
    0.4_cwt   Victory         110.8333         9.1070        E
    0.4_cwt   Golden_rain     114.6667         9.1070        E
    0.2_cwt   Marvellous      108.5000         9.1070        E
    0.2_cwt   Victory          89.6667         9.1070        E
    0.2_cwt   Golden_rain      98.5000         9.1070        E
    0_cwt     Marvellous       86.6667         9.1070        E
    0_cwt     Victory          71.5000         9.1070        E
    0_cwt     Golden_rain      80.0000         9.1070        E
    Predicted values with SED(PV)
    126.833
    118.500  9.71503
    124.833  9.71503 9.71503
    117.167  7.68295 9.71503 9.71503
    110.833  9.71503 7.68295 9.71503 9.71503
    114.667  9.71503 9.71503 7.68295 9.71503 9.71503
    108.500  7.68295 9.71503 9.71503 7.68295 9.71503 9.71503
    89.6667  9.71503 7.68295 9.71503 9.71503 7.68295 9.71503 9.71503
    98.5000  9.71503 9.71503 7.68295 9.71503 9.71503 7.68295 9.71503 9.71503
    86.6667  7.68295 9.71503 9.71503 7.68295 9.71503 9.71503 7.68295 9.71503 9.71503
    71.5000  9.71503 7.68295 9.71503 9.71503 7.6829
272. m a file which contains trailing non-data records (for example, extracting the predicted values from a .pvs file). The argument n specifies the number of data records to be read. If not supplied, ASReml reads until a data reading error occurs, and then processes the data it has. Without this qualifier, ASReml aborts the job when it encounters a data error. See !RSKIP. !RSKIP allows ASReml to skip lines at the heading of a file down to (and including) the nth instance of string s. For example, to read back the third set of predicted values in a .pvs file, you would specify !RREC !RSKIP 4 ' Ecode' since the line containing the 4th instance of ' Ecode' immediately precedes the predicted values. The !RREC qualifier means that ASReml will read until the end of the predict table. The keyword Ecode, which occurs once at the beginning and then immediately before each block of data in the .pvs file, is used to count the sections. Combining rows from separate files. ASReml can read data from multiple files provided the files have the same layout. The file specified as the primary data file in the command file can contain lines of the form
    !INCLUDE <filename> !SKIP n
where <filename> is the (path) name of the data subfile and !SKIP n is an optional qualifier indicating that the first n lines of the subfile are to be skipped. After reading each subfile, input reverts to the pr
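The skip-then-read behaviour described for a .pvs file can be sketched in Python. This is not ASReml's implementation: the function name, the marker handling and the assumed row layout (label, predicted value, standard error) are illustrative assumptions.

```python
def read_after_nth(lines, marker, n):
    """Skip down to and including the nth line containing marker, then
    collect rows until one fails to parse as data.

    Assumed row layout: label, predicted value, standard error, ...
    """
    seen = 0
    start = None
    for i, line in enumerate(lines):
        if marker in line:
            seen += 1
            if seen == n:
                start = i + 1
                break
    if start is None:
        return []
    rows = []
    for line in lines[start:]:
        fields = line.split()
        try:
            rows.append((fields[0], float(fields[1]), float(fields[2])))
        except (IndexError, ValueError):
            break  # first non-data record ends the table
    return rows


pvs_like = [
    "Ecode is E for Estimable",
    "nitrogen Predicted_Value Standard_Error Ecode",
    "0.6_cwt 123.3889 7.1747 E",
    "0.4_cwt 114.2222 7.1747 E",
    "SED: Overall Standard Error of Difference 4.436",
]
print(read_after_nth(pvs_like, "Ecode", 2))
```

Counting marker lines rather than line numbers keeps the reader robust when the header of the output file changes length between runs.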
273. m is in the search path then path is not required and the word ASReml will suffice; for example
    ASReml nin89.as
will run the NIN analysis. If asreml.exe (ASReml) is not in the search path then path is required; for example, if asreml.exe is in the usual place then
    "c:\Program Files\ASReml2\bin\Asreml" nin89.as
will run nin89.as.
    - ASReml invokes the ASReml program,
    - basename is the name of the .as[c] command file.
The basic command line can be extended with options and arguments to
    path ASReml -options basename[.as[c]] arguments
(12 Command file: Running the job, page 178)
    - options is a string preceded by a - (minus) sign. Its components control several operations (batch, graphic, workspace) at run time; for example, the command line ASReml -w128 rat.as tells ASReml to run the job rat.as with workspace allocation of 128mb.
    - arguments provide a mechanism (mostly for advanced users) to modify a job at run time; for example, the command line ASReml rat.as alpha beta tells ASReml to process the job in rat.as as if it read alpha wherever $1 appears in the file rat.as, beta wherever $2 appears and 0 wherever $3 appears (see below).
Processing a .pin file. If the filename argument is a .pin file (see Chapter 11), then ASReml processes it. If the pinfile basename differs from the basename of the output files it is processing, then the basename of the output files must be specified with the P option letter. Thus
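The run-time argument mechanism can be pictured as a text substitution over the command file. The following Python sketch assumes the placeholders are written $1, $2, ... and that an unsupplied argument is read as 0, as in the rat.as example; the function name is hypothetical and only single-digit placeholders are handled.

```python
import re


def substitute_arguments(text, arguments):
    """Replace $1..$9 with the supplied arguments; unsupplied ones become '0'."""
    def replace(match):
        index = int(match.group(1)) - 1
        return arguments[index] if index < len(arguments) else "0"
    return re.sub(r"\$([1-9])", replace, text)


# Two arguments supplied, so $3 falls back to 0
print(substitute_arguments("fit $1 with $2 and $3", ["alpha", "beta"]))
```

The sketch only mimics the effect described in the text; how ASReml itself tokenises the command file is not covered here.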
274. manex
    variety !A
    QUALIFIERS: !SKIP 1 !DISPLAY 15
    QUALIFIER: !DOPART 1 is active
    Reading nin89aug.asd FREE FORMAT skipping 1 lines
    Univariate analysis of yield
    Using 242 records of 242 read                                (margin: records read)
       Model term       Size  miss  zero  MinNon0    Mean     MaxNon0   (margin: data summary)
     1 variety            56     0     0     1      26.4545     56
     2 id                        0     0  1.000     26.45     56.00
     3 pid                      18     0  1101.     2628.     4156.
     4 raw                      18     0  21.00     510.5     840.0
     5 repl                4     0     0     1       2.4132      4
     6 nloc                      0     0  4.000     4.000     4.000
     7 yield     Variate        18     0  1.050     25.53     42.00
     8 lat                       0     0  4.300     25.80     47.30
     9 long                      0     0  1.200     13.80     26.40
    10 row                22     0     0     1      11.5000     22
    11 column             11     0     0     1       6.0000     11
    12 mu                  1
    13 mv_estimates       18
    22 AR=AutoReg 0.5000
    11 AR=AutoReg 0.5000
    Forming 75 equations: 57 dense
    Initial updates will be shrunk by factor 0.316
    NOTICE: 1 singularities detected in design matrix.
    1 LogL=-401.827  S2= 42.467  168 df  1.000  0.5000  0.5000
    2 LogL=-400.780  S2= 43.301  168 df  1.000  0.5388  0.4876
    3 LogL=-399.807  S2= 45.066  168 df  1.000  0.5895  0.4698
    4 LogL=-399.353  S2= 47.745  168 df  1.000  0.6395  0.4489
    5 LogL=-399.326  S2= 48.466  168 df  1.000  0.6514  0.4409
    6 LogL=-399.324  S2= 48.649  168 df  1.000  0.6544  0.4384
    7 LogL=-399.324  S2= 48.696  168 df  1.000  0.6552  0.4377
    8 LogL=-399.324  S2= 48.708  168 df  1.000  0.6554  0.4375   (margin: check convergence)
    Final parameter values  1.0000  0.65550  0.43748
    Source       Model  terms  Gamma     Component  Comp/SE  % C     (margin: parameter estimates)
    Variance       242    168  1.00000   48.7085      6.81   0 P
    Residual AR=AutoR      22  0.655505  0.655505   11
275. may be appropriate when the dominant spatial processes are aligned with rows/columns as occurs in field experiments. Geometric anisotropy is discussed in most geostatistical books (Webster and Oliver, 2001; Diggle et al., 2003) but rarely are the anisotropy angle or ratio estimated from the data. Similarly the smoothness parameter $\nu$ is often set a priori (Kammann and Wand, 2003; Diggle et al., 2003). However Stein (1999) and Haskard (2006) demonstrate that $\nu$ can be reliably estimated even for modest sized data sets, subject to caveats regarding the sampling design. The syntax for the Matérn class in ASReml is given by MATk where k is the number of parameters to be specified; the remaining parameters take their default values. Use the !G qualifier to control whether a specified parameter is estimated or fixed. The order of the parameters in ASReml, with their defaults, is ($\phi$, $\nu = 0.5$, $\delta = 1$, $\alpha = 0$, $\lambda = 2$). For example, if we wish to fit a Matérn model with only $\phi$ estimated and the other parameters set at their defaults then we use MAT1. MAT2 allows $\nu$ to be estimated or fixed at some other value (for example MAT2 .2 1 !GPF). The parameters $\phi$ and $\nu$ are highly correlated so it may be better to manually cover a grid of $\nu$ values. We note that there is non-uniqueness in the anisotropy parameters of this metric $d$ since inverting $\delta$ and adding $\pi/2$ to $\alpha$ gives the same distance. This non-uniqueness can be removed by considering $0 < \alpha < \pi/2$ an
276. minator df in unbalanced designs, such as the rat data set described in the next section. Tables of predicted means are presented for the nitrogen, variety, and variety by nitrogen tables in the .pvs file. The qualifier !SED has been used on the third predict statement and so the matrix of SEDs for the variety by nitrogen table is printed. For the first two predictions, the average SED is calculated from the average variance of differences. Note also that the order of the predictions (e.g. 0.6_cwt, 0.4_cwt, 0.2_cwt, 0_cwt for nitrogen) is simply the order those treatment labels were discovered in the data file.
    Split plot analysis - oat Variety.Nitrogen  29 Jul 2005 19:28:02
    Ecode is E for Estimable, * for Not Estimable
    ---- 1 ----
    Predicted values of yield
    variety is averaged over fixed levels
    blocks is ignored in the prediction except where specifically included
    wplots is ignored in the prediction except where specifically included
    nitrogen   Predicted_Value  Standard_Error  Ecode
    0.6_cwt       123.3889         7.1747        E
    0.4_cwt       114.2222         7.1747        E
    0.2_cwt        98.8889         7.1747        E
    0_cwt          79.3889         7.1747        E
    SED: Overall Standard Error of Difference 4.436
    ---- 2 ----
    Predicted values of yield
    nitrogen is averaged over fixed levels
    blocks is ignored in the prediction except where specifically included
    wplots is ignored in the prediction except where specifical
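The statement that the average SED is calculated from the average variance of differences can be read as "square the pairwise SEDs, average them, take the square root". The Python sketch below works under that reading; the function name is hypothetical and the exact weighting ASReml uses is not given in this fragment.

```python
import math


def average_sed(pairwise_seds):
    """Overall SED as the square root of the mean variance of differences.

    Assumption: each pairwise difference contributes equally to the average.
    """
    variances = [sed ** 2 for sed in pairwise_seds]
    return math.sqrt(sum(variances) / len(variances))


# When every pairwise SED is equal, the overall SED is that common value
print(average_sed([4.436, 4.436, 4.436]))
```

Averaging on the variance scale means the overall SED is never smaller than the plain mean of the pairwise SEDs.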
277. mmdd into days. !Jyyd converts a date in the form ccyyddd or yyddd into days. These calculate the number of days since December 31 1900 and are valid for dates from January 1 1900 to December 31 2099; note that if cc is omitted it is taken as 19 if yy > 32 and 20 if yy < 33; the date must be entirely numeric: characters such as / may not be present (but see !DATE). The remaining table entries in this fragment are:
    !M v              converts data values of v to missing; if !M is used after !A or !I, v should refer to the encoded factor level rather than the value in the data file (see also Section 4.2).
    !MAX, !MIN, !MOD  the maximum, minimum and modulus of the field values and the value v.
    !MM s             assigns Haldane map positions (s) to marker variables and imputes missing values to the markers (see below).
    !NA v             replaces any missing values in the variate with the value v.
    !NORMAL v         replaces the variate with normal random variables having variance v.
    !REPLACE o n      replaces data values o with n in the current variable, i.e. IF (DataValue.EQ.o) DataValue = n.
    !RESCALE o s      rescales the column(s) in the current variable (!G group of variables) using Y = (Y + o) * s.
Examples column, in the order extracted: ChrAdom !DOM ChrAadd; Rate !EXP; yield !M 9; yield !M <0 !M >100; yield !MAX 9; ChrAadd !G 10 !MM 1; Rate !NA 0; Ndat 0 !Normal 4.5 is equivalent to Ndat !Normal 4.5; Rate !REPLACE 9 0; Rate !RESCALE 10 0.1. List of transformation qualifiers and their actions with examples. qualifier a
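The day-of-year date rule (days since December 31 1900, with the century inferred from yy when cc is omitted) can be sketched in Python. The function name `jyyd_to_days` is hypothetical, and the sketch covers only the ccyyddd and yyddd forms described in the text.

```python
from datetime import date, timedelta

BASE = date(1900, 12, 31)  # day zero, as stated in the text


def jyyd_to_days(text):
    """Convert 'ccyyddd' or 'yyddd' to days since 31 December 1900.

    If cc is omitted it is taken as 19 when yy > 32 and as 20 when yy < 33.
    """
    if not text.isdigit() or len(text) not in (5, 7):
        raise ValueError("date must be 5 or 7 numeric characters")
    day_of_year = int(text[-3:])
    yy = int(text[-5:-3])
    if len(text) == 7:
        year = int(text[:2]) * 100 + yy
    else:
        year = (1900 if yy > 32 else 2000) + yy
    return (date(year, 1, 1) + timedelta(days=day_of_year - 1) - BASE).days


print(jyyd_to_days("1901001"))  # 1 January 1901
print(jyyd_to_days("33001"))    # read as 1 January 1933
print(jyyd_to_days("32001"))    # read as 1 January 2032
```

Rejecting non-numeric input mirrors the requirement that separator characters are not allowed in these date forms.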
278. model. The next model includes a measurement error or nugget effect component. That is, the variance model for the plot errors is now given by
    $\sigma^2 \Sigma = \sigma^2 (\Sigma_c \otimes \Sigma_r + \gamma I_{150})$    (15.6)
where $\gamma$ is the ratio of nugget variance to error variance. The abbreviated output for this model is given below. There is a significant improvement in the REML log-likelihood with the inclusion of the nugget effect (see Table 15.7).
    [Figure 15.5: Sample variogram of the residuals from the AR1xAR1 model for the Slate Hall data; axes: outer displacement, inner displacement.]
    # AR1 x AR1
    1 LogL=-739.681  S2= 36034.  125 df  1.000  0.1000  0.1000
    2 LogL=-714.340  S2= 28109.  125 df
    3 LogL=-703.338  S2= 29914.  125 df
    4 LogL=-700.371  S2= 37464.  125 df
    5 LogL=-700.324  S2= 38602.  125 df
    6 LogL=-700.322  S2= 38735.  125 df
    7 LogL=-700.322  S2= 38754.  125 df
    8 LogL=-700.322  S2= 38757.  125 df
    Final parameter values
    Source       Model  terms  Gamma
    Variance       150    125  1.00000
    Residual AR=AutoR      15  0.683767
    Residual AR=AutoR      10  0.458607
    Analysis of Variance  NumDF  DenDF
    mu                        1
    variety                  24     12
    # AR1 x AR1 + units
    1 LogL=-740.735  S2= 33225.  125 df
    2 LogL=-723.595  S2= 11661.  125 df
    3 LogL=-698.498  S2= 46239.  125 df
    4 LogL=-696.847  S2= 44725.  125 df
    5 LogL=-696.823  S2= 45563.  125 df
279. model for which $G_i = \gamma_i I_{q_i}$, and direct product models for correlated random factors given by $G_i = G_{i1} \otimes G_{i2} \otimes G_{i3}$ for three component factors. The vector $u_i$ is therefore assumed to be the vector representation of a 3-way array. For two factors the vector $u_i$ is simply the vec of a matrix with rows and columns indexed by the component factors in the term, where vec of a matrix is a function which stacks the columns of its matrix argument below each other. A range of models are available for the components of both R and G. They include correlation (C) models (that is, where the diagonals are 1), or covariance (V) models and are discussed in detail in Chapter 7. (2 Some theory, page 11) Some correlation models include
    - autoregressive (order 1 or 2)
    - moving average (order 1 or 2)
    - ARMA(1,1)
    - uniform
    - banded
    - general correlation.
Some of the covariance models include
    - diagonal (that is, independent with heterogeneous variances)
    - antedependence
    - unstructured
    - factor analytic.
There is the facility within ASReml to allow for a nonzero covariance between the subvectors of u, for example in random regression models. In this setting the intercept and say the slope for each unit are assumed to be correlated and it is more natural to consider the two component terms as a single term, which gives rise to a single G structure. This concept is discussed later. 2.2 Estimation. Estimation involves two processes that are closely link
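The vec operator and the direct (Kronecker) product mentioned above can be illustrated on small matrices. This is a plain-Python sketch with matrices held as lists of rows; the function names are illustrative and not ASReml functions.

```python
def vec(matrix):
    """Stack the columns of a matrix (list of rows) below each other."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [matrix[r][c] for c in range(n_cols) for r in range(n_rows)]


def kron(a, b):
    """Kronecker (direct) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


# vec stacks column 1 on top of column 2
print(vec([[1, 2], [3, 4]]))

# A scaled identity combined with a 2 x 2 correlation matrix
print(kron([[2, 0], [0, 2]], [[1.0, 0.5], [0.5, 1.0]]))
```

With effects stored as the vec of a matrix, the first factor of the direct product indexes the columns of that matrix and the second indexes its rows.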
280. ms that dose is significant when adjusted for littersize and sex but ignoring dose.sex, and that sex is significant when adjusted for littersize and dose but ignoring dose.sex. These tests respect marginality to the dose.sex interaction. The incremental Wald tests We also note the comment 3 possible outliers: see .res file. Checking the .res file, we discover unit 66 has a standardised residual of -8.80 (see Figure 15.1). The weight of this female rat, within litter 9, is only 3.68, compared to weights of 7.26 and 6.58 for two other female sibling pups. This weight appears erroneous, but without knowledge of the actual experiment we retain the observation in the following. However, part 2 shows one way of dropping unit 66 by fitting an effect for it with out(66).
    [Figure 15.1: Rats example, residuals vs fitted values. Residuals (Y) -3.02 to 1.22; fitted values (X) 5.04 to 7.63.]
    Figure 15.1 Residual plot for the rat data
We refit the model without the dose.sex term. Note that the variance parameters are re-estimated, though there is little change from the previous analysis.
    Source     Model  terms  Gamma     Component     Comp/SE  % C
    dam           27     27  0.595157  0.979179E-01    2.93   0 P
    Variance     322    317  1.0
281. n predicted values and functions of the variance components. 3 A guided tour. 3.1 Introduction. This chapter presents a guided tour of ASReml, from data file preparation and basic aspects of the ASReml command file, to running an ASReml job and interpreting the output files. You are encouraged to read this chapter before moving to the later chapters:
    - a real data example is used in this chapter for demonstration, see below,
    - the same data are also used in later chapters,
    - links to the formal discussion of topics are clearly signposted by margin notes.
Note that some aspects of ASReml, in particular pedigree files (see Chapter 9) and multivariate analysis (see Chapter 8), are only covered in later chapters. ASReml is essentially a batch program with some optional interactive features. The typical sequence of operations when using ASReml is
    - Prepare the data (typically using a spreadsheet or data base program).
    - Export that data as an ASCII file (for example, export it as a .csv (comma separated values) file from Excel).
    - Prepare a job file with filename extension .as.
    - Run the job file with ASReml.
    - Review the various output files, revise the job and re-run it, or extract pertinent results for your report.
You will need a file editor to create the command file and to view the various output files. On unix systems, vi and emacs are commonly used. Under Windows, there are several suitable program editors availab
282. n 6.10. Thus variables form three classes: those read from the data file (possibly modified), normally labelled and available for subsequent use in analysis; those created and labelled, available for subsequent use in the analysis; and those created but not labelled, being intermediate calculations not required for subsequent analysis. The first variables contain the values read from the data file for each record. The number of variables read can be explicitly set using the !READ qualifier described in Table 5.5. Otherwise, ASReml reads values from the data file for each variable/factor defined, unless the variable/factor and all subsequent labelled variables are created using transformations. For example,
    A
    B
    C !=A !-B
reads two fields (A and B) and constructs C as A-B. All three are available for analysis. Variables that have an explicit label may be referenced by their explicit label or their internal label. Therefore, to avoid confusion, do not use explicit labels of the form Vi (where i is a number) for variables to be referred to in a transformation; Vi always refers to field/variable i in a transformation statement. Variables that are not initialized from the data file are initialized to missing value for the first record, and otherwise to the values from the preceding record after transformation. Thus
    A
    B
    LagA !=V4 !V4=A
reads two fields (A and B) and constructs LagA as the value of A from the
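The carry-over rule for variables not read from the data file (missing on the first record, otherwise the value left from the preceding record) is what makes a lagged variable possible. A minimal Python sketch of that mechanism follows; `MISSING`, `lag_column` and the single working variable are illustrative stand-ins, not ASReml internals.

```python
MISSING = None  # stand-in for ASReml's missing value


def lag_column(values):
    """Build a lagged copy of a column using one carried-over working variable."""
    working = MISSING          # initialised to missing for the first record
    lagged = []
    for current in values:
        lagged.append(working)  # the lag takes the value carried over
        working = current       # then the working variable is reloaded
    return lagged


print(lag_column([5, 7, 9]))
```

The order of the two steps inside the loop matters: reading the working variable before reloading it is what shifts the column by one record.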
283. n is missing, the residual, predicted value and Hat value are also declared missing (margin: missing values). The missing value estimates with standard errors are reported in the .sln file.
    [Figure 13.1: NIN alliance trial 1989, residuals vs fitted values. Residuals (Y) -24.87 to 15.91; fitted values (X) 16.77 to 35.94.]
    Figure 13.1 Residual versus Fitted values
This is the first 20 lines of nin89a.yht. Note that the values corresponding to the missing data (first 15 records) are all -0.1000E-36 which is the internal value used for missing values.
    Record  Yhat          Residual     Hat
     1      -0.10000E-36  -0.1000E-36  -0.1000E-36
     2      -0.10000E-36  -0.1000E-36  -0.1000E-36
     3      -0.10000E-36  -0.1000E-36  -0.1000E-36
     4      -0.10000E-36  -0.1000E-36  -0.1000E-36
     5      -0.10000E-36  -0.1000E-36  -0.1000E-36
     6      -0.10000E-36  -0.1000E-36  -0.1000E-36
     7      -0.10000E-36  -0.1000E-36  -0.1000E-36
     8      -0.10000E-36  -0.1000E-36  -0.1000E-36
     9      -0.10000E-36  -0.1000E-36  -0.1000E-36
    10      -0.10000E-36  -0.1000E-36  -0.1000E-36
(13 Description of output files, page 196)
    11      -0.10000E-36  -0.1000E-36  -0.1000E-36
    12      -0.10000E-36  -0.1000E-36  -0.1000E-36
    13      -0.10000E-36  -0.1000E-36  -0.1000E-36
    14      -0.10000E-36  -0.1000E-36  -0.1000E-36
    15      -0.10000E-36  -0.1000E-36  -0.1000E-36
    16      24.088        5
284. n of the model fit. With no plot options, ASReml chooses an arrangement for plotting the predictions by recognising any covariates and noting the size of factors. However, the user is able to customize how the predictions are plotted by either using options to the !PLOT qualifier or by using the graphical interface. The graphical interface is accessed by typing Esc when the figure is displayed. The !PLOT qualifier has the following options:
    Table 10.2 List of predict plot options
    option             action
    Lines and data
    addData            superimposes the raw data.
    addlabels factors  superimposes the raw data with the data points labelled using the given factors, which must not be prediction factors. This option may be useful to identify individual data points on the graph, for instance potential outliers, or alternatively to identify groups of data points (e.g. all data points in the same stratum).
(10 Tabulation of the data and prediction from the model, page 166)
    addlines factors   superimposes the raw data with the data points joined using the given factors, which must not be prediction factors. This option may be useful for repeated measures data.
    noSEs              specifies that no error bars should be plotted (by default, they are plotted).
    semult r           specifies the multiplier of the SE used for creating error bars (default 1.0).
    joinmeans          specifies that the predicted values should be
285. n raised twin and 4 other), sex (M, F) and grp (a factor indicating the flock-year combination). Half-sib analysis. In the half-sib analysis we include terms for the random effects of sires, dams and litters. In univariate analyses the variance component for sires is denoted by $\sigma^2_s$ ($= \frac{1}{4}\sigma^2_A$, where $\sigma^2_A$ is the additive genetic variance), the variance component for dams is denoted by $\sigma^2_d$ ($= \frac{1}{4}\sigma^2_A + \sigma^2_m$, where $\sigma^2_m$ is the maternal variance component) and the variance component for litters is denoted by $\sigma^2_l$ and represents variation attributable to the particular mating. For a multivariate analysis these variance components for sires, dams and litters are, in theory, replaced by unstructured matrices, one for each term. Additionally we assume the residuals for each trait may be correlated. Thus for this example we would like to fit a total of 4 unstructured variance models. For such a situation, it is sensible to commence the modelling process with a series of univariate analyses. These give starting values for the diagonals of the variance matrices, but also indicate what variance components are estimable. The ASReml job for the univariate analyses is
    Multivariate Sire & Dam model
     tag
     sire 92 !I
     dam 3561 !I
     grp 49
     sex
     brr 4
     litter 4871
Table 15.13 REML estimates of a subset of the variance parameters for each trait for the genetic example, expressed as a ratio to their asymptotic s.e. term wwt ywt gfw fdm fat
286. n variance structure. This facility requires the user to supply a program MYOWNGDG that reads the current set of parameters, forms the G matrix and a full set of derivative matrices, and writes these to disk. Before each iteration, ASReml writes the OWN parameters to a file, runs MYOWNGDG (which it presumes forms the G and derivative matrix) and then reads the matrices back in. An example of MYOWNGDG.f90 is distributed with ASReml. It duplicates the AR1 and AR2 structures. The following job fits an AR2 structure using this program.
    Example of using the OWN structure
     rep
     blcol
     blrow
     variety 25
     yield
    barley.asd !skip 1 !OWN MYOWN.EXE
    ...
    10 0 AR1 .1
    15 0 OWN2 .2 .1 !TRR
The file written by ASReml has extension .own and looks like
    15 2 1
    0.6025860D+00 0.1164403D+00
This file was written by asreml for reading by your program MYOWNGDG. asreml writes this file, runs your program and then reads shfown.gdg which it presumes has the following format. The first lines should agree with the top of this file, specifying the order of the matrices (15), the number of variance parameters (2) and a control parameter you can specify (1). These are written in (3I5) format. They are followed by the list of variance parameters written in (6D13.7) format. Follow this with 3 matrices written in (6D13.7) format. These are to be each of 120 elements, being lower triangle row-wise
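The figures quoted for the exchange file fit together: a symmetric matrix of order 15 has 15 x 16 / 2 = 120 lower-triangle elements. The Python sketch below shows the arithmetic and how values in Fortran D-exponent notation might be read back. The function names are hypothetical, and the layout follows only what this fragment states (lower triangle stored row-wise).

```python
def parse_fortran_double(token):
    """Read a Fortran D-exponent value such as '0.6025860D+00'."""
    return float(token.replace("D", "E").replace("d", "e"))


def lower_triangle_size(order):
    """Number of elements in the lower triangle of a square matrix."""
    return order * (order + 1) // 2


def symmetric_from_lower(elements, order):
    """Rebuild a symmetric matrix from its lower triangle stored row-wise."""
    if len(elements) != lower_triangle_size(order):
        raise ValueError("wrong number of lower-triangle elements")
    matrix = [[0.0] * order for _ in range(order)]
    k = 0
    for i in range(order):
        for j in range(i + 1):
            matrix[i][j] = matrix[j][i] = elements[k]
            k += 1
    return matrix


print(lower_triangle_size(15))                # order-15 matrix
print(parse_fortran_double("0.6025860D+00"))  # first OWN parameter
print(symmetric_from_lower([1.0, 0.5, 1.0], 2))
```

A user-supplied program would do the reverse when writing: flatten G and each derivative matrix row-wise over the lower triangle before formatting the values.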
287. name error, there is a missing parameter, there are too many/few initial values,
    - there is an error in the predict statement,
    - model term mv not included in the model when there are missing values in the data and the model fitted assumes all data is present.
The most common problem in running ASReml is that a variable label is misspelt. 14.3 Things to check in the .asr file. The information that ASReml dumps in the .asr file when an error is encountered is intended to give you some idea of the particular error:
    - if there is no data summary, ASReml has failed before or while reading the model line,
    - if ASReml has completed one iteration, the problem is probably associated with starting values of the variance parameters or the logic of the model rather than the syntax per se.
Part of the file nin89.asr presented in Chapter 13 is displayed below to indicate the lines of the .asr file that should be checked. You should check that
    - sufficient workspace has been obtained,
    - the records read/lines read/records used are correct,
    - mean/min/max information is correct for each variable,
    - the Loglikelihood has converged and the variance parameters are stable,
    - ANOVA table has the expected degrees of freedom.
    ASReml 1.630 [01 Jun 2005] NIN alliance trial 1989
    Build: j [01 Jul 2005] 128
    14 Jul 2005 12:41:18.360  32.00 Mbyte  Windows  nin89a      (margin: workspace)
    Licensed to: Arthur Gilmour
    Folder: C:\data\asr\UG2                                     (margin: working directory)
288. nce 256 255 1.00000 0.511402E-01 Di2 0 P
    Table 15.3 REML log-likelihood ratio for the variance components in the voltage data
    terms              REML log-likelihood  2 x difference  P-value
    setstat                  200.31              5.864        .0077
    setstat.regulatr         184.15             38.19         .0000
    teststat                 199.71              7.064        .0039
15.5 Balanced repeated measures - Height. The data for this example is taken from the GENSTAT manual. It consists of a total of 5 measurements of height (cm) taken on 14 plants. The 14 plants were either diseased or healthy and were arranged in a glasshouse in a completely random design. The heights were measured 1, 3, 5, 7 and 10 weeks after the plants were placed in the glasshouse. There were 7 plants in each treatment. The data are depicted in Figure 15.3, obtained by qualifier line !Y y1 !G tmt !JOIN in the following multivariate ASReml job.
    [Figure 15.3: "This is plant data multivariate"; Y axis 21.0000 to 130.500, X axis 0.5000 to 5.5000, panels 1 and 2.]
    Figure 15.3 Trellis plot of the height for each of 14 plants
In the following we illustrate how various repeated measures analyses can be conducted in ASReml. For these analyses it is convenient to arrange the data in a multivariate form, with 7 fields representing the plant number, treatment identification and the 5 heights. The ASReml input file, up to the specification of the R structure, is
    This is plant data multivariate
     tmt !A   # Diseased Healthy
     plant 14
     y1 y3 y5 y7 yi
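The P-values in Table 15.3 are consistent with halving the upper tail of a chi-square distribution on 1 degree of freedom, the usual adjustment when a single variance component is tested on the boundary of the parameter space. That interpretation is an assumption made here, not a statement from this fragment; the Python sketch below simply reproduces the arithmetic with the standard library.

```python
import math


def boundary_lrt_pvalue(twice_difference):
    """Half the chi-square(1) upper-tail probability of a REML LRT statistic.

    Assumption: one variance component is tested against zero, so the
    reference distribution is a 50:50 mixture of chi-square(0) and chi-square(1).
    """
    upper_tail = math.erfc(math.sqrt(twice_difference / 2.0))
    return 0.5 * upper_tail


for statistic in (5.864, 38.19, 7.064):
    print(statistic, round(boundary_lrt_pvalue(statistic), 4))
```

The identity used is P(chi-square(1) > d) = erfc(sqrt(d / 2)), which avoids needing a statistics library for this one-degree-of-freedom case.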
289. nce models. Variance structures are sometimes formed by combining variance models. For example, a two factor interaction may involve two variance models, one for each of the two factors in the interaction. Some of the rules for combining variance models differ for R structures and G structures. (See Sections 2.1 and 7.5.) A summary of the rules is as follows:
    - when combining variance models in both R and G structures, the resulting direct product structure must match the ordered effects with the outer factor first; for example, the G structure in the example opposite is for column.row which tells ASReml that the direct product structure matches the effects ordered rows within columns. The variance model can be written as $\sigma^2 (\Sigma_c \otimes \Sigma_r)$. This is why the G structure definition line for column is specified first,
    - ASReml automatically includes and estimates an error variance parameter for each section of an R structure. The variance structures defined by the user should therefore normally be correlation matrices. A variance model can be specified but the !S2==1 qualifier would then be required to fix the error variance at 1 and prevent ASReml trying to estimate two confounded parameters (er
The margin example (lines legible in this fragment) reads
    NIN Alliance Trial 1989
     variety !A
     ...
     row 22
     column 11
    ...
    column.row
    0 0 1
    ...
    column 0 AR1 0.3
    row 0 ARV 0.3 0.1
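The rule that the direct product must match the ordering of the effects can be checked numerically. In the Python sketch below, the column matrix is the first (outer) factor of the Kronecker product and effects are indexed rows within columns. The sizes, correlation values and helper names are illustrative assumptions, not taken from the NIN example.

```python
def ar1(order, rho):
    """AR1 correlation matrix: element (i, j) is rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(order)] for i in range(order)]


def kron(a, b):
    """Kronecker (direct) product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


n_col, n_row = 3, 2                      # small illustrative layout
sigma = kron(ar1(n_col, 0.5), ar1(n_row, 0.2))


def index(col, row):
    """Position of plot (col, row) when effects are ordered rows within columns."""
    return col * n_row + row


# Correlation between plot (col 0, row 0) and plot (col 2, row 1)
print(sigma[index(0, 0)][index(2, 1)], 0.5 ** 2 * 0.2 ** 1)
```

If the factors were given in the opposite order, the same matrix entries would be attached to the wrong pairs of plots, which is why the outer factor is defined first.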
290. ncluded in the ANOVA table e generally begins with the reserved word mu which fits a constant term mean or inter cept see Table 6 1 NIN Alliance Trial 1989 variety row 22 column 11 nin89 asd skip 1 mvinclude yield mu variety r repl If mv 12 11 column AR1 3 22 row AR1 3 6 Command file Specifying the terms in the mixed model 89 Sparse fixed terms The f sparse_fired terms in model formula NIN Alliance Trial 1989 variety e are the fixed covariates for example the fixed lin row covariate now included in ace the model formula factors and interac column 11 nin89 asd skip 1 ield mu variety r repl served words for example mv see Table J r If mv lin row 6 1 for which ANOVA type tests are not 12 required 11 column AR1 424 22 row AR1 904 tions including special functions and re include large gt 100 levels terms 6 4 Random terms in the model The r random terms in the model formula i NIN Alliance Trial 1989 comprise random covariates factors and in variety teractions including special functions and reserved words see Table 6 1 wie column 11 nin89 asd skip 1 yield mu variety r repl ance default 0 1 the initial value can be if mv 1 2 specified after the model term or if the vari 11 column ARI 424 22 row AR1 904 involve an initial non zero variance compo nent or ratio relative to the residual vari ance structu
291. nd the correlations above the diagonal. The FITTED matrix is the same as is reported in the .asr file and, if the LogL has converged, is the one you would report. The BLUPS matrix is calculated from the BLUPs and is provided so that it can be used as starting values when a simple initial model has been used and you want to attempt to fit a full unstructured matrix. The RESCALED matrix has the variances from the FITTED matrix and the covariances from the BLUPS matrix, and might be more suitable as an initial matrix if the variances have been estimated. The BLUPS and RESCALED matrices should not be reported. • relevant portions of the estimated variance matrix for each term with which an R structure or a G structure has been associated, • a variogram and spatial correlations for spatial analysis; the spatial correlations are based on the distance between data points (see Gilmour et al., 1997), • the slope of the log(absolute residual) on log(predicted value), for assessing possible mean-variance relationships, and the location of large residuals. For example, SLOPES FOR LOG(ABS(RES)) ON LOG(PV) for section 1: 0.99 2.01 4.34, produced from a trivariate analysis, reports the slopes. A slope of $b$ suggests that $y^{1-b}$ might have less mean-variance relationship. If there is no mean-variance relation, a slope of zero is expected. A slope of 1/2 suggests a SQRT transformation might resolve the dependence; a slope of 1 means a LOG transformation might be appropriate. So for th
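The rule of thumb in this passage (a slope of b suggests analysing y to the power 1 - b) can be written down in a couple of lines. The sketch below is only an illustration: the function name `suggested_power` is hypothetical and not part of ASReml, and a power of 0 is read as a LOG transformation and 0.5 as SQRT, following the text.

```python
def suggested_power(slope):
    """Power p such that y**p may weaken the mean-variance relationship.

    Follows the rule in the text: a slope of b suggests y**(1 - b).
    A power near 0 is read as LOG, a power near 0.5 as SQRT.
    """
    return 1.0 - slope


# Slopes reported in the example output for the three traits
slopes = [0.99, 2.01, 4.34]
powers = [round(suggested_power(b), 2) for b in slopes]
print(powers)
```

Under this rule the first trait (slope close to 1) points towards a log scale, while the other two slopes point towards negative powers.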
292. neity, so that subsequent analyses were conducted on the square root scale. Figure 15.8 presents a plot of the treated and the control root area (on the square root scale) for each variety. There is a strong dependence between the treated and control root area, which is not surprising. The aim of the experiment was to determine the tolerance of varieties to bloodworms and thence identify the most tolerant varieties. The definition of tolerance should allow for the fact that varieties differ in their inherent seedling vigour. The original approach of the scientist was to regress the treated root area against the control root area and define the index of vigour as the residual from this regression. This approach is clearly inefficient since there is error in both variables. We seek to determine an index of tolerance from the joint analysis of treated and control root area. Standard analysis. The allocation of bloodworm treatments within varieties, and varieties within runs, defines a nested block structure of the form run/variety/tmt = run + run.variety + run.variety.tmt (= run + pair + pair.tmt) (= run + run.variety + units). Figure 15.8: Rice bloodworm data. Plot of square root of root weight for treated versus control. There is an additional blocking term, however, due to the fact that the bloodworms within a
293. nent say, the asymptotic distribution of the REMLRT is a mixture of $\chi^2$ variates, where the mixing probabilities are 0.5, one with 0 degrees of freedom (spike at 0) and the other with 1 degree of freedom. The approximate P-value for the REMLRT statistic $D$ is $0.5\,(1 - \Pr(\chi^2_1 \le d))$, where $d$ is the observed value of $D$. The distribution of the REMLRT for the test that $k$ variance components are zero, or for tests in random regressions which involve both variance and covariance components, is a mixture of $\chi^2$ variates from 0 to $k$ degrees of freedom. See Self and Liang (1987) for details. Tests concerning variance components in generally balanced designs, such as the balanced one-way classification, can be derived from the usual analysis of variance. It can be shown that the REMLRT for a variance component being zero is a monotone function of the F-statistic for the associated term. To compare two (or more) non-nested models we can evaluate the Akaike Information Criteria (AIC) or the Bayesian Information Criteria (BIC) for each model. These are given by $\mathrm{AIC} = -2\ell_{Ri} + 2t_i$, $\mathrm{BIC} = -2\ell_{Ri} + t_i \log \nu$ (2.15), where $\ell_{Ri}$ is the REML log-likelihood of model $i$, $t_i$ is the number of variance parameters in model $i$ and $\nu = n - p$ is the residual degrees of freedom. AIC and BIC are calculated for each model and the model with the smallest value is chosen as the preferred model. Diagnostics. In this section we will briefly review some of the diagnostics that have b
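The mixture P-value for a single variance component and the AIC/BIC comparison can be sketched as follows. This is an illustration rather than ASReml's own code: the function names are hypothetical, the chi-square tail with 1 degree of freedom is computed through the identity Pr(chi2_1 > d) = erfc(sqrt(d/2)), and AIC and BIC are assumed to take the forms -2 logL + 2t and -2 logL + t log(nu).

```python
import math


def remlrt_pvalue_one_component(d):
    """Approximate P-value for testing that one variance component is zero.

    Uses the 50:50 mixture of chi-square(0) and chi-square(1) variates:
    P = 0.5 * (1 - Pr(chi2_1 <= d)), with Pr(chi2_1 > d) = erfc(sqrt(d/2)).
    """
    if d < 0:
        raise ValueError("the REMLRT statistic cannot be negative")
    return 0.5 * math.erfc(math.sqrt(d / 2.0))


def aic(loglik, t):
    """AIC from a REML log-likelihood and t variance parameters."""
    return -2.0 * loglik + 2.0 * t


def bic(loglik, t, nu):
    """BIC, with nu = n - p the residual degrees of freedom."""
    return -2.0 * loglik + t * math.log(nu)


# REMLRT statistics quoted for setstat and teststat in Table 15.3
p_setstat = remlrt_pvalue_one_component(5.864)
p_teststat = remlrt_pvalue_one_component(7.064)
print(round(p_setstat, 4), round(p_teststat, 4))
```

With the two statistics quoted in Table 15.3 this gives values that agree with the P-values printed there (.0077 and .0039) to four decimal places.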
294. ng alters the error degrees of freedom from $\nu$ to $\nu + n$. This qualifier might be used when analysing pre-adjusted data to reduce the degrees of freedom (n negative), or when weights are used in lieu of actual data records to supply error information (n positive). The degrees of freedom is only used in the calculation of the residual variance in a univariate single-site analysis. The option will have no effect in analyses with multiple error variances (for sites or traits) other than in the reported degrees of freedom. Use !ADJUST r rather than !DF n if r is not a whole number. Use with !YSS r to supply variance when data fully fitted. List of rarely used job control qualifiers (columns: qualifier; action). !EMFLAG [n], !PXEM [n]: requests ASReml use Expectation-Maximization (EM) rather than Average Information (AI) updates when the AI updates would make a US structure non-positive definite. This only applies to US structures and is still under development. When !GP is associated with a US structure, ASReml checks whether the updated matrix is positive definite (PD). If not, it replaces the AI update with an EM update. If the non-PD characteristic is transitory, then the EM update is only used as necessary. If the converged solution would be non-PD, there will be an EM update each iteration even though !EM is omitted. EM is notoriously slow at finding the solution and ASReml includes se
295. ng 0 O 1 200 14 08 26 40 10 row 22 0 0 1 11 7321 22 11 column 11 0 0 1 6 3304 141 12 mu 1 Fault 1 G structure header Factor order Last line read was Repl 10000 ninerr6 variety id pid raw rep nloc yield lat 14 Error messages 224 Model specification TERM LEVELS GAMMAS variety 56 mu repl 4 0 100 SECTIONS 224 4 1 TYPE 0 0 STRUCT 224 0 0 0 0 0 0 12 factors defined max 500 4 variance parameters max1500 2 special structures Final parameter values 0 10000 1 0000 Last line read was Repl 10000 12 1 242 224 8000 Finished 27 Jul 2005 15 41 53 668 G structure header Factor order Fixing the header line we then get the error message Structure Factor mismatch This arose because repl has 4 levels but we have only declared 2 in the G struc ture model line The G structure should read repl 1 4 0 IDV 0 1 The last lines of the output with this error are displayed below 11 column 11 0 0 1 6 3304 11 12 mu 1 2 identity 0 1000 Structure for repl has 2 levels defined Fault 1 Structure Factor mismatch Last line read was 20 IDV 0 100000 ninerr7 variety id pid raw rep nloc yield lat Model specification TERM LEVELS GAMMAS variety 56 mu repl 4 0 100 SECTIONS 224 4 1 TYPE 0 0 1002 STRUCT 224 0 0 0 0 0 0 14 Error messages 225 2 1 0 5 0 i 0 12 factors defined max 500 5 variance parameters max1i500 2 special structures Final parameter values 0 10000 1 0000 0 10000 Last line read was 20 IDV0
296. ng so this can make it difficult to line up the values unless you can manipulate them in another program (spreadsheet). score: .asl file; given if the DL command line option is used. tables of means: .tab file, .pvs file; simple averages of cross-classified data are produced by the tabulate directive to the .tab file. Adjusted means predicted from the fitted model are written to the .pvs file by the predict directive. variance of variance parameters: .vvp file; based on the inverse of the average information matrix. variance parameters: .asr file, .res file; the values at each iteration are printed in the .res file. The final values are arranged in a table, printed with labels and converted if necessary to variances. variogram: graphics file. 14 Error messages. Introduction. Common problems. Things to check in the .asr file. An example. Error messages. Warning messages. 14.1 Introduction. When ASReml finds an inconsistency it prints an error message to the screen (or the .asl file) and dumps the current information to the .asr file. Below is the screen output for a job that has been terminated due to an error. If a job has an error you should • try to identify the problem from the error message in the Fault line and the text of the Last line read (this appears twice in the file to make it easier to find), • check that all la
297. ng to a linkage group defined using !MM, which represents additive marker variation, coded [-1, 0, 1] (representing marker states aa, aA and AA respectively). It is a group transformation which takes the [-1, 1] interval values and calculates $(|X| - 0.5) \times 2$, i.e. -1 and 1 become 1, and 0 becomes -1. The marker map is also copied and applied to this model term so it can be the argument in a qtl() term (page 95). Other rules and examples. Other rules include the following: • missing values are unaffected by arithmetic operations, that is, missing values in the current or target column remain missing after the transformation has been performed, except in assignment: !+3 will leave missing values (NA, * and .) as missing; !=3 will change missing values to 3, • multiple arithmetic operations cannot be expressed in a complex expression but must be given as separate operations that are performed in sequence as they appear; for example, yield !-120 !*0.0333 would calculate 0.0333 * (yield - 120). Examples (columns: ASReml code; action). yield !M0: changes the zero entries in yield to missing values. yield !^0: takes natural logarithms of the yield data. score !-5: subtracts 5 from all values in score. score !SET 0.5 1.5 2.5: replaces data values of 1, 2 and 3 with 0.5, 1.5 and 2.5 respectively. score !SUB 0.5 1.5 2.5 block 8 variety 20 yield plot variety SEQ Var 3 Ni
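The two rules above (operations are applied one at a time in the order given, and missing values stay missing under arithmetic) can be mimicked in a few lines of Python. This is a hedged sketch of the behaviour described in the text, not of ASReml's implementation; `apply_ops` and the use of `None` for a missing value are assumptions made for the illustration.

```python
def apply_ops(values, ops):
    """Apply arithmetic operations one at a time, left to right.

    values: list of numbers, with None standing for a missing value.
    ops:    list of (symbol, constant) pairs, e.g. [("-", 120.0), ("*", 0.0333)].
    Missing values are left untouched by every arithmetic operation.
    """
    funcs = {
        "+": lambda x, c: x + c,
        "-": lambda x, c: x - c,
        "*": lambda x, c: x * c,
        "/": lambda x, c: x / c,
    }
    out = list(values)
    for symbol, constant in ops:
        f = funcs[symbol]
        out = [None if v is None else f(v, constant) for v in out]
    return out


# Mirrors the example in the text: subtract 120, then multiply by 0.0333,
# i.e. 0.0333 * (yield - 120). The data values themselves are made up.
yield_values = [150.0, None, 120.0]
scaled = apply_ops(yield_values, [("-", 120.0), ("*", 0.0333)])
print(scaled)
```

Because each operation sees the result of the previous one, the order of the pairs matters: reversing them would compute 0.0333 * yield - 120 instead.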
298. ngularities in the mixed model equations This is intended for use on the rare occasions when ASReml detects singularities after the first iteration they are not expected Normally when no TOLERANCE qualifier is specified a singularity is declared if the adjusted sum of squares of a covariable is less than a small constant 7 or less than the uncorrected sum of squares xn where 7 is 107 in the first iteration and 10 thereafter The qualifier scales 7 by 10 for the the first or subsequent iterations respectively so that it is more likely an equation will be declared singular Once a singularity is detected the corresponding equation is dropped forced to be zero in sub sequent iterations If neither argument is supplied 2 is as sumed If the second argument is omitted it is given the value of the first If the problem of later singularities arises because of the low coefficient of variation of a covariable it would be better to centre and rescale the covariable If the degrees of freedom are correct in the first iteration the problem will be with the variance parameters and a different variance model or variance constraints is required requests writing of vrb file Previously the default was to write the file 6 Command file Specifying the terms in the mixed model Introduction Specifying model formulae in ASReml General rules Examples Fixed terms in the model Primary fixed terms Sparse fixed
299. ning variance models. Specifically, the code 11 column IDV 48 !S2==1 would be required in this case, where 48 is the starting value for the variances. This complexity allows for heterogeneous error variance. 3b Two-dimensional separable autoregressive spatial model. This model extends 3a by specifying a first-order autoregressive correlation model of order 11 for columns (AR1). The R structure in this case is therefore the direct product of two autoregressive correlation matrices, that is $V = \sigma^2\, \Sigma_c(\rho_c) \otimes \Sigma_r(\rho_r)$, giving a two-dimensional first-order separable autoregressive spatial structure for error. The starting column correlation in this case is also 0.3. Again note that $\sigma^2$ is implicit. Command file lines for this model: NIN Alliance Trial 1989; variety !A; id; row 22; column 11; nin89aug.asd !skip 1; yield ~ mu variety !f mv; 1 2 0; 11 column AR1 0.3; 22 row AR1 0.3. See Section 7.4. 3c Two-dimensional separable autoregressive spatial model with measurement error. This model extends 3b by adding a random units term. Thus $V = \sigma^2_\eta I + \sigma^2\, \Sigma_c(\rho_c) \otimes \Sigma_r(\rho_r)$. The reserved word units tells ASReml to construct an additional random term with one level for each experimental unit so that a second (independent) error term can be fitted. A units term is fitted in the model in cases like this, where a variance structure is applied to the errors. Because a G structure is not explicitly specified
300. nits: variety Predicted_Value Standard_Error Ecode; 1.0000 1245.5843 97.8591 E; 2.0000 1516.2331 97.8473 E; 3.0000 1403.9863 98.2398 E; 4.0000 1404.9202 97.9875 E; 5.0000 1471.6197 98.3607 E; 23.0000 1316.8726 98.0402 E; 24.0000 1557.5273 98.1272 E; 25.0000 1573.8920 97.9803 E. SED: Overall Standard Error of Difference 60.51. IB: Rep is ignored in the prediction. RowBlk is ignored in the prediction. ColBlk is ignored in the prediction. variety Predicted_Value Standard_Error Ecode; 1.0000 1283.5870 60.1994 E; 2.0000 1549.0133 60.1994 E; 3.0000 1420.9307 60.1994 E; 4.0000 1451.8554 60.1994 E; 5.0000 1533.2749 60.1994 E; 23.0000 1329.1088 60.1994 E; 24.0000 1546.4699 60.1994 E; 25.0000 1630.6285 60.1994 E. SED: Overall Standard Error of Difference 62.02. Notice the differences in SE and SED associated with the various models. Choosing a model on the basis of smallest SE or SED is not recommended because the model is not necessarily fitting the variability present in the data. Table 15.7: Summary of models for the Slate Hall data (columns: model; REML log-likelihood; number of parameters; F-statistic; SED). AR1xAR1: 700.32, 3, 13.04, 59.0. AR1xAR1 + units: 696.82, 4, 10.22, 60.5. IB: 707.79, 4, 8.84, 62.0. 15.7 Unreplicated early generation variety trial - Wheat. To further illustrate the approaches presented in the previous section, we consider an unreplicated field experiment conducted at Tullibigeal, situated in south-western NSW. The t
301. nored. The qualifier !AVERAGE allows these variables to be added to the default averaging set. The third step is to select the linear model terms to use in prediction. The default is that all model terms based entirely on variables in the classifying and averaging sets are used. Two qualifiers allow this default to be modified by adding (!USE) or removing (!IGNORE) model terms. The qualifier !ONLYUSE explicitly specifies the model terms to use, ignoring all others. The qualifier !EXCEPT explicitly specifies the model terms not to use, including all others. These qualifiers may implicitly modify the averaging set by including variables defining terms in the predicted model not in the classify set. It is sometimes easier to specify the classify set and the model terms to use and allow ASReml to construct the averaging set. The fourth step is to choose the weights to use when averaging over dimensions in the hyper-table. The default is to simply average over the specified levels, but the qualifier !AVERAGE factor weights allows other weights to be specified. For example, with the model yield ~ site variety !r site.variety at(site).block, the directive predict variety puts variety in the classify set, site in the averaging set and block in the ignore set. Consequently, ASReml forms the site x variety hyper-table from model terms site, variety and site.variety but ignoring all terms in at(site).block. predi
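The fourth step, averaging the hyper-table over the averaging set with either equal or user-supplied weights, can be illustrated with a toy site x variety table. Everything in this sketch is hypothetical: the cell values are invented, `predict_margin` is not an ASReml function, and the real prediction also carries standard errors, which are omitted here.

```python
def predict_margin(cell_values, weights=None):
    """Average a variety x site table over sites.

    cell_values: {variety: {site: predicted cell value}}
    weights:     optional {site: weight}; equal weights if omitted.
    Returns {variety: weighted average over sites}.
    """
    result = {}
    for variety, by_site in cell_values.items():
        sites = sorted(by_site)
        if weights is None:
            w = {s: 1.0 / len(sites) for s in sites}
        else:
            total = sum(weights[s] for s in sites)
            w = {s: weights[s] / total for s in sites}
        result[variety] = sum(w[s] * by_site[s] for s in sites)
    return result


# Invented cell predictions for two varieties at two sites
cells = {
    "v1": {"s1": 10.0, "s2": 14.0},
    "v2": {"s1": 8.0, "s2": 12.0},
}
simple = predict_margin(cells)                          # equal weights
weighted = predict_margin(cells, {"s1": 3.0, "s2": 1.0})  # site s1 counts three times
print(simple, weighted)
```

The classify set (variety) indexes the result, the averaging set (site) is summed out, and anything in the ignore set simply never enters the table.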
302. not automatically include and estimate a scale parameter for a G structure when the explicit G structure does not include one. For this reason: the model supplied when the G structure involves just one variance model must not be a correlation model (all diagonal elements equal 1); all but one of the models supplied when the G structure involves more than one variance model must be correlation models; the other must be either an homogeneous or a heterogeneous variance model (see Section 7.5 for the distinction between these models; see also 5 for an example). • An initial value must be supplied for all parameters in G structure definitions. ASReml expects initial values immediately after the variance model identifier or on the next line (0.1 directly after IDV in this case). 0 is ignored as an initial value on the model line; if there is no initial value after the identifier, ASReml will look on the next line; if ASReml does not find an initial value it will stop and give an error message in the .asr file (see Chapter 14). See page 118. • In this case $V = \sigma^2_r Z_r Z_r' + \sigma^2 I$, which is fitted as $\sigma^2(\gamma_r Z_r Z_r' + I)$, where $\gamma_r$ is a variance ratio ($\gamma_r = \sigma^2_r / \sigma^2$) and $\sigma^2$ is the scale parameter. Thus 0.1 is a reasonable initial value for $\gamma_r$ regardless of the scale of the data. 3a Two-dimensional spatial model with spatial correlation in one direction. This code specifi
303. ns. Model functions (mv; out(n), out(n,t); pol(v,n), p(v,n)) and their actions. mv: is used to estimate missing values in the response variable. Formally, this creates a model term with a column for each missing value. Each column contains zeros except for a solitary 1 in the record containing the corresponding missing value. This is used in spatial analyses so that computing advantages arising from a balanced spatial layout can be exploited. The equations for mv and any terms that follow are always included in the sparse set of equations. Missing values are handled in three possible ways during analysis (see Section 6.10). In the simplest case, records containing missing values in the response variable are deleted. For multivariate (including some repeated measures) analysis, records with missing values are not deleted but ASReml drops the missing observation and uses the appropriate unstructured R-inverse matrix. For regular spatial analysis, we prefer to retain separability and therefore estimate the missing value(s) by including the special term mv in the model. out(n), out(n,t): establishes a binary variable which is: out(i) = 1 if data relates to observation i, trait 1, else is 0; out(i,t) = 1 if data relates to observation i, trait t, else is 0. The intention is that this be used to test/remove single observations, for example to remove the influence of an outlier or influential point. Possible outliers will be evident in the plot of re
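The description of mv (one design column per missing response, all zeros except a single non-zero entry in the record that is missing) can be made concrete with a small sketch. This only illustrates the structure of the term as described in the text; the function name and the use of `None` for a missing value are assumptions, and ASReml's internal representation is not shown here.

```python
def mv_design_columns(response):
    """Build one indicator column per missing value in the response.

    response: list of numbers, with None marking a missing observation.
    Returns a list of columns; each column has a single 1 in the row of
    the missing value it corresponds to and 0 elsewhere.
    """
    missing_rows = [i for i, y in enumerate(response) if y is None]
    columns = []
    for row in missing_rows:
        column = [0] * len(response)
        column[row] = 1
        columns.append(column)
    return columns


# Four records, the second and fourth are missing (made-up data)
y = [4.2, None, 3.9, None]
mv_columns = mv_design_columns(y)
print(mv_columns)
```

An out(i) variable has the same shape as one of these columns, except that it flags a chosen observation i rather than a missing one.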
304. ntinue for one more iteration from the values in the .rsv file. This is useful when using predict (see Chapter 10). O (ONERUN) is used with the R option to make ASReml perform a single analysis when the R option would otherwise attempt multiple analyses. The R option then builds some arguments into the output file name while other arguments are not. For example, ASReml -nor2 mabphen 2 TWT out(621) out(929) results in one run with output files mabphen2_TWT. R[r] (RENAME [r]) is used in conjunction with at least r argument(s) and does two things: it modifies the output filename to include the first r arguments so that the output is identified by these arguments, and, if there are more than r arguments, the job is rerun moving the extra arguments up to position r (unless ONERUN (O) is also set). If r is not specified, it is taken as 1. For example, ASReml -r2 job wwt gfw fd fat is equivalent to running three jobs: ASReml -r2 job wwt gfw (output jobwwt_gfw.asr), ASReml -r2 job wwt fd (output jobwwt_fd.asr), ASReml -r2 job wwt fat (output jobwwt_fat.asr). Yy (YVAR y) overrides the value of response, the variate to be analysed (see Section 6.2), with the value y, where y is the number of the data field containing the trait to be analysed. This facilitates analysis of several traits under the same model. The value of y is appended to the basename so that output files are not overwritten when the next trait is analysed. Workspace command line options (S, W). The wor
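How the RENAME option splits a list of arguments into separate runs can be sketched as below. The sketch is inferred from the single worked example in the text (r = 2 with arguments wwt gfw fd fat); the function name is hypothetical, and the rule used for building the output basename (job name followed by the arguments joined with underscores) is an assumption that merely reproduces that example.

```python
def expand_rename_runs(job, args, r=1):
    """Split command-line arguments into one run per 'extra' argument.

    The first r-1 arguments are shared by every run; each remaining
    argument is moved into position r in turn.
    Returns a list of (arguments for the run, assumed .asr file name).
    """
    if r < 1 or len(args) < r:
        raise ValueError("RENAME r needs at least r arguments")
    fixed = args[: r - 1]
    runs = []
    for extra in args[r - 1:]:
        run_args = fixed + [extra]
        basename = job + "_".join(run_args)  # assumed naming rule
        runs.append((run_args, basename + ".asr"))
    return runs


runs = expand_rename_runs("job", ["wwt", "gfw", "fd", "fat"], r=2)
for run_args, asr_name in runs:
    print(run_args, asr_name)
```

For the example in the text this yields three runs whose output names match jobwwt_gfw.asr, jobwwt_fd.asr and jobwwt_fat.asr.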
305. nual 8 Procedure Library PL17, VSN International, Hemel Hempstead, UK, pp. 260-265. Welham, S. J. and Thompson, R. (1997). Likelihood ratio tests for fixed model terms using residual maximum likelihood, Journal of the Royal Statistical Society, Series B 59: 701-714. Welham, S. J., Cullis, B. R., Gogel, B. J., Gilmour, A. R. and Thompson, R. (2004). Prediction in linear mixed models, Australian and New Zealand Journal of Statistics 46: 325-347. White, I. M. S., Thompson, R. and Brotherstone, S. (1999). Genetic and environmental smoothing of lactation curves with cubic splines, Journal of Dairy Science 82: 632-638. Wilkinson, G. N. and Rogers, C. E. (1973). Symbolic description of factorial models for analysis of variance, Applied Statistics 22: 392-399. Wolfinger, R. and O'Connell, M. (1993). Generalized linear mixed models: A pseudo-likelihood approach, Journal of Statistical Computation and Simulation 48: 233-243. Wolfinger, R. D. (1996). Heterogeneous variance-covariance structures for repeated measures, Journal of Agricultural, Biological, and Environmental Statistics 1: 362-389. Yates, F. (1935). Complex experiments, Journal of the Royal Statistical Society, Series B 2: 181-247. Index. ABORTASR.NOW, 65; FINALASR.NOW, 65; Access, 44; accuracy - genetic BLUP, 194; advanced processing arguments, 185; AI algorithm, 13; AIC, 18; Akaike Information Criteria, 18; aliassing, 102; Analysis of Deviance, 98; animal breeding data, 2
306. o analyse just one model per run. However, the analysis of a data set typically requires many runs, fitting different models to different traits. It is often convenient to have all these runs coded into a single .as file and control the details from the command line (or top job control line) using arguments. The high-level qualifiers !CYCLE and !DOPATH enable multiple analyses to be defined and run in one execution of ASReml. Table 12.3: High-level qualifiers (columns: qualifier; action). !CYCLE list [!JOIN]: is a mechanism whereby ASReml can loop through a series of jobs. The !CYCLE qualifier must appear on its own line, starting in character 1. list is a series of values which are substituted into the job wherever the $I string appears. If !JOIN is not specified, the current value from list is built into the output filenames, writing the output to separate files. If !JOIN is specified, the outputs are written to a single file. For example, !CYCLE 0.4 0.5 0.6 !JOIN with the structure line 20 0 mat2 1.9 $I !GPF would result in three runs and the results would be appended to a single file. Warning: the !CYCLE mechanism does not work in combination with the !RENAME qualifier used with multiple command line arguments. !DOPATH n: The qualifiers !DOPART and !PART have been extended in release 2.0 and !DOPATH and !PATH are thought to be more appropriate names. Both spell
307. oad d11 functionality is provided under license to VSN Alison Kelly has helped with the review of the XFA models Finally we especially thank our close associates who continually test the enhancements Arthur Gilmour acknowledges the grace of God through Jesus Christ who enables this research to proceed Be exalted O God above the heavens and Thy glory above all the earth Psalm 108 5 Contents Preface i List of Tables xvii List of Figures xix 1 Introduction 1 1 1 What ASReml can do 0 000 ee ee 2 1 2 Installation 2 a 2 1 3 User Interface 2 02000000000 ek 3 ASRemIl W 0 3 COMMER son ee fee oe ek ee ee ee ee A 3 1 4 How to use this guide 2 02 0 200004 4 1 5 Help and discussion list o o a 00000000 eee 4 1 6 Typographic conventions ooa a 2000205008 5 2 Some theory 6 2 1 The linear mixed model aaa aa a a 7 Contents v Introductions ocsi a e E e be a wee eee e e Ge 7 Direct product structures ooo 002005 7 Variance structures for the errors R structures 9 Variance structures for the random effects G structures 10 22 Estimation 2 i 244 4 844585288 bo 2 G4 bd Gas 11 Estimation of the variance parameters 11 Estimation prediction of the fixed and random effects 14 2 3 What are BLUPs 2 02 0 00 000000 15 2 4 Combining variance models 0 0 0 2000000 16 2 5 Inference Random effects
308. od ratio for the remaining terms in the model. The summary of the ASReml output for the current model is given below. The column labelled Comp/SE is printed by ASReml to give a guide as to the significance of the variance component for each term in the model. The statistic is simply the REML estimate of the variance component divided by the square root of the diagonal element (for each component) of the inverse of the average information matrix. The diagonal elements of the expected (not the average) information matrix are the asymptotic variances of the REML estimates of the variance parameters. These Comp/SE statistics cannot be used to test the null hypothesis that the variance component is zero. If we had used this crude measure, then the conclusions would have been inconsistent with the conclusions obtained from the REML log-likelihood ratio (see Table 15.3). Figure 15.2: Residual plot for the voltage data (residuals versus fitted values). Source, Model, terms, Gamma, Component, Comp/SE, % C: setstat 10 10 0.233417 0.119370E-01 1.35 0 P; setstat.regulatr 80 64 0.601817 0.307771E-01 3.64 0 P; teststat 4 4 0.642752E-01 0.328705E-02 0.93 0 P; Varia
309. odel can be written as the sequence $R(1)$, $R(A \mid 1) = R(1, A) - R(1)$, $R(B \mid 1, A) = R(1, A, B) - R(1, A)$, where the $R(\cdot)$ operator denotes the reduction in the total sums of squares due to a model containing its argument and $R(\cdot \mid \cdot)$ denotes the difference between the reduction in the sums of squares for any pair of (nested) models. Thus $R(B \mid 1, A)$ represents the difference between the reduction in sums of squares between the so-called maximal model y ~ 1 + A + B and y ~ 1 + A. Implicit in these calculations is that • we only compute Wald statistics for estimable functions (Searle, 1971, page 408), • all variance parameters are held fixed at the current REML estimates from the maximal model. In this example it is clear that the incremental Wald statistics may not produce the desired test for the main effect of A, as in many cases we would like to produce a Wald statistic for A based on $R(A \mid 1, B) = R(1, A, B) - R(1, B)$. The issue is further complicated when we invoke marginality considerations. The issue of marginality between terms in a linear mixed model has been discussed in much detail by Nelder (1977). In this paper Nelder defines marginality for terms in a factorial linear model with qualitative factors, but later Nelder (1994) extended this concept to functional marginality for terms involving quantitative covariates and for mixed terms which involve an interaction between quantitative covariates and qualitative factors. Referring to
310. oefficients: blocks 5.00 3175.06 12.0 4.0 1.0; blocks.wplots 10.00 601.331 0.0 4.0 1.0; Residual Variance 45.00 177.083 0.0 0.0 1.0. Source, Model terms, Gamma, Component, Comp/SE, % C: blocks 6 1.21116 214.477 1.27 0 P; blocks.wplots 18 18 0.598937 106.062 1.56 0 P; Variance 72 60 1.00000 177.083 4.74 0 P. Analysis of Variance (NumDF, DenDF, F_inc, Prob): 7 mu 1 5.0 245.14 <.001; 4 variety 2 10.0 1.49 0.272; 2 nitrogen 3 45.0 37.69 <.001; 8 variety.nitrogen 6 45.0 0.30 0.932. For simple variance component models such as the above, the default parameterisation for the variance component parameters is as the ratio to the residual variance. Thus ASReml prints the variance component ratio and variance component for each term in the random model in the columns labelled Gamma and Component respectively. The analysis of variance (ANOVA) is printed below this summary. The usual decomposition has three strata, with treatment effects separating into different strata as a consequence of the balanced design and the allocation of variety to whole plots. In this balanced case, it is straightforward to derive the ANOVA estimates of the stratum variances from the REML estimates of the variance components. That is: blocks stratum variance $= 12\hat\sigma^2_b + 4\hat\sigma^2_w + \hat\sigma^2 = 3175.06$; blocks.wplots stratum variance $= 4\hat\sigma^2_w + \hat\sigma^2 = 601.331$; residual stratum variance $= \hat\sigma^2 = 177.083$, where $\hat\sigma^2_b$ is the blocks variance component, $\hat\sigma^2_w$ is the blocks.wplots component an
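The link between the REML variance components and the stratum variances in this output can be checked by hand. The short sketch below simply redoes that arithmetic with the numbers printed above; the variable names are chosen for the illustration and the coefficients (12, 4, 1) are read from the stratum variance decomposition table.

```python
# REML estimates of the variance components from the Component column
sigma2_blocks = 214.477   # blocks
sigma2_wplots = 106.062   # blocks.wplots
sigma2_resid = 177.083    # residual variance

# Stratum variances: coefficients taken from the decomposition table
blocks_stratum = 12.0 * sigma2_blocks + 4.0 * sigma2_wplots + sigma2_resid
wplots_stratum = 4.0 * sigma2_wplots + sigma2_resid

# Gamma column: each component as a ratio to the residual variance
gamma_blocks = sigma2_blocks / sigma2_resid
gamma_wplots = sigma2_wplots / sigma2_resid

print(round(blocks_stratum, 2), round(wplots_stratum, 3))
print(round(gamma_blocks, 5), round(gamma_wplots, 5))
```

Up to rounding of the printed components, these reproduce the stratum variances 3175.06 and 601.331 and the Gamma values 1.21116 and 0.598937 shown in the output.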
311. of effects matches the structure definition so the user must be careful to get this right Check that the Check the order terms are conformable by considering the order of the fitted effects and ensuring the first term of the direct product corresponds to the outer factor in the nesting of the effects Two examples are e random regressions where we want a covariance between intercept and slope Ir animal animal time animal 2 20 Ue 2 5 2 animal is equivalent though not identical because of the scaling differences to Ir pol time 1 animal pol time 1 animal 2 pol time i 0 US 1 1 2 animal maternal direct genetic covariance lambid P sireid P damid P wwt ywt Trait Trait sex r Trait lambid at Trait 2 damid Trait lambid 2 3 0 US 1 3 Var wwt_D 1 0 252 Cov wwt_D ywt_D Var ywt_D ni sa O08 Cov wwt_D wwt_M Cov ywt_D wwt_M Var wwt M lambid O AINV AINV explicitly requests to use A inverse 7 Command file Specifying variance structures 137 7 9 Constraining variance parameters Parameter equality within and between variance structures difficult Equality of parameters in a variance model can be specified using the s qualifier where s is a string of letters and or zeros see Table 7 4 Positions in the string correspond to the parameters of the variance model e all parameters with the same letter in the structure are treated as the same parameter e 1 9 are different from a z w
312. of the G matrix and its derivatives with respect to the parameters in turn. This file contains details about what is expected in the file written by your program. The filename used has the same basename as the job you are running, with extension .own for the file written by ASReml and .gdg for the file your program writes. The type of the parameters is set with the !T qualifier described below. The control parameter is set using the !F qualifier. !F2 applies to OWN models. With OWN, the argument of !F is passed to the MYOWNGDG program as an argument the program can access. This is the mechanism that allows several OWN models to be fitted in a single run. !Ts is used to set the type of the parameters. It is primarily used in conjunction with the OWN structure, as ASReml knows the type in other cases. The valid type codes are as follows (columns: code; description; action if !GP is set). V: variance; forced positive. G: variance ratio; forced positive. R: correlation; -1 < r < 1. C: covariance. P: positive correlation; 0 < r < 1. L: loading. This coding also affects whether the parameter is scaled by $\sigma^2$ in the output. 7.6 Variance structure qualifiers. Table 7.4 describes the R and G structure line qualifiers. Table 7.4: List of R and G structure qualifiers (columns: qualifier; action). !=s: used to constrain parameters within variance structures, see Section 7.9. !GP, !GU, !GF: modify the
313. of the analysis is to examine the importance of the treatment effects while accounting for the stratification and restricted randomisation of the treatments to the experimental units. The ASReml input file is presented below. split plot example; blocks 6 # Coded 1...6 in first data field of oats.asd; nitrogen !A 4 # Coded alphabetically; subplots # Coded 1...4; variety !A 3 # Coded alphabetically; wplots # Coded 1...3; yield; oats.asd !SKIP 2; yield ~ mu variety nitrogen variety.nitrogen !r blocks blocks.wplots; predict nitrogen # Print table of predicted nitrogen means; predict variety; predict variety nitrogen !SED. The data fields were blocks, wplots, subplots, variety, nitrogen and yield. The first five variables are factors that describe the stratification or experiment design and treatments. The standard split plot analysis is achieved by fitting the model terms blocks and blocks.wplots as random effects. The blocks.wplots.subplots term is not listed in the model because this interaction corresponds to the experimental units and is automatically included as the residual term. The fixed effects include the main effects of both variety and nitrogen and their interaction. The tables of predicted means and associated standard errors of differences (SEDs) have been requested. These are reported in the .pvs file. Abbreviated output is shown below. Approximate stratum variance decomposition: Stratum, Degrees-Freedom, Variance, Component C
314. of the general linear mixed model. These facilities are not available for any terms in the sparse model. These include facilities for computing two types of Wald statistics and partial implementation of the Kenward and Roger adjustments.
Incremental and Conditional Wald Statistics
The basic tool for inference is the Wald statistic defined in equation (2.17). ASReml produces a test of fixed effects that reduces to an F-statistic in special cases, by dividing the Wald test, constructed with $l = 0$, by $r$, the numerator degrees of freedom. In this form it is possible to perform an approximate F test if we can deduce the denominator degrees of freedom. However, there are several ways $L$ can be defined to construct a test for a particular model term, two of which are available in ASReml. For balanced designs, these Wald F-statistics are numerically identical to the F-tests obtained from the standard analysis of variance. The first method for computing Wald statistics (for each term) is the so-called "incremental" form. For this method, Wald statistics are computed from an incremental sum of squares in the spirit of the approach used in classical regression analysis (see Searle, 1971). For example, if we consider a very simple model with terms relating to the main effects of two qualitative factors A and B, given symbolically by y ~ 1 + A + B, where the 1 represents the constant term ($\mu$), then the incremental sums of squares for this m
315. ogeneous variance models at the end of the table (from DIAG on). In summary, to specify
• a correlation model, provide the base identifier given in Table 7.3, for example EXP .1 is an exponential correlation model;
• a homogeneous variance model, append a V to the base identifier and provide an additional initial value for the variance, for example EXPV .1 .3 is an exponential variance model;
• a heterogeneous variance model, append an H to the base identifier and provide additional initial values for the diagonal variances, for example CORUH .1 .3 .4 .2 is a 3 x 3 matrix with uniform correlations of 0.1 and heterogeneous variances 0.3, 0.4 and 0.2.
Important: See Section 7.7 for rules on combining variance models and important notes regarding initial values.
The algebraic forms of the homogeneous and heterogeneous variance models are determined as follows. Let $C = [C_{ij}]$ denote the correlation matrix for a particular correlation model. If $\Sigma$ is the corresponding homogeneous variance matrix then $\Sigma = \sigma^2 C$. It has just one more parameter than the correlation model. For example, the homogeneous variance model corresponding to the ID correlation model has variance matrix $\Sigma = \sigma^2 I$ (specified IDV in the ASReml command file, see below) and one parameter. The initial values for the variance parameters are listed after the initial values for the correlation parameters. For example, in 7 Command file Specifying variance stru
316. oint
Other output files
  .apj   is an ASReml project file created by ASReml-W
  .aov   contains details of the ANOVA calculations
  .asl   contains a progress log and error messages if the L command line option is specified
  .asp   contains transformed data, see !PRINT in Table 5.2
  .dbr/.dpr/.spr   contains the data and residuals in a binary form for further analysis, see !RESIDUALS in Table 5.5
  .veo   holds the equation order to speed up re-running big jobs when the model is unchanged. This binary file is of no use to the user
  .vrb   contains the estimates of the fixed effects and their variance
  .vvp   contains the approximate variances of the variance parameters. It is designed to be read back with the P option for calculating functions of the variance parameters
An ASReml run generates many files, and the .sln and .yht files in particular are often quite large and could fill up your disk space. You should therefore regularly tidy your working directories, maybe just keeping the .as, .asr and .pvs files.
13.2 An example
In this chapter the ASReml output files are discussed with reference to a two-dimensional separable autoregressive spatial analysis of the NIN field trial data, see model 3b on page 111 of Chapter 7 for details. The ASReml command file for this analysis is presented to the right. [Side listing, partly legible: NIN Alliance Trial 1989 / variety !A / ... / repl 4 / nloc / ...] Recall that this model specifie
317. ological order but chronological order is still required.
!SELF s   allows partial selfing when the third field is unknown. It indicates that progeny from a cross where the second parent (male_parent) is unknown is assumed to be from selfing with probability s and from outcrossing with probability (1 - s). This is appropriate in some forestry tree breeding studies where seed collected from a tree may have been pollinated by the mother tree or pollinated by some other tree (Dutkowski and Gilmour, 2001). Do not use the !SELF qualifier with the !INBRED or !MGS qualifiers.
!SKIP n   allows you to skip n header lines at the top of the file.
!SORT   causes ASReml to sort the pedigree into an acceptable order, that is, parents before offspring, before forming the A-Inverse. The sorted pedigree is written to a file whose name has .srt appended to its name.
9.6 Reading a user defined (inverse) relationship matrix
Sometimes an inverse relationship matrix is required other than the one ASReml can produce from the pedigree file. We call this a GIV (G inverse) matrix. The user can prepare a .giv file containing this matrix and use it in the analysis. Alternatively, the user can prepare the relationship matrix in a .grm file and ASReml will invert it to form the GIV matrix. The syntax for specifying a G matrix file (say name.grm) or the G inverse file (say name.giv) is name.grm !SKIP n or nam
318. olumn 11 nin89aug.asd !skip 1 yield ~ mu variety 2 11 column AR1 424 22 row AR1 904
Important rules
In the ASReml command file
• all characters following a # symbol on a line are ignored,
• all blank lines are ignored,
• lines beginning with ! followed by a blank are copied to the .asr file as comments for the output,
• a blank is the usual separator; TAB is also a separator,
• maximum line length is 2000 characters,
• a comma as the last character on the line is used to indicate that the current list is continued on the next line; a comma is not needed when ASReml knows how many values to read,
• reserved words used in specifying the linear model (Table 6.1) are case sensitive; they need to be typed exactly as defined: they may not be abbreviated,
• a qualifier is a particular letter sequence beginning with an ! which sets an option or changes some aspect of ASReml;
  - some qualifiers require arguments,
  - qualifiers must appear on the correct line,
  - qualifier identifiers are not case sensitive,
  - qualifier identifiers may be truncated to 3 characters.
5.3 Title line
The first 40 characters of the first nonblank line in an ASReml command file are taken as a title for the job (in the running example, NIN Alliance Trial 1989). Use this to identify the analysis for future reference.
5.4 Specifying and reading the data
Important: Typically, a data record consists of all the inf
319. ormation pertaining to an experimental unit (plot, animal, assessment). We distinguish the fields which exist in the data file and are read into ASReml from the fields that are saved in ASReml and are available for analysis. They coincide only if no transformations are performed. Similarly, we sometimes discard some records so that there are fewer records available for analysis than appear in the data file. The data fields to be saved for analysis are defined immediately after the job title. The definitions control how each field in the data file is handled as it is read into ASReml. ASReml deduces how many of them are read from the data file from the associated transformation information (override with the !READ qualifier described in Table 5.5). No more than 1000 variables may be read or formed.
Data field definitions
• should be given for all fields in the data file; fields on the end of a data line without a field definition are ignored; if there are not enough data fields on a data line, the remainder are taken from the next line(s),
• must be presented in the order in which they appear in the data file,
• can appear with other definitions on the same line,
• must be indented one or more spaces.
[Side listing, partly legible: NIN Alliance Trial 1989 / variety !A / id / ... / yield / ... / column 11 / nin89aug.asd !skip 1 / yield ~ mu variety]
Data fields can be transformed (see below); transformation qualifiers should be listed after the
320. ortion of Regression Screen output ISCREEN 3 SMX 3 Source Model terms Gamma Component Comp SE idsize 92 92 0 581102 0 136683 3 31 expt idsize 828 28 0 121231 0 285153E 01 1 12 Variance 504 438 1 00000 0 235214 12 70 Analysis of Variance NumDF DenDF_con F_inc F_con M 113 mu 1 72 4 65452 25 56223 68 2 expt 6 37 6 5 27 0 64 A 4 type 4 63 8 22 95 3 01 A 114 expt type 10 TAA 1 31 0 93 B 23 x20 1 55 1 4 33 2 37 B 24 x21 iL 63 3 1291 0 87 B 25 x23 l 68 3 23 93 0 11 B 26 x39 il TOL 1 85 0 35 B 27 x48 1 69 9 1 58 2 08 B 28 x59 1 49 7 1 41 0 08 The qualifier was 0 024 0 508 0 130 0 355 0 745 oO 556 0 154 0 779 13 Description of output files 193 29 30 31 32 33 34 35 36 37 38 39 129 115 127 128 x60 x61 x62 x64 x65 x66 x70 x71 xTS x75 x91 errrere e erer ah 59 64 61 55 57 58 59 64 59 59 63 Notice The DenDF values are calculated ignoring fixed boundary singular woo o o o oA 2 18 31 48 4 72 1 13 ae 0 08 1 79 0 04 1 44 variance parameters using empirical derivatives mv_estimates idsize expt idsize at expt 6 type idsize meth at expt 7 type idsize meth 9 92 828 10 effects effects effects effects effects LINE REGRESSION RESIDUAL ADJUSTED FACTORS INCLUDED NO DF SUMSQUARES DF MEANSQU R SQUARED R SQUARED 39 38 37 36 35 34 1 3 0 1113D 02 452 0 2460 0 09098 0 08495 2 ft 2 Oo SE FORK 2 3 0 1
321. our simple illustrative example above, with a full factorial linear model given symbolically by y ~ 1 + A + B + A.B, then A and B are said to be marginal to A.B, and 1 is marginal to A and B. In a three-way factorial model given by y ~ 1 + A + B + C + A.B + A.C + B.C + A.B.C, the terms A, B, C, A.B, A.C and B.C are marginal to A.B.C. Nelder (1977, 1994) argues that meaningful and interesting tests for terms in such models can only be conducted for those tests which respect marginality relations. This philosophy underpins the following description of the second Wald statistic available in ASReml, the so-called "conditional" Wald statistic. This method is invoked by placing !FCON on the datafile line. ASReml attempts to construct conditional Wald statistics for each term in the fixed dense linear model so that marginality relations are respected. As a simple example, for the three-way factorial model the conditional Wald statistics would be computed as
  Term    Sums of Squares                                                                      M code
  1       R(1)
  A       R(A | 1,B,C,B.C) = R(1,A,B,C,B.C) - R(1,B,C,B.C)                                     A
  B       R(B | 1,A,C,A.C) = R(1,A,B,C,A.C) - R(1,A,C,A.C)                                     A
  C       R(C | 1,A,B,A.B) = R(1,A,B,C,A.B) - R(1,A,B,A.B)                                     A
  A.B     R(A.B | 1,A,B,C,A.C,B.C) = R(1,A,B,C,A.B,A.C,B.C) - R(1,A,B,C,A.C,B.C)               B
  A.C     R(A.C | 1,A,B,C,A.B,B.C) = R(1,A,B,C,A.B,A.C,B.C) - R(1,A,B,C,A.B,B.C)               B
  B.C     R(B.C | 1,A,B,C,A.B,A.C) = R(1,A,B,C,A.B,A.C,B.C) - R(1,A,B,C,A.B,A.C)               B
  A.B.C   R(A.B.C | 1,A,B,C,A.B,A.C,B.C) = R(1,A,B,C,A.B,A.C,B.C,A
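The marginality relations and the conditioning sets in the table above follow a simple set rule, which can be sketched in Python. This is an illustration only, not how ASReml implements it: each term is treated as a set of factors, a term is taken to be marginal to another when its factors are a proper subset (so the constant 1 is marginal to every other term), and the helper names are hypothetical.

```python
def term_factors(term):
    """Factors in a model term; the constant '1' has none."""
    return frozenset() if term == "1" else frozenset(term.split("."))

def marginal_terms(term, model_terms):
    """Terms whose factors are a proper subset of those in `term`."""
    target = term_factors(term)
    return [t for t in model_terms if term_factors(t) < target]

def conditioning_terms(term, model_terms):
    """Terms retained when testing `term`: those that do not contain it."""
    target = term_factors(term)
    return [t for t in model_terms if not target <= term_factors(t)]

model = ["1", "A", "B", "C", "A.B", "A.C", "B.C", "A.B.C"]
print(marginal_terms("A.B.C", model))    # 1, A, B, C, A.B, A.C, B.C
print(conditioning_terms("A", model))    # 1, B, C, B.C
print(conditioning_terms("A.B", model))  # 1, A, B, C, A.C, B.C
```

The last two calls reproduce the conditioning sets shown for A and A.B in the table.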
322. parameter may only be included in constraints once,
• the P terms refer to positions in the full variance parameter vector. This may change if the model is changed and is often difficult to determine as a number. If it refers to a parameter which is a single traditional variance component associated with a random term, the name of the random term may be given instead of the parameter number. The full parameter vector includes a term for each factor in the model and then a term for each parameter defined in the R and G structures. A list of P numbers and their initial values is returned in the .res file to help you to check the numbers. Alternatively, examine the .asr file from an initial run with !VCC included but no arguments supplied. The job will terminate but ASReml will provide the P values associated with each variance component. Otherwise, the numbers are given in the .res file.
The following are examples.
  ASReml code          action
  5 7 * .1             parameter 7 is a tenth of parameter 5
  5 -7                 parameter 7 is the negative of parameter 5
  32 34 35 37 38 39    for a 4 x 4 US matrix given by parameters 31...40, the covariances are forced to be equal
  units -uni(check)    parameter associated with model term uni(check) has the same magnitude but opposite sign to the parameter associated with model term units
7.10 Model building using the !CONTINUE qualifier
In complex mo
323. pear as both male and female parents, for example in forestry.
Footnote 1: A white paper downloadable from http://www.vsni.co.uk/resources/doc contains details of these options.
9.5 Genetic groups
If all individuals belong to one genetic group, then use 0 as the identity of the parents of base individuals. However, if base individuals belong to various genetic groups, this is indicated by the !GROUP qualifier and the pedigree file must begin by identifying these groups. All base individuals should have group identifiers as parents. In this case the identity 0 will only appear on the group identity lines, as in the following example where three sire lines are fitted as genetic groups.
Command file:
  Genetic group example
   animal !P
   sire 9 !A
   dam
   lines 2
   damage
   adailygain
  harveyg.ped !ALPHA !MAKE !GROUP 3
  harvey.dat
  adailygain ~ mu
  !r animal 02 5 !GU
Pedigree file:
  G1 0 0
  G2 0 0
  G3 0 0
  SIRE_1 G1 G1
  SIRE_2 G1 G1
  SIRE_3 G1 G1
  SIRE_4 G2 G2
  SIRE_5 G2 G2
  SIRE_6 G3 G3
  SIRE_7 G3 G3
  SIRE_8 G3 G3
  SIRE_9 G3 G3
  101 SIRE_1 G1
  102 SIRE_1 G1
  103 SIRE_1 G1
  ...
  163 SIRE_9 G3
  164 SIRE_9 G3
  165 SIRE_9 G3
Important: It is usually appropriate to allocate a genetic group identifier where the parent is unknown.
Table 9.1: List of pedigree file qualifiers
  qualifier   description
  !ALPHA      indicates that the identities are alphanumeric with up to 20 characters; otherwise by default they are numeric whole numbers < 200 00
324. persion. Biometrika 82, 81-91.
Breslow, N. E. and Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88, 9-25.
Browne, W. and Draper, D. (2004). A comparison of Bayesian and likelihood-based methods for fitting multilevel models. Research Report 04-01, Nottingham Statistics Research Report 04-01.
Butler, D. G., Cullis, B. R., Gilmour, A. R. and Gogel, B. J. (2007). Analysis of mixed models for S language environments: ASReml-R reference manual. Technical report, Queensland Department of Primary Industries.
Callens, M. and Croux, C. (2005). Performance of likelihood-based estimation methods for multilevel binary regression models. Technical report, Dept. of Applied Economics, Katholieke Universiteit Leuven.
Cox, D. R. and Hinkley, D. V. (1974). Theoretical Statistics. London: Chapman and Hall.
Cox, D. R. and Snell, E. J. (1981). Applied Statistics: Principles and Examples. London: Chapman and Hall.
Cressie, N. A. C. (1991). Statistics for spatial data. New York: John Wiley and Sons, Inc.
Cullis, B. R. and Gleeson, A. C. (1991). Spatial analysis of field experiments: an extension to two dimensions. Biometrics 47, 1449-1460.
Cullis, B. R., Gleeson, A. C., Lill, W. J., Fisher, J. A. and Read, B. J. (1989). A new procedure for the analysis of early generation variety trials. Applied Statistics 38, 361-375.
Cullis, B. R., Go
325. ponents from this analysis are given in column (b) of Table 15.8. There is no significant variance heterogeneity at the residual or tmt.run level. This indicates that the square root transformation of the data has successfully stabilised the error variance. There is, however, significant variance heterogeneity for tmt.variety interactions, with the variance being much greater for the control group. This reflects the fact that, in the absence of bloodworms, the potential maximum root area is greater. Note that the tmt.variety interaction variance for the treated group is negative. The negative component is meaningful (and in fact necessary, and obtained by use of the !GU option) in this context, since it should be considered as part of the variance structure for the combined variety main effects and treatment by variety interactions. That is,
$$\mathrm{var}(1_2 \otimes u_1 + u_2) = \begin{bmatrix} \sigma_1^2 + \sigma_{2c}^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma_1^2 + \sigma_{2t}^2 \end{bmatrix} \otimes I$$
Using the estimates from Table 15.8, this structure is estimated as
$$\begin{bmatrix} 3.84 & 2.33 \\ 2.33 & 1.96 \end{bmatrix} \otimes I$$
Thus the variance of the variety effects in the control group (also known as the genetic variance for this group) is 3.84. The genetic variance for the treated group is much lower (1.96). The genetic correlation is $2.33/\sqrt{3.84 \times 1.96} = 0.85$, which is strong, supporting earlier indications of the dependence between the treated and control root area (Figure 15.8).
A multivariate approach
In this simple case in which the varianc
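The genetic correlation quoted in this chunk is the covariance divided by the square root of the product of the two genetic variances. A minimal Python check of that arithmetic, using only the three estimates given in the text (the function name is hypothetical):

```python
import math

def genetic_correlation(var_a, var_b, cov_ab):
    """Correlation implied by two variances and their covariance."""
    return cov_ab / math.sqrt(var_a * var_b)

# Estimates quoted in the text: control 3.84, treated 1.96, covariance 2.33
r = genetic_correlation(3.84, 1.96, 2.33)
print(round(r, 2))  # 0.85
```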
326. presents an indicative AOV decomposition for this experiment.
Table 15.2: Rat data: AOV decomposition
  stratum     decomposition   type   df or ne
  constant    1               F      1
  dams        dose            F      2
              littersize      F      1
              dam             R      27
  dams.pups   sex             F      1
              dose.sex        F      2
              error           R
The dose and littersize effects are tested against the residual dam variation, while the remaining effects are tested against the residual within-litter variation. The ASReml input to achieve this analysis is presented below.
  Rats example
   dose 3 !A
   sex 2 !A
   littersize
   dam 27
   pup 18
   weight
  rats.asd !DOPATH 1   # Change DOPATH argument to select each PATH
  !PATH 1
  weight ~ mu littersize dose sex dose.sex !r dam
  !PATH 2
  weight ~ mu out(66) littersize dose sex dose.sex !r dam
  !PATH 3
  weight ~ mu littersize dose sex !r dam
  !PATH 4
  weight ~ mu littersize dose sex
The input file contains an example of the use of the !DOPATH qualifier. Its argument specifies which part to execute. We will discuss the models in the two parts. It also includes the !FCON qualifier to request conditional F-statistics. Abbreviated output from part 1 is presented below.
  1 LogL= 74.2174   S2= 0.19670   315 df
  2 LogL= 79.1579   S2= 0.18751   315 df
  3 LogL= 83.9408   S2= 0.17755   315 df
  4 LogL= 86.8093   S2= 0.16903   315 df
  5 LogL= 87.2249   S2= 0.16594   315 df
  6 LogL= 87.2398   S2= 0.16532   315 df
  7 LogL= 87.2398   S2= 0.16530   315 df
  8 LogL= 87.2398   S2= 0.16530
  Final parameter values
327. r 3 before attempting to code their first job. It presents an overview of basic ASReml coding demonstrated on a real data example. Chapter 15 presents a range of examples to assist users further. When coding your first job, look for an example to use as a template. Data file preparation is described in Chapter 4, while Chapter 5 describes how to input data into ASReml. Chapters 6 and 7 are key chapters which present the syntax for specifying the linear model and the variance models for the random effects in the linear mixed model. Variance modelling is a complex aspect of analysis. We introduce variance modelling in ASReml by example in Chapter 7. Chapters 8 and 9 describe special commands for multivariate and genetic analyses respectively. Chapter 10 deals with prediction of fixed and random effects from the linear mixed model, and Chapter 11 presents the syntax for forming functions of variance components. Chapter 12 discusses the operating system level command for running an ASReml job. Chapter 13 gives a detailed explanation of the output files. Chapter 14 gives an overview of the error messages generated in ASReml and some guidance as to their probable cause. The ASReml help, accessible through ASReml-W, can also be accessed directly (ASReml.chm). Supported users of ASReml may email support@asreml.co.uk for assistance. When requesting help for a job that is not working as you expect, please send the input command file t
328. r may overestimate the size for large ALPHA and INTEGER coded factors so that ASReml reserves enough space for the list. Using !PRUNE will mean the extra (undefined) levels will not appear in the .sln file. Since it is sometimes necessary that factors not be pruned in this way, for example in pedigree/GIV factors, pruning is only done if requested.
Reordering the factor levels
!SORT declared after !A or !I on a field definition line will cause ASReml to sort the levels so that labels occur in alphabetic/numeric order for the analysis. By default, ASReml orders factor levels in the order they appear in the data, so that, for example, the user cannot tell whether SEX will be coded 1=Male, 2=Female or 1=Female, 2=Male without looking at the data file to see whether Male or Female appears first in the SEX field. With the !SORT qualifier, the coding will be 1=Female, 2=Male regardless of which appears first in the file. !SORTALL means that the levels for the current and subsequent factors are to be sorted.
Skipping input fields
!SKIP f will skip f data fields BEFORE reading this field. It is particularly useful in large files with alphabetic fields which are not needed, as it saves ASReml the time required to classify the alphabetic labels. For example, Sire !I !skip 1 would skip the field before the field which is read as 'Sire'.
5.5 Transforming the data
Transformation is the process of modifying the data, for example dividing all of the
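The difference between the default coding (order of first appearance) and sorted coding can be illustrated with a small Python sketch. It mimics the behaviour described in the text and is not ASReml's implementation; the function name `code_levels` is hypothetical.

```python
def code_levels(labels, sort=False):
    """Map factor labels to level numbers 1..n.

    sort=False: levels numbered in order of first appearance (the default
    behaviour described in the text).
    sort=True: levels numbered in alphabetic order (as with sorting requested).
    """
    seen = []
    for label in labels:
        if label not in seen:
            seen.append(label)
    order = sorted(seen) if sort else seen
    return {label: i + 1 for i, label in enumerate(order)}

sex = ["Male", "Female", "Female", "Male"]
print(code_levels(sex))             # {'Male': 1, 'Female': 2}
print(code_levels(sex, sort=True))  # {'Female': 1, 'Male': 2}
```

With sorting, the coding no longer depends on which label happens to come first in the data file.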
329. r so that you can see the estimates. Otherwise check that the dependent values are what you intend and then identify which variables explain it. Again, the !BLUP 1 qualifier might help. A program limit has been breached. Try simplifying the model; use the WORKSPACE option to increase the workspace allocation. It may be possible to revise the models to increase sparsity.
Alphabetical list of error messages and probable cause(s)/remedies
Error messages (in table order): Out of memory forming design; Overflow structure table; Pedigree coding errors; Pedigree factor has wrong size; Pedigree too big; POWER model setup error; POWER Model: Unique points disagree with size; PROGRAM failed in; PROGRAMMING error; reading SELF option.
Probable causes/remedies (in table order): factors are probably not declared properly. Check the number of levels. Possibly use the WORKSPACE command line option. Occurs when space allocated for the structure table is exceeded. There is room for three structures for each model term for which G structures are explicitly declared. The error might occur when ASReml needs to construct rows of the table for structured terms when the user has not formally declared the structures. Increasing g on the variance header line for the number of G structures (see page 117) will increase the space allocated for the table. You will need to add extra explicit declarations also. check the pedigr
330. rait units Or at f n at f fac v fac v y lin f spl v k the constant term or intercept a term to estimate missing values multivariate counterpart to mu forms a factor with a level for each experimental unit placed between labels to specify an interaction forms nested expansion Section 6 5 forms factorial expansion Section 6 5 placed before model terms to ex clude them from the model placed at the end of a line to in dicate that the model specification continues on the next line treated as a space placed around some model terms when it is important the terms not be reordered Section 6 4 condition on level n of factor f n may be a list of values forms conditioning covariables for all levels of factor f forms a factor from v with a level for each unique value in v forms a factor with a level for each combination of values in v and y forms a variable from the factor f with values equal to 1 n cor responding to level 1 level n of the factor forms the design matrix for the ran dom component of a cubic spline for variable v vA J ae S S S lt L So Ke Se US o amp S 6 Command file Specifying the terms in the mixed model 86 Summary of reserved words operators and functions model term brief description common usage fixed random other functions and t 7r c f cos v r ge f giv f n gt f h f
331. rdered levels within traits Lastly we assume that the residual variance matrix is given by Ye Q 17043 Table 15 15 presents the sequence variance models fitted to each of the four random terms sire dam litter and error in the ASReml job Multivariate Sire amp Dam model tag sire 92 II dam 3561 I grp 49 sex brr 4 litter 4871 age wwt mO ywt mO MO identifies missing values giw mO fdm mO fat mO coop fmt DOPATH 1 CONTINUE MAXIT 20 PATH 3 EXTRA 4 PATH wwt ywt gfw fdm fat Trait Tr age Tr brr Tr sex Tr age sex 1 o r Tr sire tf at Tr 1 dam at Tr 2 dam at Tr 3 dam 1J if akiTr 1 it ati 2 lat atr 3 lit atma lit i at Trait 1 age grp 0024 at Trait 2 age grp 0019 at Trait 4 age grp 0020 at Trait 5 age grp 00026 at Trait 1 sex grp 93 at Trait 2 sex grp 16 0 at Trait 3 sex grp 28 at Trait 5 sex grp 1 18 lf Tr grp 23 1 R structure with 2 dimensions and 3 G structures 00 Independent across animals Tr Q US General structure across traits 15 Examples 296 15 0 Tr sire 2 PATH 1 Tr O DIAG 0 608 1 298 0 015 0 197 0 035 PATH 2 Tr O FA1 GP 055 02 5 201 01 O21 0 608 1 298 0 015 0 197 0 035 PATH 3 Tr 0 US 0 6199 0 6939 1 602 Asreml will estimate some starting values Sire effects Initial analysis ignoring genetic correlations Specified diagonal variance structure Initial sire variances Factor Analytic mode
332. re is not scaled identity) by syntax described in detail in Chapter 7; an initial value of its variance (ratio) may be followed by a !GP (keep positive, the default), !GU (unrestricted) or !GF (fixed) qualifier, see Table 7.4.
• use !{ and !} to group model terms that may not be reordered. Normally ASReml will reorder the model terms in the sparse equations, putting smaller terms first, to speed up calculations. However, the order must be preserved if the user defines a structure for a term which also covers the following term(s) (a way of defining a covariance structure across model terms). Grouping is specifically required if the model terms are of differing sizes (number of effects). For example, for traits weaning weight and yearling weight, an animal model with maternal weaning weight should specify model terms !{ Trait.animal at(Trait,1).dam !} when fitting a genetic covariance between the direct and maternal effects.
6.5 Interactions and conditional factors
Interactions
• interactions are formed by joining two or more terms with a '.' or a ':', for example a.b is the interaction of factors a and b,
• interaction levels are arranged with the levels of the second factor nested within the levels of the first,
• labels of factors including interactions are restricted to 31 characters, of which only the first 20 are ever displayed. Thus for interaction t
333. re lines that a user would normally be required to work out and type into the .as file (see the example of Section 15.6) are written to the .res file. The user may then cut and paste them into the .as file for a later run if the structures need to be modified.
  Basic multi-environment trial analysis
   site 5        # sites coded 1 ... 5
   column        # columns coded 1 ...
   row           # rows coded 1 ...
   variety !A    # variety names
   yield
  met.dat !SECTION site !ROWFAC row !COLFAC col
  yield ~ site !r variety site.variety !f mv
  site 2 0       # variance header line
  # asreml inserts the 10 lines required to define
  # the R structure lines for the five sites here
List of occasionally used job control qualifiers
  qualifier   action
  !SPLINE spl(v,n) p   defines a spline model term with an explicit set of knot points. The basic form of the spline model term, spl(v), is defined in Table 6.1, where v is the underlying variate. The basic form uses the unique data values as the knot points. The extended form is spl(v,n), which uses n knot points. Use this !SPLINE qualifier to supply an explicit set of n knot points (p) for the model term t. Using the extended form without using this qualifier results in n equally spaced knot points being used. The !SPLINE qualifier may only be used on a line by itself, after the datafile line and before the model line.
  (!STEP r and !SUBSET t v p are listed later in this table.)
When knot points are explicitly supplied th
334. rent term is absorbed For example variety nitrogen initially has 12 degrees of freedom non singular ef fects mu takes 1 variety then takes 2 linNitr takes 1 nitrogen takes 2 variety linNitr takes 2 and there are four degrees of freedom left This infor mation is used to make sure that the conditional F statistic does not contradict marginality principles The next table indicates the details of the conditional F statistic The conditional F statistic is based in the reduction in Sums of Squares from dropping the par ticular term indicated by from the model also including the terms indicated by I C and c The next two tables based on incremental and conditional sums of squares report the model term the number of effects in the term the numerator degrees of freedom the F statistic an adjusted F statistic multiplied by a scaling constant reported in the next column and finally the computed denominator degrees of freedom The scaling constant is discussed by Kenward and Roger 1997 Table showing the reduction in the numerator degrees of freedom for each term as higher terms are absorbed Model Term 6 5 43 2 1 1 mu 12 3 4 28 1 2 variety 1 3 3 1 2 3 LinNitr 23 s f 4 nitrogen g 2 2 5 variety LinNitr 6 2 6 variety nitrogen 4 Marginality pattern for F con calculation Model terms Model Term DF 12 3 4 5 6 1 mu dg g 2 variety 2 I 3 LinNitr io I foe g 4 nitrogen 2 III 5 variety LinNitr 2 i t
335. response variable is used with multivariate data to fit the individual trait means. It is formally equivalent to mu but Trait is a more natural label for use with multivariate data. It is interacted with other factors to estimate their effects for all traits.
creates a factor with a level for every record in the data file. This is used to fit the "nugget" variance when a correlation structure is applied to the residual.
creates a factor with a new level whenever there is a level present for the factor f. Levels (effects) are not created if the level of factor f is 0, missing or negative. The size may be set in the third argument by setting the second argument to zero.
Alphabetic list of model functions and descriptions
  model function   action
  uni(f,k,n)   creates a factor with a level for every record subject to the factor level of f equalling k, i.e. a new level is created for the factor whenever a new record is encountered whose integer truncated data value from data field f is k. Thus uni(site,2) would be used to create an independent error term for site 2 in a multi-environment trial and is equivalent to at(site,2).units. The default size of this model term is the number of data records. The user may specify a lower number as the third argument. There is little computational penalty from the default but the .sln file may be substantially larger than needed.
  xfa(f,k
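The description of the uni function with a level argument k can be mimicked in a few lines of Python. This is a sketch of the stated behaviour only (a new level for each record whose integer-truncated value of f equals k, and no level otherwise); using 0 to mean "no level" and the Python function name are assumptions made for illustration.

```python
def uni(field_values, k):
    """Give each record whose truncated value equals k its own new level.

    Records that do not match get 0, used here to mean 'no level'.
    """
    levels = []
    next_level = 0
    for value in field_values:
        if int(value) == k:
            next_level += 1
            levels.append(next_level)
        else:
            levels.append(0)
    return levels

site = [1, 2, 2, 3, 2]
print(uni(site, 2))  # [0, 1, 2, 0, 3]
```

Each record at site 2 receives its own level, which is what allows an independent error term for that site.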
336. rgument action examples
Qualifiers in this part of the table: !SEED, !SET vlist, !SETN v n, !SETU v n, !SUB vlist, !SEQ, !UNIFORM.
  !SEED        sets the seed for the random number generator.
  !SET vlist   for vlist, a list of n values, the data values 1...n are replaced by the corresponding element from vlist; data values that are < 1 or > n are replaced by zero. vlist may run over several lines provided each incomplete line ends with a comma, i.e. a comma is used as a continuation symbol (see Other examples below).
  !SETN v n    replaces data values 1:n with normal random variables having variance v. Data values outside the range 1...n are set to 0.
  !SETU v n    replaces data values 1:n with uniform random variables having range 0:v. Data values outside the range 1...n are set to 0.
  !SUB vlist   replaces data values = v_i with their index i, where vlist is a vector of n values. Data values not found in vlist are set to 0. vlist may run over several lines if necessary provided each incomplete line ends with a comma. ASReml allows for a small rounding error when matching. It may not distinguish properly if values in vlist only differ in the sixth decimal place (see Other examples below).
  !SEQ         replaces the data values with a sequential number starting at 1 which increments whenever the data value changes between successive records; the current field is presumed to define a factor and the number of levels in the new factor is set to the number of levels identified in this sequent
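The three deterministic transformations described above (replace by list element, replace by index in a list, and sequence numbering) can be sketched in Python to make the rules concrete. These functions only mimic the descriptions in the table; the names and the matching tolerance `tol` are assumptions, chosen to reflect the note that values differing only in the sixth decimal place may not be distinguished.

```python
def set_transform(values, vlist):
    """Values 1..n become vlist[value-1]; anything else becomes 0."""
    n = len(vlist)
    return [vlist[int(v) - 1] if 1 <= v <= n else 0 for v in values]

def sub_transform(values, vlist, tol=1e-5):
    """Values found in vlist become their 1-based index; others become 0."""
    out = []
    for v in values:
        index = 0
        for i, candidate in enumerate(vlist, start=1):
            if abs(v - candidate) <= tol:  # tolerance is an assumption
                index = i
                break
        out.append(index)
    return out

def seq_transform(values):
    """Sequential number from 1, incremented whenever the value changes."""
    out, level, previous = [], 0, object()
    for v in values:
        if v != previous:
            level += 1
            previous = v
        out.append(level)
    return out

print(set_transform([1, 3, 0, 4], [10, 20, 30]))  # [10, 30, 0, 0]
print(sub_transform([5, 10, 7], [5, 10]))         # [1, 2, 0]
print(seq_transform([3, 3, 5, 5, 3]))             # [1, 1, 2, 2, 3]
```

Note that the sequence transformation counts runs, so a value that reappears later (3 in the last example) starts a new level.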
337. rial was an S1 (early stage) wheat variety evaluation trial and consisted of 525 test lines which were randomly assigned to plots in a 67 by 10 array. There was a check plot variety every 6 plots within each column. That is, the check variety was sown on rows 1, 7, 13, ..., 67 of each column. This variety was numbered 526. A further 6 replicated commercially available varieties (numbered 527 to 532) were also randomly assigned to plots, with between 3 and 5 plots of each. The aim of these trials is to identify and retain the top (say 20%) of lines for further testing. Cullis et al. (1989) considered the analysis of early generation variety trials, and presented a one-dimensional spatial analysis which was an extension of the approach developed by Gleeson and Cullis (1987). The test line effects are assumed random, while the check variety effects are considered fixed. This may not be sensible or justifiable for most trials and can lead to inconsistent comparisons between check varieties and test lines. Given the large amount of replication afforded to check varieties, there will be very little shrinkage irrespective of the realised heritability. We consider an initial analysis with spatial correlation in one direction and fitting the variety effects (check, replicated and unreplicated lines) as random. We present three further spatial models for comparison. The ASReml input file is
  Tullibigeal trial
   linenum
   yield
   weed
   column 10
   row 67
   variety 532
   testlin
338. ributions for the REML estimates derived from asymptotic results. It can be shown that the approximate variance matrix for the REML estimates is given by the inverse of the expected information matrix (Cox and Hinkley, 1974, section 4.8). Since this matrix is not available in ASReml we replace the expected information matrix by the AI matrix. Furthermore the REML estimates are consistent and asymptotically normal, though in small samples this approximation appears to be unreliable (see later).

A general method for comparing the fit of nested models fitted by REML is the REML likelihood ratio test, or REMLRT. The REMLRT is only valid if the fixed effects are the same for both models. In ASReml this requires not only the same fixed effects model, but also the same parameterisation. If $\ell_{R2}$ is the REML log likelihood of the more general model and $\ell_{R1}$ is the REML log likelihood of the restricted model (that is, the REML log likelihood under the null hypothesis), then the REMLRT is given by

$$D = 2\log(\ell_{R2}/\ell_{R1}) = 2\left[\log(\ell_{R2}) - \log(\ell_{R1})\right] \qquad (2.14)$$

which is strictly positive. If $r_i$ is the number of parameters estimated in model $i$, then the asymptotic distribution of the REMLRT, under the restricted model, is $\chi^2_{r_2 - r_1}$.

The REMLRT is implicitly two sided and must be adjusted when the test involves an hypothesis with the parameter on the boundary of the parameter space. In fact theoretically it can be shown that for a single variance compo
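As a minimal sketch of the test just described, the Python below computes the REMLRT statistic and a p-value for the case of one extra parameter. It assumes the inputs are the REML log likelihood (LogL) values reported for the two fits, so that the statistic is twice their difference. The function names are hypothetical. The `boundary=True` option halves the p-value, which is the usual adjustment for a single variance component tested on the boundary (a 50:50 mixture of chi-square distributions on 0 and 1 degrees of freedom); that adjustment is stated here as a standard result, not taken from the truncated text above.

```python
import math


def remlrt(logl_general, logl_restricted):
    """REMLRT statistic D = 2 * (LogL of general model - LogL of restricted model)."""
    return 2.0 * (logl_general - logl_restricted)


def chi2_sf_1df(d):
    """Upper tail probability of a chi-square variable on 1 degree of freedom."""
    return math.erfc(math.sqrt(d / 2.0))


def remlrt_pvalue_1df(d, boundary=False):
    """p-value for one extra parameter; halved when the null value is on the boundary."""
    p = chi2_sf_1df(d)
    return 0.5 * p if boundary else p


d = remlrt(-100.0, -102.0)  # illustrative LogL values, not taken from any real fit
print(d, remlrt_pvalue_1df(d), remlrt_pvalue_1df(d, boundary=True))
```

Both fits must use the same fixed effects model and parameterisation for the comparison to be meaningful, as noted above.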
339. rmat of an ASReml data file is to have the data arranged in columns (fields) with a single line for each sampling unit. The columns contain variates and covariates (numeric), factors (alphanumeric), traits (response variables) and weight variables, in any order that is convenient to the user. The data file may be free format, fixed format or a binary file.

Free format data files

The data are read free format (SPACE, COMMA or TAB separated) unless the file name has extension .bin for real binary, or .dbl for double precision binary (see below). Important points to note are as follows:

• files prepared in Excel must be converted to comma or tab delimited form,
• blank lines are ignored,
• column headings, field labels or comments may be present at the top of the file provided that the skip qualifier (Table 5.2) is used to skip over them,
• NA, * and . are treated as coding for missing values in free format data files; if missing values are coded with a unique data value (for example, 0 or 9), use M to flag them as missing or D to drop the data record containing them (see Table 5.1),
• comma delimited files whose file name ends in .csv, or for which the CSV qualifier is set, recognise empty fields as missing values:
  - a line beginning with a comma implies a preceding missing value,
  - consecutive commas imply a missing value,
  - a line ending with a comma implies a trailing missing value,
• if
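The comma rules for .csv files listed above amount to treating every empty field as missing. A short Python sketch, with a hypothetical function name and with `None` standing in for a missing value, shows how one record would be interpreted under those rules; it is not how ASReml itself reads the file.

```python
def parse_csv_record(line):
    """Split one comma-delimited record, mapping empty fields and NA to None."""
    fields = line.rstrip("\n").split(",")
    return [None if f.strip() in ("", "NA") else f.strip() for f in fields]


# leading comma, consecutive commas and trailing comma each imply a missing value
print(parse_csv_record(",1.5,,NA,7,"))
```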
340. rmed as an extra process in the final iteration and are reported to the .pvs file. Consequently, aborting a run by creating the ABORTASR.NOW file (see page 65) will cause any predict statements to be ignored.

By default, factors are predicted at each level, simple covariates are predicted at their overall mean, and covariates used as a basis for splines or orthogonal polynomials are predicted at their design points. Covariates grouped into a single term (using the G qualifier, page 49) are treated as covariates. Model terms mv and units are always ignored.

Prediction at particular values of a covariate or particular levels of a factor is achieved by listing the values after the variate/factor name. Where there is a sequence of values, use the notation a b ... n to represent the sequence of values from a to n with step size b - a. The default step size is 1, in which case b may be omitted. A colon may replace the ellipsis. An increasing sequence is assumed. When giving particular values for factors, the default is to use the coded level (1...n) rather than the label (alphabetical or integer). To use the label, precede it with a quote.

The second step is to specify the averaging set. The default averaging set is those explanatory variables involved in fixed effect model terms that are not in the classifying set. By default, variables that only define random model terms are ig
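The sequence notation described above (values from a to n with step size b - a, default step 1) can be expanded as in the following Python sketch. The function name and argument order are assumptions made for illustration only.

```python
import math


def expand_sequence(a, n, b=None):
    """Expand the notation 'a b ... n': from a to n in steps of b - a (step 1 if b omitted)."""
    step = 1 if b is None else b - a
    if step <= 0 or n < a:
        raise ValueError("an increasing sequence is assumed")
    # small tolerance guards against floating point error in the count
    count = int(math.floor((n - a) / step + 1e-9)) + 1
    return [a + i * step for i in range(count)]


print(expand_sequence(1, 5))       # 1 ... 5
print(expand_sequence(0, 10, 5))   # 0 5 ... 10
```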
341. roach for the analysis of generalized linear mixed models. Statistica Neerlandica 48(1), 1-22.

Fischer, T. M., Gilmour, A. R. and van der Werf, J. (2004). Computing approximate standard errors for genetic parameters derived from random regression models fitted by average information REML. Genetics Selection Evolution 36(3), 363-369.

Gilmour, A. R., Anderson, R. D. and Rae, A. L. (1985). The analysis of binomial data by a generalised linear mixed model. Biometrika 72, 593-599.

Gilmour, A. R., Cullis, B. R. and Verbyla, A. P. (1997). Accounting for natural and extraneous variation in the analysis of field experiments. Journal of Agricultural, Biological, and Environmental Statistics 2, 269-293.

Gilmour, A. R., Cullis, B. R., Welham, S. J., Gogel, B. J. and Thompson, R. (2004). An efficient computing strategy for prediction in mixed linear models. Computational Statistics and Data Analysis 44, 571-586.

Gilmour, A. R., Thompson, R. and Cullis, B. R. (1995). AI, an efficient algorithm for REML estimation in linear mixed models. Biometrics 51, 1440-1450.

Gleeson, A. C. and Cullis, B. R. (1987). Residual maximum likelihood (REML) estimation of a neighbour model for field experiments. Biometrics 43, 277-288.

Gogel, B. J. (1997). Spatial analysis of multi-environment variety trials. PhD thesis, Department of Statistics, University of Adelaide, South Australia.

Goldstein, H. and Rasbash, J. (1996). Improved approxim
342. ror variance and the parameter corresponding to the variance model speci fied see 3a on page 111 ASReml does not have an implicit scale parameter for G structures that are defined explicitly For this reason the model supplied when the G structure involves just one variance model must be a variance model an initial value must be supplied for this associated scale parameter this is discussed under additional_initial_values on page 119 when the G structure involves more than one variance model one must be either a homogeneous or a heterogeneous variance model and the rest should be correlation models if more than one are non correlation models then the IGF qualifier should be used to avoid identifiability problems that is ASReml trying to estimate both parameters when they are confounded 7 8 G structures involving more than one random term The usual case is that a variance structure applies to a particular term in the linear model and that there is no covariance between model terms Sometimes it is appropriate to include a covariance Then it is essential that the model terms be listed together and that the variance structure defined for the first term be the structure required for both terms When the terms are of different size the terms must be linked together with the and qualifiers Table 6 1 While ASReml 7 Command file Specifying variance structures 136 will check the overall size it does not check that the order
343. rtxlab specifies that vertical annotation be used on the x axis (default is horizontal).

List of predict plot options (option, action)

abbrdlab n
    specifies that the labels used for the data be abbreviated to n characters.
abbrxlab n
    specifies that the labels used for the x axis annotation be abbreviated to n characters.
abbrslab n
    specifies that the labels used for superimposed factors be abbreviated to n characters.

Complicated weighting

Generally, when forming a prediction table, it is necessary to average over (or ignore) some potential dimensions of the prediction table. By default, ASReml uses equal weights (1/f for a factor with f levels). More complicated weighting is achieved by using the AVERAGE qualifier to set specific (unequal) weights for each level of a factor. However, sometimes the weights to be used need to be defined with respect to two or more factors. The simplest case is when there are missing cells and weighting is equal for those cells in a multiway table that are present; this is achieved by using the PRESENT qualifier. This is now further generalized by allowing the user to fill in the weights for use by the PRESENT machinery via the PRWTS qualifier. The user specifies the factors in the table of weights with the PRESENT statement and then gives the table of weights using the PRWTS qualifier. There may only be one
344. s a general or unstructured variance matrix.

Direct products in G structures

Likewise, the random terms in u in the model may have a direct product variance structure. For example, for a field trial with s sites, g varieties and the effects ordered varieties within sites, the model term site.variety may have the variance structure $\Sigma \otimes I$, where $\Sigma$ is the variance matrix for sites. This would imply that the varieties are independent random effects within each site, have different variances at each site, and are correlated across sites.

Important: Whenever a random term is formed as the interaction of two factors, you should consider whether the IID assumption is sufficient or if a direct product structure might be more appropriate.

Variance structures for the errors: R structures

The vector e will in some situations be a series of vectors indexed by a factor or factors. The convention we adopt is to refer to these as sections. Thus $e = [e_1', e_2', \ldots, e_s']'$ and the $e_j$ represent the errors of sections of the data. For example, these sections may represent different experiments in a multi-environment trial (MET), or different trials in a meta analysis. It is assumed that R is the direct sum of s matrices $R_j$, $j = 1 \ldots s$, that is

$$R = \oplus_{j=1}^{s} R_j = \begin{bmatrix} R_1 & 0 & \cdots & 0 & 0 \\ 0 & R_2 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & R_{s-1} & 0 \\ 0 & 0 & \cdots & 0 & R_s \end{bmatrix}$$

so that each section has its own variance structure which is assumed to be independent of the structures in other sect
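The direct sum used above for the R structure simply places each section's matrix on the diagonal of a larger matrix, with zeros elsewhere. A minimal pure-Python sketch follows; the function name and the small example matrices are illustrative assumptions, not values from any trial.

```python
def direct_sum(blocks):
    """Return the block-diagonal matrix with the given square matrices on the diagonal."""
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, value in enumerate(row):
                out[offset + i][offset + j] = value
        offset += len(b)
    return out


# two sections: a 2 x 2 structure and a 1 x 1 structure
r1 = [[1.0, 0.5], [0.5, 1.0]]
r2 = [[2.0]]
for row in direct_sum([r1, r2]):
    print(row)
```

The zero off-diagonal blocks express the assumption that errors in different sections are independent.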
345. s a sep long arable autoregressive correlation structure for row 22 residual or plot errors that is the direct prod column 11 i 1 i I uct of an autoregressive correlation matrix of 742892 as skip 1 DISPLAY 15 order 22 for rows and an autoregressive corre yield mu variety f mv lation matrix of order 11 for columns In this predict variety case 0 5 is the starting correlation for both t 2 row row AR1 0 5 columns and rows column column AR1 0 5 13 3 Key output files The key ASReml output files are the asr sln and yht files The asr file This file contains e a general announcements box outlined in asterisks containing current mes sages e a summary of the data to validate the specification of the model e asummary of the fitting process to check convergence e asummary of the variance parameters The Gamma column reports the actual parameter fitted the Component column reports the gamma converted to a variance scale if appropriate 13 Description of output files 191 version amp title date workspace notices data summary Comp SE is the ratio of the component relative to the square root of the diagonal element of the inverse of the average information matrix Warning Comp SE should not be used for formal testing The shows the percentage change in the parameter at the last iteration use the pin file described Chapter 11 to calculate meaningfu
346. s file type to eps is used to set a grouping variable for plotting see X 5 Command file Reading the data 69 List of occasionally used job control qualifiers action qualifier IGKRIGE p New HPGL 2 JOIN IMBF mbf v n f SKIP k New MVINCLUDE MVREMOVE NODISPLAY IPS controls the expansion of PVAL lists for fac X Y model terms For kriging prediction in 2 dimensions X Y the user will typically want to predict at a grid of values not necessarily just at data combinations The values at which the prediction is required can be specified separately for X and Y using two PVAL statements Normally predict points will be defined for all combinations of X and Y values This qualifier is required with optional argument 1 to specify the lists are to be taken in parallel The lists must be the same length if to be taken in parallel Be aware that adding two dimensional prediction points is likely to substantially slow iterations because the variance structure is dense and becomes larger For this reason AS Reml will ignore the extra PVAL points unless either FINAL or GKRIGE are set to save processing time sets hardcopy graphics file type to HP GL An argument of 2 sets the hardcopy graphics file type to HP GL 2 is used to join lines in plots see X specified on a separate line after the datafile line predefines the model term mbf v n as a set of n covariates indexed by the data
347. s power 0 00 4 81 Antedependence order 1 4 14 3 96 Unstructured 1 71 4 46

15.6 Spatial analysis of a field experiment - Barley

In this section we illustrate the ASReml syntax for performing spatial and incomplete block analysis of a field experiment. There has been a large amount of interest in developing techniques for the analysis of spatial data, both in the context of field experiments and geostatistical data (see, for example, Cullis and Gleeson, 1991; Cressie, 1991; Gilmour et al., 1997). This example illustrates the analysis of so-called regular spatial data, in which the data is observed on a lattice or regular grid. This is typical of most small plot designed field experiments. Spatial data is often irregularly spaced, either by design or because of the observational nature of the study. The techniques we present in the following can be extended for the analysis of irregularly spaced spatial data, though larger spatial data sets may be computationally challenging, depending on the degree of irregularity or models fitted.

The data we consider is taken from Gilmour et al. (1995) and involves a field experiment designed to compare the performance of 25 varieties of barley. The experiment was conducted at Slate Hall Farm, UK, in 1976 and was designed as a balanced lattice square with replicates laid out as shown in Table 15.6. The data fields were Rep, RowBlk, ColBlk, row, column and yield. Lattice row and
348. siduals versus fitted values (see the .res file), and the appropriate record numbers for the out term are reported in the .res file. Note that i relates to the data analysed and will not be the same as the record number as obtained by counting data lines in the data file if there were missing observations in the data and they have not been estimated. To drop records based on the record number in the data file, use the D transformation in association with the V0 transformation.

pol(v,n) forms a set of orthogonal polynomials of order n based on the unique values in variate (or factor) v and any additional interpolated points (see PPOINTS and PVAL in Table 5.4). It includes the intercept if n is positive and omits it if n is negative. For example, pol(time,2) forms a design matrix with three columns of the orthogonal polynomial of degree 2 from the variable time. Alternatively, pol(time,-2) is a term with two columns, having centred and scaled linear coefficients in the first column and centred and scaled quadratic coefficients in the second column. The actual values of the coefficients are written to the .res file. This factor could be interacted with a design factor to fit random regression models. The leg function differs from the pol function in the way the quadratic and higher polynomials are calculated.

Alphabetic list of model functions and descriptions model funct
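To make the idea of orthogonal polynomial columns concrete, the Python sketch below builds them from the unique values of a variate by Gram-Schmidt orthogonalisation of centred powers. It is an illustration only: the function name is hypothetical, each column is scaled to unit length, and ASReml's own centring and scaling (as written to the .res file) may differ. The `intercept` flag mimics the positive/negative order convention described above.

```python
import math


def orthogonal_polynomials(values, degree, intercept=True):
    """Orthonormal polynomial columns up to `degree` on the unique values of a variate."""
    xs = sorted(set(values))
    if degree >= len(xs):
        raise ValueError("degree must be smaller than the number of unique values")
    mean = sum(xs) / len(xs)
    centred = [x - mean for x in xs]
    basis = []
    for d in range(degree + 1):
        col = [c ** d for c in centred]
        for b in basis:  # remove the components along the lower-order columns
            proj = sum(ci * bi for ci, bi in zip(col, b))
            col = [ci - proj * bi for ci, bi in zip(col, b)]
        norm = math.sqrt(sum(ci * ci for ci in col))
        basis.append([ci / norm for ci in col])
    return xs, (basis if intercept else basis[1:])


xs, cols = orthogonal_polynomials([1, 2, 3, 4, 5, 3], 2)         # three columns
_, cols_no_int = orthogonal_polynomials([1, 2, 3, 4, 5, 3], 2, intercept=False)  # two columns
print(len(cols), len(cols_no_int))
```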
349. sire UnStruct 1 1.14422 1.14422 1.94 0 U
Tr.sire UnStruct 2 0.132847 0.132847 1.88 0 U

F phenvar 1:3 + 4:6
F addvar 4:6 * 4
H heritA 10 7
H heritB 12 9
R phencorr 7 8 9
R gencor 4:6

Numbering the parameters reported in bsiremod.asr (and bsiremod.vvp)

1 error variance for ywt
2 error covariance for ywt and fat
3 error variance for fat
4 sire variance component for ywt
5 sire covariance for ywt and fat
6 sire variance for fat

then

F phenvar 1:3 + 4:6 creates new components 7 = 1 + 4, 8 = 2 + 5 and 9 = 3 + 6,
F addvar 4:6 * 4 creates new components 10 = 4 × 4, 11 = 5 × 4 and 12 = 6 × 4,
H heritA 10 7 forms 10 / 7 to give the heritability for ywt,
H heritB 12 9 forms 12 / 9 to give the heritability for fat,
R phencorr 7 8 9 forms 8 / sqrt(7 × 9), that is, the phenotypic correlation between ywt and fat,
R gencorr 4:6 forms 5 / sqrt(4 × 6), that is, the genetic correlation between ywt and fat.

The result is

 7 phenvar 1      42.75     6.297
 8 phenvar 2      3.995     0.6761
 9 phenvar 3      1.848     0.1183
10 addvar 4       66.10     24.58
11 addvar 5       4.577     2.354
12 addvar 6       0.5314    0.2831
h2ywt    = addvar 10 / phenvar 7                     = 1.5465   0.3574
h2fat    = addvar 12 / phenvar 9                     = 0.2875   0.1430
phencorr = phenvar / SQR[phenvar * phenvar]          = 0.4495   0.0483
gencor 2 1 = Tr.si 5 / SQR[Tr.si 4 * Tr.si 6]        = 0.7722   0.1537

12 Command file Running the job Introduction The command line Normal run Processing a p
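The ratios and correlations formed by the H and R functions above can be checked by hand from the reported components. The Python sketch below does this for the rounded values of components 7 to 12 as printed in the output; the helper names are hypothetical. Only the point estimates are reproduced: the standard errors need the variance matrix of the estimates, which is not shown here. The genetic correlation is computed from components 10 to 12 rather than 4 to 6, which gives the same value because the common factor of 4 cancels.

```python
import math

# component values as printed in the output above (rounded)
components = {7: 42.75, 8: 3.995, 9: 1.848, 10: 66.10, 11: 4.577, 12: 0.5314}


def ratio(numerator, denominator):
    """H function: ratio of two components, e.g. a heritability."""
    return components[numerator] / components[denominator]


def correlation(var1, cov, var2):
    """R function: covariance divided by the square root of the product of variances."""
    return components[cov] / math.sqrt(components[var1] * components[var2])


herit_a = ratio(10, 7)             # heritability for ywt
herit_b = ratio(12, 9)             # heritability for fat
phencorr = correlation(7, 8, 9)    # phenotypic correlation
gencorr = correlation(10, 11, 12)  # genetic correlation (factor 4 cancels)
print(round(herit_a, 3), round(herit_b, 3), round(phencorr, 3), round(gencorr, 3))
```

Small differences from the reported values in the last decimal place are expected because the inputs here are rounded.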
350. sparse terms but by the user for the dense terms. They should be specified with main effects before interactions so that the ANOVA table has correct marginalization. Since ASReml processes the dense terms from the bottom up, the first level (the last level processed) is often singular. The number of singularities is reported in the .asr file immediately prior to the REML log likelihood (LogL) line for that iteration (see Section 13.3). The effects (and associated standard or prediction error) which correspond to these singularities are zero in the .sln file.

Warning: Singularities in the sparse_fixed terms of the model may change with changes in the random terms included in the model. If this happens, it will mean that changes in the REML log likelihood are not valid for testing the changes made to the random model. This situation is not easily detected, as the only evidence will be in the .sln file where different fixed effects are singular. A likelihood ratio test is not valid if the fixed model has changed.

Examples of aliassing

The sequence of models in Table 6.5 is presented to facilitate an understanding of over-parameterised models. It is assumed that var is a factor with 4 levels, trt with 3 levels and rep with 3 levels, and that all var.trt combinations are present in the data.

Table 6.5 Examples of aliassing in ASReml model number of order of fitting singulariti
351. sr file 215 14 4 Anexample 0 0 0000 rea Eaa ee 217 14 5 Information Warning and Error messages 228 Contents xiv 15 Examples 241 15 1 Introduction 2 0 0 00 00 ee 242 15 2 Split plot design Oats 2 0 00 00 000000020 we 242 15 3 Unbalanced nested design Rats 00 246 15 4 Source of variability in unbalanced data Volts 0 250 15 5 Balanced repeated measures Height 253 15 6 Spatial analysis of a field experiment Barley 261 15 7 Unreplicated early generation variety trial Wheat 267 15 8 Paired Case Control study Rice 2 20 4 272 Standard analysis 2 a a 273 A multivariate approach aaa a 00002020 278 Interpretation of results a oa aa a a a 282 15 9 Balanced longitudinal data Random coefficients and cubic smoothing splines Oranges ooa 284 15 10Multivariate animal genetics data Sheep oaoa 292 Half sib analysis 2 a 293 Animal mod le iwini a Sead e ee ee Ee a a et A 302 Bibliography 305 Index 311 List of Tables 2 1 3 1 5 1 5 2 5 3 5 4 5 5 5 6 6 1 6 2 6 3 6 4 6 5 7 1 7 2 Combination of models for G and R structures 16 Trial layout and allocation of varieties to plots in the NIN field trial 29 List of transformation qualifiers and their actions with examples 53 Qualifiers relating to da
352. structures variance models for error as specified in the variance header line e G structure header and definition lines define the G structures variance models for the additional random terms in the model as specified in the variance header line these lines are always placed after any R structure definition lines variance parameter constraints are included if parameter constraints are to be imposed see the VCC c qualifier in Table 5 5 and Section 7 9 on constraints between and within variance structures A schematic outline of the variance model specification lines variance header line and R and G structure definition lines is presented in Table 7 2 using the variance model of 4 for demonstration 7 Command file Specifying variance structures 116 Table 7 2 Schematic outline of variance model specification in ASReml general syntax model 4 variance header line s e g 121 R structure definition lines S1 C1 11 column AR1 0 3 C_2 22 row AR1 0 3 C_c S 2 C1 C_c Ss C1 C_c G structure definition lines G1 repl 1 4 0 IDV 0 1 G 2 7 Command file Specifying variance structures 117 See Table 7 3 See Section 7 7 Variance header line The variance header line is of the form s fe all NIN Alliance Trial 1989 variety A id s and c relate to the R structures gis the number of G structures row 22 the variance header line may be omitted column 11 i i i if the default IID
353. t 4 VxN 12 Var 1 4 4 Nit YA V98 YA NA O YB V99 YB NA O V98 DO

replaces data values of 0.5, 1.5 and 2.5 with 1, 2 and 3 respectively; a data value of 1.51 would be replaced by 0 since it is not in the list or very close to a number in the list.

in the case where there are multiple units per plot, contiguous plots have different treatments and the records are sorted units within plots within blocks, this code generates a plot factor assuming a new plot whenever the code in V2 (variety) changes; no field would be read unless there were later definitions which are not created by transformation.

assuming Var is coded 1:3 and Nit is coded 1:4, this syntax could be used to create a new factor VxN with the 12 levels of the composite Var by Nit factor.

will discard records where both YA and YB have missing values, assuming neither have zero as valid data. The first line sets the focus to variable 98, copies YA into V98 and changes any missing values in V98 to zero. The second line sets the focus to variable 99, copies YB into V99 and changes any missing values in V99 to zero. It then adds V98 and discards the whole record if the result is zero, i.e. both YA and YB have missing values for that record. Variables 98 and 99 are not labelled and so are not retained for subsequent use in analysis.

Special note on covariates

Covariates are variates that appear as independent variables in the model. It is r
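Two of the examples above lend themselves to a short Python sketch. The composite factor is coded here as (Var - 1) × 4 + Nit, which is one natural way to obtain 12 distinct levels from Var coded 1:3 and Nit coded 1:4; that formula is an assumption for illustration, as are the function names. The record filter follows the description directly: missing values are set to zero, the two values are added, and the record is dropped when the sum is zero.

```python
def composite_level(var, nit, nit_levels=4):
    """Level of the composite Var by Nit factor, assuming (Var - 1) * 4 + Nit coding."""
    return (var - 1) * nit_levels + nit


def keep_record(ya, yb):
    """False when both responses are missing (None), following the add-and-test logic.

    As in the text, this assumes zero is not a valid data value.
    """
    a = 0 if ya is None else ya
    b = 0 if yb is None else yb
    return (a + b) != 0


levels = [composite_level(v, n) for v in range(1, 4) for n in range(1, 5)]
print(levels)
print(keep_record(None, None), keep_record(1.2, None))
```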
354. t Lf 6 variety nitrogen 4 I I I I I

F inc tests the additional variation explained when the term is added to a model consisting of the I terms. F con tests the additional variation explained when the term is added to a model consisting of the I and C/c terms. The terms are ignored for both F inc and F con tests.

Incremental F statistics - calculation of Denominator degrees of freedom

Source             Size  NumDF    F value   Lambda F   Lambda     DenDF
mu                    1      1   245.1409   245.1409   1.0000    5.0000
variety               3      2     1.4853     1.4853   1.0000   10.0000
LinNitr               1      1   110.3232   110.3232   1.0000   45.0000
nitrogen              4      2     1.3669     1.3669   1.0000   45.0000
variety.LinNitr       3      2     0.4753     0.4753   1.0000   45.0000
variety.nitrogen     12      4     0.2166     0.2166   1.0000   45.0000

Conditional F statistics - calculation of Denominator degrees of freedom

Source             Size  NumDF    F value   Lambda F   Lambda     DenDF
mu                    1      1   327.5462   327.5462   1.0000    6.0475
variety               3      2     1.4853     1.4853   1.0000   10.0000
LinNitr               1      1   120.3282   110.3232   1.0000   45.0000
nitrogen              4      2     1.3669     1.3669   1.0000   45.0000
variety.LinNitr       3      2     0.4753     0.4753   1.0000   45.0000
variety.nitrogen     12      4     0.2166     0.2166   1.0000   45.0000

The dpr file

The .dpr file contains the data and residuals from the analysis in double precision binary form. The file is produced when the RES qualifier (Table 4.3) is invoked. The file could be renamed with filename extension .dbl and used for input to another run of ASReml. Alternatively
355. t of the linear mixed model. There are three options for i: i = -1 suppresses computation, i = 1 and i = 2 compute the denominator d.f. using numerical and algebraic methods respectively. If i is omitted then i = 2 is assumed. If DDF i is omitted, i = -1 is assumed except for small jobs (< 10 parameters, < 500 fixed effects, < 10,000 equations and < 100 Mbyte workspace) when i = 2.

Calculation of the denominator degrees of freedom is computationally expensive. Numerical derivatives require an extra evaluation of the mixed model equations for every variance parameter. Algebraic derivatives require a large dense matrix, potentially of order number of equations plus number of records, and is not available when MAXIT is 1 or for multivariate analysis.

is used to select particular graphic displays. In spatial analysis of field trials, four graphic displays are possible (see Section 13.4). Coding these

1 = variogram
2 = histogram
4 = row and column trends
8 = perspective plot of residuals

set n to the sum of the codes for the desired graphics. The default is 9.

These graphics are only displayed in versions of ASReml linked with Winteracter (that is LINUX, SUN and PC versions). Line printer versions of these graphics are written to the .res file. See the G command line option (Section 12.3 on graphics) for how to save the graphs in a file for printing. Use NODISPLAY to suppress graphic displays.

sets hardcopy graphic
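Because the four display codes above are powers of two, any sum identifies a unique set of graphics. The Python sketch below (with hypothetical helper names) shows the arithmetic: the default of 9 corresponds to the variogram plus the perspective plot of residuals, and 15 selects all four.

```python
DISPLAY_CODES = {
    1: "variogram",
    2: "histogram",
    4: "row and column trends",
    8: "perspective plot of residuals",
}


def display_code(names):
    """Sum the codes for the requested graphics."""
    lookup = {name: code for code, name in DISPLAY_CODES.items()}
    return sum(lookup[name] for name in names)


def decode_display(n):
    """List the graphics selected by the summed code n."""
    return [name for code, name in DISPLAY_CODES.items() if n & code]


print(decode_display(9))
print(display_code(["histogram", "row and column trends"]))
```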
356. ta input and output 2 60 List of commonly used job control qualifiers 64 List of occasionally used job control qualifiers 67 List of rarely used job control qualifiers T2 List of very rarely used job control qualifiers 79 Summary of reserved words operators and functions 2 85 Alphabetic list of model functions and descriptions 2 91 Link qualifiers and functions oao aa a 97 GLM qualifiers a 2 2 02 2 97 Examples of aliassing in ASReml ooa 103 Sequence of variance structures for the NIN field trial data 114 Schematic outline of variance model specification in ASReml 116 XV List of Tables xvi 7 3 Details of the variance models available in ASReml 121 7 4 List of R and G structure qualifiers 0 133 7 5 Examples of constraining variance parameters in ASReml 137 9 1 List of pedigree file qualifiers 200 152 10 1 List of prediction qualifiers 0 163 10 2 List of predict plot options 20 4 165 12 1 Command line options 2 002 002 0000 180 12 2 The use of arguments in ASReml 0 4 185 12 3 High level qualifiers 0 0 020000 4 186 13 1 Summary of ASReml output files 189 13 2 ASReml output objects and where to find them 209 14 1 Some information messages and comments
357. tation fault Singularity appeared in AI matrix Singularity in Average Information Matrix Sorting data by Section Row Sorting the data into field order STOP SCRATCH FILE DATA STORAGE ERROR Structure Factor mismatch Too many alphanumeric factor level labels Too many factors with A or I max 100 Too many max 20 dependent variables Apparently ASReml could not open a scratch file to hold the transformed data On unix check the temp directory tmp for old large scratch files this is a Unix memory error The first thing to try is to increase the memory workspace us ing the WORKSPACE see Section 12 3 on mem ory command line option Otherwise you may need to send your data and the as files to Cus tomer Support for debugging See the discussion on AISINGULARITIES the field order coding in the spatial error model does not generate a complete grid with one observation in each cell missing values may be deleted they should be fitted Also may be due to incorrect specification of num ber of rows or columns ASReml attempts to hold the data on a scratch file Check that the disk partition where the scratch files might be written is not too full use the NOSCRATCH qualifier to avoid these scratch files the declared size of a variance structure does not match the size of the model term that it is associated with if the factor level labels are actually all inte gers use the I option instead
358. tatistics Kenward and Roger Adjustments Approximate stratum variances

2.1 The linear mixed model

Introduction

If y denotes the n × 1 vector of observations, the linear mixed model can be written as

$$y = X\tau + Zu + e \qquad (2.1)$$

where $\tau$ is the p × 1 vector of fixed effects, X is an n × p design matrix of full column rank which associates observations with the appropriate combination of fixed effects, u is the q × 1 vector of random effects, Z is the n × q design matrix which associates observations with the appropriate combination of random effects, and e is the n × 1 vector of residual errors.

The model (2.1) is called a linear mixed model or linear mixed effects model. It is assumed

$$\begin{bmatrix} u \\ e \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix},\; \theta \begin{bmatrix} G(\gamma) & 0 \\ 0 & R(\phi) \end{bmatrix} \right) \qquad (2.2)$$

where the matrices G and R are functions of parameters $\gamma$ and $\phi$, respectively. The parameter $\theta$ is a variance parameter which we will refer to as the scale parameter. In mixed effects models with more than one residual variance, arising for example in the analysis of data with more than one section (see below) or variate, the parameter $\theta$ is fixed to one. In mixed effects models with a single residual variance, $\theta$ is equal to the residual variance ($\sigma^2$). In this case R must be a correlation matrix (see Table 2.1 for a discussion).

Direct product structures

To undertake variance modelling in ASReml you need to understand the formation of variance structures via direct products. The direct product of two matrices A
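The direct (Kronecker) product referred to above multiplies every element of the first matrix by the whole of the second matrix. A minimal pure-Python sketch follows, using the standard definition; the function name and the small example matrices are illustrative assumptions.

```python
def kron(a, b):
    """Direct (Kronecker) product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


sigma = [[1, 2], [3, 4]]      # illustrative 2 x 2 matrix
identity = [[1, 0], [0, 1]]   # 2 x 2 identity
for row in kron(sigma, identity):
    print(row)
```

With a 2 × 2 first matrix and a 2 × 2 identity the result is 4 × 4, each element of the first matrix appearing as a scaled identity block.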
359. ten in single precision if the argument n is 1 or 3; asrdata.dbl is written in double precision if the argument n is 2 or 4. The data values are written before transformation if the argument is 1 or 2, and after transformation if the argument is 3 or 4. The default is single precision after transformation (see Section 4.2).

List of rarely used job control qualifiers (qualifier, action): SCREEN n SMX m, SLNFORM n, SPATIAL, TABFORM n, TXTFORM n

SCREEN n SMX m
    performs a Regression Screen, a form of all subsets regression. For d model terms in the DENSE equations, there are 2^d - 1 possible submodels. Since for d > 8, 2^d - 1 is large, the submodels explored are reduced by the parameters n and m so that only models with at least n (default 1) terms but no more than m (default 6) terms are considered. The output (see page 192) is a report to the .asr file with a line for every submodel showing the sums of squares, degrees of freedom and terms in the model. There is a limit of d = 20 model terms in the screen. ASReml will not allow interactions to be included in the screened terms. For example, to identify which three of my set of 12 covariates best explain my dependent variable given the other terms in the model, specify SCREEN 3 SMX 3. The number of models evaluated quickly increases with d but ASReml has an arbitrary limit of 900 su
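The number of submodels implied by the description above is a sum of binomial coefficients: all ways of choosing between n and m of the d screened terms. The short Python sketch below (hypothetical function name) counts them; with no restriction on size the count reduces to 2^d - 1.

```python
import math


def screen_submodels(d, n=1, m=6):
    """Number of submodels of d terms containing at least n and at most m terms."""
    upper = min(m, d)
    return sum(math.comb(d, k) for k in range(n, upper + 1))


print(screen_submodels(12, 3, 3))   # choosing exactly three of 12 covariates
print(screen_submodels(12, 1, 12))  # every non-empty subset
```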
360. term is confounded with a fixed term and when there is no information in the data on a particular component The best solution is to reform the variance model so that the ambiguity is removed or to fix one of the parameters in the variance model so that the model can be fitted For in stance if ASUV is specified you may also need 2 1 Only rarely will it be reasonable to specify the AISINGULARITIES qualifier sets hardcopy graphics file type to bmp suppresses some of the information written to the asr file The data summary and regression coefficient estimates are suppressed This qualifier should not be used for initial runs of a job until the user has confirmed from the data summary that the data is correctly interpreted by ASReml Use BRIEF 2 to cause the predicted values to be written to the asr file instead of the pvs file Use BRIEF 1 to get BLUE fixed effect estimates reported in asr file The BRIEF qualifier may be set with the B command line option restricts ASReml to performing only part of the first iteration The estimation routine is aborted after n 1 forming the estimates of the vector of fixed and ran dom effects n 2 forming the estimates of the vector of fixed and ran dom effects REML log likelihood and residuals this is the default n 3 forming the estimates of the vector of fixed and ran dom effects REML log likelihood residuals and inverse coef ficient matrix 5 Command f
361. terms 6 6 30 30 30 30 150 125 Analysis of Variance 8 6 mu variety 0 5 ls T ils NumDF 1 24 125 125 125 125 125 125 125 Gamma 28714 93444 83725 00000 df df df df df df df DenDF 5 0 79 3 Component 4262 39 15595 1 14811 6 8061 81 F_inc 1216 29 8 84 Comp SE 0 62 3 06 3 04 6 01 ue OP OP OP oP Prob lt 001 lt 001 Finally we present portions of the pvs files to illustrate the prediction facility of ASReml The first five and last three variety means are presented for illustration The overall SED printed is the square root of the average variance of difference between the variety means The two spatial analyses have a range of SEDs which are available if the SED qualifier is used All variety comparisons have the same SED from the third analysis as the design is a balanced lattice square The F statistic statistics for the spatial models are greater than for the lattice analysis We note the F statistic for the AR1xAR1 units model is smaller than the F statistic for the AR1xAR1 15 Examples 266 Predicted values of yield AR1 x AR1 variety Predicted_Value Standard_Error Ecode 1 0000 1257 9763 64 6146 E 2 0000 1501 4483 64 9783 E 3 0000 1404 9874 64 6260 E 4 0000 1412 5674 64 9027 E 5 0000 1514 4764 65 5889 E 23 0000 1311 4888 64 0767 E 24 0000 1586 7840 64 7043 E 25 0000 1592 0204 63 5939 E SED Overall Standard Error of Difference 59 05 AR1 x AR1 u
362. terms; Random terms in the model; Interactions and conditional factors; Interactions; Conditional factors; Alphabetic list of model functions; Weights; Missing values; Missing values in the response; Missing values in the explanatory variables; Some technical details about model fitting in ASReml; Sparse versus dense; Ordering of terms in ASReml; Aliassing and singularities; Examples of aliassing; Analysis of Variance table.

6 Command file: Specifying the terms in the mixed model

6.1 Introduction

The linear mixed model is specified in ASReml as a series of model terms and qualifiers. In this chapter the model formula syntax is described.

6.2 Specifying model formulae in ASReml

The linear mixed model is specified in ASReml as a series of model terms and qualifiers. Model terms include factor and variate labels (Section 5.4), functions of labels, special terms and interactions of these. The model is specified immediately after the datafile and any job control qualifier and/or tabulate lines. The example command file printed alongside this section is:

    NIN Alliance Trial 1989
     variety
     . . .
     column 11
    nin89.asd !skip 1
    yield ~ mu variety !r repl,
      !f mv
    1 2
    11 column AR1 .3
    22 row AR1 .3

The syntax for specifying the model is

    response [!wt weight] ~ fixed [!r random] [!f sparse_fixed]

response is the label for the response variable(s) to be analysed; multivariate analysis is discussed in Chapter 8. weight is a label of a variable containing weights w
363. th the large difference in mean sqrt(rootwt) for the two groups (14.93 and 8.23 for control and treated respectively). The inclusion of tmt as a fixed effect ensures that BLUPs of tmt.variety effects are shrunk to the correct mean (treatment means rather than an overall mean). The model for the data is given by

$$y = X\tau + Z_1u_1 + Z_2u_2 + Z_3u_3 + Z_4u_4 + Z_5u_5 + e \qquad (15.7)$$

where $y$ is a vector of length $n = 264$ containing the sqrt(rootwt) values, $\tau$ corresponds to a constant term and the fixed treatment contrast, and $u_1 \ldots u_5$ correspond to random variety, treatment by variety, run, treatment by run and variety by run effects. The random effects and error are assumed to be independent Gaussian variables with zero means and variance structures $\mathrm{var}(u_i) = \sigma_i^2 I_{b_i}$ (where $b_i$ is the length of $u_i$, $i = 1 \ldots 5$) and $\mathrm{var}(e) = \sigma^2 I_n$. The ASReml code for this analysis is Bloodworm data Dr M Stevens pair 132 rootwt run 66 tmt 2 1A id variety 44 A rice asd skip 1 DOPATH 1 IPATH i sqrt rootwt mu tmt r variety variety tmt run pair run tmt 000 PATH 2 sqrt rootwt mu tmt r variety tmt variety run pair tmt run uni tmt 2 002 tmt variety 2 2 0 DIAG 1 21 IGU 4400 tmt run 2 20 DIAG 1 1 GU 66 0 0

Table 15.8 Estimated variance components from univariate analyses of bloodworm data: (a) Model with homogeneous variance for all terms and (b) Model with heterogeneous variance for interactions involving tmt. a b source
364. that the aim of the conditional Wald statistic is to facilitate inference for fixed effects. It is not meant to be prescriptive, nor is it foolproof for every setting. The Wald statistics are collectively presented in a summary table in the .asr file. The basic table includes the numerator degrees of freedom ($\nu_1$) and the incremental Wald F-statistic for each term. To this is added the conditional Wald F-statistic and the M code if FCON is specified. A conditional F-statistic is not reported for mu in the .asr file but is in the .aov file (adjusted for covariates).

Kenward and Roger Adjustments

In moderately sized analyses, ASReml will also include the denominator degrees of freedom (DenDF, denoted by $\nu_2$; Kenward and Roger, 1997) and a probability value if these can be computed. They will be for the conditional Wald F-statistic if it is reported. The DDF i qualifier (see page 68) can be used to suppress the DenDF calculation (DDF -1) or request a particular algorithmic method: DDF 1 for numerical derivatives, DDF 2 for algebraic derivatives. The value in the probability column (either P_inc or P_con) is computed from an F reference distribution. An approximation is used for computational convenience when calculating the DenDF for conditional F-statistics using numerical derivatives. The DenDF reported then relates to a maximal conditional incremental model (MCIM) which, depending on the model order, may not always coincid
365. the BLUP tends to the fixed effect solution, while for small $\sigma_u^2$ relative to $\sigma^2$ the BLUP tends towards zero, the assumed initial mean. Thus (2.13) represents a weighted mean which involves the prior assumption that the $u_i$ have zero mean. Note also that the BLUPs in this simple case are constrained to sum to zero. This is essentially because the unit vector defining X can be found by summing the columns of the Z matrix. This linear dependence of the matrices translates to dependence of the BLUPs and hence constraints. This aspect occurs whenever the column space of X is contained in the column space of Z. The dependence is slightly more complex with correlated random effects.

2.4 Combining variance models

The combination of variance models within G structures and R structures, and between G structures and R structures, is a difficult and important concept. The underlying principle is that each R and G variance model can only have a single scaling variance parameter associated with it. If there is more than one scaling variance parameter for any R or G, then the variance model is overspecified, or nonidentifiable. Some variance models are presented in Table 2.1 to illustrate this principle. While all 9 forms of model in Table 2.1 can be specified within ASReml, only models of forms 1 and 2 are recommended. Models 4-6 have too few variance parameters and are likely to cause serious estimation problems. For model 3, where the s
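For the simple random-effects case discussed at the start of this passage, the shrinkage and sum-to-zero properties can be checked numerically. The following is a minimal sketch and not ASReml code: it assumes a one-way model y_ij = mu + u_i + e_ij with the variance ratio gamma = var(u)/var(e) treated as known, and the function name `one_way_blup` and the data are hypothetical.

```python
def one_way_blup(groups, gamma):
    """Solve the mixed model equations for y_ij = mu + u_i + e_ij.

    groups: list of lists, one inner list of observations per random level.
    gamma:  assumed known ratio var(u) / var(e).
    Returns (mu_hat, list of BLUPs, list of raw level means).
    """
    means = [sum(g) / len(g) for g in groups]
    # Shrinkage weight per level: n_i / (n_i + 1/gamma), always in (0, 1).
    weights = [len(g) / (len(g) + 1.0 / gamma) for g in groups]
    # The estimate of mu is the weighted mean of the level means.
    mu_hat = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    # Each BLUP is the deviation of the level mean from mu, shrunk by its weight.
    blups = [w * (m - mu_hat) for w, m in zip(weights, means)]
    return mu_hat, blups, means


data = [[10.0, 12.0, 11.0], [20.0, 19.0], [15.0, 16.0, 14.0, 15.0]]

mu_mid, u_mid, m_mid = one_way_blup(data, gamma=2.0)
mu_big, u_big, m_big = one_way_blup(data, gamma=1e9)     # large var(u)
mu_small, u_small, _ = one_way_blup(data, gamma=1e-9)    # small var(u)
```

With a very large gamma the BLUPs approach the unshrunk deviations of the level means, with a very small gamma they approach zero, and in every case they sum to zero, which is the behaviour the text describes.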
366. the model Fault 1 R structures imply OF 242 records only 224 e Last line read was 22 column AR1 0 100000 ninerr9 variety id pid raw rep nloc yield lat Model specification TERM LEVELS GAMMAS variety 56 mu 1 SECTIONS 242 3 1 STRUCT i ul 1 4 1 1 10 22 1 t 5 1 i 11 12 factors defined max 500 5 variance parameters max1500 2 special structures Final parameter values 0 0000 10000E 360 10000 0 10000 Last line read was 22 column AR1 0 100000 12 1 242 224 8000 Finished 27 Jul 2005 15 42 10 192 R structures imply 0 242 records only 224 exist

10. Field layout error in a spatial analysis

The final common error we highlight is the misspecification of the field layout. In this case we have accidentally switched the levels in rows and columns. However, ASReml can detect this error because we have also asked it to sort the data into field order. Had sorting not been requested, ASReml would not have been able to detect that the lines of the data file were not sorted into the appropriate field order, and the spatial analysis would be wrong.

Folder C data ex manex variety A QUALIFIERS SKIP 1 QUALIFIER DOPART 2 is active Reading nin asd FREE FORMAT skipping 1 lines Univariate analysis of yield Using 242 records of 242 read Model term Size miss zero MinNonO Mean MaxNonO 1 variety 56 0 0 1 26 4545 56 10 row 22 1 11 5000 22 11 column 11 1 6 0000 11 12 mu 1 13 mv_estimates 18 11 AR AutoR
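The fault "R structures imply 242 records only 224 exist" comes down to simple arithmetic: the product of the R structure dimensions must equal the number of records retained for analysis. A minimal sketch of that check follows; the helper name `implied_records` is hypothetical and this is not a description of how ASReml is implemented.

```python
def implied_records(dimension_sizes):
    """Number of records implied by a direct-product R structure."""
    total = 1
    for size in dimension_sizes:
        total *= size
    return total


# Values from the example: 11 columns by 22 rows, 224 records retained.
implied = implied_records([11, 22])
available = 224
shortfall = implied - available
print(implied, available, shortfall)
```

The shortfall of 18 records agrees with the 18 mv_estimates listed in the data summary of the later run, in which all 242 records are used.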
367. the name of an existing factor; p is the list of contrast coefficients. For example, CONTRAST LinN Nitrogen 3 1 -1 -3 defines LinN as a contrast based on the 4 (implied by the length of the list) levels of factor Nitrogen. Missing values in the factor become missing values in the contrast. Zero values in the factor (no level assigned) become zeros in the contrast. The user should check that the levels of the factor are in the order assumed by contrast (check the .ass or .sln or .tab files). It may also be used on the implicit factor Trait in a multivariate analysis, provided it implicitly identifies the number of levels of Trait (the number of traits is implied by the length of the list). Thus, if the analysis involves 5 traits, CONTRAST Time Trait 1 3 5 10 20

List of commonly used job control qualifiers (columns: qualifier, action; the qualifier column on this page lists !FCON, marked New, and !MAXIT n)

!FCON adds a conditional F-statistic column to the Analysis of Variance table. It enables inference for fixed effects in the dense part of the linear mixed model to be conducted so as to respect both structural and intrinsic marginality (see Section 2.6). The detail of exactly which terms are conditioned on is reported in the .aov file. The principle used in determining this conditional test is that a term cannot be adjusted for another term which encompasses it explicitly (e.g. term A.C cannot be adjusted for A.B.C) or implicitly (e.g. term R
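The mapping from factor levels to a contrast variate described at the start of this passage can be sketched as follows. This is an illustration only: the function name `contrast_covariate` is hypothetical, missing values are represented by `None`, and the signed coefficients 3, 1, -1, -3 are assumed for the linear contrast over four levels.

```python
def contrast_covariate(factor_codes, coefficients):
    """Map factor level codes (1..k) to contrast coefficients.

    None (missing) stays missing; 0 (no level assigned) becomes 0.0.
    The number of levels k is implied by the length of `coefficients`.
    """
    values = []
    for code in factor_codes:
        if code is None:
            values.append(None)
        elif code == 0:
            values.append(0.0)
        else:
            # Level codes are 1-based, so level 1 takes the first coefficient.
            values.append(float(coefficients[code - 1]))
    return values


nitrogen = [1, 2, None, 4, 0, 3]            # hypothetical level codes
lin_n = contrast_covariate(nitrogen, [3, 1, -1, -3])
```

Because the coefficients are applied by position, the result is only meaningful if the factor levels are stored in the order the coefficient list assumes, which is why the text recommends checking the level order.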
368. the output from this analysis is

    7 LogL 343.220 S2 1.0000 262 df
    8 LogL 343.220 S2 1.0000 262 df

    Source       Model terms    Gamma      Component   Comp/SE  % C
    Residual     UnStruct 1     2.14373    2.14373      4.44    0 U
    Residual     UnStruct 1     0.987401   0.987401     2.59    0 U
    Residual     UnStruct 2     2.34751    2.34751      4.62    0 U
    Tr.variety   UnStruct 1     3.83959    3.83959      3.47    0 U
    Tr.variety   UnStruct 1     2.33394    2.33394      3.01    0 U
    Tr.variety   UnStruct 2     1.96173    1.96173      2.69    0 U
    Tr.run       UnStruct 1     1.70788    1.70788      2.62    0 U
    Tr.run       UnStruct 1     0.319145   0.319145     0.59    0 U
    Tr.run       UnStruct 2     2.54326    2.54326      3.20    0 U

    Covariance/Variance/Correlation Matrix UnStructured
     2.144   0.4402
     0.9874  2.348
    Covariance/Variance/Correlation Matrix UnStructured
     3.840   0.8504
     2.334   1.962
    Covariance/Variance/Correlation Matrix UnStructured
     1.708   0.1631
     0.3191  2.543

The resultant REML log-likelihood is identical to that of the heterogeneous univariate analysis (column (b) of Table 15.8). The estimated variance parameters are given in Table 15.10.

Table 15.10 Estimated variance parameters from bivariate analysis of bloodworm data

    source              control variance   treated variance   covariance
    us(trait).variety        3.84               1.96             2.33
    us(trait).run            1.71               2.54             0.32
    us(trait).pair           2.14               2.35             0.99

The predicted variety means in the .pvs file are used in the following section on interpretation of results. A portion of the file is presented below. There is a wide range in SED reflecting the imbalance of th
369. thout such permission. Published by: VSN International Ltd, 5 The Waterhouse, Waterhouse Street, Hemel Hempstead, HP1 1ES, UK. E-mail: info@asreml.co.uk Website: http://www.vsni.co.uk The correct bibliographical reference for this document is: Gilmour, A.R., Gogel, B.J., Cullis, B.R. and Thompson, R. (2006) ASReml User Guide Release 2.0. VSN International Ltd, Hemel Hempstead, HP1 1ES, UK. ISBN 1-904375-23-5

Preface

ASReml is a statistical package that fits linear mixed models using Residual Maximum Likelihood (REML). It has been under development since 1993 and is a joint venture between the Biometrics Program of NSW Department of Primary Industries and the Biomathematics and Bioinformatics Division (previously the Statistics Department) of Rothamsted Research. This guide relates to Release 2 of ASReml, completed in December 2005. Changes in this version are indicated by the word New in the margin. A separate document, ASReml Update: What's new in Release 2.00, is available to highlight the changes from Release 1.00. Linear mixed effects models provide a rich and flexible tool for the analysis of many data sets commonly arising in the agricultural, biological, medical and environmental sciences. Typical applications include the analysis of (un)balanced longitudinal data, repeated measures analysis, the analysis of (un)balanced designed experiments, the analysis of multi-environment trials, the analysis of both univariate and
370. time points we use the EXP model. The correlation function is given by $\rho(u) = \phi^{|u|}$, where $u$ is the time lag in weeks. The coding for this is yi ye yo y y10 Trait tmt Tr tmt 120 One error structure in two dimensions 14 Outer dimension 14 plants Tr 0 EXP 5 1326 7 10 Time coordinates

A portion of the output is

    LogL -183.734  S2 435.58  60 df  1.000  0.9500
    LogL -183.255  S2 370.40  60 df  1.000  0.9388
    LogL -183.010  S2 321.50  60 df  1.000  0.9260
    LogL -182.980  S2 298.84  60 df  1.000  0.9179
    LogL -182.979  S2 302.02  60 df  1.000  0.9192
    Final parameter values 1.0000 0.91397

    Source     Model terms   Gamma      Component   Comp/SE  % C
    Variance   70 60         1.00000    302.021      3.11    0 P
    Residual   POW-EXP 5     0.918971   0.918971    29.53    0 U

When fitting power models, be careful to ensure the scale of the defining variate, here time, does not result in an estimate of $\phi$ too close to 1. For example, use of days in this example would result in an estimate for $\phi$ of about .993.

[Figure 15.4: Residual plots for the EXP variance model for the plant data. Residuals plotted against Row and Column position.]

The residual plot from this analysis is presented in Figure 15.4. This suggests increasing variance over time. This can be modelled by using the EXPH model whic
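To make the power (EXP) correlation function concrete, the sketch below builds the correlation matrix implied by a set of time coordinates. It is an illustration under stated assumptions rather than ASReml code: the function name `exp_correlation` is hypothetical, the coordinates 1, 3, 5, 7, 10 are read from the example's "Time coordinates" line, and the parameter value is the reported estimate 0.918971.

```python
def exp_correlation(coords, phi):
    """Correlation matrix with entries phi ** |x_i - x_j|."""
    return [[phi ** abs(a - b) for b in coords] for a in coords]


weeks = [1, 3, 5, 7, 10]
phi = 0.918971
corr = exp_correlation(weeks, phi)

# Measuring the same times on a scale k times finer needs the parameter
# phi ** (1 / k) to reproduce the same matrix, which is closer to 1.
k = 4  # hypothetical scale factor, for illustration only
phi_fine = phi ** (1.0 / k)
corr_fine = exp_correlation([k * w for w in weeks], phi_fine)
```

The two matrices agree, which shows why the choice of time unit moves the estimated parameter towards or away from the boundary at 1 without changing the fitted correlations.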
371. ting of a cubic spline in ASReml by restricting the dataset to tree 1 only. The model includes the intercept and linear regression of trunk circumference on age, and an additional random term spl(age,7) which instructs ASReml to include a random term with a special design matrix with 7 - 2 = 5 columns which relate to the vector $\delta$ whose elements $\delta_i$, $i = 2, \ldots, 6$, are the second differentials of the cubic spline at the knot points. The second differentials of a natural cubic spline are zero at the first and last knot points (Green and Silverman, 1994). The ASReml job is

    this is the orange data, for tree 1
     seq      # record number is not used
     Tree 5
     age      # 118 484 664 1004 1231 1372 1582
     circ
     season !L Spring Autumn
    orange.asd !skip 1 !filter 2 !select 1
    !SPLINE spl(age,7) 118 484 664 1004 1231 1372 1582
    !PVAL age 150 200 1500
    circ ~ mu age !r spl(age,7)
    predict age

Note that the data for tree 1 has been selected by use of the filter and select qualifiers. Also note the use of PVAL so that the spline curve is properly predicted at the additional nominated points. These additional data points are required for ASReml to form the design matrix to properly interpolate the cubic smoothing spline between knot points in the prediction process. Since the spline knot points are specifically nominated in the SPLINE line, these extra points have no effect on the analysis run time. The SPLINE line does not modify the analysis in this
372. to the filename. In this case it would be necessary to copy the .rsv files with the new part number before running the next part to take advantage of this facility.

8 Command file: Multivariate analysis

Introduction; Repeated measures on rats; Wether trial data; Model specification; Variance structures; Specifying multivariate variance structures in ASReml; The output for a multivariate analysis.

8.1 Introduction

Multivariate analysis is used here in the narrow sense of a multivariate mixed model. There are many other multivariate analysis techniques which are not covered by ASReml. Multivariate analysis is used when we are interested in estimating the correlations between distinct traits (for example, fleece weight and fibre diameter in sheep) and for repeated measures of a single trait.

Repeated measures on rats

Wolfinger (1996) summarises a range of variance structures that can be fitted to repeated measures data and demonstrates the models using five weights taken weekly on 27 rats subjected to 3 treatments. This command file demonstrates a multivariate analysis of the five repeated measures. Note that the two-dimensional structure for common error meets the requirement of independent units and is correctly ordered (traits with units).

Wether trial data

Three key traits for the Australian wool industry are the weight of wool grown per year, the cleanness
373. to try: define US structures as positive definite by using GP; supply better starting values; fix parameters that you are confident of while getting better estimates for others (that is, fix variances when estimating covariances); fit a simpler model; reorganise the model to reduce covariance terms (for example, use CORUH instead of US). It is best to start with a positive definite correlation structure. Maybe use a structured correlation matrix. A variance structure should be specified for this term. The error could be in the variable/factor name or in the number of values or the list of values. The error could be in the variable/factor name or in the number of values or the list of values. the error model is not correctly specified. the file did not exist or was of the wrong file type (binary = unformatted, sequential). There are several messages of this form where "something" is what ASReml is attempting to read. Either there is an error telling ASReml to read something when it does not need to, or there is an error in the way something is specified.

Alphabetical list of error messages and probable cause(s)/remedies (columns: error message; probable cause/remedy)

Error reading the data; Error reading the DATA FILENAME line; Error reading the model factor list; Error setting constraints (VCC) on variance components; Error setting dependent varia
374. togram of the x variable. For example: !X age !Y height !G sex. Note that the graphs are only produced in the graphics versions of ASReml (Section 12.3). For multivariate repeated measures data, ASReml can plot the response profiles if the first response is nominated with the Y qualifier and the following analysis is of the multivariate data. ASReml assumes the response variables are in contiguous fields and are equally spaced. For example:

    Response profiles
     Treatment !A
     Y1 Y2 Y3 Y4 Y5
    rat.asd !Y Y1 !G Treatment !JOIN
    Y1 Y2 Y3 Y4 Y5 ~ Trait Treatment Trait.Treatment

Table 5.4 List of occasionally used job control qualifiers (columns: qualifier, action; the qualifier column on this page lists !ASMV n, !ASUV and !COLFAC v)

!ASMV n indicates a multivariate analysis is required although the data is presented in a univariate form. "Multivariate Analysis" is used in the narrow sense where an unstructured error variance matrix is fitted across traits, records are independent, and observations may be missing for particular traits (see Chapter 8 for a complete discussion). The data is presumed arranged in lots of n records, where n is the number of traits. It may be necessary to expand the data file to achieve this structure, inserting a missing value NA on the additional records. This option is sometimes relevant for some forms of repeated measures analysis. There will need to be a factor in the data to code for trait, as the in
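The expansion of a univariate-form data set into "lots of n records" with NA inserted for unobserved traits can be sketched as below. This is only an illustration of the data layout the ASMV description asks for: the function name `expand_to_trait_lots`, the unit labels and the trait names are all hypothetical.

```python
def expand_to_trait_lots(observed, units, traits):
    """Return one row per (unit, trait), in lots of len(traits) rows per unit.

    observed: dict mapping (unit, trait) -> value for the records that exist.
    Combinations that were not observed are filled with the string "NA".
    """
    rows = []
    for unit in units:
        for trait in traits:
            rows.append((unit, trait, observed.get((unit, trait), "NA")))
    return rows


observed = {
    ("rat1", "wt1"): 57.0,
    ("rat1", "wt2"): 86.0,
    ("rat2", "wt2"): 91.0,      # wt1 was not recorded for rat2
}
rows = expand_to_trait_lots(observed, ["rat1", "rat2"], ["wt1", "wt2"])
for row in rows:
    print(*row)
```

Each unit now contributes exactly as many rows as there are traits, in a fixed trait order, and the trait column serves as the factor coding for trait that the text says is needed.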
375. trices The third section covers some special cases where the covariance structure is known except for the scale Table 7 3 Details of the variance models available in ASReml base description algebraic number of parameters identifier form corr homo s hetero s variance variance Correlation models One dimensional equally spaced ID identity Ca 1 C 0 i j 0 1 w AR 1 1 order Ca 1 Cai Q 1 2 1 w autoregressive C Co ee i gt j 1 Ip lt 1 AR2 2 order C 1 2 3 2 w autoregressive e RF 6 5 Ca PCer A 5 i gt JFL ld lt 1 45 a lt 1 AR3 3 order C 1 2 1 4 te 3 4 pw autoregressive Cas CA he by b3 Q Ci 42 5 Q d F ds T 1 2 Q C PUis F 20 F Q3Ci 3 5 gt i gt j 2 1d lt 1 65 a lt 1 s lt 1 SAR symmetric Ca 1 1 2 1 w autoregressive Caria 9 1 02 4 Ci dy Cii 4 C x25 35 i gt j l Ip lt 1 SAR2 constrained as for AR3 using 2 3 2 w autoregressive 3 by 7 2 used for 2 ay 7 competition P2 V 27 72 WW 7 Command file Specifying variance structures 122 Details of the variance models available in ASReml base description algebraic number of parameters identifier form corr homo s hetero s variance variance st MA 1 1 order C 1 2 1l w moving average Cid 0 0 C 0 7 gt 1 2 A lt 1 MA2 2 order C 1 2 3 24w moving average Cai 6 1 6 A 6 62 Citai 0
376. trinsic Trait factor is undefined when the data is presented in a univariate manner. !ASUV indicates that a univariate analysis is required although the data is presented in a multivariate form. Specifically, it allows you to have an error variance other than $I \otimes \Sigma$, where $\Sigma$ is the unstructured (US, see Table 7.3) variance structure. If there are missing values in the data, include !f mv on the end of the linear model. It is often also necessary to specify the !S2==1 qualifier on the R structure lines. The intrinsic factor Trait is defined and may be used in the model. See Chapter 8 for more information. This option is used for repeated measures analysis when the variance structure required is not the standard multivariate unstructured matrix. !COLFAC v is used with !SECTION v and !ROWFAC v to instruct ASReml to set up R structures for analysing a multi-environment trial with a separable first order autoregressive model for each site (environment). v is the name of a factor or variate containing column numbers ($1 \ldots n_c$, where $n_c$ is the number of columns) on which the data is to be sorted. See SECTION for more detail.

List of occasionally used job control qualifiers (columns: qualifier, action; the qualifier column on this page lists DDF i, marked New, DISPLAY n and EPS)

DDF i requests computation of the approximate denominator degrees of freedom according to Kenward and Roger (1997) for the Analysis of Variance of fixed effects terms in the dense par
377. ts (see SPLINE) even if they do not appear to adequately cover the data values. prevents the automatic reversal of the order of the fixed terms (in the dense equations) and possible reordering of terms in the sparse equations. forces ASReml to hold the data in memory. ASReml will usually hold the data on a scratch file rather than in memory. In large jobs, the system area where scratch files are held may not be large enough. A Unix system may put this file in the /tmp directory, which may not have enough space to hold it. affects the number of distinct points recognised by the pol() model function (Table 6.1). The default value of n is 1000, so that points closer than 0.1% of the range are regarded as the same point. influences the number of points used when predicting splines and polynomials. The design matrices generated by the leg(), pol() and spl() functions are modified to include extra rows that are accessed by the PREDICT directive. The default value of n is 21 if there is no PPOINTS qualifier. The range of the data is divided by n - 1 to give a step size i. For each point p in the list, a predict point is inserted at p + i if there is no data value in the interval [p, p + 1.1 x i]. PPOINTS is ignored if PVAL is specified for the variable. This process also affects the number of levels identified by the fac() model term. forces ASReml to attempt to produce the standard output report when there is a failure of the iteration algorithm
378. ty 0 0 000 12 factors defined max 500 O variance parameters max 900 2 special structures Last line read was LANCER 1 NA NA 1 4 NA 4 31 21 1 12 0 0 o 8000 Finished 28 Jul 2005 09 51 12 817 Missing faulty SKIP or A needed for variety

4. A missing comma and 5. A misspelt factor name in linear model

The model has been written over two lines, but ASReml does not realise this because the first line does not end with a comma. The missing comma causes the fault R header SECTIONS DIMNS GSTRUCT, as ASReml tries to interpret the second line of the model (see Last line read) as the variance header line. The command file printed alongside this example is:

    nin alliance trial
     variety !A
     . . .
     repl 4
    nin89.asd !skip 1
    yield ~ mu variety
      !r Repl

The .asr file is displayed below. Note that the data has now been successfully read, as indicated by the data summary. You should always check the data summary to ensure that the correct number of records have been detected and the data values match the names appropriately.

ASReml 1 99a 01 Aug 2005 nin alliance trial Build d 27 Jul 2005 32 bit 28 Jul 2005 09 53 15 553 64 00 Mbyte Windows ninerr4 Licensed to Arthur Gilmour Folder C data ex manex variety A QUALIFIERS SKIP 1 QUALIFIER DOPART 1 is active Reading nin asd FREE FORMAT skipping 1 lines Univariate analysis of yield Using 224 records of 242 read Model term Size miss zero MinNon0O Mean MaxNonO 1 variety 56 0 0 1 28
379. uidance as to their probable cause. The guide concludes with the most extensive chapter, which presents the examples. Briefly, the improvements in Release 2.00 include more robust variance parameter updating so that Convergence Failure is less likely, extensions to the syntax, inclusion of the Matérn correlation model, ability to plot predicted values, improvements to the Analysis of Variance procedures, improvements to the handling of pedigrees, and some increases in computational speed. The data sets and ASReml input files used in this guide are available from http://www.vsni.co.uk/products/asreml as well as in the examples directory of the distribution CD-ROM. They remain the property of the authors or of the original source but may be freely distributed provided the source is acknowledged. The authors would appreciate feedback and suggestions for improvements to the program and this guide. Proceeds from the licensing of ASReml are used to support continued development to implement new developments in the application of linear mixed models. The developmental version is available to supported licensees via a website upon request to VSN. Most users will not need to access the developmental version unless they are actively involved in testing a new development.

Acknowledgements

We gratefully acknowledge the Grains Research and Development Corporation of Australia for their financial support for our research since 1988
380. unction described below. takes the coding of factor f as a covariate. The function is defined for f being a simple factor, Trait and units. The lin(f) function does not centre or scale the variable. Motivation: sometimes you may wish to fit a covariate as a random factor as well. If the coding is, say, 1 ... n, then you should define the field as a factor in the field definition and use the lin() function to include it as a covariate in the model. Do not centre the field in this case. If the covariate values are irregular, you would leave the field as a covariate and use the fac() function to derive a factor version. forms the natural log of v + r. This may also be used to transform the response variable. creates a first-differenced (by rows) design matrix which, when defining a random effect, is equivalent to fitting a moving average variance structure in one dimension. In the ma1 form, the first-difference operator is coded across all data points (assuming they are in time/space order). Otherwise the coding is based on the codes in the field indicated. is used to fit the intercept/constant term. It is normally present and listed first in the model. It should be present in the model if there are no other fixed factors or if all fixed terms are covariates or contrasts, except in the special case of regression through the origin. Alphabetic list of model functions and descriptio
381. ur MYOWNGDG program and the .gdg file. Maybe increase WORKSPACE. Try increasing the workspace or simplifying the model.

Alphabetical list of error messages and probable cause(s)/remedies (columns: error message; probable cause/remedy)

Error messages on this page: FORMAT error reading factor Definitions G structure header Factor order G structure ORDER O MODEL GAMMAS G structure size does not match Gamma Constraint READ error Getting Pedigree GLM Bounds failure Increase declared levels for factor Increase workspace Insufficient data read from file Insufficient points for

Probable causes and remedies: Likely causes are bad syntax or invalid characters in the variable labels (variable labels must not include any of these symbols), and the data file name is misspelt, there are too many variables declared, or there is no valid value supplied with an arithmetic transformation option. there is a problem reading G structure header line. The line must contain the name of a term in the linear model, spelt exactly as it appears in the model. a G structure line cannot be interpreted. The size of the structure defined does not agree with the model term that it is associated with. Check the syntax. an error occurred processing the pedigree. The pedigree file must be ascii, free format, with ANIMAL, SIRE and DAM as the first three fields. ASReml failed to calculate the GLM working variables or
382. urce Model terms Gamma Component Comp/SE % C

    Tree               5   5    4.79025        30.4420        1.24   0 P
    Tree.age           5   5    0.939436E-04   0.597011E-03   1.41   0 P
    spl(age,7)         5   5    100.513        638.759        1.55   0 P
    spl(age,7).Tree   25  25    1.11728        7.10033        1.44   0 P
    Variance          35  33    1.00000        6.35500        1.74   0 P

    Analysis of Variance   NumDF   DenDF   F_inc    Prob
    7 mu                     4      4.0    47.04    0.002
    3 age                    1      4.0    95.00   <.001

A quick look suggests this is fine until we look at the predicted curves in Figure 15.14. The fit is unacceptable because the spline has picked up too much curvature, and suggests that there may be systematic non-smooth variation at the overall level. This can be formally examined by including the fac(age) term as a random effect. This increased the log-likelihood 3.71 (P < 0.05), with the spl(age,7) smoothing constants heading to the boundary. There is a possible explanation in the season factor. When this is added (Model 3), it has an F ratio of 107.5 (P < 0.01), while the fac(age) term goes to the boundary. Notice that the inclusion of the fixed term season in models 3 to 6 means that comparisons with models 1 and 2 on the basis of the log-likelihood are not valid. The spring measurements are lower than the autumn measurements, so growth is slower in winter.

[Figure 15.14: Plot of fitted cubic smoothing spline for model 1. Predicted values of circ plotted against age, 118 to 1582.]

Models 4 and 5 successively examined each term, indicating that both smooth
383. urrent license details.

Prompt for arguments (A)

A (ASK) makes it easier to specify command line options in Windows Explorer. One of the options available when right clicking a .as file invokes ASReml with this option. ASReml then prompts for the options and arguments, allowing these to be set interactively at run time. With ASK on the top job control line, it is assumed that no other qualifiers are set on the line. For example, a response of H22r 1 23 would be equivalent to ASReml h22r basename 1 2 3

Output control (B, J)

B[b] (BRIEF [b]) suppresses some of the information written to the .asr file. The data summary and regression coefficient estimates are suppressed by the options B, B1 or B2. This option should not be used for initial runs of a job before you have confirmed, by checking the data summary, that ASReml has read the data as you intended. Use B2 to also have the predicted values written to the .asr file instead of the .pvs file. Use B-1 to get BLUE estimates reported in the .asr file. J (JOIN) is used in association with the CYCLE qualifier to put the output from a set of runs into single files (see CYCLE list JOIN on page 186).

Debug command line options (D, E)

D and E (DEBUG, DEBUG 2) invoke debug mode and increase the information written to the screen or .asl file. This information is not useful to most users. On Unix systems, if ASReml is crashing, use the system script
384. v described in Section 9.6.

Alphabetic list of model functions and descriptions (columns: model function, action; the function column on this page lists h(f), ide(f) marked New, i(f), inv(v,r), leg(v,n), lin(f), l(f), log(v,r), ma1, ma1(f) and mu)

h(f) requests ASReml to fit the model term for factor f using Helmert constraints. Neither Sum to zero nor Helmert constraints generate interpretable effects if singularities occur. ASReml runs more efficiently if no constraints are applied. Following is an example of Helmert and sum to zero covariables for a factor with 5 levels.

          H1   H2   H3   H4     C1   C2   C3   C4
    F1    -1   -1   -1   -1      1    0    0    0
    F2     1   -1   -1   -1      0    1    0    0
    F3     0    2   -1   -1      0    0    1    0
    F4     0    0    3   -1      0    0    0    1
    F5     0    0    0    4     -1   -1   -1   -1

is used to take a copy of a pedigree factor f and fit it without the genetic relationship covariance. This facilitates fitting a second animal effect. Thus, to form a direct, maternal genetic and maternal environment model, the maternal environment is defined as a second animal effect coded the same as dams, viz. !r animal dam ide(dam). forms the reciprocal of v + r. This may also be used to transform the response variable. forms n + 1 Legendre polynomials of order 0 (intercept), 1 (linear), ..., n from the values in v; the intercept polynomial is omitted if n is preceded by the negative sign. The actual values of the coefficients are written to the .res file. This is similar to the pol f
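The Helmert and sum-to-zero covariables for a five-level factor can be generated programmatically. The sketch below is an illustration rather than ASReml's own construction: the function names are hypothetical, and it assumes the usual signed coding in which every covariable column sums to zero over the levels.

```python
def helmert_covariables(k):
    """k rows (levels) by k-1 columns; column j compares level j+1
    with the levels before it (-1 for earlier levels, j for level j+1)."""
    rows = []
    for level in range(1, k + 1):
        row = []
        for col in range(1, k):
            if level <= col:
                row.append(-1)
            elif level == col + 1:
                row.append(col)
            else:
                row.append(0)
        rows.append(row)
    return rows


def sum_to_zero_covariables(k):
    """k rows by k-1 columns; an identity block with a final row of -1."""
    rows = [[1 if c == r else 0 for c in range(k - 1)] for r in range(k - 1)]
    rows.append([-1] * (k - 1))
    return rows


H = helmert_covariables(5)
C = sum_to_zero_covariables(5)
for label, h_row, c_row in zip(["F1", "F2", "F3", "F4", "F5"], H, C):
    print(label, h_row, c_row)
```

In both codings each column sums to zero across the five levels, so the fitted effects are expressed relative to the overall level rather than to a reference level.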
385. values not allowed here Multiple trait mapping problem Negative Sum of Squares NFACT out of range No giv file for No residual variation Out of Out of memory

Probable causes and remedies: ASReml failed to read the first data record. Maybe it is a heading line which should be skipped by using the SKIP qualifier, or maybe the field is an alphanumeric field but has not been declared so with the A qualifier. You need to identify which design terms contain missing values and decide whether to delete the records containing the missing values in these variables or, if it is reasonable, to treat the missing values as zero by using MVINCLUDE. More missing values in the response were found than expected. missing observations have been dropped so that multivariate structure is messed up. Maybe a trait name is repeated. This is typically caused by negative variance parameters; try changing the starting values or using the STEP option. If the problem occurs after several iterations, it is likely that the variance components are very small. Try simplifying the model. In multivariate analyses it arises if the error variance becomes negative definite. Try specifying GP on the structure line for the error variance. too many terms are being defined. Fix the argument to giv. after fitting the model, the residual variation is essentially zero, that is, the model fully explains the data. If this is intended, use the BLUP 1 qualifie
386. variety 4 Ir Repl 5 001 Repl 1 6 2 0 IDV 01 Fs part 2 yield mu variety 9 12 11 row AR1 1 10 22 col AR1 1 lpart predict voriety 8

14 Error messages

8. misspelt variable label in predict statement
9. mv omitted from spatial model
10. wrong levels declared in R structure model lines

1. Data file not found

Margin listing:
nin alliance trial
nine.asd !slip 1
yield ~ mu variety

Running this job produces the .asr file in Section 14.1. The first problem is that ASReml cannot find the data file nine.asd, as indicated in the error message above the Fault line. ASReml reports the last line read before the job was terminated, an error message (FORMAT error reading data structures) and other information obtained to that point. In this case the program only made it to the data file definition line in the command file. Since nine.asd commences in column 1, ASReml checks for a file of this name (in the working directory, since no path is supplied). Since ASReml did not find the data file, it tried to interpret the line as a variable definition, but "." is not permitted in a variable label. The problem is either that the filename is misspelt or that a pathname is required. In this case the data file was given as nine.asd rather than nin.asd.

2. An unrecognised qualifier and 3. An incorrectly defined factor

After supplying the correct pathname and re-running the job, ASReml produces the warning message W
387. variety plotted against estimate for control
15.11 Estimated difference between control and treated for each variety plotted against estimate for control
15.12 Trellis plot of trunk circumference for each tree
15.13 Fitted cubic smoothing spline for tree 1
15.14 Plot of fitted cubic smoothing spline for model 1
15.15 Trellis plot of trunk circumference for each tree at sample dates adjusted for season effects, with fitted profiles across time and confidence intervals
15.16 Plot of the residuals from the nonlinear model of Pinheiro and Bates

1 Introduction

What ASReml can do; Installation; User Interface; How to use the guide; Help and discussion list; Typographic conventions

1.1 What ASReml can do

ASReml (pronounced A Rem el) is used to fit linear mixed models to quite large data sets with complex variance models. It extends the range of variance models available for the analysis of experimental data. ASReml has application in the analysis of
• (un)balanced longitudinal data,
• repeated measures data (multivariate analysis of variance and spline type models),
• (un)balanced designed experiments,
• multi-environment trials and meta analysis,
• univariate and multivariate animal breeding and genetics data (involving a relationship matrix for correlated effects)
388. veral modified schemes discussed by Cullis et al. (2004), particularly relevant when the AI update is consistently outside the parameter space. These include optionally performing extra local EM or PXEM (Parameter Extended EM) iterates. These can dramatically reduce the number of iterates required to find a solution near the boundary of the parameter space, but do not always work well when there are several matrices on the boundary. The options are:

!EMFLAG 1   Standard EM plus 10 local EM steps
!EMFLAG 2   Standard EM plus 10 local PXEM steps
!PXEM 2     Standard EM plus 10 local PXEM steps
!EMFLAG 3   Standard EM plus 10 local EM steps
!EMFLAG 4   Standard EM plus 10 local EM steps
!EMFLAG 5   Standard EM only
!EMFLAG 6   Single local PXEM
!EMFLAG 7   Standard EM plus 1 local EM step
!EMFLAG 8   Standard EM plus 10 local EM steps

Options 3 and 4 cause all US structures to be updated by (PX)EM if any particular one requires EM updates.

The test of whether the AI updated matrix is positive definite is based on absorbing the matrix to check that all pivots are positive. Repeated EM updates may bring the matrix closer to being singular. This is assessed by dividing the pivot of the first element by the first diagonal element of the matrix. If it is less than 10^-7 (this value is consistent with the multiple partial correlation of the first variable with the rest being greater than 0.9999999), ASReml fixes the matrix at that point and estimates any oth
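The near-singularity check described above can be illustrated with a small calculation. This is a sketch in Python under stated assumptions, not ASReml's implementation: it assumes that "the pivot of the first element" is what remains of the first diagonal element after all other rows have been absorbed by Gaussian elimination, and it uses a threshold of 10^-7, the value implied by the quoted correlation of 0.9999999. The function names are hypothetical.

```python
def first_pivot_ratio(matrix):
    """Ratio of the first element's pivot to the first diagonal element.

    The pivot is obtained by absorbing rows n-1, n-2, ..., 1 in turn
    (backward Gaussian elimination on a symmetric matrix).
    """
    a = [list(map(float, row)) for row in matrix]   # work on a copy
    n = len(a)
    for k in range(n - 1, 0, -1):
        if a[k][k] <= 0.0:
            raise ValueError("non-positive pivot: matrix is not positive definite")
        for i in range(k):
            factor = a[i][k] / a[k][k]
            for j in range(k):
                a[i][j] -= factor * a[k][j]
    return a[0][0] / float(matrix[0][0])


def is_near_singular(matrix, threshold=1e-7):
    """True when the first pivot ratio falls below the threshold."""
    return first_pivot_ratio(matrix) < threshold


if __name__ == "__main__":
    moderate = [[1.0, 0.5], [0.5, 1.0]]
    extreme = [[1.0, 0.99999999], [0.99999999, 1.0]]
    print(first_pivot_ratio(moderate), is_near_singular(moderate))
    print(first_pivot_ratio(extreme), is_near_singular(extreme))
```

For a 2 x 2 correlation matrix with correlation r, the ratio is 1 - r^2, so a ratio below 10^-7 means the first variable is almost perfectly predicted by the rest.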
389. without an argument, no extra shrinkage is allowed. Otherwise shrinkage is allowed in the first i iterations.

prints the portion of the inverse of the coefficient matrix pertaining to the nth term in the linear model. Because the model has not been defined when ASReml reads this line, it is up to the user to count the terms in the model to identify the portion of the inverse of the coefficient matrix to be printed. The option is ignored if the portion is not wholly in the SPARSE stored equations. The portion of the inverse is printed to a file with extension .cii. The sparse form of the matrix only is printed, in the form i j C, that is, elements of C that were not needed in the estimation process are not included in the file.

5 Command file: Reading the data

List of very rarely used job control qualifiers (qualifier, action). Qualifiers listed on this page: !FACPOINTS n, !KNOTS n, !NOCHECK, !NOREORDER, !NOSCRATCH, !POLPOINTS n, !PPOINTS n, !REPORT.

!FACPOINTS n affects the number of distinct points recognised by the fac() model function (Table 6.1). The default value of n is 1000, so that points closer than 0.1% of the range are regarded as the same point.

!KNOTS n changes the default knot points used when fitting a spline to data with more than n different values of the spline variable. When there are more than n (default 50) points, ASReml will default to using n equally spaced knot points.

!NOCHECK forces ASReml to use any explicitly set spline knot poin
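The default knot rule described for the KNOTS qualifier can be sketched as follows. This is illustrative Python, not ASReml code: the function name is hypothetical, and two details are assumptions that the text does not state, namely that the equally spaced knots run from the smallest to the largest observed value inclusive, and that the distinct data values themselves serve as knots when there are n or fewer of them.

```python
def default_knots(values, n=50):
    """Choose default spline knot points for a variable.

    With at most n distinct values, use the distinct values (assumption).
    With more than n, use n equally spaced points spanning the observed
    range, end points included (assumption).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    distinct = sorted(set(values))
    if len(distinct) <= n:
        return distinct
    lo, hi = distinct[0], distinct[-1]
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


if __name__ == "__main__":
    ages = list(range(101))          # 101 distinct values, more than 50
    knots = default_knots(ages)
    print(len(knots), knots[0], knots[-1])
    print(default_knots([3, 1, 2, 2]))
```

The same reasoning explains the FACPOINTS default: with n = 1000, two values are treated as the same point when they differ by less than 1/1000 of the range, that is, 0.1% of it.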
390. ws within columns in this case.

Important: It is assumed that the joint indexing of the components uniquely defines the experimental units.

If field is a variable, it can be plot coordinates provided the plots are in a regular grid. Thus in this example

11 lat AR1 0.3
22 long AR1 0.3

is valid because lat gives column position and long gives row position, and the positions are on a regular grid. The autoregressive correlation values will still be on a plot index basis (1, 2, 3, ...), not on a distance basis (10m, 20m, 30m, ...).

If the data is sorted appropriately for the order the models are specified, set field to 0.

7 Command file: Specifying variance structures

model specifies the variance model for the term, for example

22 row AR1 0.3

chooses a first order autoregressive model for the row error process. All the variance models available in ASReml are listed in Table 7.3. These models have associated variance parameters. An error variance component (σ² for the example, see Section 7.3) is automatically estimated for each section. The default model is ID.

initial_values are initial or starting values for the variance parameters and must be supplied, for example

22 row AR1 0.3

chooses an autoregressive model for the row error process (see Table 7.1) with a starting value of 0.3 for the row correlation.

qualifiers tell ASReml to modify the variance model in some way; the qualifiers are described in Tab
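The difference between a plot index basis and a distance basis can be made concrete with a small example. The sketch below is illustrative Python, not ASReml code, and its function names are hypothetical. It ranks the coordinates of a regular grid to obtain plot indices and builds the first order autoregressive correlation matrix, in which the correlation between plots i and j is the correlation parameter raised to the power |i - j|.

```python
def plot_indices(coordinates):
    """Map plot coordinates on a regular grid to 0-based plot indices."""
    ordered = sorted(set(coordinates))
    position = {coord: index for index, coord in enumerate(ordered)}
    return [position[coord] for coord in coordinates]


def ar1_correlation(n_plots, rho):
    """AR1 correlation matrix on an index basis: entry (i, j) is rho ** |i - j|."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return [[rho ** abs(i - j) for j in range(n_plots)] for i in range(n_plots)]


if __name__ == "__main__":
    lat = [10, 20, 30]                       # metres; spacing is ignored
    print(plot_indices(lat))
    for row in ar1_correlation(len(lat), 0.3):
        print(row)
```

Plots at 10m, 20m and 30m and plots at 1m, 2m and 3m give the same matrix, because only the rank order of the coordinates enters the calculation.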
391. y !r repl
predict variety
0 0 1
repl 1
repl 0 IDV 0.1

structures (models) is presented in Table 7.3. Since IDV is the default variance structure for random effects, the same analysis would be performed if these lines were omitted.

3.5 Running the job

See Chapter 12.

An ASReml job is often run from a command line. The basic command to run an ASReml job is normally

asreml basename.as

where basename.as is the name of the command file. For example, the command to run nin89.as is

asreml nin89.as

Make a habit of giving command files the .as extension.

However, if the path to ASReml is not specified in your system's PATH environment variable, a path to this program must also be given. In this case the command to run the ASReml job in a PC/Windows environment is

[path]asreml basename.as

where path is typically C:\Program Files\ASReml2\bin, pointing to the location of the ASReml program. In this guide we assume the command file has a filename extension .as. ASReml also recognises the filename extension .asc as an ASReml command file. When these are used, the extension .as or .asc may be omitted from basename.as in the command line if there is no file in the working directory with the name basename. The options and arguments that can be supplied on the command line to modify a job at run time are described in Chapter 12.

Forming a job template

Notice that the data fil
392. y detected. One possibility is to centre and scale covariates involved in interactions so that their standard deviation is close to 1.

Table 14.3 Alphabetical list of error messages and probable cause(s)/remedies

error message: !PRINT: Cannot open output file
probable cause/remedy: Check filename.

error message: AINV/GIV matrix undefined or wrong size
probable cause/remedy: Check the size of the factor associated with the AINV/GIV structure.

error message: ASReml command file is EMPTY
probable cause/remedy: The job file should be in ASCII format.

error message: ASReml failed in
probable cause/remedy: Try running the job with increased workspace, or using a simpler model. Otherwise send the job to VSN (support@asreml.co.uk) for investigation.

14 Error messages

Alphabetical list of error messages and probable cause(s)/remedies (continued). Error messages on this page (left column): Continue from .rsv file; Convergence failed; Correlation structure is not positive definite; Define structure for; Error in !CONTRAST label factor values; Error in !SUBSET label factor values; Error in R structure: model checks; Error opening file; Error reading something.

Probable causes and remedies (right column, in page order):

Try running without the !CONTINUE qualifier.

The program did not proceed to convergence because the REML log-likelihood was fluctuating wildly. One possible reason is that some singular terms in the model are not being detected consistently. Otherwise the updated G structures are not positive definite. There are some things
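The suggestion at the start of this excerpt, to centre and scale covariates so that their standard deviation is close to 1, amounts to standardising each covariate before it goes into the data file. A minimal sketch in Python follows; the function name is hypothetical, and the use of the sample standard deviation (divisor n - 1) is an assumption, since the text does not say which divisor is intended.

```python
from statistics import mean, stdev


def centre_and_scale(values):
    """Subtract the mean and divide by the sample standard deviation."""
    centre = mean(values)
    scale = stdev(values)            # sample standard deviation, divisor n - 1
    if scale == 0:
        raise ValueError("covariate is constant and cannot be scaled")
    return [(value - centre) / scale for value in values]


if __name__ == "__main__":
    height_cm = [150.0, 160.0, 170.0, 180.0]
    scaled = centre_and_scale(height_cm)
    print(scaled)
    print(mean(scaled), stdev(scaled))
```

Products of standardised covariates stay on a similar numerical scale to the covariates themselves, which is the usual reason this helps when interactions are fitted.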
393. ymptotic s.e.
15.14 Wald tests of the fixed effects for each trait for the genetic example
15.15 Variance models fitted for each part of the ASReml job in the analysis of the genetic example

List of Figures

5.1 Variogram in 4 sectors for Cashmore data
Residual versus Fitted values
Variogram of residuals
Plot of residuals in field plan order
Plot of the marginal means of the residuals
Histogram of residuals
Residual plot for the rat data
Residual plot for the voltage data
Trellis plot of the height for each of 14 plants
Residual plots for the EXP variance model for the plant data
Sample variogram of the residuals from the AR1xAR1 model for the Slate Hall data
Sample variogram of the residuals from the AR1xAR1 model for the Tullibigeal data
Sample variogram of the residuals from the AR1xAR1 + pol(column,-1) model for the Tullibigeal data
15.8 Rice bloodworm data: Plot of square root of root weight for treated versus control
15.9 BLUPs for treated for each variety plotted against BLUPs for control
15.10 Estimated deviations from regression of treated on control for each
