
ASReml User Guide - VSN International


Contents

1. 15.2 Split plot design - Oats

                     nitrogen
    block  variety   0.0cwt  0.2cwt  0.4cwt  0.6cwt
    I      GR        111     130     157     174
           M         117     114     161     141
           V         105     140     118     156
    II     GR         61      91      97     100
           M          70     108     126     149
           V          96     124     121     144
    III    GR         68      64     112      86
           M          60     102      89      96
           V          89     129     132     124
    IV     GR         74      89      81     122
           M          64     103     132     133
           V          70      89     104     117
    V      GR         62      90     100     116
           M          80      82      94     126
           V          63      70     109      99
    VI     GR         53      74     118     113
           M          89      82      86     104
           V          97      99     119     121

A standard analysis of these data recognises the two basic elements inherent in the experiment. These are, firstly, the stratification of the experiment units, that is, the blocks, whole-plots and sub-plots, and secondly, the treatment structure that is superimposed on the experimental material. The latter is of prime interest in the presence of stratification. Thus the aim of the analysis is to examine the importance of the treatment effects while accounting for the stratification and restricted randomisation of the treatments to the experimental units. The ASReml input file is presented below.

    split plot example
     blocks 6         # Coded 1...6 in first data field of oats.asd
     nitrogen !A 4    # Coded alphabetically
     subplots         # Coded 1...4
     variety !A 3     # Coded alphabetically
     wplots           # Coded 1...3
     yield
    oats.asd !SKIP 2
    yield ~ mu variety nitrogen variety.nitrogen !r idv(blocks) idv(blocks.wplots)
    residual idv(units)
    predict nitrogen  # Print table of predicted nitrogen means
    predict variety
2. Figure 15.14: Plot of fitted cubic smoothing spline for model 1

A quick look suggests this is fine until we look at the predicted curves in Figure 15.14. The fit is unacceptable because the spline has picked up too much curvature, which suggests that there may be systematic non-smooth variation at the overall level. This can be formally examined by including the fac(age) term as a random effect. This increased the log-likelihood by 3.71 (P < 0.05), with the spl(age,7) smoothing constants heading to the boundary. There is a possible explanation in the season factor. When this is added (Model 3) it has an F ratio of 107.5 (P < 0.01), while the fac(age) term goes to the boundary. Notice that the inclusion of the fixed term season in models 3 to 6 means that comparisons with models 1 and 2 on the basis of the log-likelihood are not valid. The spring measurements are lower than the autumn measurements, so growth is slower in winter. Models 4 and 5 successively examined each term, indicating that both smoothing constants are significant (P < 0.05). Lastly, we add the covariance parameter between the intercept and slope for each tree in model 6. This ensures that the covariance model will be translation invariant. A portion of the output file for model 6 is

    6 LogL=-87.5371  S2= 5.9488  32 df
    7 LogL=-87.4342  S2= 5.6885  32 df
    8 LogL=-87.4291  S2= 5.6434  32 df
    9 LogL=-87.4291  S2= 5.6412  32 df

15.9 Balanced longitudinal data - Random coe
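The log-likelihood comparison quoted above (an increase of 3.71 for one extra variance parameter) can be checked by hand. The following is a minimal Python sketch, not ASReml code. It assumes the usual REML likelihood ratio statistic of twice the log-likelihood increase and, because a variance component is tested against the boundary value zero, it assumes a 50:50 mixture of chi-square(0) and chi-square(1) as the reference distribution. The function names are illustrative only.

```python
import math


def chi2_1_upper_tail(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    # For 1 df, X = Z**2 with Z standard normal, so P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def reml_lrt_pvalue(logl_increase, on_boundary=True):
    """P-value for adding a single variance parameter.

    logl_increase: REML log-likelihood of the larger model minus the smaller.
    on_boundary:   if True, halve the chi-square(1) tail probability
                   (assumed 50:50 mixture of chi-square(0) and chi-square(1)).
    """
    statistic = 2.0 * logl_increase
    if statistic <= 0.0:
        return 1.0
    p = chi2_1_upper_tail(statistic)
    return 0.5 * p if on_boundary else p


# Increase in log-likelihood reported in the text when fac(age) is added.
p_value = reml_lrt_pvalue(3.71)
print(f"REMLRT = {2 * 3.71:.2f}, P = {p_value:.4f}")
```

As the text notes, such a comparison is only meaningful between models with the same fixed terms, so it applies to models 1 versus 2 but not to models 2 versus 3.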
3. ASReml User Guide Release 4.1 Functional Specification

A R Gilmour, VSN International, Hemel Hempstead, United Kingdom
B J Gogel, University of Adelaide, Australia
B R Cullis, University of Wollongong, Australia
S J Welham, VSN International, Hemel Hempstead, United Kingdom
R Thompson, Rothamsted Research, Harpenden, United Kingdom

April 8, 2015

ASReml is a statistical package that fits linear mixed models using Residual Maximum Likelihood (REML). It was a joint venture between the Biometrics Program of NSW Department of Primary Industries and the Biomathematics Unit of Rothamsted Research. Statisticians in Britain and Australia have collaborated in its development.

Main authors: A R Gilmour, B J Gogel, B R Cullis, S J Welham and R Thompson.

Other contributors: D Butler, M Cherry, D Collins, G Dutkowski, S A Harding, K Haskard, A Kelly, S G Nielsen, A Smith, A P Verbyla and I M S White.

Author email addresses: arthur.gilmour@cargovale.com.au, beverley.gogel@adelaide.edu.au, bcullis@uow.edu.au, sue.welham@vsni.co.uk, robin.thompson@rothamsted.ac.uk

Copyright Notice: Copyright 2014, VSN International. All rights reserved. Except as permitted under the Copyright Act 1968 (Commonwealth of Australia), no part of the
4. 7.5 A sequence of variance structures for the NIN data

• if the correlation structure ar1(column).ar1(row) was specified, ASReml would automatically add a common variance (see Section 7.4);
• ASReml would report an error if the consolidated model term ar1v(column).ar1v(row) was specified, as this would correspond to $\mathrm{var}(e) = \sigma_c^2 \Sigma_c(\rho_c) \otimes \sigma_r^2 \Sigma_r(\rho_r)$, and $\sigma_c^2$ and $\sigma_r^2$ are unidentifiable, that is, it is not possible to estimate them separately (see Section 7.4).

3c Two-dimensional separable autoregressive spatial model with measurement error

This model extends 3b by adding a random units term. Thus $\mathrm{var}(u_r) = \sigma_r^2 I$, $\mathrm{var}(u_\eta) = \sigma_\eta^2 I$ and $\mathrm{var}(e) = \sigma_e^2 \Sigma_c(\rho_c) \otimes \Sigma_r(\rho_r)$. The reserved word units tells ASReml to construct an additional random term with one level for each experimental unit so that a second (independent) error term can be fitted. A units term is fitted in the model in cases like this, where a variance structure is applied to the errors. An IDV variance structure is specified for units to model $\sigma_\eta^2 I$. The units term is sometimes fitted in spatial models for field trial data to allow for a nugget effect. The model now has two terms at the plot (experimental unit) level, that is, a correlated structure defined as an R structure and an uncorrelated structure defined in the G structure.

Command file shown alongside the text (legible lines only):

    NIN Alliance Trial 1989
     variety !A
     id
     ...
     row 22
     column 11
    ...
    residual ar1v(column).ar1(row)
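To make the structure of model 3c concrete, the sketch below builds the plot-level variance matrix for a tiny grid in plain Python. It is an illustration under stated assumptions rather than ASReml's implementation: plots are assumed to be ordered rows within columns, the first-order autoregressive correlation between positions i and j is taken as rho to the power |i - j|, and the function names (`ar1`, `kron`, `spatial_variance`) and parameter values are hypothetical.

```python
def ar1(n, rho):
    """n x n first-order autoregressive correlation matrix."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker (direct) product of two matrices stored as lists of lists."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
         for j in range(len(a[0]) * cols_b)]
        for i in range(len(a) * rows_b)
    ]


def spatial_variance(n_col, n_row, rho_c, rho_r, sigma2_e, sigma2_units):
    """Correlated error (R structure) plus independent units (nugget) variance."""
    corr = kron(ar1(n_col, rho_c), ar1(n_row, rho_r))
    n = n_col * n_row
    return [
        [sigma2_e * corr[i][j] + (sigma2_units if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical values: 2 columns x 3 rows, so 6 plots.
V = spatial_variance(n_col=2, n_row=3, rho_c=0.5, rho_r=0.2,
                     sigma2_e=2.0, sigma2_units=1.0)
for row in V:
    print(["%.2f" % value for value in row])
```

The diagonal carries both variances while the off-diagonal entries carry only the spatially correlated part, which is why the units term acts as a nugget effect.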
5. $j = 1, \ldots, n$.

The sample variogram reported by ASReml has two forms depending on whether the spatial coordinates represent a complete rectangular lattice (as typical of a field trial) or not. In the lattice case, the sample variogram is calculated from the triple $(l_{ij1}, l_{ij2}, v_{ij})$ where $l_{ij1} = s_{i1} - s_{j1}$ and $l_{ij2} = s_{i2} - s_{j2}$ are the displacements. As there will be many $v_{ij}$ with the same displacements, ASReml calculates the means for each displacement pair $(l_{ij1}, l_{ij2})$, either ignoring the signs (default) or separately for same sign and opposite sign (!TWOWAY), after grouping the larger displacements: 9-10, 11-14, 15-20. The result is displayed as a perspective plot (see page 235) of the one or two surfaces indexed by absolute displacement group. In this case, the two directions may be on different scales.

Otherwise ASReml forms a variogram based on polar coordinates. It calculates the distance between points $d_{ij} = \sqrt{l_{ij1}^2 + l_{ij2}^2}$ and an angle $\theta_{ij}$ ($-180 < \theta_{ij} < 180$) subtended by the line from $(0, 0)$ to $(l_{ij1}, l_{ij2})$ with the x-axis. The angle can be calculated as $\theta_{ij} = \tan^{-1}(l_{ij1}/l_{ij2})$, choosing $0 < \theta_{ij} < 180$ if $l_{ij2} > 0$ and $-180 < \theta_{ij} < 0$ if $l_{ij2} < 0$. Note that the variogram has angular symmetry in that $v_{ij} = v_{ji}$, $d_{ij} = d_{ji}$ and $|\theta_{ij} - \theta_{ji}| = 180$. The variogram presented averages the $v_{ij}$ within 12 distance classes and 4, 6 or 8 sectors (selected using a !VGSECTORS qualifier) centred on an angle of $(i - 1) \times 180/s$ ($i = 1$
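The polar form of the calculation can be sketched in a few lines of Python. This is an illustration only, with several assumptions that are not taken from the text above: the semivariance is assumed to be half the squared difference of the two residuals, the angle is computed with `atan2` as the angle the displacement makes with the x-axis, and each pair is assigned to the sector whose centre $(i - 1) \times 180/s$ is nearest after folding the angle into [0, 180). The function name is hypothetical, and distance classes are not formed here.

```python
import math


def polar_variogram_terms(coords, residuals, sectors=4):
    """Return (distance, angle_deg, sector, semivariance) for each pair i < j."""
    terms = []
    width = 180.0 / sectors
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            l1 = coords[i][0] - coords[j][0]   # displacement in direction 1
            l2 = coords[i][1] - coords[j][1]   # displacement in direction 2
            distance = math.hypot(l1, l2)
            angle = math.degrees(math.atan2(l2, l1))   # in (-180, 180]
            # Angular symmetry: fold into [0, 180) before choosing a sector.
            folded = angle % 180.0
            sector = int(round(folded / width)) % sectors
            # Assumed definition of the sample semivariance for the pair.
            semivariance = 0.5 * (residuals[i] - residuals[j]) ** 2
            terms.append((distance, angle, sector, semivariance))
    return terms


# Two hypothetical points with hypothetical residuals.
terms = polar_variogram_terms([(0.0, 0.0), (3.0, 4.0)], [1.0, 3.0])
print(terms)
```

Averaging the semivariances within each (distance class, sector) cell would then give the values that are plotted.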
6. qualifier / action

!SLOW n
reduces the update step sizes of the variance parameters more persistently than the !STEP r qualifier. If specified, ASReml looks at the potential size of the updates and, if any are large, it reduces the size of r. If n is greater than 10, ASReml also modifies the Information matrix by multiplying the diagonal elements by n. This has the effect of further reducing the updates. In the iteration subroutine, if the calculated LogL is more than 1.0 less than the LogL for the previous iteration and !SLOW is set and NIT > 1, ASReml immediately moves the variance parameters back towards the previous values and restarts the iteration.

!TOLERANCE [s1 [s2]]
modifies the ability of ASReml to detect singularities in the mixed model equations. This is intended for use on the rare occasions when ASReml detects singularities after the first iteration; they are not expected. Normally (when no !TOLERANCE qualifier is specified), a singularity is declared if the adjusted sum of squares of a covariable is less than a small constant ($\eta$) or less than the uncorrected sum of squares $\times\, \eta$, where $\eta$ is $10^{-8}$ in the first iteration and $10^{-10}$ thereafter. The qualifier scales $\eta$ by $10^{s_1}$ and $10^{s_2}$ for the first or subsequent iterations respectively, so that it is more likely an equation will be declared singular. Once a singularity is detected, the corresponding equation is dropped (forced to be zero) in subsequent iterations. If ne

!VRB
7. 2.1 The general linear mixed model

$$G = \bigoplus_{j=1}^{b'} G_j = \begin{bmatrix} G_1 & 0 & \cdots & 0 \\ 0 & G_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & G_{b'} \end{bmatrix}$$

where $\oplus$ is the direct sum operator, each $G_j$ is of size $q_j$ and $q = \sum_j q_j$. The default assumption is that each random model term generates one component of this direct sum; then $b' = b$ and $\mathrm{var}(u_i) = G_i$ for $i = 1, \ldots, b$. This means that the random effects from any two distinct model terms are uncorrelated. However, in some models, one component of G may apply across several model terms, for example, in random coefficient regression where the random intercepts and slopes for subjects are correlated. To accommodate these cases, one component of G may apply across several model terms; then $b' < b$. In some other (less likely but possible) cases, we may wish to separate one model term over several independent parts; then $b' > b$ (see Section 7.2.1).

Example 2.2 Variance components mixed models

Building example 2.1 to a linear mixed model with more than one ($b > 1$) random effect, typically known as a variance components mixed model, the random effects $u_i$ in $u$ and the residual errors $e$ are assumed pairwise uncorrelated and to each be normally distributed with mean zero and variance given by $\mathrm{var}(u_i) = \sigma_i^2 I_{q_i}$ and $\mathrm{var}(e) = \sigma_e^2 I_n$, where $I_{q_i}$ and $I_n$ are identity matrices of dimension $q_i$ and $n$ respectively. In this case

$$\mathrm{var}(y) = \sum_{i=1}^{b} \sigma_i^2 Z_i Z_i' + \sigma_e^2 I_n \qquad (2.5)$$

2.1.4 Partitioning the residual error term

As for the fixed and random
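Equation (2.5) can be evaluated directly for a toy data set. The sketch below is plain Python with hypothetical inputs: one random term with two levels observed on four records, with variance components chosen only for illustration. The function name `variance_of_y` is not an ASReml facility.

```python
def variance_of_y(design_matrices, sigma2_terms, sigma2_e):
    """var(y) = sum_i sigma2_i * Z_i Z_i' + sigma2_e * I_n for lists of lists."""
    n = len(design_matrices[0])
    # Start from the residual part, sigma2_e * I_n.
    v = [[sigma2_e if r == c else 0.0 for c in range(n)] for r in range(n)]
    for z, sigma2 in zip(design_matrices, sigma2_terms):
        for r in range(n):
            for c in range(n):
                # (Z Z')[r][c] is the inner product of rows r and c of Z.
                v[r][c] += sigma2 * sum(zr * zc for zr, zc in zip(z[r], z[c]))
    return v


# Hypothetical design: records 1-2 belong to level 1, records 3-4 to level 2.
Z1 = [[1, 0], [1, 0], [0, 1], [0, 1]]
V = variance_of_y([Z1], sigma2_terms=[2.0], sigma2_e=1.0)
for row in V:
    print(row)
```

Records that share a level of the random term have covariance equal to that term's variance component, and records in different levels are uncorrelated, which is the pairwise-uncorrelated assumption stated above.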
8. 3.4.2 The title line

The first text (non-blank, non-control) line in an ASReml command file is taken as the title for the job and is purely descriptive for future reference.

3.4.3 Reading the data

The data fields are defined before the data file name is specified. Field definitions must be given for all fields in the data file and in the order in which they appear in the data file. Note that in previous releases data field definitions had to be indented, but in Release 4 this condition has been relaxed and is not required. In this case there are 11 data fields (variety ... column) in nin89.asd, see Section 3.3. The !A after variety tells ASReml that the first field is an alphanumeric factor, and the 4 after repl tells ASReml that the field called repl (the fifth field read) is a numeric factor with 4 levels coded 1:4. Similarly for row and column. The other fields include variates (yield) and various other variables.

    NIN Alliance trial 1989
     variety !A
     id
     pid
     raw
     repl 4
     nloc
     yield
     lat
     long
     row 22
     column 11
    nin89.asd !skip 1

3.4.4 The data file line

The data file name is specified immediately after the last data field definition. Data file qualifiers that relate to data input and output are also placed on this line if they are required. In this example, !skip 1 tells ASReml to ignore (skip) the first
9. !EQORDER
modifies the algorithm used for choosing the order for solving the mixed model equations. A new algorithm devised for release 2 is now the default and is formally selected by !EQORDER 3. The algorithm used for release 1 is essentially that selected by !EQORDER 1. The new order is generally superior. !EQORDER -1 instructs ASReml to process the equations in the order they are specified in the model. Generally this will make a job much slower, if it can run at all. It is useful if the model has a suitable order, as in the IBD model Y ~ mu !r giv(id) id, where giv(id) invokes a dense inverse of an IBD matrix and id has a sparse structured inverse of an additive relationship matrix. While !EQORDER 3 generates a more sparse solution, !EQORDER -1 runs faster.

!EXTRA n
forces another mod(n,10) rounds of iteration after apparent convergence. The default for n is 1. This qualifier has lower priority than !MAXIT and ABORTASR.NOW (see !MAXIT for details). Convergence is judged by changes in the REML log-likelihood value and variance parameters. However, sometimes the variance parameter convergence criterion has not been satisfied.

!FOWN
allows the user to specify the test reported in the F-con column of the Wald F Statistics table. It has the form !FOWN terms-to-test ; background-terms, placed on a separate line immediately after the model line. Multiple !FOWN statements should appear together. It generates a Wald F statistic for each model term in
10. Messages:

Warning: Dropped records were not evenly distributed across
Warning: Eigen analysis check of US matrix skipped
WARNING: Extra lines on the end of the input file

Likely meaning / remedy:

This is to reduce the number of knot points used in fitting a spline.
Data values should be positive.
Usually means the variance model is overparameterized.
Look up AISING.
The structures are probably at the boundary of the parameter space.
Either use !MVINCLUDE or delete the records.
It is better to avoid negative weights unless you can check ASReml is doing the correct thing with them.
Check the data summary has the correct number of records and all variables have valid data values. If ASReml does not find sufficient values on a data line, it continues reading from the next line.
You have probably mis-specified the number of levels in the factor or omitted the !I qualifier (see Section 5.4 on data field definition syntax). ASReml corrects the number of levels.
The term did not appear in the model.
The term did not appear in the model.
Terms like units and mv cannot be included in prediction.
!RECODE may be needed when using a pedigree and reading data from a binary file that was not prepared with ASReml.
Suggest drop the term and refit the model.
!MVREMOVE has been used to delete records which have a missing value in design variables. This has resulted in multivariate data no longer having an n x t (n subjects with t traits each) structure.
11. Table 15.8: Estimated variance components from univariate analyses of bloodworm data. (a) Model with homogeneous variance for all terms and (b) Model with heterogeneous variance for interactions involving tmt

    source                (a)        (b)        (b) control  (b) treated
    variety               2.378      2.334
    tmt.variety           0.492                 1.505        0.372
    run                   0.321      0.319
    tmt.run               1.748                 1.388        2.223
    variety.run (pair)    0.976      0.987
    tmt.pair              1.315                 1.156        1.359
    REML log-likelihood   -345.256   -343.22

The estimated variance components from this analysis are given in column (a) of Table 15.8. The variance component for the variety main effects is large. There is evidence of tmt.variety interactions, so we may expect some discrimination between varieties in terms of tolerance to bloodworms.

Given the large difference (p < 0.001) between tmt means, we may wish to allow for heterogeneity of variance associated with tmt. Thus we fit a separate variety variance for each level of tmt, so that instead of assuming $\mathrm{var}(u_2) = \sigma_2^2 I_{88}$ we assume

$$\mathrm{var}(u_2) = \begin{bmatrix} \sigma_{2c}^2 & 0 \\ 0 & \sigma_{2t}^2 \end{bmatrix} \otimes I_{44}$$

where $\sigma_{2c}^2$ and $\sigma_{2t}^2$ are the tmt.variety interaction variances for control and treated respectively. This model can be achieved using a diagonal variance structure for the treatment part of the interaction. We also fit a separate run variance for each level of tmt and heterogeneity at the residual level by including the uni(tmt,2) term. We have chosen level 2 of tmt as we expect more variation for the exposed treatment and thus the extra varianc
12. Figure 15.11: Estimated difference between control and treated for each variety plotted against estimate for control (horizontal axis: control BLUP)

The independence of $\epsilon$ and $u$ and dependence between $\delta$ and $u$ is clearly illustrated in Figures 15.10 and 15.11. In this example the two measures have provided very different rankings of the varieties. The choice of tolerance measure depends on the aim of the experiment. In this experiment the aim was to identify tolerance which is independent of inherent vigour, so the deviations from regression measure is preferred.

15.9 Balanced longitudinal data - Random coefficients and cubic smoothing splines - Oranges

We now illustrate the use of random coefficients and cubic smoothing splines for the analysis of balanced longitudinal data. The implementation of cubic smoothing splines in ASReml was originally based on the mixed model formulation presented by Verbyla et al. (1999). More recently the technology has been enhanced so that the user can specify knot points; in the original approach the knot points were taken to be the ordered set of unique values of the explanatory variable. The specification of knot points is particularly useful if the number of unique valu
13. • labels for the data fields in the data file and the name of the data file,
• the linear mixed model and the variance model(s) if required,
• output options including directives for tabulation and prediction.

Below is the ASReml command file for an RCB analysis of the NIN field trial data, highlighting the main sections. Note the order of the main sections.

    title line                               NIN Alliance trial 1989
    data field definition                     variety !A
                                              id
                                              pid
                                              raw
                                              repl 4
                                              nloc
                                              yield
                                              lat
                                              long
                                              row 22
                                              column 11
    data file name and qualifiers            nin89.asd !skip 1
    tabulate statement                       tabulate yield ~ variety
    linear mixed model definition            yield ~ mu variety !r idv(repl)
    residual variance model specification    residual idv(units)
    predict statement                        predict variety

3.4.1 Generating a template

ASReml can generate a basic command file (a template) for you to modify from the data file, if the data file has suitable field (variable) names in the first line. The requirements are:

• the data file have file name extension .asd, .csv, .dat or .txt,
• there is not a matching command file already existing,
• the first line of the file contains a name for each field,
• the name must begin with a letter; it may contain numbers and the underscore character but not certain special characters (including & < >),
• the name may be terminated with P
14. 5.4 Specifying and reading the data

for a model factor, various qualifiers are required depending on the form of the factor coding, where n is the number of levels of the factor and s is a list of labels or the name of a file containing the labels (one per row) to be assigned to the levels:

* or n
is used when the data field has values 1...n directly coding for the factor, unless the levels are to be labelled (see !L), for example Row 12.

!L s
is used when the data field is numeric with values 1...n and labels are to be assigned to the n levels, for example Sex !L Male Female.

!A [n]
is required if the data field is alphanumeric, for example Location !A names. Specify n if there are more than 1000 classes (over all class (factor) variables), indicating the expected number for this factor.

!A !L s
is used if the data field is alphanumeric and must be coded in a particular order, to set the order of the levels. For example SNP !A !L C:C C:T T:T defines the levels, overriding the default data-dependent order. If there are many labels, they may be written over several lines by using a trailing comma to indicate continuation of the list.

New R4: Alternatively, the labels may be listed in a file. If the filename includes embedded blanks, or has no file extension, it must be enclosed in quotes.

    Genotype !A !L MyNames.txt
    Genotype !A !L "My Names.txt"
    Genotype !A !L "MyNames"

Use a !SKIP qualifier after the filename to skip any heading li
15. structure is used; this may be used to obtain starting values for another run of ASReml,
• a table showing the variance components for each iteration,
• a figure and table showing the variance partitioning for any XFA structures fitted,
• some statistics derived from the residuals from two-dimensional data (multivariate, repeated measures or spatial):
  - the residuals from a spatial analysis will have the units part added to them (defined as the combined residual) unless the data records were sorted (within ASReml), in which case the units and the correlated residuals are in different orders (data file order and field order respectively),
  - the residuals are printed in the .yht file but the statistics in the .res file are calculated from the combined residual,
  - the Covariance/Variance/Correlation (C/V/C) matrix calculated directly from the residuals; it contains the covariance below the diagonals, the variances on the diagonal and the correlations above the diagonal.

The fitted matrix is the same as is reported in the .asr file and, if the LogL has converged, is the one you would report. The BLUPs matrix is calculated from the BLUPs and is provided so it can be used as starting values when a simple initial model has been used and you are wanting to attempt to fit a full unstructured matrix. For computational reasons it pertains to the parameters and so may differ from the parameter values generated by the last iteration. The BLUPs matrix may look quit
16. to fit a slide-specific regression of signal on background. In this example signal is a multivariate set of 93 variates and background is a set of 93 covariates. The signal values relate to either the Red or Green channels. So for each slide and channel we need to fit the simple regression signal ~ mu background. But the data for the 93 slides is presented in parallel. If it were presented in series with a factor slide indexing the slides, the equivalent model would be signal ~ slide slide.background.

6.7 Weights

Weighted analyses are achieved by using !WT weight as a qualifier to the response variable. An example of this is y !WT wt ~ mu A X, where y is the name of the response variable and wt is the name of a variate in the data containing weights. If these are relative weights (to be scaled by the units variance), then this is all that is required. If they are absolute weights, that is, the reciprocal of known variances, use the !GF qualifier to fix the variances in the residual model (Section 7.3).

When a structure is present in the residuals (Section 7.3), the weights are applied as a matrix product. If $\Sigma$ is the structure and $W$ is the diagonal matrix constructed from the square root of the values of the variate weight, then $R^{-1} = W \Sigma^{-1} W$. Negative weights are treated as zeros.

6.8 Generalized Linear Mixed Models

ASReml includes facilities for fitting the family of Generalized Linear Models (GLMs; McCullagh and Nelder, 1994). A GLM is defi
17. $$\mathrm{var}\begin{pmatrix} u_I \\ u_S \end{pmatrix} = \begin{bmatrix} \sigma_{II} & \sigma_{IS} \\ \sigma_{IS} & \sigma_{SS} \end{bmatrix} \otimes I_{10} = \begin{bmatrix} \sigma_{II} I_{10} & \sigma_{IS} I_{10} \\ \sigma_{IS} I_{10} & \sigma_{SS} I_{10} \end{bmatrix}$$

Here the set of animal intercepts has a common variance $\sigma_{II}$ and the set of animal slopes has a different common variance $\sigma_{SS}$. Intercepts and/or slopes from two different animals are independent, but the intercept and slope from any given animal have covariance $\sigma_{IS}$ (or correlation $\sigma_{IS}/\sqrt{\sigma_{II}\sigma_{SS}}$). In this context, we use integers as arguments to emphasize that the arguments are specifying the size of the variance structure. For this example, id(10) can be replaced by id(Animal).

In order to simplify processing of the str arguments, ASReml expects at least 1 single term in the consolidated model term to be a variance model function with a dimension rather than a variable name as the argument (e.g. us(2) in the example). Mostly this is quite natural, as a suitable factor is not normally available to indicate the number of linear model terms being combined (2 in this example). The dummy identity function id(1) could be introduced to allow processing if the consolidated model term could only be expressed using variable arguments, for example str(Sire and(Dam) id(1).nrm(Animal)).

This random regression model has been developed to describe the form of the str function. We note that this model is equivalent to us(pol(age)).id(Animal).

7.3 Applying variance structures to the residual error term

Example 7.2 Fitting a genetic covariance between direct and materna
18. 0.7210E-01  0.7940  0.4170E-01  0.8972

    Wald F statistics
       Source of Variation     NumDF     F-inc
    19 Trait.age                   5     100 141
    20 Trait.brr                  15     116.72
    21 Trait.sex                   5      47.97
    23 Trait.age.sex               4       4.17
    29 diag(TrSG123).sex.grp       147 effects fitted (37 are zero)
    26 diag(TrAG1245).age.grp      196 effects fitted (69 are zero)
    36 Trait.grp                   180 effects fitted (65 singular)
    31 us(Trait).sire              460 effects fitted (20 are zero)
    33 xfa1(TrDam123).dam        10683 effects fitted (8 are zero)
    35 us(TrLit1234).lit         19484 effects fitted (20 are zero)

The REML estimates of all the variance matrices except for the dam components are positive definite. Heritabilities for each trait can be calculated using the VPREDICT facility of ASReml. The heritability is given by

$$h^2 = \frac{\sigma_A^2}{\sigma_P^2}$$

where $\sigma_P^2$ is the phenotypic variance and is given by

$$\sigma_P^2 = \sigma_s^2 + \sigma_d^2 + \sigma_l^2 + \sigma_e^2$$

recalling that

$$\sigma_s^2 = \tfrac{1}{4}\sigma_A^2, \qquad \sigma_d^2 = \tfrac{1}{4}\sigma_A^2 + \sigma_m^2$$

In the half-sib analysis we only use the estimate of additive genetic variance from the sire variance component. ASReml then carries out the VPREDICT instructions in the .asr file, stores the instructions in a .pin file and produces the following output in a .pvc file.

15.10 Multivariate animal genetics data - Sheep

    ASReml 4.1 [01 Dec 2014] Multivariate Sire & ...
    coopmf3.pvc created 27 Mar 2015 10: ...
    [listing of parameter labels: id(units).us(Trait) ...]
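The half-sib heritability described above amounts to simple arithmetic on the estimated components. The sketch below uses hypothetical variance components, not values from the sheep analysis, and the function name is illustrative. It follows the relations $\sigma_A^2 = 4\sigma_s^2$ and $\sigma_P^2 = \sigma_s^2 + \sigma_d^2 + \sigma_l^2 + \sigma_e^2$; it does not compute the approximate standard error that VPREDICT reports.

```python
def half_sib_heritability(sigma2_sire, sigma2_dam, sigma2_litter, sigma2_resid):
    """h^2 = sigma2_A / sigma2_P with sigma2_A estimated as 4 * sire variance."""
    sigma2_additive = 4.0 * sigma2_sire
    sigma2_phenotypic = sigma2_sire + sigma2_dam + sigma2_litter + sigma2_resid
    return sigma2_additive / sigma2_phenotypic


# Hypothetical components for a single trait.
h2 = half_sib_heritability(sigma2_sire=1.0, sigma2_dam=2.0,
                           sigma2_litter=3.0, sigma2_resid=10.0)
print(f"h2 = {h2:.3f}")
```

In the .pvc listing, each heritability line has this form: a "Direct" component (four times the sire variance) divided by a "phen" component.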
19.  103 resid 67           1.2097        0.34725
     104 resid 68           0.24528       [illegible]
     105 resid 69           4.5409        0.21411
     106 resid 70           0.85028       0.10023
     107 resid 71           2.4831        0.12849
     108 resid 72           0.78609E-01   0.11170E-01
     109 resid 73           0.11589       0.99338E-01
     110 resid 74           1.6318        0.49595E-01
     WWTh2 = Direct 2 75/phen 1 60 =   0.1507  0.0396
     YWTh2 = Direct 2 77/phen 3 62 =   0.2991  0.0626
     GFWh2 = Direct 2 80/phen 6 65 =   0.3087  0.0717
     FDMh2 = Direct 3 84/phen 10 69 =  0.1344  [illegible]
     FATh2 = Direct 3 89/phen 15 74 =  0.0785  0.0388
     GenCor 2 1 = us(Tr 24/SQR[us(Tr 23*us(Tr 25] =  0.7045  0.1024
     GenCor 3 1 = us(Tr 26/SQR[us(Tr 23*us(Tr 28] =  0.2970  0.1720
     GenCor 3 2 = us(Tr 27/SQR[us(Tr 25*us(Tr 28] =  0.0188  0.1808
     GenCor 4 1 = us(Tr 29/SQR[us(Tr 23*us(Tr 32] =  0.1947  0.3521
     GenCor 4 2 = us(Tr 30/SQR[us(Tr 25*us(Tr 32] =  0.1326  0.3249
     GenCor 4 3 = us(Tr 31/SQR[us(Tr 28*us(Tr 32] =  0.0981  0.3874
     GenCor 5 1 = us(Tr 33/SQR[us(Tr 23*us(Tr 37] =  0.2924  0.2747
     GenCor 5 2 = us(Tr 34/SQR[us(Tr 25*us(Tr 37] =  0.5913  0.2026
     GenCor 5 3 = us(Tr 35/SQR[us(Tr 28*us(Tr 37] =  0.0396  0.2687
     GenCor 5 4 = us(Tr 36/SQR[us(Tr 32*us(Tr 37] =  0.6577  0.3854
     MatCor 2 1 = Mater 91/SQR[Mater 90*Mater 92] =  1.4277  0.5305
     MatCor 3 1 = Mater 93/SQR[Mater 90*Mater 95] =  1.7267  1.4388
     MatCor 3 2 = Mater 94/SQR[Mater 92*Mater 95] =  3.0703  2.9688
     Notice: The parameter estimates are followed by their approximate standard errors.

(Signs of the estimates are not preserved in this extract.)

15.10.2 Animal model

In this section we will illustrate the use of a pedigree file to define the genetic relation
20. 2.4 Inference: Random effects
      2.4.1 Tests of hypotheses: variance parameters
      2.4.2 Diagnostics
    2.5 Inference: Fixed effects
      2.5.1 [title illegible]
      2.5.2 Incremental and conditional Wald F Statistics
      2.5.3 Kenward and Roger adjustments
      2.5.4 Approximate stratum variances
    3 A guided tour
    3.1 [title illegible]
    3.2 Nebraska Intrastate Nursery (NIN) field experiment
    3.3 The ASReml data file
    3.4 The ASReml command file
      3.4.1 Generating a template
      3.4.2 The title line
      3.4.3 Reading the data
      3.4.4 The data file line
      3.4.5 Tabulation
      3.4.6 Specifying the terms in the mixed model
      3.4.7 Variance structures
      3.4.8 Prediction
    3.5 Running the job
    3.6 Description of output files
      3.6.1 The .asr file
21. [Output listing from the multivariate sheep analysis: estimated effects and variance parameters for the terms with 147, 196, 460, 14244 and 19484 effects; the column layout is not recoverable from this extract.]

    Covariance/Variance/Correlation Matrix US Residual
      9.461   0.5689   0.2355   0.1640
22. 15.10 Multivariate animal genetics data - Sheep

[Output listing, components 45 to 99: id(lit).us(TrLit1234) terms, Damv, phen 1-15, Direct 23-37, Maternal 54-59 and resid 60 onwards; the numeric columns are not recoverable from this extract.]
23. statement. In this case the 56 variety means for yield as predicted from the fitted model would be formed and returned in the .pvs output file. See Chapter 9 for a detailed discussion of prediction in ASReml.

    NIN alliance trial 1989
     variety !A
     ...
     column 11
    nin89.asd !skip 1
    tabulate yield ~ variety
    yield ~ mu variety !r idv(repl)
    residual idv(units)
    predict variety

3.5 Running the job

Assuming you have located the nin89.asd file (under Windows it will typically be located in ASRemlPath\Examples; we suggest copying the data file to the user's workspace, as the Examples folder is sometimes write-protected) and created the ASCII command file nin89.as as described in the previous section and in the same folder, you can run the job. ASRemlPath is typically C:\Program Files\ASReml4 under Windows. Installation details vary with the implementation and are distributed with the program. You could use ASReml-W or ConText to create nin89.as. These programs can then run ASReml directly after they have been configured for ASReml.

An ASReml job is also run from a command line or by clicking the .as file in Windows Explorer. The basic command to run an ASReml job is

    ASRemlPath\bin\ASReml basename[.as]

where basename[.as] is the name of the command file. Typically, a system PATH is defined which includes ASRemlPath\bin so that just the program name ASReml is required at the command prompt. For example, the command to run nin89.as fr
24. !TSV or !MSV, ASReml will use the file f.rsv, f.tsv or f.msv. If f (= filename.xsv, with x = r, t or m) is used with !CONTINUE, !TSV or !MSV, ASReml will use the file f.xsv. If the specified file is not present, ASReml reverts to reading the previous .rsv file.

Some users may prefer, rather than specifying initial values in the model formulation, to generate a default .tsv file using !MAXIT 0 and then edit the .tsv file with more appropriate values.

If the model has changed and !CONTINUE is used, ASReml will pick up the values it recognises as being for the same terms from the .rsv file. Furthermore, ASReml will use estimates in the .rsv file for certain models to provide starting values for certain more general models, inserting reasonable defaults where necessary. The transitions recognised are listed and discussed in Section 7.9.2.

5.8 Job control qualifiers

Table 5.3: List of commonly used job control qualifiers

qualifier / action

!CONTRAST s t p
provides a convenient way to define contrasts among treatment levels. !CONTRAST lines occur as separate lines between the datafile line and the model line. s is the name of the model term being defined, t is the name of an existing factor and p is the list of contrast coefficients. For example, !CONTRAST LinN Nitrogen 3 1 -1 -3 defines LinN as a contrast based on the 4 (implied by the length of the list) levels of factor Nitrogen. Missing values in the factor bec

!DDF i

!FCON
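The effect of a contrast definition of this kind can be mimicked outside ASReml: each record's factor level is looked up in the coefficient list to give a new covariate. The Python sketch below is an illustration only. The coefficients, the level codes and the treatment of missing levels (passed through as `None`) are assumptions made for the example, and `contrast_variable` is a hypothetical helper, not an ASReml function.

```python
def contrast_variable(levels, coefficients):
    """Map 1-based factor level codes to contrast coefficients.

    levels:       level code (1..k) for each record, or None if missing.
    coefficients: one coefficient per level; its length implies k.
    """
    values = []
    for level in levels:
        if level is None:
            values.append(None)          # assumed handling of missing levels
        elif 1 <= level <= len(coefficients):
            values.append(coefficients[level - 1])
        else:
            raise ValueError(f"level {level} outside 1..{len(coefficients)}")
    return values


# Hypothetical linear contrast over 4 nitrogen levels and 6 records.
nitrogen = [1, 2, 3, 4, None, 2]
lin_n = contrast_variable(nitrogen, [-3, -1, 1, 3])
print(lin_n)
```

Because the number of levels is implied by the length of the coefficient list, a list of the wrong length silently changes the meaning of the contrast, which is why the sketch rejects out-of-range level codes.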
25. V label i:j, where i:j spans an XFA variance structure, inserts the US matrix based on the XFA parameters.

12.2.3 Correlation

Correlations are requested by lines beginning with an R. The specific form of the directive is

    R label a ab b

This calculates the correlation $r = \sigma_{ab} / \sqrt{\sigma_a^2 \sigma_b^2}$ and the associated standard error; a, b and ab are integers indicating the position of the components to be used. Alternatively,

    R label a:n

calculates the correlation $r_{ab} = \sigma_{ab} / \sqrt{\sigma_a^2 \sigma_b^2}$ for all correlations in the lower triangular row-wise matrix represented by components a to n, and the associated standard errors. Note that covariances between ratios and other components are not generated, so the correlations are not numbered and cannot be used to derive other functions. To avoid numbering confusion it is better to include R functions at the end of the VPREDICT block.

The example job is

    y1 y2 ~ Trait !r id(sire).us(Trait)
    residual id(units).us(Trait)
    VPREDICT !DEFINE
    F phenvar 4:6 + 1:3    # id(sire).us(Trait) + id(units).us(Trait)
    R phencorr 7:9         # phenvar
    R gencorr 4:6          # id(sire).us(Trait)

In the example, R phencorr 7 8 9 (or R phencorr phenvar) calculates the phenotypic correlation as component 8 / sqrt(component 7 x component 9), where components 7, 8 and 9 are created with the first line of the .pin file, and R gencorr 4:6 (or R gencorr sire.us(Trait)) calculates the genotypic correlation by calcul
26. indicates that for the 16th data record the residuals are 2.35, 6.58 and 5.64 times the respective standard deviations. The standard deviation used in this test is calculated directly from the residuals rather than from the analysis. They are intended to flag the records with large residuals rather than to precisely quantify their relative size. They are not studentised residuals and are generally not relevant when the user has fitted heterogeneous variances.

Residual statistics for nin891.asr

Convergence sequence of variance parameters

    Iteration        1         2         3         4         5         6
    LogL          449.818   424.315   405.419   399.552   399.336   399.325
    Change          177       216       201        51        13         3
    Adjusted          0         0         0         0         0         0
    StepSz         0.316     0.562     1.000     1.000     1.000     1.000
    5 R          0.100000  0.293737  0.481321  0.615630  0.645607  0.653013
    6 R          0.100000  0.232335  0.358720  0.439779  0.441733  0.439143

Trace of W W R W G W 1376 1714

[Character plot: Plot of Residuals (24.8729, 15.9146) vs Fitted values (16.7728, 35.9349)]

SLOPES FOR LOG ABS RES on LOG PV for Section 11 0 15 SLOPES FOR LOG S
27. mean $A\psi$. If $\psi = 0$, the BLUP in (2.19) becomes

$$\tilde{u} = \frac{r\sigma_b^2}{r\sigma_b^2 + \sigma^2}\,(\bar{y} - 1\bar{y}_{\cdot\cdot}) \qquad (2.20)$$

and the BLUP is a so-called shrinkage estimate. As $r\sigma_b^2$ becomes large relative to $\sigma^2$, the BLUP tends to the fixed effect solution, while for small $r\sigma_b^2$ relative to $\sigma^2$ the BLUP tends towards zero, the assumed initial mean. Thus (2.20) represents a weighted mean which involves the prior assumption that the $u_i$ have zero mean.

Note also that the BLUPs in this simple case are constrained to sum to zero. This is essentially because the unit vector defining X can be found by summing the columns of the Z matrix. This linear dependence of the matrices translates to dependence of the BLUPs and hence constraints. This aspect occurs whenever the column space of X is contained in the column space of Z. The dependence is slightly more complex with correlated random effects.

2.4 Inference: Random effects

2.4.1 Tests of hypotheses: variance parameters

Inference concerning variance parameters of a linear mixed effects model usually relies on approximate distributions for the (RE)ML estimates derived from asymptotic results. It can be shown that the approximate variance matrix for the REML estimates is given by the inverse of the expected information matrix (Cox and Hinkley, 1974, section 4.8). Since this matrix is not available in ASReml we replace the expected information matrix by the AI matrix. Further
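The shrinkage behaviour described for (2.20) can be sketched numerically. The following is a minimal Python illustration, not ASReml code: it assumes a balanced one-way layout with r observations per group, known variance components, and the shrinkage factor r*sigma_b^2 / (r*sigma_b^2 + sigma^2). The function name and the data are hypothetical.

```python
def shrinkage_blups(groups, sigma2_b, sigma2):
    """BLUPs of group effects for a balanced one-way layout, zero prior mean.

    groups   : list of equal-length lists of observations (hypothetical data)
    sigma2_b : between-group variance component (assumed known)
    sigma2   : residual variance (assumed known)
    """
    r = len(groups[0])
    if any(len(g) != r for g in groups):
        raise ValueError("balanced data assumed")
    means = [sum(g) / r for g in groups]
    grand = sum(means) / len(means)
    # Shrinkage factor: close to 1 when r*sigma2_b dominates sigma2, close to 0 otherwise.
    k = r * sigma2_b / (r * sigma2_b + sigma2)
    return k, [k * (m - grand) for m in means]


data = [[10.0, 12.0, 11.0], [14.0, 15.0, 16.0], [8.0, 9.0, 7.0]]
k, blups = shrinkage_blups(data, sigma2_b=4.0, sigma2=2.0)
print("shrinkage factor:", k)
print("BLUPs:", blups, "sum:", sum(blups))
```

With a very large sigma2_b the factor approaches 1 and the BLUPs approach the deviations of the group means from the overall mean; with a very small sigma2_b they shrink towards zero. In either case they sum to zero, as noted in the text.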
28. predict variety nitrogen !SED

The data fields were blocks, wplots, subplots, variety, nitrogen and yield. The first five variables are factors that describe the stratification, or experiment design, and treatments. The standard split plot analysis is achieved by fitting the model terms blocks and blocks.wplots as random effects. The blocks.wplots.subplots term is not listed in the model because this interaction corresponds to the experimental units and is automatically included as the residual term. The fixed effects include the main effects of both variety and nitrogen and their interaction. The tables of predicted means and associated standard errors of differences (SEDs) have been requested. These are reported in the .pvs file. Abbreviated output is shown below.

    Results from analysis of yield
    Akaike Information Criterion     424.76 (assuming 3 parameters)
    Bayesian Information Criterion   431.04

    Approximate stratum variance decomposition
    Stratum              Degrees-Freedom   Variance   Component Coefficients
    idv(blocks)                5.00        3175.06     12.0   4.0   1.0
    idv(blocks.wplots)        10.00         601.331     0.0   4.0   1.0
    Residual Variance         45.00         177.083     0.0   0.0   1.0

    Model_Term                        Gamma       Sigma    Sigma/SE   % C
    blocks            IDV_V    6     1.21116     214.477     1.27     0 P
    blocks.wplots     IDV_V   18     0.598937    106.062     1.56     0 P
    idv(units)                72 effects
    Residual          SCA_V   72     1.000000    177.083     4.74     0 P

15.2 Split plot design - Oats

    Wald F statistics
    Source of Variation   NumDF   DenDF   F-inc   P-inc
    7 mu                      1     5.0   245.14
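The stratum variances in the output above are linear combinations of the variance components, with the weights given in the Component Coefficients columns. The Python sketch below (an illustration, not part of ASReml) recomputes the stratum variances and the Gamma ratios from the reported Sigma values; the numbers are copied from the output and the variable names are hypothetical.

```python
# Variance components (Sigma column) from the split plot output.
components = {"blocks": 214.477, "blocks.wplots": 106.062, "residual": 177.083}
order = ("blocks", "blocks.wplots", "residual")

# Component coefficients for each stratum, in the same order as `order`.
coefficients = {
    "idv(blocks)": (12.0, 4.0, 1.0),
    "idv(blocks.wplots)": (0.0, 4.0, 1.0),
    "Residual Variance": (0.0, 0.0, 1.0),
}

# Stratum variance = sum of coefficient * component.
stratum_variance = {
    name: sum(c * components[term] for c, term in zip(coefs, order))
    for name, coefs in coefficients.items()
}

# Gamma = component divided by the residual variance.
gamma = {term: components[term] / components["residual"] for term in order}

for name, value in stratum_variance.items():
    print(f"{name:22s} {value:10.3f}")
for term, value in gamma.items():
    print(f"gamma[{term}] = {value:.6f}")
```

The recomputed values agree with the reported 3175.06, 601.331 and 177.083, and with the Gamma column (1.21116 and 0.598937), to the printed precision.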
29. 1 + 4, 8 = 2 + 5 and 9 = 3 + 6.

F addvar sire.us(Trait) * 4 (or F addvar 4:6 * 4) creates new components 10 = 4 x 4, 11 = 5 x 4 and 12 = 6 x 4.

H heritA addvar[1] phenvar[1] (or H heritA 10 7) forms 10 / 7 to give the heritability for ywt.

H heritB addvar[3] phenvar[3] (or H heritB 12 9) forms 12 / 9 to give the heritability for fat.

R phencorr phenvar (or R phencorr 7 8 9) forms 8 / sqrt(7 x 9), that is, the phenotypic correlation between ywt and fat.

R gencorr addvar (or R gencorr 4:6) forms 5 / sqrt(4 x 6), that is, the genetic correlation between ywt and fat.

The resulting .pvc file contains

    - - - id(units).us(Trait)   8140 effects
     1 id(units).us(Trait);us(Trait)   V 1 1    23.2055       0.522176
     2 id(units).us(Trait);us(Trait)   C 2 1    2.50402       0.134915
     3 id(units).us(Trait);us(Trait)   V 2 2    1.66292       0.506679E-01
    - - - us(Trait).id(sire)   184 effects
     4 us(Trait).id(sire);us(Trait)    V 1 1    1.45821       0.398418
     5 us(Trait).id(sire);us(Trait)    C 2 1    0.130280      0.678542E-01
     6 us(Trait).id(sire);us(Trait)    V 2 2    0.344381E-01  0.169646E-01
     7 phenvar 1      24.664      0.64250
     8 phenvar 2      2.6343      0.14763
     9 phenvar 3      1.6974      0.52365E-01
    10 addvar 4       5.8328      1.5926
    11 addvar 5       0.52112     0.27168
    12 addvar 6       0.13775     0.67791E-01
    heritA     = addvar 10 / phenvar 7                    = 0.2365  0.0612
    heritB     = addvar 12 / phenvar 9                    = 0.0812  0.0394
    phenco 2 1 = phenv 8 / SQR[phenv 7 * phenv 9]         = 0.4071  0.0183
    gencor 2 1 = addva 11 / SQR[addva 10 * addva 12]      = 0.5814  0.2039
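The point estimates in this .pvc listing can be reproduced by hand from components 1 to 6. The Python sketch below is an illustration of the arithmetic only: it does not compute the standard errors, which require the variance matrix of the estimates, and the variable names are hypothetical.

```python
import math

# Components 1-3: residual (units) V11, C21, V22; components 4-6: sire V11, C21, V22.
units = [23.2055, 2.50402, 1.66292]
sire = [1.45821, 0.130280, 0.0344381]

# F phenvar: components 7-9 are sire + units, element by element.
phenvar = [s + u for s, u in zip(sire, units)]
# F addvar: components 10-12 are 4 * sire components.
addvar = [4.0 * s for s in sire]

# H directives: ratio of additive to phenotypic variance for each trait.
herit_a = addvar[0] / phenvar[0]   # heritability for ywt
herit_b = addvar[2] / phenvar[2]   # heritability for fat

# R directives: covariance / sqrt(variance_1 * variance_2).
phencorr = phenvar[1] / math.sqrt(phenvar[0] * phenvar[2])
gencorr = addvar[1] / math.sqrt(addvar[0] * addvar[2])

print(f"heritA   = {herit_a:.4f}")
print(f"heritB   = {herit_b:.4f}")
print(f"phencorr = {phencorr:.4f}")
print(f"gencorr  = {gencorr:.4f}")
```

The four values agree with the estimates 0.2365, 0.0812, 0.4071 and 0.5814 reported in the listing, to the printed precision.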
30. -171.497 S2= 1.00000 60 df
    12 LogL=-171.496 S2= 1.00000 60 df

    Results from analysis of y1 y3 y5 y7 y10
    Akaike Information Criterion     354.99 (assuming 6 parameters)
    Bayesian Information Criterion   367.56

    Model_Term                      Sigma       Sigma     Sigma/SE   % C
    id(units).exph(Trait)   70 effects
    Trait      EXP_P    1         0.906843    0.906843     21.88     0 P
    Trait      EXP_V    1         60.8955     60.8955       2.12     0 P
    Trait      EXP_V    2         73.0128     73.0128       1.99     0 P
    Trait      EXP_V    3         309.013     309.013       2.22     0 P
    Trait      EXP_V    4         435.964     435.964       2.52     0 P
    Trait      EXP_V    5         382.312     382.312       2.74     0 P

    Covariance/Variance/Correlation Matrix Residual
     61.05    0.8227   0.6768   0.5568   0.4155
     54.90    72.95    0.8227   0.6768   0.5050
     93.05    123.6    309.6    0.8227   0.6139
     90.95    120.8    302.6    437.0    0.7462
     63.49    84.36    211.2    305.1    382.5

    Wald F statistics
    Source of Variation   NumDF   DenDF    F-inc    P-inc
    8 Trait                   5    18.?   108.25    <.001
    1 tmt                     1    13.1     0.00    0.96
    9 Tr.tmt                  4    21.0     4.37    0.01

The last two models we fit are the antedependence model of order 1 and the unstructured model. Starting values need not actually be supplied in this example (the defaults are adequate) but are supplied to demonstrate the syntax. We use the REML estimate of $\Sigma$ from the heterogeneous power model shown in the previous output. The antedependence model models $\Sigma$ by the inverse Cholesky decomposition $\Sigma^{-1} = UDU'$, where D is a diagonal matrix and U is a unit upper triangular matrix. For an antedependence model of order q, $u_{ij} = 0$ for $j > i + q$. The antedependence model of order
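The information criteria reported with this output are consistent with the usual definitions AIC = -2 logL + 2p and BIC = -2 logL + p ln(v), where p is the number of variance parameters and v the residual degrees of freedom. The Python check below is a sketch under those assumptions: the definitions are the standard ones rather than something stated in this excerpt, and the log-likelihood is taken to be negative (-171.496).

```python
import math

def information_criteria(logl, n_params, resid_df):
    """Return (AIC, BIC) from a REML log-likelihood.

    Assumes AIC = -2*logL + 2*p and BIC = -2*logL + p*ln(resid_df).
    """
    deviance = -2.0 * logl
    return deviance + 2.0 * n_params, deviance + n_params * math.log(resid_df)

# Heterogeneous power model: final LogL, 6 variance parameters, 60 df.
aic, bic = information_criteria(-171.496, 6, 60)
print(f"AIC = {aic:.2f}, BIC = {bic:.2f}")
```

Both values match the reported 354.99 and 367.56 to two decimal places.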
31. 208 219 asp 209 asr 34 208 210 ass 209 dbr 209 dpr 209 219 339 INDEX msv 208 219 pvc 208 pvs 208 220 221 sxes 208 221 rsv 208 228 Sln 35 208 214 spr 209 tab 208 229 tsv 208 229 veo 209 vll 209 vrb 230 vvp 209 231 was 209 xml 209 yht 37 208 215 dgiv 153 159 mef 160 Sgiv 153 tsv 66 own models 119 OWN variance structure 118 IF2 119 IT 119 Path DOPATH 194 PATH 195 PC environment 183 pedigree 147 file 148 Performance issues 196 power 117 135 Predict TP 100 ITP 173 TURNINGPOINTS 173 PLOT suboptions 174 PRWTS 179 predicted values 37 prediction 33 165 qualifiers 172 predictions estimable 38 prior mean 15 qualifier 87 l lt 54 340 I lt 54 l lt gt 54 I 54 I gt 54 I gt 54 lx 54 I 54 54 54 l 54 1A L 48 ABS 54 ADJUST 76 AIF 150 ATLOADINGS 74 AISINGULARITIES 74 ALPHA 150 AOD Analysis of Deviance 103 ARCSIN 54 ARGS 186 ASK 186 ASMV 69 ASSIGN 192 ASSOCIATE in PREDICT 177 ASSOCIATE 172 ASUV 69 1AS 48 1A 47 BINOMIAL GLM 102 I BLOCKSIZE 123 BLUP 75 BMP 75 IBRIEF 75 186 CHECK 199 ICINV 83 COLUMNFACTOR 62 COMPLOGLOG 102 COMPLOGLOG 102 CONTINUE 66 186 CONTRAST 67 COORD 117 ICOS 55 ICSV 62 ICYCLE 193 IDATAFILE 62 IDDF 67 INDEX DEBUG 186 IDEC 173 IDEFINE 207 IDENSEGIV 154 ID
32. AEXP aexp anisotropic ex C 1 2 3 2 w ponential C dlvi 2al olyi ys a 1 2 0 lt lt 10 lt lt 1 AGAU agau anisotropic C 1 2 3 2 w gaussian C i2 vii aj 1 2 0 lt lt 10 lt lt 1 MATE Mat rn with C Mat rn see text k k 1 k w matk first I lt k lt 5 gt 0 range v shape 0 5 parameters a specaned by gt 0 anisotropy ratio 1 the user a anisotropy angle 0 A 1 2 metric 2 heterogeneous variance models DIAG diag diagonal IDH x 4 0 i j w idh US unstructured Diz Qy T ewt us general covari ance matrix OWNkK user explicitly E k ownk forms V and OV 150 7 12 Variance models available in ASReml Details of the variance models available in ASReml variance description algebraic number of parameters structure form name variance corr hom het model variance variance function name ANTE1 1 k order ES UDU eae aes U 1 U u 1 lt j i lt k antek 1 lt k lt w 1 ii H f ij Ui 0 gt J CHOL1 1 k order S LDI ww a5 k cholesky D d D 0 i j CHOLk pear cholk 1 lt k lt w 1 La 51 Ly 5l CEA Sk FA1 p k order X DCD w w fal k factor C FF E kw w FAK analytic F contains k correlation factors fak E diagonal DD diag FACV 1 1 k order Z IT 9 w w facvl k factor T contains covariance factors kw w FACVk analytic W contains specific variance facvk covari ance form XFA1 1 k order X IT Y w w xfat k
33. ASReml uses the identifiers obtained from the .grr file to define the order of the factor classes when the data is read; any extra identifiers in the data that are not in the .grr file are appended at the end of the factor level name list. If !NOID is set, identifiers in the .grr file are not needed and, if present, should be skipped using !CSKIP. Values are typically TAB, COMMA or SPACE separated but may be packed (no separator) when all values are integers (0, 1, 2). Missing values in the regression variables may be represented by NA. Invalid data is also treated as missing. Missing values are replaced by the mean of the respective regressor. Alternative missing data methods that involve imputation from neighbouring markers have not been implemented.

Some general qualifiers are:

!SAVEGIV instructs ASReml to write the G matrix in .dgiv format.

!PSD s declares that the derived variance matrix may have up to s singularities.

!PEV requests calculation of the Prediction Error Variance of marker effects, which are reported in the .mef file. Calculation of prediction error variances is computationally very expensive.

!CENTRE c requests ASReml to centre the regressors at c if c is specified, else at the individual regressor means; otherwise the G matrix is formed from uncentred regressors. Note that centring introduces a singularity in the G matrix and !PSD s will need to be set.

Other qualifiers relate specifically to whether the regressors are markers. Markers are t
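The effect of mean imputation and centring on the G matrix can be illustrated with a few lines of Python. This is a sketch only: it assumes G is formed as the scaled cross-product ZZ'/m of the regressor matrix, which is an assumption made for the illustration and not a statement of how ASReml scales G. The marker data and function names are hypothetical.

```python
def impute_and_centre(markers, centre=None):
    """Replace missing values (None) by the column mean, then centre each column.

    centre=None centres at the individual column means; otherwise at `centre`.
    """
    n, m = len(markers), len(markers[0])
    out = [row[:] for row in markers]
    for j in range(m):
        observed = [row[j] for row in markers if row[j] is not None]
        mean_j = sum(observed) / len(observed)
        c = mean_j if centre is None else centre
        for i in range(n):
            value = mean_j if out[i][j] is None else out[i][j]
            out[i][j] = value - c
    return out


def cross_product(z):
    """Scaled cross-product Z Z' / m (assumed form of G for this sketch)."""
    m = len(z[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / m for rj in z] for ri in z]


markers = [[0, 1, 2], [1, None, 0], [2, 1, 1], [0, 2, None]]  # None marks NA
g = cross_product(impute_and_centre(markers))
row_sums = [sum(row) for row in g]
print(row_sums)  # all numerically zero: the vector of ones is in the null space of G
```

Because each centred column sums to zero, every row of G sums to zero, so G has at least one zero eigenvalue. This is the singularity referred to in the text.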
34. Fixed effects

    Term     Sums of Squares                                                                  M code
    1        R(1)                                                                             .
    A        R(A | 1,B,C,B.C) = R(1,A,B,C,B.C) - R(1,B,C,B.C)                                 A
    B        R(B | 1,A,C,A.C) = R(1,A,B,C,A.C) - R(1,A,C,A.C)                                 A
    C        R(C | 1,A,B,A.B) = R(1,A,B,C,A.B) - R(1,A,B,A.B)                                 A
    A.B      R(A.B | 1,A,B,C,A.C,B.C) = R(1,A,B,C,A.B,A.C,B.C) - R(1,A,B,C,A.C,B.C)           B
    A.C      R(A.C | 1,A,B,C,A.B,B.C) = R(1,A,B,C,A.B,A.C,B.C) - R(1,A,B,C,A.B,B.C)           B
    B.C      R(B.C | 1,A,B,C,A.B,A.C) = R(1,A,B,C,A.B,A.C,B.C) - R(1,A,B,C,A.B,A.C)           B
    A.B.C    R(A.B.C | 1,A,B,C,A.B,A.C,B.C) = R(1,A,B,C,A.B,A.C,B.C,A.B.C) - R(1,A,B,C,A.B,A.C,B.C)   C

Of these, the conditional Wald statistics for the 1, B.C and A.B.C terms would be the same as the incremental Wald statistics produced using the linear model y ~ 1 + A + B + C + A.B + A.C + B.C + A.B.C.

The preceding table includes a so-called M (marginality) code reported by ASReml when conditional Wald statistics are presented. All terms with the highest M code letter are tested conditionally on all other terms in the model, i.e. by dropping the term from the maximum model. All terms with the preceding M code letter are marginal to at least one term in a higher group, and so forth. For example, in the table, model term A.B has M code B because it is marginal to model term A.B.C, and model term A has M code A because it is marginal to A.B, A.C and A.B.C. Model term mu (M code .) is a special case in that its test is conditional on all covariates but no factors. Following is some ASReml output from the .aov file which re
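The conditioning sets in the table follow a simple rule: a term is adjusted for every other term that does not contain it. The Python sketch below reproduces the table for the three-factor model under that rule. The M code function is an illustrative scheme that agrees with the table for this example; it is not claimed to be ASReml's internal algorithm, and the function names are hypothetical.

```python
def factors(term):
    """Set of factor names in a term written as 'A.B.C'."""
    return frozenset(term.split("."))


def conditioning_set(term, model_terms):
    """Terms adjusted for when testing `term`: all other terms not containing it."""
    target = factors(term)
    return [t for t in model_terms if t != term and not target <= factors(t)]


def m_codes(model_terms):
    """Letter codes by marginality depth (illustrative scheme, see lead-in)."""
    def height(term):
        above = [t for t in model_terms if factors(term) < factors(t)]
        return 0 if not above else 1 + max(height(t) for t in above)

    heights = {t: height(t) for t in model_terms}
    top = max(heights.values())
    return {t: chr(ord("A") + top - h) for t, h in heights.items()}


model = ["A", "B", "C", "A.B", "A.C", "B.C", "A.B.C"]
codes = m_codes(model)
for term in model:
    adjusted = ",".join(["1"] + conditioning_set(term, model))
    print(f"{term:6s} R({term} | {adjusted})   M code {codes[term]}")
```

The constant term 1 is always included in the conditioning set and is handled separately here, in line with the special treatment of mu described in the text.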
35. This will be a problem if the R structure model assumes n x t data structure.

the matrix may be OK but ASReml has not checked it.

this indicates that there are some lines on the end of the .as file that were not used. The first extra line is displayed. This is only a problem if you intended ASReml to read these lines.

14.5 Information, Warning and Error messages

Table 14.2: List of warning messages and likely meaning(s)

warning messages listed on this page:
    Warning Failed to find header blocks to skip
    Warning Fewer levels found in term
    Warning FIELD DEFINITION lines should be INDENTED
    Warning Fixed levels for factor
    Warning Initial gamma value is zero
    Warning Invalid argument
    Warning It is usual to include Trait in the model
    Warning LogL Converged Parameters Not Converged
    Warning LogL not converged
    Notice LogL values are reported relative to a base of
    Warning Missing cells in table
    Warning More levels found in term
    Warning PREDICT LINE IGNORED TOO MANY
    Warning PREDICT statement is being ignored
    Warning Second occurrence of term dropped
    Warning Spatial mapping information for side
    Warning Standard errors
    Warning SYNTAX CHANGE text may be invalid
    Warning The !A qualifier ignored when reading BINARY data
    Warning The !SPLINE qualifier has been redefined

likely meaning: The !RSKIP qualifier requested skipping header blocks which were not present. ASReml in
36. When !AILOADINGS i is specified it also prevents AI updates of some loadings during the first i iterations. For f > 1 factors, only the last factor is estimated, conditional on the earlier ones, in the first f - 1 iterations. Then pairs including the last are estimated until iteration i. If !AILOADINGS is not specified and !CONTINUE is used and initializes the XFA model from a lower order, the i parameter is set internally.

!AISINGULARITIES can be specified to force a job to continue even though a singularity was detected in the Average Information (AI) matrix. The AI matrix is used to give updates to the variance parameter estimates. In release 1, if singularities were present in the AI matrix, a generalized inverse was used which effectively conditioned on whichever parameters were identified as singular. ASReml now aborts processing if such singularities appear unless the !AISINGULARITIES qualifier is set. Which particular parameter is singular is reported in the variance component table printed in the .asr file.

Table 5.5: List of rarely used job control qualifiers

qualifiers on this page: !BMP, !BRIEF n, !BLUP n

The most common reason for singularities is that the user has overspecified the model and is likely to misinterpret the results if not fully aware of the situation. Overspecification will occur in a direct product of two unconstrained variance matrices (see Section 7.4), when a random te
37. Y1 Y2 Y3 Y4 Y5 ~ Trait Treatment Trait.Treatment

Table 5.4: List of occasionally used job control qualifiers

qualifiers on this page: !ASMV n, !ASUV, !DESIGN

!ASMV n indicates that a multivariate analysis is required although the data is presented in a univariate form. 'Multivariate Analysis' is used in the narrow sense where an unstructured error variance matrix is fitted across traits, records are independent, and observations may be missing for particular traits; see Chapter 8 for a complete discussion. The data is presumed arranged in lots of n records, where n is the number of traits. It may be necessary to expand the data file to achieve this structure, inserting a missing value NA on the additional records. This option is sometimes relevant for some forms of repeated measures analysis. There will need to be a factor in the data to code for trait, as the intrinsic Trait factor is undefined when the data is presented in a univariate manner.

!ASUV allows you to have an error variance other than $I \otimes \Sigma$, where $\Sigma$ is the unstructured (US, see Table 7.6) variance structure, if the data is presented in a multivariate form. If there are missing values in the data, include !f mv on the end of the linear model. The intrinsic factor Trait is defined and may be used in the model. See Chapter 8 for more information. This option is used for repeated measures analysis when the variance structure required is not the standard multivariate unstructured matri
38. assuming 4 parameters
    Bayesian Information Criterion   8493.30

    Model_Term                          Gamma       Sigma     Sigma/SE   % C
    idv(variety)        IDV_V   532    1.06038     88117.5      9.92     0 P
    ar1(row).ar1(column)        670 effects
    Residual            SCA_V   670    1.000000    83100.1      8.90     0 P
    row                 AR_R      1    0.685387    0.685387    16.65     0 P
    column              AR_R      1    0.285909    0.285909     3.87     0 P

    Wald F statistics
    Source of Variation   NumDF   DenDF     F-inc    P-inc
    7 mu                      1    41.7   6248.66    <.001
    3 weed                    1   491.2     85.84    <.001

15.7 Unreplicated early generation variety trial - Wheat

The change in REML log-likelihood is significant ($\chi^2_1 = 12.46$, p < .001) with the inclusion of the autoregressive parameter for columns. Figure 15.6 presents the sample variogram of the residuals for the AR1xAR1 model. There is an indication that a linear drift from column 1 to column 10 is present. We include a linear regression coefficient pol(column,-1) in the model to account for this. Note we use the -1 option in the pol term to exclude the overall constant in the regression, as it is already fitted. The linear regression of column number on yield is significant (t = 2.96). The sample variogram (Figure 15.7) is more satisfactory, though interpretation of variograms is often difficult, particularly for unreplicated trials. This is an issue for further research.

Figure 15.6 Sample variogram of the residuals from the AR1 x AR
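The significance statement for the column autoregressive parameter can be checked directly. The Python sketch below computes the upper tail probability of a chi-square variable on 1 degree of freedom using only the standard library; it relies on the identity P(X > x) = erfc(sqrt(x/2)) for 1 df. The function names are hypothetical, and the example log-likelihoods in the comment are not taken from the guide.

```python
import math

def lrt_statistic(logl_full, logl_reduced):
    """REML likelihood ratio statistic: twice the gain in log-likelihood.

    For example, lrt_statistic(-100.0, -106.23) gives 12.46 (made-up values).
    """
    return 2.0 * (logl_full - logl_reduced)


def chi2_sf_1df(x):
    """Upper tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


p_value = chi2_sf_1df(12.46)  # statistic reported in the text
print(f"p = {p_value:.5f}")
```

The resulting probability is below 0.001, in line with the reported p < .001. Both models being compared must have the same fixed effects for the REML comparison to be valid.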
39. (column 2), and the individual components that identify the dimension of the individual matrices used in forming the direct product variance structure are then written down (column 3). Note that in the simplest cases there is only one component. The variance structure associated with each component has a

7.2 Process to define a consolidated model term

Table 7.1: List of common variance model functions, their type (correlation or variance), the form of the variance matrix generated (C for correlation, V for variance matrix, S for scaled variance matrix) and a brief description. Parameters $\sigma^2 > 0$ are variances, $-1 < \rho < 1$ are correlations. Subscript c denotes a parameter held in common across all rows/columns.

    name     type              variance matrix (for set of n effects)              description
    id()     correlation       $C = I$                                             IID with variance 1
    idv()    variance          $V = \sigma^2 I$                                    IID with common variance; default model
    idh()    variance          $V = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)$  independent with separate variances
    ar1()    correlation       $C_{ij} = \rho^{|i-j|}$                             auto-regressive structure of order 1
    ar1v()   variance          $V_{ij} = \sigma^2 \rho^{|i-j|}$                    auto-regressive structure of order 1
    ar1h()   variance          $V_{ij} = \sigma_i \sigma_j \rho^{|i-j|}$           auto-regressive structure of order 1
    corg()   correlation       $C_{ij} = \rho_{ij}$                                unstructured correlation matrix
    diag()   variance          $V = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)$  independent with separate variances (same as idh())
    grm()    scaled variance   S (specified)                                       applies a known scaled variance matrix; the number of rows in the matrix must match the number of levels of
40. defined by the model which includes those terms appearing above the current term, given the variance parameters. For example, the test of nitrogen is calculated from the change in sums of squares for the two models mu variety nitrogen and mu variety. No refitting occurs; that is, the variance parameters are held constant at the REML estimates obtained from the currently specified fixed model.

The incremental Wald statistics have an asymptotic $\chi^2$ distribution, with degrees of freedom (df) given by the number of estimable effects (the number in the DF column). In this example, the incremental Wald F statistics are numerically the same as the ANOVA F statistics, and ASReml has calculated the appropriate denominator df for testing fixed effects. This is a simple problem for balanced designs, such as the split plot design, but it is not straightforward to determine the relevant denominator df in unbalanced designs, such as the rat data set described in the next section.

Tables of predicted means are presented for the nitrogen, variety, and variety by nitrogen tables in the .pvs file. The qualifier !SED has been used on the third predict statement and so the matrix of SEDs for the variety by nitrogen table is printed. For the first two predictions, the average SED is calculated from the average variance of differences. Note also that the order of the predictions (e.g. 0.6_cwt, 0.4_cwt, 0.2_cwt, 0_cwt for nitrogen
41. .grr file in the first !CYCLE and hold it in memory for use in subsequent cycles. This is advantageous when the data/.grr file is large and there are many cycles to execute where the model changes but the data/.grr file doesn't. The !CYCLE mechanism acts as an inner loop when used with !RENAME !ARG. As an example, the !RENAME !ARG arguments might list a set of traits and the !CYCLE arguments sequentially test a set of markers. A cycle string may consist of up to 4 substrings, separated by a semicolon and referenced as $I, $J, $K and $L respectively. For example

    !CYCLE Y1;X1 Y2;X2
    $I ~ mu $J

When cycling is active, an extra line is written to the .asr file containing some details of the cycle in a form which can be extracted to form an analysis summary by searching for LogL:. A heading for this extra line is written in the first cycle. For example

    LogL: LogL     Residual   NEDF   NIT   Cycle Text
    LogL: 208.97   0.703148    587     6   1466 "LogL Converged"

The LogL: line with the highest LogL value is repeated at the end of the .asr file.

!DOPATH with !PATH (or !PART) statements allows several analyses to be coded in one job file and run selectively without having to edit the .as file between runs. Both spellings can be used interchangeably. Which particular lines in the .as file are honoured is controlled by the argument n of the !DOPATH qualifier in conjunction with !PATH (or !PART) statements.

10.4 Advanced processing arguments

High level
42. l lt r lt l covariance C positive correlation P O0O lt r lt 1 loading L 130 7 7 Variance model function qualifiers 7 7 8 Equating variance structures USE t In some plant breeding applications it can be convenient to define a variance structure as the sum of two simpler terms For example given 1000 entries representing 50 related families where relationships were derived from markers the full relationship matrix in verse is dense But it can be well approximated as the sum of a family component and a diagonal entry component The reformulation gives a sparser faster formulation But now we have two terms to interact with xfa1 dtrial and both must have the same parameters That is instead of fitting xfai dTrial grm3 entry we fit xfal dTrial grmi family xfai dTrial grm2 entry requiring both xfa1 terms have the same parameters If there are only a few parameters this can be achieved directly as follows ASSIGN QP GPFPFP ASSIGN QE ABCDEFGH ASSIGN QI INIT 0 72631 0 000 242713 0 000 882465 846305 04419 743393 xfai dTrial QP QE QI grm1 family xfai dTrial QP QE QI grm2 entry However for a larger term the number of parameters required may exceed the available letters in the alphabet In this case VCC can be used lt DATAFILE NAME gt VCC 1 xfal dTrial QP QI grm1 family xfail dTrial QP QI grm2 entry 21 29 BLOCKSIZE 8 parameters 21 28 are equal to parameters 29
43. sat Expt 1 idv A B C will be interpreted as sat Expt 1 idv A id B C. However, it is good practice to specify variance model functions for the components in model terms and we encourage the user to do this. ASReml will automatically add a common variance to consolidated model terms that are specified as correlation models, for both R and G structures. For example:

    id(A) will be converted to idv(A)
    sat(Expt,1).id(units) will be converted to sat(Expt,1).idv(units)
    id(A).ar1(B) will be converted to idv(A).ar1(B)
    ar1(A).ar1(B) will be converted to ar1v(A).ar1(B)
    sat(Expt,1).id(A).ar1(B) will be converted to sat(Expt,1).idv(A).ar1(B)
    sat(Expt,1).ar1(A).ar1(B) will be converted to sat(Expt,1).ar1v(A).ar1(B)

Using NIN example 2 for demonstration (Section 7.5), a more succinct coding of the model definition would be

    yield ~ mu variety !r repl
    residual units

which would result in identical output to the original example. The model could be relaxed further to

    yield ~ mu variety !r repl

7.11 Variance model functions available in ASReml

The full range of variance models (that is, correlation, homogeneous variance and heterogeneous variance models) available in ASReml is presented in Table 7.6, which is located at the end of this chapter for easy access; see Section 7.12 on page 147. This presents the variance structure name in UPPERCASE, the corresponding variance model function name in lowercase used to as
44. to the power v SQRyld yield 170 5 i 0 takes natural logarithms of the data yield which must be positive LNyield yield ee ia 1 takes reciprocal of data data must yield be positive INVyield yield eg Sl ley v logical operators forming 1 if true 0 yield l lt if false high yield gt 10 gt ABS takes absolute values no argument yield required ABSyield yield ABS ARCSIN v forms an ArcSin transformation us Germ Total ing the sample size specified in ASG Germ ARCSIN Total the argument a number or another field In the side example for two existing fields Germ and Total con taining counts we form the ArcSin for their ratio ASG by copying the Germ field and applying the ArcSin transformation using the Total field as sample size 54 5 5 Transforming the data Table 5 1 List of transformation qualifiers and their actions with examples qualifier argument action examples COS SIN s takes cosine and sine of the data Day variable with period s having default CosDay Day C0S 27 omit s if data is in radians set 365 s to 360 if data is in degrees ID D lt gt v D o v discards records which have yield D lt 0 ID lt D lt v v or missing value in the field sub yield D lt 1 D gt 100 ID gt D gt v ject to the logical operator o IDV v DV o v discards records subject to yield DV lt 0 IDV lt gt v the logical operator o which have v yield DV lt 1 ID
45. 0
specifies that vertical annotation be used on the x axis (default is horizontal).
specifies that the labels used for the data be abbreviated to n characters.
specifies that the labels used for the x axis annotation be abbreviated to n characters.

9.3 Prediction

Table 9.2: List of predict plot options

    option        action
    abbrslab n    specifies that the labels used for superimposed factors be abbreviated to n characters.

9.3.4 Associated factors

!ASSOCIATE factors facilitates prediction when the levels of one factor group or classify the levels of another, especially when there are many levels. factors is the list of factors in the model which have this hierarchical relationship. Typical examples are individually named lines grouped into families, usually with unequal numbers of lines per family, or trials conducted at locations within regions.

Declaring factors as associated allows ASReml to combine the levels of the factors appropriately, for example, when predicting a trial mean, to add the effect of the location and region where the trial was conducted. When identifying which levels are associated, ASReml checks that the association is strictly hierarchical (tree-like). That is, each trial is associated with only one location and each location is associated with only one region. If a level code is missing for one component, it must be missing for all. Averaging of associated factors will generally gi
46. 0 0335 OOL 0 1514 0 1269 0 278 0 2622 0 226 0 2857 0 2506 0 0763 TotalVar explained by all loadings The last row contains column averages 13 4 8 The rsv file The rsv file contains the variance parameters from the most recent iteration of a model The primary use of the rsv file is to supply the values for the CONTINUE qualifier see Table 5 4 and the C command line option see Table 10 1 It contains sufficient information to match terms so that it can be used when the variance model has been changed This is nin89a rsv TG 6 1711 121 237 13 4 Other ASReml output files This rsv file holds parameter values between runs of ASReml and is not normally modified by the User The current values of the the variance parameters are listed as a block on the following lines They are then listed again with identifying information in a form that the user may edit 0 000000 0 000000 0 000000 1 0000000 0 6554798 0 4375045 RSTRUCTURE 1 2 3 VARIANCE 1 1 0 a V P 1 00000000 0O 0 STRUCTURE 22 1 J ay Ry Py 0 65547976 0 Q STRUCTURE 11 1 j 6 Ry P 0 43750453 Oo g 13 4 9 The tab file The tab file contains the simple variety means and cell frequencies Below is a cut down version of nin89 tab nin alliance trial 10 Sep 2002 04 20 15 Simple tabulation of yield variety LANCER 28 56 4 BRULE 26 07 4 REDLAND 30 50 4 CODY ea 4 ARAPAHOE 29 44 4 NE83404 27 39 4 NE83406 24 28 4 NE83407 22 69 4 CENTURA
47. 0 4_cwt Golden_rain 114 6667 921070 E 0 2_cwt Marvellous 108 5000 8 1070 E 0 2_cwt Victory 89 6667 9 1070 E 0 2_cwt Golden_rain 98 5000 9 1070 E O_cwt Marvellous 86 6667 9 1070 E O_cwt Victory 71 5000 9 1070 E O_cwt Golden_rain 80 0000 9 1070 E 271 15 2 Split plot design Oats Predicted values with SED PV 126 110 114 833 118 124 Iir 500 833 167 833 667 9 71503 108 500 9 71503 89 6667 7 68295 98 5000 9 71503 86 6667 9 71503 71 5000 7 68295 9 71503 80 0000 9 71503 9 71503 SED Standard Error OO ONN OO O ON 9 Ta os 71503 71503 68295 71503 71503 71503 71503 68295 68295 71503 71503 71503 9 71503 9 71503 ts 9 9 68295 O N OO Ko ONN OO O ON 71903 68295 71503 of Difference O oO 71503 71503 68295 71503 71503 68295 71503 71503 71503 71503 68295 68295 71503 71503 71503 Min 7 6830 272 oO NO O O ON N 71503 71503 68295 11503 71503 68295 71503 71503 71503 71503 68295 68295 71503 Mean 9 1608 O O ON NO 71503 71503 68295 71503 71503 68295 71503 71503 71503 71503 68295 Max 9 7150 15 3 Unbalanced nested design Rats 15 3 Unbalanced nested design Rats The second example we consider is a data set which illustrates some further aspects of testing fixed effects in lin
48. 01 794020 417001E 01 897161 2164 23 89 64 08 00 48 19 04 sar 32 N o onw N ww N OOP OF WW W NOrFOFOrF OO an Or OO w w o 70 33 90 s97 lt 51 ebd 2401 1 1 1 2L 21d J62 68 18 90 53 10 On 54 41 25 84 99 99 15 53 00 03 00 29 06 30 08 54 30 15 eho 74 43 22 505 TO 29 OOo OO Oo Oo Oo 2 Oo 2 jo g ie gi OOGG c e e ee oo ooo ooo 0 6 OO CO Oo SS SO OO eae es Se Oe eo e oO 2 Oo 6 Oo 6 TU DTW ey oS Oo 2 Oo Oo 2 Oo 2 0 oOo 6 Se a ae ee e e a e Se es Ao Oe ee ee I 15 10 Multivariate animal genetics data Sheep 7 342 17 60 0 4231 0 2494 0 4633 0 2725 0 6680 0 1416 0 3995 0 1635 0 9630 1 998 0 2870 3 644 0 4753E 01 0 8503 2 483 0 7861E 01 0 1159 1 632 Covariance Variance Correlation Matrix US us Trait id sire 0 5939 0 7045 0 2970 0 1947 0 2924 0 6773 1 556 0 1883E 01 0 1326 0 5913 0 2805E 0O1 0 2879E 02 0 1502E 01 0 9808E 01 0 3960E 01 0 5962E 01 0 6570E 01 0 4776E 02 0 1579 0 6577 0 4073E 01 0 13383 0 8771E 03 0 4723E 01 0 3267E 01 Covariance Variance Correlation Matrix XFA xfai TrDam123 id dam 2 158 0 9961 0 8035 0 9961 2 225 2 512 0 8066 1 0000 0 1623 0 1687 0 1891E 01 0 8066 1 463 1 521 0 1109 1 0000 Covariance Variance Correlation Matrix US us TrLit1234 id lit 3 500 0 5111 0 1190 0 4039E 01 1 540 2155S 0 2041 0 5244 3101E 01 0 4509E 01 0 1910F 01 0 3185
49. Table 13.1: Summary of ASReml output files

file   description

Key output files
.asr   contains a summary of the data and analysis results.
.msv   contains final variance parameter values in a form that is easy to edit for resetting the initial values if !MSV or !CONTINUE 3 is used (see Table 5.4).
.pvc   contains the report produced with the P option.
.pvs   contains predictions formed by the predict directive.
.res   contains information from using the pol(), spl() and fac() functions, the iteration sequence for the variance components and some statistics derived from the residuals.
.rsv   contains the final parameter values for reading back if the !CONTINUE qualifier is invoked (see Table 5.4).
.sln   contains the estimates of the fixed and random effects and their corresponding standard errors.
.tab   contains tables formed by the tabulate directive.
.tsv   contains variance parameter values in a form that is easy to edit for resetting the initial values if !TSV or !CONTINUE 2 is used (see Table 5.4).
.yht   contains the predicted values, residuals and diagonal elements of the hat matrix for each data point.

Other output files
.asl   contains a progress log and error messages if the L command line option is specified.
.aov   contains details of the ANOVA calculations.
.apj   is an ASReml project file created by ASReml-W.
.ask   holds
50. 1 has 9 parameters for these data 5 in D and 4 in U The input is given by ASSIGN ANTEI lt INIT 60 1 54 65 73 65 283 15 5 Balanced repeated measures Height 91 50 123 3 306 4 89 17 120 2 298 6 431 8 62 21 83 85 206 3 301 2 379 8 1 gt yi y3 y5 y7 y10 Trait tmt Tr tmt redidual units ante Trait ANTEI The abbreviated output file is 1 LogL 171 501 e2 1 0000 60 df 2 LogL 170 097 52 1 0000 60 df 3 LogL 166 085 S2 1 0000 60 df 4 LogL 161 335 S2 1 0000 60 df 5 LogL 160 407 S2 1 0000 60 df 6 LogL 160 370 S2 1 0000 60 df T LogL 160 369 S2 1 0000 60 df 8 LogL 160 369 S2 1 0000 60 df 9 LogL 160 369 S2 1 0000 60 df Results from analysis of yl y3 y5 y7 y10 Akaike Information Criterion 338 74 assuming 9 parameters Bayesian Information Criterion 357 59 Model_Term Sigma Sigma Sigma SE C id units ante Trait 70 effects Trait ANTE_U 1 1 0 268643E 01 0 268643E 01 2 44 OP Trait ANTE_LU 2 1 0 628417 0 628417 2 55 QP Trait ANTE_U 2 2 0 372830E 01 0 372830E 01 2 41 OP Trait ANTE_U 3 2 1 49102 1 49102 2 54 OP Trait ANTE U 3 3 0 599612E 02 0 599612E 02 2 43 OP Trait ANTE_U 4 3 1 28037 1 28037 6 19 OP Trait ANTE_U 4 4 0 789716E 02 0 789716E 02 2 44 OP Trait ANTE_LU 5 4 0 967820 O 967920 15 40 OP Trait ANTE_ U 5 5 0 390635E 01 0 390635E 01 2 45 OP Covariance Variance Correlation Matrix ANTE Residual 37 20 0 5946 0 3550 0 3115 0 3041 23 38 41 55 0 5970 0 5239 0 5114 34 84 61 93 25
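The information criteria reported in this output can be checked by hand. The following is a minimal Python sketch, assuming the usual definitions AIC = -2 logL + 2t and BIC = -2 logL + t ln(v), where t is the number of variance parameters and v is the residual degrees of freedom. It also assumes the converged log-likelihood is -160.369 (the minus sign appears to have been lost in the listing above). The function names are illustrative only and are not part of ASReml.

```python
import math

def aic(logl, n_params):
    """Akaike Information Criterion: -2 logL + 2 t."""
    return -2.0 * logl + 2.0 * n_params

def bic(logl, n_params, resid_df):
    """Bayesian Information Criterion: -2 logL + t ln(v)."""
    return -2.0 * logl + n_params * math.log(resid_df)

# Values read from the antedependence run above; the sign of LogL is assumed.
logl = -160.369
n_params = 9     # "assuming 9 parameters"
resid_df = 60    # "60 df"

print(round(aic(logl, n_params), 2))            # 338.74
print(round(bic(logl, n_params, resid_df), 2))  # 357.59
```

Both values agree with the reported 338.74 and 357.59, which supports reading the log-likelihood as negative.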
51. 1 variety 55 165 0 0 88 0 712 effects Notice The DenDF values are calculated ignoring fixed boundary singular variance parameters using algebraic derivatives 5 repl 4 effects fitted Finished 04 Nov 2011 21 14 29 242 LogL Converged 3 6 2 The sln file The following is an extract from nin89 sln containing the estimated variety effects intercept and random replicate effects in this order column 3 with standard errors column 4 Note that the variety effects are returned in the order of their first appearance in the data file see replicate 1 in Table 3 1 35 3 6 Description of output files Model_Term Level Effect seEffect variety LANCER 0 000 0 000 variety BRULE 2 487 4 979 variety REDLAND 1 938 4 979 variety CODY 7 350 4 979 variety ARAPAHOE 0 8750 4 979 variety NE83404 1 175 4 979 variety NE83406 4 287 4 979 variety NE83407 5 875 4 979 variety CENTURA 6 912 4 979 variety SCOUT66 1 037 4 979 variety COLT 1 562 4 979 variety NE83498 1 563 4 979 variety NE84557 8 03 4 979 variety NE83432 8 837 4 979 variety NE87615 2 975 4 979 variety NE87619 2 700 4 979 variety NE87627 5 337 4 979 mu 1 28 56 3 856 repl 1 1 880 1 T55 repl 2 843 1 755 repl 3 8219 1 755 repl 4 3 852 1 755 36 3 7 Tabulation predicted values and functions of the variance components 3 6 3 The yht file The following is an extract from nin89 yht containing the predicted values of the observa tions column 2 the residua
52. 2 4 31 65 8 6 9 6 8 2 4 25 65 8 6 10 8 9 2 14 8 6 28 3 8 6 18 15 2 T 8 6 1 19 4 4 20 1 N 4 3 26 4 22 1 6 6 1 2 12 3 6 3 6 32 4 842 an 1 6 8 6 12 10 2 8 6 13 2 11 2 28 6 14 4 12 2 G 15 6 13 2 16 8 14 2 9 2 16 2

4.2 The data file

The standard format of an ASReml data file is to have the data arranged in columns (fields) with a single line for each sampling unit. The columns contain variates and covariates (numeric), factors (alphanumeric), traits (response variables) and weight variables, in any order that is convenient to the user. The data file may be free format, fixed format or a binary file.

4.2.1 Free format data files

The data are read free format (SPACE, COMMA or TAB separated) unless the file name has extension .bin for real binary, or .dbl for double precision binary (see below). Important points to note are as follows:

• files prepared in Excel must be saved to comma or tab delimited form,
• blank lines are ignored,
• column headings, field labels or comments may be present at the top of the file (see Generating a template on page 29), provided that the !skip qualifier (Table 5.2) is used to skip over them,
• NA, * and . are treated as coding for missing values in free format data files; if missing values are coded with a unique data value (for example, 0 or 9), use the transformation !M value to flag them as missing, or !DV value to drop the data record contai
53. 2 and then averages across repl to produce variety predictions.

  GFW Fdiam ~ Trait Trait.Year !r idv(Trait).id(Team)
  predict Trait Team

forms the hyper-table for each trait based on Year and Team, with each linear combination in each cell of the hyper-table for each trait using Team and Year effects. Team predictions are produced by averaging over years.

  yield ~ variety !r idv(site).id(variety)
  predict variety

will ignore the site.variety term in forming the predictions, while

  predict variety !AVERAGE site

forms the hyper-table based on site and variety, with each linear combination in each cell using variety and site.variety effects, and then forms averages across sites to produce variety predictions.

  yield ~ site variety !r idv(site).id(variety) at(site).idv(block)
  predict variety

puts variety in the classify set, site in the averaging set and block in the ignore set. Consequently, it forms the site x variety hyper-table from model terms site, variety and site.variety, but ignoring all terms in at(site).block, and then forms averages across sites to produce variety predictions.

9.3.7 New R4 Prediction using two-way interaction effects

In some cases we wish to calculate, from two-way interaction effects ($bc$, say), effects for one of the factors ($B$, say) that are a weighted sum averaged over the $c$ levels of $C$, i.e. $b_i = \sum_{j=1}^{c} bc_{ij} w_j$.

  PREDICT C !AVE B weights !ONLYUSE fun(B).fun(C)

allows this to be prod
54. 21 65 4 SCOUT66 27 92 4 COLT 27 00 4 NE87615 25 69 4 NE87619 31 26 4 NE87627 23 23 4 13 4 10 The tsv file The tsv file contains the variance parameters as initialized for the most recent run in a form that is relatively easy to edit if the initial values need to be reset The file is read when TSV or CONTINUE 2 is specified or if CONTINUE is specified but no rsv file exists This is nin89a tsv 238 13 4 Other ASReml output files This tsv file is a mechanism for resetting initial parameter values by changing the values here and rerunning the job with CONTINUE 2 You may not change values in the first 3 fields or RP fields where RP_GN is negative H H HH Fields are GN Term Type PSpace Initial_value RP_GN RP_scale 4 Variance i V P 1 00000000 R 4 1 5 ari row ari column ariv row _1 R P 0 10000000 i By 1 6 ari row ari column ari column _ i R P 0 10000000 6 1 Valid values for Pspace are F P U and maybe Z RP_GN and RP_scale define simple parameter relationships RP_GN links related parameters by the first GN number RP_scale must be 1 0 for the first parameter in the set and otherwise specifies the size relative to the first parameter HOH OH Multivalue RP_scale parameters may not be altered here Notice that this file is overwritten if not being read 13 4 11 The vrb file The vrb file contains the estimates of the effects together with their approx
55. 26342 159 816 2 11 OF idv units Trait 70 effects Residual SCA_V 70 1 000000 126 494 4 90 OP id units coru Trait LogL 196 975 S2 264 10 60 df 1 000 0 5000 LogL 196 924 S2 270 14 60 df 1 000 0 5178 LogL 196 886 S2 278 58 60 df 1 000 0 5400 LogL 196 877 S2 286 23 60 df 1 000 0 5580 LogL 196 877 S2 286 31 60 df 1 000 0 5582 Final parameter values 1 0000 0 55819 Results from analysis of yl y3 y5 y7 y10 Akaike Information Criterion 397 75 assuming 2 parameters Bayesian Information Criterion 401 9 Model_Term Gamma Sigma Sigma SE C id units coru Trait 70 effects Residual SCA_V 70 1 000000 286 310 3 65 OP Trait COR_R 1 0 558191 0 558191 4 28 OP A more realistic model for repeated measures data would allow the correlations to decrease as the lag increases such as occurs with the first order autoregressive model However since the heights are not measured at equally spaced time points we use the EXP model The correlation function is given by plu o where u is the time lag is weeks The coding for this is yi y3 y5 y7 yi0 Trait tmt Tr tmt residual id units exp Trait INIT 0 5 COORD 1 3 5 7 10 A portion of the output is 281 15 5 Balanced repeated measures Height 1 LogL 202 139 S2 234 04 60 df 1 0000 0 5000 2 LogL 183 773 S2 440 42 60 df 1 0000 0 9507 3 LogL 183 070 B2 337 51 60 df 1 0000 0 9308 4 LogL 182 981 52 297 16 60 df 1 0000 0 9172 5 LogL 182 979 S2 302 31 60 df 1 00
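To make the EXP model above concrete, the sketch below builds the correlation matrix implied for the five measurement times, assuming the correlation between two measurements is phi raised to the absolute time lag in weeks. The value phi = 0.9 is a placeholder chosen for illustration and is not the fitted estimate; the function name is hypothetical and is not an ASReml facility.

```python
def exp_correlation(coords, phi):
    """Correlation matrix with entries phi ** |t_i - t_j| (EXP model)."""
    return [[phi ** abs(a - b) for b in coords] for a in coords]

# Measurement times in weeks, as supplied through COORD in the job above.
weeks = [1, 3, 5, 7, 10]
phi = 0.9  # illustrative value only, not an estimate from the output

R = exp_correlation(weeks, phi)
for row in R:
    print(" ".join(f"{value:.4f}" for value in row))
```

Because the last two measurements are three weeks apart rather than two, their correlation is phi cubed instead of phi squared, which is the reason for preferring EXP over a first-order autoregressive model for these unequally spaced data.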
56. 3 0 00000 0 00000 0 00 0 Trait XFA_V O 4 0 423585 0 423585 421 0 Trait XFA V O 5 0 00000 0 00000 0 00 0 Trait XFA_L 1 1 0 109659E 02 0 109659E 02 0 00 0 Trait RPA_L 1 2 Q 180117 0 180117 2 88 0 Trait ZPAL 1 3 0 219215 0 219215 3 53 0 Trait XFA_L 1 4 0 214461E 01 0 214461E 01 0 07 0 Trait XFA L 1 amp O 17 7982 0 177932 1 18 0 Trait XFAL 2 1 1 17261 117261 0 00 0 Trait XFA_L 2 2 0 530954E 01 0 530954E 01 0 00 0 Trait XFA_L 2 3 0 604977E 01 0 604977E 01 1 31 0 Trait XFA_L 2 4 0 286377 0 286377 0 99 0 Trait XFA_L 2 5 0 460967E 01 0 460967E 01 0 33 0 Trait XFA L 3 1 0 123499 0 123499 0 528 0 Trait AFA L 3 2 0 938092E 01 0 938092E 01 lt 1 09 0 Trait XFA L 3 3 0 115989 0 115989 1 12 0 Trait XFA_L 3 4 0 439945 0 439945 1 40 0 Trait XFA_L 3 5 0 288612 0 288612 262 0 tag NRM 10696 Warning Code B fixed at a boundary GP F fixed by user liable to change from P to B P positive definite C Constrained by user VCC U unbounded S Singular Information matrix S means there is no information in the data for this parameter Very small components with Comp SE ratios of zero sometimes indicate poor scaling Consider rescaling the design matrix in such cases Covariance Variance Correlation Matrix US Residual 8 138 0 5848 0 2532 0 1518 0 2373 7 284 Ifo 0 5057 0 2658 0 4837 0 2477 0 7052 0 1095 0 4193 0 1997 0 8169 2 038 0 2526 3 314 0 9232E 01 0 8713 2 531 0 8210E 01 0 2087 1 543 Covariance Variance Correlation
57. 336 Index ABORTASR NOW 68 FINALASR NOW 68 Access 42 accuracy genetic BLUP 214 advanced processing arguments 190 AI algorithm 14 AIC 17 ainverse bin 151 Akaike Information Criteria 17 aliassing 106 Analysis of Deviance 103 Analysis of Variance 19 Wald F statistics 108 animal breeding data 1 arguments 4 asrdata bin 81 ASReml symbols Bf 41 41 41 42 90 1 90 90 90 s O 90 90 ia 90 autoregressive 111 Average Information 1 balanced repeated measures 270 Bayesian Information Criteria BIC 17 binary files 43 Binomial divisor 104 BLUE 15 BLUP 15 case 88 combining variance models 12 command file 29 genetic analysis 147 multivariate 145 Command line option A ASK 187 B BRIEF 187 C CONTINUE 189 D DEBUG 188 F FINAL 189 Gg graphics 188 Hg HARDCOPY 188 I INTERACT 188 N NoGraphs 188 0 ONERUN 189 Q QUIET 188 R RENAME 189 W WorkSpace 190 X XML 186 command line options 185 commonly used functions 90 conditional distribution 12 Conditional F Statistics 19 conditional factors 95 contrasts 67 Convergence criterion 68 correlated effects 16 correlation 205 between traits 144 model 11 covariance model 11 covariates 40 61 106 cubic splines 100 data field syntax 47 data file 27 40 337 INDEX binary format 43 fixed format 42 free format 41 using Excel 42 data file line 31 datafile
58. 36 pairwise An ever better option in this case is to use just one structure twice The following code associates xfai dTrial in xfa1 dTrial giv2 entry with xfai dTrial in xfai dTrial givi family that is both terms point to the one structure definition xfai dTrial QP QI grmi family xfal dTrial USE xfai dTrial grm2 entry Table 7 5 gives examples of constraining variance parameters in ASReml 131 7 8 Setting relationships among variance structure parameters 7 8 Setting relationships among variance structure param eters 7 8 1 Simple relationships among variance structure parameters It is possible to define simple equality relationships between variance structure parameters using the s qualifier see Section 7 8 2 and Table 7 4 More general relationships between variance structure parameters can be defined by placing the VCC c qualifier on the data file definition line Unlike the case of parameter equality all parameters can be accessed and the linear relationship is not limited to equality However identification of the parameters is not as easy Each variance structure parameter yi is allocated a number 7 internally These numbers are reported in the tsv file and some are reported in the structure input section of the asr file These numbers are used to specify which parameters are to be constrained using this method Warning Unfortunately the parameter numbers usually change if the model is changed
59. 5 1 List of transformation qualifiers and their actions with examples qualifier argument action examples SET SETN SETU SUB SEQ TARGET UNIFORM vlist vlist for vlist a list of n values the data values 1 n are replaced by the cor responding element from vlist data values that are lt 1 or gt n are re placed by zero vlist may run over several lines provided each incom plete line ends with a comma i e a comma is used as a continuation symbol see Other examples below SETN v n replaces data values 1 n with normal random variables having variance v Data values out side the range 1 n are set to 0 replaces data values 1 n with uni form random variables having range 0 v Data values outside the range 1 n are set to 0 replaces data values v with their index i where vlist is a vector of n values Data values not found in vlist are set to 0 vlist may run over several lines if necessary pro vided each incomplete line ends with acomma ASReml allows for a small rounding error when matching It may not distinguish properly if val ues in vlist only differ in the sixth decimal place see Other examples below replaces the data values with a se quential number starting at 1 which increments whenever the data value changes between successive records the current field is presumed to de fine a factor and the number of lev els in the new factor is set t
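The behaviour described for the SET, SUB and SEQ transformations can be mimicked outside ASReml to check what a recoding will produce. The following Python sketch is an illustration only: the function names are hypothetical, the data codes are assumed to be integer valued for SET, and the matching tolerance used for SUB is an assumed stand-in for the "small rounding error" mentioned in the table, not ASReml's actual value.

```python
def apply_set(values, vlist):
    """SET: replace codes 1..n by the matching element of vlist, others by 0."""
    n = len(vlist)
    return [vlist[int(v) - 1] if 1 <= v <= n else 0 for v in values]

def apply_sub(values, vlist, tol=1e-5):
    """SUB: replace each value by its 1-based index in vlist, or 0 if absent.

    tol is an assumed matching tolerance, used here for illustration.
    """
    result = []
    for v in values:
        index = 0
        for i, target in enumerate(vlist, start=1):
            if abs(v - target) <= tol:
                index = i
                break
        result.append(index)
    return result

def apply_seq(values):
    """SEQ: sequential number from 1, incremented when the value changes."""
    result, count, previous = [], 0, None
    for v in values:
        if count == 0 or v != previous:
            count += 1
        previous = v
        result.append(count)
    return result

print(apply_set([1, 2, 3, 0, 4], [10, 20, 30]))  # [10, 20, 30, 0, 0]
print(apply_sub([5, 7, 9, 6], [5, 7, 9]))        # [1, 2, 3, 0]
print(apply_seq([3, 3, 5, 5, 3]))                # [1, 1, 2, 2, 3]
```

Note that SEQ numbers runs of equal values rather than distinct values, so a value that reappears after a change starts a new level.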
60. 57 8 4 72 6 12 B 0 016 34 x66 1 58 5 1 13 0 03 B 0 872 35 x70 2 59 3 71 1 40 B 0 242 36 x71 a 64 4 0 08 0 01 B 0 929 37 x73 T 59 0 1 72 3 01 B 0 088 38 x75 1 59 9 0 04 0 26 B 0 613 39 x91 1 63 8 1 44 1 44 B 0 234 Notice The DenDF values are calculated ignoring fixed boundary singular variance parameters using empirical derivatives 129 mv_estimates 9 effects fitted 9 idsize 92 effects fitted 7 are zero 115 expt idsize 828 effects fitted 672 are zero 127 at expt 6 type idsize meth 9 effects fitted 2199 singular 128 at expt 7 type idsize meth 10 effects fitted 2198 singular LINE REGRESSION RESIDUAL ADJUSTED FACTORS INCLUDED NO DF SUMSQUARES DF MEANSQU R SQUARED R SQUARED 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 1 3 0 1113D 02 452 0 2460 0 09098 0 08495 1 1100000000000000 kkk kk 2 3 0 1180D 02 452 0 2445 0 09648 0 09049 1 1 10000000000000 kkk kk 3 3 0 1843D 01 452 0 2666 0 01507 0 00883 1 110000000000000 4 3 0 1095D 02 452 0 2464 0 08957 0 08353 1 1 010000000000000 5 3 0 1271D 02 452 0 2425 0 10390 0 09799 1001 1000000000000 kkk 2K 6 3 0 9291D 01 452 0 2501 0 07594 0 06981 0 1 011000000000000 7 3 0 9362D 01 452 0 2499 0 07652 0 07039 0 0 iiepo 6 0 0 6 0 oO ao 8 3 0 1357D 02 452 0 2406 0 11091 0 10501 1010100 0000000000 perro rd kk 9 3 0 9404D 01 452 0 2498 0 07687 0 07074 0 1 101000000000000 10 3 0 1266D 02 452 0 2426 0 10350 0 09755 1i 1001000000000000 11 3 0 1261D 02 452 0 2427 0 10313 0 09717 100011
61. List of Figures

Variogram in 4 sectors for Cashmore data
Residual versus Fitted values
Variogram of residuals
Plot of residuals in field plan order
Plot of the marginal means of the residuals
Histogram of residuals
Residual plot for the rat data
Residual plot for the voltage data
Trellis plot of the height for each of 14 plants
Residual plots for the EXP variance model for the plant data
Sample variogram of the residuals from the AR1xAR1 model
Sample variogram of the residuals from the AR1xAR1 model for the Tullibigeal data
Sample variogram of the residuals from the AR1xAR1 pol(column,1) model for the Tullibigeal data
Rice bloodworm data: Plot of square root of root weight for treated versus control
BLUPs for treated for each variety plotted against BLUPs for control
Estimated deviations from regression of treated on control for each variety plotted against estimate for control
Estimated difference between control and treated for each va
62. 8091 2 125 52 8061 8 125 52 8061 8 125 df df df df df df df Results from analysis of yield Akaike Information Criterion Bayesian Information Criterion 1423 57 assuming 4 parameters 1434 88 Approximate statum variance decomposition Component Coefficients 25 0 Stratum Degrees Freedom idv Rep 5 00 idv RowB1k 24 00 idv Co1B1k 23 66 Residual Variance 72 34 Model_Term idv Rep IDV_V idv RowB1k IDV_V idv ColB1k IDV_V idy units Residual SCA_V Source of Variation 8 mu 6 variety Variance 266657 74887 8 713569 5 8061 81 Gamma 6 0 528714 30 1 93444 30 1 83725 150 effects 150 1 000000 0 0 0 0 0 0 B0 4 3 0 0 0 0 Sigma 4262 39 15595 1 14811 6 8061 81 Wald F statistics DenDF 530 79 3 NumDF 1 24 F_inc 1216 29 8 84 5 0 oF O OUO Sigma SE 0 62 3 06 3 04 6 01 ererre O OOGO c 0 P 0 P 0P OP Prob lt 001 lt 001 Finally we present portions of the pvs files to illustrate the prediction facility of ASReml The first five and last three variety means are presented for illustration The overall SED printed is the square root of the average variance of difference between the variety means The two spatial analyses have a range of SEDs which are available if the SED qualifier is used All variety comparisons have the same SED from the third analysis as the design is a balanced lattice square The Wald F statistic stat
63. 952 COLT 27 00 NE87522 25 00 NE87612 21 80 NE87613 29 40 NE87615 25 69 NE87619 31 26 NE87627 Bones

The predict variety statement after the model statement in nin89.as results in the nin89.pvs file displayed below (some output omitted), containing the 56 predicted variety means, also in the order in which they first appear in the data file (column 2), together with standard errors (column 3). An average standard error of difference among the predicted variety means is displayed immediately after the list of predicted values. As in the .asr file, date, time and trial information are given in the title line. The Ecode for each prediction (column 4) is usually E, indicating the prediction is of an estimable function. Predictions of non-estimable functions are usually not printed (see Chapter 9).

  NIN alliance trial 1989   04 Apr 2008 17:00:47
  nin89
  Ecode is E for Estimable, * for Not Estimable
  Predicted values of yield
  The predictions are obtained by averaging across the hypertable
  calculated from model terms constructed solely from factors
  in the averaging and classify sets.
  The ignored set: repl
  Use !AVERAGE to move table factors into the averaging set.

  variety    Predicted_Value  Standard_Error  Ecode
  LANCER         28.5625          3.8557        E
  BRULE          26.0750          3.8557        E
  REDLAND        30.5000          3.8557        E
  CODY           21.2125          3.8557        E
  ARAPAHOE       29.4375          3.8557        E
  NE83404        27.3875          3.8557        E
  NE83406        24.2750          3.8557        E
  NE83407        22.6875          3.8557        E
64. AINV/GIV structure

ALNORM calculates the Normal Integral.

ASReml failed to SORT the pedigree.

The job file should be in ASCII format.

Try running the job with increased workspace, or using a simpler model. Otherwise send the job to VSN (mailto:support@asreml.co.uk) for investigation.

ASReml failed to expand the at() model term string. Break it into several parts on separate lines.

ASReml failed to parse the term. Revise and simplify.

An argument in the CALC statement is not valid.

ASReml is using IDV variance structure but wonders whether that is what you intended.

ASReml found alpha characters when it was expecting numeric data. Either the variable should be declared alphanumeric or we have miscounted items on the line. Use !CSV if there are TAB or COMMA delimited blank lines.

Try running without the !CONTINUE qualifier.

The program did not proceed to convergence because the REML log-likelihood was fluctuating wildly. One possible reason is that some singular terms in the model are not being detected consistently. Otherwise the updated G structures are not positive definite. There are some things to try:
• define US structures as positive definite by using !GP,
• supply better starting values,
• fix parameters that you are confident of while getting better estimates for others (that is, fix variances when estimating covariances),
• fit a simpler model,
• reorganise the model to reduce covariance terms
65. E sum to zero leaving only 3 fixed degrees of freedom fitted Therefore if the A inverse for this pedigree was saved it will contain GROUPSDF 3 in the GIV file 8 9 2 The example continued Below is an extension of harvey as to use harvey giv which is partly shown to the right This G inverse matrix is an identity matrix of order 74 scaled by 0 5 that is 0 5L This model is simply an example which is easy to verify Note that harvey giv is specified on the line immediately preceding harvey dat command file giv file 165 8 10 The reduced animal model RAM giv file example 01 01 5 Animal P 02 02 6 Sire P 03 03 5 Dam 04 04 5 Line 2 05 05 5 AgeOfDam adailygain 2 Y3 T2 T2 25 harvey ped ALPHA T3 Ta 6 harvey giv giv structure file 74 74 5 harvey dat adailygain mu Line fixed model Ir grmiv Sire INIT 0 25 random model residual idv units Model term specification associating the harvey giv structure to the coding of sire takes precedence over the relationship matrix structure implied by the P qualifier for sire In this case the P is being used to amalgamate animals and sires into a single list and the giv matrix must agree with the list order 8 10 The reduced animal model RAM The reduced animal model was devised to reduce the computation involved in fitting a large animal model When there is at most one record per individual a large proportion of the indiv
66. ERE RE Re eH RS Gil Wald F Stafeti gt ee SEG RARE Oe EERE e EH EGE EWS i Command file Specifying the variance structures Tal Applying variance models to random terms ooa aa 0004 7 2 Process to define a consolidated model term aoaaa aa 7 2 1 Modelling a single variance structure over several model terms 7 3 Applying variance structures to the residual error term aooaa 2 7 3 1 Special properties and rules in defining the residual error term 7 3 2 Using sat to specify the residual model term for data with sections 7 4 ldentinabiity c coea hee ae he eG EER EER REAR OR DES oS 7 5 A sequence of variance structures for the NIN data 7 6 Sigma versus gamma parameterization 2 200000048 7 6 1 Which parameterization does ASReml use for estimation 7 6 2 Switching from the gamma to the sigma parameterization vii 116 7 7 Variance model function qualifiers 2 2 02 000 7 7 1 Parameter equality constraints 5 00 0000 7 7 2 New R4 Ways to supply distances in one dimensional metric based mod ek Ce es eo a A ee ee we ee T3 Yourown programi IPis es CR daret ER Eee EES 7 7 4 Parameter space constraints Gs 2 2 ee ee 7 7 5 New R4 Initial values INITv 2 2 2 eee ee 7 7 6 About subsections SUBSECTION f 2 00 Tar Fatame ter ypes Fe es a bison ok ee ee ee eo eee Y 7 7 8 Equating variance structures 1USE 4 24 20
67. FREE FORMAT skipping 1 lines Univariate analysis of HT6 Summary of 6399 records retained of 6795 read Model term Size miss zero MinNonOo Mean MaxNonO StndDevn 1 Nfam 71 0 0 1 36 3379 Ti 2 Nfemale 26 0 0 1 12 8823 26 3 Nmale 37 0 0 1 15 2285 37 Warning More levels found in Clone than specified 4 Clone 926 0 0 1 464 6765 926 Warning Fewer levels found in MatOrder than specified 5 MatOrder 914 0 0 1 432 5760 860 6 rep 8 0 0 1 4 4837 8 7 iblk 80 0 0 1 40 1164 80 8 tree 0 0 1 0000 7 473 14 00 4 018 9 row 0 0 1 0000 28 52 56 00 16 09 10 col 0 0 1 0000 10 50 20 00 5 760 Warning Fewer levels found in prop than specified 11 prop 2 0 0 1 1 0000 1 12 culture 2 0 0 1 1 4945 2 13 treat 2 0 0 1 1 4945 2 Warning Fewer levels found in measure than specified 14 measure 2 0 0 al 1 0000 1 15 SURV 0 6 1 0000 0 9991 1 0000 0 3061E 01 16 DBH6 4 0 0 3000E 01 11 29 16 80 2 400 17 HTG Variate 0 0 76 20 838 6 1286 163 6 18 HT8 83 O 91 44 1148 1576 170 6 19 CWAC6 3167 0 97 54 301 3 542 5 52 26 20 mu 1 21 culture rep 16 12 culture 2 6 rep 8 Warning GRM matrix is too SMALL 171 8 11 Factor effects with large Random Regression models 22 grmi Clone 923 23 rep iblk 640 6 rep 8 7 iblk 80 Forming 2508 equations 19 dense Initial updates will be shrunk by factor 0 316 Notice LogL values are reported relative to a base of 30000 000 Notice 11 singularities detected in design matrix 1 LogL 2845 97 S2 8956 5 6390 df 2
68. Matrix US us TrLit1234 id lit 3 847 0 6368 0 2472 0 7180E 01 2 523 4 079 0 6454 0 4860 0 7674E 01 0 2063 0 2504E 01 0 3706 331 bas e u e e e e e 2 a OU j TTUTU TTT T yyy Oe ee 15 10 Multivariate animal genetics data Sheep 1182 0 8241 0 4923E 01 0 7049 Covariance Variance Correlation Matrix XFA xfa1 TrDam12 id dam 1 614 1 0000 1 0000 1 465 1 330 1 0000 Leek 1 153 1 0000 Covariance Variance Correlation Matrix XFA xfa3 Trait nrm tag 1 389 0 2978 0 1871 0 2861 0 4630E 01 0 9303E 03 0 9948 O 2017 0 7379E 01 0 4419E 01 0 8809 0 1709 0 1009 0 8568 0 2526 0 4495 0 5629E 01 0 4726E 01 0 6514E 01 0 3410 0 3155E 01 0 8583 0 2355 0 4560 0 2820 0 3004E 01 O 7277E 01 0 6992 0 4761 0 2416E 01 0 3414 0 5260 0 1869E 01 7261E 02 0 2757E 02 0 1363 0 1173 0 5210 0 1323 0 8432 0 1097E 02 0 1801 0 2190 0 2020E 01 0 1784 1 0000 0 000 0 000 1 173 0 5310E 01 0 6011E 01 0 2855 0 4530E 01 0 000 1 0000 0 000 Oe 1199 0 9449E 01 0 1164 0 4398 0 2888 0 000 0 000 1 0000 Note that the XFA matrix associated with tag has 8 rows and columns the first five relate to the five traits and the last three relate to the three factors 332 Bibliography Breslow N E 2003 Whither PQL Technical Report 192 UW Biostatistics Working Paper Series University of Washington URL http www bepress com uwbiostat paper192 Breslow N E and Clayton D G 1993 Approximate inference in generalized linea
69. • Prepare the data (typically using a spreadsheet or data base program).
• Export that data as an ASCII file (for example, export it as a .csv (comma separated values) file from Excel).
• Prepare a job file with filename extension .as.
• Run the job file with ASReml.
• Review the various output files.
• Revise the job and re-run it, or
• extract pertinent results for your report.

You will need a file editor to create the command file and to view the various output files. On unix systems, vi and emacs are commonly used. Under Windows, there are several suitable program editors available, such as ASReml-W and ConText (mentioned in Section 1.3).

3.2 Nebraska Intrastate Nursery (NIN) field experiment

The yield data from an advanced Nebraska Intrastate Nursery (NIN) breeding trial conducted at Alliance in 1988/89 will be used for demonstration (see Stroup et al., 1994, for details). Four replicates of 19 released cultivars, 35 experimental wheat lines and 2 additional triticale lines were laid out in a 22 row by 11 column rectangular array of plots; the varieties were allocated to the plots using a randomised complete block (RCB) design. In field trials, complete replicates are typically allocated to consecutive groups of whole columns or rows. In this trial the replicates were not allocated to groups of whole columns, but rather overlapped columns. Table 3.1 gives the allocation of varietie
70.
  !RENAME !ARG 1 2
  Slate Hall example
   Rep 6        # Six replicates of 5x5 plots in 2x3 arrangement
   RowBlk 30    # Rows within replicates numbered across replicates
   ColBlk 30    # Columns within replicates numbered across replicates
   row 10       # Field row
   column 15    # Field column
   variety 25
   yield
  barley.asd !skip 1 !DOPATH $1
  !PATH 1       # AR1 x AR1
  y ~ mu var
  residual ar1v(column).ar1(row)
  !PATH 2       # AR1 x AR1 + units
  y ~ mu var !r idv(units)
  residual ar1v(column).ar1(row)
  !PATH 3       # incomplete blocks
  y ~ mu var !r idv(Rep) idv(Rowblk) idv(Colblk)
  residual idv(units)
  !PATH 0
  predict variety !TWOSTAGEWEIGHTS

An abbreviated ASReml output file is presented below. The iterative sequence has converged to column and row correlation parameters of (.68377, .45859) respectively. The plot size and orientation is not known, and so it is not possible to ascertain whether these values are spatially sensible. It is generally found that the closer the plot centroids, the higher the spatial correlation. This is not always the case, and if the highest between-plot correlation relates to the larger spatial distance then this may suggest the presence of extraneous variation (see Gilmour et al., 1997, for example). Figure 15.5 presents a plot of the sample variogram of the residuals from this model. The plot appears in reasonable agreement with the model. The next model includes a measurement error or nugget effect compo
71. Tag e 29 32 50 53 phen uusT 11 15 susT 11 15 defines 11 15 elements of phen G iea BE SE ies H HOD 320 15 10 Multivariate animal genetics data Sheep defines 70 74 11215 eared d Direct susT 4 defines 75 89 23 37 4 Maternal Damv susT 1 6 defines 90 95 54 59 23 28 resid phen susT defines 96 110 60 74 23 37 WWTh2 Direct 1 phen 1 defines 111 75 60 YWTh2 Direct 3 phen 3 defines 112 77 62 GFWh2 Direct 6 phen 6 defines 113 80 65 FDMh2 Direct 10 phen 10 defines 114 84 69 FATh2 Direct 15 phen 15 defines 115 89 74 GenCor susT defines 116 125 from 23 37 MatCor Maternal defines 126 129 from 90 95 POM mteeetyAa aa Table 15 15 Variance models fitted for each part of the ASReml job in the analysis of the genetic example term matrix PATH 1 PATH 2 PATH 3 sire 5 diag fal us dam Xa diag fal fal litter 5X diag fal us error de us us us LogL 1566 45 1488 11 1480 89 Parameters 36 48 55 The specification in Release 3 required specification of initial values for variance parameters and also through the use of CONTINUE the generation of initial values from previous anal yses In Release 4 with the functional specification and no initial values specified ASReml will estimate initial values In this example we start by fitting diagonal matrices for sire dam and litter using initial values from univariate analyses and estimate an unstructured res
72. The default is to read all values in the file regardless of layout Otherwise the weights must appear a single column field one weight per line where the field is specified by appending c to the filename Consider a rather complicated example from a rotation experiment conducted over several years One analysis was of the daily live weight gain per hectare of the sheep grazing the plots There were periods when no sheep grazed Different flocks grazed in the different years Daily liveweight gain was assessed between 5 and 8 times in the various years To obtain a measure of total productivity in terms of sheep liveweight we need to weight the daily gain by the number of sheep grazing days per month The production for each year is given by predict year predict year predict year predict year predict year crop 1 pasture lime AVE month 56 55 56 53 57 63 6 0 crop 1 pasture lime AVE month 36 0 0 53 23 24 54 54 43 35 0 0 crop 1 pasture lime AVE month 70 0 21 170007000 53 0 crop 1 pasture lime AVE month 53 56 22 92 19 44 0 0 36 0 0 49 crop 1 pasture lime AVE month 0 22 0 53 70 22 0 51 16 5100 aoP WN RK but to average over years as well we need one of the following predict statements predict crop 1 pasture lime PRES year month IPRWTS 56 55 56 53 57 63 0 0 0 0 0 0 36 0 0 53 23 24 54 54 43 35 O O TO 0 2117 0 0 TO 0 O83 Q 53 56 22 92 19 44 0 036 0 0 49 0 22 0 537022 0 51 16 51 0 5 predict crop 1 pasture lime PRES m
73. There are now three consolidated model terms: idv(repl), idv(units) and ar1v(column).ar1(row). This order is reversed in 4.
4 Two dimensional separable autoregressive spatial model defined as a G structure. This model is equivalent to 3c but with the spatial model defined as a G structure rather than an R structure. The algebraic form is written alternatively, but equivalently, to the form in 3c, that is $\mathrm{var}(u_r) = \sigma_r^2 I_4$, $\mathrm{var}(u_s) = \sigma_s^2\, \Sigma_c(\rho_c) \otimes \Sigma_r(\rho_r)$ and $\mathrm{var}(e) = \sigma_e^2 I_{224}$.
    NIN Alliance Trial 1989
    variety !A
    id
    row 22
    column 11
    nin89.asd !skip 1
    yield ~ mu variety !r idv(repl) ar1v(column).ar1(row)
    residual idv(units)
Important points
• the same G structure could be achieved by specifying ar1(column).ar1v(row) (see similar comment in example 3b)
• if the variance structure ar1v(column).ar1v(row) was specified, ASReml would report an error (see identical comment in example 3b)
• estimation is based on the gamma parameterization, in which case both the estimated sigmas and the estimated gammas are reported. The user can force ASReml to use the sigma parameterization by placing the !SIGMAP qualifier immediately after the independent variable and before ~ on the model definition line. In this case only the sigmas would be reported, but they would be reported twice in the output (see Important points under example 3a).
7.6 Sigma versus gamma parameterizati
74. involved 35 teams of wethers representing 27 bloodlines. The file wether.dat shown below contains greasy fleece weight (kg), yield (percentage of clean fleece weight to greasy fleece weight) and fibre diameter (microns). The code wether.as to the right performs a basic bivariate analysis of this data.
    SheepID  Site  Bloodline  Team  Year  GFW   Yield  FD
    0101     3     21         1     1     5.6   74.3   18.5
    0101     3     21         1     2     6.0   71.2   19.6
    0101     3     21         1     3     8.0   75.7   21.5
    0102     3     21         1     1     5.3   70.9   20.8
    0102     3     21         1     2     5.7   66.1   20.9
    0102     3     21         1     3     6.8   70.3   22.1
    0103     3     21         1     1     5.0   80.7   18.9
    0103     3     21         1     2     5.5   75.5   19.9
    0103     3     21         1     3     7.0   76.6   21.9
    4013     3     43         35    1     7.9   75.9   22.6
    4013     3     43         35    2     7.8   70.3   23.9
    4013     3     43         35    3     9.0   76.2   25.4
    4014     3     43         35    1     8.3   66.5   22.2
    4014     3     43         35    2     7.8   63.9   23.3
    4014     3     43         35    3     9.9   69.8   25.5
    4015     3     43         35    1     6.9   75.1   20.0
    4015     3     43         35    2     7.6   71.2   20.3
    4015     3     43         35    3     8.5   78.1   21.7
8.2 Model specification. The syntax for specifying a multivariate linear model in ASReml is
    Y-variates ~ fixed !r conrandom !f sparse_fixed
    residual conresidual
• Y-variates is a list of up to 20 traits (there may be more than 20 actual variates if the list includes sets of variates defined with !G on page 49),
• fixed, conrandom and sparse_fixed are as in the univariate case (see Chapter 6) but involve the special term Trait and interactions with Trait. The design matrix for Trait has a level (column) fo
75. Listing of variance structure parameters (layout lost in extraction). Parameters 1 to 15: us(Trait) with id(units); 16 to 18: diag(TrSG123) with sex.grp; 19 to 22: diag(TrAG1245) with age.grp; 23 to 37: us(Trait) with id(sire). Effect counts shown in the listing: 35200 effects, 147 effects, 196 effects, 460 effects. us Trai
76. none (blocks fixed)
    2   RCB analysis, blocks random:
        !r idv(repl)
        residual idv(units)
    3a  Two dimensional spatial model, correlation in one direction:
        !r idv(repl)
        residual idv(column).ar1(row)
    3b  Two dimensional separable autoregressive spatial model:
        !r idv(repl)
        residual ar1v(column).ar1(row)
    3c  Two dimensional separable autoregressive spatial model with measurement error:
        !r idv(repl) idv(units)
        residual ar1v(column).ar1(row)
    4   Two dimensional separable autoregressive spatial model defined as a G structure:
        !r idv(repl) ar1v(column).ar1(row)
        residual idv(units)
7.7 Variance model function qualifiers. A consolidated model term comprises one or more covariance components, where a covariance component is a component of the model term to which a variance model function has been applied (see Section 2.1.8 and Table 7.2). All of the covariance components so far have been of the form vmfname(component), where vmfname is the variance model function name (in this font in the first column of Table 7.6) and component is a component in the model term. Two single covariance compon
77. and we present a discussion of this code to the left. We present the model specification explicitly to help the user understand the logic. In some cases experienced users will wish to reduce typing, and gain clarity, by using default rules. These are discussed in Section 7.10.
7.5 A sequence of variance structures for the NIN data
1 Randomised complete blocks analysis, blocks fixed. The only random term in a traditional randomised complete block (RCB) analysis of the NIN data is the residual error term $e \sim N(0, \sigma_e^2 I)$. The model therefore involves just one R structure (IDV) and no G structure. The variance model function name is idv and there is just one consolidated model term, idv(units).
2 RCB analysis, blocks random. The random effects RCB model has 2 random terms, to indicate that the total variation in the data comprises 2 components: a random replicate term $u_r \sim N(0, \sigma_r^2 I)$ and the residual error term as in example 1. The !r before repl tells ASReml that repl is a random term. All random terms must be written after !r in the model specification line(s). This model involves both the original IDV R structure and an IDV G structure for the random replicate term. There are now 2 consolidated model terms: idv(repl) and idv(units).
    NIN Alliance Trial 1989
    variety !A
    id
    pid
    raw
    repl 4
    row 22
    column 11
    nin89.asd !skip 1
    yield ~ mu variety repl
    residual id
78. be arranged with key fields, followed by other fields from the primary file and then fields from the secondary file.
Table 11.1: List of MERGE qualifiers (qualifier, action)
• !CHECK requests ASReml confirm that fields having a common name have the same contents. Discrepancies are reported to the .asr file. If there are fields with common names which are not key fields, and !CHECK is omitted, the fields will be assumed different and both versions will be copied.
• !KEY keyfields names the fields which are to be used for matching records in the files. If the fields have the same name in both file headers, they need only be named in association with the primary input file. If the key fields are the only fields with common names, the !KEY qualifier may be omitted altogether. If key fields are not nominated and there are no common field names, the files are interleaved.
• !KEEP instructs ASReml to include in the merged file records from the input file which are not matched in the other input file. Missing values are inserted as the values from the other file. Otherwise, unmatched records are discarded. !KEEP may be specified with either or both input files.
• !NODUP fields Typically, when a match occurs, the field contents from the second file are combined with the field contents of the first file to produce the merged file. The !NODUP qualifier, which may only be associated with the second file, causes the field contents for the nominated fields from th
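The matching and !KEEP behaviour described in the table can be illustrated with a small sketch. This is only a model of the described semantics, not ASReml's implementation: the function names and the record layout (a list of dictionaries per file) are assumptions, keys in the secondary file are assumed unique, and non-key fields with common names are assumed not to occur.

```python
def field_names(records):
    """Field names in first-seen order across a list of dict records."""
    names = []
    for rec in records:
        for name in rec:
            if name not in names:
                names.append(name)
    return names


def merge_records(primary, secondary, key, keep_primary=False, keep_secondary=False):
    """Match records on the key fields; unmatched records are dropped unless kept."""
    p_other = [f for f in field_names(primary) if f not in key]
    s_other = [f for f in field_names(secondary) if f not in key]
    lookup = {tuple(rec[f] for f in key): rec for rec in secondary}
    matched, merged = set(), []
    for rec in primary:
        k = tuple(rec[f] for f in key)
        other = lookup.get(k)
        if other is None and not keep_primary:
            continue  # unmatched primary record is discarded
        if other is not None:
            matched.add(k)
        row = dict(zip(key, k))  # key fields first
        row.update({f: rec.get(f) for f in p_other})  # then primary fields
        row.update({f: (other.get(f) if other is not None else None) for f in s_other})
        merged.append(row)
    if keep_secondary:
        for k, rec in lookup.items():
            if k not in matched:
                row = dict(zip(key, k))
                row.update({f: None for f in p_other})  # missing values from primary
                row.update({f: rec.get(f) for f in s_other})
                merged.append(row)
    return merged
```

Here `None` plays the role of the missing value inserted for fields that come from the file with no matching record.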
79. conditional Wald F statistic column to the Wald F Statistics table. It enables inference for fixed effects in the dense part of the linear mixed model to be conducted so as to respect both structural and intrinsic marginality (see Section 2.5). The detail of exactly which terms are conditioned on is reported in the .aov file. The marginality principle used in determining this conditional test is that a term cannot be adjusted for another term which encompasses it explicitly (e.g. term A.C cannot be adjusted for A.B.C) or implicitly (e.g. term REGION cannot be adjusted for LOCATION when locations are actually nested in regions although they are coded independently). !FOWN on page 78 provides a way of replacing the conditional Wald F statistic by specifying what terms are to be adjusted for, provided its degrees of freedom are unchanged from the incremental test.
5.8 Job control qualifiers
Table 5.3: List of commonly used job control qualifiers (qualifier, action)
• !MAXIT n sets the maximum number of iterations; the default is 10 for traditional models, more for general models. ASReml iterates for n iterations unless convergence is achieved first. Convergence is presumed when the REML log-likelihood changes less than 0.002 × current iteration number and the individual variance parameter estimates change less than 1%. If the job has not converged in n iterations, use the !CONTINU
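The convergence rule quoted for !MAXIT can be written out as a check. The sketch below is an illustration of the stated rule only; the function name is hypothetical, and measuring the 1% change relative to the previous estimate is an assumption, since the excerpt does not say what the percentage is taken of.

```python
def has_converged(logl_prev, logl_curr, params_prev, params_curr, iteration):
    """Apply the stated rule: small LogL change and small parameter changes."""
    # LogL must change by less than 0.002 x the current iteration number.
    logl_ok = abs(logl_curr - logl_prev) < 0.002 * iteration

    def small_change(old, new):
        if old == 0:
            return new == 0  # assumption: a zero estimate must stay at zero
        # Assumption: the 1% is relative to the previous estimate.
        return abs(new - old) < 0.01 * abs(old)

    params_ok = all(small_change(o, n) for o, n in zip(params_prev, params_curr))
    return logl_ok and params_ok
```

Because the log-likelihood tolerance grows with the iteration number, the rule becomes more lenient on the likelihood as the job runs, while the parameter tolerance stays fixed.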
80. • !VCC c specifies that there are c lines defining parameter relationships.
• If !VCC is used, a residual line is required and the parameter relationship lines must occur after this residual line.
• each relationship is specified in a separate line of the form
    i k                                (simple case)
    i k*v_k ... p*v_p !BLOCKSIZE n     (general case)
In this specification i and k ... p are the numbers of the specific variance model parameters and $v_m$ ($m = k \ldots p$) are the associated scale coefficients such that $\gamma_m \times v_m$ is equal in value to $\gamma_i$, for example
  - 5 7*1 indicates that $\gamma_7 \times 1 = \gamma_5$, i.e. parameter 7 is equal to parameter 5
  - 5 7*.1 indicates that parameter 7 is a tenth of parameter 5
• * indicates the presence of the scale coefficient $v_m$ for the parameter m
• if the coefficient is 1, indicating parameter equality, the *1 can be omitted, for example 5 7 is a simplified coding of the first example
• if the coefficient is -1, i k*-1 can be simplified to i -k, for example 5 -7 indicates that parameter 7 has the same magnitude but opposite sign to parameter 5
• the !BLOCKSIZE n qualifier is used when constraints of the same form are required on blocks of n contiguous parameters, for example 21 29 !BLOCKSIZE 8 equates parameters 29 with 21, 30 with 22, ..., 36 with 28
• a variance structure parameter may only be included in one relationship line; to equate several compo
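The effect of !BLOCKSIZE can be made concrete by expanding one relationship line into the individual parameter pairs it stands for. This is a small sketch with a hypothetical helper name; it only covers the simple equality case given in the example.

```python
def expand_blocksize(i, k, n):
    """Pairs (i, k), (i+1, k+1), ... for a block of n contiguous parameters."""
    return [(i + offset, k + offset) for offset in range(n)]


# The example from the text: 21 29 !BLOCKSIZE 8
pairs = expand_blocksize(21, 29, 8)
first_pair, last_pair = pairs[0], pairs[-1]
```

The first pair is parameter 29 equated with 21 and the last is 36 equated with 28, matching the description in the excerpt.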
81. 10.2.3 Forming a job template from a data file; 10.3 Command line options; 10.3.1 Prompt for arguments (A); 10.3.2 Output control (B, OUTFOLDER, XML); 10.3.3 Debug command line options (D, E); 10.3.4 Graphics command line options (G, H, I, N, Q); 10.3.5 Job control command line options (C, F, O, R); 10.3.6 Workspace command line options (S, W); 10.4 Advanced processing arguments; 10.4.1 Standard use of arguments; 10.4.2 Prompting for input; 10.4.3 Paths and Loops; 10.4.4 Order of Substitution; 10.5 Performance issues; 10.5.1 Multiple processors; 10.5.2 Slow processes; 10.5.3 Timing processes; 11 Command file: Merging data files; 11.1 Introduction; 12 Functions of variance components; 12.1 Introduction; 12.2 Syntax; 12.2.1 Functions of components; 12.2.2 Conve
82. XFAk (xfak): extended factor analytic; $\Gamma$ contains covariance factors, $\Psi$ contains specific variance; kw + w parameters.
7.12 Variance models available in ASReml. Details of the variance models available in ASReml (columns: variance structure name; description; algebraic form; number of parameters for corr, hom variance and het variance; variance model function name).
relationship matrices
    AINV   inverse relationship matrix derived from pedigree   0  1
    NRM    relationship matrix derived from pedigree           0  1   nrm
    GIV1   generalized inverse number 1                        0  1   giv1
    GIV8   generalized inverse matrix 8                        0  1   giv8
    GRM1   generalized relationship number 1                   0  1   grm1
    GRM8   generalized relationship matrix 8                   0  1   grm8
Notes: The number of parameters is the number of variance structure parameters; w is the dimension of the matrix. The homogeneous variance form is specified by appending V to the correlation basename; the heterogeneous variance form is specified by appending H to the correlation basename. The relationship matrices will be associated with 1 variance parameter unless used in direct product with another structure that provides the variance. Appending a v to a name makes it explicit that a variance parameter is fitted.
8 Command file: Multivariate analysis
8.1 Introduction. Multivariate analysis is used here in the narrow sense of a multivariate mixed model. There are many other multivariate analysis techniques which are not covered by ASReml. Multivariate analysis is used when we are interes
83. file is by programs written to parse ASReml output. For further details, including the status of intended future developments, please contact support@vsni.co.uk.
10.3.3 Debug command line options (D, E). D and E (!DEBUG, !DEBUG 2) invoke debug mode and increase the information written to the screen or .asl file. This information is not useful to most users. On Unix systems, if ASReml is crashing, use the system script command to capture the screen output rather than using the L option, as the .asl file is not properly closed after a crash.
10.3.4 Graphics command line options (G, H, I, N, Q). Graphics are produced by ASReml on some platforms (e.g. PC and Linux) using the Winteracter graphics library. The I (!INTERACTIVE) option permits the variogram and residual graphics to be displayed. This is the default unless the L option is specified. The N (!NOGRAPHICS) option prevents any graphics from being displayed. This is the default when the L option is specified. The Gg (!GRAPHICS g) option sets the file type for hard copy versions of the graphics. Hard copy is formed for all the graphics that are displayed. Hg (!HARDCOPY g) replaces the G option when graphics are to be written to file but not displayed on the screen. The H may be followed by a format code, e.g. H22 for .eps. Q (!QUIET) is used when running under the control of ASReml-W to suppress any POP-UPs/PAUSES from ASReml. ASReml writes
84. files produced by this job include the .aov, .pvs, .res, .tab, .sln and .yht files (see Section 13.4).
3.6.1 The .asr file. Below is nin89.asr with pointers to the main sections. The first line gives the version of ASReml used (in square brackets) and the title of the job. The second line gives the build date for the program and indicates whether it is a 32bit or 64bit version. The third line gives the date and time that the job was run and reports the size of the workspace. The general announcements box (outlined in asterisks) at the top of the file notifies the user of current release features. The remaining lines report a data summary, the iteration sequence, the estimated variance parameters and a table of Wald F statistics. The final line gives the date and time that the job was completed and a statement about convergence.
    ASReml 3.1 [01 Jan 2011] NIN alliance trial 1989            (job heading)
    Build cm [25 Oct 2011] 64 bit
    04 Nov 2011 21:14:28.404 32 Mbyte Linux (x64) nin89
    Licensed to: Cargo Vale Olives Univ of Wollongong at Jul 2012
    ***********************************************************
    * Contact support@asreml.co.uk for licensing and support  *
    ***********************************************************
    Folder: /home/gilmoua/W7drive/Users/Public/ASReml/asr3/ug3/Manex4
    variety !A
    QUALIFIERS: !SKIP 1
    Reading nin89.asd FREE FORMAT skipping 1 lines
    Univariate analysis of yield
    Summary of 224 records retained of 224 read                 (data summary)
    Model term Size mis
85. for Estimable, * for Not Estimable. Warning: mv_estimates is ignored for prediction. The predictions are obtained by averaging across the hypertable calculated from model terms constructed solely from factors in the averaging and classify sets. Use !AVERAGE to move ignored factors into the averaging set.
Predicted values of yield (predicted variety means)
    variety    Predicted_Value  Standard_Error  Ecode
    LANCER     24.0891          2.4648          E
    BRULE      2 0731           2.4946          E
    REDLAND    28.7953          2.5066          E
    CODY       23.7733          2.4973          E
    ARAPAHOE   27.0429          2.4420          E
    NE83404    25.7199          2.4426          E
    NE83406    25.3793          2.5030          E
    NE83407    24.3981          2.6892          E
    CENTURA    26.3531          2.4765          E
    SCOUT66    29.1741          2.4363          E
    NE87615    25.1218          2.4436          E
    NE87619    30.0261          2.4669          E
    NE87627    19.7108          2.4836          E
    SED: Overall Standard Error of Difference 2.925             (SED summary)
13.4.7 The .res file. The .res file contains miscellaneous supplementary information including
• a list of unique values of x formed by using the fac() model term,
• a list of unique (x, y) combinations formed by using the fac(x,y) model term,
• Legendre polynomials produced by the leg() model term,
• orthogonal polynomials produced by the pol() model term,
• the design matrix formed for the spl() model term,
• predicted values of the curvature component of cubic smoothing splines,
• the empirical variance covariance matrix based on the BLUPs when a J or J
86. for example use CORUH instead of US.
14.5 Information, Warning and Error messages
Table 14.3: Alphabetical list of error messages and probable cause(s)/remedies
Error messages, in the order listed: Correlation structure is not positive definite; Data does not have sections; Define structure for; Error: The indicated number of input fields exceeds the limit; Error in !CONTRAST label factor values; Error in !GROUP label factor values; Error in !SUBSET label factor values; Error in extended !ASSIGN; Error in R structure: model checks; Error opening file; Error in list; Error in !PREDICT; Error in variance header line; Error in Variance Parameter Constraint; Error opening file; Error order; Error parsing; Error reading something.
Probable causes and remedies, in the order listed: It is best to start with a positive definite correlation structure. Maybe use a structured correlation matrix. The data does not match the RESIDUAL specification. A variance structure should be specified for this term. The reported limit is hardcoded. The number of variables to be read must be reduced. The error could be in the variable/factor name, or in the number of values, or the list of values. The list of values does not agree with the factor definition. The error could be in the variable/factor name, or in the number of values, or the list of values. The < > qualifiers allow an assign string to be defined over severa
87. for these instructions are discussed. Direct use of the .pin file, as was required in ASReml 2, is discussed in Section 12.3.
12.2 Syntax. Instructions to calculate functions are headed by a line
    VPREDICT !DEFINE
This line and the following instructions can occur anywhere in the .as file, but the logical place is at the end of the file. The instructions are processed after the job (part/cycle) has been completed. ASReml recognises a blank line or end of file as termination of the functional instructions. Functions of the variance components are specified by lines of the form
    letter label coefficients
• letter (either F, H, R, S, V or X) must occur in column 1
  - F forms linear combinations of variance components
  - H is for forming heritabilities, the ratio of two components
  - R is for forming the correlation from a covariance component
  - S is a square root function
  - V is for converting components related to a CORUH or an XFA structure into components related to a US structure
  - X is a multiply function
• label names the result
• coefficients is the list of arguments/coefficients for the linear function
When ASReml reads back the variance parameters from the .asr file, each covariance component or variance function is assigned a name. The full name is usually the covariance function, or its specified contracted form, prepended by the consolidated model term, or its specified contracted form, and
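To make the F, H, R and S functions concrete, here is a minimal sketch of the point estimates they describe. It is an illustration under assumptions: the function names are hypothetical, the inputs are plain numbers rather than named ASReml components, and nothing here reproduces how ASReml derives standard errors.

```python
import math


def linear_combination(components, coefficients):
    """F-style function: weighted sum of variance components."""
    return sum(c * v for c, v in zip(coefficients, components))


def heritability(numerator, denominator):
    """H-style function: ratio of two components."""
    return numerator / denominator


def correlation(covariance, variance_1, variance_2):
    """R-style function: correlation formed from a covariance and two variances."""
    return covariance / math.sqrt(variance_1 * variance_2)


def square_root(component):
    """S-style function: square root of a component."""
    return math.sqrt(component)


# Hypothetical components: genetic variance 2.0 and residual variance 6.0.
phenotypic = linear_combination([2.0, 6.0], [1, 1])
h2 = heritability(2.0, phenotypic)
```

A typical use chains the functions: an F line first forms a phenotypic variance as a sum, and an H line then divides the genetic component by it.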
88. form is said to be nonnegative definite if $x'Ax \ge 0$ for all $x \in \mathbb{R}^n$. If $x'Ax$ is nonnegative definite and, in addition, the null vector 0 is the only value of $x$ for which $x'Ax = 0$, then the quadratic form is said to be positive definite. Hence the matrix $A$ is said to be positive definite if $x'Ax$ is positive definite, see Harville (1997), pp 211.
7.11.3 Notes on the variance models. These notes provide additional information on the variance models defined in Table 7.6.
• the IDH and DIAG models fit the same diagonal variance structure,
• the CORGH and US are equivalent variance structures parameterised differently. Both may fail to converge if the starting values are not good and/or if the maximum REML likelihood occurs at parameter values outside the parameter space. The US model is likely to be better when the matrix is of order 3 or higher,
• in CHOLk models $\Sigma = LDL'$ where $L$ is lower triangular with ones on the diagonal, $D$ is diagonal and k is the number of non-zero off diagonals in $L$,
• in CHOLkC models $\Sigma = LDL'$ where $L$ is lower triangular with ones on the diagonal, $D$ is diagonal and k is the number of non-zero sub diagonal columns in $L$. This is somewhat similar to the factor analytic model,
• in ANTEk models $\Sigma^{-1} = UDU'$ where $U$ is upper triangular with ones on the diagonal, $D$ is diagonal and k is the number of non-zero off diagonals in $U$,
• the CHOLk and ANTEk models are equivalent to the US structure, that is, the full variance st
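The link between the $LDL'$ factorisation and positive definiteness can be checked numerically. The sketch below is a plain-Python illustration with hypothetical function names, not ASReml code: for a symmetric matrix it computes $L$ (unit lower triangular) and the diagonal of $D$ without pivoting, and treats the matrix as positive definite when every diagonal element of $D$ is positive.

```python
def ldl_decompose(a):
    """Return (L, d) with a = L * diag(d) * L' for a symmetric matrix a."""
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = a[j][j] - sum(lower[j][k] ** 2 * d[k] for k in range(j))
        if d[j] == 0.0:
            raise ValueError("zero pivot: factorisation without pivoting fails")
        for i in range(j + 1, n):
            partial = sum(lower[i][k] * lower[j][k] * d[k] for k in range(j))
            lower[i][j] = (a[i][j] - partial) / d[j]
    return lower, d


def is_positive_definite(a):
    """True when all pivots of the LDL' factorisation are positive."""
    try:
        _, d = ldl_decompose(a)
    except ValueError:
        return False
    return all(value > 0 for value in d)
```

For example, the matrix [[4, 2], [2, 3]] factors with L having 0.5 below the diagonal and D = diag(4, 2), so it is positive definite, whereas [[1, 2], [2, 1]] produces a negative pivot.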
89. functions of the variance components
    CENTURA   21.6500  3.8557  E
    SCOUT66   27.5250  3.3557  E
    COLT      27.0000  3.8557  E
    NE87613   29.4000  3.9657  E
    NE87615   25.6875  3.8557  E
    NE87619   31.2625  3.9957  E
    NE87627   23.2250  2.8557  E
    SED: Overall Standard Error of Difference 4.979
4 Data file preparation
4.1 Introduction. The first step in an ASReml analysis is to prepare the data file. Data file preparation is discussed in this chapter using the NIN example of Chapter 3 for demonstration. The first 25 lines of the data file are as follows.
    variety  id  pid   raw  repl  nloc  yield  lat  long  row  column
    BRULE    2   1102  631  1     4     31.55  4.3  20.4  17   1
    REDLAND  3   1103  701  1     4     35.05  4.3  21.6  18   1
    NE86501  24  1124  635  1     4     31.75  8.6  20.4  17   2
    NE86503  25  1125  840  1     4     42     8.6  21.6  18   2
The remaining rows did not survive extraction in order: CODY 4 NE83404 NE83406 NE83407 CENTURA SCOUT66 COLT 11 NE83498 NE84557 NE83432 NE85556 NE85623 NE86482 LANCOTA CENTURK78 17 1117 632 NORKAN 18 1118 446 1 4 22 KS831374 19 1119 684 14 3 TAM200 20 1120 422 1 4 2 HOMESTEAD 22 1122 566 1 4 4 3 22 8 ARAPAHOE 5 1105 661 1 4 33 05 4 3 1104 602 1 4 30 1 6 1106 605 1 4 30 2 7 1107 704 1 4 35 2 8 1108 388 1 4 19 4 9 1109 487 1 4 24 3 10 1110 511 1 4 25 1111 502 1 4 25 1 8 12 1112 492 13 1113 509 14 1114 268 15 1115 633 16 1116 513 3 4 1 1 21 1121 560 1 4 28 23 1123 514 1 4 25 25 4 3 25 2 21 1 5 8 0 2 4 2 2 8 4 24 6 8 665 2 4 25 45 8 6 7 2 6 2 4 13 4 8 6 8 4 7
90. have the same name/label. For example
    !MBF mbf(entry) mlib/m35.csv !RENAME Marker35
If the key values are the ordered sequence 1:N, the key field may be omitted if !NOKEY is specified. If the key is not in the first field, its location can be specified with !KEY k. If extracting a single covariate from a large set of covariates in the file, the specific field to extract can be given by !FIELD s in absolute terms, or relative to the key field by !RFIELD r. For example
    !MBF mbf(variety,1) markers.csv !key 1 !RFIELD 35 !RENAME Marker35
!SKIP k requests the first k lines of the file be ignored. !SPARSE can be used when the covariates are predominantly zero. Each key value is followed by as many column,value pairs as required to specify the non-zero elements of the design for that value of key. The pairs should be arranged in increasing order of column within rows. The rows may be continued on subsequent lines of the file provided incomplete lines end with a COMMA. This file may now be a binary format file, with file extension .bin indicating 32bit real binary numbers and .dbl indicating 64bit real binary values. Files with these formats can be easily created in a preliminary run using the !SAVE qualifier. The advantage of using a binary file is that reading the file is much quicker. This is important if the file has many fields and is being accessed repeatedly, for example
    !CYCLE 1:1000
    !MBF mbf(Geno,1) markers.dbl !key 1 !RFIELD $I !renam
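The !SPARSE layout, a key followed by column,value pairs, can be illustrated by expanding one such row into a dense covariate row. This sketch rests on assumptions that the excerpt does not confirm: tokens are taken to be separated by spaces and/or commas, columns are taken to be numbered from 1, and continuation lines are not handled. The function name is hypothetical.

```python
def parse_sparse_row(line, n_columns):
    """Expand 'key col,val col,val ...' into (key, dense list of length n_columns)."""
    tokens = line.replace(",", " ").split()
    key, rest = tokens[0], tokens[1:]
    if len(rest) % 2 != 0:
        raise ValueError("expected column,value pairs after the key")
    row = [0.0] * n_columns
    previous_column = 0
    for col_text, val_text in zip(rest[0::2], rest[1::2]):
        column = int(col_text)
        if column <= previous_column:
            raise ValueError("columns must be in increasing order within a row")
        row[column - 1] = float(val_text)  # assumption: 1-based column numbers
        previous_column = column
    return key, row


key, dense = parse_sparse_row("G7 2,1 5,2", 6)
```

Only the non-zero entries need to be stored in the file, which is the point of the qualifier when most covariate values are zero.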
91. inc tests the additional variation explained when the term is added to a model consisting of the I terms. F-con tests the additional variation explained when the term is added to a model consisting of the I and C/c terms. The . terms are ignored for both F-inc and F-con tests.
Incremental F statistics: calculation of Denominator degrees of freedom
    Source            Size  NumDF  F-value   Lambda*F  Lambda  DenDF
    mu                1     1      245.1409  245.1409  1.0000  5.0000
    variety           3     2      1.4853    1.4853    1.0000  10.0000
    LinNitr           1     1      110.3232  110.3232  1.0000  45.0000
    nitrogen          4     2      1.3669    1.3669    1.0000  45.0000
    variety.LinNitr   3     2      0.4753    0.4753    1.0000  45.0000
    variety.nitrogen  12    4      0.2166    0.2166    1.0000  45.0000
Conditional F statistics: calculation of Denominator degrees of freedom
    Source            Size  NumDF  F-value   Lambda*F  Lambda  DenDF
    mu                1     1      138.1360  138.1360  1.0000  6.0475
    variety           3     2      1.4853    1.4853    1.0000  10.0000
    LinNitr           1     1      110.3232  110.3232  1.0000  45.0000
    nitrogen          4     2      1.3669    1.3669    1.0000  45.0000
    variety.LinNitr   3     2      0.4753    0.4753    1.0000  45.0000
    variety.nitrogen  12    4      0.2166    0.2166    1.0000  45.0000
13.4.2 The .asl file. The .asl file is primarily used for low-level debugging. It is produced when the !LOGFILE qualifier is specified and contains low-level debugging information when the !DEBUG qualifier is also given. However, when a job running on a Unix system crashes with a Segmentation fault, the output buffers are not flushed
92. inverse being reformed unless !MAKE is specified; this saves time when performing repeated analyses based on a particular pedigree; delete ainverse.bin or specify !MAKE if the pedigree is changed between runs,
• identities are printed in the .sln and the .aif file,
• identities should be whole numbers less than 200,000,000 unless !ALPHA is specified,
• pedigree lines for parents must precede their progeny,
• unknown parents should be given the identity number 0,
• if an individual appearing as a parent does not appear in the first column, it is assumed to have unknown parents, that is, parents with unknown parentage do not need their own line in the file,
• identities may appear as both male and female parents, for example in forestry.
We refer the reader to the sheep genetics example on page 317.
Table 8.1: List of pedigree file qualifiers (qualifier, description)
• !ALPHA indicates that the identities are alphanumeric with up to 225 characters; otherwise by default they are numeric whole numbers < 200,000,000. If using long alphabetic identities, use !SLNFORM to see the full identity in the .sln file.
• !DIAG causes the pedigree identifiers, the diagonal elements of the Inverse of the Relationship (.aif), and the inbreeding coefficients for the individuals (calculated as the diagonal of $A - I$), and a factor with levels Parent and Nonparent indicating if the individual is a parent with progeny in the pedigree or a non p
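The ordering rules in the list above lend themselves to a quick pre-check before a pedigree file is handed to ASReml. The sketch below is an assumption-laden illustration: it takes each pedigree line as an (individual, sire, dam) triple with 0 for an unknown parent, and the function name is hypothetical.

```python
def check_pedigree(lines):
    """Return (individual, parent) pairs where a listed parent appears too late."""
    all_individuals = {individual for individual, _, _ in lines}
    seen = set()
    problems = []
    for individual, sire, dam in lines:
        for parent in (sire, dam):
            if parent == 0:
                continue  # unknown parent
            # A parent without its own line is allowed (unknown parentage assumed);
            # a parent with its own line must already have been read.
            if parent in all_individuals and parent not in seen:
                problems.append((individual, parent))
        seen.add(individual)
    return problems
```

An empty result means every parent that has its own line precedes its progeny, which is the condition stated in the excerpt.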
93. is fitted as fixed to allow for the likely scenario that, rather than a single population of treatment by variety effects, there are in fact two populations (control and treated) with a different mean for each. There is evidence of this prior to analysis with the large difference in mean sqrt(rootwt) for the two groups (14.93 and 8.23 for control and treated respectively). The inclusion of tmt as a fixed effect ensures that BLUPs of tmt.variety effects are shrunk to the correct mean (treatment means rather than an overall mean). The model for the data is given by
    $y = X\tau + Z_1 u_1 + Z_2 u_2 + Z_3 u_3 + Z_4 u_4 + Z_5 u_5 + e$    (15.7)
where $y$ is a vector of length $n = 264$ containing the sqrt(rootwt) values, $\tau$ corresponds to a constant term and the fixed treatment contrast, and $u_1 \ldots u_5$ correspond to random variety, treatment by variety, run, treatment by run and variety by run effects. The random effects and error are assumed to be independent Gaussian variables with zero means and variance structures $\mathrm{var}(u_i) = \sigma_i^2 I_{b_i}$ (where $b_i$ is the length of $u_i$, $i = 1 \ldots 5$) and $\mathrm{var}(e) = \sigma^2 I_n$. The ASReml code for this analysis is
    Bloodworm data Dr M Stevens
    pair 132
    rootwt
    run 66
    tmt 2 !A
    id
    variety 44 !A
    rice.asd !skip 1 !DOPATH 1
    !PATH 1
    sqrt(rootwt) ~ mu tmt !r idv(variety) idv(variety.tmt) idv(run) idv(pair) idv(run.tmt)
    residual idv(units)
    !PATH 2
    sqrt(rootwt) ~ mu tmt !r idv(variety) diag(tmt).id(variety) idv(run) id
94. is $\sigma = M\theta$ where $\sigma$ and $\theta$ are 15 × 1 and 6 × 1 vectors respectively and $M$ is a 15 × 6 matrix:
    1    0    0    0    0    0
    0.5  0.5  0    0    0    1
    0    1    0    0    0    0
    0.5  0    0.5  0    0    1
    0    0.5  0.5  0    0    1
    0    0    1    0    0    0
    0.5  0    0    0.5  0    1
    0    0.5  0    0.5  0    1
    0    0    0.5  0.5  0    1
    0    0    0    1    0    0
    0.5  0    0    0    0.5  1
    0    0.5  0    0    0.5  1
    0    0    0.5  0    0.5  1
    0    0    0    0.5  0.5  1
    0    0    0    0    1    0
A way of fitting this model would be to put the matrix values in a file HuynhFeldt.vcm and replace the model specification lines by
    # Supply start values because raw SSP generates bad initial values for
    # HuynhFeldt structure because it does not fit well
    !ASSIGN HFvcm !GU !INIT 45 20 45 20 20 45 20 20 20 45 20 20 20 20 45
    wt0 wt1 wt2 wt3 wt4 ~ Trait treat Trait.treat
    residual units.us(Trait $HFvcm)
    VCM 5 19 6 HuynhFeldt.vcm   # parameters 5 to 19 explained in terms of 6 parameters
Note that if the user fits another model with differing numbers of variance structure parameters, so that the variance structure parameters are renumbered, then all the user needs to do to continue with the same relationships is to change the parameter_number_list parameters on the VCM line.
Important: The VCM statement must be placed after any residual definition line(s).
The new qualifier !DESIGN on the datafile line causes ASReml to write the design matrix, not including the response variable, to a .des file. It allows ASReml to create the design matrix requi
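A matrix of this shape can be generated rather than typed. The sketch below builds a 15 × 6 matrix for 5 traits by walking the lower triangle row by row; it is an illustration only, with hypothetical function names. The coefficient placed in the last column for off-diagonal rows is passed in as `offdiag_coef` because its sign is an assumption here, not something this sketch settles.

```python
def build_m(n_traits, offdiag_coef):
    """Rows follow the lower triangle: (1,1), (2,1), (2,2), (3,1), ..."""
    rows = []
    for i in range(n_traits):
        for j in range(i + 1):
            row = [0.0] * (n_traits + 1)
            if i == j:
                row[i] = 1.0  # variance row: picks out its own parameter
            else:
                row[i] = 0.5  # covariance row: average of the two variances
                row[j] = 0.5
                row[n_traits] = offdiag_coef  # plus a multiple of the extra parameter
            rows.append(row)
    return rows


def apply_m(m, theta):
    """Compute sigma = M * theta as a plain list."""
    return [sum(coef * value for coef, value in zip(row, theta)) for row in m]


m = build_m(5, offdiag_coef=-1.0)  # sign chosen for illustration only
sigma = apply_m(m, [45.0, 45.0, 45.0, 45.0, 45.0, 25.0])
```

With these illustrative values every variance element is 45 and every covariance element is 20, the same pattern as the start values on the !ASSIGN line.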
95. is strong, supporting earlier indications of the dependence between the treated and control root area (Figure 15.8).
Table 15.9: Equivalence of random effects in bivariate and univariate analyses
    bivariate effects (model 15.10)    univariate (model 15.7)
    trait.variety   $u_v$              $1_2 \otimes u_1 + u_2$
    trait.run       $u_r$              $1_2 \otimes u_3 + u_4$
    trait.pair      $e^*$              $1_2 \otimes u_5 + e$
15.8.2 A multivariate approach. In this simple case, in which the variance heterogeneity is associated with the two level factor tmt, the analysis is equivalent to a bivariate analysis in which the two traits correspond to the two levels of tmt, namely sqrt(rootwt) for control and treated. The model for each trait is given by
    $y_j = X\tau_j + Z_v u_{v_j} + Z_r u_{r_j} + e_j \quad (j = c, t)$    (15.9)
where $y_j$ is a vector of length $n = 132$ containing the sqrtroot values for variate $j$ ($j = c$ for control and $j = t$ for treated), $\tau_j$ corresponds to a constant term and $u_{v_j}$ and $u_{r_j}$ correspond to random variety and run effects. The design matrices are the same for both traits. The random effects and error are assumed to be independent Gaussian variables with zero means and variance structures $\mathrm{var}(u_{v_j}) = \sigma_{v_j}^2 I_{44}$, $\mathrm{var}(u_{r_j}) = \sigma_{r_j}^2 I_{66}$ and $\mathrm{var}(e_j) = \sigma_j^2 I_{132}$. The bivariate model can be written as a direct extension of (15.9), namely
    $y = (I_2 \otimes X)\tau + (I_2 \otimes Z_v) u_v + (I_2 \otimes Z_r) u_r + e^*$    (15.10)
where $y = (y_c', y_t')'$, $u_v = (u_{v_c}', u_{v_t}')'$, $u_r = (u_{r_c}', u_{r_t}')'$ and $e^* = (e_c', e_t')'$. There is an equivalence between the effects in this b
96. labels of 16 characters long. If there are large !A factors (so that the total across all factors will exceed 2000), you must specify the anticipated size (within say 5% of the larger factors). If some labels are longer than 16 characters and the extra characters are significant, you must lengthen the space for each label by specifying !LL c, e.g. cross !A 2300 !LL 48 indicates the factor cross has about 2300 levels and needs 48 characters to hold the level names; only the first 20 characters of the names are ever printed. !PRUNE on a field definition line means that if fewer levels are actually present in the factor than were declared, ASReml will reduce the factor size to the actual number of levels. Use !PRUNEALL for this action to be taken on the current and subsequent factors up to, but not including, a factor with the !PRUNEOFF qualifier. The user may overestimate the size for large ALPHA and INTEGER coded factors so that ASReml reserves enough space for the list. Using !PRUNE will mean the extra (undefined) levels will not appear in the .sln file. Since it is sometimes necessary that factors not be pruned in this way, for example in pedigree/GIV factors, pruning is only done if requested. Normally a character in the data file will have the effect of eliminating whatever text follows on the line. This means that ordinarily the character may not be included in the name of the level of an al
97. line 61 qualifiers 62 syntax 61 datasets barley asd 278 coop fmt 309 grass asd 270 harvey dat 148 nin89 asd 27 oats asd 260 orange asd 301 rat dat 144 rats asd 264 ricem asd 296 voltage asd 267 wheat asd 284 debug options 188 Denominator Degrees of Freedom 19 dense 106 design factors 106 diagnostics 17 diallal analysis 97 direct product 10 discussion list 3 Dispersion parameter 103 distribution conditional 12 marginal 12 Ecode 38 Eigen analysis 232 EM update 120 environment variable job control 65 equations mixed model 14 errors 237 Excel 42 execution time 232 F statistics 19 Factor qualifier DATE 49 DMY 49 LL Label Length 49 MDY 49 PRUNE 50 SORT 50 SORTALL 50 TIME 49 factors 41 file GIV 153 pedigree 148 Fisher scoring algorithm 13 fixed effects 5 86 Fixed format files 63 fixed terms 87 93 multivariate 146 primary 93 sparse 94 forum 3 free format 41 functions of variance components 37 201 Convert CORUH and XFA to US 204 correlation 205 linear combinations 203 syntax 201 Gamma distribution 103 GBLUP 159 Generalized Mixed Linear Models 101 genetic data 1 groups 152 links 147 models 147 qualifiers 147 relationships 148 genetic markers 71 GIV 143 153 GLM distribution Binomial 102 Gamma 103 Negative Binomial 103 Normal 102 Ordinal data 102 Poisson 103 338 INDEX GLMM 104 graphics
98. line of the data file nin89.asd, the line containing the field labels. The data file line can contain qualifiers that control other aspects of the analysis. These qualifiers are presented in Section 5.8.

    row 22
    column 11
    nin89.asd !skip 1
    tabulate yield ~ variety
    yield ~ mu variety !r idv(repl)
    residual idv(units)
    predict variety

3.4.5 Tabulation

The tabulate statements are optional. They provide a simple way of exploring the structure of a data set. They should appear immediately before the model line. In this case the 56 simple variety means for yield are formed and written to a .tab output file. See Chapter 9 for a discussion of tabulation.

3.4.6

The linear mixed model is specified as a list of model terms and qualifiers. All elements must be space separated. ASReml accommodates a wide range of analyses. See Section 2.1 for a brief discussion and general algebraic formulation of the linear mixed model. The model specified here for the NIN data is a simple random effects RCB model having fixed variety effects and random replicate effects. The reserved word mu fits a constant term (intercept), variety fits a fixed variety effect and repl fits a random replicate effect because the

Specifying the t
99. listed first followed by permitted alternatives qualifiers action NORMAL IDENTITY LOGARITHM INVERSE allows the model to be fitted on the log inverse scale but with the residuals on the natural scale NORMAL IDENTITY is the default IBINOMIAL LOGIT IDENTITY PROBIT COMPLOGLOG TOTAL n p 1 p n Proportions or counts r ny are indicated if TOTAL specifies the variate con re Ea taining the binomial totals Proportions are assumed if no response value exceeds 1 y In j 1 A binary variate 0 1 is indicated if TOTAL is unspecified The expression for d on the left applies when y is proportions or binary The logit is the default link function The variance on the underlying scale is 77 3 3 3 underlying logistic distribution for the logit link MULTINOMIAL k CUMULATIVE LOGIT PROBIT COMPLOGLOG TOTAL n fits a multiple threshold model with t k 1 thresholds to polytomous ordinal Vij pi l uj n data with k categories assuming a multinomial distribution fri lt j lt t Typically the response variable is a single variable containing the ordinal score 1 k or a set of k variables containing counts r in the k categories The response d 2NF may also be a series of t binary variables or a series of t variables containing counts yiln yi pi If counts are supplied the total including the kth category must be given in where another variable indicated
100. lit us TrLit1234 ste C1 pus Triit1234 2 1583 2 2202 2 3077 0 16225 0 16827 0 18881E 01 15 766 11 784 24 024 0 43182 0 88424 0 19460 0 95054 1 1380 0 25006 4 6988 0 89101 2 6165 0 79486E 01 0 68664E 01 1 6644 2 3758 2 7093 6 2253 11219 11514E 01 60077E 01 23849 26281 19102E 01 63142 16291 53335 35085E 02 0 18892 0 13069 1 5643 1 5478 0 75138 0 13421 0 16539 0 38619E 02 15 172 11 107 22 468 0 40378 OS OS SS D OOOO aoa aacstan s a PPP WWWYNY YD BPWNFPWNHRPENDE WS 33589 37368 63232 32 85E 01 47001E 01 59274E 02 31286 37589 63510 33038E 01 44563E 01 55003E 02 29825 37755 37255E 01 adaz 10759 14261 12431E 01 10198 51205E 01 64586 85213 1 5966 T3359E 01 11109 14996E 01 44400 64674 76604E 01 34354 16518 27002 233839E 01 12314 65488E 01 37542 43280 75145 37770E 01 54770E 01 7T0075E 02 satel 31755 50789 28124E 01 ooo oo coco Cocooooocooo oo Oo oO Oo Oo OC 0 8 oo Ooo Coo OOo OO CoO oO oO oOo oO oO Oo Oo Oo Oo So 326 1 53980 2 55497 310141E 01 450851E 01 191030E 01 T21026E 01 794020 417001E 01 897161 466606 811102 730000 609258E 01 786132E 02 220000 1 55000 Da 760000 391773 15 10 Multivariate animal genetics data Sheep 100 resid 64 0 88137 0 35903E 01 101 resid 65 0 17958 0 41634E 02 102 resid 66 0 89091 0 28008
101. model fitting Ir Tr Anim Tr Lit f Tr HYS without LAST the location of singularities will almost surely change if the G structures for Tr Anim or Tr Lit are changed invalidating Like lihood Ratio tests between the models performs the outlier check described on page 17 This can have a large time penalty in large models supplies the name of a program supplied by the user in association with the OWN variance model page 127 causes ASReml to print the transformed data file to basename asp If n lt 0 data fields 1 mod n are written to the file n 0 nothing is written n 1 all data fields are written to the file if it does not exist n 2 all data fields are written to the file overwriting any previous contents n gt 2 data fields n t are written to the file where tis the last defined column sets hardcopy graphics file type to png sets hardcopy graphics file type to ps modifies the format of the tables in the pvs file and changes the file extension of the file to reflect the format PVSFORM 1 is TAB separated pvs gt _pvs txt PVSFORM 2 is COMMA separated pvs gt _pvs csv PVSFORM 3 is Ampersand separated pvs gt _pvs tex See TXTFORM for more detail instructs ASReml to write the transformed data and the residuals to a binary file The residual is the last field The file basename srs is written in single precision unless the argument is 2 in which case basename drs is written in doubl
102. model terms, it is often useful or appropriate to consider a partitioning of the vector of residual errors e according to some conditioning factor. We use the term section to describe this partitioning, and the most common example of the use of sections in e is when we wish to allow sections in the data to have different variance structures. For example, in the analysis of multi-environment trials (METs) it is natural to expect that each trial will require a separate, possibly spatial, error structure. In this case, for s sections we have $e = [e_1', e_2', \ldots, e_s']'$, assuming that the data vector is ordered by section, and where $e_j$ represents the vector of errors for the $j$th section.

2.1.5 R structure for the residual error term

For e partitioned as $e = [e_1', e_2', \ldots, e_s']'$ we allow the matrix R to have a similar direct sum structure with

$R = \oplus_{j=1}^{s} R_j = \begin{bmatrix} R_1 & 0 & \cdots & 0 \\ 0 & R_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & R_s \end{bmatrix}$

for s > 1 sections and the data ordered by section. Note that it may be necessary to re-order or re-number the data units in order to achieve this structure. In ASReml it is now straightforward to apply (possibly) different variance structures to each component of R. In many cases the residual errors e can be expected to share a common variance structure. In this case there is only one section (s = 1). Typically a variance structure is specified for each random model term and often more complex models than the simple IID model
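The direct sum structure for R can be illustrated with a small numerical sketch. The code below is not ASReml code; it is a plain Python illustration, and the section sizes, the variance 2.0 and the autocorrelation 0.5 are hypothetical values chosen only for the example. It builds R for two sections, one with IID errors and one with first-order autoregressive errors.

```python
def direct_sum(blocks):
    """Return the block-diagonal (direct sum) matrix built from square blocks."""
    total = sum(len(block) for block in blocks)
    result = [[0.0] * total for _ in range(total)]
    offset = 0
    for block in blocks:
        size = len(block)
        for i in range(size):
            for j in range(size):
                result[offset + i][offset + j] = float(block[i][j])
        offset += size  # the next section starts below and to the right
    return result


def ar1(order, rho, variance=1.0):
    """First-order autoregressive variance matrix: variance * rho^|i-j|."""
    return [[variance * rho ** abs(i - j) for j in range(order)]
            for i in range(order)]


# Hypothetical sections: 2 units with IID errors, 3 units with AR1 errors.
R1 = [[2.0, 0.0], [0.0, 2.0]]
R2 = ar1(3, 0.5)
R = direct_sum([R1, R2])

for row in R:
    print(row)
```

Errors in different sections are uncorrelated, which is why every entry outside the diagonal blocks is zero; this only holds when the data are ordered by section, as noted above.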
103. not generated, so the ratios are not numbered and cannot be used to derive other functions. To avoid numbering confusion it is better to include H functions at the end of the VPREDICT block. In the example, H herit 4 3 or H herit genvar phenvar calculates the heritability by calculating component 4 (from second line) / component 3 (from first line), that is, genetic variance / phenotypic variance.

S label i[:j], when i[:j] are assumed positive variance parameters, inserts components which are the SQRT of components i[:j].
X label i*k inserts a component being the product of components i and k.
X label i:j*k inserts j-i+1 components being the products of components i:j and k.
X label i:j*k:l inserts a set of j-i+1 components being the pairwise products of components i:j and k:l.

The S and X functions are new in ASReml Release 4. The multiply option (X) allows a correlation in a CORUV structure to be converted to a covariance. The SQRT option allows conversion of CORGH to US provided the dimension is moderate, say < 10. The variances and covariances are calculated using a Taylor series expansion. Then, for parameters $v_a$ and $v_b$ derived from the set of parameters $v$ with variance matrix $V$, if $v_a = f_a(v)$ and $v_b = f_b(v)$, and if $d_a = \partial f_a / \partial v$ and $d_b = \partial f_b / \partial v$, then $\mathrm{cov}(v_a, v_b) = d_a' V d_b$.

12.2.2 Convert CORUH and XFA to US

V label i:j, where i:j spans a CORUH variance structure, inserts the US matrix based on the CORUH parameters
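The Taylor series (delta method) calculation behind a ratio such as a heritability can be sketched numerically. The Python code below is an illustration only, not ASReml's implementation; the component values and their variance matrix are hypothetical. It applies cov(f_a, f_b) = d_a' V d_b with the analytic gradient of a ratio.

```python
import math


def delta_cov(grad_a, grad_b, V):
    """First-order Taylor approximation: cov(f_a, f_b) ~= d_a' V d_b."""
    return sum(grad_a[i] * V[i][j] * grad_b[j]
               for i in range(len(grad_a))
               for j in range(len(grad_b)))


def ratio_with_se(numerator, denominator, V):
    """Ratio of two components and its approximate standard error.

    V is the 2x2 variance matrix of (numerator, denominator).
    """
    ratio = numerator / denominator
    # Gradient of f(a, b) = a / b with respect to (a, b).
    grad = [1.0 / denominator, -numerator / denominator ** 2]
    variance = delta_cov(grad, grad, V)
    return ratio, math.sqrt(variance)


# Hypothetical values: genetic variance 2, phenotypic variance 8.
genvar, phenvar = 2.0, 8.0
V = [[0.25, 0.30],
     [0.30, 1.00]]
herit, se = ratio_with_se(genvar, phenvar, V)
print(herit, se)
```

The positive covariance between the two components reduces the variance of the ratio, so ignoring the off-diagonal element of V would overstate the standard error in this example.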
104. now directly supports Arthur Gilmour and Sue Welham for further computational developments and research on the analysis of mixed models. Release 4 of ASReml was first distributed in 2014. A major enhancement in this release is the introduction of an alternative functional specification of linear mixed models. For the convenience of users, three documents have been prepared: (i) a guide to Release 4 using the original (still supported) model specification, (ii) this document, which is a guide using the new functional model specification, and (iii) a document, ASReml Update: What's new in Release 4, which highlights the changes from Release 3. Linear mixed effects models provide a rich and flexible tool for the analysis of many data sets commonly arising in the agricultural, biological, medical and environmental sciences. Typical applications include the analysis of (un)balanced longitudinal data, repeated measures analysis, the analysis of (un)balanced designed experiments, the analysis of multi-environment trials, the analysis of both univariate and multivariate animal breeding and genetics data, and the analysis of regular or irregular spatial data. ASReml provides a stable platform for delivering well established procedures while also delivering current research in the application of linear mixed models. The strength of ASReml is the use of the Average Information (AI) algorithm and sparse matrix methods for fitting the linear mixed model. This en
105. number in the data file, use the !D transformation in association with the !V0 transformation.

pol(v,n) forms a set of orthogonal polynomials of order |n| based on the unique values in variate (or factor) v and any additional interpolated points; see !PPOINTS and !PVAL in Table 5.4. It includes the intercept if n is positive and omits it if n is negative. For example, pol(time,2) forms a design matrix with three columns of the orthogonal polynomial of degree 2 from the variable time. Alternatively, pol(time,-2) is a term with two columns having centred and scaled linear coefficients in the first column and centred and scaled quadratic coefficients in the second column. The actual values (Robson 1959; Steel and Torrie 1960) of the coefficients are written to the .res file. This factor could be interacted with a design factor to fit random regression models. The leg() function differs from the pol() function in the way the quadratic and higher polynomials are calculated.

pow(x,p[,o]) defines the covariable $(x + o)^p$ for use in the model, where x is a variable in the data, p is a power and o is an offset. pow(x,0.5[,o]) is equivalent to sqr(x[,o]); pow(x,0[,o]) is equivalent to log(x[,o]); pow(x,-1[,o]) is equivalent to inv(x[,o]).

Table 6.2: Alphabetic list of model functions and descriptions (entries that follow: qtl(f,r), sin(v,r), spl(v,k), s(v,k), sqrt(v,r), Trait, units, uni(f
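The idea behind pol(v,n) can be sketched in a few lines of Python. This is an illustration of orthogonal polynomial construction by Gram-Schmidt on the unique values of a variable, not ASReml's own routine: in particular the scaling used here (each column normalised to unit length) is an assumption, and the coefficients ASReml writes to the .res file may be scaled differently.

```python
import math


def orth_poly(values, n):
    """Orthogonal polynomial columns of degree |n| on the unique values.

    The intercept column is kept when n > 0 and dropped when n < 0,
    mirroring the sign convention described for pol(v,n).
    Requires more unique values than |n|.
    """
    xs = sorted(set(values))
    columns = []
    for power in range(abs(n) + 1):
        col = [float(x) ** power for x in xs]
        # Remove the part already explained by lower-order columns.
        for prev in columns:
            coef = sum(c * p for c, p in zip(col, prev))
            col = [c - coef * p for c, p in zip(col, prev)]
        norm = math.sqrt(sum(c * c for c in col))
        columns.append([c / norm for c in col])
    if n < 0:
        columns = columns[1:]  # omit the intercept
    return xs, columns


# Hypothetical measurement times, with repeats.
time = [1, 2, 3, 4, 5, 3, 1]
levels, with_mu = orth_poly(time, 2)      # three columns
_, without_mu = orth_poly(time, -2)       # linear and quadratic only
print(levels)
for column in without_mu:
    print([round(c, 4) for c in column])
```

For five equally spaced times the linear column is proportional to (-2, -1, 0, 1, 2) and the quadratic column to (2, -1, -2, -1, 2), which are the familiar tabulated orthogonal polynomial contrasts.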
106. of $\tau$ and prediction of $u$ (although the latter may not always be of interest) for given $\sigma_g$ and $\sigma_r$. The other process involves estimation of these variance parameters.

2.2.1 Estimation of the variance parameters

Estimation of the variance parameters is carried out using residual or restricted maximum likelihood (REML), developed by Patterson and Thompson (1971). An historical development of the theory can be found in Searle et al. (1992). Note firstly that

$y \sim N(X\tau, H)$ (2.10)

where $H = Z G(\sigma_g) Z' + R(\sigma_r)$. REML does not use (2.10) for estimation of variance parameters, but rather uses a distribution free of $\tau$, essentially based on error contrasts or residuals. The derivation given below is presented in Verbyla (1990). We transform $y$ using a non-singular matrix $L = [L_1 \; L_2]$ such that

$L_1' X = I_p, \quad L_2' X = 0.$

If $y_j = L_j' y$, $j = 1, 2$, then

$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \sim N\left( \begin{bmatrix} \tau \\ 0 \end{bmatrix}, \begin{bmatrix} L_1' H L_1 & L_1' H L_2 \\ L_2' H L_1 & L_2' H L_2 \end{bmatrix} \right).$

The full distribution of $L'y$ can be partitioned into a conditional distribution, namely $y_1 \mid y_2$, for estimation of $\tau$, and a marginal distribution based on $y_2$ for estimation of $\sigma_g$ and $\sigma_r$; the latter is the basis of the residual likelihood. The estimate of $\tau$ is found by equating $y_1$ to its conditional expectation, and after some algebra we find

$\hat{\tau} = (X' H^{-1} X)^{-1} X' H^{-1} y.$

Estimation of $\kappa = [\sigma_g' \; \sigma_r']'$ is based on the log residual likelihood,

$\ell_R = -\tfrac{1}{2}\left( \log\det L_2' H L_2 + y_2' (L_2' H L_2)^{-1} y_2 \right) = -\tfrac{1}{2}\left( \log\det X' H^{-1} X + \log\det H + y' P y \right)$ (2.11)

where $P = H^{-1} - H^{-1} X (X' H^{-1} X)^{-1} X' H^{-1}$. Note tha
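A small special case makes the residual likelihood concrete. The Python sketch below is an illustration under simplifying assumptions that are not part of the text above: the only fixed effect is an overall mean (X is a column of ones) and H = σ²I. In that case (2.11) reduces to -½[log n + (n-1) log σ² + RSS/σ²], which is maximised at RSS/(n-1) rather than the maximum likelihood value RSS/n. The data values are hypothetical.

```python
import math


def residual_ss(y):
    """Residual sum of squares about the sample mean."""
    mean = sum(y) / len(y)
    return sum((value - mean) ** 2 for value in y)


def residual_loglik(y, sigma2):
    """Log residual likelihood (2.11) for X = 1_n and H = sigma2 * I."""
    n = len(y)
    log_det_xhx = math.log(n) - math.log(sigma2)   # X'H^{-1}X = n / sigma2
    log_det_h = n * math.log(sigma2)
    ypy = residual_ss(y) / sigma2                  # y'Py
    return -0.5 * (log_det_xhx + log_det_h + ypy)


def reml_variance(y):
    """Maximiser of the residual likelihood in this special case."""
    return residual_ss(y) / (len(y) - 1)


def ml_variance(y):
    """Maximum likelihood estimate, which ignores the estimation of the mean."""
    return residual_ss(y) / len(y)


y = [4.1, 5.3, 6.0, 4.7, 5.9]   # hypothetical observations
s2_reml = reml_variance(y)
s2_ml = ml_variance(y)
print(s2_reml, residual_loglik(y, s2_reml))
print(s2_ml, residual_loglik(y, s2_ml))
```

The divisor n-1 appears because one degree of freedom is used by the fixed effect; this is the sense in which REML works with a distribution free of the fixed effects.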
107. of objects produced with each ASReml run and where to find them in the output files Table 13 2 ASReml output objects and where to find them output object found in comment This table contains Wald F statistics for each term in the fixed part of the model These provide for an incremen tal or optionally a conditional test of significance see Section 6 11 Wald F statistics table asr file 240 13 5 ASReml output objects and where to find them Table 13 2 Table of output objects and where to find them ASReml output object found in comment data summary eigen analysis elapsed time fixed and random effects heritability histogram of residuals intermediate results mean variance relation ship asr file ass file res file asr file asl file sln file pvc file res file asl file res file includes the number of records read and retained for analysis the minimum mean maximum number of zeros number of missing values per data field fac tor variate field distinction An extended report of the data is written to the ass file if the SUM qualifier is specified It includes cell counts for factors histograms of variates and simple correlations among variates When ASReml reports a variance matrix to the asr file it also reports an eigen analysis of the matrix eigen values and eigen vectors to the res file this can be determined by comparing the start t
108. on going testing of the software and numerous helpful discussions and insight Dave Butler has developed the ASReml R package Alison contributed to the development of many of the approaches for the analysis of multi section trials We also thank Ian White for his contribution to the spline methodology and Simon Harding for the licensing and installa tion software and for his development of the user interface program ASReml W The Mat rn function material was developed with Kathy Haskard and Brian Cullis and the denomina tor degrees of freedom material was developed with Sharon Nielsen a Masters student with Brian Cullis Damian Collins contributed the PREDICT PLOT material Greg Dutkowski has contributed to the extended pedigree options The asremload d11 functionality is provided under license to VSN Alison Kelly has helped with the review of the XFA models Finally we especially thank our close associates who continually test the enhancements Arthur Gilmour acknowledges the grace of God through Jesus Christ our Saviour In Him are hidden all the treasures of wisdom and knowledge Colossians 2 3 ill Contents Preface i List of Tables xiii List of Figures XV 1 Introduction 1 1 1 What ASReml can do 6 ke bk ew we bE ae CRS Re CEE EES 1 1 2 OGIO ena oe eee a ee da er dea el Kee E a pG 2 1 3 User Interlace oc eoa popoe a e e E a EE Ew REE Rw a ao OS 2 lasi ASREMIAN e oei seck a e a SE e a a Ee es Tea 2 LIA ConTEAT 26 eeeN ee aiei e de
109. on the basis of smallest SE or SED is not recommended because the model is not necessarily fitting the variability present in the data The predict statement included the qualifier TWOSTAGEWEIGHTS This generates an extra table in the pvs file which we now display for each model 291 15 7 Unreplicated early generation variety trial Wheat Table 15 7 Summary of models for the Slate Hall data REML number of Wald model log likelihood parameters F statistic SED AR1xAR1 700 32 3 13 04 59 0 AR1xAR1 units 696 82 4 10 22 60 5 IB 707 79 4 8 84 62 0 Predicted values with Effective Replication assuming Variance 38754 26 Heron 1 1257 98 22 1504 Heron 2 1501 45 20 6831 Heron 3 1404 99 22 5286 Heron 4 1412 57 22 1623 Heron 5 1514 48 21 1830 Heron 26 1592 02 26 0990 Predicted values with Effective Replication assuming Variance 45796 58 Heron 1 1245 58 23 8842 Heron 2 1516 24 22 4423 Heron 3 1403 99 24 1931 Heron 4 1404 92 24 0811 Heron 5 1471 61 23 2995 Heron 25 1573 89 26 0505 Predicted values with Effective Replication assuming Variance 8061 808 Heron i 1283 59 4 03145 Heron 2 1549 01 4 03145 Heron 3 1420 93 4 03145 Heron 4 1451 86 4 03145 Heron 5 1533 27 4 03145 Heron 25 1630 63 4 03145 The value of 4 for the IB analysis is clearly reasonable given there are 6 actual replicates but this analysis has used up 48 degrees of freedom for the rowblk and colb1k effects The precision from t
110. options 188 GRM 143 help via email 3 heritability 232 IID 10 inbreeding coefficients 150 214 Incremental F Statistics 19 Information Criteria 17 information matrix 13 expected 13 observed 13 input file extension BIN 43 DBL 43 bin 41 43 csv 41 dbl 41 43 pin 207 interactions 95 Introduction 18 job control options 189 qualifiers 65 key output files 210 likelihood comparison 211 convergence 68 log residual 13 offset 211 residual 12 longitudinal data 1 balanced example 300 marginal distribution 12 Mat rn variance structure 135 measurement error 112 MERGE 198 MET 7 meta analysis 1 missing values 41 99 105 215 NA 41 in explanatory variables 105 in response 105 mixed effects 5 model 5 mixed model 5 equations 14 multivariate 145 specifying 32 model animal 147 318 correlation il covariance 11 formulae 87 sire 147 model building 127 moving average 98 multi environment trial 1 7 multivariate analysis 144 295 example 308 half sib analysis 308 Nebraska Intrastate Nursery 25 Negative binomial 103 non singular matrices 132 NRM 143 objective function 14 observed information matrix 13 operators 90 options command line 185 ordering of terms 106 Ordinal data 102 orthogonal polynomials 99 outliers 233 output files 34 objects 231 output file extension aov 208 216 apj 208 ask 209 asl
111. outlined (Gilmour et al., 1995), ASReml orders the equations in the sparse part to maintain as much sparsity as it can during the solution. After absorbing them, it absorbs the model terms associated with the dense equations in the order specified.

6.10.3 Aliassing and singularities

A singularity is reported in ASReml when the diagonal element of the mixed model equations is effectively zero (see the !TOLERANCE qualifier) during absorption. It indicates there is either
• no data for that fixed effect, or
• a linear dependence in the design matrix, which means there is no information left to estimate the effect.

ASReml handles singularities by using a generalized inverse in which the singular row/column is zero and the associated fixed effect is zero. Which equations are singular depends on the order in which the equations are processed. This is controlled by ASReml for the sparse terms but by the user for the dense terms. They should be specified with main effects before interactions so that the table of Wald F statistics has correct marginalization. Since ASReml processes the dense terms from the bottom up, the first level (the last level processed) is typically singular. The number of singularities is reported in the .asr file immediately prior to the REML log-likelihood (LogL) line for that iteration (see Section 13.3). The effects and associated standard or prediction error which cor
112. reading error if n is omitted) and then process the records it has. This allows data to be extracted from a file which contains trailing non-data records (for example, extracting the predicted values from a .pvs file). The argument n specifies the number of data records to be read. If not supplied, ASReml reads until a data reading error occurs and then processes the data it has. Without this qualifier, ASReml aborts the job when it encounters a data error. See !RSKIP.

Table 5.2: Qualifiers relating to data input and output

!RSKIP n [s] allows ASReml to skip lines at the heading of a file down to (and including) the nth instance of string s. For example, to read back the third set of predicted values in a .pvs file, you would specify !RREC !RSKIP 4 Ecode, since the line containing the 4th instance of Ecode immediately precedes the predicted values. The !RREC qualifier means that ASReml will read until the end of the predict table. The keyword Ecode, which occurs once at the beginning and then immediately before each block of data in the .pvs file, is used to count the sections.

5.7.1 Combining rows from separate files

ASReml can read data from multiple files provided the files have the same layout. The file specified as the primary data file in the command file can contain lines of the form !INCLUDE <filename> !SKIP n where <filename> is the
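The skipping and reading logic described for !RSKIP and !RREC can be mimicked outside ASReml. The Python sketch below is an illustration only: the function name, the three-field record layout (label, value, standard error) and the sample lines are hypothetical and do not reproduce the real .pvs layout. It skips down to and including the nth line containing a marker string, then reads records until the first line that fails to parse.

```python
def read_after_nth(lines, marker, n):
    """Skip to (and including) the nth line containing marker, then read
    'label value se' records until the first line that does not parse."""
    seen = 0
    start = None
    for index, line in enumerate(lines):
        if marker in line:
            seen += 1
            if seen == n:
                start = index + 1
                break
    if start is None:
        raise ValueError("marker occurs fewer than n times")

    records = []
    for line in lines[start:]:
        fields = line.split()
        if len(fields) != 3:
            break                      # blank or non-data line ends the block
        try:
            records.append((fields[0], float(fields[1]), float(fields[2])))
        except ValueError:
            break                      # a reading error ends the block
    return records


# Hypothetical file content: the marker appears once in the preamble and
# then once immediately before each block of predictions.
sample = [
    "Notes: Ecode is E for Estimable",
    "nitrogen Predicted Std_Error Ecode",
    "a 79.4 7.2",
    "b 98.9 7.2",
    "",
    "variety Predicted Std_Error Ecode",
    "x 104.5 7.8",
    "",
    "pair Predicted Std_Error Ecode",
    "p 1.5 0.2",
    "q 2.5 0.3",
    "SED: overall 0.4",
]
third_block = read_after_nth(sample, "Ecode", 4)
print(third_block)
```

Because the marker also occurs once before the first block, the third block follows the fourth instance, which is why the example in the text uses 4 rather than 3.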
113. s. A figure is produced which reports the trend with increasing distance for each sector. ASReml also computes the variogram from predictors of random effects which appear to have a variance structure defined in terms of distance. The variogram details are reported in the .res file.

2.5 Inference: Fixed effects

2.5.1 Introduction

Inference for fixed effects in linear mixed models introduces some difficulties. In general, the methods used to construct F tests in analysis of variance and regression cannot be used for the diversity of applications of the general linear mixed model available in ASReml. One approach would be to use likelihood ratio methods (see Welham and Thompson, 1997), although their approach is not easily implemented. Wald-type test procedures are generally favoured for conducting tests concerning $\tau$. The traditional Wald statistic to test the hypothesis $H_0: L\tau = l$ for given $L$ ($r \times p$) and $l$ ($r \times 1$) is given by

$W = (L\hat{\tau} - l)' \{ L (X' H^{-1} X)^{-1} L' \}^{-1} (L\hat{\tau} - l)$ (2.24)

and asymptotically this statistic has a chi-square distribution on r degrees of freedom. These are marginal tests, so that there is an adjustment for all other terms in the fixed part of the model. The test is also anti-conservative if p-values are constructed because it assumes the variance parameters are known. The small sample behaviour of such statistics has been considered by Kenward and Roger (1997) in some detail. They present
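The Wald statistic for a single contrast (r = 1) can be computed by hand, which may help when checking output. The Python sketch below is an illustration, not ASReml's calculation: it takes the estimated fixed effects and the matrix C = (X'H⁻¹X)⁻¹ as given inputs, restricts L to one row so that no matrix inversion is needed, and uses the identity P(χ²₁ > w) = erfc(√(w/2)) for the asymptotic p-value. All numerical values are hypothetical.

```python
import math


def wald_single_contrast(L, tau_hat, C, l=0.0):
    """Wald statistic and asymptotic p-value for H0: L tau = l, with r = 1.

    L       : list of contrast coefficients (one row)
    tau_hat : estimated fixed effects
    C       : (X'H^{-1}X)^{-1}, treated as known
    """
    difference = sum(a * t for a, t in zip(L, tau_hat)) - l
    variance = sum(L[i] * C[i][j] * L[j]
                   for i in range(len(L))
                   for j in range(len(L)))
    W = difference ** 2 / variance
    # Upper tail of a chi-square distribution on 1 degree of freedom.
    p_value = math.erfc(math.sqrt(W / 2.0))
    return W, p_value


# Hypothetical estimates for two treatment effects and their variance matrix.
tau_hat = [10.0, 12.0]
C = [[0.5, 0.1],
     [0.1, 0.5]]
W, p = wald_single_contrast([1.0, -1.0], tau_hat, C)
print(W, p)
```

The p-value here treats the variance parameters as known, so in small samples it will tend to be too small, which is the anti-conservative behaviour mentioned above.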
114. sample of the data the asr file and the as1 file produced by the debug options d1 running asreml dl basename as In this chapter we show some of the common NIN Alliance Trial 1989 coding problems The code box on the right variety shows our familiar job modified to generate 8 id pid raw faults Following is the output from running T P ss loc yield hi 7 y i s job lat long row column nin9 asd slip 1 yield mu variety IR Repl residual ar1 Row ar1 Col predict varierty ASReml 4 1 01 Apr 2014 NIN alliance trial 1989 Build kt 21 Apr 2014 64 bit Windows x64 23 Apr 2014 09 16 54 727 32 Mbyte ninerri Folder C Users Public ASRem1 Docs Manex4 ERR There is no file called nin9 asd Variable names may not include Warning Unrecognised qualifier at character 10 nin9 asd SLIP 1 17 Error Failed to recognise a data file Check spelling of filename and enclose the name in quotes Fault Error parsing yield mu variety Last line read was yield mu variety Currently defined structures COLS and LEVELS 246 14 4 An example 1 variety a 2 0 0 0 0 2 aid 1 1 0 0 0 0 3 pid i A 0 0 0 0 4 raw 1 al 0 0 0 0 amp repl 1 2 0 0 0 0 6 nloc 1 1 0 0 0 0 7 yield 1 1 0 0 0 0 8 lat 1 1 0 0 0 0 9 long 1 1 0 0 0 0 10 row 1 2 0 0 0 0 11 column 1 2 0 0 0 0 ninerr1 C Users Public ASRem1 Docs Manex4 ERR 11 factors defined max5000 O variance parameters max2500 2 special structures L
115. statement is supplied in the as file the REML log likelihood is given for each iteration The REML log likelihood should have converged and in binary form in dpr file these are printed in col umn 3 Furthermore for multivariate analyses the resid uals will be in data order traits within records How ever in a univariate analysis with missing values that are not fitted there will be fewer residuals than data records there will be no residual where the data was missing so this can make it difficult to line up the values unless you can manipulate them in another program spreadsheet given if the DL command line option is used simple averages of cross classified data are produced by the tabulate directive to the tab file Adjusted means predicted from the fitted model are written to the pvs file by the predict directive based on the inverse of the average information matrix the values at each iteration are printed in the res file The final values are arranged in a table printed with labels and converted if necessary to variances 242 13 5 ASReml output objects and where to find them Table 13 2 Table of output objects and where to find them ASReml output object found in comment 243 14 Error messages 14 1 Introduction Identifying the reason that ASReml does not produce the anticipated results can be a frus trating business This chapter aims to assist you by discussing four kinds of errors
116. terms to test which tests its contribution after all other terms in terms to test and background terms conditional on all terms that appear in the SPARSE equations It should only specify terms which will appear in the table of Wald F statistics For example FOWN ABC mu IFOWN A B B C A C mu ABC FOWN A B C mu ABC A BB C A C would request the Wald F statistics based on see page 19 A mu B C sparse B mu A C sparse mu A B sparse mu A BC B C A C sparse mu A B C A B A C sparse mu A B C A B B C sparse and mu ABC A B A C B C sparse DHAAAAADW 78 5 8 Job control qualifiers Table 5 5 List of rarely used job control qualifiers qualifier action GDENSE 1GLMM n HPGL 2 HOLD list Warnings e For computational convenience ASReml calculates FOWN tests using a full rank parameterization of the fitted model with rank numerator de grees of freedom NumDF of terms generated by the incremental Wald F tests e Unfortunately if some terms in the implicit model defined by the re quested FOWN test would have more or less NumDF than are present in the full rank parameterization because aliased effects are reordered it can not be calculated correctly from the full rank parameterization In this case ASReml reverts to the conditional test but identifies the terms that need to be reordered in the fitted model to obtain the FOWN test s specified It is necessary to rerun ASR
117. that direct product R structure does not match the multivariate data structure Maybe a trait name is repeated 263 14 5 Information Warning and Error messages Table 14 3 Alphabetical list of error messages and probable cause s remedies error message probable cause remedy Negative Sum of Squares NFACT out of range No giv file for No residual variation Out of Out of memory Out of memory forming design Overflow forming PRESENT table Overflow structure table Pedigree coding errors Pedigree factor has wrong size Pedigree too big or in error POWER model setup error POWER Model Unique points disagree with size PROGRAM failed in This is typically caused by negative variance parameters try changing the starting values or using the STEP option If the problem occurs after several iterations it is likely that the vari ance components are very small Try simplifying the model In multivariate analyses it arises if the error variance is becomes negative definite Try specifying GP on the structure line for the error variance too many terms are being defined Fix the argument to giv after fitting the model the residual variation is essentially zero that is the model fully explains the data If this is intended use the BLUP 1 qualifier so that you can see the estimates Otherwise check that the dependent values are what you intend and then identify which v
118. that only appear in random model terms are not included in the averaging set unless specified with the !AVERAGE, !ASSOCIATE or !PRESENT qualifiers. Explicit weights may be supplied directly or from a file. The default is equal weights. Weights can be expressed like {3*1 0 2*1}/5 to represent the sequence 0.2 0.2 0.2 0 0.2 0.2. The string inside the curly braces is expanded first, and the expression n*c means n occurrences of c. When there are a large number of weights it may be convenient to prepare them in a file and retrieve them. All values in the file are taken unless n is specified, in which case they are taken from field (column) n.

!ASSOCIATE is used to control averaging over associated factors. The default is to simply average at the base level. Hierarchical averaging is achieved by listing the associated factors to average in f. Explicit weights may be supplied directly or from a file, as for !AVERAGE.

!PARALLEL without arguments means all classify variables are expanded in parallel. Otherwise, list the variables from the classify set whose levels are to be taken in parallel.

!PRESENT is used when averaging is to be based only on cells with data. v is a list of variables and may include variables in the classify set. v may not include variables with an explicit !AVERAGE qualifier. The variable names in v may optionally be followed by a list of levels for inclusion if such a list has not been supplied in the specification of the classify set. ASReml works ou
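The weight shorthand can be made concrete with a short expansion routine. The Python sketch below is an illustration of the rule as described (brace contents expanded first, n*c meaning n occurrences of c, then an optional division); the function name is hypothetical and the parser handles only this simple form, so it should not be read as a full description of the syntax ASReml accepts.

```python
def expand_weights(expression):
    """Expand a weight expression such as '{3*1 0 2*1}/5' into a list."""
    text = expression.strip()
    divisor = 1.0
    if text.startswith("{"):
        inner, _, tail = text[1:].partition("}")
        tail = tail.strip()
        if tail.startswith("/"):
            divisor = float(tail[1:])
        elif tail:
            raise ValueError("unsupported text after closing brace")
    else:
        inner = text

    values = []
    for token in inner.split():
        if "*" in token:
            count, value = token.split("*", 1)
            values.extend([float(value)] * int(count))  # n occurrences of c
        else:
            values.append(float(token))
    return [value / divisor for value in values]


weights = expand_weights("{3*1 0 2*1}/5")
print(weights)
```

Expanding the braces before dividing means the divisor applies to every weight, so the example assigns equal weight to five of the six levels and zero weight to the fourth.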
119. the model a term in the model specification is not among the terms that have been defined Check the spelling there is a problem with the named variable The second field in the R structure line does not refer to a variate in the data the weight and filter columns must be data fields Check the data summary See the discussion of AISINGULARITIES Maybe increase workspace or restructure simplify the model Numerical problems calculating the Mat rn function If rescal ing the X Y cordinates so that the step size is closer to 1 0 does not resolve the issue try AEXP instead special structures are weights the Ainverse and GIV structures The limit is 98 and so no more than 96 GIV structures can be defined The limit is 1500 It may be possible to restructure the job so the limit is not exceeded assuming that the actual number of parameters to be estimated is less ASReml failed to read the first data record Maybe it is a head ing line which should be skipped by using the SKIP qualifier or maybe the field is an alphanumeric field but has not been declared so with the A qualifier You need to identify which design terms contain missing values and decide whether to delete the records containing the missing values in these variables or if it is reasonable to treat the missing values as zero by using MVINCLUDE More missing values in the response were found than expected missing observations have been dropped so
120. the right is the ASReml command file nin89a.as for a spatial analysis of the Nebraska Intrastate Nursery (NIN) field experiment introduced in Chapter 3. The lines that are highlighted in bold blue type relate to reading in the data. In this chapter we use this example to discuss reading in the data in detail.

    NIN Alliance Trial 1989
     variety !A   # Alphanumeric
     id
     pid
     raw
     repl 4
     nloc
     yield
     lat
     long
     row 22
     column 11
    nin89aug.asd !skip 1
    yield ~ mu variety
    residual idv(units)

Notice the in-line comment indicated by the #.

5.2 Important rules

In the ASReml command file
• all blank lines are ignored,
• # is used to annotate the input; all characters following a # symbol on a line are ignored,
• lines beginning with ! followed by a blank are copied to the .asr file as comments for the output,
• a blank is the usual separator; TAB is also a separator,
• a comma as the last character on the line is sometimes used to indicate that the current list is continued on the next line; a comma is not needed when ASReml knows how many values to read,
• reserved words used in specifying the linear model (Table 6.1) are case sensitive; they need to be typed exactly as defined: they may not be abbreviated,
• a qualifier is a letter sequence preceded by ! which sets an option; some qualifiers require arguments; qualifiers must appear in the correct context; qualifier identifiers are not case s
121. these to disk Before each iteration ASReml writes the own parameters to a file runs MYOWNGDG it assumes MYOWNGDG forms the G and derivative matrix and then reads the matrices back in An example of MYOWNGDG f90 is distributed with ASReml It duplicates the AR1 and AR2 variance structures The following job fits an AR2 structure using this program Example of using the OWN structure rep blcol blrow variety 25 yield barley asd skip 1 OWN MYOWN EXE y variety residual ari 10 own2 15 INIT 2 1 TRR F1 The file written by ASReml has extension own and appears as follows 15 2 1 0 6025860D 000 1164403D 00 This file was written by asreml for reading by your MYOWNGDG program asreml writes this file runs your program and then reads shfown gdg which it presumes has the following format The first lines should agree with the top of this file specifying the order of the matrices 15 the number of variance structure parameters 2 and a control parameter you can specify 1 These are written in 315 format They are followed by the list of variance parameters written in 6D13 7 format Follow this with 3 matrices written in 6D13 7 format These are to be each of 120 elements being lower triangle row wise of the G matrix and its derivatives with respect to the parameters in turn This file contains details about what is expected in the file written by your program The filename used has the same basename as the jo
122. to run ASReml is

[path] ASReml basename[.as[c]]

• path provides the path to the ASReml program (usually called asreml.exe in a PC environment). In a UNIX environment, ASReml is usually run through a shell script called ASReml.
  • if the ASReml program is in the search path, then path is not required and the word ASReml will suffice; for example, ASReml nin89.as will run the NIN analysis (assuming it is in the current working folder),
  • if asreml.exe (ASReml) is not in the search path, then path is required; for example, if asreml.exe is in the usual place, then C:\Program Files\ASReml3\bin\Asreml nin89.as will run nin89.as,
• ASReml invokes the ASReml program,
• basename is the name of the .as[c] command file.

The basic command line can be extended with options and arguments to

[path] ASReml [options] basename[.as[c]] [arguments]

• options is a string preceded by a - (minus) sign. Its components control several operations (batch, graphic, workspace, ...) at run time; for example, the command line ASReml -w128 rat.as tells ASReml to run the job rat.as with a workspace allocation of 128mb,
• arguments provide a mechanism (mostly for advanced users) to modify a job at run time; for example, the command line ASReml rat.as alpha beta tells ASReml to process the job in rat.as as if it read alpha wherever $1 appears in the file rat.as, beta wherever $2 appears and 0 wherever $3 appears (see below). 1
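The argument mechanism described above can be pictured as a plain text substitution. The following is only a sketch in Python, not ASReml's own code: it assumes placeholders are written $1, $2, ... with a single digit, and the function name `substitute_arguments` is hypothetical. Placeholders with no matching argument become 0, as the rat.as example describes.

```python
import re


def substitute_arguments(job_text, arguments):
    """Replace $1, $2, ... in job_text with the matching argument.

    Assumption: placeholders use a dollar sign followed by one digit.
    A placeholder with no matching argument is replaced by "0".
    """
    def replace(match):
        index = int(match.group(1)) - 1  # $1 refers to arguments[0]
        if 0 <= index < len(arguments):
            return arguments[index]
        return "0"

    return re.sub(r"\$(\d)", replace, job_text)


# Two arguments supplied, three placeholders used in the job text.
print(substitute_arguments("first $1 second $2 third $3", ["alpha", "beta"]))
```

With two arguments supplied, the third placeholder falls back to 0, which mirrors the behaviour stated for the command line ASReml rat.as alpha beta.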
123. to run but does have the field names copied across.

10.3 Command line options

Command line options and arguments may be specified on the command line or on the top job control line. This is an optional first line of the .as file which sets command line options and arguments from within the job. If the first line of the .as file contains a qualifier other than !DOPATH, it is interpreted as setting command line options and the Title is taken as the next line.

The option string actually used by ASReml is the combination of what is on the command line and what is on the job control line, with options set in both places taking values from the command line. Arguments on the top job control line are ignored if there are arguments on the command line. This section defines the options. Arguments are discussed in detail in a following section.

Command line options are not case sensitive and are combined in a single string preceded by a - (minus) sign, for example -LNW128. The options can be set on the command line or on the first line of the job, either as a concatenated string in the same format as for the command line, or as a list of qualifiers. For example, the command line

ASReml -h22r jobname 1 2 3

could be replaced with

ASReml jobname

if the first line of jobname.as was either

!-h22r 1 2 3

or

!HARDCOPY !EPS !RENAME !ARGS 1 2 3

Table 10.1 presents the command line options with brief descriptions. It also gives the name of the equivalent qualif
124. to understand that all general qualifiers are specified here. Many of these qualifiers are referenced in other chapters where their purpose will be more evident.

Table 5.3: List of commonly used job control qualifiers

qualifier: !CONTINUE f, !MSV f, !TSV f (New R4)

action: These qualifiers are used to restart/resume iterations from the point reached in a previous run. The qualifier !CONTINUE f can alternately be set from the command line using the option letter C f (see Section 11.3 on command line options).

In each run, ASReml writes the initial values of the variance parameters to a file with extension .tsv (template start values) with information to identify individual variance parameters. After each iteration, ASReml writes the current values of the variance parameters to files with extension .rsv (re-start values) and .msv; the .msv version has information to clearly identify each variance parameter.

If f is not set, then ASReml looks for a .rsv file with the same name used for the output files, i.e. the .as name possibly appended by arguments. ASReml then scans this file for parameter values related to the current model, replacing the values obtained from the .as file before iteration resumes. If !CONTINUE 2 or !TSV is used, then the .tsv file is used instead of the .rsv file. Similarly, if !CONTINUE 3 or !MSV is used, then the .msv file is used instead of the .rsv file. If f = filename with no extension is used with CONTINUE
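The choice of restart file described above amounts to a small lookup. The Python sketch below is illustrative only: the function name `restart_file` is hypothetical, and treating a plain CONTINUE as level 1 is an assumption made here for the sake of the example.

```python
def restart_file(basename, level=1):
    """Return the file a restarted run would read its start values from.

    level 1: re-start values (.rsv), the case when nothing else is requested
    level 2: template start values (.tsv), i.e. CONTINUE 2 or TSV
    level 3: the .msv file, i.e. CONTINUE 3 or MSV
    The numbering of level 1 is an assumption for this sketch.
    """
    extensions = {1: ".rsv", 2: ".tsv", 3: ".msv"}
    if level not in extensions:
        raise ValueError("level must be 1, 2 or 3")
    return basename + extensions[level]


for level in (1, 2, 3):
    print(level, restart_file("nin89a", level))
```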
125. value. This is shown in the output in that the parameter will have the code B rather than P reported in the variance component table.

U (unrestricted): U does not limit the updates to the parameter. This allows variance parameters to go negative and correlation parameters to exceed 1. Negative variance components may lead to problems; the mixed model coefficient matrix may become non-positive definite. In this case the sequence of REML log-likelihoods may be erratic and you may need to experiment with starting values.

F (fixed): F fixes the parameter at its starting value.

Z (zero): Z mainly applies to factor analytic models where specific variances and/or loadings may be fixed at zero.

For structures with multiple parameters, the form !GXXXX can be used to specify F, P, U or Z for the parameters individually. A shorthand notation allows a repeat count before a code letter. Thus !GPPPPPPPPPPPPPPZPPPZP could be written as !G14PZ3PZP.

For a US model, !GP makes ASReml attempt to keep the matrix positive definite. After each AI update, it extracts the eigenvalues of the updated matrix. If any are negative or zero, the AI update is discarded and an EM update is performed. If the highest LogL value relates to a non-positive definite form for the matrix, ASReml may perform hundreds of iterations and never converge. Several forms of EM update are possible (see !EMFLAG) and the PXEM option will conv
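The repeat-count shorthand for constraint codes can be checked by expanding it back to one letter per parameter. This is a minimal Python sketch, not part of ASReml; the function name `expand_constraint_codes` is hypothetical, and the input is the code string that follows the G.

```python
import re


def expand_constraint_codes(shorthand):
    """Expand a string such as '14PZ3PZP' to one code letter per parameter.

    Each code letter (F, P, U or Z) may be preceded by a repeat count.
    """
    text = shorthand.upper()
    if not re.fullmatch(r"(\d*[FPUZ])+", text):
        raise ValueError("expected optional counts followed by F, P, U or Z")
    pieces = []
    for count, code in re.findall(r"(\d*)([FPUZ])", text):
        pieces.append(code * (int(count) if count else 1))
    return "".join(pieces)


expanded = expand_constraint_codes("14PZ3PZP")
print(expanded, len(expanded))
```

The expansion of 14PZ3PZP has 14 + 1 + 3 + 1 + 1 = 20 letters, one for each parameter of the structure.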
126. variance covari ance matrix formed from BLUPs and residuals phenotypic variance plot of residuals against field position possible outliers predicted fitted values at the data points predicted values REML log likelihood residuals score tables of means variance of variance pa rameters variance parameters variogram res file pvc file graphics file res file yht file pvs file asr file yht file asl file tab file pvs file vvp file asr file res file graphics file for an interaction fitted as random effects when the first outer dimension is smaller than the inner dimension less 10 ASReml prints an observed variance matrix cal culated from the BLUPs The observed correlations are printed in the upper triangle Since this matrix is not well scaled as an estimate of the underlying variance com ponent matrix a rescaled version is also printed scaled according to the fitted variance parameters The primary purpose for this output is to provide reasonable starting values for fitting more complex variance structure The correlations may also be of interest After a multivari ate analysis a similar matrix is also provided calculated from the residuals placed in the pvc file when postprocessing with a pin file these are residuals that are more than 3 5 standard de viations in magnitude these in the are printed in the second column given if a predict
127. variance function name correlation models One dimensional equally spaced ID id identity C 1 C 0 147 0 1 w AR1 art 1 order C 1 C 1 2 l w autoregressive C C i gt j 1 l lt 1 AR2 ar2 2 order C 1 2 3 2 w autoregressive Cai p 4 C OROREN a ONONE i gt gt l 10 lt 1 b lol lt 1 AR3 ar3 3 order C 1 8 1 4 3 4 3 w autoregressive Capii B Q 5 Q ERT T Q Q bs T 1 ica 2 Ci OOE FOC Fo Czy i gt j 2 1d lt 1 m LA lt 1 PA lt 1 SAR sari symmetric Cy 1 2 l w autoregressive Cis Q 1 4 C RONE 4 Cid i gt j 1 l lt 1 SAR2 sar2 constrained as for AR3 using 2 3 2 w autoregressive b 2 3 ee YN 2 competition bs NYa 147 7 12 Variance models available in ASReml Details of the variance models available in ASReml variance description algebraic number of parameters structure form name variance corr hom het model variance variance function name MA1 mal 1 order C 1 1 2 1l w moving aver C 0 1 02 age EE E T 0 lt 1 MA2 ma2 2 order C 1 2 3 2 w moving ANGE Chg aS 0 1 0 1 6 62 ape Casy 0 1 6 6 Ci 0 j gt i 2 0 0 lt 1 lt 1 lt 1 ARMA arma autoregressive C 4 2 3 2 w moving aver Ciiis e 0 a p 1 A 1 age 02 20 C5 CFs is j gt i 1 la lt 1 lt 1 CORU coru uniform C 1 C 6 145 1 2 1
128. variance parameters

In ASReml 4, linear relationships among variance structure parameters can be defined through a simple linear model and by supplying a design matrix for a set of parameters. The design matrix is supplied as an ASCII file containing a row for each parameter in a set of contiguous parameters and a column for each new parameter. This design matrix is associated with the job through a statement after the residual model definition line(s) of the form

VCM parameter_number_list new filename

where parameter_number_list is a list of parameters in the set (it can be abbreviated to first and last if all the intermediate parameters are in the set), new is the number of new parameters, and filename is the name of the file containing the design matrix.

For example, the Wolfinger rats example involves modelling a 5x5 symmetric residual matrix.

Wolfinger Rat data
 treat !A
 wt0 wt1 wt2 wt3 wt4
 subject VO
wolfrat.dat !skip 1
wt0 wt1 wt2 wt3 wt4 ~ Trait treat Trait.treat
residual units.us(Trait)

The us(Trait) structure uses 15 parameters, numbered 5 to 19, generating the symmetric matrix

 5
 6  7
 8  9 10
11 12 13 14
15 16 17 18 19

Wolfinger (1996) reports the fitting of the Huynh-Feldt variance structure to this data. This structure is of the form

$\sigma_{ii} = \sigma_i^2, \qquad \sigma_{ij} = \tfrac{1}{2}(\sigma_i^2 + \sigma_j^2) - \lambda, \quad j < i \le p$

In the rats example the relationship between the original and new parameters
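To make the idea of a design matrix linking original and new parameters concrete, here is a hedged Python sketch for the 5x5 case. It assumes the Huynh-Feldt form in which each variance is its own parameter and each covariance is the average of the two variances minus a common parameter lambda; it also assumes the 15 original parameters are ordered as the lower triangle row-wise and the 6 new parameters as the five variances followed by lambda. The function name and the printed layout are illustrative, not the file format ASReml requires.

```python
def huynh_feldt_design(p):
    """Design matrix mapping p variances plus lambda (columns) onto the
    lower triangle, row-wise, of a p x p symmetric matrix (rows).

    Assumed form: sigma_ii = v_i and sigma_ij = (v_i + v_j) / 2 - lambda.
    """
    rows = []
    for i in range(p):
        for j in range(i + 1):
            row = [0.0] * (p + 1)
            if i == j:
                row[i] = 1.0          # a variance maps to itself
            else:
                row[i] = 0.5          # half of variance i
                row[j] = 0.5          # half of variance j
                row[p] = -1.0         # minus the common lambda
            rows.append(row)
    return rows


design = huynh_feldt_design(5)
for row in design:
    print(" ".join(f"{value:4.1f}" for value in row))
```

The result has 15 rows (one per original parameter) and 6 columns (one per new parameter), which is the row and column layout the text describes for a design matrix file.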
129. 0
2445 2 1563 02 244 1.36191 0.00000
2497 2 2167 01 413 1.21339 0.00000
3180 2 8668 03 42 1.21629 0.00000
3521 CL1577Contigi 03 1.15833 0.00000
3802 CL2573Contigi 03 1.17005 0.00000
4195 CL595Contigi O01 1.19330 0.00000
4351 UMN 1397 01 416 1.34916 0.00000

9 Tabulation of the data and prediction from the model

9.1 Introduction

This chapter describes the tabulate directive and the predict directive introduced in Section 3.4 under Prediction.

Tabulation is the process of forming simple tables of averages and counts from the data. Such tables are useful for looking at the structure of the data and the numbers of observations associated with factor combinations. Multiple tabulate directives may be specified in a job.

Prediction is the process of forming a linear function of the vector of fixed and random effects in the linear model to obtain an estimated or predicted value for a quantity of interest. It is primarily used for predicting tables of adjusted means. If a table is based on a subset of the explanatory variables, then the other variables need to be accounted for. It is usual to form a predicted value either at specified values of the remaining variables, or averaging over them in some way.

9.2 Tabulation

A tabulate directive is provided to enable simple summaries of the data to be formed for the purpose of checking the structure of the data. The summar
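As an illustration of what tabulation means here (simple tables of counts and averages by factor level), the following Python sketch forms such a table from a list of records. It is not the tabulate directive itself; the function name `tabulate` and the toy records are invented for the example.

```python
from collections import defaultdict


def tabulate(records, factor, variate):
    """Return {level: (count, mean)} of variate for each level of factor."""
    cells = defaultdict(lambda: [0, 0.0])  # level -> [count, running total]
    for record in records:
        cell = cells[record[factor]]
        cell[0] += 1
        cell[1] += record[variate]
    return {level: (count, total / count)
            for level, (count, total) in cells.items()}


# Toy records, made up purely to show the shape of the output.
records = [
    {"variety": "A", "yield": 10.0},
    {"variety": "A", "yield": 14.0},
    {"variety": "B", "yield": 7.0},
]
for level, (count, mean) in sorted(tabulate(records, "variety", "yield").items()):
    print(level, count, mean)
```

A table like this shows at a glance which factor levels are sparsely observed, which is the data-checking use the text has in mind.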
130. 0 0 0 0 0 0 0 0 1 14 1 gt ASSIGN VARF lt diag TrAG1245 INIT 0 0024 0 0019 0 0020 0 00026 age grp diag TrSG123 IINIT 0 93 16 0 0 28 sex grp I gt PART 1 DIAGONAL FOR SIRE DAM AND LITTER UNSTRUCTURED FOR RESIDUAL wwt ywt gfw fdm fat Trait Trait age Trait brr Trait sex Trait age sex r VARF diag Trait SDIAGI id sire diag TrDam123 DDIAGI id dam diag TrLit1234 LDIAGI id lit lf Trait g rp residual id units us Trait RUSI PART 2 CHANGE DIAGONAL TO XFA1 FOR SIRE DAM AND LITTER wwt ywt gfw fdm fat Trait Trait age Trait brr Trait sex Trait age sex r VARF xfai Trait id sire xfai TrDam123 id dam xfai TrLit1234 id lit If Trait grp mv residual id units us Trait PART 3 CHANGE XFA1 TO UNSTRUCTURED FOR SIRE AND LITTER wwt ywt gfw fdm fat Trait Trait age Trait brr Trait sex Trait age sex r VARF us Trait id sire xfai TrDam123 id dam us TrLit1234 id lit If Trait grp mv residual id units us Trait IPART 3 VPREDICT DEFINE USING ASSIGN TO GIVE CONCISE VPREDICT ASSIGN lusT lit us TrLit1234 us TrLit1234 id lit us TrLit1234 ASSIGN susT sire us Trait us Trait id sire us Trait ASSIGN uusT id units us Trait us Trait X Damv xfai TrDam123 defines 54 59 phen uusT 1 6 susT 1 6 lusT 1 6 Damv defines 1 6 elements of phen defines 60 65 136 23 28 44 49 54 59 phen uusT 7 10 susT 7 10 lusT 7 10 defines 7 10 elements of phen defines 66 69
131. 10.2.2 Processing a .pin file

If the filename argument is a .pin file (see Chapter 12), then ASReml processes it. If the pinfile basename differs from the basename of the output files it is processing, then the basename of the output files must be specified with the P option letter. Thus

ASReml border.pin

will perform the pinfile calculations defined in border.pin on the results in files border.asr and border.vvp;

ASReml -Pborderwwt border.pin

will perform the pinfile calculations defined in border.pin on the results in files borderwwt.asr and borderwwt.vvp.

10.2.3 Forming a job template from a data file

The facility to generate a template .as file was introduced in Section 3.4.1. Normally the name of a .as command file is specified on the command line. If a .as file does not exist and a file with file extension .asd, .csv, .dat, .gsh, .txt or .xls is specified, ASReml assumes the data file has field labels in the first row and generates a .as file template. First, it seeks to convert the .gsh (Genstat) or .xls (Excel, see page 42) file to .csv format. In generating the .as template, ASReml takes the first line of the .csv (or other) file as providing column headings, and generates field definition lines from them. If some labels have !A appended, these are defined as factors; otherwise ASReml attempts to identify factors from the field contents. The template needs further editing before it is ready
132. 00 0 9193 6 LogL 182 979 S2 301 45 60 df 1 0000 0 9190 Final parameter values 1 0000 0 9190 Results from analysis of yi y3 y5 y7 y10 Akaike Information Criterion 369 96 assuming 2 parameters Bayesian Information Criterion 374 15 Model_Term Gamma Sigma Sigma SE C id units exp Trait 70 effects Residual SCA_V 70 1 000000 301 449 3 12 OP Trait EXP_P 1 0 919007 0 919007 29 49 OP When fitting power models be careful to ensure the scale of the defining variate here time does not result in an estimate of too close to 1 For example use of days in this example would result in an estimate for of about 993 Residuals plotted against Row and Column position 1 Range 45 11 34 86 O00 p ooo j 4 B 8 006 9 ao 6 oo Figure 15 4 Residual plots for the EXP variance model for the plant data The residual plot from this analysis is presented in Figure 15 4 This suggests increasing variance over time This can be modelled by using the EXPH model which models by y Dc p 282 15 5 Balanced repeated measures Height where D is a diagonal matrix of variances and C is a correlation matrix with elements given by qj gl 4l The coding for this is yl y3 yS y7 y10 Trait tmp Tr tmt residual id units exph Trait INIT 0 5 100 200 300 300 300 COORD 1 3 5 7 10 Abbreviated output from this analysis is 9 LogL 171 512 S2 1 00000 60 df 10 LogL 171 500 52 1 00000 60 df 11 LogL
133. 0000 0 1613 0 2096 0 2162E 01 0 8451 1 298 1 687 0 1243 1 0000 And the eigen analysis in the res file is Eigen Analysis of XFA matrix for xfal TrDam123 id dam Eigen values 4 704 0 246 0 006 Percentage 94 919 4 957 0 124 1 0 6431 0 7647 0 0009 2 0 7637 0 6404 0 0743 3 0 0563 0 0484 0 9972 showing that the smallest eigenvalue is 0 006 On the basis of this ASReml with ARG 3 fits unstructured matrices for sire and litter and xfa1 for dam using initial values derived from the previous analysis in coopmf2 rsv Portions of the asr file from the Path 3 run are Notice ReStartValues taken from coopmf2 rsv Notice LogL values are reported relative to a base of 20000 000 Notice US matrix updates modified 1 time s to keep them positive definite Notice 1084 singularities detected in design matrix 1 LogL 1488 11 S2 1 00000 18085 df 11 components restrained 2 LogL 1486 27 S2 1 00000 18085 df 2 components restrained 3 LogL 1483 34 52 1 00000 18085 df 1 components restrained 4 LogL 1481 89 S2 1 00000 18085 df 5 LogL 1481 10 S2 1 00000 18085 df 6 LogL 1480 91 S2 1 00000 18085 df 7 LogL 1480 89 S2 1 00000 18085 df 8 LogL 1480 89 S2 1 00000 18085 df 9 LogL 1480 89 S2 1 00000 18085 df Results from analysis of wwt ywt gfw fdm fat Notice US structures were modified 1 times to make them positive definite If ASReml has fixed the structure flagged by B it may not have converged to a maximum likelihood sol
134. 00000000000 12 3 0 9672D 01 452 0 2492 0 07906 0 07295 0O 1001100000000000 13 3 0 9579D 01 452 0 2494 0 07830 0 072188 0 0O 101100000000000 14 3 0 9540D 01 452 0 2495 0 07797 0 0718 0 0 011100000000000 15 3 0 1089D 02 452 0 2465 0 08907 0 083022 10010100000000000 16 3 0 2917D 01 452 0 2642 0 02384 0 01736 010 10100000000000 17 3 0 2248D 01 452 0 2657 0 01838 0 01187 00 1 10100000000000 18 3 0 1111D 02 452 0 2460 0 09088 0 08484 1 0100100000000000 19 3 0 1746D 01 452 0 2668 0 01427 0 00773 0 1100100000000000 20 3 0 1030D 02 452 0 2478 0 08423 0 07815 11000 100000000000 21 3 0 1279D 02 452 0 2423 0 10454 0 09890 10000110000000000 22 3 0 8086D 01 452 0 2527 0 06609 0 05989 0 1 000110000000000 23 3 0 7437D 01 452 0 2542 0 06079 0 05456 00100110000000000 24 3 0 1071D 02 452 0 2469 0 08755 0 08149 0 0 010110000000000 25 3 0 1370D 02 452 0 2403 0 11200 0 10611 00001110000000000 SK SK 26 3 0 1511D 02 452 0 2372 0 12351 0 11770 10001010000000000 SOK SOK 27 3 0 1353D 02 452 0 2407 0 11064 0 10473 0 1 001010000000000 680 3 0 1057D 02 452 0 2472 0 08641 0 08035 1 1 000000000000001 The primary tables reported in the asr file are now also written in XML format to a xml file The intended use of this file is by programs written to parse Asreml output The information contained in the xm1 file includes start and finish times the data summary the iteration sequence summary the summary of estimated variance structure parameters and the Wald F statistics Develop
135. 03 for examples of factor analytic models in multi-environment trials. The general limitations are

• that $\Psi$ may not include zeros except in the XFAk formulation,
• constraints are required in $\Gamma$ for $k > 1$ for identifiability. These are automatically set unless the user formally constrains one parameter in the second column, two in the third column, etc.,
• the total number of estimated parameters, $kw + w - k(k-1)/2$, may not exceed $w(w+1)/2$.

In FAk models the variance-covariance matrix $\Sigma$ is modelled on the correlation scale as $\Sigma = DCD$, where

• $D$ is diagonal such that $DD = \mathrm{diag}(\Sigma)$,
• $C$ is a correlation matrix of the form $FF' + E$, where $F$ is a matrix of loadings on the correlation scale and $E$ is diagonal and is defined by difference,
• the parameters are specified in the order loadings for each factor ($F$) followed by the variances ($\mathrm{diag}(\Sigma)$); when $k$ is greater than 1, constraints on the elements of $F$ are required, see Table 7.5.

FACVk models (CV for covariance) are an alternative formulation of FA models in which $\Sigma$ is modelled as $\Sigma = \Gamma\Gamma' + \Psi$, where $\Gamma$ is a matrix of loadings on the covariance scale and $\Psi$ is diagonal. The parameters in FACV

• are specified in the order loadings ($\Gamma$) followed by variances ($\Psi$); when $k$ is greater than 1, constraints on the elements of $\Gamma$ are required, see Table 7.5,
• are related to those in FA by $\Gamma = DF$ and $\Psi = D$
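The limit on the number of estimated parameters can be checked with a couple of lines of arithmetic. The Python sketch below reads the limit as kw + w - k(k-1)/2 not exceeding w(w+1)/2, where w is the dimension of the matrix and k the number of factors; the function names are hypothetical.

```python
def fa_parameter_count(w, k):
    """Number of estimated parameters: k*w loadings plus w specific
    variances, less k(k-1)/2 constrained loadings."""
    return k * w + w - k * (k - 1) // 2


def fa_within_limit(w, k):
    """True if the count does not exceed that of an unstructured matrix."""
    return fa_parameter_count(w, k) <= w * (w + 1) // 2


for k in (1, 2, 3):
    print(k, fa_parameter_count(5, k), fa_within_limit(5, k))
```

For a matrix of dimension 5, one or two factors stay within the 15 parameters of an unstructured matrix, while three factors would need 17.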
136. 1 Unstructured 15 158 04 377 50 The split plot in time model can be fitted in two ways either by fitting a units term plus an ind residual as above or by specifying a CORU variance model for the R structure as follows yl y3 yS y7 y10 Trait tmt Tr tmt residual id units coru Trait The two forms for are given by So of J 031 units 15 3 E oI o p J TI CORU i It follows that o o Gi 15 4 P ma Portions of the two outputs are given below The REML log likelihoods for the two models are the same and it is easy to verify that the REML estimates of the variance parameters satisfy 15 4 viz o 286 310 159 858 126 528 286 386 159 858 286 386 0 558191 280 15 5 Balanced repeated measures Height r idv units Trait LogL 204 593 S2 224 61 60 df 0 1000 1 000 LogL 201 233 S2 186 52 60 df 0 2339 1 000 LogL 198 453 S2 155 09 60 df 0 4870 1 000 LogL 197 041 52 133 85 60 df 0 9339 1 000 LogL 196 881 S2 127 86 60 df 1 204 1 000 LogL 196 877 S2 126 53 60 df 1 261 1 000 Final parameter values 1 2634 1 0000 Results from analysis of yl y3 y5 y7 y10 Akaike Information Criterion 397 75 assuming 2 parameters Bayesian Information Criterion 401 9 Approximate stratum variance decomposition Stratum Degrees Freedom Variance Component Coefficients idv units 12 00 925 584 5 0 1 9 Residual Variance 48 00 126 494 0 0 1 0 Model_Term Gamma Sigma Sigma SE C idv units IDV_V 14 1
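The numerical check quoted above (159.858 + 126.528 = 286.386 and 159.858 / 286.386 = 0.558191) can be reproduced directly. The Python sketch below assumes that 159.858 is the units variance component and 126.528 the residual variance from the split-plot-in-time fit, and that the uniform correlation (CORU) form has total variance equal to their sum and correlation equal to the units share of that sum; the function name is hypothetical.

```python
def coru_from_components(units_variance, residual_variance):
    """Convert a units + residual decomposition to the uniform-correlation
    form: (total variance, correlation)."""
    total = units_variance + residual_variance
    return total, units_variance / total


total, rho = coru_from_components(159.858, 126.528)
print(f"total variance {total:.3f}, correlation {rho:.6f}")
```

Both parameterisations describe the same covariance matrix, which is why the REML log-likelihoods of the two fits agree.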
137. 1 model for the Tullibigeal data The abbreviated output for this model and the final model in which a nugget effect has been included is AR1xAR1 pol column 1 1 LogL 4271 06 82 gt 0 12731E 06 665 df 2 LogL 4259 03 52 0 11963E 06 665 df 3 LogL 4245 41 S2 0 10556E 06 665 df 4 LogL 4229 98 S2 78754 665 df 5 LogL 4226 66 52 75970 665 df 6 LogL 4226 29 52 T7975 665 df 7 LogL 4226 25 52 78313 665 df 8 LogL 4226 25 52 78396 665 df 9 LogL 4226 25 S2 78419 665 df 295 15 7 Unreplicated early generation variety trial Wheat Tullibigeal trial Batts 26 aud odo2 19 03 22 Outer displacement Wer displacement Figure 15 7 Sample variogram of the residuals from the ARIxAR1 pol column 1 model for the Tullibigeal data 10 LogL 4226 25 S2 78425 665 df Results from analysis of yield Akaike Information Criterion 8460 50 assuming 4 parameters Bayesian Information Criterion 8478 50 Model_Term Gamma Sigma Sigma SE C idv variety IDV_V 532 112313 88081 9 9 81 0 P ar1 row ar1 column 670 effects Residual SCA_V 670 1 000000 78425 4 8 83 OP row AR_R 1 0 665872 0 665872 15 37 OP column AR_R 1 0 266047 0 266047 363 0 P Wald F statistics Source of Variation NumDF DenDF F ingc P ine 7 mu i 42 5 7149 90 lt 001 3 weed 1 459 0 92 14 lt 001 8 pol column 1 1 62 1 t 261 0 008 AR1xAR1 units pol column 1 1 LogL 4272 85 S2 0 11684E 06 665 df 2 LogL 4265 70 S2 83872 66
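The information criteria reported with each fit can be reproduced from the final log-likelihood. The Python sketch below infers the formulas from the reported numbers rather than from ASReml's documentation of them: AIC as minus twice the log-likelihood plus twice the number of variance parameters, and BIC with a penalty of log(residual degrees of freedom) per parameter. It also assumes the log-likelihood is negative, with the minus sign lost in the listing.

```python
import math


def information_criteria(log_likelihood, n_parameters, residual_df):
    """Return (AIC, BIC) under the formulas assumed in the lead-in."""
    deviance = -2.0 * log_likelihood
    aic = deviance + 2.0 * n_parameters
    bic = deviance + n_parameters * math.log(residual_df)
    return aic, bic


# AR1xAR1 + pol(column,-1) fit: LogL -4226.25, 4 parameters, 665 df.
aic, bic = information_criteria(-4226.25, 4, 665)
print(f"AIC {aic:.2f}  BIC {bic:.2f}")
```

Under these assumptions the values agree with the 8460.50 and 8478.50 shown in the output to two decimal places.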
138. 1017E 01 0 508505E 01 0 Q000000E 00 0 393519E 01 0 430418E 01 0 423685E 01 0 428749E 01 0 417784E 01 0 363262E 01 0 444716E 01 0 527187E 01 239 13 5 ASReml output objects and where to find them 0 855044E 01 0 243553E 01 0 Q000000E 00 0 351279E 01 0 369901E 01 0 383964E 01 0 330102E 01 0 361942E 01 0 352305E 01 0 359462E 01 0 392014E 01 0 406704E 01 0 801337E 01 0 475798E 01 0 000000E 00 0 370878E 01 0 418534E 01 0 452789E 01 0 408589E 01 0 446476E 01 0 375742E 01 0 403945E 01 0 420473E 01 0 406937E 01 0 403049E 01 0 857644E 01 0 606943E 00 0 000000E 00 0 428611E 01 0 506706E 01 0 432088E 01 0 387484E 01 0 436861E 01 0 391305E 01 0 421110E 01 The first 5 rows of the lower triangular matrix are 48 7026 0 0000 2 9841 4 7063 0 3155 0 0000 0 0000 0 0000 0 0000 8 0735 4 5654 8 8650 4 0995 4 7648 8 7656 13 4 12 The vvp file The vvp file contains the inverse of the average information matrix on the components scale The file is formatted for reading back under the control of the pin file described in Chapter 12 The matrix is lower triangular row wise in the order the parameters are printed in the asr file This is nin89a vvp with the parameter estimates in the order error variance spatial row correlation spatial column correlation Variance of Variance components 3 51 1980 0 217689 0 317838E 02 0 673382E 01 0 201115E 02 0 649673E 02 13 5 ASReml output objects and where to find them Table 13 2 presents a list
139. 103 81 61 81 130 94 10 55 53 55 106 15 109 153 23 0 50 66 111 29 75 43 24 90 37 23 64 130 84 122 129 126 90 38 91 133 126 16 57 30 70 99 114 218 332 174 77 19 38 29 58 63 88 4 124 49 101 129 113 45 92 70 198 257 333 352 319 253 166 152 52 28 0 97 135 67 16 9 36 96 24 62 48 27 29 227 167 356 335 183 179 189 118 124 14 52 19 7 56 81 33 63 40 57 15 24 73 183 277 352 323 288 151 56 130 188 29 78 7 12 30 39 57 89 3 116 27 2 64 j yeaa E aao saros a il aii na in sa iba dl eens feed pee ees 3 sal Sgt gg 222 aaa gt 2 S f re rE a xsl jas is 5 wt al a E a a e a gg a a a oa a a 5 E al 5 2 EE 3 5 Mi Pe srs ih ian i cl cis pa PENER Ia a aml hi se as oak oa p7 l E Aa 3 E 22 gt t 4 E eK p shies a ta ted es a stag ttn te scsi sink ca tia ds aca se m n 2 2 e sag 5 ta aK Residual section 1 column 8 11 row A 929 is 3 32 SD Residual section 1 column 9 11 row 2 22 is 3 33 SD Residual section 1 column 9 11 row 3 22 is 3 62 SD Residual section 1 column 10 11 row 3 22 is 3 66 SD Residual section 1 column 10 11 row 4 22 is 3 35 SD Residual section 1 column 11 11 row 3
140. 142 2 53142 19 25 OP Trait US_C 5 3 0 821032E 01 0 821032E 01 4 52 0 P Trait US_C 5 4 0 208739 0 208739 1 60 0 P Trait US_V 5 5 1 54280 1 54280 24 00 0 P diag TrSG123 sex grp 147 effects TrsG123 DIAG_V 1 1 01250 1 01250 2 96 OP TrSG123 DIAG_V 2 15 2159 15 2159 3 49 0 P TrsG123 DIAG_V 3 0 279183 0 279183 Batt OP diag TrAG1245 age grp 196 effects TrAG1245 DIAG_V 1 0 142096E 02 0 142096E 02 2 04 0 P TrAG1245 DIAG_V 2 0 143897E 02 0 143897E 02 1 54 OP TrAG1245 DIAG_V 3 0 163778E 02 0 163778E 02 1 409 OP TrAG1245 DIAG_V 4 0 207274E 03 0 207274E 03 1 61 0 P 330 15 10 Multivariate animal genetics data Sheep us TrLit1234 id lit 19484 effects TrLit1234 USV 1 1 3 84738 3 84738 9 19 0 TrLit1234 US_C 2 1 2 52256 2 52256 5 47 0 TrLit1234 US_V 2 2 4 07860 4 07860 5 46 0 TrLit1i234 US_C 3 1 0 767402E 01 0 767402E 01 2 05 0 TrLiti234 US_C 3 2 0 206265 0 206265 4 36 0 TrLit1i234 US_V 3 3 0 250400E 01 0 250400E 01 3 30 0 TrLit1234 US C 4 1 0 118244 0 118244 0 35 0 TrLit1i234 US_C 4 2 0 824135 0 824135 1 58 0 TrLit1234 US_C 4 3 0 492320E 01 0 492320E 01 0 85 0 TrLit1234 US_V 4 4 0 704947 0 704947 1 74 0 xfa1 TrDam12 id dam 32088 effects TrDam12 XFA_LV O 1 0 00000 0 00000 0 00 0 TrDam12 XFAV O 2 0 00000 0 00000 0 00 0 TrDam12 XFA L 1 1 1 27045 1 27045 10 00 0 TrDam12 SPALL 1 2 1 15350 1 15350 5 66 0 xfa3 Trait nrm tag 85568 effects Trait XFAV O 1 0 00000 0 00000 0 00 0 Trait XFA_LV O 2 0 00000 0 00000 0 00 0 Trait XFA_V O0
141. 16 SIRE_2 0     116 SIRE_2 0 1 5 168 417 2752
117 SIRE_3 0     117 SIRE_3 0 1 3 154 389 2383
118 SIRE_3 0     118 SIRE_3 0 1 4 184 414 2463
119 SIRE_3 0     119 SIRE_3 0 1 5 174 483 2293
120 SIRE_3 0     120 SIRE_3 0 1 5 170 430 2303

8.7 Reading in the pedigree file

The syntax for specifying a pedigree file in the ASReml command file is

pedigree_file [qualifiers]

• the qualifiers are listed in Table 8.1,
• the identities (individual, parent_1, parent_2) are merged into a single list and the inverse relationship is formed before the data file is read,
• parent_1 is typically male for animal pedigrees (sire) but often female for plant pedigrees; it must be the XY parent if the !XLINK qualifier is specified,
• when the data file is read, data fields with the !P qualifier are recoded according to the combined identity list,
• the inverse relationship matrix is automatically associated with factors coded from the pedigree file unless some other covariance structure is specified. The inverse relationship matrix is specified with the variance model name NRM (the variance model function name nrm()),
• the inverse relationship matrix is written to ainverse.bin,

Footnote 2: http://www.vsni.co.uk/products/asreml/user/PedigeeNotes.pdf contains details of these options.

  • if ainverse.bin already exists, ASReml assumes it was formed in a previous run and has the correct inverse; ainverse.bin is read rather than the
142. 169 61 9169 i 7e UP Trait US_V 3 3 259 121 269 121 2 45 OP Trait US _C 4 1 70 8113 70 8113 1 54 OP Trait US_C 4 2 57 6146 57 6146 1 23 OP Trait US_C 4 3 331 807 331 807 2 20 OP Trait US_V 4 4 551 507 551 507 2 45 OP Trait US_C 5 1 73 7857 73 7857 1 60 OF Trait US C 5 2 62 5691 62 5691 1 33 OP Trait US_C 5 3 330 851 330 851 2 29 0 P Trait US_C 5 4 533 756 533 756 2 42 OP Trait US_V 5 5 542 175 542 175 2 45 OP However the usual syntax for fitting an unstructured error model for multivariate data is to omit the ASUV qualifier and write 285 15 6 Spatial analysis of a field experiment Barley yl y3 yS y7 y10 Trait tmt Tr tmt residual id units us Trait The antedependence model of order 1 is clearly more parsimonious than the unstructured model Table 15 5 presents the incremental Wald F statistics for each of the variance models There is a surprising level of discrepancy between models for the Wald F statistics The main effect of treatment is significant for the uniform power and antedependence models Table 15 5 Summary of Wald F statistics for fixed effects for variance models fitted to the plan treatment treatment time model df 1 df 4 Uniform 9 41 5 10 Power 6 86 6 13 Heterogeneous power 0 00 4 81 Antedependence order 1 4 14 3 91 Unstructured 1 71 4 46 15 6 Spatial analysis of a field experiment Barley In this section we illustrate the ASReml syntax for performing spatial and incomplete block ana
143. 177 2955 E 4 2982 5846 178 9939 E 522 2 9127 179 3317 E 523 2907 1301 179 7729 E 524 2776 0280 180 3853 E 525 2716 1221 181 8923 E 526 2381 9697 44 1852 E 527 2696 4092 133 8687 E 528 2723 5890 112 6784 E 529 2701 6306 104 2832 E 530 3006 8237 112 7234 E 531 3019 5559 112 6742 E 532 3064 3052 113 0868 E SED Overall Standard Error of Difference 246 2 297 15 8 Paired Case Control study Rice Note that the replicated check lines have lower SE than the unreplicated test lines There will also be large diffeneces in SEDs Rather than obtaining the large table of all SEDs you could do the prediction in parts predict var 1 525 column 5 5 predict var 526 532 column 5 5 SED to examine the matrix of pairwise prediction errors of variety differences 15 8 Paired Case Control study Rice This data is concerned with an experiment conducted to investigate the tolerance of rice varieties to attack by the larvae of bloodworms The data have been kindly provided by Dr Mark Stevens Yanco Agricultural Institute A full description of the experiment is given by Stevens et al 1999 Bloodworms are a significant pest of rice in the Murray and Murrumbidgee irrigation areas where they can cause poor establishment and substantial yield loss The experiment commenced with the transplanting of rice seedlings into trays Each tray contained 32 seedlings and the trays were paired so that a control tray no bloodworms and
144. 18 124 130 188 ooo w 22 ao eK omitting 11 5 0 50 O 0 40 5 0 30 0 0 15 99 23 32 25 32 lt 9 m2 55 53 122 129 63 88 97 135 52 19 78 7 233 ooo 18 zeros 38 0 28 32 0 26 24 0 19 0 0 9 37 44 46 120 33 53 41 55 106 127 90 4 124 67 16 7 56 12 30 84 109 10 15 38 49 61 39 64 110 83 ied 123 153 133 129 96 63 89 228 67 113 47 23 126 113 91 49 68 109 1419 16 45 62 57 116 86 131 141 63 181 50 57 92 48 15 27 65 20 69 57 101 66 30 70 a5 24 141 40 25 104 114 70 198 29 T3 64 13 4 Other ASReml output files Residual section 11 column 9 of 11 row 2 of 22 is 3 33 SD Residual section 11 column 9 of 11 row 3 of 22 is 3 52 SD Residual section 11 column 10 of 11 row 3 of 22 is 3 56 SD Residual section 11 column 10 of 11 row 4 of 22 is 3 35 SD Residual section 11 column 11 of 11 row 3 of 22 is 3 52 SD 6 possible outliers in section 11 test value 23 0297999308 Residuals Percentage of sigma 6 979 0 o o A GS SO 84 BS G8 144 72 29 52 20 61 11 132 26 63 15 99 37 84 48 110 228 49 131 20 9 87 1 32 14 26 30 3 37 6 4 23 32 44 46 109 97 83 67 68 141 69 40 44 11 0 3 6 O 21 41 15 51 25 32 120 33 10 58 117 113 109 63 57 25 18 18 2 84 19 51 45 18 30 56 9 12 53 41 7 99 123 47 119 181 101 104 40 29 87
145. 2 0.10969E+06 666 df
4 LogL=-4243.76 S2= 88040. 666 df
5 LogL=-4240.59 S2= 84420. 666 df
15.7 Unreplicated early generation variety trial - Wheat
6 LogL=-4240.01 S2= 85617. 666 df
7 LogL=-4239.91 S2= 86032. 666 df
8 LogL=-4239.88 S2= 86189. 666 df
9 LogL=-4239.88 S2= 86253. 666 df
10 LogL=-4239.88 S2= 86280. 666 df

Results from analysis of yield
Akaike Information Criterion 8485.76 (assuming 3 parameters)
Bayesian Information Criterion 8499.26

Model_Term                          Gamma      Sigma      Sigma/SE  % C
idv(variety)          IDV_V  532    0.959184   82758.6    8.98      0 P
ar1(row).id(column)          670 effects
Residual              SCA_V  670    1.000000   86280.2    9.12      0 P
row                   AR_R   1      0.672052   0.672052   16.04     1 P

Wald F statistics
Source of Variation   NumDF  DenDF  F-inc     P-inc
7 mu                  1      83.6   9799.20   <.001
3 weed                1      477.0  109.33    <.001

The iterative sequence converged, the REML estimate of the autoregressive parameter indicating substantial within-column heterogeneity. The abbreviated output from the two-dimensional AR1xAR1 spatial model is

1 LogL=-4277.99 S2= 0.12850E+06 666 df
2 LogL=-4266.14 S2= 0.12097E+06 666 df
3 LogL=-4253.06 S2= 0.10778E+06 666 df
4 LogL=-4238.72 S2= 83163. 666 df
5 LogL=-4234.53 S2= 79867. 666 df
6 LogL=-4233.78 S2= 82024. 666 df
7 LogL=-4233.67 S2= 82724. 666 df
8 LogL=-4233.65 S2= 82975. 666 df
9 LogL=-4233.65 S2= 83065. 666 df
10 LogL=-4233.65 S2= 83100. 666 df

Results from analysis of yield
Akaike Information Criterion 8475.29
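The information criteria reported for the first model above can be checked by hand. The sketch below is only an illustration, not ASReml's own code. It assumes the converged log-likelihood is -4239.88 (the minus sign appears to have been lost in extraction) and that the BIC penalty uses the logarithm of the residual degrees of freedom (666); both assumptions are consistent with the reported values. The function name `information_criteria` is hypothetical.

```python
import math


def information_criteria(logl, n_params, resid_df):
    """Return (AIC, BIC) for a REML log-likelihood.

    Assumption: BIC penalises each variance parameter by log(residual df).
    """
    aic = -2.0 * logl + 2.0 * n_params
    bic = -2.0 * logl + n_params * math.log(resid_df)
    return aic, bic


# Values read from the listing: LogL (sign assumed negative), 3 parameters, 666 df
aic, bic = information_criteria(-4239.88, 3, 666)
print(f"AIC = {aic:.2f}, BIC = {bic:.2f}")  # AIC = 8485.76, BIC = 8499.26
```

Because these criteria are built from the REML log-likelihood, they are only comparable between models that share the same fixed effects.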
146. 22 is 3.52 SD
6 possible outliers in section 1: test value 23.0311757288330

13.4 Other ASReml output files

[Figure 13.2: Variogram of residuals]

Figures 13.2 to 13.5 show the graphics derived from the residuals when the !DISPLAY 15 qualifier is specified, and which are written to .eps files by running ASReml -g22 nin89a.as. The graphs are a variogram of the residuals from the spatial analysis for site 1 (Figure 13.2), a plot of the residuals in field plan order (Figure 13.3), plots of the marginal means of the residuals (Figure 13.4) and a histogram of the residuals (Figure 13.5). The selection of which plots are displayed is controlled by the !DISPLAY qualifier (Table 5.4). By default, the variogram and field plan are displayed.

The sample variogram is a plot of the semi-variances of differences of residuals at particular distances. The (0,0) position is zero because the difference is identically zero. ASReml displays the plot for distances 0, 1, 2, ..., 8, 9-10, 11-14, 15-20.

The plot of residuals in field plan order (Figure 13.3) contains in its top and right margins a diamond showing the minimum, mean and maximum residual for that row or column. Note that a gap identifies where the missing values occur.

The plot of marginal means of residuals shows residuals for each row/column as well as the trend in their means.

Finally we present a small e
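To make the definition of the sample variogram concrete, here is a minimal sketch for residuals laid out on a rectangular field plan. It is not ASReml's implementation: it only uses non-negative row and column displacements, does not group the larger distances, and the function name `sample_variogram` and the example residuals are invented for illustration. Missing plots are represented by `None`.

```python
def sample_variogram(resid, max_dr, max_dc):
    """Semi-variance of residual differences for each (row, column) displacement.

    resid is a list of rows; a missing residual is None.
    Returns {(dr, dc): semivariance}, or None where no pair is available.
    """
    n_rows, n_cols = len(resid), len(resid[0])
    out = {}
    for dr in range(max_dr + 1):
        for dc in range(max_dc + 1):
            total, count = 0.0, 0
            for i in range(n_rows - dr):
                for j in range(n_cols - dc):
                    a, b = resid[i][j], resid[i + dr][j + dc]
                    if a is None or b is None:
                        continue  # skip pairs involving a missing plot
                    total += (a - b) ** 2
                    count += 1
            # Half the mean squared difference at this displacement
            out[(dr, dc)] = 0.5 * total / count if count else None
    return out


# Hypothetical 3 x 3 grid of residuals with one missing plot
grid = [[1.0, -1.0, 0.5],
        [0.0, None, -0.5],
        [2.0, 1.0, -2.0]]
vg = sample_variogram(grid, 1, 1)
print(vg[(0, 0)], vg[(0, 1)])
```

The (0,0) entry is always zero, as the text notes, because each residual is differenced with itself.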
147. 29 4 25 6 spl(age,7) 5 effects fitted
Finished: 19 Aug 2005 10:08:11.980 LogL Converged

The REML estimate of the smoothing constant indicates that there is some nonlinearity. The fitted cubic smoothing spline is presented in Figure 15.13. The fitted values were obtained from the .pvs file. The four points below the line were the spring measurements.

[Figure 15.13: Fitted cubic smoothing spline for tree 1. x-axis: time since December 31, 1968 (days); y-axis: trunk circumference (mm)]

We now consider the analysis of the full dataset. Following Verbyla et al. (1999) we consider the analysis of variance decomposition (see Table 15.11) which models the overall and individual curves. An overall spline is fitted as well as tree deviation splines. We note, however, that the intercept and slope for the tree deviation splines are assumed to be random effects. This is consistent with Verbyla et al. (1999). In this sense the tree deviation splines play a role in modelling the conditional curves for each tree and variance modelling. The intercept and

Table 15.11: Orange data: AOV decomposition
stratum decomposition type df or ne co
148. 3 MBF statements required to ex tract markers 35 75 and 125 from the marker file markers csv The names of model terms must begin with a letter hence the marker names are the letter M followed by the position number Alternatively IRFIELDlettersinteger is interpreted as RFIELD integer so the FOR statement can be written even more concisely as mbf Geno 1 markers csv key 1 RFIELD S RENAME S without the need to assign Markern Now to add another marker to the model one can just add the marker integer to the ASSIGN statement Restriction forlist and command are both limited to 200 charac ters 203 10 4 Advanced processing arguments High level qualifiers qualifier action LIF stringi New R4 One form of the IF statement is string2 text IF string1 string2 ASSIGN M1 brt DamAge which makes the ASSIGN statement active if string1 is the same as string2 Note that there need to be spaces before and after to avoid confusion with the strings This has been used when performing a large number of bivariate analyses with trait specific fixed effects being fitted So IIF 1 wwt ASSIGN M1 brt DamAge IIF 1 ywt ASSIGN M1 brt IF 1 fwt ASSIGN M1 DamAge IF 2 wwt ASSIGN M2 brt DamAge IF 2 ywt ASSIGN M2 brt IF 2 fwt ASSIGN M2 DamAge 1 2 Trait at Trait 1 M1 at Trait 2 M2 PATH pathlist The PATH or PART control statement may list multiple path numbers so that the follo
149. 3 NE83498 16 LANCER LANCOTA NES87451 NE87409 NE86607 NE87612 CHEYENNE NE83404 NE86503 NE83T12 NE87613 17 BRULE NE86501 NES87457 NE87513 NE83498 NE87613 SIOUXLAND NE86503 NE87408 CENTURAK78 NE86501 18 REDLAND NE86503 NES87463 NE87627 NE83404 NE86T666 NES87451 NE86582 COLT NE87627 TAM200 19 CODY NE86507 NES87499 ARAPAHOE NE87446 GAGE NE87619 LANCER NE86606 NE87522 20 ARAPAHOE NE86509 NE87512 LANCER SIOUXLAND NES86607 LANCER NE87463 NE83406 NE87457 NE84557 21 NES83404 TAM107 NE87513 TAM107 HOMESTEAD LANCOTA NES87446 NES86606 NE86607 NE86509 TAM107 22 NE83406 CHEYENNE NE87522 REDLAND NE86501 NE87518 NES86482 BRULE SIOUXLAND LANCOTA HOMESTEAD yu wu dx pjay NIN Asasanyy a2eyseaquy eyseaq N ZE 3 3 The ASReml data file 3 3 The ASReml data file The standard format of an ASReml data file is to have the data arranged in space TAB or comma separated columns fields with a line for each sampling unit The columns contain covariates factors response variates traits and weight variables in any convenient order This is the first 30 lines of the file nin89 asd containing the data for the NIN variety trial The data are in field order rows within columns and an optional heading first line of the file has been included to document the file In this case there are 11 space separated data fields variety column and the complete file has 224 data lines one for each variety in each replicate variety id pid raw repl nloc yield lat lo
150. 5 df 1 components restrained 3 LogL 4240 99 S2 80942 665 df 4 LogL 4227 44 52 53712 665 df 5 LogL 4221 09 52 52201 665 df 6 LogL 4220 94 S2 54803 665 df 296 15 7 Unreplicated early generation variety trial Wheat 7 LogL 4220 94 S2 54935 665 df 8 LogL 4220 94 S2 54934 665 df Results from analysis of yield Akaike Information Criterion 8451 88 assuming 5 parameters Bayesian Information Criterion 8474 37 Model_Term Gamma Sigma Sigma SE C idv variety IDV V 532 1 32827 72967 0 6 99 OP idv units IDV_ 670 0 562308 30889 9 3 18 0 P ar1 row ar1 column 670 effects Residual SCA_V 670 1 000000 54934 0 65 19 OP row AR_R 1 0 835396 0 835396 18 38 0P column AR_R 1 0 375499 0 375499 a0 0 P Wald F statistics Source of Variation NumDF DenDF F ine Ping 7 mu 1 13 6 4272 13 lt 001 3 weed 1 470 3 86 31 lt 001 8 pol column 1 1 27 4 3 69 0 065 The increase in REML log likelihood is significant The predicted means for the varieties can be produced and printed in the pvs file as Ecode is E for Estimable for Not Estimable Warning mv_estimates is ignored for prediction Warning units is ignored for prediction Sa le ee el ee mame il AR ete ee ee Mmmm Predicted values of yield column is evaluated at 5 5000 Model terms involving weed are predicted at the average 0 4597 variety Predicted_Value Standard_Error Ecode i 2916 6768 179 5421 E 2 2955 1002 179 0278 E 3 2869 7482
151. 7 0 10040 02 394 140099 2 2 1 2 2 2 2 2 2 1 2 1 2 1 1 2 1 2 2 2 2 2 1 2 141099 2 2 0 0 2 2 1 2 2 1 2 1 2 2 0 2 2 2 2 1 2 2 1 1 54785 2 2 gt gt 2 2 2 gt 2 2 2 2 2 gt 3 2 2 2 gt 2 2 3 2 2 1 2 2 2 1 2 2 0 2 1 2 2 2 2 2 2 2 1 2 547966 2 2 1 1 1 2 0 2 2 1 2 2 2 2 2 2 2 2 2 1 2 548082 2 2 1 2 2 2 1 2 1 2 2 1 2 2 1 2 2 2 2 1 2 2 2 gt gt 2 2 2 2 2 gt 2 2 2 2 2 2 2 2 2 2 gt The primary output follows Nfam 71 A Nfemale 26 A Nmale 37 A Clone A 860 MatOrder 914 A 170 8 11 Factor effects with large Random Regression models rep 8 A iblk 80 A prop i A culture 2 A treat 2 A measure 1 A CWAC6 M 9 Parsing snpData grr Clone Class names for factor Clone are initialized from the grr file GRR Header line begins Genotype 0 10024 01 114 0 10037 01 257 0 4854 Marker labels found Marker labels 0 10024 01 114 UMN CL98Contig1 Notice The header line indicates there are 4854 regressors in the file Notice SNP data line begins 140099 2 2 1 2 2 2 2 2 2 1 2 1 2 1 1 Notice Markers coded 9 treated as missing Marker data 0 1 2 for 923 genotypes and 4854 markers read from snpData grr 160414 missing Regressor values 3 6 replaced by column average Regressor values ranged 0 00 to 2 00 Regressor Means ranged 1 00 to 2 00 Sigma2p 1 p is 1057 12558 GIV1 snpData grr 923 9 946 27 QUALIFIERS MAXIT 30 SKIP 1 DFF 1 QUALIFIER DOPART 2 is active Reading nassau_cut_v3 csv
152. 7 15 91 Fitted values X 16 77 35 94 o o o o o o o N Q o o o 6 o o 8 0920 ae o 8 e oo 4 9 Oo O oo o e 8 o 98 wo o o o oo o o 998 o o 8 o Co o o o Bo amp 5 99 o o a 85 99 q Oa o o o 8 8 n 5 oe 2o Blo 9 g o o 8 g8 o o 5 a 8 o j 7 e o a o o o j oo o o o o o o o o 6 o o o o o Figure 13 1 Residual versus Fitted values This is part of nin89a yht Note that the values corresponding to the missing data first 15 records are all 0 1000E 36 which is the internal value used for missing values 224 13 4 Other ASReml output files Record Yhat Residual Hat 1 0 10000E 36 0 1000E 36 0 1000E 36 2 0 10000E 36 0 1000E 36 0 1000E 36 3 0 10000E 36 0 1000E 36 0 1000E 36 4 0 10000E 36 0 1000E 36 0 1000E 36 15 0 10000E 36 0 1000E 36 0 1000E 36 16 24 089 5 161 6 075 17 2f OT 4 477 6 223 18 28 795 6 255 6 233 19 PERTE 6 327 6 236 20 27 043 6 007 5 963 239 21 522 8 128 6 314 240 24 696 1 854 6 114 241 25 452 0 1480 6 159 242 22 464 4 436 6 605 13 4 Other ASReml output files 13 4 1 The aov file This file reports details of the calculation of Wald F statistics particularly as relating to the conditional Wald F statistics not computed in this demonstration In the following table relating to the incremental Wald F statistic the columns are e model term e columns in design matrix e numerator degrees of freedom e simple Wald F sta
153. 8 var trt rep mu var trt If var trt var trt fitted before mu var and trt var trt fully fitted mu var and trt are completely singular and set to zero. The order within var trt rep is determined internally.

6.11 Wald F Statistics

The so-called ANOVA table of Wald F statistics has 4 forms:

Source NumDF F-inc
Source NumDF F-inc F-con M
Source NumDF DDF_inc F-inc P-inc
Source NumDF DDF_con F-inc F-con M P-con

depending on whether conditional Wald F statistics are reported (requested by the !FCON qualifier) and whether the denominator degrees of freedom are reported. ASReml always reports incremental Wald F statistics (F-inc) for the fixed model terms (in the DENSE partition) conditional on the order the terms were nominated in the model. Note that probability values are only available when the denominator degrees of freedom is calculated, and this must be explicitly requested with the !DDF qualifier in larger jobs. Users should study Section 2.5 to understand the contents of this table. The conditional maximum model used as the basis for the conditional F statistic is spelt out in the .aov file described in Section 13.4.

The numerator degrees of freedom (NumDF) for each term is easily determined as the number of non-singular equations involved in the term. However, in general, calculation of the denominator degrees of freedom (DDF) is not trivial. ASReml will by default attempt the calculation for small analyses by one of tw
154. 8 9 0 8776 0 8566 44 60 79 27 331 5 550 9 0 9761 43 16 f6 2 320 8 533 2 541 6 Wald F statistics Source of Variation NumDF F ine 8 Trait 5 188 83 1 tmt 1 4 14 9 Trait tmt 4 3 91 The iterative sequence converged and the antedependence parameter estimates are printed columnwise by time the column of U and the element of D L e 284 15 5 Balanced repeated measures Height 0 0269 1 0 6284 0 0 0 0 0373 0 1 1 4911 0 0 D diag 0 0060 U 0 0 1 1 2804 0 0 0079 0 0 0 1 0 9678 0 0391 0 0 0 0 1 Finally the input and output files for the unstructured model are presented below The REML estimate of X from the ANTE model is used to provide starting values ASSIGN USI lt INIT 37 20 23 38 41 55 34 83 61 89 258 9 44 58 79 22 331 4 550 8 43 14 T667 320 7 533 0 541 4 1 gt yl y3 yo y7 yi0 Trait tmt Trait tmt residual id units us Trait USI 1 LogL 160 368 S2 1 0000 60 df 2 LogL 159 027 S2 1 0000 60 df 3 LogL 158 247 S2 1 0000 60 df 4 LogL 158 040 S2 1 0000 60 df 5 LogL 158 036 S2 1 0000 60 df Results from analysis of yi y3 yS y7 y10 Akaike Information Criterion 346 07 assuming 15 parameters Bayesian Information Criterion 377 49 Model_Term Sigma Sigma Sigma SE C id units us Trait 70 effects Trait USV 1 1 37 2262 of 2262 2 45 OP Trait US_C 2 1 23 3935 23 3935 it OP Trait US_V 2 2 41 5195 41 5195 2 45 OP Trait US_C 3 1 51 6524 51 6524 1 61 OP Trait US_C 3 2 61 9
155. CA_V 242 1 000000 48 7026 6 81 0O P parameter row AR_R 1 0 655480 0 655480 11 63 O P estimates column AR_R 1 0 437505 0 437505 5 43 OP Wald F statistics Source of Variation NumDF DenDF Fine P inc testing 12 mu 1 25 0 331 93 lt 001 fixed 1 variety 55 110 8 2 22 lt 001 effects Notice The DenDF values are calculated ignoring fixed boundary singular variance parameters using algebraic derivatives 13 mv_estimates 18 effects fitted 6 possible outliers in section 11 see res file Finished 29 Jan 2014 09 34 34 861 LogL Converged Following is a table of Wald F statistics augmented with a portion of Regression Screen output The qualifier was SCREEN 3 SMX 3 Model_Term Gamma Sigma Sigma SE C idsize IDV V 92 0 581102 0 136683 3 31 OP expt idsize IDV_V 828 0 121231 0 285153E 01 1 12 OP idv units 504 effects Residual SCA_V 504 1 000000 0 235214 12 70 0P Wald F statistics Source of Variation NumDF DenDF_con F_inc F_con M P_con 113 mu 1 72 4 65452 25 56223 68 lt 001 2 expt 6 ey dc Be 0 64 A 0 695 221 13 3 Key output files 4 type 4 63 8 22 95 3 01 A 0 024 114 expt type 10 79 3 1 31 0 93 B 0 508 23 x20 1 55 4 4 33 2 37 B 0 130 24 x21 1 63 3 1 91 0 87 B 0 355 25 x23 1 68 3 23 93 0 11 B 0 745 26 x39 1 yt 1 85 0 35 B 0 556 27 x48 1 69 9 1 58 2 08 B 0 154 28 x59 1 49 7 1 41 0 08 B 0 779 29 x60 1 69 6 1 46 0 42 B 0 518 30 x61 1 64 0 1 11 0 04 B 0 838 31 x62 1 61 8 2 18 0 09 Batre 32 x64 1 55 6 31 48 4 50 B 0 038 33 x65 2
156. Di on LOG PVBari for Section 1 37 xk kkk xk kkk kxk kkk kkk FKK KK kk KK K K at RK OK kkk KK DG kk k k k k k k k k k kk k ok 232 13 4 Other ASReml output files OK x ORK FRR 2 k a k k kk kk kk k k k k xk kk kk Ra kk kk kkk k k k k gk 3k 2k 2K 2k k ok k K K K K k Min Mean Max 24 873 0 27959 15 915 Spatial diagnostic statistics of Residuals Residual Plot and Autocorrelations lt L0o xXH gt se 0 077 Exx K X x x gt 4X o xxx X x xxx o E PEER 2X axeixx x o EKAXKK xXK OoL lt 0o x xXx x il lt lt lt lt lt O0 xX z lt O lt lt LLLoo o L lt lt lt lt O OL o lt x x 1 0 28 0 38 0 50 0 65 0 77 2 O17 s27 0 39 0 51 3 0 08 0 11 Residuals Percentage of sigma 0 0 0 0 0 0 0 2 29 lt 2 20 61 11 132 a87 i 32 14 26 30 3 44 11 0 3 6 0 21 18 18 2 84 19 1 45 40 29 Bf 103 81 61 81 29 Yh 43 24 90 37 lt 23 99 114 218 332 174 77 19 257 333 352 319 253 166 152 227 167 356 S35 163 lt 179 169 183 277 352 lt 323 288 151 56 I 1 i I I i I I I i l I I I I I I I I Residual section 11 column 8 of 11 row 4 of 22 is 3 32 SD ds 0 56 0 64 0 56 0 19 0 28 0 35 0 42 0 40 00 0 77 GITA 0 0 26 0 37 6 41 15 18 30 130 94 64 130 26 29 52 25 1
157. E qualifier to resume iterating from the current point.

To abort the job at the end of the current iteration, create a file named ABORTASR.NOW in the directory in which the job is running. At the end of each iteration, ASReml checks for this file and, if present, stops the job, producing the usual output but not producing predicted values, since these are calculated in the last iteration. Creating FINALASR.NOW will stop ASReml after one more iteration (during which predictions will be formed). On case-sensitive operating systems (e.g. Unix), the filename (ABORTASR.NOW or FINALASR.NOW) must be upper case. Note that the ABORTASR.NOW file is deleted, so nothing of importance should be in it. If you perform a system-level abort (CTRL C or close the program window), output files other than the .rsv file will be incomplete. The .rsv file should still be functional for resuming iteration at the most recent parameter estimates (see !CONTINUE).

Use !MAXIT 1 where you want estimates of fixed effects and predictions of random effects for the particular set of variance parameters supplied as initial values. Otherwise the estimates and predictions will be for the updated variance parameters (see the !BLUP qualifier below). If !MAXIT 1 is used and an Unstructured Variance model is fitted, ASReml will perform a Score test of the US matrix. Thus, assume the variance structure is modelled with reduced parameters; if that modelled structure is then processed as t
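The stop-file mechanism described above needs nothing more than an empty file with the right name in the job directory. The sketch below shows one way to create it from a script. It assumes the file names are ABORTASR.NOW and FINALASR.NOW (the extraction dropped the dot); the helper name `request_stop` is hypothetical and the demonstration uses a temporary directory rather than a real job folder.

```python
import tempfile
from pathlib import Path


def request_stop(job_dir, finish_with_predictions=False):
    """Create the upper-case flag file that ASReml looks for after each iteration.

    ABORTASR.NOW stops at the end of the current iteration (no predictions);
    FINALASR.NOW allows one more iteration so that predictions are formed.
    """
    name = "FINALASR.NOW" if finish_with_predictions else "ABORTASR.NOW"
    flag = Path(job_dir) / name
    flag.touch()  # the file content is irrelevant, so leave it empty
    return flag


# Demonstration in a throwaway directory standing in for the job folder
with tempfile.TemporaryDirectory() as job_dir:
    created = request_stop(job_dir)
    print(created.name, created.exists())
```

Keeping the file empty matches the note that nothing of importance should be stored in it.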
158. ED XFAk (X for extended) is the third form of the factor analytic model and has the same parameterisation as for FACV, that is, $\Sigma = \Gamma\Gamma' + \Psi$. However, XFA models
- have parameters specified in the order $\mathrm{diag}(\Psi)$ and $\mathrm{vec}(\Gamma)$,
- when k is greater than 1, constraints on the elements of $\Gamma$ are required (see Table 7.5),
- may not be used in R structures,
- return the factors as well as the effects,
- permit some elements of $\Psi$ to be fixed to zero,
- are computationally faster than the FACV formulation for large problems when k is much smaller than $\omega$.

With multiple factors, some constraints are required to maintain identifiability. Traditionally, this has simply been to set the leading loadings of new factors to zero. Loadings then need to be rotated to orthogonality. If no loadings are constrained, ASReml will rotate the loadings to orthogonality after holding the loadings of lower factors fixed for a few iterations. The orthogonalization process occurs at the beginning of the iteration, so the final returned values have not been formally rotated.

Finding the REML solutions for multifactor Factor Analytic models can be difficult. The first problem is specifying initial values. When using !CONTINUE and progressing XFAk to XFAk+1, ASReml 3 initialises the factor k+1 at $\sqrt{\Psi} \times 0.2$, changing the sign of the relatively largest loading to negative. One strategy which sometimes works
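As a small numerical illustration of the factor analytic parameterisation referred to above, the sketch below builds a variance matrix as the outer product of the loadings plus a diagonal matrix of specific variances. This form is an assumption here, since the extraction garbled the symbols. The function name `fa_covariance` and the example values are invented, and this is not how ASReml stores or fits the model.

```python
def fa_covariance(loadings, specific):
    """Return Sigma = Gamma Gamma' + Psi as a list of lists.

    loadings: w x k matrix (Gamma), one row per level, one column per factor.
    specific: length-w list holding the diagonal of Psi (entries may be zero).
    """
    w = len(loadings)
    sigma = [[sum(a * b for a, b in zip(loadings[i], loadings[j]))
              for j in range(w)] for i in range(w)]
    for i in range(w):
        sigma[i][i] += specific[i]  # add the specific variance on the diagonal
    return sigma


# Hypothetical single-factor example with three levels; the second
# specific variance is fixed at zero, as the XFA form permits.
gamma = [[1.0], [2.0], [0.5]]
psi = [0.5, 0.0, 1.0]
for row in fa_covariance(gamma, psi):
    print(row)
```

With k factors and w levels the model uses w specific variances and w x k loadings, which is why it is economical when k is much smaller than w.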
159. ENSE 76 DESIGN 69 IDEVIANCE residuals 104 IDF 76 IDTAG 150 IDISPLAY 70 IDISP dispersion 103 IDOM dominance 59 IDOPART 193 IDOPATH 193 IDO 55 IDV 55 ID 55 EMFLAG 77 ENDDO 55 IEPS 70 EXCLUDE 62 EXP 55 EXTRA 78 FACPOINTS 83 FACTOR 71 IFCON 22 67 FGEN 150 IFIELD 71 IFILTER 62 IFINAL 186 FOLDER 63 FORMAT 63 IFOR 194 IFOWN 22 78 GAMMA GLM 103 GDENSE 79 IGIV 151 IGKRIGE 70 IGLMM 79 IGOFFSET 151 GRAPHICS 186 GROUPFACTOR 70 GROUPSDF 156 GROUPS 151 IG 49 68 70 HARDCOPY 186 HOLD 79 341 HPGL 79 IDENTITY link 103 IDLIMIT 70 INBRED 151 INCLUDE 65 INIT 120 INTERACTIVE 186 IT 48 JOIN 68 70 Jddm 56 Jmmd 56 Jyyd 56 KEEP 199 IKEY 71 199 IKNOTS 84 ILAST 80 151 LOGARITHM 103 LOGFILE 186 LOGIT 102 LOGIT link 102 ILOG link 102 LONGINTEGER 151 IL 47 MAKE 151 IMATCH 64 IMAXIT 68 IMAX 56 MBF 71 IMERGE 64 IMEUWISSEN 151 IMGS 151 IMIN 56 IMM transformation 56 58 IMOD 56 IMVREMOVE 72 IM 56 INAME 122 INA 56 NEGBIN GLM 103 NOCHECK 84 NODUP 199 NOGRAPHS 186 NOKEY 71 NOREORDER 84 NORMAL 56 NORMAL GLM 102 NOSCRATCH 84 INDEX OFFSET variable 103 ONERUN 186 QUTFOLDER 186 OQUTLIER 17 OWN 80 PEARSON residuals 104 IPLOT 173 IPNG 80 POISSON GLM 103 POLPOINTS 84 PPOINTS 84 PRINTALL 173 IPRINT 80 PR
160. If ASReml does not run at all, it is a setup or licensing issue which is not discussed in this chapter. It is hoped that the new syntax for variance structure specification will reduce the incidence of coding errors.

Even when the job appears to run successfully, you should check that
• the records read/lines read/records used are correct,
• mean/min/max information is correct for each variable,
• the Loglikelihood has converged and the variance parameters are stable,
• the fixed effects have the expected degrees of freedom.

Coding errors can be classified as
• typing errors: these are difficult to resolve because we tend to read what we intended to type rather than what we actually typed. Section 14.4 demonstrates the consequences of the common typographical errors that users make.
• wrong coding: this arises often from misunderstanding the guide or making assumptions arising from past experience which are not valid for ASReml. The best strategy here is to closely follow a worked example, or to build up to the required model. Sections 14.3 and 14.2 may help, as well as reviewing all the relevant sections of this Guide. It may be as simple as adding or deleting a space, inserting a comma, changing case or adding one more qualifier.
• inappropriate model: the variance model you propose may not be suited to the data, in which case ASReml may fail to produce a solution. You can verify the model is appropriate by closer examination of the structure o
161. It will generally be preferable to prespecify the levels than to use !SORT, because most other references to particular levels of factors will refer to the unsorted levels. Therefore users should verify that ASReml has made the correct interpretation when nominating specific levels of !SORTed factors. In particular, any transformations are performed as the data is read in and before the sorting occurs. !SORTALL means that the levels of this and subsequent factors are to be sorted.

5.4.4 Skipping input fields

This is particularly useful in large files with alphabetic fields that are not needed, as it saves ASReml the time required to classify the alphabetic labels.

New R4: !CSKIP [f] can be used to skip f fields. Thus

!CSKIP 1
A
B

skips the first data field and reads the second and third fields into variables A and B, and

!CSKIP
Sire !I
!CSKIP 2
Y

will define two variables: Sire, taken from the second data field, and Y, taken from the fifth data field.

Also, !SKIP f will skip f data fields BEFORE reading this field. Thus

Sire !I !SKIP 1
Y !SKIP 2

achieves the same result but in a less obvious way. These qualifiers are ignored when reading binary data.

Important: Using the !SKIP qualifier in association with the specification of a file to be read in allows initial lines of the file to be skipped. !SKIP can also be used to skip columns when reading in a data file. Use of !CSKIP for skipping data fields is recommen
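The bookkeeping behind the CSKIP and SKIP examples above can be traced with a few lines of code. The sketch below is purely illustrative and does not parse real ASReml syntax: it walks through a list of field definitions and reports which data field each variable would be read from. The tuple encoding and the name `assign_fields` are assumptions made for this example.

```python
def assign_fields(definitions):
    """Map variable names to 1-based data field positions.

    Each definition is either ("cskip", n), which skips n fields outright,
    or ("var", name, skip_before), which skips skip_before fields and then
    reads the next field into the named variable.
    """
    position = 0  # number of data fields consumed so far
    mapping = {}
    for item in definitions:
        if item[0] == "cskip":
            position += item[1]
        else:
            _, name, skip_before = item
            position += skip_before + 1
            mapping[name] = position
    return mapping


# CSKIP (one field), Sire, CSKIP 2, Y
print(assign_fields([("cskip", 1), ("var", "Sire", 0),
                     ("cskip", 2), ("var", "Y", 0)]))
# Sire with SKIP 1, Y with SKIP 2
print(assign_fields([("var", "Sire", 1), ("var", "Y", 2)]))
```

Both calls place Sire in field 2 and Y in field 5, which is the equivalence the text describes.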
162. LogL 2799 30 S2 8568 1 6390 df 3 LogL 2759 03 52 8131 3 6390 df 4 LogL 2741 99 52 7766 2 6390 df 5 LogL 2741 40 S2 7702 9 6390 df 6 LogL 2741 40 Sa 7700 1 6390 df Results from analysis of HIG Akaike Information Criterion 65490 79 assuming 4 parameters Bayesian Information Criterion 65517 84 Model_Term Gamma Sigma Sigma SE C rep iblk IDV_V 640 0 307856 2370 52 13 00 0 P grm1 Clone GRM_V 923 0 275656 2122 58 5 82 OP Clone IDV_V 926 0 152554 1174 68 6 08 OP Residual SCA_V 6399 1 000000 7700 10 49 64 OP Wald F statistics Source of Variation NumDF F ine 20 mu 1 0 11E 06 12 culture 1 2615 96 21 culture rep 6 30 44 23 rep iblk 640 effects fitted 22 grm1 Clone 923 effects fitted 4 Clone 926 effects fitted 66 are zero 78 possible outliers see res file Notes e of 926 clones identified 860 have data and 923 have genomic data e The res file contains additional details about the analysis including a listing of the larger marker effects All marker effects are reported in the mef file e Particular columns of the grr data can be included in the model using the grr Factor i model term where and i specifies which number regressor variable to include Listing of the larger marker effects 36 12 61 01 121 1 40736 0 00000 617 0 14383 01 111 1 26081 0 00000 777 0 15417 01 138 1 25597 0 00000 1246 0 18644 02 210 122522 0 00000 1903 0 6863 01 202 1 24800 0 00000 2102 0 8683 02 432 1 15496 0 0000
163. Notice: The parameter estimates are followed by their approximate standard errors.

The first 8 lines are based on the .asr file.

12.3 VPREDICT: PIN file processing

There are four forms of the !VPREDICT directive:
• If the .pin file exists and has the same name as the jobname (including any suffix appended by using !RENAME), just specify the !VPREDICT directive.
• If the .pin file exists but has a different name to the jobname, specify the !VPREDICT directive with the .pin file name as its argument.
• If the .pin file does not exist or must be reformed, a name argument for the file is optional but the !DEFINE qualifier should be set. Then the lines of the .pin file should follow on the next lines, terminated by a blank line.

An alternative to using !VPREDICT is to process the contents of the .pin file by running ASReml with the -P command line option, specifying the .pin file as the input file. Note that in this case the code must be self-contained, and any substitution variable used needs defining in the .pin file. For example, if we wish to use sub to indicate fullname, then the assignment of fullname to sub using !ASSIGN sub fullname needs to be in the .pin file.

13 Description of output files

13.1 Introduction

With each ASReml run a number of output files are produced. ASReml generates the output files by appending various filename extensions to basename. A brief description of the filename extensions is presented in Table 13
164. O 21 00 510 5 840 0 149 0 5 repl 4 O 0 1 2 4132 4 6 nloc 0 O 4 000 4 000 4 000 0 000 7 yield Variate 18 0 1 050 25 53 42 00 7 450 8 lat 0 O 4 300 25 80 47 30 13 63 9 long 0 O 1 200 13 89 26 40 7 629 10 row 22 0 0 1 11 5000 22 220 13 3 Key output files 11 column 11 0 0 i 6 0000 11 12 mu 1 13 mv_estimates 18 ariv row in ari row ari column has size 22 parameters 5 5 ari column in ari row ari column has size 11 parameters 6 6 ari row ar1 column 4 6 initialized Sorting Section 1 22 rows by 11 columns Forming 75 equations 57 dense Initial updates will be shrunk by factor 0 316 Notice Specify SIGMAP to allow the Sigma parameterisation Notice 1 singularities detected in design matrix iterations 1 LogL 449 818 s2 49 775 168 df 1 0000 0 1000E00 0 1000E 00 2 LogL 424 315 S2 40 233 168 df 1 0000 0 2937 0 2323 3 LogL 405 419 52 38 922 168 df 1 0000 0 4813 0 3587 4 LogL 399 552 S2 45 601 168 df 1 0000 0 6156 0 4398 5 LogL 399 336 S2 47 986 168 df 1 0000 0 6456 0 4417 6 LogL 399 325 S2 48 546 168 df 1 0000 0 6530 0 4391 7 LogL 399 324 S2 48 672 168 df 1 0000 0 6549 0 4380 8 LogL 399 324 S2 48 703 168 df 1 0000 0 6554 0 4376 Final parameter values 1 0000 0 6555 0 4375 Results from analysis of yield Akaike Information Criterion 804 65 assuming 3 parameters Bayesian Information Criterion 814 02 Model_Term Gamma Sigma Sigma SE C ari row ar1 column 242 effects Residual S
165. OBIT 102 PROBIT 102 IPS 80 IPVAL 72 IPVR GLM fitted values 104 PVSFORM 80 IPVW GLM fitted values 104 IP 48 QUASS 151 QUIET 186 READ 64 RECODE 64 IRENAME 71 186 IREPEAT 151 REPLACE 56 IREPORT 84 RESCALE 56 RESIDUALS 80 81 RESPONSE residuals 104 IRFIELD 71 ROWFACTOR 64 RREC 64 IRSKIP 65 ISAMEDATA 193 SARGOLZAEI 152 ISAVEGIV 155 ISAVE 81 ISCALE 84 ISCORE 84 ISCREEN 81 ISECTION 73 ISED 173 ISEED 56 ISELECT 62 342 ISELF 152 ISEQ 57 ISETN 57 SETU 57 ISET 57 ISIGMAP 111 ISIN 55 ISKIP 62 71 152 199 SLNFORM 81 SLOW 85 ISMX 81 ISORT 152 199 ISPARSE 71 ISPATIAL 81 SPLINE 73 SQRT link 103 ISTEP 73 SUBGROUP 73 I SUBSET 74 15UB 57 ISUM 68 TABFORM 82 ITARGET 51 57 THRESHOLD GLM 102 TOLERANCE 85 ITOTAL 102 104 TWOSTAGEWEIGHTS 174 TWOWAY 82 TXTFORM 82 UNIFORM 57 IUSE 122 UpArrow 54 IVCC 82 VGSECTORS 82 IVPV 174 IVRB 85 IV 58 IWMF 74 WORKSPACE 186 WORK residuals 104 IXLINK 152 IX 68 YHTFORM 82 YSS 76 82 YVAR 186 IY 68 ICENTRE 160 INDEX COORD 116 IGSCALE 161 IND 153 NOID 160 INSD 153 ONLYUSE 170 PEV 160 161 PRECISION 153 IPSD 153 RANGE 161 SPECIALCHAR 41 50 SUBSECTION 116 ITDIFF 173 USE 116 qualifiers datafile line 62 genetic 147 job control 65 RAM 157 random
166. ORE model terms. The qualifier !ONLYUSE explicitly specifies the model terms to use, ignoring all others. The qualifier !EXCEPT explicitly specifies the model terms not to use, including all others. These qualifiers will not override the definition of the averaging set.

The fourth step is to choose the weights to use when averaging over dimensions in the hyper-table. The default is to simply average over the specified levels, but the qualifier !AVERAGE factor weights allows other weights to be specified. !PRESENT and !ASSOCIATE !ASAVERAGE generate more complicated averaging processes.

The basic prediction process is described in the following example.

yield ~ site variety !r idv(site).id(variety) at(site).idv(block)
predict variety

puts variety in the classify set, site in the averaging set and block in the ignore set. Consequently, ASReml implicitly forms the site x variety hyper-table from model terms site, variety and site.variety, but ignoring all terms in at(site).block, and then averages across the sites to produce variety predictions. This prediction will work even if some varieties were not grown at some sites, because the site.variety term was fitted as random. If site.variety was fitted as fixed, variety predictions would be non-estimable for those varieties that were not grown at every site.

9.3.3 Predict failure

It is not uncommon for users to get the message Warning non e
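The averaging step described above can be pictured as collapsing a site-by-variety table over its site dimension. The following sketch uses invented predicted values and a hypothetical helper `average_over`. It shows the default equal-weight average and a user-weighted average of the kind the AVERAGE qualifier is described as allowing; it is not ASReml's algorithm and ignores standard errors entirely.

```python
def average_over(table, weights=None):
    """Average each variety's predictions over sites.

    table: {variety: {site: predicted value}}
    weights: {site: weight}; None means equal weights (the default average).
    """
    out = {}
    for variety, by_site in table.items():
        sites = list(by_site)
        w = {s: 1.0 for s in sites} if weights is None else weights
        total_w = sum(w[s] for s in sites)
        out[variety] = sum(w[s] * by_site[s] for s in sites) / total_w
    return out


# Hypothetical site x variety hyper-table of predicted yields
hyper = {"A": {"s1": 10.0, "s2": 14.0},
         "B": {"s1": 8.0, "s2": 12.0}}
print(average_over(hyper))                         # equal weights
print(average_over(hyper, {"s1": 3.0, "s2": 1.0})) # weights favouring site s1
```

Every cell of the table must be available for this to work, which mirrors the point that variety predictions become non-estimable when a fixed site.variety cell is empty.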
167. Reml to discard records which have missing values in the design matrix (see Section 6.9).

suppresses the graphic display of the variogram and residuals which is otherwise produced for spatial analyses in the PC version. This option is usually set on the command line using the option letter N (see Section 10.3 on graphics). The text version of the graphics is still written to the .res file.

is a mechanism for specifying the particular points to be predicted for covariates modelled using fac(v), leg(v,k), spl(v,k) and pol(v,k). The points are specified here so that they can be included in the appropriate design matrices. v is the name of a data field; p is the list of values at which prediction is required. See !GKRIGE for special conditions pertaining to fac(x,y) prediction.

is used to read predict_points for several variables from a file f. vlist is the names of the variables having values defined. If the file contains unwanted fields, put the pseudo-variate label skip in the appropriate position in vlist to ignore them. The file should only have numeric values. predict_points cannot be specified for design factors.

5.8 Job control qualifiers

Table 5.4: List of occasionally used job control qualifiers
qualifier: action
!SECTION v, !SPLINE spl(v,n) p, !STEP r, !SUBGROUP t v p
specifies the variable in the data that defines the data sections. This qualifier enables ASReml to check that sections have been correctl
168. SCA_V 256 1.000000 511400E 01 9.12 0 P

Warning: Code
B - fixed at a boundary (!GP)      F - fixed by user
? - liable to change from P to B   P - positive definite
C - Constrained by user (!VCC)     U - unbounded
S - Singular Information matrix

The convergence criterion has been satisfied after six iterations. A warning message is printed below the summary of the variance components because the variance component for the setstat.teststat term has been fixed near the boundary. The default constraint for variance components (!GP) is to ensure that the REML estimate remains positive. Under this constraint, if an update for any variance component results in a negative value, then ASReml sets that variance component to a small positive value. If this occurs in subsequent iterations, the parameter is fixed to a small positive value and the code B replaces P in the C column of the summary table. The default constraint can be overridden using the !GU qualifier, but it is not generally recommended for standard analyses.

Figure 15.2 presents the residual plot, which indicates two unusual data values. These values are successive observations, namely observations 210 and 211, being testing stations 2 and 3 for setting station 9 (J), regulator 2. These observations will not be dropped from the following analyses, for consistency with other analyses conducted by Cox and Snell (1981) and in the GENSTAT manual.

The REML log likelihood from the model without the setstat test
169. V< v in the field but keeps records with a missing value in the field. If !DV is used after !A or !I, v should refer to the encoded factor level rather than the value in the data file (see also Section 4.2). Use !DV * to discard just those records with a missing value in the field. !D v is equivalent to !DV * !DV v. Examples: !DV > 100; InitialWt !DV *

!DO [n [i_t [i_v]]] causes ASReml to perform the following transformations n times (default is variables in current term), incrementing the target by i_t (default 1) and the argument (if present) by i_v (default 0). Loops may not be nested. A loop is terminated by !ENDDO, another !DO or a new field definition. Example: see below.

!DOM f copies and converts additive marker covariables (-1, 0, 1) to dominance marker covariables (see below). Example: ChrAadd !G 10 !MM; ChrAdom !DOM ChrAadd

!ENDDO terminates a !DO transformation block. Example: see below.

!EXP takes antilog base e; no argument required. Example: Rate !EXP

Table 5.1: List of transformation qualifiers and their actions with examples (columns: qualifier, argument, action, examples)

qualifiers: !Jddm, !Jmmd, !Jyyd, !M, !M<>, !M<, !M<=, !M>, !M>=, !MAX, !MIN, !MOD, !MM, !NA, !NORMAL, !REPLACE, !RESCALE, !SEED

!Jddm converts a number representing a date in the form ddmmccyy, ddmmyy or ddmm into days. !Jmmd converts a date in the form ccyymmdd yymm
170. a direct sum structure with common parameters. Note that !SUBSECTION is only available when the residual variance function is expressed in terms of one variance function.

!SUBSECTION f performs two tasks similar to those described in Section 7.3.2: defining a direct sum structure for the residual vector in a section (with the number of subsections in a section given by the number of levels of the factor f), and pruning the levels of the factor defining the variance structure within each section, but allowing common variance parameters across sections. The data needs to be sorted in order of the variable f. The following code would specify a common AR1 structure across sections (assumed sorted into the appropriate order within the section variable), with an initial spatial autocorrelation parameter of 0.5:

    residual ar1(units !INIT 0.5 !SUBSECTION section)

If there was data sorted on date within plot, then we might use

    residual exp(date !INIT 0.2 !SUBSECTION plot)

to specify a common EXP structure across plots.

7.7.7 Parameter types !Ts

Each variance parameter also has a type, which may be set explicitly with the qualifier !Ts where s is the type code. The following is a list of the possible parameter types and their code. They are usually set internally, are reported in the .tsv file, and are used to define the parameter space.

type              code   action if !GP is set
variance          V      forced positive
variance ratio    G      forced positive
correlation       R
171. a ro kerar der aara
14.3 Things to check in the .asr file
14.4 An example
14.5 Information, Warning and Error messages
15 Examples
15.1 Introduction
15.2 Split plot design - Oats
15.3 Unbalanced nested design - Rats
15.4 Source of variability in unbalanced data - Volts
15.5 Balanced repeated measures - Height
15.6 Spatial analysis of a field experiment - Barley
15.7 Unreplicated early generation variety trial - Wheat
15.8 Paired Case-Control study - Rice
15.8.1 Standard analysis
15.8.2 A multivariate approach
15.8.3 Interpretation of results
15.9 Balanced longitudinal data - Random coefficients and cubic smoothing splines
15.10 Multivariate animal genetics data - Sheep
15.10.1 Half-sib analysis
15.10.2 Animal model
Bibliography
Index
List of Tables
Trial layout and allocati
172. a treated tray (bloodworms added) were grown in a controlled environment room for the duration of the experiment. At the end of this time rice plants were carefully extracted, the root system washed and root area determined for the tray using an image analysis system described by Stevens et al. (1999). Two pairs of trays, each pair corresponding to a different variety, were included in each run. A new batch of bloodworm larvae was used for each run. A total of 44 varieties was investigated, with three replicates of each. Unfortunately the variety concurrence within runs was less than optimal. Eight varieties occurred with only one other variety, 22 with two other varieties and the remaining 14 with three different varieties.

In the next three sections we present an exhaustive analysis of these data using equivalent univariate and multivariate techniques. It is convenient to use two data files, one for each approach. The univariate data file consists of factors pair, run, variety, tmt, unit and variate rootwt. The factor unit labels the individual trays, pair labels pairs of trays (to which varieties are allocated) and tmt is the two-level bloodworm treatment factor (control/treated). The multivariate data file consists of factors variety and run, and variates for root weight of both the control and exposed treatments (labelled yc and ye respectively).

Preliminary analyses indicated variance heterogeneity, so that subsequent analyses were conducted on the sq
173. able of predicted values, where n is 0...9. The default is 4. G15.9 format is used if n exceeds 9. When !VVP or !SED are used, the values are displayed with 6 significant digits unless n is specified and even then the values are displayed with 9 significant digits.

instructs ASReml to attempt a plot of the predicted values. This qualifier is only applicable in versions of ASReml linked with the Winteracter Graphics library. If there is no argument, ASReml produces a figure of the predicted values as best it can. The user can modify the appearance by typing <Esc> to expose a menu, or with the plot arguments listed in Table 9.2.

instructs ASReml to print the predicted value even if it is not of an estimable function. By default, ASReml only prints predictions that are of estimable functions.

requests all standard errors of difference be printed. Normally only an average value is printed. Note that the default average SED is actually an SED calculated from the average variance of the predicted values and the average covariance among the predicted values, rather than being the average of the individual SED values. However, when !SED is specified, the average of the individual SED values is reported.

requests t-statistics be printed for all combinations of predicted values.

requests ASReml to scan the predicted values from a fitted line for possible turning points and, if found, report them and save them internally in a vector which can be a
174. ables it to analyse large and complex data sets quite efficiently.

One of the strengths of ASReml is the wide range of variance models for the random effects in the linear mixed model that are available. There is a potential cost for this wide choice. Users should be aware of the dangers of either overfitting or attempting to fit inappropriate variance models to small or highly unbalanced data sets. We stress the importance of using data-driven diagnostics and encourage the user to read the examples chapter, in which we have attempted not only to present the syntax of ASReml in the context of real analyses but also to indicate some of the modelling approaches we have found useful.

There are several interfaces to the core functionality of ASReml. The program name ASReml relates to the primary program. ASReml-W refers to the user interface program developed by VSN and distributed with ASReml. ASReml-R refers to the S language interface to a DLL of the core ASReml routines. GenStat uses the same core routines for its REML directive. Both of these have good data manipulation and graphical facilities.

The focus in developing ASReml has been on the core engine, and it is freely acknowledged that its user interface is not to the level of these other packages. Nevertheless, as the developers' interface, it is functional, it gives access to everything that the core can do and is especially suited to batch processing and running of large models without the over
175. age of the 8 location means in Table 9.5.

Further discussion of associated factors

The user may specify their own weights, using file input if necessary. Thus

    predict region !ASAVERAGE location {1 2 3}/6 {1 1 1 2 1}/6

would give region predictions of 11.67 and 10.84 respectively, derived from the location predictions in Table 9.5. Note that because location is nested in region, the location weights should sum to 1.0 within levels of region when forming region means.

The !AVERAGE/!ASAVERAGE qualifier allows the weights to be read from a file which the user can create elsewhere. Thus, the code !ASAVERAGE trial Tweight.csv 2 will read the weights from the second field of file Tweight.csv. The user must ensure the weights are in the coding order ASReml uses (trial order in this instance), given in the .sln file or by using the TABULATE command.

It was noted that it is the base !ASSOCIATE factor that is formally included in the hyper-table. If the lowest stratum is random, it may be appropriate to ignore it. Omitting it from the !ASSOCIATE list will allow it to re-enter the Ignore set. Specifying it with the !IGNORE qualifier will exclude its effects from the prediction but not ignore the structural information implied by the association.

Normally it is not necessary for any model term to involve more than 1 of the associated factors. One exception is if an interaction is required so that the variance can differ betw
176. ail in Chapter 7 with examples.

2.1.9 Variance models for terms with several factors

A random model term may comprise either a single factor or several component factors to give a compound model term. Consider a compound model term represented by A.B, where the component factors A and B have m and n levels respectively and the operator "." forms a term with levels corresponding to the combinations of all levels of A with all levels of B. The effects ab for A.B are generated with the levels of B nested in the levels of A, i.e. the levels of B cycling fastest:

$ab = (ab_{11}, ab_{12}, \ldots, ab_{1n}, ab_{21}, \ldots, ab_{2n}, \ldots, ab_{m1}, \ldots, ab_{mn})'$

Now consider the variance model for the term A.B. If we specify our variance model generically as vmodel1(A).vmodel2(B), where vmodel1 is a variance model function with variance matrix $A$ and vmodel2 is a variance model function with variance matrix $B$, then the G structure for this term is defined by

$\mathrm{cov}(ab_{ik}, ab_{jl}) = A_{ij} \times B_{kl}$    (2.9)

This means that the covariance between two effects $ab_{ik}$ and $ab_{jl}$ in ab is constructed as the product of the covariance between $a_i$ and $a_j$ in model A (i.e. its $(i,j)$ element $A_{ij}$) and the covariance between $b_k$ and $b_l$ in model B (i.e. its $(k,l)$ element $B_{kl}$).

Example 2.3: A simple direct product structure

If A has 3 levels and B has 2 levels, then the term A.B would have the 6 levels $ab = (ab_{11}, ab_{12}, ab_{21}, ab_{22}, ab_{31}, ab_{32})'$. Using magenta and b
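As a rough illustration of the direct product rule described above, the following sketch builds the G structure for a 3-level factor A and a 2-level factor B in plain Python. The function name `direct_product` and the numeric entries of `A` and `B` are hypothetical, chosen only for illustration; this is not ASReml code, and ASReml forms these matrices internally.

```python
def direct_product(A, B):
    """Kronecker (direct) product of two square matrices given as lists of lists.

    Rows and columns are ordered with the levels of B cycling fastest,
    i.e. ab_11, ab_12, ..., ab_1n, ab_21, ..., ab_mn.
    """
    m, n = len(A), len(B)
    G = [[0.0] * (m * n) for _ in range(m * n)]
    for i in range(m):
        for k in range(n):
            for j in range(m):
                for l in range(n):
                    # cov(ab_ik, ab_jl) = A_ij * B_kl
                    G[i * n + k][j * n + l] = A[i][j] * B[k][l]
    return G


# Hypothetical variance matrices (illustrative values only).
A = [[2.0, 0.5, 0.1],
     [0.5, 3.0, 0.2],
     [0.1, 0.2, 4.0]]
B = [[1.0, 0.6],
     [0.6, 1.0]]

G = direct_product(A, B)
for row in G:
    print(row)
```

The resulting matrix is 6 by 6; its entry for the pair (ab_12, ab_21), for instance, is the product of A's (1,2) element and B's (2,1) element.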
177. al order but generational order is still required.

List of pedigree file qualifiers

!SARGOLZAEI
an alternative procedure for computing $A^{-1}$ was developed by Sargolzaei et al. (2005).

!SELF s
allows partial selfing when the second parent is unknown. It indicates that progeny from a cross where the second parent (male_parent) is unknown is assumed to be from selfing with probability s and from outcrossing with probability (1 - s). This is appropriate in some forestry tree breeding studies, where seed collected from a tree may have been pollinated by the mother tree or pollinated by some other tree (Dutkowski and Gilmour, 2001). Do not use the !SELF qualifier with the !INBRED or !MGS qualifiers.

!SKIP n
allows you to skip n header lines at the top of the file.

!SORT
causes ASReml to sort the pedigree into an acceptable order, that is parents before offspring, before forming the A-inverse. The sorted pedigree is written to a file whose name has .srt appended to its name.

!XLINK
requests the formation of the inverse relationship matrix for the X chromosome as described by Fernando and Grossman (1990), where the first parent is XY and the second is XX. This NRM inverse matrix is formed in addition to the usual $A^{-1}$ and can be accessed as GRM1 or as specified in the output. The pedigree must include a fourth field which codes the SEX of the individual. The actual code used
178. alid characters in the variable names; variable names must not include any of these symbols and

the data file name is misspelt, there are too many variables declared, or there is no valid value supplied with an arithmetic transformation option.

there is a problem reading G structure header line. An earlier error (for example, insufficient initial values) may mean the actual line read is not actually a G header line at all. A G header line must contain the name of a term in the linear model, spelt exactly as it appears in the model.

a G structure line cannot be interpreted.

The size of the structure defined does not agree with the model term that it is associated with.

an error occurred processing the pedigree. The pedigree file must be ASCII, free format, with ANIMAL, SIRE and DAM as the first three fields.

ASReml failed to calculate the GLM working variables or weights. Check the data.

Either the field has alphanumeric values but has not been declared using the !A qualifier, or there is not enough space to hold the levels of the factor. To increase the levels, insert the expected number of levels after the !A or !I qualifier in the field definition.

Use !WORKSPACE s to increase the workspace available to ASReml. If the data set is not extremely big, check the data summary.

Maybe the response variable is all missing.

there must be at least 3 distinct data values for a spline term.

If ASReml has not
179. ametric constraints and relationships (equality and scale) between parameters. A file .msv is produced, similar to .tsv but containing final values, that can be edited and used with !MSV. If !TSV or !MSV is specified, ASReml will read the current (created with the same PART number) .tsv or .msv file. If there is no current .tsv or .msv file, a non-current (produced from a different PART of the same job) .tsv or .msv file will be read. Alternative ways of specifying !TSV and !MSV are !CONTINUE 2 and !CONTINUE 3, and these qualifiers can be used as options on the command line as C2 and C3. Note that the constraints in the .tsv/.msv files take precedence over those in the .as file.

7.9.2 Using estimates from simpler models

Sometimes we have estimates from simpler models and we wish to reduce the need for the user to type in updated starting values. The !CONTINUE command line qualifier instructs ASReml to update initial parameter values from a .rsv file. When it is specified, ASReml first looks for a current .rsv file and, if found, will read it and report the constructed initial values in the .tsv file. If there is no current .rsv file, it looks for the most recent non-current .rsv file and uses that to construct initial values. As discussed below, current means having the same basename and run number. A non-current file will have the same basename but a different run number. When reading the .rsv file, if the variance st
180. ance covariables from a set of additive marker covariables previously declared with the !MM marker map qualifier. It assumes the argument A is an existing group of marker variables relating to a linkage group defined using !MM, which represents additive marker variation coded [-1, 0, 1], representing marker states aa, aA and AA respectively. It is a group transformation which takes the [-1, 1] interval values and calculates $(|X| - 0.5) \times 2$, i.e. -1 and 1 become 1; 0 becomes -1. The marker map is also copied and applied to this model term so it can be the argument in a qtl() term (page 100).

!DO ... !ENDDO provides a mechanism to repeat transformations on a set of variables. All transformations except !DOM and !RESCALE operate once on a single field unless preceded by a !DO qualifier. The !DO qualifier has three arguments:
n is the number of times the following transformations are to be performed;
i_t (default 1) is the increment applied to the target field;
i_v (default 0) is the increment applied to the transformation argument.
The default for n is the number of variables in the current field definition. !ENDDO is formally equivalent to !DO 1 and is implicit when another !DO appears or the next field definition begins. Note that when several transformations are repeated, the processing order is that each is performed n times before the next is processed (contrary to the implication of the syntax). However, the target is reset for each transfo
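A minimal sketch of the dominance recoding described above, assuming the transformation is (|X| - 0.5) x 2 applied elementwise to additive marker scores in the interval [-1, 1]. The function name `dominance_from_additive` and the sample scores are hypothetical and only illustrate the arithmetic; ASReml performs this internally through the !DOM qualifier.

```python
def dominance_from_additive(x):
    """Map an additive marker score in [-1, 1] to a dominance score.

    Homozygotes (-1 or 1) map to 1 and the heterozygote (0) maps to -1.
    """
    return (abs(x) - 0.5) * 2


# Hypothetical additive scores: aa, aA, AA, and one interval (imputed) value.
additive = [-1, 0, 1, 0.5]
dominance = [dominance_from_additive(x) for x in additive]
print(dominance)
```

Intermediate values, such as those arising between markers on a map, are recoded by the same rule, so a score halfway between heterozygote and homozygote lands at zero.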
181. ance when a correlation structure is applied to the residual.

uni(f)
creates a factor with a new level whenever there is a level present for the factor f. Levels (effects) are not created if the level of factor f is 0, missing or negative. The size may be set in the third argument by setting the second argument to zero.

uni(f,k)
creates a factor with a level for every record subject to the factor level of f equalling k, i.e. a new level is created for the factor whenever a new record is encountered whose integer truncated data value from data field f is k. Thus uni(site,2) would be used to create an independent error term for site 2 in a multi-environment trial and is equivalent to at(site,2).units. The default size of this model term is the number of data records. The user may specify a lower number as the third argument. There is little computational penalty from the default but the .sln file may be substantially larger than needed. However, if the units vector is full size, the effects are mapped by record number and added back to the fitted residual for creating residual plots.

Table 6.2: Alphabetic list of model functions and descriptions

vect(v)
is used in a multivariate analysis on a multivariate set of covariates (v) to pair them with the variates. The test example included

    signal !G 93 # 93 slides
    background !G 93
    dart.asd !ASUV
    signal ~ Trait Trait.vect(background)
182. and c. Note that $\hat{\tau}$ is the best linear unbiased estimator (BLUE) of $\tau$ while $\tilde{u}$ is the best linear unbiased predictor (BLUP) of $u$ for known $\sigma_g$ and $\sigma_r$. We also note that s o Eog 3he

2.2.3 Use of the gamma parameterization

ASReml uses either the gamma or sigma parameterization for estimation, depending on the residual specification. The current default for univariate, single section data sets is the gamma parameterization. In this case all scale parameters are estimated as a ratio with respect to the residual variance, $\sigma^2$, and any parameters that measure only correlation are unchanged. See Chapter 7 for more detail.

2.3 What are BLUPs?

Consider a balanced one-way classification. For data records ordered r repeats within b treatments regarded as random effects, the linear mixed model is $y = X\tau + Zu + e$, where $X = 1_b \otimes 1_r$ is the design matrix for $\tau$ (the overall mean), $Z = I_b \otimes 1_r$ is the design matrix for the b (random) treatment effects $u_i$ and $e$ is the error vector. Assuming that the treatment effects are random implies that $u \sim N(A\psi, \sigma_b^2 I_b)$ for some design matrix $A$ and parameter vector $\psi$. It can be shown that

$\tilde{u} = \dfrac{r\sigma_b^2}{r\sigma_b^2 + \sigma^2}(\bar{y} - 1\bar{y}_{..}) + \dfrac{\sigma^2}{r\sigma_b^2 + \sigma^2} A\psi$    (2.19)

where $\bar{y}$ is the vector of treatment means and $\bar{y}_{..}$ is the grand mean. The differences of the treatment means and the grand mean are the estimates of treatment effects if treatment effects are fixed. The BLUP is therefore a weighted mean of the data based estimate and the prior
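The weighting in the BLUP formula for the balanced one-way classification can be sketched numerically as follows. This is an illustrative calculation only, assuming the variance components are known; the function name `blup_one_way`, the symbol names and the example numbers are hypothetical and are not taken from the guide.

```python
def blup_one_way(treatment_means, r, sigma2_b, sigma2, prior_means=None):
    """BLUPs of treatment effects in a balanced one-way random model.

    treatment_means : observed mean for each of the b treatments
    r               : number of repeats per treatment
    sigma2_b        : treatment variance component (assumed known)
    sigma2          : residual variance (assumed known)
    prior_means     : prior mean of each effect (A psi); zero if omitted
    """
    b = len(treatment_means)
    grand_mean = sum(treatment_means) / b
    if prior_means is None:
        prior_means = [0.0] * b
    # Weight given to the data-based estimate; the rest goes to the prior mean.
    w = r * sigma2_b / (r * sigma2_b + sigma2)
    return [w * (m - grand_mean) + (1.0 - w) * p
            for m, p in zip(treatment_means, prior_means)]


# Hypothetical example: 3 treatments, 4 repeats each.
effects = blup_one_way([10.0, 12.0, 14.0], r=4, sigma2_b=1.0, sigma2=4.0)
print(effects)
```

With these made-up values the weight on the data is one half, so each fixed-effect estimate (the deviation of a treatment mean from the grand mean) is shrunk halfway towards the zero prior mean.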
184. are erroneous

Warning: This US structure is not positive definite
Warning: Unrecognised qualifier at character
Warning: US matrix was not positive definite: MODIFIED
Warning: User specified spline points
Warning: Variance parameters were modified by BENDing
Warning: Likelihood decreased. Check gammas and singularities.

revise the qualifier arguments.

The issue is to match the declared R structure to the physical data. Dropping observations which are missing will often destroy the pattern. Estimating missing values allows the pattern to be retained.

Do not accept the estimates printed.

The FOWN test requested is not calculated because it results in different numbers of degrees of freedom to that obtained for the incremental tests for the terms in the model as fitted; the FOWN calculations are based on the reduced design matrix formed for the incremental model. ASReml performs the standard conditional test instead. The user must reorder (swap) the terms in the model specification and rerun the job to perform the requested FOWN test.

the labels for predicted terms are probably out of kilter. Try a simpler predict statement. If the problem persists, send for help.

check the initial values.

the qualifier either is misspelt or is in the wrong place.

the initial values were modified by a bending process.

the points have been rescaled to suit the data values.

ASReml may not have converged to th
185. are specified. ASReml offers a wide range of variance models to choose from. A full listing is in Table 7.6 and details are provided in Chapter 7.

2.1.6 Gamma parameterization for the linear mixed model

The sigma parameterization of model (2.3) is one possible parameterization of var(y). In this parameterization, both $G(\sigma_g)$ and $R(\sigma_r)$ are variance matrices and the variance structure parameters in $\sigma_g$ and $\sigma_r$ are referred to as sigmas (see above). Other parameterizations are possible and are sometimes useful. For example, in some of the early development of REML for the traditional mixed model of (2.5), the variance matrix was parameterized as the equivalent model

$\mathrm{var}(y) = \sigma^2 \left( \sum_{i=1}^{b} \gamma_{g_i} Z_i Z_i' + I_n \right)$    (2.6)

for $\gamma_{g_i}$ being the ratio of the variance component for the random term $u_i$ relative to error variance, that is $\gamma_{g_i} = \sigma_{g_i}^2 / \sigma^2$. In this case ASReml calculated a simple estimate of $\sigma^2$, and initial values for the iterative process were specified in terms of the ratios $\gamma_{g_i}$ rather than in terms of the variance components $\sigma_{g_i}^2$. It was often easier to specify initial values in terms of these ratios rather than the variance components, which is why this approach was adopted.

Where $R(\sigma_r)$ can be written as a scaled correlation matrix, that is $R(\sigma_r) = \sigma^2 R_c(\gamma_r)$, this suggests the alternative specification of (2.2)

$\begin{bmatrix} u \\ e \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \sigma^2 \begin{bmatrix} G(\gamma_g) & 0 \\ 0 & R_c(\gamma_r) \end{bmatrix} \right)$

where $\gamma_g$ and $\gamma_r$ represent the variance structure parameters associated with scaled (by $\sigma^2$)
186. ared in AI matrix
Singularity in Average Information Matrix
SINGULARITY IN
Sorting data by Section Row
Sorting the data into field order
STOP SCRATCH FILE DATA STORAGE ERROR
Structure / Factor mismatch
Too many alphanumeric factor level labels
Too many factors with !A or !I, max 100
Too many [max 20] dependent variables
Unable to invert R or G [US] matrix
Unable to invert R or G [CORR] matrix

this is a Unix memory error. It typically occurs when a memory address is outside the job memory. The first thing to try is to increase the memory workspace using the !WORKSPACE (see Section 10.3 on memory) command line option. Otherwise you may need to send your data and the .as files to Customer Support for debugging.

See the discussion on !AISINGULARITIES.

Problem performing the Regression Screen.

the field order coding in the spatial error model does not generate a complete grid with one observation in each cell; missing values may be deleted: they should be fitted. Also may be due to incorrect specification of number of rows or columns.

ASReml attempts to hold the data on a scratch file. Check that the disk partition where the scratch files might be written is not too full; use the !NOSCRATCH qualifier to avoid these scratch files.

the declared size of a variance structure does not match the size of the model term that it is associated with.

if the factor level labels are actually all integer
187. arent with no progeny in the pedigree to be written to basename.aif.

!FGEN [f]
indicates the pedigree file has a fourth field containing the level of selfing or the level of inbreeding in a base individual. In the fourth field, 0 indicates a simple cross, 1 indicates selfed once, 2 indicates selfed twice, etc. A value between 0 and 1 for a base individual is taken as its inbreeding value. If the pedigree has implicit individuals (they appear as parents but not in the first field of the pedigree file), they will be assumed base non-inbred individuals unless their inbreeding level is set with !FGEN f, where 0 < f < 1 is the inbreeding level of such individuals. Individuals with one or both parents unknown, and without a specific non-zero inbreeding coefficient provided in the fourth field of the pedigree, are assigned an inbreeding coefficient f.

List of pedigree file qualifiers

qualifiers: !GIV, !GIV 2, !GOFFSET o, !GROUPS g, !INBRED, !LONGINTEGER, !MAKE, !MEUWISSEN, !MGS, !QUAAS, !REPEAT

!GIV
instructs ASReml to write out the A-inverse in the format of .giv files. !GIV 2 writes the pedigree of the parents to basename_Parent.ped and the diagonal elements of the A-inverse to basename_Q.giv with offspring identifiers (see Section 8.10). If !GROUPS is also specified, this .giv file will include the !GROUPSDF qualifier on its first line.

An alternative to gr
188. ariables explain it. Again, the !BLUP 1 qualifier might help.

A program limit has been breached. Try simplifying the model.

use !WORKSPACE qualifier to increase the workspace allocation. It may be possible to revise the models to increase sparsity.

factors are probably not declared properly. Check the number of levels. Possibly use the !WORKSPACE qualifier.

The predict table appears to be too big. Try increasing !WORKSPACE or predicting in parts.

occurs when space allocated for the structure table is exceeded. There is room for three structures for each model term for which G structures are explicitly declared. The error might occur when ASReml needs to construct rows of the table for structured terms when the user has not formally declared the structures. Increasing g on the variance header line for the number of G structures (see ASReml User Guide Structural Specification) will increase the space allocated for the table. You will need to add extra explicit declarations also.

check the pedigree file and see any messages in the output. Check that identifiers and pedigrees are in chronological order.

the A-inverse factors are not the same size as the A-inverse. Delete the ainverse.bin file and rerun the job.

Typically this arises when there is a problem processing the pedigree file.

Check the details for the distance based variance structure.

Check the distances specified for the distance based variance structure.

Try increas
189. ariance (Section 2.1.6)

$\begin{bmatrix} u \\ e \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \sigma^2 \begin{bmatrix} G(\gamma_g) & 0 \\ 0 & R_c(\gamma_r) \end{bmatrix} \right)$

where $\gamma_g$ and $\gamma_r$ represent the variance structure parameters associated with scaled (by $\sigma^2$) variance matrices. Under this parameterization

$\mathrm{var}(y) = \sigma^2 \left( Z G(\gamma_g) Z' + R_c(\gamma_r) \right)$

In this chapter we give a detailed account of variance modelling in ASReml.

7.1 Applying variance models to random terms

In the previous chapter we showed how to specify the random model terms $u_i$ in $u$ and associated design matrices, and we assumed the effects were IID by using an idv() function. We can naturally extend this using other functions. Some common variance functions are defined in Table 7.1; the full range of variance model functions and their detailed definition is presented in Table 7.6.

The models are classified as variance models if they include a scale parameter, or as correlation models if their scale is fixed. Except for the giv models, correlation models take value 1 on the diagonal. Names of correlation models can be appended with v (e.g. idv()) to add a common variance, i.e. same variance across all rows, or with h (e.g. idh()) to allow a separate variance for each row. If all of the variables in a term do not have a variance model specified, then the default variance model idv() will be applied to these variables.

We further generalise this in Section 7.2 and Table 7.2 by introducing the idea of a consolidated model term that simultaneously defines
190. ast line read was: yield ~ mu variety
Finished: 23 Apr 2014 09:16:54.931 Error parsing yield ~ mu variety

ASReml happily reads down to the nin9.asd line. This name contains a '.' which is not permitted in a variable name, so nin9.asd is expected to be a file name, but there is no such file in the working folder. The data file is actually nin89.asd.

14.3 Things to check in the .asr file

The information that ASReml dumps in the .asr file when an error is encountered is intended to give you some idea of the particular error:
• if there is no data summary, ASReml has failed before or while reading the model line;
• if ASReml has completed one iteration, the problem is probably associated with starting values of the variance parameters or the logic of the model rather than the syntax per se.

14.4 An example

Briefly, the 8 coding errors in the example above, in the order they will be detected, are:
1. filename misspelt: there is no file nin9.asd in the working folder,
2. unrecognised qualifier: should be !SKIP,
3. Variety has alphabetic level labels but not declared as such: !A required,
4. comma missing from first line of model: !r Repl is part of the model but not recognised as such,
5. misspelt variable label in linear model: Repl should be repl,
6. misspelt variable labels in residual model,
7. the data has missing cells with respect to the declared residual structure,
8. misspelt variable la
191. ates the appropriate inverse R structure arising from the distribution and link function, and so in general a residual line is not needed. The only exception is in this bivariate case, when id(units).us(Trait) is needed and us(Trait) has the three residual components, and often the first one, associated with the GLM, is constrained to an initial value of 1.

6.8.1 Generalized Linear Mixed Models

This section was written by Damian Collins.

A Generalized Linear Mixed Model (GLMM) is an extension of a GLM to include random terms in the linear predictor. Inference concerning GLMMs is impeded by the lack of a closed form expression for the likelihood. ASReml currently uses an approximate likelihood technique called penalized quasi-likelihood, or PQL (Breslow and Clayton, 1993), which is based on a first order Taylor series approximation. This technique is also known as Schall's technique (Schall, 1991), pseudo-likelihood (Wolfinger and O'Connell, 1993) and joint maximisation (Harville and Mee, 1984; Gilmour et al., 1985). Implementations of PQL are found in many statistical packages, for instance in the GLMM (Welham, 2005) and the IRREML procedures of Genstat (Keen, 1994), the MLwiN package (Goldstein et al., 1998), the GLMMIX macro in SAS (Wolfinger, 1994), and in the GLMMPQL function in R.

The PQL technique is well known to suffer from estimation biases for some types of GLMMs. For grouped binary data with small group sizes
192. ating component 5 / sqrt(component 4 × component 6), where components 4, 5 and 6 are variance components from the analysis.

12.2.4 A more detailed example

The following example, for a bivariate sire model, is a little more complicated. The job file bsiremod.as contains

    coop.fmt
    ywt fat Trait Trait age c brr sex sex age r us Trait id sire us Trait f Tr grp
    residual id(units).us(Trait)
    VPREDICT !DEFINE
    F phenvar id(units).us(Trait) + us(Trait).id(sire)
    F addvar us(Trait).id(sire) * 4
    H heritA addvar[1] phenvar[1]
    H heritB addvar[3] phenvar[3]
    R phencorr phenvar
    R gencorr addvar

The relevant lines of the .asr file are

    Model_Term                        Sigma          Sigma       Sigma/SE  % C
    id(units).us(Trait)     8140 effects
    Trait    US_V  1  1           23.2055        23.2055         44.44    0 P
    Trait    US_C  2  1           2.50402        2.50402         18.56    0 P
    Trait    US_V  2  2           1.66292        1.66292         32.02    0 P
    us(Trait).id(sire)       184 effects
    Trait    US_V  1  1           1.45821        1.45821          3.66    0 P
    Trait    US_C  2  1           0.130280       0.130280         1.92    0 P
    Trait    US_V  2  2           0.344381E-01   0.344381E-01     2.03    0 P

Numbering the parameters reported in bsiremod.asr (and bsiremod.vvp):
1 error variance for ywt
2 error covariance for ywt and fat
3 error variance for fat
4 sire variance component for ywt
5 sire covariance for ywt and fat
6 sire variance for fat

then

    F phenvar id(units).us(Trait) + us(Trait).id(sire)
or
    F phenvar units.us(Trait) + sire.us(Trait)
or
    F phenvar 1:3 + 4:6

creates new components 7
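The arithmetic behind the derived quantities in this sire model example can be sketched as follows. This is only an illustration of the point estimates: it takes the six components from the .asr extract above (reading the sire covariance as 0.130280, an assumption, since the extracted text is inconsistent at that entry), and it does not attempt the standard errors that VPREDICT also reports. The variable names are hypothetical.

```python
import math

# Components 1-3: residual (error) variance, covariance, variance for ywt and fat.
residual = {"ywt": 23.2055, "cov": 2.50402, "fat": 1.66292}
# Components 4-6: sire variance, covariance, variance.
# The covariance 0.130280 is an assumed reading of a garbled entry.
sire = {"ywt": 1.45821, "cov": 0.130280, "fat": 0.0344381}

# Phenotypic (co)variances: residual plus sire components.
phenvar = {k: residual[k] + sire[k] for k in residual}
# Additive genetic (co)variances: four times the sire components.
addvar = {k: 4.0 * sire[k] for k in sire}

# Heritabilities: additive variance over phenotypic variance.
heritA = addvar["ywt"] / phenvar["ywt"]
heritB = addvar["fat"] / phenvar["fat"]

# Correlations: covariance over the square root of the product of variances.
phencorr = phenvar["cov"] / math.sqrt(phenvar["ywt"] * phenvar["fat"])
gencorr = addvar["cov"] / math.sqrt(addvar["ywt"] * addvar["fat"])

print(round(heritA, 4), round(heritB, 4), round(phencorr, 4), round(gencorr, 4))
```

Because the additive components are a constant multiple of the sire components, the genetic correlation is the same whether it is computed from `addvar` or directly from the sire components.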
193. average of the region effects into all of the location means, which is not appropriate. With ASSOCIATE it knows which trials to average and which region effects to include to form each location mean. That is, ASReml knows how to construct the trial means including the appropriate region and location effects, and which trial means to then average to form the location table. However, for region means we have a choice. We can average the trial means in Table 9.4 according to region, obtaining region means of 11.83 and 11.33, or we can average the location means in Table 9.5 to get region means of 12 and 11. The former is the default in ASReml, produced by predict region !ASSOCIATE region location trial !ASAVERAGE trial or, equivalently, by predict region !ASSOCIATE region location trial. Again, this is base averaging. By contrast, predict region !ASSOC region location trial !ASAVE location trial or predict region !ASSOC region location trial !ASAVE location produces sequential averaging, giving region means of 12 and 11 respectively. Similarly, an overall sequential mean of 11.5 is given by predict mu !ASSOC region location trial !ASAVE region location, while predict mu !ASSOC region location trial !ASAVE region gives a value of 11.58, being the average of region means 11.83 and 11.33 obtained by averaging trials within regions from Table 9.4, and predict mu !ASSOCIATE region location trial !ASAVE location predicts mu as 11.38, the aver
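The difference between base averaging and sequential averaging can be sketched in a few lines of Python. The trial means below are hypothetical (they are not the values of Table 9.4), and the function names are invented for the illustration; the point is only that averaging trials directly within a region and averaging location means within a region give different answers when locations hold unequal numbers of trials.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trial means keyed by (region, location, trial).
trial_means = {
    ("R1", "L1", "T1"): 10.0,
    ("R1", "L1", "T2"): 12.0,
    ("R1", "L2", "T3"): 14.0,
    ("R2", "L3", "T4"): 9.0,
    ("R2", "L4", "T5"): 11.0,
    ("R2", "L4", "T6"): 13.0,
    ("R2", "L4", "T7"): 15.0,
}

def base_region_means(tm):
    """Base averaging: average the trial means directly within each region."""
    by_region = defaultdict(list)
    for (region, _location, _trial), value in tm.items():
        by_region[region].append(value)
    return {r: mean(v) for r, v in by_region.items()}

def sequential_region_means(tm):
    """Sequential averaging: trials within location, then locations within region."""
    by_location = defaultdict(list)
    for (region, location, _trial), value in tm.items():
        by_location[(region, location)].append(value)
    by_region = defaultdict(list)
    for (region, _location), values in by_location.items():
        by_region[region].append(mean(values))
    return {r: mean(v) for r, v in by_region.items()}

base = base_region_means(trial_means)
seq = sequential_region_means(trial_means)
print(base, seq)
```

In this toy layout region R2 has one trial at L3 and three at L4, so base averaging weights L4 three times as heavily as sequential averaging does.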
194. ay be supplied as the second argument. For example, at(Type,TEST).Entry, where Type is a factor variable with level names TEST and CONTROL. at(a).b creates a series of model terms representing b nested within a, for any model term b. A model term is created for each level of a; each has the size of b. For example, if site and geno are factors with 3 and 10 levels respectively, then at(site).geno is shorthand for 3 model terms, at(site,1).geno at(site,2).geno at(site,3).geno, each with 10 levels. This is similar to forming an interaction, except that a separate model term is created for each level of the first factor; this is useful for random terms when each component can have a different variance. The same effect is achieved by using an interaction (e.g. site.geno) and associating a DIAG variance structure with the first component (see Section 7.11). Any at() term to be expanded MUST be the FIRST component of the interaction: geno.at(site) will not work; at(site,1).at(year).geno will not work, but at(year).at(site,1).geno is OK. The at() factor must be declared with the correct number of levels because the model line is expanded BEFORE the data is read. Thus, if site is declared as site or site !A in the data definitions, at(site).geno will expand to at(site,01).geno at(site,02).geno regardless of the actual number of sites. 6.5.4 Associated Factors. Sometimes there is a hierarchical structure to factors which should be recognised as
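The expansion rule can be mimicked with a small Python helper. This is only a sketch of the bookkeeping, not ASReml code: `expand_at` is a hypothetical name, and the spelling of the generated terms assumes the at(factor,level).term notation.

```python
def expand_at(factor, n_levels, term, width=0):
    """Expand at(factor).term into one model term per DECLARED level.

    The expansion uses the declared level count only, mirroring the point
    that the model line is expanded before any data is read.
    """
    labels = [str(i).zfill(width) for i in range(1, n_levels + 1)]
    return [f"at({factor},{label}).{term}" for label in labels]

# Three declared sites give three separate terms, each the size of geno.
terms = expand_at("site", 3, "geno")
print(terms)
```

Because only the declared count enters the helper, declaring too few levels silently produces too few terms, which is the failure mode the text warns about.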
195. b you are running, with extension .own for the file written by ASReml and .gdg for the file your program writes. The type of the parameters is set with the !T qualifier (see Section 7.7.7) and the control parameter is set using the !F qualifier. !F1 applies to the own variance model function. With own, the argument of !F is passed to the MYOWNGDG program as an argument the program can access. This is the mechanism that allows several OWN models to be fitted in a single run. !Ts is used to set the type of the parameters. It is primarily used in conjunction with the own variance model function, as ASReml knows the type of the parameters in other cases. The valid type codes are given in Section 7.7.7. 7.7.4 Parameter space constraints !Gs. Each parameter has an associated constraint code, which may be expressed explicitly with the qualifier !Gs, where s is the code. The following is a list of the possible constraint codes (code, constraint type, description). P, in the space: P is the default in most cases and attempts to keep the parameter in the theoretical parameter space. It is activated when the update of a parameter would take it outside its space. For example, if an update would make a variance negative, the negative value is replaced by a small positive value. Under the !GP condition, repeated attempts to make a variance negative are detected and the value is then fixed at a small positive
196. basics, Statistics and Computing 4: 221–234. Patterson, H. D. and Thompson, R. (1971). Recovery of interblock information when block sizes are unequal, Biometrika 58: 545–554. Pinheiro, J. C. and Bates, D. M. (2000). Mixed-Effects Models in S and S-PLUS, Springer-Verlag. Quaas, R. L. (1976). Computing the diagonal elements and inverse of a large numerator relationship matrix, Biometrics 32: 949–953. Robinson, G. K. (1991). That BLUP is a good thing: The estimation of random effects, Statistical Science 6: 15–51. Rodriguez, G. and Goldman, N. (2001). Improved estimation procedures for multilevel models with binary response: A case study, Journal of the Royal Statistical Society A (General) 164(2): 339–355. Sargolzaei, Iwaisaki and Colleau (2005). A fast algorithm for computing inbreeding coefficients in large populations, Genetics Selection and Evolution 122: 325–331. Schall, R. (1991). Estimation in generalized linear models with random effects, Biometrika 78(4): 719–727. Searle, S. R. (1971). Linear Models, New York: John Wiley and Sons, Inc. Searle, S. R. (1982). Matrix Algebra Useful for Statistics, New York: John Wiley and Sons, Inc. Searle, S. R., Casella, G. and McCulloch, C. E. (1992). Variance Components, New York: John Wiley and Sons, Inc. Self, S. G. and Liang, K.-Y. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under non-standard conditions, Journal of the American Statis
197. bel in predict statement: varierty should be variety. 1. Data file not found. Running this job produces the .asr file in Section 14.1. The first problem is that ASReml cannot find a data file nin9.asd in the current working folder, as indicated in the error message above the Fault line. Since nin9.asd contains a '.', which is not permitted in variable names, ASReml checks for a file of this name in the working directory (since no path is supplied). But ASReml did not find a file with this name. ASReml cannot tell whether the filename is misspelt or an invalid variable name has been specified. In this case the data file was given as nin9.asd rather than nin89.asd. However, ASReml kept going and read the model line, which it recognised because of the ~ character. The message "Fault: Error parsing yield ~ mu variety" does not mean that the error is in the model yield ~ mu variety, but that ASReml recognised this as the model line and gave up because it had not encountered a valid data file line. The message "Warning: Unrecognised qualifier at character 10 nin9.asd !SLIP 1" simply indicates that the qualifier !SLIP 1 has not been processed. 2. An unrecognised qualifier. After correcting the filename, we get the following abbreviated output. The problem is that !SKIP 1, which would cause ASReml to skip the first line of the data file, was mistyped as !SLIP 1, which ASReml failed to recognise. NIN Alliance Trial 1989 variety yi
198. bia debia daue ERE ee 34 302 Theal file ere Be ee Se ES Se Ce ee ee ae 35 Ce Go S er a ee a ee od a a 37 3.7 Tabulation, predicted values and functions of the variance components. 4 Data file preparation: 4.1 Introduction; 4.2 The data file; 4.2.1 Free format data files; 4.2.2 Fixed format data files; 4.2.3 Preparing data files in Excel; 4.2.4 Binary format data files. 5 Command file: Reading the data: 5.1 Introduction; 5.2 Important rules; 5.3 Title line; 5.4 Specifying and reading the data; 5.4.1 Data field definition syntax; 5.4.2 Storage of alphabetic factor labels; 5.4.3 Ordering factor levels; 5.4.4 Skipping input fields; 5.5 Transforming the data; 5.5.1 Transformation syntax; 5.5.2 QTL marker transformations; 5.5.3 Remarks concerning transformations; 5.5.4 Special note on covariates; 5.6 Datafile line; 5.6.1 Data line syntax; 5.7 Data file qualifiers; 5.7.1 Combining rows from separate
199. bic smoothing spline between knot points in the prediction process. Since the spline knot points are specifically nominated in the !SPLINE line, these extra points have no effect on the analysis run time. The !SPLINE line does not modify the analysis in this example since it simply nominates the 7 ages in the data file. The same analysis would result if the !SPLINE line was omitted and spl(age,7) in the model was replaced with spl(age). An extract of the output file is: 1 LogL=-20.9043 S2= 48.470 5 df 0.1000E+00; 2 LogL=-20.9013 S2= 49.152 5 df 0.9102E-01; 3 LogL=-20.8998 S2= 49.892 5 df 0.8221E-01; 4 LogL=-20.8996 S2= 50.273 5 df 0.7802E-01. Final parameter values 0.7892E-01. Results from analysis of circ. Akaike Information Criterion 45.80 (assuming 2 parameters). Bayesian Information Criterion 45.02. Approximate stratum variance decomposition (Stratum, Degrees-Freedom, Variance, Component Coefficients): idv(spl(age,7)) 1.49 98.4896 12.32 1.0; Residual Variance 3.51 50.2726 0.0 1.0. Model_Term, Gamma, Sigma, Sigma/SE, % C: idv(spl(age,7)) IDV_V 5 0.789210E-01 3.96756 0.40 0 P; Residual SCA_V 7 1.000000 50.2726 1.32 0 P. Notice: The DenDF values are calculated ignoring fixed/boundary/singular variance parameters using algebraic derivatives. Estimate, Standard Error, T-value, T-prev: 3 age 1 0.814772E-01 0.552336E-02 14.75; 7 mu 1 24.4378 5.754
200. blem on the SPLINE line It could be a wrong variable name or the wrong number of knot points Knot points should be in increasing order Try increasing workspace The problem may be due to the use of the SORT qualifier in the data definition section The PREDICT statement seems in error the named factor is not present in the model An INCLUDE file could not be opened May be an unrecognised factor model term name or variance structure name or wrong count of initial values possible on an earlier line May be insufficient lines in the job Check your MYOWNGDG program and the gdg file Maybe increase WORKSPACE Messages may identify a prob lem with the pedigree 261 14 5 Information Warning and Error messages Table 14 3 Alphabetical list of error messages and probable cause s remedies error message probable cause remedy Failed while ordering equations FORMAT error reading G structure header Factor order G structure ORDER O MODEL GAMMAS G structure size does not match Getting Pedigree GLM Bounds failure Increase declared levels for factor Increase workspace Insufficient data read from file Insufficient points for Insufficient workspace invalid analysis trait number This indicates the job needs more memory than was allocated or is available Try increasing the workspace or simplifying the model Likely causes are bad syntax or inv
201. both the design matrix Z and variance model G, in particular allowing G to be the direct product of variance structures. In Section 7.4 we further generalise the consolidated model specification to allow the residual variance structure to be the direct sum of variance structures. 7.2 Process to define a consolidated model term. Consider a linear model term column.row comprising the interaction between the single factors column and row. We refer to column.row as a compound model term. If the variance structure for column.row is the direct product of two matrices, the first of which is an IID variance structure (that is, a scaled identity matrix) with dimension equal to the number of levels of the factor column, and the second of which is a matrix with dimension equal to the number of levels of the factor row and with elements representing a first-order autoregressive correlation structure (AR1), then we represent this by the consolidated model term idv(column).ar1(row). This specifies a two-dimensional separable spatial variance structure for column.row, but with spatial correlation in the column direction only. A consolidated model term is therefore comprised of component terms, each with a variance model function applied, to give the required direct product form of the variance structure. Table 7.2 demonstrates how to build consolidated terms in ASReml for a small selection of examples. The linear model term (single or compound) is first identified
202. by the TOTAL qualifier the multinomial model requires y Xi Yj a particular variance structure across the multinomial classes This is formally Li E Y and specified as residual id units mthr Trait Hi Hi 1 102 6 8 Generalized Linear Mixed Models Table 6 4 GLM distribution qualifiers qualifier action The multinomial threshold model is fitted as a cumulative probability model The proportions y r n in the ordered categories are summed to form the cumu lative proportions Y which are modelled with logit LOGIT probit PROBIT or Complementary LogLog CLOG link functions The implicit residual variance on the underlying scale is 77 3 3 3 underlying logistic distribution for the logit link 1 for the probit link The distribution underlying the Complementary LogLog link is the Gumbel distribution with implicit residual variance on the underlying scale of 72 6 1 65 For example Lodging MULTINOMIAL 4 CUMULATIVE Trait Variety r block predict Variety where Lodging is a factor with 4 ordered categories Predicted values are reported for the cumulative proportions POISSON LOGARITHM IDENTITY SQRT v p Natural logarithms are the default link function d 2 yln y t ASReml assumes the Poisson variable is not negative y H IGAMMA INVERSE IDENTITY LOGARITHM PHI TOTAL n v u lon The inverse is the default link function n is defined with the TOTAL q
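The implicit residual variances quoted for the three links are the variances of the standard logistic, standard normal and standard Gumbel distributions: π²/3 (about 3.29), 1, and π²/6 (about 1.645). The short Python sketch below simply evaluates them. The dictionary keys reuse the link qualifier names, and `underlying_scale_ratio` is a hypothetical helper showing one common use of these constants; it is not an ASReml facility.

```python
import math

# Variance of the latent distribution implied by each link function.
IMPLICIT_VARIANCE = {
    "LOGIT": math.pi ** 2 / 3,   # standard logistic
    "PROBIT": 1.0,               # standard normal
    "CLOG": math.pi ** 2 / 6,    # standard Gumbel (complementary log-log)
}

def underlying_scale_ratio(component, link):
    """Share of underlying-scale variance due to one variance component.

    Illustrative only: component / (component + implicit residual variance).
    """
    return component / (component + IMPLICIT_VARIANCE[link])

for link, value in IMPLICIT_VARIANCE.items():
    print(link, round(value, 3))
```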
203. c moves the last character read pointer to line position c so that the next field starts at position c 1 For example TO goes back to the beginning of the line e the string D invokes debug mode A format showing these components is FORMAT D 314 8X A6 3 2x F5 2 4x BZ 2011 and is suitable for reading 27 fields from 2 data records such as 111122223333xxxxxxxxALPHAFxx 4 12xx 5 32xx 6 32 xxxx123 567 901 345 7890 63 5 7 Data file qualifiers Table 5 2 Qualifiers relating to data input and output qualifier action IMERGE c f SKIP n IMATCH a b READ n RECODE ROWFACTOR v ROWFAC v IRREC n may be specified on a line following the datafile line The purpose is to combine data fields from the primary data file with data fields from a secondary file f This MERGE qualifier has been superseded by the much more powerful MERGE statement see Chapter 11 The effect is to open the named file skip n lines and then insert the columns from the new file into field positions starting at position c If IMATCH a b is specified ASReml checks that the field a 0 lt a lt c has the same value as field b If not it is assumed that the merged file has some missing records and missing values are inserted into the data record and the line from the MERGE file is kept for comparison with the next record It is assumed that the lines in the MERGE file are in the same order as the corresponding lines
204. ccessed by subsequent parts of the same job using TPn. This was added to facilitate location of putative QTL (Gilmour, 2007). Table 9.1: List of prediction qualifiers (qualifier, action). !TWOSTAGEWEIGHTS is intended for use with variety trials which will subsequently be combined in a meta-analysis. It forms the variance matrix for the predictions, inverts it, and writes the predicted variety means with the corresponding diagonal elements of this matrix to the .pvs file. These values are used in some variety testing programs in Australia for a subsequent second-stage analysis across many trials (Smith et al., 2001). A data base is used to collect the results from the individual trials and write out the combined data set. The diagonal elements, scaled by the variance (which is also reported and held in the data base), are used as weights in the combined analysis. !VPV requests that the variance matrix of predicted values be printed to the .pvs file. PLOT graphic control qualifiers. This functionality was developed, and this section was written, by Damian Collins. The !PLOT qualifier produces a graphic of the predictions. Where there is more than one prediction factor, a multi-panel trellis arrangement may be used. Alternatively, one or more factors can be superimposed on the one panel. The data can be added to the plot to assist informal examination of the model fit. With no plot options, ASReml c
205. ce model function name given in Table 7.6; for example, for a factor row, exp(row) is an exponential correlation model with a single correlation parameter. To specify a homogeneous variance model, append a v to the variance model function name; for example, expv(row) is an exponential variance model with 2 parameters (correlation and variance). To specify a heterogeneous variance model, append an h to the variance model function name; for example, coruh(site) is a variance matrix with different variances for each site but the same correlation for all pairs of sites. Important: see Section 7.4 for rules on combining variance models and Section 7.7.5 for important notes regarding initial values. 7.11.2 Non-singular variance matrices. For REML estimation, ASReml needs to invert each variance matrix. For this it requires that the matrices be negative definite or positive definite. They must not be singular. Negative definite matrices will have negative elements on the diagonal of the matrix and/or its inverse. There are two exceptions: the XFA model, which has been specifically designed to fit singular matrices (Thompson et al., 2003; see page 144), and singular relationship matrices, described in Chapter 8.3.1. If an estimated matrix comes too close to being singular, ASReml will stop iterating. Let x'Ax represent an arbitrary quadratic form for x. The quadratic
206. cord after transformation. Thus A B LagA !=V4 !V4=A reads two fields (A and B) and constructs LagA as the value of A from the previous record, by extracting a value for LagA from working variable V4 before loading V4 with the current value of A. 5.5.1 Transformation syntax. Transformation qualifiers have one of seven forms, namely: !operator, to perform an operation on the current field, for example absY !ABS to take absolute values; !operator value, to perform an operation involving an argument on the current field, for example logY !=Y !^0 copies Y and then takes logs; !operator Vfield, to perform an operation on the current field using the data in another field, for example !-V2 to subtract field 2 from the current field; !Vtarget, to reset the focus for subsequent transformations to field number target; !TARGET target, to reset the focus for subsequent transformations to the previously named field target; !Vtarget=value, to set the target field to a particular value; !Vtarget=Vfield, to overwrite the data in a target field by the data values of another field (a special case is when field is 0, instructing ASReml to put the record number into the target field). Here operator is one of the symbols defined in Table 5.1; value is the argument, a real number, required by the transformation; V is the literal character and is followed by the number (target or field) of a data field; the data field is used or modifie
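The lagging trick works because the working variable persists from one record to the next. A minimal Python sketch of the same record-by-record logic follows; `add_lag` is a hypothetical name, and treating the working variable as missing before the first record is an assumption made for the illustration, not documented ASReml behaviour.

```python
def add_lag(records, missing=None):
    """Append LagA (the previous record's A) to each (A, B) record.

    `v4` plays the role of the working variable V4: it is read first,
    then overwritten with the current A, so it always carries the
    previous record's value into the next iteration.
    """
    v4 = missing              # assumed initial state of the working variable
    out = []
    for a, b in records:
        lag_a = v4            # LagA is taken from V4 ...
        v4 = a                # ... before V4 is loaded with the current A
        out.append((a, b, lag_a))
    return out

lagged = add_lag([(1, 10), (2, 20), (3, 30)])
print(lagged)
```

The order of the two assignments is the whole point: swapping them would copy the current A into LagA instead of the previous one.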
207. creases to the correct value indent them to avert this message user nominated more levels than are permitted constraint parameter is probably wrongly assigned fix the argument The model term Trait was not present in the multivariate analysis model you may need more iterations restart to do more iterations see CONTINUE The computed LogL value is occasionally very large in mag nitude but our interest is in relative changes Reporting relative to an offset ensures that differences at the units level are apparent missing cells are normally not reported consider setting levels correctly the limit is 100 PREDICT statements because it contains errors if you really want to fit this term twice create a copy with another name gives details so you can check ASReml is doing what you intend that is these standard errors are approximate use the correct syntax the A fields will be treated as factors but are coded as they appear in the binary file use correct syntax 257 14 5 Information Warning and Error messages Table 14 2 List of warning messages and likely meaning s warning message likely meaning Warning The X Y G qualifiers are ignored There is no data to plot Warning Warning The default action with missing values in multivariate data Warning The estimation was ABORTED Warning The FOWN test of is not calculated Warning The labels for predictions
208. critical value for a $\chi^2$ variate with 1 degree of freedom. The distribution of the REMLRT for the test that k variance components are zero, or for tests involved in random regressions (which involve both variance and covariance components), involves a mixture of $\chi^2$ variates from 0 to k degrees of freedom. See Self and Liang (1987) for details. Tests concerning variance components in generally balanced designs, such as the balanced one-way classification, can be derived from the usual analysis of variance. It can be shown that the REMLRT for a variance component being zero is a monotone function of the F-statistic for the associated term. To compare two (or more) non-nested models we can evaluate the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) for each model. These are given by $\mathrm{AIC} = -2\ell_{Ri} + 2t_i$ and $\mathrm{BIC} = -2\ell_{Ri} + t_i \log \nu$ (2.22), where $t_i$ is the number of variance parameters in model $i$ and $\nu = n - p$ is the residual degrees of freedom. AIC and BIC are calculated for each model and the model with the smallest value is chosen as the preferred model. 2.4.2 Diagnostics. In this section we will briefly review some of the diagnostics that have been implemented in ASReml for examining the adequacy of the assumed variance matrix for either R or G structures, or for examining the distributional assumptions regarding e or u. Firstly, we note that the BLUP of the residual vector is
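The two criteria are easy to check by hand. The Python sketch below implements them as written (minus twice the REML log-likelihood, plus a penalty of 2 per variance parameter for AIC or log ν per variance parameter for BIC) and applies them to the orange-tree spline output quoted elsewhere in this guide, assuming a final LogL of -20.8996, 2 variance parameters and 5 residual degrees of freedom. The function names are hypothetical.

```python
import math

def aic(logl, n_var_params):
    """AIC = -2 * REML log-likelihood + 2 * number of variance parameters."""
    return -2.0 * logl + 2.0 * n_var_params

def bic(logl, n_var_params, resid_df):
    """BIC = -2 * REML log-likelihood + t * log(nu), with nu = n - p."""
    return -2.0 * logl + n_var_params * math.log(resid_df)

# Assumed inputs: LogL = -20.8996, t = 2, nu = 5.
print(round(aic(-20.8996, 2), 2), round(bic(-20.8996, 2, 5), 2))
```

Under those assumptions the results agree with the Akaike and Bayesian values printed in that output (45.80 and 45.02). Comparisons remain valid only between models with the same fixed-effects structure.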
209. cture or to implicit residual variance parameters The VCC syntax is required for these cases 125 7 7 Variance model function qualifiers Table 7 5 Examples of constraining variance parameters in ASReml ASReml code action ABACBADCBA constrains all parameters corresponding to A to be equal similarly for B and C The fourth parameter symbol D is only associated with one parameter and can be replaced by 0 to indicate that it is unconstrained This sequence applied to an unstructured US 4 4 matrix would make it banded that is A BA CBA DCBA this example defines a structure for the genotype by site interaction effects in a multi environment trial with 3 sites in which the genotypes are independent random effects within sites but are correlated across sites with equal covariance The initial value for the common co variance is 0 1 us site GP OAOAAO IINIT eS od wt od oh 233 a factor analytic model of order 2 for 4 sites with equal variance across sites is specified using this code For the fak variance model functions ASReml orders the param eters as the loadings followed by the specific variances In this example the first loading in the second factor is con strained to be equal to zero for identifiability P restricts the magnitude of the loadings and the variances to be positive fa2 site G4PZ3P4P 00000000VVVV INIT 4 9 O 3 1 4 2 gen code for a factor analytic model of order 2 for 4 s
210. cture for rows and idv(column) models the IDV variance structure for columns. The consolidated model term idv(column).ar1(row) directly mirrors the algebraic form $\mathrm{var}(e) = \sigma^2 I \otimes \Sigma_r(\rho_r)$. Important points: • the same residual variance structure could be achieved by specifying id(column).ar1v(row), which mirrors the alternate but equivalent algebraic form $\mathrm{var}(e) = I \otimes \sigma^2 \Sigma_r(\rho_r)$. It is arbitrary which variable the common variance is attached to (column in the code box, row in the latter); see Section 7.4 on identifiability. • if the correlation structure id(column).ar1(row) was specified, ASReml would automatically add a common variance to model $\mathrm{var}(e) = \sigma^2 (I \otimes \Sigma_r(\rho_r))$; see Section 7.4. • !f mv is now included in the model specification. This tells ASReml to estimate the missing values. The !f before mv indicates that the missing values are fixed effects in the sparse set of terms. An equivalent way of specifying this model is yield ~ mu variety mv !r idv(repl), where mv is the last fixed effect term and ASReml will include mv and succeeding terms in the sparse set. • ASReml would report an error if the consolidated model term idv(column).ar1v(row) was specified: this would correspond to $\mathrm{var}(e) = \sigma^2_1 I \otimes \sigma^2_2 \Sigma_r(\rho_r)$, and $\sigma^2_1$ and $\sigma^2_2$ are unidentifiable in this case, that is, it is not possible to estimate them separately. • this is a univariate analysis i
211. d as after basename in the example; italic font is used to name information to be supplied by the user, for example basename stands for the name of a file with an .as filename extension; square brackets indicate that the enclosed text and/or arguments are not always required. Do not enter these square brackets. • ASReml output is in this size and font (see page 34). • this font is used for all other code. 2 Some theory. 2.1 The general linear mixed model. If y (n × 1) denotes the vector of observations, the general linear mixed model can be written as $y = X\tau + Zu + e$ (2.1), where τ (p × 1) is a vector of fixed effects, X (n × p) is the design matrix of full column rank that associates observations with the appropriate combination of fixed effects, u (q × 1) is a vector of random effects, Z (n × q) is the design matrix that associates observations with the appropriate combination of random effects, and e (n × 1) is the vector of residual errors. 2.1.1 Sigma parameterization of the linear mixed model. Model (2.1) is called a linear mixed model or linear mixed effects model. It is assumed $\begin{bmatrix} u \\ e \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} G(\sigma_g) & 0 \\ 0 & R(\sigma_r) \end{bmatrix} \right)$ (2.2), where the matrices G and R are variance matrices for u and e and are functions of parameters $\sigma_g$ and $\sigma_r$. This requires that the random effects u and residual errors e are uncorrelated. The variance matrix for y is then of the form $\mathrm{var}(y) = Z G(\sigma_g) Z' + R(\sigma_r)$ (2.3), which we will refer to as the sigma parameterization of the G a
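To make the variance matrix of y concrete, the following Python sketch builds var(y) = ZGZ' + R for a toy layout of four observations falling in two groups, with G and R both scaled identity matrices. The dimensions, the variance values (2.0 and 1.0) and the helper names are all assumptions chosen for the illustration; nothing here reflects how ASReml stores or computes these matrices.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose a list-of-rows matrix."""
    return [list(row) for row in zip(*a)]

def var_y(Z, G, R):
    """Variance matrix of y under the mixed model: Z G Z' + R."""
    zgz = matmul(matmul(Z, G), transpose(Z))
    return [[zgz[i][j] + R[i][j] for j in range(len(R))]
            for i in range(len(R))]

# Four observations; rows of Z indicate which of two random levels applies.
Z = [[1, 0], [1, 0], [0, 1], [0, 1]]
sigma2_u, sigma2_e = 2.0, 1.0              # assumed variance parameters
G = [[sigma2_u, 0.0], [0.0, sigma2_u]]     # G = sigma2_u * I (q = 2)
R = [[sigma2_e if i == j else 0.0 for j in range(4)] for i in range(4)]

V = var_y(Z, G, R)
for row in V:
    print(row)
```

Observations sharing a random level pick up covariance sigma2_u, observations in different levels are uncorrelated, and every diagonal entry is sigma2_u + sigma2_e.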
212. d allowed specifi cation of parametric constraints and relationships equality and scale between parameters to be defined This parametric information was interspersed within the structure definition Release 4 allows an alternative way of specifying this parametric information essentially con structing a table in a tsv file with the rows labelled by the specific parameters columns for initial values and parametric constraints and two columns that allow specification of relationships This tsv file is written by ASReml after the input file has been parsed using to represent initial values and setting MAXITER 0 gives an easy construction Once the tsv file has been edited it can be read by inserting TSV on the data file line As an example Wolfinger Rat data treat A wtO wti wt2 wt3 wt4 subject V0 wolfrat dat skip 1 ASUV MAXITER 0 wtO wti wt2 wt3 wt4 Trait treat Trait treat 1 2 9 27 O ID error variance Trait 0 US indicates generates initial values generates a tsv file This tsv file is a mechanism for resetting initial parameter values by changing the values here and rerunning the job with TSV You may only change values in the last 4 fields Fields are GN Term Type PSpace Initial_value RP_GN RP _scale 136 7 9 Ways to present initial values to ASReml 5 units ue Trait us Trait 1 G P 4 7911110 5 1 6 units us Trait us Trait 2 G P 5 0231481 6 1 7 un
213. d are the numbers or names of existing components $v_a$, $v_b$, $v_c$ and $v_d$, and $c_b$ is a multiplier for $v_b$; m is a number greater than the current length of v, to flag the special case of adding the offset k. When using the component numbers, the form a:b can be used to reference blocks of components, as in F label a:b * k + c:d. The instructions in the ASReml code box (Y ~ mu !r idv(Sire); residual idv(units); VPREDICT !DEFINE; F phenvar idv(Sire) + idv(units); F genvar idv(Sire) * 4; H herit genvar phenvar) correspond to a simple sire model, so that variance component 1 is the Sire variance and variance component 2 is the residual variance. Then F phenvar 1 + 2, or F phenvar idv(Sire) + idv(units), creates a third component called phenvar which is the sum of the variance components, that is, the phenotypic variance; F genvar 1 * 4, or F genvar idv(Sire) * 4, creates a fourth component called genvar which is the sire variance component multiplied by 4, that is, the genotypic variance. Ratios, or in particular cases heritabilities, are requested by function lines beginning with an H. The specific form of the directive is H label n d. This calculates $\sigma^2_n/\sigma^2_d$ and $\mathrm{se}(\sigma^2_n/\sigma^2_d)$, where n and d are the names of the components or integers pointing to components $v_n$ and $v_d$ that are to be used as the numerator and denominator respectively in the heritability calculation. Note that covariances between ratios and other components are
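A ratio of two functions of variance components, together with an approximate standard error, can be sketched with the usual first-order (delta method) approximation, var(n/d) ≈ (n/d)² [var(n)/n² + var(d)/d² − 2 cov(n,d)/(nd)]. The Python below applies it to a sire-model heritability. Whether ASReml uses exactly this expansion internally is an assumption here; the input values are hypothetical, and the function names are invented for the example.

```python
import math

def ratio_with_se(n, d, var_n, var_d, cov_nd):
    """Ratio n/d and its first-order (delta method) standard error."""
    r = n / d
    var_r = r * r * (var_n / n ** 2 + var_d / d ** 2 - 2.0 * cov_nd / (n * d))
    return r, math.sqrt(var_r)

def sire_heritability(s2_sire, s2_units, v_ss, v_uu, v_su):
    """Heritability 4*sire / (sire + units) from a simple sire model.

    v_ss, v_uu, v_su are the sampling variances and covariance of the two
    estimated components (hypothetical inputs in this sketch).
    """
    genvar = 4.0 * s2_sire                  # F genvar: sire component times 4
    phenvar = s2_sire + s2_units            # F phenvar: sum of the components
    var_gen = 16.0 * v_ss
    var_phen = v_ss + v_uu + 2.0 * v_su
    cov_gen_phen = 4.0 * (v_ss + v_su)
    return ratio_with_se(genvar, phenvar, var_gen, var_phen, cov_gen_phen)

h2, se = sire_heritability(1.0, 9.0, v_ss=0.25, v_uu=0.04, v_su=0.0)
print(round(h2, 3), round(se, 3))
```

The covariance term matters: numerator and denominator share the sire component, so ignoring it would overstate the standard error in this example.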
214. d as it is running, it will attempt to restart the job with increased workspace. If the system has already allocated all available memory, the job will stop. 10.3.7 Examples (ASReml code, action). asreml LW64 rat.as: increase workspace to 64 Mbyte, send screen output to rat.asl and suppress interactive graphics. asreml IL rat.as: send screen output to rat.asl but display interactive graphics. asreml N rat.as: allow screen output but suppress interactive graphics. asreml ILW512 rat.as: increase workspace to 512 Mbyte, send screen output to rat.asl but display interactive graphics. asreml rwi coop wwt ywt: runs coop.as twice, using 1 Gbyte workspace, writing results to coopwwt as and coopywt as, and substituting wwt and ywt for $1 in the two runs. 10.4 Advanced processing arguments. 10.4.1 Standard use of arguments. Command line arguments are intended to facilitate the running of a sequence of jobs that require small changes to the command file between runs. The output file name is modified by the use of this feature if the R option is specified. This use is demonstrated in the Coopworth example of Section 15.10. Command line arguments are strings listed on the command line after basename (the command file name), or specified on the top job control line after the !ARGS qualifier. These strings are inserted into the command file at run time. When the input routine finds a $n in the com
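The substitution mechanism amounts to simple text replacement in the command file. The Python sketch below imitates it under two assumptions: that placeholders are written as a dollar sign followed by a single digit, and that a placeholder with no matching argument is left untouched. Both are choices made for the illustration, and `substitute_args` is a hypothetical name.

```python
import re

def substitute_args(template, args):
    """Replace $1, $2, ... in a command-file line with the given strings."""
    def replace(match):
        index = int(match.group(1))
        if 1 <= index <= len(args):
            return args[index - 1]
        return match.group(0)   # assumption: unmatched placeholders are kept
    return re.sub(r"\$(\d)", replace, template)

# One command-file line run twice with different traits.
line = "$1 ~ mu age sex"
for trait in ("wwt", "ywt"):
    print(substitute_args(line, [trait]))
```

Running the same template once per argument is the pattern behind analysing several traits from a single command file.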
215. d by the data summary You should al ways check the data summary to ensure that row column nin89 asd skip 1 yield mu variety the correct number of records have been de jp Repl tected and the data values match the names residual ari Row ar1 Col appropriately predict varierty The problem is that R Repl is meant to be part of the linear model but it is on a separate line and the first part of the model on the preceding line does not end with a COMMA to indicate that the model is incomplete Appending a comma to the first model line resolves this problem Folder C Users Public ASRem1 Docs Manex4 ERR variety A QUALIFIERS SKIP 1 Reading nin89 asd FREE FORMAT skipping 1 lines Univariate analysis of yield Summary of 224 records retained of 224 read Model term Size miss zero MinNonO Mean MaxNonO StndDevn 1 variety 56 0 0 1 28 5000 56 2 ad O 0 1 000 28 50 56 00 16 20 3 pid 0 0 1101 2628 4156 I2 4 raw 0 0 21 00 610 5 840 0 149 0 5 repl 4 0 0 1 2 5000 4 6 nloc 0 O 4 000 4 000 4 000 0 000 7 yield Variate 0 O 1 050 25 53 42 00 7 450 8 lat 0 O 4 300 Af raz 47 30 12 90 9 long 0 O 1 200 14 08 26 40 7 698 10 row Ze 0 0 l 17321 22 11 column 11 Q 0 1 6 3304 11 12 mu 1 QUALIFIERS R Repl Fault Error in variance header line R Repl Last line read was IR Repl 0 0 0 0 ninerr4 variety id pid raw rep nloc yield lat Model specification TERM LEVELS GAMMAS variety 56 mu 1 12 factors defin
216. d depending on the context e Vfield may be replaced by the label of the field if it already has a label e in the first three forms the operation is performed on the current field this will be the field associated with the label unless the focus has been reset by specifying a new target in a preceding transformation 53 5 5 Transforming the data e the last four forms change the focus of subsequent transformations to target e in the last two forms a value is assigned to the target field For example V22 V1i1 copies existing field 11 into field 22 Such a statement would typically be followed by more transformations If there are fewer than 22 variables labelled then V22 is used in the transformation stage but not kept for analysis e only the DOM and RESCALE transformations automatically process a set of variables defined with the G field definition All other transformations always operate on only a single field Use the DO ENDDO transformations to perform them on a set of variables Table 5 1 List of transformation qualifiers and their actions with examples qualifier argument action examples I v used to overwrite create a variable half 0 5 with v It usually implies the vari zero 0 able is not read see examples on page 52 I l bx v usual arithmetic meaning note yield 10 17 that 0 0 gives 0 but v 0 gives a missing value where v is not 0 Le v raises the data which must be pos yield itive
217. dd or mmdd into days. Jyyd converts a date in the form ccyyddd or yyddd into days. These calculate the number of days since December 31, 1900 and are valid for dates from January 1, 1900 to December 31, 2099. Note that if cc is omitted it is taken as 19 if yy > 32 and 20 if yy < 33; the date must be entirely numeric, so separator characters may not be present (but see DATE). M v converts data values of v to missing; if M is used after A or T, v should refer to the encoded factor level rather than the value in the data file (see also Section 4.2). MAX v, MIN v, MOD v: the maximum, minimum and modulus of the field values and the value v. MM s assigns Haldane map positions (s) to marker variables and imputes missing values to the markers (see below). NA v replaces any missing values in the variate with the value v; if v is another field, its value is copied. Normal v replaces the variate with normal random variables having variance v. REPLACE o n replaces data values o with n in the current variable, i.e. IF (DataValue EQ o) DataValue = n. RESCALE o s rescales the column(s) in the current variable (G group of variables) using Y = (Y + o) * s. SEED sets the seed for the random number generator. Examples: yield M 9; yield M <0 M >100; yield MAX 9; ChrAadd G 10 MM i; Rate NA 0; WT Wt2 NA Wtt; Ndat 0 Normal 4.5 is equivalent to Ndat Normal 4.5; Rate REPLACE 9 0; Rate RESCALE 10 0 1; SEED 848586. 5.5 Transforming the data Table
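The day-count rule described for the ccyyddd / yyddd form can be sketched in Python. This is an illustration only, not ASReml's implementation: the function names are hypothetical, and it assumes that December 31, 1900 itself counts as day 0.

```python
from datetime import date

# Assumption: "days since December 31 1900" means that date is day 0.
EPOCH = date(1900, 12, 31)


def expand_year(yy: int) -> int:
    """Two-digit year to four digits: 19yy if yy > 32, otherwise 20yy."""
    return 1900 + yy if yy > 32 else 2000 + yy


def yyd_to_days(value: int) -> int:
    """Convert an all-numeric ccyyddd or yyddd date into a day count."""
    year, day_of_year = divmod(value, 1000)
    if year < 100:  # century omitted, apply the 19/20 rule from the text
        year = expand_year(year)
    return (date(year, 1, 1) - EPOCH).days + day_of_year - 1
```

With this convention, `yyd_to_days(1901001)` is 1, and `yyd_to_days(99074)` gives the same count as `yyd_to_days(1999074)`.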
218. ded to avoid confusion. 5.5 Transforming the data. Transformation is the process of modifying the data (for example, dividing all of the data values in a field by 10), forming new variables (for example, summing the data in two fields) or creating temporary data (for example, a test variable used to discard some records from analysis and subsequently discarded). Occasional users may find it easier to use a spreadsheet to calculate derived variables than to modify variables using ASReml transformations. Transformation qualifiers are listed after data field labels (and the field_type, if present). They define an operation, often involving an argument (a constant or another variable), which is performed on a target variable. By default the target is the current field, but it can be changed with the TARGET qualifier. For a G group of variables, the target is the first variable in the set. Using transformations will be easier if you understand the process. As ASReml parses the variable definitions, it sequentially assigns them column positions in the internal data vector. It notes the last variable that is not created by a transformation, and that determines how many fields are read from the data file (unless overridden by the READ qualifier in Table 5.2). ASReml actually reads the data file after parsing the model line. It reads a line into a temporary vector, performs the transformations in that vector and saves the values tha
219. default value of n is 1000 so that points closer than 0 1 of the range are regarded as the same point 83 5 8 Job control qualifiers Table 5 6 List of very rarely used job control qualifiers qualifier action KNOTS n NOCHECK NOREORDER NOSCRATCH POLPOINTS PPOINTS n REPORT ISCALE 1 SCORE changes the default knot points used when fitting a spline to data with more than n different values of the spline variable When there are more than n default 50 points ASReml will default to using n equally spaced knot points forces ASReml to use any explicitly set spline knot points see SPLINE even if they do not appear to adequately cover the data values prevents the automatic reversal of the order of the fixed terms in the dense equations and possible reordering of terms in the sparse equations forces ASReml to hold the data in memory ASReml will usually hold the data on a scratch file rather than in memory In large jobs the system area where scratch files are held may not be large enough A Unix system may put this file in the tmp directory which may not have enough space to hold it affects the number of distinct points recognised by the pol model func tion Table 6 1 The default value of n is 1000 so that points closer than 0 1 of the range are regarded as the same point influences the number of points used when predicting splines and poly nomials The design matrix ge
220. defining factor names and improved facilities for reading relationship matrices and better explanation of a simpler way of constructing variances of functions of parameters. Among the developments associated with analysis are making it easier to specify functions of variance parameters using names rather than numbers, fitting factor effects with large random regression models such as commonly used with marker data, fitting linear relationships among variance structure parameters and calculating information criteria. The developments associated with output include writing out design matrices. A major development in Release 4 is an alternative model specification using a functional approach. Prior to Release 4 a structural specification was used, in which variance models were applied by imposing variance structures on random model terms and/or the residual error term after the mixed model had been specified. In this case the variance models were presented in a separate part of the input file. The functional specification offers an alternative to the structural specification, in which the variance structures for random model terms and the residual error term are specified in the linear mixed model definition by wrapping terms with the required variance model function. This approach is more concise, less error prone and more automatic for specifying multi-section residual variances. The data sets and ASReml input used in this guide are available fro
221. dominant spatial processes are aligned with rows/columns, as occurs in field experiments. Geometric anisotropy is discussed in most geostatistical books (Webster and Oliver, 2001; Diggle et al., 2003) but rarely are the anisotropy angle or ratio estimated from the data. Similarly, the smoothness parameter $\nu$ is often set a priori (Kammann and Wand, 2003; Diggle et al., 2003). However, Stein (1999) and Haskard (2006) demonstrate that $\nu$ can be reliably estimated even for modest sized data sets, subject to caveats regarding the sampling design. 7.11 Variance model functions available in ASReml. The syntax for the Matérn class in ASReml is given by MATk, where k is the number of parameters to be specified; the remaining parameters take their default values. Use the G qualifier to control whether a specified parameter is estimated or fixed. The order of the parameters in ASReml, with their defaults, is ($\phi$, $\nu = 0.5$, $\delta = 1$, $\alpha = 0$, $\lambda = 2$). For example, if we wish to fit a Matérn model with only $\phi$ estimated and the other parameters set at their defaults then we use MAT1. MAT2 allows $\nu$ to be estimated or fixed at some other value, for example mat2 fac xcoord ycoord INIT 0 2 1 0 GPF. The parameters $\phi$ and $\nu$ are highly correlated, so it may be better to manually cover a grid of $\nu$ values. We note that there is non-uniqueness in the anisotropy parameters of this metric $d(\cdot)$, since inverting $\delta$ and adding $\pi/2$ to $\alpha$ gives the same distance. This non uniqu
222. dvantages arising from a balanced spatial layout can be exploited. The equations for mv and any terms that follow are always included in the sparse set of equations. Missing values are handled in three possible ways during analysis (see Section 6.9). In the simplest case, records containing missing values in the response variable are deleted. For multivariate (including some repeated measures) analysis, records with missing values are not deleted but ASReml drops the missing observation and uses the appropriate unstructured R inverse matrix. For regular spatial analysis, we prefer to retain separability and therefore estimate the missing value(s) by including the special term mv in the model. out(n), out(n,t) establishes a binary variable which is: out(i) = 1 if data relates to observation i, trait 1, else is 0; out(i,t) = 1 if data relates to observation i, trait t, else is 0. The intention is that this be used to test/remove single observations, for example to remove the influence of an outlier or influential point. Possible outliers will be evident in the plot of residuals versus fitted values (see the res file) and the appropriate record numbers for the out() term are reported in the res file. Note that i relates to the data analysed and will not be the same as the record number as obtained by counting data lines in the data file if there were missing observations in the data and they have not been estimated. To drop records based on the record
223. e that the data file is misnamed Check the argument There is probably a problem with the output from MY OWNGDG Check the files including the time stamps to check the gdg file is being formed properly if you read less data than you expect there are two likely expla nations First the data file has less fields than implied by the data structure definitions you will probably read half the ex pected number Second there is an alphanumeric field where a numeric field is expected check the STEP qualifier argument either all data is deleted or the model fully fits the data error with the variance header line Often some other error has meant that the wrong line is being interpreted as the variance header line Commonly the model is written over several lines but the incomplete lines do not all end with a comma an error reading the error model Maybe you need to include mv in the model to stop ASReml discarding records with missing values in the response variable Without the ASUV qualifier the multivariate error variance MUST be specified as US Apparently ASReml could not open a scratch file to hold the transformed data On unix check the temp directory tmp for old large scratch files 265 14 5 Information Warning and Error messages Table 14 3 Alphabetical list of error messages and probable cause s remedies error message probable cause remedy Segmentation fault Singularity appe
224. e M I Ir M I ral 5 8 Job control qualifiers Table 5 4 List of occasionally used job control qualifiers qualifier action MVINCLUDE MVREMOVE NODISPLAY PVAL v p PVAL f ulist Restrictions The key field MUST be numeric In particular if the data field it relates to is either an A or I encoded factor the original uncoded level labels may not specified in the MBF file Rather the coded levels must be specified The MBF file is processed before the data file is read in and so the mapping to coded levels has not been defined in ASReml when the MBF file is processed although the user can must anticipate what it will be Comment If this MBF process is to be used repeatedly for example to process a large set of marker variables in conjunction with CYCLE processing will be much faster if the markers variables are in separate files ASReml will read 10 files containing a single field much faster than reading a single file containing 400 fields ten times to extract 10 different markers Also note that the file may be a binary file and will be read much quicker than a formatted file A binary file may be formed in a previous run using SAVE When missing values occur in the design ASReml will report this fact and abort the job unless MVINCLUDE is specified see Section 6 9 then missing values are treated as zeros Use the DV transformation to drop the records with the missing values instructs AS
225. e best estimate a common reason is that some constraints have restricted the gammas Add the GU qualifier to any factor definition whose gamma value is approaching zero or the correlation is approaching 1 Alternatively more singularities may have been detected You should identify where the singularities are expected and modify the data so that they are omitted or consistently detected One possibility is to centre and scale covariates involved in interactions so that their standard deviation is close to 1 258 14 5 Information Warning and Error messages Table 14 3 Alphabetical list of error messages and probable cause s remedies error message probable cause remedy COLFAC confusion ROWFAC confusion PRINT Cannot open output file SUBSECTION not permitted AINV GIV matrix undefined or wrong size ALNORM Error Apparent error in pedigree relationships ASReml command file is EMPTY ASReml failed in at string too long Badly formed model term CALC reference to large Check IDV structure Context of read error Data Error At record Continue from rsv file Convergence failed ICOLFAC ROWFAC arguments contradict RESIDUAL state ment order If the variables have the correct names reverse the order Check filename This variance structure qualifier is only permitted in single sec tion RESIDUAL structures Check the size of the factor associated with the
226. e component for this term should be positive Had we mistakenly specified level 1 then ASReml would have estimated a negative component by setting the GU option for this term The portion of the ASReml output for this analysis is 5 LogL 343 759 S2 1 2242 262 df 1 components restrained 6 LogL 343 577 52 1 1738 262 df 1 components restrained 7 LogL 343 543 S2 1 1559 262 df 8 LogL 343 535 S2 1 1469 262 df 9 LogL 343 535 S2 1 1451 262 df Results from analysis of sqrt rootwt Akaike Information Criterion 705 07 assuming 9 parameters Bayesian Information Criterion 737 18 302 15 8 Paired Case Control study Rice Model_Term Gamma Sigma Sigma SE C idv variety IDV_V 44 1 89172 2 16613 2 99 QF idv run IDV_V 66 0 296929 0 340000 0 62 QP idv pair IDY V 132 0 871770 0 998227 2 64 OP idv uni tmt 2 IDV_V 264 0 144454 0 165408 0 27 OP idv units 264 effects Residual SCA_V 264 1 000000 1 14506 2 19 OP diag tmt id variety 88 effects tmt DIAG_V 1 1 09032 1 24848 2 21 OP tmt DIAG_V 2 0 148952E 05 0 170558E 05 2 79 OB diag tmt id run 132 effects tmt DIAG_V 1 1 25736 1 43975 2 25 QF tmt DIAG_V 2 1 86671 2 13750 2 97 OF Warning Code B fixed at a boundary GP F fixed by user liable to change from P to B P positive definite C Constrained by user VCC U unbounded S Singular Information matrix S means there is no information in the data for this parameter Very small components with Com
227. e components suppress screen output repeat run for each argument renaming output file names set workspace size over ride y variate specified in the command file with variate number v reports current license details requests that the main output from the asr pvs and sln files be also written in the xml1 file 195 10 3 Command line options 10 3 1 Prompt for arguments A A ASK makes it easier to specify command line options in Windows Explorer One of the options available when right clicking a as file invokes ASReml with this option ASReml then prompts for the options and arguments allowing these to be set interactively at run time With ASK on the top job control line it is assumed that no other qualifiers are set on the line For example a response of hoor 12 35 would be equivalent to ASReml h22r basename 1 2 3 10 3 2 Output control B OUTFOLDER XML B b BRIEF b suppresses some of the information written to the asr file The data summary and regression coefficient estimates are suppressed by the options B B1 or B2 This option should not be used for initial runs of a job before you have confirmed by checking the data summary that ASReml has read the data as you intended Use B2 to also have the predicted values written to the asr file instead of the pvs file Use B 1 to get BLUE estimates reported in asr file OUTFOLDER path allows most of the output files to be written to a folder o
228. e defined in terms of X and Y axes: $h_x = x_i - x_j$, $h_y = y_i - y_j$, $s_x = \cos(\alpha) h_x + \sin(\alpha) h_y$, $s_y = -\sin(\alpha) h_x + \cos(\alpha) h_y$, $d = (\delta |s_x|^{\lambda} + |s_y|^{\lambda} / \delta)^{1/\lambda}$. For a given $\nu$, the range parameter $\phi$ affects the rate of decay of $\rho(\cdot)$ with increasing $d$. The parameter $\nu > 0$ controls the analytic smoothness of the underlying process $u_s$, the process being $\lceil \nu \rceil - 1$ times mean-square differentiable, where $\lceil \nu \rceil$ is the smallest integer greater than or equal to $\nu$ (Stein, 1999, page 31). Larger $\nu$ correspond to smoother processes. ASReml uses numerical derivatives for $\nu$ when its current value is outside the interval [0.2, 5]. When $\nu = m + \frac{1}{2}$ with $m$ a non-negative integer, $\rho_M(\cdot)$ is the product of $\exp(-d/\phi)$ and a polynomial of degree $m$ in $d$. Thus $\nu = 0.5$ yields the exponential correlation function, $\rho_M(d; \phi, 0.5) = \exp(-d/\phi)$, and $\nu = 1$ yields Whittle's elementary correlation function, $\rho_M(d; \phi, 1) = (d/\phi) K_1(d/\phi)$ (Webster and Oliver, 2001). When $\nu = 1.5$ then $\rho_M(d; \phi, 1.5) = \exp(-d/\phi)(1 + d/\phi)$, which is the correlation function of a random field which is continuous and once differentiable. This has been used recently by Kammann and Wand (2003). As $\nu \to \infty$, $\rho_M(\cdot)$ tends to the gaussian correlation function. The final metric parameter $\lambda$ is not estimated by ASReml; it has default value of 2 for Euclidean distance. Setting $\lambda = 1$ provides the cityblock metric, which together with $\nu = 0.5$ models a separable AR1xAR1 process. Cityblock metric may be appropriate when the
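The two closed forms quoted above (smoothness 0.5 and 1.5) and the separability claim for the cityblock metric can be checked numerically. The following is a minimal sketch, not ASReml code: the function names are hypothetical, the range parameter is written as `phi` and assumed to enter as d/phi, and the anisotropy parameters are left at their no-anisotropy defaults (ratio 1, angle 0), so the distance reduces to a plain Minkowski distance.

```python
import math


def distance(hx: float, hy: float, lam: float = 2.0) -> float:
    """Distance with no anisotropy: lam=2 is Euclidean, lam=1 is cityblock."""
    return (abs(hx) ** lam + abs(hy) ** lam) ** (1.0 / lam)


def matern_half_integer(d: float, phi: float = 1.0, m: int = 0) -> float:
    """Matern correlation for smoothness m + 1/2, closed forms for m = 0, 1."""
    x = d / phi
    if m == 0:  # smoothness 0.5: exponential correlation
        return math.exp(-x)
    if m == 1:  # smoothness 1.5: exponential times a degree-1 polynomial
        return math.exp(-x) * (1.0 + x)
    raise ValueError("only m = 0 and m = 1 are implemented in this sketch")


# Cityblock metric with smoothness 0.5 factorises into a row part and a
# column part, which is the separable AR1 x AR1 behaviour noted in the text.
joint = matern_half_integer(distance(3.0, 4.0, lam=1.0))
product = matern_half_integer(3.0) * matern_half_integer(4.0)
```

Here `joint` and `product` agree to rounding error, whereas the Euclidean distance (`lam=2.0`) gives a correlation that does not factorise this way.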
229. e different from the fitted matrix because BLUPs are shrunken phenotypes. The BLUPs matrix retains much of the character of the phenotypes; the rescaled has the variance from the fitted and the covariance from the BLUPs and might be more suitable as an initial matrix if the variances have been estimated. The BLUPs and rescaled matrices should not be reported; relevant portions of the estimated variance matrix for each term for which an R structure or a G structure has been associated; a variogram and spatial correlations for spatial analysis (the spatial correlations are based on distance between data points, see Gilmour et al., 1997); the slope of the log(absolute residual) on log(predicted value) for assessing possible mean-variance relationships, and the location of large residuals. For example, SLOPES FOR LOG(ABS(RES)) ON LOG(PV) for section 1: 0.99 2.01 4.34, produced from a trivariate analysis, reports the slopes. 13.4 Other ASReml output files. A slope of b suggests that $y^{1-b}$ might have less mean-variance relationship. If there is no mean-variance relation, a slope of zero is expected. A slope of 0.5 suggests a SQRT transformation might resolve the dependence; a slope of 1 means a LOG transformation might be appropriate. So, for the 3 traits, log($y_1$), $y_2^{-1}$ and $y_3^{-3}$ are indicated. This diagnostic strategy works better when based on grouped data, regressing log(standard deviation) on log(mean). Also STND RES 16 2 35 6 58 5 64
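The slope diagnostic above follows the usual variance-stabilising rule: if the residual spread grows like the mean to the power b, a power transformation with exponent 1 - b is the natural candidate, with the logarithm as the limiting case at b = 1. A small sketch of that rule, with hypothetical function names and an arbitrary tolerance for treating a slope as "close to 1":

```python
def suggested_power(slope: float) -> float:
    """Exponent of the power transformation suggested by a slope b (1 - b)."""
    return 1.0 - slope


def describe(slope: float, tol: float = 0.05) -> str:
    """Readable suggestion; tol is an arbitrary cut-off chosen for this sketch."""
    power = suggested_power(slope)
    if abs(power) < tol:
        return "log(y)"
    return f"y^{power:.2f}"
```

Applied to the three slopes in the example, `describe(0.99)` returns `log(y)`, while 2.01 and 4.34 point to negative powers near -1 and -3.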
230. e field width is no longer restricted. See TXTFORM for more detail. SPATIAL increases the amount of information reported on the residuals obtained from the analysis of a two dimensional regular grid field trial. The information is written to the .res file. 5.8 Job control qualifiers. Table 5.5 List of rarely used job control qualifiers (qualifier, action): TABFORM n, TXTFORM n, TWOWAY, VCC n, VGSECTORS s, YHTFORM f, YSS r. TABFORM n controls form of the .tab file: TABFORM 1 is TAB separated (.tab becomes _tab.txt); TABFORM 2 is COMMA separated (.tab becomes _tab.csv); TABFORM 3 is Ampersand separated (.tab becomes _tab.tex). See TXTFORM for more detail. TXTFORM n sets the default argument for PVSFORM, SLNFORM, TABFORM and YHTFORM if these are not explicitly set. TXTFORM (or TXTFORM 1) replaces multiple spaces with TAB and changes the file extension to, say, _sln.txt. This makes it easier to load the solutions into Excel. TXTFORM 2 replaces multiple spaces with COMMA and changes the file extension to, say, _sln.csv. However, since factor labels sometimes contain COMMAS, this form is not so convenient. TXTFORM 3 replaces multiple spaces with Ampersand, appends a double backslash to each line and changes the file extension to, say, _sln.tex (Latex style). Additional significant digits are reported with these formats. Omitting the qualifier means the standard fixed field format is used. For yht and sln fi
231. e has the matrix presented lower triangle rowwise with each row begin ning on a new line e a sparse format file must be free format 11 1 with three numbers per line namely i i row column value 441 defining the lower triangle row wise of the 5 1 0666667 matrix 6 5 0 2666667 6 6 1 0666667 7 T 1 0666667 e the file must be sorted column within row 8 7 0 2666667 8 8 1 0666667 e every diagonal element must be present 279 d vopebor ae 10 9 0 2666667 missing off diagonal elements are assumed 10 10 1 0666667 to be zero cells 11 11 1 0666667 12 11 0 2666667 e the file is used by associating it with a fac 12 12 1 0666667 tor in the model The number and order of the rows must agree with the size and order of the associated factor e the SKIP n qualifier tells ASReml to skip n header lines in the file The giv file presented in the code I 0 box gives the G inverse matrix on ior 02i the right mi i 0 4 8 coger L007 The easiest way to ensure the variable is coded to match the order of the GRM file is to supply a list of level names in the variable definition For example genotype A L Gorder txt would code the variable genotype to agree with the order of level names present in the file Gorder txt which would be the order used in creating the GRM GIV matrix If the file has a grm file extension ASReml will invert the GRM matrix If it is not Positive Definite the job will abort unless an appro
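The layout rules listed above for a sparse-format file (three numbers per line giving row, column and value; lower triangle only; sorted column within row; every diagonal element present) lend themselves to a quick pre-flight check before a job is run. The sketch below is not part of ASReml: the function name is hypothetical, it works on triples that have already been parsed, and it treats a repeated cell as an ordering problem.

```python
def check_sparse_lower_triangle(triples, size):
    """Return a list of problems found in (row, column, value) triples."""
    problems = []
    previous = (0, 0)
    diagonals = set()
    for row, col, _value in triples:
        if col > row:
            problems.append(f"({row}, {col}) is above the diagonal")
        if (row, col) <= previous:  # rows ascending, columns ascending in a row
            problems.append(f"({row}, {col}) is out of order")
        previous = (row, col)
        if row == col:
            diagonals.add(row)
    for i in range(1, size + 1):
        if i not in diagonals:
            problems.append(f"diagonal element {i} is missing")
    return problems


# A made-up 3 x 3 example; missing off-diagonal cells are taken as zero.
good = [(1, 1, 1.0), (2, 1, 0.25), (2, 2, 1.0), (3, 3, 1.0)]
```

An empty list from `check_sparse_lower_triangle(good, 3)` means the triples satisfy the three layout rules; it says nothing about whether the matrix is positive definite.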
232. e job may just consist of a title line and MERGE directives. The MERGE qualifier, on the other hand, combines information from two files into the internal data set which ASReml uses for analysis and does not save it to file. It is very limited in functionality. The files to be merged must conform to the following basic structure: the data fields must be TAB, COMMA or SPACE separated; there will be one heading line that names the columns in the file; the names may not have embedded spaces; the number of fields is determined from the number of names; missing values are implied by adjacent commas in comma delimited files, otherwise they are indicated by NA, * or . as in normal ASReml files; the merged file will be TAB separated if a .txt file, COMMA separated if a .csv file, and SPACE separated otherwise. 11.2 Merge Syntax. The basic merge command is MERGE file1 WITH file2 TO newfile. Typically files to be merged will have common key fields. In the basic merge (KEY not specified), any fields having the same names are taken as the key fields and, if the files have no fields in common, they are assumed to match on row number. Fields are referenced by name (case sensitive). The full command is MERGE file1 KEY keyfields KEEP SKIP fields WITH file2 KEY keyfields KEEP NODUP SKIP fields TO newfile CHECK SORT. Warning: Fields in the merged file will
233. e precision. Factor names are held in a .vll file (see SAVE below). 5.8 Job control qualifiers. Table 5.5 List of rarely used job control qualifiers (qualifier, action): SAVE n, SCREEN n SMX m, SLNFORM n, SPATIAL. The file will not be written from a spatial analysis (two-dimensional error) when the data records have been sorted into field order, because the residuals are not in the same order that the data is stored. The residual from a spatial analysis will have the units part added to it when units is also fitted. The .drs file could be renamed with extension .dbl and used for input in a subsequent run. SAVE n instructs ASReml to write the data to a binary file. The file asrdata.bin is written in single precision if the argument n is 1 or 3; asrdata.dbl is written in double precision if the argument n is 2 or 4; the data values are written before transformation if the argument is 1 or 2 and after transformation if the argument is 3 or 4. The default is single precision after transformation (see Section 4.2). When either SAVE or RESIDUALS is specified, ASReml saves the factor level labels to a basename.vll file and attempts to read them back when data input is from a binary file. Note that if the job basename changes between runs, the .vll file will need to be copied to the new basename. If the .vll file does not match the factor structure (i.e. the same factors in the same order), reading the .vll file is abort
234. e remedy. Variance structure is not positive definite: use better initial values or a structured variance matrix that is positive definite. XFA model not permitted in R structures: you may use FA or FACV. The R structure must be positive definite; XFA may not be used as an R structure. 15 Examples. 15.1 Introduction. In this chapter we present the analysis of a variety of examples. The primary aim is to illustrate the capabilities of ASReml in the context of analysing real data sets. We also discuss the output produced by ASReml and indicate when problems may occur. Statistical concepts and issues are discussed as necessary, but we stress that the analyses are illustrative, not prescriptive. 15.2 Split plot design - Oats. The first example involves the analysis of a split plot design originally presented by Yates (1935). The experiment was conducted to assess the effects on yield of three oat varieties (Golden Rain, Marvellous and Victory) with four levels of nitrogen application (0, 0.2, 0.4 and 0.6 cwt/acre). The field layout consisted of six blocks (labelled I, II, III, IV, V and VI) with three whole plots, each split into four sub-plots. The three varieties were randomly allocated to the three whole plots while the four levels of nitrogen application were randomly assigned to the four sub-plots within each whole plot. The data is presented in Table 15.1. Table 15.1 A split plot field trial of oat varieties and nitrogen application
235. e same line because of the way ASReml processes the command file. Example 7.1 Random coefficient regression. In the first order random coefficient regression model, it is required to specify a covariance between the intercept and slope for each subject to ensure translation invariance, that is, equivalent variance parameter estimates for addition of any constant to the independent variable. For example, in a random coefficient regression where a set of random intercepts is specified by the model term Animal with 10 levels and a set of random slopes is specified by the model term age.Animal, translation invariance is achieved using str() as str(Animal age.Animal, us(2).id(10)). The algorithm places the model terms specified using the argument form together in the processed random model, here Animal followed by age.Animal. The variance structure s begins at the start of the first term specified in str() and is expected to exactly span the whole set of terms given within the brackets. The overall size of the variance model is checked against the total number of levels of these terms, but the user must verify that the ordering is appropriate for (matches) the variance model specified. In our example this random model generates a combined set of random effects from the individual animal intercepts $u_{I1}, \ldots, u_{I10}$ and animal slopes $u_{S1}, \ldots, u_{S10}$ as $u_{IS} = (u_I', u_S')'$. The consolidated term then has variance structure of the form var urs
236. e second file only be inserted once into the merged file. For example, assume we want to merge two files containing data from sheep. The first file has several records per animal containing fleece data from various years. The second file has one record per animal containing birth and weaning weights. Merging with NODUP bwt wwt will copy these traits only once into the merged file. SKIP fields is used to exclude fields from the merged file. It may be specified with either or both input files. SORT instructs ASReml to produce the merged file sorted on the key fields. Otherwise the records are returned in the order they appear in the primary file. The merging algorithm is briefly as follows. The secondary file is read in, skip fields being omitted, and the records are sorted on the key fields. If sorted output is required, the primary file is also read in and sorted. The primary file (or its sorted form) is then processed line by line and the merged file is produced. Matching of key fields is on a string basis, not a value basis. If there are no key fields, the files are merged by interleaving. If there are multiple records with the same key, these are severally matched. That is, if 3 lines of file 1 match 4 lines of file 2, the merged file will contain all 12 combinations. 11.3 Examples. Key fields have different names: MERGE file1 KEY key1a key1b WITH file2 KEY key2a key2b TO newfile. Key fields have commo
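The merging steps described above can be mimicked in a few lines of Python to see what string-basis matching and the "all combinations" rule imply. This is a sketch under stated assumptions, not ASReml's code: `merge_records` and its parameters are hypothetical, records are plain dictionaries, a dictionary index stands in for the sorted secondary file, the secondary key fields are not repeated in the output, and primary records without a match are simply dropped (the handling of unmatched records is not covered here).

```python
from collections import defaultdict


def merge_records(primary, secondary, key1, key2, skip2=()):
    """Match every primary record with every secondary record sharing its key."""
    index = defaultdict(list)
    for rec in secondary:
        key = tuple(str(rec[name]) for name in key2)  # string basis, not value
        kept = {k: v for k, v in rec.items() if k not in skip2 and k not in key2}
        index[key].append(kept)

    merged = []
    for rec in primary:  # primary order is preserved
        key = tuple(str(rec[name]) for name in key1)
        for match in index.get(key, []):
            merged.append({**rec, **match})
    return merged


# Three fleece records and four weight records for the same animal.
fleece = [{"id": "A", "year": y} for y in (1, 2, 3)]
weights = [{"tag": "A", "wt": w} for w in (10, 11, 12, 13)]
combined = merge_records(fleece, weights, key1=["id"], key2=["tag"])
```

`combined` holds 3 x 4 = 12 records. Because keys are compared as strings, a key written `01` in one file does not match `1` in the other.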
237. e section into independent subsec tions with subsections having common variance parameters see Section tae existing Ts is used with the own variance model function to set the parameter types see Sections 7 7 7 and 7 7 3 existing USE t t is a compound model term component used elsewhere in the model allows this variance structure and its parameters to be the same as that used for t see Section 7 7 8 for an example e all parameters with the same letter in the structure are constrained to be equal e 1 9 a z and A Z are all unique so that 61 equalities can be specified O and indicate that the corresponding parameter is not related to any other parameter A colon generates a sequence that is a e is the same as abcde e putting as the first character in s makes the interpretation of codes absolute so that they apply across structures e putting as the first character in s indicates that numbers are for repeat counts A Z are equality codes and are not different from a z giving only 26 equalities In this case only represents unrelated to any other parameter Thus 3A2 is equivalent to AAA or Qaaa00 or BAAACD Some users might find the contractions appealing other users find an explicit definition less error prone Examples are presented in Table 7 5 Important This syntax is limited in that it cannot apply relationships to simple variance components random terms that do not have an explicit variance stru
238. e trimmed but empty rows in the middle of a block are kept. Empty columns are ignored. A single row of labels as the first non-empty row in the block will be taken as column names. Empty cells in this row will have default names C1, C2 etc. assigned. Missing values are commonly represented in ASReml data files by NA, * or . ; ASReml will also recognise empty fields as missing values in .csv/.xls files. 4.2 The data file. 4.2.4 Binary format data files. Conventions for binary files are as follows: binary files are read as unformatted Fortran binary in single precision if the filename has a .bin or .BIN extension; Fortran binary data files are read in double precision if the filename has a .dbl or .DBL extension; ASReml recognises the value 1e37 as a missing value in binary files; Fortran binary in the above means all real (.bin) or all double precision (.dbl) variables; mixed types, that is, integer and alphabetic binary representation of variables, is not allowed in binary files; binary files can only be used in conjunction with a pedigree file if the pedigree fields are coded in the binary file so that they correspond with the pedigree file (this can be done using the SAVE option in ASReml to form the binary file, see Table 5.5), or the identifiers are whole numbers less than 9,999,999 and the RECODE qualifier is specified (see Table 5.5). 5 Command file: Reading the data. 5.1 Introduction. In the code box to
239. ear mixed models. This example differs from the split plot example as it is unbalanced and so more care is required in assessing the significance of fixed effects. The experiment was reported by Dempster et al. (1984) and was designed to compare the effect of three doses of an experimental compound (control, low and high) on the maternal performance of rats. Thirty female rats (dams) were randomly split into three groups of 10 and each group randomly assigned to the three different doses. All pups in each litter were weighed. The litters differed in total size and in the numbers of males and females. Thus the additional covariate littersize was included in the analysis. The differential effect of the compound on male and female pups was also of interest. Three litters had to be dropped from the experiment, which meant that one dose had only 7 dams. The analysis must account for the presence of between-dam variation, but must also recognise the stratification of the experimental units (pups within litters) and that doses and littersize belong to the dam stratum. Table 15.2 presents an indicative AOV decomposition for this experiment. Table 15.2 Rat data: AOV decomposition (columns: stratum, decomposition, type, df or ne). constant: 1, F, 1. dams: dose, F; littersize, F, 1; dam, R, 27. dams.pups: sex, F, 1; dose.sex, F, 2; error, R. The dose and littersize effects are tested against the residual dam variation while the remaining effects are tested against the r
240. ection 2.1.5 described partitioning the data observations into data sections to which separate variance structures are applied. There are three data sections in the fourth example on page 115. When variance structures are specified using dimensions rather than factor names (idv(23) for section 1, idv(27) for section 2 in the example), the data must be ordered into sections and the variance structures must be ordered to match the order of the sections in the data file. It is usually more convenient to use a variable in the data file to identify sections within the data. The data will be sorted internally by ASReml (i.e. the data file does not need to be ordered in any particular way) and the variance structures for sections can then be specified using the sat() function, for example residual sat(section).idv(units) for the simple example with 3 data sections, where section is a new column in the data file that separates the data into the three sections (units 1-23, 24-50 and 51-70). The sat() function (shorthand for section at()) is new with Release 4 and performs several different tasks: it tells ASReml that the variance structure for the residual error term is a direct sum structure (see Section 2.1.5), where the different components of the direct sum apply to the different levels of the sectioning variable in the data file; it prunes the levels for a section so that only the levels of factors defining the residual variance structure for that sect
241. ed performs a Regression Screen, a form of all subsets regression. For d model terms in the DENSE equations, there are $2^d - 1$ possible submodels. Since $2^d - 1$ is large for d > 8, the submodels explored are reduced by the parameters n and m so that only models with at least n (default 1) terms but no more than m (default 6) terms are considered. The output (see page 221) is a report to the .asr file with a line for every submodel showing the sums of squares, degrees of freedom and terms in the model. There is a limit of d = 20 model terms in the screen. ASReml will not allow interactions to be included in the screened terms. For example, to identify which three of my set of 12 covariates best explain my dependent variable given the other terms in the model, specify !SCREEN 3 !SMX 3. The number of models evaluated quickly increases with d, but ASReml has an arbitrary limit of 900 submodels evaluated. Use the !DENSE qualifier to control which terms are screened. The screen is conditional on all other terms (those in the SPARSE equations) being present. !SLNFORM modifies the format of the .sln file: !SLNFORM -1 prevents the .sln file from being written; !SLNFORM 1 is TAB separated (.sln becomes _sln.txt); !SLNFORM 2 is COMMA separated (.sln becomes _sln.csv); !SLNFORM 3 is Ampersand separated (.sln becomes _sln.tex). Note that extra significant digits are reported when !SLNFORM is set, and expanded labelling of the levels in interactions is used becaus
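To get a feel for how the n and m limits shrink the search, the number of candidate submodels can be counted directly. The sketch below is illustrative Python, not ASReml code. The function name `count_submodels` is hypothetical, and it assumes the screen considers every subset of the d dense terms whose size lies between n and m.

```python
from math import comb

def count_submodels(d, n=1, m=6):
    """Count the subsets of d terms that contain at least n and at most m terms."""
    upper = min(m, d)
    return sum(comb(d, k) for k in range(n, upper + 1))

# All non-empty subsets of 8 terms, i.e. 2**8 - 1
print(count_submodels(8, 1, 8))    # 255

# Exactly three of twelve covariates, as in the SCREEN 3 SMX 3 example
print(count_submodels(12, 3, 3))   # 220

# Default limits (1 to 6 terms) at the maximum of 20 terms
print(count_submodels(20))         # 60459
```

Under these assumptions, the three-from-twelve screen needs 220 fits, which is inside the 900-submodel limit, while the default limits with 20 terms would far exceed it.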
242. ed max5000 0 variance parameters max2500 2 special structures Final parameter values 2 0 Last line read was R Repl 0 0 0 0 Finished 23 Apr 2014 09 17 08 861 Error in variance header line R Repl. 5 A misspelt factor name in linear model. After correcting the definition of variety we get the following (abbreviated) output. Now it has failed to parse the model line because the model term Repl was declared as repl and so is unrecognised. Changing Repl to repl (or vice versa) resolves this problem. The command file is:
  NIN Alliance Trial 1989
  variety !A
  id pid raw
  repl T P1
  row column
  nin89.asd !skip 1
  yield ~ mu variety !r Repl
  residual ar1(Row).ar1(Col)
  predict varierty
The output is: Folder C Users Public ASReml Docs Manex4 ERR variety A QUALIFIERS SKIP 1 Reading nin89 asd FREE FORMAT skipping 1 lines Model term Repl is not valid/recognised Fault Error reading model terms Last line read was Repl Currently defined structures COLS and LEVELS 1 variety 1 2 2 0 0 0 2 id 1 1 1 0 1 0 3 pid 1 1 1 0 2 0 4 raw 1 1 1 0 3 0 5 repl 1 2 2 0 4 0 12 mu 0 1 8 0 1 0 ninerrdS variety id pid raw rep nloc yield lat Model specification TERM LEVELS GAMMAS mu 0 variety 0 12 factors defined max5000 0 variance parameters max2500 2 special structures Last line read was Repl Finished 23 Apr 2014 09 17 15 785 Error reading model terms. 6 Misspelt fact
243. ed a scaled Wald statistic, together with an F approximation to its sampling distribution, which they showed performed well in a range of settings (though limited in terms of the range of variance models available in ASReml). In the following we describe the facilities now available in ASReml for conducting inference concerning terms which are in the dense fixed effects model component of the general linear mixed model. These facilities are not available for any terms in the sparse model. They include facilities for computing two types of Wald F statistics and partial implementation of the Kenward and Roger adjustments. 2.5.2 Incremental and conditional Wald F Statistics. The basic tool for inference is the Wald statistic defined in equation (2.17). ASReml produces a test of fixed effects that reduces to an F statistic in special cases by dividing the Wald statistic, constructed with l = 0, by r, the numerator degrees of freedom. In this form it is possible to perform an approximate F test if we can deduce the denominator degrees of freedom. However, there are several ways L can be defined to construct a test for a particular model term, two of which are available in ASReml. These Wald F statistics are labelled F-inc (for incremental) and F-con (for conditional) respectively. For balanced designs, these Wald F statistics are numerically identical to the F statistics obtained from the standard analysis of variance. The first method for computing Wald s
244. ed in detail in Section 7.11.1. 7.2.1 Modelling a single variance structure over several model terms. This facility was motivated by two considerations. Typically the random effects from any two distinct model terms are uncorrelated. However, in some models one G structure may apply across several model terms. Sometimes one also wishes to partition the random effects into sets with independent variance structures. In ASReml we can accomplish these two models using the special variance model function str(), where the name str is for structure, and str() has the following general form: str(model term(s), variance structure(s)). The m individual model terms generate the design matrices $Z_i$ and effect vectors $u_i$ of size $b_i$ ($i = 1, \ldots, m$), and the v variance structure terms generate variance structures $G_j$ ($j = 1, \ldots, v$). The function str() generates a combined model design matrix $Z_c = [Z_1 \ldots Z_m]$ and a combined effects vector $u_c' = [u_1' \ldots u_m']$ of size $b_c = \sum_{i=1}^{m} b_i$, and the variance structure for $u_c$ is $G_c = \oplus_{j=1}^{v} G_j$; for $u_c$ and $G_c$ to be conformable, the sizes of the $G_j$ must sum to $b_c$. If v = 1 then there is one variance structure associated with the combined set of effects, and if v > 1 then $u_c$ and $G_c$ are partitioned conformably into v blocks, so that the effect vectors of different blocks are independent of each other and the effects in block j have variance structure $G_j$. A restriction with str() is that the closing parenthesis must be on th
245. eee ae ea we 5 7 8 Setting relationships among variance structure parameters 7 8 1 Simple relationships among variance structure parameters 7 8 2 Fitting linear relationships among variance structure parameters 7 9 Ways to present initial values to ASReml 2 20004 7 9 1 Using templates to set parametric information associated with variance structures using tsv and msvfiles 2004 7 9 2 Using estimates from simpler models 20 7 10 Default variance structures in ASReml 2 00004 7 11 Variance model functions available in ASReml 7 11 1 Forming variance models from correlation models 7 11 2 Non singular variance matrices 2 02 0200004 7 11 3 Notes on the variance models 2 2 4 2 4 eee ed ewae de ced 7 14 Hives oe Rise o ee eo Re Re eS OR SO eee 7 11 5 Notes on power models 2 26 eh eb we eR Re eR ee ee 7 11 6 Notes on Factor Analytic models 0 0 7 12 Variance models available in ASReml 2 2 2 20 eee Command file Multivariate analysis 8 1 Inrodugctiok cak eee oe BERD EBLE EADS EES G 8 1 1 Repeated measures on rats 2 812 Wether tial data 2 21 chee deadedeved Gah aeeahans 8 2 Model specification 6 o ce oe RODE RRERAEH RHE HE OR RODGERS 8 3 Residual variance structures 6 kee Ye cee K eee EP ewe ee ee 8 8 3 1 Specifying multivariate variance structures in ASReml 8 4 introduction s v
246. een sections. For example, fitting the terms at(region).trial as random effects would allow the trials in region 1 to have a different variance component to those in region 2. Prediction in these cases is more complicated and has only been implemented for this specific case and the analogous region.trial case. The associated factors must occur together in this order for the prediction to give correct answers. The ASSOCIATE effect with base averaging can usually be achieved with the !PRESENT qualifier, except when the factors have many levels so that the product of levels exceeds 2,147,000,000; it fails in this case because the KEY for identifying the cells present is a simple combination of the levels and is stored as a normal 32-bit integer. However, !ASSOCIATE is preferred because it formally checks the association structure as well as allowing sequential averaging. Two !ASSOCIATE clauses may be specified, for example PRED entry !ASSOC family entry !ASSOC reg loc trial !ASAVE reg loc. Only one member of an !ASSOCIATE list may also appear in a !PRESENT list. If one member appears in the classify set, only that member may appear in the !PRESENT list. For example: yield ~ region !r idv(region).id(family) idv(entry), followed by PREDICT entry !ASSOCIATE family entry !PRESENT entry region. Association averaging is used to form the cells in the PRESENT table and PRESENT averaging is then applied. 9.3.5 Complicated weighting with !PRESENT. Generally when
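The 32-bit limit mentioned above can be checked before choosing between PRESENT and ASSOCIATE. The following is a minimal Python sketch, not ASReml code. It assumes the key is bounded by the product of the numbers of levels of the factors involved; the helper name and the example level counts are hypothetical.

```python
from math import prod

INT32_MAX = 2**31 - 1  # largest signed 32-bit integer, 2,147,483,647

def key_fits_int32(level_counts):
    """Return True if the product of the level counts fits in a signed 32-bit integer."""
    return prod(level_counts) <= INT32_MAX

# Three factors with 1000 levels each: product is 1,000,000,000
print(key_fits_int32([1000, 1000, 1000]))  # True

# Two factors with 50,000 levels each: product is 2,500,000,000
print(key_fits_int32([50000, 50000]))      # False
```

When the check fails, the text above suggests that ASSOCIATE is the safer route.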
247. effects 5 86 correlated 16 terms multivariate 146 random terms 94 RCB 29 design 25 reading the data 31 46 Reduced animal model 157 relationships variance structure parameters 124 REML 1 12 16 REMLRT 16 repeated measures 1 270 reserved terms 90 Trait 90 100 a t r 97 abs v 90 97 and t r 90 97 at 97 at f n 90 97 cos v r 91 97 fac v y 90 97 fac v 90 97 g f n 98 giv f n 91 98 grm f n 91 h 98 i f 98 ide f 91 98 inv v r 91 98 1 f 98 leg v n 91 98 lin f 90 98 log v r 91 98 ma1 f 91 98 mal 91 98 mbf v 7r 91 mu 90 99 mv 90 99 out 99 p v n 99 pol v n 91 99 pow z p 0 99 qtl 100 s v k 100 sin v r 91 100 spl v k 90 100 sqrt v r 91 100 uni f k 100 uni f n 92 uni f 91 units 90 100 vect u 92 reserved words GRM 143 AINV 143 ANTE 1 142 CHOL 1 142 CORGH 139 FACV 1 142 FA 1 142 GIV 143 GRM 143 IDH 141 MAT 141 NRM 143 OWN 141 US 141 XFA 1 142 AEXP 141 AGAU 141 AR2 138 343 INDEX AR3 138 ARMA 139 AR 1 138 CIR 141 CORB 139 CORGB 139 CORU 139 DIAG 141 EXP 140 GAU 140 ID 138 IEUC 140 IEXP 140 IGAU 140 LVR 140 MA2 139 MA 1 139 SAR2 138 SAR 138 SPH 141 residual 29 error 5 86 likelihood 12 response 87 running the job 33 score 13 Score test 68 Segmentation fault 219 sep
248. efined with P. In this case there is no difference between fitting nrm dam and id ide dam, since there is no pedigree information on dams. It is preferable to be explicit: specify nrm dam when the relationship matrix is required and id ide dam in the G structure definition. PATH 1, 2 and 3 were run in turn, but in PATH 3 ASReml had trouble converging: in each iteration the unstructured us tag matrix is not positive definite, so ASReml uses a slower EM algorithm that keeps the estimates in the parameter space, and convergence is very slow. Here is the convergence log for PATH 3 (reported in the .asr file): Notice 15358 singularities detected in design matrix 1 LogL 1543 55 S2 1 00000 18085 df 15 components restrained Notice US matrix updates modified 1 time s to keep them positive definite 2 LogL 1540 93 S2 1 00000 18085 df 15 components restrained Notice US matrix updates modified 1 time s to keep them positive definite 38 LogL 1538 34 S2 1 00000 18085 df 15 components restrained 39 LogL 1538 33 S2 1 00000 18085 df 14 components restrained 40 LogL 1538 32 S2 1 00000 18085 df 15 components restrained. To avoid this problem, in PATH 4 and 5 we use xfa2 and xfa3 structures. These converge much faster. Here is the convergence log and resulting estimates for PATH 5: Notice ReStartValues taken from pcoopf4 r
249. eh e Nen e ne Eee 2 1 4 How to use this guide oaa aa CEES ee ee eS Brook 3 1 5 Getting assistance and the ASReml forum aoaaa aaa 3 1 6 Typographic CONVENTIONS lt lt c e ke REE EHR SEE RR ERK G 4 2 Some theory 5 2A The general linear mixed model 222 020000 5 2 1 1 Sigma parameterization of the linear mixed model 5 2 1 2 Partitioning the fixed and random model terms 6 2 1 3 G structure for the random model terms 6 2 1 4 Partitioning the residual error term 2 020 T 2 1 5 R structure for the residual error term aaa aaa aaa 7 2 1 6 Gamma parameterization for the linear mixed model 8 2L7 Farameter Woes ec cede bei dundee dt Gavdseeaeees 8 2 1 8 Variance structures for the random model terms 8 2 1 9 Variance models for terms with several factors 9 2 110 Direct product structures e so oes he ee Ee ee ew a 10 2 1 11 Direct products in R structures oc eee eee a 10 2 1 12 Direct products in G structures 2 lt s oc csa eee tapeet pad 11 2 1 13 Range of variance models for R and G structures 11 2 1 14 Combining variance models in R and G structures 12 a2 Be ne be eee Bee eee bee ee PAS ee oe 6 eee 12 2 2 1 Estimation of the variance parameters 12 2 2 2 Estimation prediction of the fixed and random effects 14 2 2 3 Use of the gamma parameterization
250. el brief description common usage term fixed random cos v r forms cosine from v with period r J ge f condition on factor variable f gt r J giv f n associates the nth giv G inverse with the factor J f grm f n associates the nth grm G with the factor f J gt f condition on factor variable f gt r J h f factor fis fitted Helmert constraints J ide f fits pedigree factor f without relationship matrix J inv vL 7r forms reciprocal of v r lt le f condition on factor variable f lt r lt leg v n forms n 1 Legendre polynomials of order 0 in vy tercept 1 linear n from the values in v the intercept polynomial is omitted if v is preceded by the negative sign lt f condition on factor variable f lt r lt log v r forms natural logarithm of v r mai f constructs MA1 design matrix for factor f J mai forms an MA1 design matrix from plot numbers J mbf v r is a factor derived from data factor v by using the y y MBF qualifier out n condition on observation n y out n t condition on record n trait t J v pol v n forms n 1 orthogonal polynomials of order 0 in tercept 1 linear n from the values in v the intercept polynomial is omitted if n is preceded by the negative sign pow x p o defines the covariable x 0 for use in the model where zx is a variable in the data p is a power and o is an offset qtl f p impute a covariable from marker ma
251. eld mu variety and ignored But then it was unable to read jp Repl the first line of the data file residual ari Row ar1 Col predict varierty row column nin89 asd slip 1 Folder C Users Public ASRem1 Docs Manex4 ERR QUALIFIERS SLIP 1 Warning Unrecognised qualifier at character 11 SLIP 1 Reading nin89 asd FREE FORMAT skipping 0 lines Univariate analysis of yield Notice Maybe you want A L qualifiers for this factor variety Error at field 1 variety of record 1 line 1 Since this is the first data record you may need to skip some header lines see SKIP or append the A qualifier to the definition of factor variety Fault Missing faulty SKIP or A needed for variety Last line read was variety id pid raw rep nloc yield lat long row column Currently defined structures COLS and LEVELS 1 variety 1 2 2 0 0 0 10 row 1 2 2 0 9 0 248 14 4 An example 11 column 1 2 2 0 10 0 12 mu 0 4 o 0 z 0 ninerr2 nin89 asd Model specification TERM LEVELS GAMMAS mu 0 variety 0 12 factors defined max5000 O variance parameters max2500 2 special structures Last line read was variety id pid raw rep nloc yield lat long row column Finished 23 Apr 2014 09 17 01 765 Missing faulty SKIP or A needed for variety 3 An incorrectly defined factor After correcting slip 1 to skip 1 we qin Alliance Trial 1989 get the following abbreviated output The variety problem is that variety is coded in
252. ements. Normally predict points will be defined for all combinations of X and Y values. This qualifier is required (with optional argument 1) to specify that the lists are to be taken in parallel. The lists must be the same length if they are to be taken in parallel. Be aware that adding two-dimensional prediction points is likely to substantially slow iterations because the variance structure is dense and becomes larger. For this reason, to save processing time, ASReml will ignore the extra PVAL points unless either !FINAL or !GKRIGE is set. The !GROUPFACTOR qualifier, like !SUBSET, must appear on a line by itself after the data line and before the model line. Its purpose is to define a factor t by merging levels of an existing factor v. The syntax is !GROUPFACTOR <Group_factor> <Exist_factor> <new codes>; for example, !GROUPFACTOR Year YearLoc 1 1 1 2 2 3 3 3 4 4 forms a new factor Year with 4 levels from the existing factor YearLoc with 10 levels. Alternatively, Year could be formed by data transformation: Year !=YearLoc !set 1 1 1 2 2 3 3 3 4 4 !L 2001 2002 2003 2004. IDLIMIT v JOIN is used when ASReml expands a residual statement like residual sat(Site).ar1(row).ar1(col) and the dimension of row or col is small. The ar1() structure is changed to id(): when the number of rows (columns) is less than or equal to v, the structure is set to ID instead of AR1. v has a default value of 4 and cannot be reset to less than 3. If the qualifier i
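The level merging performed by GROUPFACTOR in the Year/YearLoc example above can be pictured as a simple lookup. This is an illustrative Python sketch rather than ASReml syntax. The function name `group_factor` and the sample YearLoc values are hypothetical, and it assumes the existing factor is coded 1 to 10 with the new codes listed in level order.

```python
def group_factor(values, new_codes):
    """Map 1-based levels of an existing factor to merged group codes."""
    return [new_codes[v - 1] for v in values]

# New codes for the 10 levels of YearLoc, as in the example above
codes = [1, 1, 1, 2, 2, 3, 3, 3, 4, 4]

# Hypothetical YearLoc levels for five records
yearloc = [1, 4, 6, 10, 3]

year = group_factor(yearloc, codes)
print(year)              # [1, 2, 3, 4, 1]
print(len(set(codes)))   # 4 levels in the merged factor
```

The merged factor has four distinct levels, matching the statement that Year has 4 levels formed from the 10 levels of YearLoc.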
253. eml after reordering these terms to obtain the FOWN test(s) specified. Several reruns may be needed to perform all FOWN tests specified. Any model terms in the FOWN lists which do not appear in the actual model are ignored without flagging an error. Any model terms which are omitted from FOWN statements are tested with the usual conditional test. If any model terms are listed twice, only the first test is performed. F-con tests specified in FOWN statements are given model codes 0 P. The FOWN statements are parsed by the routine that parses the model line and so accept the same model syntax options. Care should be taken to ensure term names are spelt exactly as they appear in the model. is used to have the first random term included in the dense equations if it is a GRM/GIV variance structure; this will result in faster processing when the GRM inverse matrix is not sparse. sets the number of inner iterations performed when an iteratively weighted least squares analysis is performed. Inner iterations are iterations to estimate the effects in the linear model for the current set of variance parameters. Outer iterations are the AI updates to the variance parameters. The default is to perform 4 inner iterations in the first round and 2 in subsequent rounds of the outer iteration. Set n to 2 or more to increase the number of inner iterations. sets hardcopy graphics file type to HP-GL. An argument of 2 sets the hardcopy g
254. eml user interface is terse. Most effort has been directed towards efficiency of the engine. It normally operates in a batch mode. Problem size depends on the sparsity of the mixed model equations and the size of your computer. However, models with 500,000 effects have been fitted successfully. The computational efficiency of ASReml arises from using the Average Information REML procedure (giving quadratic convergence) and sparse matrix operations. ASReml has been operational since March 1996 and is updated periodically. 1.2 Installation. Installation instructions are distributed with the program. If you require help with installation or licensing, please email support@asreml.co.uk. 1.3 User Interface. ASReml is essentially a batch program with some optional interactive features. The typical sequence of operations when using ASReml is: prepare the data (typically using a spreadsheet or database program); export that data as an ASCII file (for example, export it as a .csv (comma separated values) file from Excel); prepare a job file with filename extension .as; run the job file with ASReml; review the various output files; then either revise the job and re-run it, or extract pertinent results for your report. You need an ASCII editor to prepare input files and review and print output files. Two commonly used editors are: 1.3.1 ASReml-W. The ASReml-W interface is a graphical tool allowing the user to edit pr
255. en Marker35 a new name because it is still also generated by the !CYCLE unless it is modified to read !CYCLE 1 34 36 1000. After several cycles we might have: Marker screen Genotype yield PhenData txt ASSIGN MSET R21 R35 R376 R645 R879 !CYCLE 1 1000 !MBF mbf Genotype MLIB Marker I csv RENAME Marker I FOR MSET DO MBF mbf Genotype MLIB Marke S csv RENAME S yld mu r MSET Marker I. 10.4.4 Order of Substitution. The substitution order is !ASSIGN, !FOR, !CYCLE, !TP, command line arguments and finally the interactive prompt. 10.5 Performance issues. 10.5.1 Multiple processors. ASReml has not been configured for parallel processing. Performance is downgraded if it tries to use two processors simultaneously, as it wastes time swapping between processors. 10.5.2 Slow processes. The processing time is related to the size of the model, the complexity of the variance model (in particular the number of parameters), the sparsity of the mixed model equations and the amount of data being processed. Typically the first iteration takes longer than other iterations. The extra work in the first iteration is to determine an optimum equation order for processing the model (see !EQORDER). The extra processes in the last iteration are optional. They include: calculation of predicted values (see PREDICT statement); calculation of denominator degrees of freedom (see !DDF); calculation of outlier stati
256. eness can be removed by considering $0 < \alpha < \pi/2$ and $\delta > 0$, or by considering $0 < \alpha < \pi$ and either $0 < \delta < 1$ or $\delta > 1$. With $\lambda = 2$, isotropy occurs when $\delta = 1$, and then the rotation angle $\alpha$ is irrelevant: correlation contours are circles, compared with ellipses in general. With $\lambda = 1$, correlation contours are diamonds. 7.11.5 Notes on power models. Power models rely on the definition of distance for the associated term, for example the distance between time points in a one-dimensional longitudinal analysis, or the spatial distance between plot coordinates in a two-dimensional field trial analysis. Information for determining distances is supplied either implicitly, by applying the model to the fac() of the coordinate variables, or explicitly with the !COORD qualifier. For one-dimensional cases, use either expv(fac(X)), where X contains the positions, or expv(Trait) !COORD x, where x is a vector of positions. In two directions (IEXP, IGAU, IEUC, AEXP, AGAU, MATn): for a G structure relating to the model term fac(x,y), use fac(x,y). For example: yield ~ mu !r ieucv(fac(xcoord,ycoord)) !INIT 0.7 1.3. 7.11.6 Notes on Factor Analytic models. FAk, FACVk and XFAk are different parameterizations of the factor analytic model in which $\Sigma$ is modelled as $\Sigma = \Gamma\Gamma' + \Psi$, where $\Gamma$ is a matrix of loadings on the covariance scale and $\Psi$ is a diagonal matrix of specific variances. See Smith et al. (2001) and Thompson et al. 20
257. ensitive; most qualifier identifiers may be truncated to 3 characters. 5.3 Title line. The first 40 characters of the first nonblank text line in an ASReml command file are taken as a title for the job. Use this to document the analysis for future reference. An optional qualifier line (see section 10.3) may precede the title line. It is recognised by the presence of the qualifier prefix letter, !. Therefore the title MUST NOT include an exclamation mark. The accompanying code box begins:
  NIN Alliance Trial 1989
  variety !A
  id
  pid
5.4 Specifying and reading the data. Typically, a data record consists of all the information pertaining to an experimental unit (plot, animal, assessment). Data field definitions manage the process of converting the fields as they appear in the data file to the internal form needed by ASReml. This involves mapping (coding) factors, general transformations, skipping fields and discarding unnecessary records. If the necessary information is not in a single file, the MERGE facility (see Chapter 11) may help. Variables are defined immediately after the job title. These definitions indicate how each field in the data file is handled as it is read into ASReml. Transformations can be used to create additional variables. Users can explicitly nominate how many are read with the !READ qualifier, described in Table 5.5. No more than 10,000 variables may be read or formed. Data field definit
258. ents are idv(repl) and ar1(row) (see Table 7.2). A general form for a covariance component is umfname(component) qualifiers, where qualifiers is an optional list of one or more qualifiers to be applied to the variance structure being defined. A simple example of this is the extension of idv(repl) to idv(repl) !INIT 0.65, which specifies an IDV structure of dimension 4 for replicates (NIN example 2) with an initial variance of 0.65 for the variance component associated with replicates under the sigma parameterization, or an initial variance component ratio of 0.65 under the gamma parameterization. Note that a variance structure of a particular dimension, w say, can be specified directly as umfname(w) qualifiers. For example, idv(3) defines the IDV variance structure of dimension 3, that is $\sigma^2 I$, and idv(3) !INIT 1.1 specifies an initial value of 1.1 for the associated variance component under the sigma parameterization (or variance ratio under the gamma parameterization). Likewise, ar1(10) specifies an autoregressive correlation structure (AR1) of order 10, and ar1(10) !INIT 0.4 specifies this same structure with an initial autocorrelation parameter of 0.4. A simple variance component $\sigma^2$ would be defined as idv(1). Note that an integer value for the first argument is only valid in variance model functions associated with residual terms and str(). The full list of variance model functio
259. er if labels are not abbreviated. If abbreviations are used then they need to be chosen to avoid confusion. If the model is written over several lines, all but the final line must end with a comma (,) to indicate that the list is continued. In Tables 6.1 and 6.2, the arguments in model term functions are represented by the following symbols: f, the label of a data variable defined as a model factor; k, n, an integer number; r, a real number; t, a model term label (includes data variables); v, y, the label of a data variable. Where a model term takes another model term as an argument, the argument may occasionally need to be predefined. This is done by including the argument model term in the model term list with a leading - (minus sign), which will cause the term to be defined but not fitted. For example: Trait.male -Trait.female and(Trait.female). Table 6.1 Summary of reserved words, operators and functions. model brief description common usage term fixed random reserved mu the constant term or intercept J terms mv a term to estimate missing values J Trait multivariate counterpart to mu J units forms a factor with a level for each experimental J unit operators Or placed between labels to specify an interaction J J forms nested expansion Section 6 5 y y forms factorial expansion Section 6 5 J J p
260. eration (the final iteration), in which prediction is performed. By default, factors are predicted at each level, simple covariates are predicted at their overall mean, and covariates used as a basis for splines or orthogonal polynomials are predicted at their design points. Covariates grouped into a single term (using the !G qualifier, page 48) are treated as covariates. Prediction at particular values of a covariate or particular levels of a factor is achieved by listing the levels/values after the variate/factor name. Where there is a sequence of values, use the notation a b ... n to represent the sequence of values from a to n with step size b - a. The default step size is 1, in which case b may be omitted. A colon (:) may replace the ellipsis (...). An increasing sequence is assumed. When giving particular values for factors, the default is to use the coded level (1, ..., n) rather than the label (alphabetical or integer). To use the label, precede it with a quote. Where a large number of values must be given, they can be supplied in a separate file and the filename specified in quotes. The file form does not allow label coding or sequences. See the discussion of !PRWTS for an example. Model terms mv and units are always ignored. Model terms which are functions such as at and pol sin spl, including those defined using CONTRAST, GROUP, SUBGROUP, SUBSET and MBF, are implicitly defined through their base variables and cannot be direct
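The sequence notation described above (start a, optional second value b, end n, step b - a) can be expanded as follows. This is a minimal Python sketch of the rule as stated in the text, not ASReml's own parser; the function name `expand_sequence` is hypothetical.

```python
def expand_sequence(a, n, b=None):
    """Expand 'a b ... n' into a list; the step is b - a, or 1 when b is omitted."""
    step = 1 if b is None else b - a
    if step <= 0:
        raise ValueError("an increasing sequence is assumed")
    values = []
    k = 0
    # Small tolerance so that fractional steps still reach the end point
    while a + k * step <= n + 1e-9:
        values.append(a + k * step)
        k += 1
    return values

print(expand_sequence(1, 5))        # [1, 2, 3, 4, 5]
print(expand_sequence(0, 20, b=5))  # [0, 5, 10, 15, 20]
```

The first call corresponds to writing 1 ... 5 with the default step of 1, and the second to writing 0 5 ... 20.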
261. erge faster. Note that this option is not available with the nrm() or grm() functions. Note also that the EM update is applied to all of the variance parameters in the particular US model and cannot be applied to only a subset of them. EM updates can be slow to converge, and an alternative parameterization using a factor analysis may converge faster and give a more parsimonious parameterization. It may be that there is no variance associated with some levels of the matrix, in which case the dimension of the matrix should be reduced. 7.7.5 New R4: Initial values (!INIT v). Prior to Release 4 it was necessary to supply initial values for variance structure parameters, except for the default IDV variance structure for a random model term, where the default initial variance (ratio) parameter value was 0.1. In Release 4 it is not generally necessary to supply initial values: ASReml provides starting (initial) values for variance structure parameters based on knowledge of the phenotypic variance of the response. Occasionally these initial values are not adequate and more appropriate values will need to be supplied by the user, who may have good prior information that can be utilized in forming initial values. There are several ways to provide initial values; the particular choice will depend on how many values and other variance model function qualifiers are to be specified. The initial values can be provided in a number of wa
262. erms in the mixed model. The !r qualifier tells ASReml to fit the terms that follow as random effects. The accompanying code box reads:
  NIN Alliance trial 1989
  variety !A
  column 11
  nin89.asd !skip 1
  tabulate yield ~ variety
  yield ~ mu variety !r idv(repl)
  residual idv(units)
  predict variety
3.4.7 Variance structures. There are two variance structures to be specified and two variance components to be estimated. The first structure is for the replicate (repl) effects. These effects are IID distributed; idv(repl) denotes this and estimates one variance component associated with these effects. The other is associated with the residual effects, which are again assumed to be IID distributed. This is formally specified here by the line residual idv(units), where residual is the name of the directive that specifies the variance structure for the residuals and units is the reserved word specifying a factor with a level for every experimental unit. The default variance structure is always uncorrelated effects with a common variance, and so idv(repl) and idv(units) can be reduced to simply repl and units. See Chapter 7 for a lengthy discussion on variance modelling in ASReml. 3.4.8 Prediction. Predict statements appear after the model
263. ers are advised to parse the .xml file in redeveloping code to handle the changes with the new release. 13.3.2 The .sln file. The .sln file contains estimates of the fixed and random effects with their standard errors in an array with four columns ordered as factor_name, level, estimate, standard_error. Note that the error presented for the estimate of a random effect is the square root of the prediction error variance. In a genetic context, for example, where a relationship matrix A is involved, the accuracy is $\sqrt{1 - s_i^2 / ((1 + f_i)\sigma_A^2)}$, where $s_i$ is the standard error reported with the BLUP ($u_i$) for the ith individual, $f_i$ is the inbreeding coefficient reported when the !DIAG qualifier is given on a pedigree file line, $1 + f_i$ is the diagonal element of A, and $\sigma_A^2$ is the genetic variance. The .sln file can easily be read into a GENSTAT spreadsheet or an S-PLUS data frame. Below is a truncated copy of nin89a.sln. Note that: the order of some terms may differ from the order in which those terms were specified in the model statement; the missing value estimates appear at the end of the file in this example; the format of the file can be changed by specifying the !SLNFORM qualifier (in particular, more significant digits will be reported); use of the !OUTLIER qualifier will generate extra columns containing the outlier statistics described on page 17. Model_Term Level Effect seEffect variety LANCER 0.000 0.000 variety est
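The accuracy calculation described above can be sketched in a few lines of Python. This is illustrative only and not part of ASReml. It assumes the accuracy takes the usual form sqrt(1 - s^2 / ((1 + f) * sigma2_A)), with s the standard error from the .sln file, f the inbreeding coefficient and sigma2_A the genetic variance; the function name and the example numbers are hypothetical.

```python
from math import sqrt

def blup_accuracy(se, f, sigma2_a):
    """Accuracy of a BLUP from its standard error, inbreeding coefficient and genetic variance.

    Assumes accuracy = sqrt(1 - se^2 / ((1 + f) * sigma2_a)).
    """
    ratio = se**2 / ((1.0 + f) * sigma2_a)
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("prediction error variance exceeds the prior variance")
    return sqrt(1.0 - ratio)

# Hypothetical individual: se = 0.6, no inbreeding, genetic variance 1.0
print(round(blup_accuracy(0.6, 0.0, 1.0), 3))  # 0.8
```

A standard error close to the genetic standard deviation gives an accuracy near zero, while a small standard error gives an accuracy near one.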
264. es in the explanatory variable is large or if units are measured at different times. The data we use was originally reported by Draper and Smith (1998, ex24N, p559) and has recently been reanalysed by Pinheiro and Bates (2000, p338). The data are displayed in Figure 15.12 and are the trunk circumferences (in millimetres) of each of 5 trees taken at 7 times. All trees were measured at the same time, so the data are balanced. The aim of the study is unclear, though both previous analyses involved modelling the overall growth curve, accounting for the obvious variation in both level and shape between trees. Pinheiro and Bates (2000) used a nonlinear mixed effects modelling approach in which they modelled the growth curves by a three-parameter logistic function of age, given by $y = \phi_1 / (1 + \exp[-(x - \phi_2)/\phi_3])$ (15.11), where y is the trunk circumference, x is the tree age in days since December 31 1968, $\phi_1$ is the asymptotic height, $\phi_2$ is the inflection point or the time at which the tree reaches $0.5\phi_1$, and $\phi_3$ is the time elapsed between trees reaching half and about 3/4 of $\phi_1$. Figure 15.12: Trellis plot of trunk circumference for each tree. The datafile consists of 5 columns, viz. Tree, a factor with 5 levels; age, tree age in days since 31st December 1968; circ, the trunk circumference; and season. The last column, season,
265. esidual within litter variation. The ASReml input to achieve this analysis is presented below.
    Rats example
     dose 3 !A
     sex 2 !A
     littersize
     dam 27
     pup 18
     weight
    rats.asd !DOPATH 1   # Change DOPATH argument to select each PATH
    !PATH 1
    weight ~ mu littersize dose sex dose.sex !r idv(dam)
    residual idv(units)
    !PATH 2
    weight ~ mu out(66) littersize dose sex dose.sex !r idv(dam)
    residual idv(units)
    !PATH 3
    weight ~ mu littersize dose sex !r idv(dam)
    residual idv(units)
    !PATH 4
    weight ~ mu littersize dose sex
    residual idv(units)
The input file contains an example of the use of the !DOPATH qualifier. Its argument specifies which part to execute. We will discuss the models in the two parts. It also includes the !FCON qualifier to request conditional Wald F statistics. Abbreviated output from part 1 is presented below.
    1 LogL= 74.2174 S2= 0.19670 315 df 0.1000 1.000
    2 LogL= 79.1579 S2= 0.18751 315 df 0.1488 1.000
    3 LogL= 83.9408 S2= 0.17755 315 df 0.2446 1.000
    4 LogL= 86.8093 S2= 0.16903 315 df 0.4254 1.000
    5 LogL= 87.2249 S2= 0.16594 315 df 0.5521 1.000
    6 LogL= 87.2398 S2= 0.16532 315 df 0.5854 1.000
    7 LogL= 87.2398 S2= 0.16530 315 df 0.5867 1.000
    8 LogL= 87.2398 S2= 0.16530 315 df 0.5867 1.000
    Final parameter values 0.5867
    Results from analysis of weight
    Akaike Information Criterion -170.48 (assuming 2 parameters)
    Bayesian Information Criterion -162.97
    App
266. estimation biases can be over 50% (e.g. Breslow and Lin, 1995; Goldstein and Rasbash, 1996; Rodriguez and Goldman, 2001; Waddington et al., 1994). For other GLMMs, PQL has been reported to perform adequately (e.g. Breslow, 2003). McCulloch and Searle (2001) also discuss the use of PQL for GLMMs. The performance of PQL in other respects, such as for hypothesis testing, has received much less attention, and most studies into PQL have examined only relatively simple GLMMs. Anecdotal evidence suggests that this technique may give misleading results in certain situations. Therefore we cannot recommend this technique for general use; it is included in the current version of ASReml for advanced users. If this technique is used, we recommend cross-validatory assessment, such as applying PQL to simulated data from the same design (Millar and Willis, 1999). The standard GLM Analysis of Deviance (AOD) should not be used when there are random terms in the model, as the variance components are reestimated for each submodel. 6.9 Missing values. 6.9.1 Missing values in the response. It is sometimes computationally convenient to estimate missing values, for example in spatial analysis of regular arrays (see example 3a in Section 7.5). Missing values are estimated if the model term mv is included in the model. mv is formally shown here in the sparse fixed effects to
    NIN Alliance Trial 1989
     variety
     row 22
     column 11
    nin89.asd
267. example
    A will be interpreted as idv(A)
    A.B will be interpreted as idv(A.B)
    A.B.C will be interpreted as idv(A.B.C)
    sat(Expt,1).A will be interpreted as sat(Expt,1).idv(A)
    sat(Expt,1).A.B will be interpreted as sat(Expt,1).idv(A.B)
    sat(Expt,1).A.B.C will be interpreted as sat(Expt,1).idv(A.B.C)
In these cases the model term can be followed by an initial value and/or a parametric qualifier; for example, A 1 !GP is interpreted as idv(A) !INIT 1 !GP. There is always a residual error term in the model, but if it is not explicitly specified it is assumed to be idv(units) for univariate data and id(units).us(Trait) for multivariate data. If the consolidated model term definition is incomplete, that is, if some but not all of the components have a variance model function specified, the variance model functions idv() or id() will be applied to these components depending on the variance model functions specified. For example
    idv(A).B will be interpreted as idv(A).id(B)
    id(A).B will be interpreted as id(A).idv(B)
    id(A).B.C will be interpreted as id(A).idv(B.C)
    idv(A).B.C will be interpreted as idv(A).id(B.C)
Similarly at the residual level, as sat() cannot be converted into a variance function,
    sat(Expt,1).id(A).B will be interpreted as sat(Expt,1).id(A).idv(B)
    sat(Expt,1).id(A).B.C will be interpreted as sat(Expt,1).id(A).idv(B.C)
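The defaulting rule illustrated by these examples can be mimicked with a small sketch. This is a toy reading of the listed cases only, not ASReml's parser: the helper names `split_components` and `complete_term` are hypothetical, only `id()`, `idv()` and `sat()` are recognised, and it assumes the components without a function come after those with one.

```python
def split_components(term):
    """Split a model term on '.' characters that sit outside parentheses."""
    parts, depth, current = [], 0, ""
    for ch in term:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "." and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return parts


def complete_term(term):
    """Apply idv()/id() to the components that have no variance function."""
    parts = split_components(term)
    # sat(...) only selects a section, so it is kept as-is and never
    # counts as supplying the variance parameter.
    sections = [p for p in parts if p.startswith("sat(")]
    rest = [p for p in parts if not p.startswith("sat(")]
    with_fn = [p for p in rest if "(" in p]
    bare = [p for p in rest if "(" not in p]
    if not bare:
        return term  # nothing to complete
    # If a variance is already supplied by idv(), the remainder gets id();
    # otherwise the remainder must carry the variance and gets idv().
    has_variance = any(p.startswith("idv(") for p in with_fn)
    fn = "id" if has_variance else "idv"
    filled = fn + "(" + ".".join(bare) + ")"
    return ".".join(sections + with_fn + [filled])


for example in ["A.B.C", "sat(Expt,1).A.B", "idv(A).B.C", "sat(Expt,1).id(A).B"]:
    print(example, "->", complete_term(example))
```

The design choice worth noting is that exactly one component ends up carrying the variance: `idv()` is used for the remainder only when no earlier component already supplies it.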
268. f the data and by fitting simpler models,
  - software problems: there are many options in ASReml and some combinations have not been tested. Some jobs are too big.
When all else fails, describe your problem to the forum http://www.vsni.co.uk/forum or email support@vsni.co.uk. There are over 6000 one-line diagnostic messages that ASReml may print in the .asr file. Hopefully most are self-explanatory, but it will always be helpful to recognise whether they relate to parsing the input file or raise some other issue. See Section 14.5 for more information on these messages. 14.2 Common problems. Common problems in coding ASReml are as follows:
  - a variable name has been misspelt; variable names are case sensitive,
  - a model term has been misspelt; model term functions and reserved words (mu, Trait, mv, units) are case sensitive,
  - the data file name is misspelt or the wrong path has been given; enclose the pathname in quotes if it includes embedded blanks,
  - a qualifier has been misspelt or is in the wrong place,
  - failure to use commas appropriately in model definition lines,
  - there is an error in the predict statement,
  - model term mv not included in the model when there are missing values in the data and the model fitted assumes all data is present,
  - there is an inconsistency between the variance header line and the structure definition lines presented (original syntax),
  - there is an error in
269. factor, where the n variables are the marker states for n markers in a linkage group, in map order and coded -1, 1 (backcross) or -1, 0, 1 (F2 design); s (length n+1) should be the n marker positions relative to a left telomere position of zero, and an extra value being the length of the linkage group (the position of the right telomere). The length (right telomere) may be omitted, in which case the last marker is taken as the end of the linkage group. The positions may be given in Morgans or centiMorgans (if the length is greater than 10, it will be divided by 100 to convert to Morgans). The recombination rate between markers at $s_L$ and $s_R$ (L is left and R is right of some putative QTL at Q) is $\theta_{LR} = (1 - e^{-2(s_R - s_L)})/2$. Consequently, for 3 markers (L, Q, R), $\theta_{LR} = \theta_{LQ} + \theta_{QR} - 2\theta_{LQ}\theta_{QR}$. The expected value of a missing marker at Q (between L and R) depends on the marker states at L and R: $E(q|1,1) = (1 - \theta_{LQ} - \theta_{QR})/(1 - \theta_{LR})$, $E(q|1,-1) = (\theta_{QR} - \theta_{LQ})/\theta_{LR}$, $E(q|-1,1) = (\theta_{LQ} - \theta_{QR})/\theta_{LR}$ and $E(q|-1,-1) = -(1 - \theta_{LQ} - \theta_{QR})/(1 - \theta_{LR})$. Let $\lambda_L = (E(q|1,1) + E(q|1,-1))/2 = \theta_{QR}(1 - \theta_{QR})(1 - 2\theta_{LQ}) / (\theta_{LR}(1 - \theta_{LR}))$ and $\lambda_R = (E(q|1,1) - E(q|1,-1))/2 = \theta_{LQ}(1 - \theta_{LQ})(1 - 2\theta_{QR}) / (\theta_{LR}(1 - \theta_{LR}))$. Then $E(q|x_L, x_R) = \lambda_L x_L + \lambda_R x_R$. Where there is no marker on one side, $E(q|x_R) = (1 - \theta_{QR})x_R - \theta_{QR}x_R = x_R(1 - 2\theta_{QR})$. This qualifier facilitates the QTL method discussed in Gilmour (2007). !DOM A is used to form domin
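As a numerical check on the algebra, the sketch below evaluates the recombination rate $\theta = (1 - e^{-2d})/2$ for a map distance d in Morgans and the expected marker value $E(q|x_L, x_R) = \lambda_L x_L + \lambda_R x_R$ for -1/1 coding. The function names and the marker positions are hypothetical, and the centiMorgan test is applied here to the largest position supplied rather than to a separately stated group length.

```python
import math


def to_morgans(positions):
    """Treat positions as centiMorgans when the largest value exceeds 10."""
    scale = 100.0 if max(positions) > 10 else 1.0
    return [p / scale for p in positions]


def recomb(s_left, s_right):
    """Recombination rate between two positions given in Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * (s_right - s_left)))


def expected_marker(x_left, x_right, s_left, s_q, s_right):
    """E(q | x_L, x_R) = lambda_L * x_L + lambda_R * x_R."""
    t_lq = recomb(s_left, s_q)
    t_qr = recomb(s_q, s_right)
    t_lr = recomb(s_left, s_right)
    denom = t_lr * (1.0 - t_lr)
    lam_l = t_qr * (1.0 - t_qr) * (1.0 - 2.0 * t_lq) / denom
    lam_r = t_lq * (1.0 - t_lq) * (1.0 - 2.0 * t_qr) / denom
    return lam_l * x_left + lam_r * x_right


# Hypothetical flanking markers at 10 and 40 cM with a putative QTL at 25 cM.
s_l, s_q, s_r = to_morgans([10.0, 25.0, 40.0])
t_lq, t_qr, t_lr = recomb(s_l, s_q), recomb(s_q, s_r), recomb(s_l, s_r)

print("theta_LR           :", t_lr)
print("composition formula:", t_lq + t_qr - 2.0 * t_lq * t_qr)
print("E(q | 1, 1)        :", expected_marker(1, 1, s_l, s_q, s_r))
print("E(q | 1, -1)       :", expected_marker(1, -1, s_l, s_q, s_r))
```

With the QTL midway between the flanking markers, `E(q | 1, -1)` is zero up to rounding, as the symmetry of the formula suggests.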
270. Results from analysis of circ
    Akaike Information Criterion 186.86 (assuming 6 parameters)
    Bayesian Information Criterion 195.65
    Model_Term                         Gamma         Sigma         Sigma/SE  % C
    idv(spl(age,7))       IDV_V  6     2.17100       12.2471       1.09      0 P
    Residual              SCA_V 35     1.000000      5.64123       1.12      0 P
    us(2).id(Tree)        10 effects
    2                     US_V 1 1     5.61715       31.6877       1.26      0 P
    2                     US_C 2 1     0.124098E-01  0.700063E-01  0.85      0 P
    2                     US_V 2 2     0.108290E-03  0.610886E-03  1.41      0 P
    idv(spl(age,7)).Tree  ID_V   1     1.38313       7.80258       1.48      0 P
    Covariance/Variance/Correlation Matrix US Tree
    31.69        0.5032
    0.7001E-01   0.6109E-03
    Wald F statistics
    Source of Variation  NumDF  DenDF  F-inc   P-inc
    9 mu                 1      2.4    169.87  0.006
    3 age                1      2.4    92.78   Oii
    5 Season             1      8.8    108.49  <.001
Figure 15.15: Trellis plot of trunk circumference (mm) for each tree at sample dates (adjusted for season effects), with fitted profiles across time (days since December 31 1968) and confidence intervals. Figure 15.15 presents the predicted growth over time for individual trees and a marginal prediction for trees, with approximate confidence intervals (2 standard errors). The conclusions from this analysis are quite different from those obtained by the nonlinear mixed effects
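The two information criteria in this output can be reproduced by hand. The sketch below assumes the usual definitions, AIC = -2 logL + 2p and BIC = -2 logL + p ln(v), with v taken to be the residual degrees of freedom. It uses the converged log-likelihood of model 6 reported earlier in the guide, read here as -87.4291 on 32 df; the minus sign is an assumption, chosen because it is the only sign that reproduces the printed values.

```python
import math


def information_criteria(log_lik, n_params, resid_df):
    """Return (AIC, BIC) from a REML log-likelihood.

    Assumes AIC = -2 logL + 2 p and BIC = -2 logL + p ln(residual df).
    """
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + n_params * math.log(resid_df)
    return aic, bic


# Model 6 for the orange data: 6 variance parameters, 32 residual df.
aic, bic = information_criteria(-87.4291, 6, 32)
print(round(aic, 2), round(bic, 2))  # 186.86 195.65
```

As the guide notes for this example, such comparisons are only meaningful between models that share the same fixed terms.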
271. was added after noting that tree age spans several years and, if converted to day of year, measurements were taken in either Spring (April/May) or Autumn (September/October). First we demonstrate the fitting of a cubic spline in ASReml by restricting the dataset to tree 1 only. The model includes the intercept and linear regression of trunk circumference on age and an additional random term spl(age,7), which instructs ASReml to include a random term with a special design matrix with 7 - 2 = 5 columns which relate to the vector $\delta$ whose elements $\delta_i$, i = 2, ..., 6 are the second differentials of the cubic spline at the knot points. The second differentials of a natural cubic spline are zero at the first and last knot points (Green and Silverman, 1994). The ASReml job is
    this is the orange data, for tree 1
     seq     # record number is not used
     Tree 5
     age     # 118 484 664 1004 1231 1372 1582
     circ
     season !L Spring Autumn
    orange.asd !skip 1 !filter 2 !select 1
    !SPLINE spl(age,7) 118 484 664 1004 1231 1372 1582
    !PVAL age 150 200 1500
    circ ~ mu age !r idv(spl(age,7))
    residual idv(units)
    predict age
Note that the data for tree 1 has been selected by use of the !filter and !select qualifiers. Also note the use of !PVAL so that the spline curve is properly predicted at the additional nominated points. These additional data points are required for ASReml to form the design matrix to properly interpolate the cu
272. files ooa aaa 5 8 Job control qualifiers aaa ee eee Ee Command file Specifying the terms in the mixed model 6 1 IMrod chOn o ec Bee eo Se Se EERE ER a A Re EE ea 6 2 Specifying model formulae in ASReml 2 000004 GAl General rules ce ee AER SRR ASE ERE ER 2 eG o oe ee Pe ee Se CES SS eRe eee we ars 6 3 Fixed terms in the model 2 osa ccc 884622886028 tu 48 es vi GaL PN Ted TER o sios So od BRON SHEEN OS BEES Goo Site Te Ts o lt oaie wa RE EE KEG REG ES 6 4 Random and residual terms in the variance component model 6 5 Interactions and conditional factors ooa a 004 0s 0O51 Interactions o co o Rk eee ee eh a eh g p a eP Fees E E E E 053 Conditional factos e e s oes b oe eos piedi g p Re Sew Ee 6 5 4 Associated Factors aoaaa a 6 6 Alphabetic list of model functions oa a a eee eee 6 7 WEEDS s saca e SE Ew Oe a ae See 6 8 Generalized Linear Mixed Models oaaae 6 8 1 Generalized Linear Mixed Models 2 4 6 9 Missing valties ce we ke we eR ee BSE oe ee ee Ee eS ee 6 9 1 Missing values in the response 2 2 20004 6 9 2 Missing values in the explanatory variables 6 10 Some technical details about model fitting in ASReml 6 10 1 Sparse versus dense o o s cec aon ke Pe ee ew 6 10 2 Ordering of terms in ASReml 0 0 000004 6 10 3 Aliassing and singularities 0 0 2 0000 6 10 4 Examples of alassing o o co eee
273. forming a prediction table, it is necessary to average over (or ignore) some dimensions of the hyper-table. By default, ASReml uses equal weights (1/f for a factor with f levels). More complicated weighting is achieved by using the !AVERAGE qualifier to set specific (unequal) weights for each level of a factor. However, sometimes the weights need to be defined with respect to two or more factors. The simplest case is when there are missing cells and weighting is equal for those cells in a multiway table that are present; this is achieved by using the !PRESENT qualifier. This is further generalized by allowing the user to supply the weights to be used by the !PRESENT machinery via the !PRWTS qualifier. The user specifies the factors in the table of weights with the !PRESENT statement and then gives the table of weights using the !PRWTS qualifier. There may only be one !PRESENT qualifier on the predict line when !PRWTS is specified. The order of factors in the tables of weights must correspond to the order in the !PRESENT list, with later factors nested within preceding factors. The weights may be given in a separate file if a filename in quotes is given as the argument to !PRWTS. Check the output to ensure that the values in the tables of weights are applied in the correct order. ASReml may transpose the table of weights to match the order it needs for processing. When weights are supplied in a separate file, two layouts are allowed
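The three weighting schemes described here differ only in the arithmetic used to collapse a dimension of the hyper-table. The sketch below illustrates that arithmetic on made-up cell predictions; the function names are hypothetical and nothing here reflects how ASReml implements the averaging internally.

```python
def equal_weight_mean(cells):
    """Default scheme: weight 1/f for each of the f levels."""
    f = len(cells)
    return sum(value / f for value in cells)


def weighted_mean(cells, weights):
    """User-supplied unequal weights, normalised so they sum to one."""
    total = sum(weights)
    return sum(w * value for w, value in zip(weights, cells)) / total


def present_mean(cells, present):
    """Equal weights over only those cells flagged as present."""
    kept = [value for value, flag in zip(cells, present) if flag]
    return sum(kept) / len(kept)


# Hypothetical predictions for the three levels of the factor averaged over.
cells = [10.0, 20.0, 30.0]
print(equal_weight_mean(cells))                      # 20.0
print(weighted_mean(cells, [0.5, 0.25, 0.25]))       # 17.5
print(present_mean(cells, [True, True, False]))      # 15.0
```

The spread between the three results shows why it is worth checking the output to confirm which weights were actually applied.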
274. from 1 to 30 across replicates (see Table 15.6). The terms in the linear model are therefore simply RowBlk ColBlk. Additional fields row and column indicate the spatial layout of the plots. The ASReml input file is presented below. Three models have been fitted to these data. The lattice analysis is included for comparison in PATH 3. In PATH 1 we use the separable first-order autoregressive model to model the variance structure of the plot errors. Gilmour et al. (1997) suggest this is often a useful model to commence the spatial modelling process. The form of the variance matrix for the plot errors (R structure) is given by $\sigma^2 \Sigma = \sigma^2 (\Sigma_c \otimes \Sigma_r)$ (15.5), where $\Sigma_c$ and $\Sigma_r$ are 15 x 15 and 10 x 10 matrix functions of the column ($\phi_c$) and row ($\phi_r$) autoregressive parameters respectively. Gilmour et al. (1997) recommend revision of the current spatial model based on the use of diagnostics such as the sample variogram of the residuals from the current model. This diagnostic and a summary of row and column residual trends are produced by default with graphical versions of ASReml when a spatial model has been fitted to the errors. It can be suppressed by the use of the -n option on the command line. We have produced the following plots by use of the !EPS qualifier. The !RENAME !ARG 1 2 3 qualifiers, in conjunction with !DOPART 1, cause ASReml to run all three parts, appending the part number to the outpu
275. g sigma s gamma s under each parameterization for the series of NIN data examples oaa eee ee eR EEE EES 123 Variance model function qualifiers available in ASReml 125 Examples of constraining variance parameters in ASReml 126 Details of the variance models available in ASReml 147 List of pedigree file qualifiers 02002022 ee 159 List or prediction qualifiers 2 cb doh ea ee eee eA 181 List OF predict plot options gt s Ne eee Gee eee ON we ede 183 Trials classified by region and location 2 000 186 Trok AWN o e Be a ee ee ee a Be ee ee 186 xiii 9 5 10 1 10 2 10 3 13 1 13 2 14 1 14 2 14 3 15 1 15 2 15 3 15 4 15 5 15 6 15 7 15 8 15 9 15 10 15 11 15 12 15 13 15 14 15 15 Location means s ee wR ER Re eS we eS we ee eS 186 Command line options 44 66 4 be dee nba bee Ee ESSE DES 195 The use of arguments in ASReml 00 00 eee eee 200 High level qualifiers so eraco 64 bb ee de gadna ERG eG 201 Inet of MERGE Gugino lt a 444 sereu sokea eee ee ee eo 208 Simmary of ASKeml output flee 65 cee aa Gee Gee ee wes 217 ASReml output objects and where to find them 240 Some information messages and comments 2 2 000 255 List of warning messages and likely meaning s 256 Alphabetical list of error messages and probable cause s remedies 259 A split plot field trial of
276. g terms. With sum-to-zero constraints, a missing treatment level will generate a singularity, but in the first coefficient rather than in the coefficient corresponding to the missing treatment. In this case, the coefficients will not be readily interpretable. When interacting constrained factors, all cells in the cross tabulation should have data.
    fac(v): forms a factor with a level for each value of v and any additional points inserted as discussed with the qualifiers !PPOINTS and !PVAL.
    fac(v,y): forms a factor with a level for each combination of values from v and y. The values are reported in the .res file.
Table 6.2 Alphabetic list of model functions and descriptions (model function, action)
    giv(f,n), g(f,n), grm(f,n): associates the nth .giv G-inverse with the factor. This is used when there is a known (except for scale) G structure other than the additive inverse genetic relationship matrix. The G-inverse is supplied in a file whose name has the file extension .giv, described in Section 8.9. grm() and giv() are formally equivalent, with grm standing for generalized relationship matrix.
    h(f): requests ASReml to fit the model term for factor f using Helmert constraints. Neither sum-to-zero nor Helmert constraints generate interpretable effects if sing
    Further entries in the model function column: ide(f), i(f), inv(v,r), leg(v,n), lin(f), l(f), log(v,r), ma1, ma1(f), mbf(f,c), mbf(f)
277. given by $\tilde{e} = y - W\tilde{\beta} = RPy$ (2.23). It follows that $E(\tilde{e}) = 0$ and $\mathrm{var}(\tilde{e}) = R - WC^{-1}W'$. The matrix $WC^{-1}W'$ (under the sigma parameterization) is the so-called extended hat matrix. (ASReml includes the $\sigma^2$ in the hat matrix under the gamma parameterization.) It is the linear mixed effects model analogue of $X(X'X)^{-1}X'$ for ordinary linear models. The diagonal elements are returned in the fourth field of the .yht file. The !OUTLIER qualifier invokes a partial implementation of research by Alison Smith, Ari Verbyla and Brian Cullis. With this qualifier, ASReml writes
  - $G^{-1}\tilde{u}$ and $G^{-1}\tilde{u} / \sqrt{\mathrm{diag}(G^{-1} - G^{-1}C^{ZZ}G^{-1})}$ to the .sln file,
  - $R^{-1}\tilde{e}$ and $R^{-1}\tilde{e} / \sqrt{\mathrm{diag}(R^{-1} - R^{-1}WC^{-1}W'R^{-1})}$ to the .yht file,
  - and copies lines where the last ratio exceeds 3 in magnitude to the .res file and reports the number of such lines to the .asr file.
It has not been validated for multivariate models or XFA models with zero $\Psi$s. The variogram has been suggested as a useful diagnostic for assisting with the identification of appropriate variance models for spatial data (Cressie, 1991). Gilmour et al. (1997) demonstrate its usefulness for the identification of the sources of variation in the analysis of field experiments. If the elements of the data vector (and hence the residual vector) are indexed by a vector of spatial coordinates $s_i$, i = 1, ..., n, then the ordinates of the sample variogram are given by $v_{ij} = \frac{1}{2}[\tilde{e}_i(s_i) - \tilde{e}_j(s_j)]^2$
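To make the variogram ordinates concrete, the sketch below computes half the squared difference of residuals for every pair of plots and averages the ordinates that share the same absolute row and column displacement. The pairing by absolute displacement, the function name and the residual values are all assumptions made for illustration; this is not ASReml's implementation.

```python
from collections import defaultdict


def sample_variogram(residuals):
    """Average 0.5 * (e_i - e_j)^2 over pairs with the same absolute lag.

    residuals: dict mapping (row, column) -> residual.
    Returns a dict mapping (row lag, column lag) -> mean ordinate.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    sites = sorted(residuals)
    for a in range(len(sites)):
        for b in range(a + 1, len(sites)):
            (r1, c1), (r2, c2) = sites[a], sites[b]
            lag = (abs(r1 - r2), abs(c1 - c2))
            diff = residuals[sites[a]] - residuals[sites[b]]
            sums[lag] += 0.5 * diff ** 2
            counts[lag] += 1
    return {lag: sums[lag] / counts[lag] for lag in sums}


# Hypothetical residuals on a 2 x 2 grid of plots.
resid = {(1, 1): 1.0, (1, 2): 3.0, (2, 1): 2.0, (2, 2): 2.0}
variogram = sample_variogram(resid)
for lag in sorted(variogram):
    print(lag, variogram[lag])
```

A variogram that keeps rising with the lag in one direction is the kind of pattern that would prompt a revision of the spatial model in that direction.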
278. gression models associating u, a vector of f factor effects, with v, a vector of m regression effects, through the model u = Mv, where the matrix M contains m regressor variables for each of the f levels of the factor. Direct fitting of the regression effects is facilitated by using the 'my basis function' mbf() function, associating the regressor variables to the levels of the factor, essentially fitting ZMv, where Z is the design matrix linking observations to the levels of the factor. But if m is much bigger than f, it is more computationally efficient to fit an equivalent model Zu with a variance structure for u based on MM'. ASReml can read the matrix M associated with a factor and group of regressor variables from a .grr file, construct a GRM matrix G = MM'/s, fit the equivalent model and report both factor and regressor predictions. One common case of this model is when u represents genotype effects, the regressors represent SNP marker counts (typically 0/1/2) and v are marker effects. The .grr file is specified after any pedigree file and before the data file, with any other GRM files. There may only be one .grr file. It is assumed to contain a row for each level of the factor, each row containing m regressor values. Optionally, the factor level name associated with the i-th row can be included before the relevant regressor values. Also, a heading row might include a name for each field (regressor variable). Superfluous fields before the factor or regress
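The relationship between the regressor matrix M, the factor effects u = Mv and the matrix G = MM'/s can be seen on a tiny example. The sketch below uses plain Python lists; the marker counts, the marker effects and the scalar s = 3 are made-up values, and treating s as a simple scalar divisor is an assumption.

```python
def mat_vec(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def grm_from_regressors(M, s):
    """Return G = M M' / s for an f x m regressor matrix M."""
    f = len(M)
    return [
        [sum(a * b for a, b in zip(M[i], M[j])) / s for j in range(f)]
        for i in range(f)
    ]


# Hypothetical data: f = 2 genotypes scored at m = 3 markers (counts 0/1/2).
M = [
    [0, 1, 2],
    [1, 1, 0],
]
v = [0.5, -1.0, 0.25]   # hypothetical marker effects

u = mat_vec(M, v)                 # genotype effects implied by the markers
G = grm_from_regressors(M, 3.0)   # f x f matrix, small when m >> f

print(u)
print(G)
```

The point of the equivalent model is visible in the shapes: G is f by f regardless of how many markers there are.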
279. he initial values of a US structure, ASReml tests the adequacy of the reduced parameterization. causes ASReml to report a general description of the distribution of the data variables and factors and simple correlations among the variables for those records included in the analysis. This summary will ignore data records for which the variable being analysed is missing unless a multivariate analysis is requested or missing values are being estimated. The information is written to the .ass file. is used to plot the (transformed) data. Use !X to specify the x variable, !Y to specify the y variable and !G to specify a grouping variable. !JOIN joins the points when the x value increases between consecutive records. The grouping variable may be omitted for a simple scatter plot. Omit !Y y to produce a histogram of the x variable. For example, !X age !Y height !G sex. Note that the graphs are only produced in the graphics versions of ASReml (Section 10.3). Table 5.3 List of commonly used job control qualifiers (qualifier, action). For multivariate repeated measures data, ASReml can plot the response profiles if the first response is nominated with the !Y qualifier and the following analysis is of the multivariate data. ASReml assumes the response variables are in contiguous fields and are equally spaced. For example
    Response profiles
     Treatment !A
     Y1 Y2 Y3 Y4 Y5
    rat.asd !Y Y1 !G Treatment !JOIN
280. he range of the data or ASReml will modify them before they are applied. If you choose to spread them over several lines, use a comma at the end of incomplete lines so that ASReml will continue reading values from the next line of input. If the explicit points do not adequately cover the range, a message is printed and the values are rescaled unless !NOCHECK is also specified. Inadequate coverage is when the explicit range does not cover the midpoint of the actual range. See !KNOTS, !PVAL and !SCALE. reduces the update step sizes of the variance parameters. The default value is the reciprocal of the square root of !MAXIT. It may be set between 0.01 and 1.0. The step size is increased towards 1 each iteration. Starting at 0.1, the sequence would be 0.1, 0.32, 0.56, 1. This option is useful when you do not have good starting values, especially in multivariate analyses. forms a new group factor t derived from an existing group factor v by selecting a subset p of its variables. A subgroup factor may not be used in a PREDICT or TABULATE directive. Table 5.4 List of occasionally used job control qualifiers (qualifier, action; qualifier column: !SUBSET t v p, !WMF). !SUBSET t v p forms a new factor t derived from an existing factor v by selecting a subset p of its levels. Missing values are transmitted as missing and records whose level is zero are transmitted as zero. The qualifier occupies its own line after the dataf
281. he required range. The voltage of 64 regulators was set at 10 setting stations (setstat); between 4 and 8 regulators were set at each station. The regulators were each tested at four testing stations (teststat). The ASReml input file is presented below.
    Voltage data
     teststat 4     # 4 testing stations tested each regulator
     setstat !A     # 10 setting stations each set 4-8 regulators
     regulator 8    # regulators numbered within setting stations
     voltage
    voltage.asd !skip 1
    voltage ~ mu !r idv(setstat) idv(setstat.regulator) idv(teststat) idv(setstat.teststat)
    residual idv(units)
The factor regulator numbers the regulators within each setting station. Thus the term setstat.regulator fits an effect for each regulator, while the other terms examine the effects of the setting and testing stations and possible interaction. The abbreviated output is given below.
    LogL= 188.604 S2= 0.67074E-01 255 df
    LogL= 199.530 S2= 0.59303E-01 255 df
    LogL= 203.007 S2= 0.52814E-01 255 df
    LogL= 203.240 S2= 0.51278E-01 255 df
    LogL= 203.242 S2= 0.51141E-01 255 df
    LogL= 203.242 S2= 0.51140E-01 255 df
    Model_Term                         Gamma         Sigma         Sigma/SE  % C
    idv(TestStat)           IDV_V  4   0.642752E-01  0.328704E-02  0.98      0 P
    idv(Setstat)            IDV_V 10   0.233416      0.119369E-01  1.35      0 P
    idv(TestStat.Setstat)   IDV_V 40   0.101193E-06  0.517501E-08  0.00      0 B
    idv(Regulator.Setstat)  IDV_V 80   0.601817      0.307770E-01  3.64      0 P
    idv(units)              256 effects
    Residual
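The Gamma and Sigma columns of this output are linked through the residual variance: under the gamma parameterization each variance component is the corresponding ratio multiplied by S2. The sketch below checks this for three of the terms, using the converged S2 = 0.51140E-01 from the last iteration; the negative exponents are read as assumptions, chosen because they are what makes the two columns agree.

```python
residual_variance = 0.51140e-01   # S2 at convergence

# Variance ratios (Gamma column) and reported components (Sigma column).
gammas = {
    "idv(TestStat)": 0.642752e-01,
    "idv(Setstat)": 0.233416,
    "idv(Regulator.Setstat)": 0.601817,
}
reported_sigmas = {
    "idv(TestStat)": 0.328704e-02,
    "idv(Setstat)": 0.119369e-01,
    "idv(Regulator.Setstat)": 0.307770e-01,
}

# Variance component = ratio * residual variance.
sigmas = {term: gamma * residual_variance for term, gamma in gammas.items()}

for term in gammas:
    print(term, sigmas[term], reported_sigmas[term])
```

The same check works on any univariate output fitted under the gamma parameterization, which makes it a quick way to confirm which column is which.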
282. he spatial analyse (45796.58/23.8842 = 1917.442 c.f. 8061.808/4.03145 = 1999.729) are similar but slightly lower, reflecting the gain in accuracy from the spatial analysis. For further reading, see Smith et al. (2001, 2005). 15.7 Unreplicated early generation variety trial - Wheat. To further illustrate the approaches presented in the previous section, we consider an unreplicated field experiment conducted at Tullibigeal, situated in south-western NSW. The trial was an S1 (early stage) wheat variety evaluation trial and consisted of 525 test lines which were randomly assigned to plots in a 67 by 10 array. There was a check plot variety every 6 plots within each column. That is, the check variety was sown on rows 1, 7, 13, ..., 67 of each column. This variety was numbered 526. A further 6 replicated commercially available varieties (numbered 527 to 532) were also randomly assigned to plots, with between 3 and 5 plots of each. The aim of these trials is to identify and retain the top (say 20%) of lines for further testing. Cullis et al. (1989) considered the analysis of early generation variety trials and presented a one-dimensional spatial analysis which was an extension of the approach developed by Gleeson and Cullis (1987). The test line effects are assumed random, while the check variety effects are considered fixed. This may not be sensible or justifiable for most trials and can lead to inc
283. heads of other systems. This guide has 15 chapters. Chapter 1 introduces ASReml and describes the conventions used in this guide. Chapter 2 outlines some basic theory, while Chapter 3 presents an overview of the syntax of ASReml through a simple example. Data file preparation is described in Chapter 4, and Chapter 5 describes how to input data into ASReml. Chapters 6 and 7 are key chapters which present the syntax for specifying the linear model and the variance models for the random effects in the linear mixed model. Chapters 8 and 8.3.1 describe special commands for multivariate and genetic analyses respectively. Chapter 9 deals with prediction of linear functions of fixed and random effects in the linear mixed model. Chapter 10 demonstrates running an ASReml job. Chapter 11 describes the merging of data files, and Chapter 12 presents the syntax for forming functions of variance components. Chapter 13 gives a detailed explanation of the output files. Chapter 14 gives an overview of the error messages generated in ASReml and some guidance as to their probable cause. The guide concludes with the most extensive chapter, which presents the analysis of a range of data examples. In brief, the improvements in Release 4 include: developments associated with input, including generating initial values, generating a template to allow an alternative way of presenting parametric information associated with variance structures, new facilities for reading in data files and
284. hen we invoke marginality considerations. The issue of marginality between terms in a linear mixed model has been discussed in much detail by Nelder (1977). In this paper Nelder defines marginality for terms in a factorial linear model with qualitative factors, but later Nelder (1994) extended this concept to functional marginality for terms involving quantitative covariates and for mixed terms which involve an interaction between quantitative covariates and qualitative factors. Referring to our simple illustrative example above, with a full factorial linear model given symbolically by y ~ 1 + A + B + A.B, A and B are said to be marginal to A.B, and 1 is marginal to A and B. In a three-way factorial model given by y ~ 1 + A + B + C + A.B + A.C + B.C + A.B.C, the terms A, B, C, A.B, A.C and B.C are marginal to A.B.C. Nelder (1977, 1994) argues that meaningful and interesting tests for terms in such models can only be conducted for those tests which respect marginality relations. This philosophy underpins the following description of the second Wald statistic available in ASReml, the so-called conditional Wald statistic. This method is invoked by placing !FCON on the datafile line. ASReml attempts to construct conditional Wald statistics for each term in the fixed dense linear model so that marginality relations are respected. As a simple example, for the three-way factorial model the conditional Wald statistics would be computed as
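For factorial terms written in the dot notation used above, the terms marginal to a given interaction are simply the terms formed from proper, non-empty subsets of its factors. A minimal sketch follows; the function name is hypothetical, the intercept is left out of the listing, and covariates (functional marginality) are not handled.

```python
from itertools import combinations


def marginal_terms(term):
    """List the factorial terms marginal to an interaction such as 'A.B.C'."""
    factors = term.split(".")
    found = []
    # Every proper, non-empty subset of the factors gives a marginal term.
    for size in range(1, len(factors)):
        for subset in combinations(factors, size):
            found.append(".".join(subset))
    return found


print(marginal_terms("A.B"))     # ['A', 'B']
print(marginal_terms("A.B.C"))   # ['A', 'B', 'C', 'A.B', 'A.C', 'B.C']
```

A conditional test that respects marginality for a term would therefore leave out of the conditioning set any term to which it is marginal, for example leaving A.B and A.B.C out when testing A.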
285. hooses an arrangement for plotting the predictions by recognising any covariates and noting the size of factors. However, the user is able to customize how the predictions are plotted by either using options to the !PLOT qualifier or by using the graphical interface. The graphical interface is accessed by typing Esc when the figure is displayed. The !PLOT qualifier has the following options:
Table 9.2 List of predict plot options (option, action)
  Lines and data
    addData: superimposes the raw data.
    addlabels factors: superimposes the raw data, with the data points labelled using the given factors, which must not be prediction factors. This option may be useful to identify individual data points on the graph (for instance, potential outliers) or alternatively to identify groups of data points (e.g. all data points in the same stratum).
    addlines factors: superimposes the raw data, with the data points joined using the given factors, which must not be prediction factors. This option may be useful for repeated measures data.
    noSEs: specifies that no error bars should be plotted (by default, they are plotted).
    semult r: specifies the multiplier of the SE used for creating error bars (default: 1.0).
    joinmeans: specifies that the predicted values should be joined by lines (by default, they are only joined if the
286. iance components models that is those linear with respect to variances in H the terms in Z4 are exact averages of those in 2 14 and 2 15 The basic idea is to use Z4 4 in place of the expected information matrix in 2 16 to update x The elements of Z4 are 1 The Z4 matrix is the scaled residual sums of squares and products matrix of y Y1 Yel where y is the working variate for k and is given by y HiPy H R R R Ki Oy ZG G t Ki E Og where y X7 Zu 7 and are solutions to 2 18 In this form the Al matrix is relatively straightforward to calculate The combination of the Al algorithm with sparse matrix methods in which only non zero values are stored gives an efficient algorithm in terms of both computing time and workspace 2 2 2 Estimation prediction of the fixed and random effects To estimate T and predict u the objective function log fy y u T Re log fulu G is used This is the log joint distribution of Y u Differentiating with respect to 7 and u leads to the mixed model equations Henderson et al 1959 Robinson 1991 which are given by X R X X R Z 7 XR 2 18 ZRIX Z R Z G9 a ZR y These can be written as CB WR y 14 2 3 What are BLUPs where C W R W G B r u l and _ l0 0 da 0G The solution of 2 18 requires values for o and op In practice we replace o and a by their REML estimates o
287. idual matrix. Unfortunately, ASReml does not yet have an automatic way of taking the estimates from the univariate analyses and using them in the diagonal analysis. The log-likelihood from this run is 20000 1566.45. Once the model from PATH 1 has run, we can rerun the analysis, changing !ARG 1 to !ARG 2, to obtain the next analysis. With the statement !CONTINUE coopmf1.rsv, ASReml generates initial values from the coopmf1.rsv file; if no filename is given, ASReml will look for the previous .rsv file to generate initial values. In analysis 2 we get estimates of the sire, dam and litter matrices based on a factor analysis parameterization. This can give better initial values for unstructured matrices and indicate if the estimated matrices are near singularity. The log-likelihood from this run is 20000 1488.11. In this case the dam variance parameters are
    Source  Model terms          Gamma         Sigma         Sigma/SE  % C
    xfa1(TrDam123).id(dam)       14244 effects
    TrDam123   XFA_V 0 1         0.405222      0.405222      1.30      0 P
    TrDam123   XFA_V 0 2         0.00000       0.00000       0.00      0 F
    TrDam123   XFA_V 0 3         0.616712E-02  0.616712E-02  1.14      0 P
    TrDam123   XFA_L 1 1         1.29793       1.29793       9.05      0 P
    TrDam123   XFA_L 1 2         1.68814       1.68814       9.96      0 P
    TrDam123   XFA_L 1 3         0.124492      0.124492      6.02
and one of the dam specific variances is zero. The resulting dam matrix is
    Covariance/Variance/Correlation Matrix XFA xfa1(TrDam123).id(dam)
    2.090 0.8981 0.7590 0.8981 2.190 2.845 0.8451 1
288. iduals are non-parents and have no progeny, and there is interest in predictions for parents alone. This can happen in large forestry trials. The reduced animal model expresses the non-parent genetic effect in terms of parent effects and a Mendelian sampling term, which is combined with the residual effect to form the residual. We consider the case when there is data on parents and non-parents and some individuals are inbred. An example tree model for a single trait and a single site might be

   DBH ~ mu !r nrmv(tree) plot ar1v(column).ar1(row)
   residual idv(units)

since trees are often planted in plots of, say, 5 trees. This is a spatial analysis; the idv(units) term is required so that error variance is not transferred to the nrmv(tree) term, since trees are unreplicated. This analysis requires a pedigree file, say TreePed.csv, and if the !DIAG qualifier is specified on the pedigree line, the resulting .aif file will contain the inbreeding level for every tree in the pedigree, the diagonal of the A matrix, and an N/P code distinguishing parents (with progeny) from non-parents (without progeny). To analyse the data using the RAM, we need to incorporate these last two columns into the data file, which can be done with the !MERGE statement. If there is data on parents, further processing of the data file is required: create a copy of the tree field, call it, say, parent, and change it to 0 fo
289. ie scaled identity

7.6.2 Switching from the gamma to the sigma parameterization

ASReml uses the gamma parameterization by default for univariate, single section analyses (see above). However, !SIGMAP is a new qualifier with Release 4 that enables the user to force ASReml to use the sigma parameterization in this case. This is achieved by placing !SIGMAP immediately after the dependent variable and before ~ on the model definition line. For example,

   yield !SIGMAP ~ mu variety !r idv(repl) !f mv
   residual idv(units)

would force ASReml to use the sigma parameterization in NIN example 3a (see Section 7.5). Table 7.3 gives the variance model specification for each of the six NIN examples (column 3), the individual terms in $G_\sigma$ and $R_\sigma$ under the sigma parameterization (column 4), the sigmas that are estimated under this parameterization (column 5), the individual terms in $G_\gamma$ and $R_\gamma$ under the gamma parameterization (column 6), and the gammas that are estimated under this parameterization (column 7).

Table 7.3: G structure for the random terms (magenta) and R structure for the residual error term (cyan) under both the sigma and gamma parameterizations, and the corresponding sigma(s)/gamma(s) under each parameterization, for the series of NIN data examples

 no.  definition  variance model specification  | sigma parameterization: $G_\sigma$ / $R_\sigma$, sigma(s) | gamma parameterization: $G_\gamma$ / $R_\gamma$, gamma(s)
 1    RCB analysis  residual idv(units)
290. ier used on the top job control line. Detailed descriptions follow.

Table 10.1: Command line options

 option  qualifier       type             action
 Frequently used command line options
 C       !CONTINUE       job control      continue iterations using previous estimates as initial values
         !FINAL          job control      continue for one more iteration using previous estimates as initial values
         !LOGFILE        screen output    copy screen output to basename.asl
 N       !NOGRAPHS       graphics         suppress interactive graphics
 Ww      !WORKSPACE w    workspace        set workspace size to w Mbyte
 Other command line options
         !ARGS a         job control      to set arguments (a) in job rather than on command line
         !ASK            job control      prompt for options and arguments
 Bb      !BRIEF b        output control   reduce output to .asr file
         !DEBUG          debug            invoke debug mode
         !DEBUG 2        debug            invoke extended debug mode
 Gg      !GRAPHICS g     graphics         set interactive graphics device
 Hg      !HARDCOPY g     graphics         set interactive graphics device, graphics screens not displayed
         !INTERACTIVE    graphics         display graphics screen
         !ONERUN         job control      override rerunning requested by !RENAME
         !OUTFOLDER      output control   changes output folder
         NA              post processing  calculation of functions of varianc
         !QUIET          graphics
 Rr      !RENAME         job control
 Ss      NA              workspace
 Yv      !YVAR v         job control
         NA              license
         !XML            output control
291. ies are based on the same records as are used in the analysis of the model fitted in the same run. In particular, it will ignore records that exist in the data file but were dropped as the data was read into ASReml, either explicitly using !DV or implicitly because the dependent variable had missing values. Multiple tabulate statements are permitted, either immediately before or after the linear model. If a linear mixed model is not supplied, tabulation is based on all records. The tabulate statement has the form

   tabulate response_variables [!WT weight !COUNT !DECIMALS [d] !SD !RANGE !STATS !FILTER filter !SELECT value] ~ factors

• tabulate is the directive name, appearing on a new line,
• response_variables is a list of variates for which means are required,
• !WT weight nominates a variable containing weights,
• !COUNT requests counts as well as means to be reported,
• !DECIMALS [d] (1 ≤ d ≤ 7) requests means be reported with d decimal places. If omitted, ASReml reports 5 significant digits; if specified without an argument, 2 decimal places are reported,
• !RANGE requests the minimum and maximum of each cell be reported,
• !SD requests the standard deviation within each cell be reported,
• !STATS is shorthand for !COUNT !SD !RANGE,
• !FILTER filter nominates a factor for selecting a portion of the data,
• !SELECT value indicates that only records with value in the filter column are to be included,
• facto
292. ies arranged as a grid of 4 rows by 24 columns (rows are replicates), a first order separable autoregressive spatial variance structure for the residuals can be specified by the consolidated model term ar1(column).ar1(row), where column and row are the appropriate columns in the data file. However, the number of data units must be the product of the number of levels for row and the number of levels for column (96 in this case). If this is not the case, or if more than one unit is associated with some row/column combination, ASReml will return an error message and it will not be possible to use ar1(column).ar1(row) for residual error. If there are fewer than 96 units and each row/column combination present is associated with one unit, then the !COLUMNFACTOR and !ROWFACTOR data file qualifiers (see Table 5.2) can be used to augment the data by completing the grid to allow an appropriate analysis. These rules will always be satisfied for a single section of data with IID errors, that is, $R = \sigma^2 I$ (see Example 2.2), defined either by default (i.e. with no residual specified) or in terms of the units factor. However, a mismatch in both size and ordering is possible when either multiple sections are present, as in multi-environment trial (MET) analysis, or when non-identity variance model functions are used.

7.3.2 Using sat() to specify the residual model term for data with sections

S
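The two requirements described above (one unit per row/column cell, and a residual matrix formed as the direct product of two AR1 matrices) can be illustrated with a short sketch. It assumes the 96 units are ordered with rows nested within columns, and the two autocorrelations (0.5 and 0.3) are arbitrary illustrative values, not estimates from any analysis; the helper names are hypothetical.

```python
def ar1(n, rho):
    """n x n first-order autoregressive correlation matrix with entries rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker (direct) product of two square matrices stored as lists of lists."""
    nb = len(b)
    n = len(a) * nb
    return [[a[i // nb][j // nb] * b[i % nb][j % nb] for j in range(n)]
            for i in range(n)]


def grid_is_complete(cells, n_col, n_row):
    """True if every (column, row) cell of the grid occurs exactly once in cells."""
    full = {(c, r) for c in range(1, n_col + 1) for r in range(1, n_row + 1)}
    return len(cells) == len(full) and set(cells) == full


n_col, n_row = 24, 4
cells = [(c, r) for c in range(1, n_col + 1) for r in range(1, n_row + 1)]
complete = grid_is_complete(cells, n_col, n_row)

# Separable correlation for units sorted as rows within columns (assumed order).
corr = kron(ar1(n_col, 0.5), ar1(n_row, 0.3))
print(complete, len(corr))
```

Dropping any one cell from `cells` makes `grid_is_complete` return False, which corresponds to the situation in which the grid would first have to be completed before a separable structure could be applied.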
293. if provided, the initial values are for the lower triangle of the symmetric matrix, specified row-wise,

   GFW FDIAM ~ Trait Trait.YEAR !r us(Trait).id(TEAM) us(Trait).id(SheepID)
   residual id(units).us(Trait !GP)

• finding reasonable initial values can be a problem. When no initial values are provided (as in the code box), ASReml takes half of the phenotypic variance matrix of the data as an initial value. Since the variance component matrices for the TEAM and SheepID strata are not specified, ASReml will plug in values derived from the observed phenotypic variance matrix. !GP requests that the resulting estimated matrix be kept within the parameter space, i.e. it is to be positive definite.

The special qualifiers relating to multivariate analysis are !ASUV and !ASMV t (see Table 5.4 for details):
• to use an error structure other than US for the residual stratum you must also specify !ASUV (see Table 5.4) and include mv in the model if there are missing values,
• to perform a multivariate analysis when the data have already been expanded, use !ASMV t (see Table 5.4),
• t is the number of traits that ASReml should expect,
• the data file must have t records for each multivariate record, although some may be coded missing.

Note that if no residual line is inserted, the id(units).us(Trait) variance structure is assumed for multivariate data.

Command file: Genetic analysis

8.4 Introductio
294. ile line but before the linear model, e.g.

   !SUBSET EnvC Env 3 5 8 9 15 21 33

defines a reduced form of the factor Env, just selecting the environments listed. It might then be used in the model in an interaction. A subset factor can be used in a TABULATE directive but not in a PREDICT directive. The intention is to simplify the model specification in MET (Multi-Environment Trials) analyses where, say, Column effects are to be fitted to a subset of environments. It may also be used on the intrinsic factor Trait in a multivariate analysis, provided it correctly identifies the number of levels of Trait, either by including the last trait number or appending sufficient zeros. Thus, if the analysis involves 5 traits,

   !SUBSET Trewe Trait 1 3 4 0 0

sets hardcopy graphics file type to wmf.

Table 5.5: List of rarely used job control qualifiers

 qualifier          action
 !AILOADINGS 2      controls modification to AI updates of loadings in extended Factor Analytic models. After ASReml calculates updates for variance parameters, it checks whether the updates are reasonable and sometimes reduces them over and above any !STEPSIZE shrinkage. The extra shrinkage has two levels: loadings that change sign are restricted to doubling in magnitude, and if the average change in magnitude of loadings is greater than 10-fold, they are all shrunk back. Unless the user gives constraints, ASReml sets them and rotates the loadings each iteration
 !AISINGULARITIES
295. imate prediction variance matrix corresponding to the dense portion. It is only written if the !VRB qualifier is specified. The file is formatted for reading back for post-processing. The number of equations in the dense portion can be increased (to a maximum of 800) using the !DENSE option (Table 5.5), but not to include random effects. The matrix is lower triangular row-wise, in the order that the parameters are printed in the .sln file. It can be thought of as a partitioned lower triangular matrix

$$\begin{bmatrix} \sigma^2 & \\ \beta_D & C_{DD} \end{bmatrix}$$

where $\beta_D$ is the dense portion of $\beta$ and $C_{DD}$ is the dense portion of $C$. This is part of nin89a.vrb. Note that the first element is the estimated error variance, that is, 48.6802 (see the variance component estimates in the .asr output). The values are shown here one matrix row per line.

 0.487026E+02
 0.000000E+00 0.000000E+00
 0.298409E+01 0.000000E+00 0.807354E+01
 0.470629E+01 0.000000E+00 0.456542E+01 0.886497E+01
 0.315807E+00 0.000000E+00 0.409951E+01 0.476481E+01 0.876563E+01
 0.295379E+01 0.000000E+00 0.343250E+01 0.389543E+01 0.416076E+01 0.743440E+01
 0.163089E+01 0.000000E+00 0.377085E+01 0.428016E+01 0.472451E+01 0.402633E+01 0.837086E+01
 0.129027E+01 0.000000E+00 0.329974E+01 0.347377E+01 0.357535E+01 0.316846E+01 0.412043E+01 0.768099E+01
 0.309076E+00 0.000000E+00 0.376552E+01 0.419706E+01 0.395640E+01 0.383367E+01 0.458364E+01 0.378483E+01 0.984962E+01
 0.226400E+01 0.000000E+00 0.379190E+01 0.442373E+01 0.439411E+01 0.402430E+01 0.440457E+01 0.362313E+01 0.502025E+01 0.90
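A file in this layout can be unpacked for post-processing with a few lines of code. The sketch below is not an ASReml utility; it assumes that the values are stored as the error variance followed, for each dense equation in turn, by its solution and then its row of the lower triangle, and that the exponents lost in extraction are positive. The function name `unpack_partitioned` is hypothetical. The square roots of the diagonal elements reproduce the standard errors 2.841 and 2.977 reported for the same two effects in the .sln file.

```python
import math


def unpack_partitioned(values):
    """Split [s2, b1, c11, b2, c21, c22, ...] into (s2, solutions, lower-triangle rows).

    Assumes row k of the listing holds one solution followed by k matrix elements.
    Any incomplete final row (for example from a truncated listing) is ignored.
    """
    s2, rest = values[0], values[1:]
    solutions, c_lower = [], []
    k, pos = 1, 0
    while pos + k + 1 <= len(rest):
        row = rest[pos:pos + k + 1]
        solutions.append(row[0])
        c_lower.append(row[1:])
        pos += k + 1
        k += 1
    return s2, solutions, c_lower


# First few values of the nin89a.vrb listing (exponent signs assumed positive).
values = [0.487026e+02,
          0.0, 0.0,
          0.298409e+01, 0.0, 0.807354e+01,
          0.470629e+01, 0.0, 0.456542e+01, 0.886497e+01]

s2, solutions, c_lower = unpack_partitioned(values)
std_errors = [math.sqrt(row[-1]) for row in c_lower]
print(solutions)
print([round(se, 3) for se in std_errors])
```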
296. imates

 variety        BRULE        2.984     2.841
 variety        REDLAND      4.706     2.977
 variety        CODY         0.3158    2.961
 variety        ARAPAHOE     2.954     2.727
 variety        NE87615      1.033     2.934
 variety        NE87619      5.937     2.849
 variety        NE87627      4.378     2.997
 mu             1           24.09      2.465     (intercept)
 mv_estimates   1           21.91      6.731     (missing value estimates)
 mv_estimates   2           23.22      6.723
 mv_estimates   3           22.52      [illegible]
 mv_estimates   4           23.49      6.678
 mv_estimates   5           22.21      6.700
 mv_estimates   6           24.47      6.709
 mv_estimates   7           20.14      6.699
 mv_estimates   8           25.01      6.693
 mv_estimates   9           24.29      6.678
 mv_estimates  10           26.30      6.660
 mv_estimates  11           24.99      6.592
 mv_estimates  12           [illegible] 6.493
 mv_estimates  13           25.39      6.305
 mv_estimates  14           26.81      5.900
 mv_estimates  15           29.07      4.906
 mv_estimates  16           23.97      4.577
 mv_estimates  17           24.27      4.618
 mv_estimates  18           29.82      4.532

13.3.3 The .yht file

The .yht file contains the predicted values of the data in the original order (this is not changed by supplying row/column order in spatial analyses), the residuals and the diagonal elements of the hat matrix. Figure 13.1 shows the residuals plotted against the fitted values (Yhat), and a line printer version of this figure is written to the .res file. Where an observation is missing, the residual, predicted value and Hat value are also declared missing. The missing value estimates with standard errors are reported in the .sln file.

[Figure 13.1: NIN alliance trial 1989, residuals vs fitted values]
297. ime with the finishing time. The execution times for parts of the iteration process are written to the .asl file if the !DEBUG !LOGFILE command line qualifiers are invoked.

if !BRIEF 1 is invoked, the effects that were included in the dense portion of the solution are also printed in the .asr file with their standard error, a t-statistic for testing that effect, and a t-statistic for testing it against the preceding effect in that factor.

placed in the .pvc file when postprocessing with a .pin file.

and graphics file, given if the !DL command line option is used.

for non-spatial analyses, ASReml prints the slope of the regression of log(abs(residual)) against log(predicted value). This regression is expected to be near zero if the variance is independent of the mean. A power of the mean data transformation might be indicated otherwise. The suggested power is approximately (1 - b), where b is the slope. A slope of 1 suggests a log transformation. This is indicative only and should not be blindly applied. Weighted analysis or identifying the cause of the heterogeneity should also be considered. This statistic is not reliable in genetic animal models or when units is included in the linear model, because then the predicted value includes some of the residual.

Table 13.2: Table of output objects and where to find them

 ASReml output object    found in    comment
 observed
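The slope diagnostic described above is easy to reproduce outside ASReml. The following sketch fits an ordinary least-squares line to log|residual| against log(predicted value) and converts the slope b into the suggested power 1 - b. The data are made up so that the residual size is exactly proportional to the fitted value, which should give a slope of 1 and hence a suggested power of 0 (the log transformation). The function names are illustrative and are not ASReml output names.

```python
import math


def slope_log_abs_resid(residuals, fitted):
    """Least-squares slope of log|residual| on log(fitted value).

    Pairs with a zero residual or a non-positive fitted value are skipped,
    since their logarithms are undefined.
    """
    pairs = [(math.log(f), math.log(abs(r)))
             for r, f in zip(residuals, fitted) if f > 0 and r != 0]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx


def suggested_power(b):
    """Power-of-the-mean transformation suggested by slope b (0 means log)."""
    return 1.0 - b


# Made-up example: residual magnitude proportional to the fitted value.
fitted = [1.0, 2.0, 4.0, 8.0, 16.0]
residuals = [0.1 * f * sign for f, sign in zip(fitted, [1, -1, 1, -1, 1])]

b = slope_log_abs_resid(residuals, fitted)
print(round(b, 6), round(suggested_power(b), 6))
```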
298. in terms of the subset parameters 5, 10, 14, 18 and 19 can be introduced by editing the RP_GN and RP_scale columns. Some users would prefer to insert initial values into this .tsv file under the Initial_value column. As an example, the file below contains values based on using 4.8, 26, 70, 35 and 70 for parameters 5, 10, 14, 18 and 19. The data values in the .tsv file become

 GN   Term                            Type  PSpace  Initial_value  RP_GN  RP_scale
  5   units.us(Trait) us(Trait)_1     G     P         4.8            5    1.0000
  6   units.us(Trait) us(Trait)_2     G     P         4.8            5    1.0000
  7   units.us(Trait) us(Trait)_3     G     P         9.6            5    2.0000
  8   units.us(Trait) us(Trait)_4     G     P         4.8            5    1.0000
  9   units.us(Trait) us(Trait)_5     G     P         9.6            5    2.0000
 10   units.us(Trait) us(Trait)_6     G     P        26             10    1.0000
 11   units.us(Trait) us(Trait)_7     G     P         4.8            5    1.0000
 12   units.us(Trait) us(Trait)_8     G     P         9.6            5    2.0000
 13   units.us(Trait) us(Trait)_9     G     P        26             10    1.0000
 14   units.us(Trait) us(Trait)_10    G     P        70             14    1.0000
 15   units.us(Trait) us(Trait)_11    G     P         4.8            5    1.0000
 16   units.us(Trait) us(Trait)_12    G     P         9.6            5    2.0000
 17   units.us(Trait) us(Trait)_13    G     P        26             10    1.0000
 18   units.us(Trait) us(Trait)_14    G     P        35             18    1.0000
 19   units.us(Trait) us(Trait)_15    G     P        70             19    1.0000

Sometimes users wish to rerun a job making changes to the final values par
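The pattern in the table can be reproduced with a small sketch: each parameter's initial value is taken to be RP_scale times the value of the parameter named in RP_GN. This reading is inferred from the table itself (for example, 9.6 = 2 x 4.8) and is an assumption rather than a statement of how ASReml processes the file; the function and variable names are hypothetical.

```python
def expand_initial_values(rows, parent_values):
    """Map each GN to RP_scale * (value of its parent parameter RP_GN).

    rows is a list of (GN, RP_GN, RP_scale) tuples; parent_values maps the
    subset parameter numbers to their chosen values.
    """
    return {gn: scale * parent_values[rp_gn] for gn, rp_gn, scale in rows}


# Values chosen for the five subset parameters, as in the example above.
parent_values = {5: 4.8, 10: 26.0, 14: 70.0, 18: 35.0, 19: 70.0}

# (GN, RP_GN, RP_scale) for parameters 5 to 19, copied from the table.
rows = [(5, 5, 1.0), (6, 5, 1.0), (7, 5, 2.0), (8, 5, 1.0), (9, 5, 2.0),
        (10, 10, 1.0), (11, 5, 1.0), (12, 5, 2.0), (13, 10, 1.0),
        (14, 14, 1.0), (15, 5, 1.0), (16, 5, 2.0), (17, 10, 1.0),
        (18, 18, 1.0), (19, 19, 1.0)]

initial = expand_initial_values(rows, parent_values)
for gn in sorted(initial):
    print(gn, initial[gn])
```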
299. in this context is to hold the previously estimated factor loadings fixed for a few iterations so that factor k+1 initially aims to explain variation previously incorporated in $\Psi$. Then allow all loadings to be updated in the remaining rounds. A second problem, at present unresolved but somewhat improved, is that sometimes the LogL rises to a relatively high value and then drifts away. In an attempt to make the process easier, these two processes have been linked as an additional meaning for the !AILOADINGS n qualifier. When fitting k factors with n > k, the first k-1 loadings are held fixed (no rotation) for the first k iterations. Then, for iterations k+1 to n, loadings vectors are updated in pairs and rotated. If !AILOADINGS is not set by the user and the model is an upgrade from a lower order XFA, !AILOADINGS is set to 4. The problem of XFA loadings going off scale has been reduced by adding a variable penalty to the loading part of the AI matrix. It is not unusual for users to have trouble comprehending and fitting extended factor analytic models, especially with more than two factors. Two examples are developed in a separate document available on request.

7.12 Variance models available in ASReml

Table 7.6: Details of the variance models available in ASReml

 variance structure name   description   algebraic form   number of parameters (variance, corr, hom, het model variance
300. ing workspace. Otherwise send problem to VSN.

Table 14.3: Alphabetical list of error messages and probable cause(s)/remedies

 error message                             probable cause/remedy
 PROGRAMMING error                         indicates ASReml has failed deep in its core. It is likely to be an interaction between the data and the variance model being fitted. Try increasing the memory, simplifying the model and changing starting values for the gammas. If this fails, send the problem to VSN (support@asreml.co.uk) for investigation.
 reading SELF option                       Check the argument.
 Reading distances for POWER structure     POWER structures are the spatial variance models which require a list of distances. Distances should be in increasing order. If the distances are not obtained from variables, the SORT field is zero and the distances are presented after all the R and G structures are defined.
 Reading factor names                      something is wrong in the terms definitions. It could also b

The remaining error messages listed in this part of the table are: reading Overdispersion factor; READING OWN structures; Reading the data; Reading Update step size; Residual Variance is Zero; R header SECTIONS DIMNS GSTRUCT; R structure header SITE DIM GSTRUCT; Variance header SEC DIM GSTRUCT; R structure error ORDER SORTCOL MODEL GAMMAS; R structures are larger than number of records; REQUIRE !ASUV qualifier for this R structure; REQUIRE I x E R structure; Scratch
301. into V99 and changes any missing values in V99 to zero. It then adds V98 and discards the whole record if the result is zero, i.e. both YA and YB have missing values for that record. Variables 98 and 99 are not labelled and so are not retained for subsequent use in analysis.

5.5.4 Special note on covariates

Covariates are variates that appear as independent variables in the model. It is recommended that covariates be centred and scaled to have a mean near zero and a variance of about one, to avoid failure to detect singularities. This can be achieved either
• externally to ASReml, in data file preparation, or
• using !RESCALE mean scale, where mean and scale are user supplied values, for example

   age !rescale -140 0.142857   # in weeks

5.6 Datafile line

The purpose of the datafile line is to
• nominate the data file,
• specify qualifiers to modify
  - the reading of the data,
  - the output produced,
  - the operation of ASReml.

   NIN Alliance Trial 1989
    variety !A
    row 22
    column 11
   nin89aug.asd !skip 1
   yield ~ mu variety

5.6.1 Data line syntax

The datafile line appears in the ASReml command file in the form

   datafile [qualifiers]

• datafile is the path name of the file that contains the variates, factors, covariates, traits (response variates) and weight variables represented as data fields (see Chapter 4); enclose the path name in quotes if it contains embedded blanks,
• the qualifiers tell ASRe
302. ion algorithm for factor analytic and reduced rank variance models, Australian and New Zealand Journal of Statistics 45: 445-459.

Verbyla, A. P. (1990). A conditional derivation of residual maximum likelihood, Australian Journal of Statistics 32: 227-230.

Verbyla, A. P., Cullis, B. R., Kenward, M. G. and Welham, S. J. (1999). The analysis of designed experiments and longitudinal data by using smoothing splines (with discussion), Applied Statistics 48: 269-311.

Waddington, D., Welham, S. J., Gilmour, A. R. and Thompson, R. (1994). Comparisons of some glmm estimators for a simple binomial model, Genstat Newsletter 30: 13-24.

Welham, S. J. (2005). Glmm fits a generalized linear mixed model, in R. Payne and P. Lane (eds), GenStat Reference Manual 3: Procedure Library PL17, VSN International, Hemel Hempstead, UK, pp. 260-265.

Welham, S. J., Cullis, B. R., Gogel, B. J., Gilmour, A. R. and Thompson, R. (2004). Prediction in linear mixed models, Australian and New Zealand Journal of Statistics 46: 325-347.

Wolfinger, R. D. (1996). Heterogeneous variance-covariance structures for repeated measures, Journal of Agricultural, Biological, and Environmental Statistics 1: 362-389.

Wolfinger, R. and O'Connell, M. (1993). Generalized linear mixed models: A pseudo-likelihood approach, Journal of Statistical Computation and Simulation 48: 233-243.

Yates, F. (1935). Complex experiments, Journal of the Royal Statistical Society, Series B 2: 181-247.
303. ion are used in forming that variance structure. Often sections relate to sites (or trials, or experiments) in the case where several related trials are analysed together. For example, consider a MET dataset comprising data for three sites. To model the residuals at each site by a separate AR1xAR1 variance structure we could write

   residual sat(site).ar1v(column).ar1(row)

Alternatively, an AR1xAR1 variance structure for sites 1 and 3 but an IDVxAR1 structure for site 2 could be coded using sat() either as

   residual sat(site,1).ar1v(column).ar1(row) sat(site,2).idv(column).ar1(row) sat(site,3).ar1v(column).ar1(row)

or, more succinctly, as

   residual sat(site,1,3).ar1v(column).ar1(row) sat(site,2).idv(column).ar1(row)

For each of these definitions, ASReml will determine the particular levels in row and column for each site and hence the appropriate sizes of the AR1 matrices.

Important point: A variance structure needs to be specified for every level of the sectioning factor, so that, for example,

   residual sat(site,1,3).ar1(row).ar1(column)

would fail, as there is no variance structure specified for site 2.

7.4 Identifiability

Once all variables have a variance model function applied, ASReml attempts to determine whether the term is identifiable, that is, whether the terms can be separately estimated and are not confounded with other terms in the model. If the cons
304. ions, !ASSIGN strings and command line arguments may substitute into a !CYCLE line,
• I, J, K and L are reserved as names referring to items in the !CYCLE list and should therefore not be used as names of an !ASSIGN string.

High level qualifiers

 qualifier                  action
 !CYCLE [!SAMEDATA] list    is a mechanism whereby ASReml can loop through a series of jobs. !CYCLE has a qualifier, !SAMEDATA, that tells ASReml to use the same data for all cycles, i.e. the data file is only read on the first cycle and is kept in memory for later cycles. The !CYCLE qualifier must appear on its own line. list is a series of values which are substituted into the job wherever the $I string appears. The list may spread over several lines if each incomplete line ends with a COMMA. A series of sequential integer values can be given in the form i:j (no embedded spaces). The output from the set of runs is concatenated into a single set of files, but the output written to the .asr file is slightly abbreviated after the first cycle by suppressing the data summary and fixed effect solutions that might otherwise appear (see !BRIEF; the !BRIEF qualifier is set after the first cycle). For example,

                               !CYCLE 0.4 0.5 0.6
                               20 0 mat2 1.9 $I GPF

                            would result in three runs, and the results would be appended to a single file. Putting !SAMEDATA on the leading !CYCLE line makes ASReml read the data and
 !DOPATH n
 !DOPART n
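The looping mechanism amounts to textual substitution: the job is run once per list value, with the value substituted wherever the cycle marker appears. The sketch below mimics that behaviour outside ASReml. It assumes the marker is written `$I` and that an integer range is written `i:j`; both spellings are readings of a garbled source and should be checked against the printed guide. The function names are hypothetical.

```python
def expand_cycle(template_lines, cycle_values, marker="$I"):
    """Return one copy of the job lines per cycle value, with marker substituted."""
    return [[line.replace(marker, str(value)) for line in template_lines]
            for value in cycle_values]


def integer_range(spec):
    """Expand an 'i:j' specification into the integers i..j inclusive."""
    low, high = (int(part) for part in spec.split(":"))
    return list(range(low, high + 1))


# The example from the text: three values substituted into one model line.
jobs = expand_cycle(["20 0 mat2 1.9 $I GPF"], ["0.4", "0.5", "0.6"])
for job in jobs:
    print(job[0])

print(integer_range("3:6"))
```

Each element of `jobs` corresponds to one run; in ASReml the output of the runs is appended to a single set of files rather than kept separately as it is here.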
305. ions

   NIN Alliance Trial 1989
    variety !A
    id
    pid
    raw
    repl 4
    nloc
    yield
    lat
    long
    row 22
    column 11
   nin89aug.asd !skip 1
   yield ~ mu variety

• should be given for all fields in the data file; fields can be skipped, and fields on the end of a data line without a field definition are ignored; if there are not enough data fields on a data line, the remainder are taken from the next line(s),
• must be presented in the order they appear in the data file,
• can appear with other definitions on the same line,
• data fields can be transformed (see below),
• additional variables can be created by transformation qualifiers.

5.4 Specifying and reading the data

5.4.1 Data field definition syntax

Data field definitions appear in the ASReml command file in the form

   SPACE label [field_type] [transformations]

• SPACE is now optional,
• label is an alphanumeric string to identify the field
  - has a maximum of 31 characters, although only 20 are ever printed/displayed,
  - must begin with a letter,
  - must not contain the special characters,
  - reserved words (Table 6.1 and Table 7.6) must not be used,
• !CSKIP [c] can be used to skip c (default 1) data fields,
• field_type defines how a variable is interpreted as it is read and whether it is regarded as a factor or variate if specified in the linear model
  - for a variate, leave field_type blank or specify 1
306. ip matrix

Sometimes a relationship matrix is required other than the one ASReml can produce from the pedigree file. We call this a GRM (General Relationship Matrix). The inverse of a GRM is a GIV matrix. The user can provide the relationship matrix in a .grm file, and ASReml will invert it to form the GIV matrix, since it is the inverse that is used in the mixed model equations. Alternatively, the user can provide a .giv file containing the inverted GRM matrix. The syntax for specifying a GRM file (say name.grm) or the GIV file (say name.giv) is

   name.[s|d]grm [!SKIP n] [!DENSEGRM [o]] [!GROUPDF n] [!ND | !PSD | !NSD] [!PRECISION n]

or

   name.[s|d]giv [!SKIP n] [!DENSEGIV [o]] [!GROUPDF n] [!SAVEGIV f]

• the named file must have a .giv, .grm, .sgiv, .sgrm, .dgiv or .dgrm extension,
• .sgiv and .sgrm files are binary format files and will be read lower triangle row-wise assuming single precision,
• .dgiv and .dgrm files are binary format files and will be read lower triangle row-wise assuming double precision,
• the named file will be read assuming single/double precision, lower triangle row-wise,
• the G (inverse) files must be specified on the line(s) immediately prior to the data file line, after any pedigree file,
• up to 98 G (inverse) matrices may be defined,
• the file must be in SPARSE format unless the DENSE qualifier is specified,
• a dense format fil
307. is simply the order those treatment labels were discovered in the data file.

 Split plot analysis - oat Variety.Nitrogen     14 Apr 2008 16:15:49
 oats
 Ecode is E for Estimable, * for Not Estimable
 The predictions are obtained by averaging across the hypertable calculated from model terms constructed solely from factors in the averaging and classify sets.
 Use !AVERAGE to move ignored factors into the averaging set.

 ---- 1 ----
 Predicted values of yield
 The SIMPLE averaging set: variety
 The ignored set: blocks wplots
 nitrogen        Predicted_Value   Standard_Error   Ecode
 0.6_cwt            123.3889          7.1747         E
 0.4_cwt            114.2222          7.1747         E
 0.2_cwt             98.8889          7.1747         E
 0_cwt               79.3889          7.1747         E
 SED: Overall Standard Error of Difference 4.436

 ---- 2 ----
 Predicted values of yield
 The SIMPLE averaging set: nitrogen
 The ignored set: blocks wplots
 variety         Predicted_Value   Standard_Error   Ecode
 Marvellous         109.7917          7.7975         E
 Victory             97.6250          7.7975         E
 Golden_rain        104.5000          7.7975         E
 SED: Overall Standard Error of Difference 7.079

 ---- 3 ----
 Predicted values of yield
 The ignored set: blocks wplots
 nitrogen    variety        Predicted_Value   Standard_Error   Ecode
 0.6_cwt     Marvellous        126.8333          9.1070         E
 0.6_cwt     Victory           118.5000          9.1070         E
 0.6_cwt     Golden_rain       124.8333          9.1070         E
 0.4_cwt     Marvellous        117.1667          9.1070         E
 0.4_cwt     Victory           110.8333          9.1070         E
308. is TAB separated: .yht becomes _yht.txt
 !YHTFORM 2 is COMMA separated: .yht becomes _yht.csv
 !YHTFORM 3 is Ampersand separated: .yht becomes _yht.tex

 adds r to the total Sum of Squares. This might be used with !DF to add some variance to the analysis when analysing summarised data.

[Figure 5.1: Variogram in 4 sectors for Cashmore data]

Table 5.6: List of very rarely used job control qualifiers

 qualifier        action
 !CINV n          prints the portion of the inverse of the coefficient matrix pertaining to the nth term in the linear model. Because the model has not been defined when ASReml reads this line, it is up to the user to count the terms in the model to identify the portion of the inverse of the coefficient matrix to be printed. The option is ignored if the portion is not wholly in the SPARSE stored equations. The portion of the inverse is printed to a file with extension .cii. The sparse form of the matrix only is printed, in the form i j $C^{ij}$; that is, elements of $C^{-1}$ that were not needed in the estimation process are not included in the file.
 !FACPOINTS n     affects the number of distinct points recognised by the fac() model function (Table 6.1). The
309. is keeping the estimated variance matrix positive definite. These are not simple issues, and in the following we present a pragmatic approach to them. The data are taken from a large genetic study on Coopworth lambs. A total of 5 traits, namely weaning weight (wwt), yearling weight (ywt), greasy fleece weight (gfw), fibre diameter (fdm) and ultrasound fat depth at the C site (fat), were measured on 7043 lambs. The lambs were the progeny of 92 sires and 3561 dams, produced from 4871 litters over 49 flock-year combinations. Not all traits were measured on each group. No pedigree data was available for dams. The aim of the analysis is to estimate the heritability ($h^2$) of each trait and to estimate the genetic correlations between the five traits. We will present two approaches: a half-sib analysis, and an analysis based on the use of an animal model, which directly defines the genetic covariance between the progeny and sires and dams. The data fields included factors defining sire, dam and lamb (tag), covariates such as age (the age of the lamb at a set time), brr (the birth rearing rank: 1 = born single, raised single; 2 = born twin, raised single; 3 = born twin, raised twin; and 4 = other), sex (M, F) and grp, a factor indicating the flock-year combination.

15.10.1 Half-sib analysis

In the half-sib analysis we include terms for the random effects of sires, dams and litters. In univariate analyses the variance component for sires is denoted by $\sigma_s^2$, where
310. is not valid for generalised linear mixed models, as the reported LogL does not include components relating to the reweighting. Furthermore, it is not appropriate if the fixed effects in the model have changed. In particular, if fixed effects are fitted in the sparse equations, the order of fitting may change with a change in the fitted variance structure, resulting in non-comparable likelihoods even though the fixed terms in the model have not changed. The iteration sequence terminates when the maximum number of iterations (see !MAXIT on page 68) has been reached or successive LogL values are less than 0.002i apart. The following is a copy of nin89a.asr. Margin annotations from the original are shown in parentheses.

 ASReml 4.0.01 [01 Jan 2013] NIN Alliance Trial 1989              (version & title)
   Build ki [07 Jan 2014] 64 bit
 29 Jan 2014 09:34:34.315   32 Mbyte Windows x64   nin89a         (date, workspace)
 Licensed to: VSNi Robin Thompson
 ***********************************************************
 * Contact support@asreml.co.uk for licensing and support  *
 ***********************************************************
 Folder: D latest Data examples4 arg Manex4f
 variety !A
 QUALIFIERS: !SKIP 1 !DISPLAY 15
 QUALIFIER: !DOPART 1 is active
 Reading nin89aug.asd FREE FORMAT skipping 1 lines
 Univariate analysis of yield
 Summary of 242 records retained of 242 read                      (data summary)
  Model term     Size  #miss  #zero   MinNon0    Mean      MaxNon0   StndDevn
  1 variety        56      0      0   1          26.4545   56
  2 id                     0      0   1.0000     26.45     56.00     [illegible]
  3 pid                   18      0   1101       2602.6    4156      1121
  4 raw                   18      0
311. is up to the user and deduced from the first line, which is assumed to be an XY individual. Thus, whatever string is found in the fourth field on the first line of the pedigree is taken to mean XY, and any other code found on other records is taken to mean XX.

8.8 Genetic groups

If all individuals belong to one genetic group, then use 0 as the identity of the parents of base individuals. However, if base individuals belong to various genetic groups, this is indicated by the !GROUPS qualifier and the pedigree file must begin by identifying these groups. All base individuals should have group identifiers as parents. In this case the identity 0 will only appear on the group identity lines, as in the following example, where three sire lines are fitted as genetic groups.

Command file:

   Genetic group example
    Animal !P
    Sire !A
    Dam
    Line 2
    AgeOfDam
    adailygain
    Y2
    Y3
   harveyg.ped !ALPHA !GROUPS 3
   harvey.dat
   adailygain ~ mu Line,              # fixed model
     !r grm1v(Animal !INIT 0.25)      # random model
   residual idv(units)

Pedigree file (harveyg.ped):

   G1 0 0
   G2 0 0
   G3 0 0
   SIRE_1 G1 G1
   SIRE_2 G1 G1
   SIRE_3 G1 G1
   SIRE_4 G2 G2
   SIRE_5 G2 G2
   SIRE_6 G3 G3
   SIRE_7 G3 G3
   SIRE_8 G3 G3
   SIRE_9 G3 G3
   101 SIRE_1 G1
   102 SIRE_1 G1
   103 SIRE_1 G1
   ...
   163 SIRE_9 G3
   164 SIRE_9 G3
   165 SIRE_9 G3

Important: It is usually appropriate to allocate a genetic group identifier where the parent is unknown.

8.9 Reading a user defined inverse relationsh
312. istics for the spatial models are greater than for the lattice analysis. We note the Wald F statistic for the AR1xAR1 + units model is smaller than the Wald F statistic for the AR1xAR1

Predicted values of yield

AR1 x AR1
  variety   Predicted_Value   Standard_Error   Ecode
   1.0000      1257.9763          64.6146        E
   2.0000      1501.4483          64.9783        E
   3.0000      1404.9874          64.6260        E
   4.0000      1412.5674          64.9027        E
   5.0000      1514.4764          65.5889        E
   ...
  23.0000      1311.4888          64.0767        E
  24.0000      1586.7840          64.7043        E
  25.0000      1592.0204          63.5939        E
  SED: Overall Standard Error of Difference 59.05

AR1 x AR1 + units
  variety   Predicted_Value   Standard_Error   Ecode
   1.0000      1245.5843          97.8591        E
   2.0000      1516.2331          97.8473        E
   3.0000      1403.9863          98.2398        E
   4.0000      1404.9202          97.9875        E
   5.0000      1471.6197          98.3607        E
   ...
  23.0000      1316.8726          98.0402        E
  24.0000      1557.5278          98.1272        E
  25.0000      1573.8920          97.9803        E
  SED: Overall Standard Error of Difference 60.51

IB
  Rep is ignored in the prediction
  RowBlk is ignored in the prediction
  ColBlk is ignored in the prediction
  variety   Predicted_Value   Standard_Error   Ecode
   1.0000      1283.5870          60.1994        E
   2.0000      1549.0133          60.1994        E
   3.0000      1420.9307          60.1994        E
   4.0000      1451.8554          60.1994        E
   5.0000      1533.2749          60.1994        E
   ...
  23.0000      1329.1088          60.1994        E
  24.0000      1546.4699          60.1994        E
  25.0000      1630.6285          60.1994        E
  SED: Overall Standard Error of Difference 62.02

Notice the differences in SE and SED associated with the various models. Choosing a model
313. it aids formulation of prediction tables (see !ASSOCIATE qualifier on page 186). Common examples are Genotypes grouped into Families and Locations grouped by Region. We call these associated factors. The key characteristic of associated factors is that they are coded such that the levels of one are uniquely nested in the levels of another. If one is unknown (coded as missing), all associated factors must be unknown for that data record. It is typically unnecessary to interact associated factors except when required to adequately define the variance structure.

6.6 Alphabetic list of model functions

Table 6.2 presents detailed descriptions of the model functions discussed above. Note that some three-letter function names may be abbreviated to the first letter.

Table 6.2: Alphabetic list of model functions and descriptions

  model function    action
  abs(v)            takes the absolute value of the variable v. This function can be used on the response variable.
  and(t,r), a(t,r)  overlays (adds) r times the design matrix for model term t to the existing design matrix. Specifically, if the model up to this point has p effects and t has a effects, the a columns of the design matrix for t are multiplied by the scalar r (default value 1.0) and added to the last a of the p columns already defined. The overlaid term must agree in size with the term it overlays. This can be used to force a correlation of 1 betwee
314. ites in which the specific variances are all equal. For the xfak variance model functions, ASReml orders the parameters as the specific variances followed by the loadings; note that this is different to the ordering for the fak variance model functions (see previous example). In this example the first loading in the second factor is constrained to be equal to zero for identifiability:

  xfa2 site VVVV00000000 contracted form 4V8 4P4PZ3P INIT 4 0 2 4 1 2 0 3 0 3 gen

7.7.2 Ways to supply distances in one-dimensional metric-based models: !COORD v (new in Release 4)

Power models rely on the definition of distance for the associated term. Information for determining distances is supplied either implicitly, by applying the variance model function to the fac() of the coordinate variables, for example expv(fac(X)) where X contains the positions, or explicitly with the !COORD qualifier, for example expv(Time,!COORD x), where x is a vector of distances which has to be of length the number of levels of Time. For computational reasons it is useful to have the range of x between 5 and 50.

7.7.3 Your own program Fi

The OWN variance structure is a facility whereby advanced users may specify their own variance structure. This facility requires the user to supply a program MYOWNGDG that reads the current set of parameters, forms the G matrix and a full set of derivative matrices, and writes
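To make the role of distance in a power model concrete, the sketch below builds a correlation matrix from a vector of positions. It assumes the usual exponential (power) form, in which the correlation between levels i and j is phi raised to the absolute distance between their coordinates; the function name and the sample coordinates are hypothetical and chosen only for illustration.

```python
def power_correlation(x, phi):
    """Correlation matrix with C[i][j] = phi ** |x[i] - x[j]|.

    x   : coordinates (positions or times), one per level of the term
    phi : correlation at unit distance, assumed to satisfy 0 < phi < 1
    """
    if not 0 < phi < 1:
        raise ValueError("phi must lie strictly between 0 and 1")
    return [[phi ** abs(xi - xj) for xj in x] for xi in x]


# Unequally spaced positions: levels 2 and 3 are further apart than 1 and 2.
for row in power_correlation([0, 1, 3], phi=0.5):
    print(row)
```

Because the correlation depends on the coordinates only through their differences, rescaling x changes the meaning of phi, which is one reason the scale of the distance vector matters in practice.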
315. ither argument is supplied, 2 is assumed. If the second argument is omitted, it is given the value of the first. If the problem of later singularities arises because of the low coefficient of variation of a covariable, it would be better to centre and rescale the covariable. If the degrees of freedom are correct in the first iteration, the problem will be with the variance parameters, and a different variance model (or variance constraints) is required.

requests writing of .vrb file. Previously, the default was to write the file.

6 Command file: Specifying the terms in the mixed model

6.1 Introduction

The linear mixed model is specified in ASReml as a series of model terms and qualifiers. In this and the following chapter we discuss a functional specification of mixed models in ASReml. This chapter describes the model formula syntax for traditional variance component models. From Chapter 2, the linear mixed model can be written as

$$ y = X\tau + Zu + e \qquad (6.1) $$

where $y$ ($n \times 1$) is a vector of observations, $\tau$ ($p \times 1$) is a vector of fixed effects, $X$ ($n \times p$) is the design matrix of full column rank that associates observations with the appropriate combination of fixed effects, $u$ ($q \times 1$) is a vector of random effects, $Z$ ($n \times q$) is the design matrix that associates observations with the appropriate combination of random effects, and $e$ ($n \times 1$) is the vector of residual errors. Typically $\tau$ and $u$ are composed of several model terms, that is, $\tau$ can be partitioned as $\tau = [\tau_1' \ldots \tau_t']'$ and $u$ can be partitioned as $u = [u_1' \ldots u_b']'$, with $X$ and $Z$ partitioned conformably as $X = [X_1 \ldots X_t]$ and $Z = [Z_1 \ldots Z_b]$.

316. In this chapter we concentrate on specification of the fixed and random effects and their associated design matrices. For ease of exposition we assume variance component mixed models (Example 2.2). In these models the random effects (within model terms) and the residual errors are assumed to be identically and independently distributed (IID). This means they have a common variance and zero covariance. In these variance component models a functional specification is relatively simple, and we discuss this here. In Chapter 7 we present a more general functional specification of random effects and variance structures.

6.2 Specifying model formulae in ASReml

The linear mixed model is specified in ASReml as a series of model terms and qualifiers. Model terms include factor and variate labels (Section 5.4), functions of labels, special terms and interactions of these. The model is specified immediately after the datafile and any job control qualifier and/or tabulate lines. The syntax for specifying the model is

  response qualifiers ~ fixed !r conrandom !f sparse_fixed
  residual conresidual

Sample code box:

  NIN Alliance Trial 1989
   variety
   ...
   column 11
  nin89.asd !skip 1
  yield ~ mu variety !r idv(repl) !f mv

• response is the label f
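The roles of the two design matrices in the mixed model (fixed effects through X, random effects through Z) can be illustrated with a small Python sketch. Everything here is hypothetical: the four-plot data set, the effect values and the helper names are assumptions made for illustration, and the code only evaluates the linear predictor X tau + Z u; it does not estimate anything.

```python
def indicator_matrix(factor):
    """Return (levels, matrix) with one 0/1 column per level of the factor."""
    levels = sorted(set(factor))
    matrix = [[1 if obs == lev else 0 for lev in levels] for obs in factor]
    return levels, matrix


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def linear_predictor(X, tau, Z, u):
    """Evaluate X tau + Z u, the non-residual part of y = X tau + Z u + e."""
    return [a + b for a, b in zip(mat_vec(X, tau), mat_vec(Z, u))]


variety = ["A", "B", "A", "B"]   # fixed factor (hypothetical data)
repl = [1, 1, 2, 2]              # random factor (hypothetical data)

# X: overall mean plus a contrast for variety B, so X has full column rank.
X = [[1, 1 if v == "B" else 0] for v in variety]
# Z: one column per replicate, associating each plot with its random effect.
_, Z = indicator_matrix(repl)

tau = [10.0, 2.0]    # mean and variety B effect (made-up values)
u = [0.5, -0.5]      # replicate effects (made-up values)
print(linear_predictor(X, tau, Z, u))
```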
317. its us Trait us Trait _3 G P 15 298889 7 1 8 6 1 9 9 i tnits us Trait us Treit A G P 4 8438271 unites ne Trait us Trait 5 G P 11 264815 10 units us Trait us Trait _6 G P 26 095692 10 1 ii units us Trait us Trait _7 G P 4 6882715 11 1 12 units us Trait us Trait _8 G P 10 824074 12 1 13 units us Trait us Trait 9 G P 27 332887 13 1 14 units us Trait us Trait _10 G P 71 875403 14 1 15 unhitse ws Trait us Trait _ii G P 3 9083333 15 1 16 units us Trait us Trait _12 G P 10 292592 16 1 17 units us Trait us Trait _13 G P 34 137962 17 1 18 units us Trait us Trait _14 G P 69 287036 18 1 19 units us Trait us Trait _15 G P 141 97296 19 1 Parameter constraints and initial values can be changed by editing the values in the PSpace and Initial_value columns Scale relationships can be introduced by noting that the full set of parameters can be related to a subset of parameters and scale factors such as parameter subset parameter scale or GN column parameter RP_GN column parameter RP_scale value where GN RP_GN and RP_scale are columns in the tsv file The relationships generated by VCC 2 5681115 7 29 212 2 16 2 parameters 6 8 11 15 are equal to 5 7 9 12 16 are twice 5 10 13 17 parameters 13 and 17 are equal to 10 the full set of parameters 5 19 can therefore be expressed
318. ivariate model and the univariate model of 15 7 The variety effects for each trait u in the bivariate model are partitioned in 15 7 into variety main effects and tmt variety interactions so that u l Q u1 Ue There is a similar partitioning for the run effects and the errors see table 15 9 In addition to the assumptions in the models for individual traits 15 9 the bivariate analysis involves the assumptions cov ty Ul Fv L44 COV Ur Un Ora Too and cov ec e Octd 132 Thus random effects and errors are correlated between traits So for example the variance matrix for the variety effects for each trait is given by 2 Ca Ova var Uy ve gn OL Ovet Ou This unstructured form for trait variety in the bivariate analysis is equivalent to the variety main effect plus heterogeneous tmt variety interaction variance structure 15 8 in the univariate analysis Similarly the unstructured form for trait run is equivalent to the run main effect plus heterogeneous tmt run interaction variance structure The unstructured form for the errors trait pair in the bivariate analysis is equivalent to the pair plus heterogeneous error tmt pair variance in the univariate analysis This bivariate 304 15 8 Paired Case Control study Rice analysis is achieved in ASReml as follows noting that the tmt factor here is equivalent to traits this is for the paired data id pair 132 run 66 variety 44 A yc ye
319. king directories, maybe just keeping the .as, .asr, .rsv and .pvs files.

13.2 An example

In this chapter the ASReml output files are discussed with reference to a two-dimensional separable autoregressive spatial analysis of the NIN field trial data (see model 3b on page 120 of Chapter 7 for details). The ASReml command file for this analysis is presented below. Recall that this model specifies a separable autoregressive correlation structure for residual or plot errors, that is, the direct product of an autoregressive correlation matrix of order 22 for rows and an autoregressive correlation matrix of order 11 for columns.

  NIN Alliance Trial 1989
   variety !A
   id
   pid
   raw
   repl 4
   nloc
   yield
   lat
   long
   row 22
   column 11
  nin89a.asd !skip 1 !DISPLAY 15
  tabulate yield ~ variety
  yield ~ mu variety !f mv
  residual ar1(row).ar1(column)
  predict variety

13.3 Key output files

The key ASReml output files are the .asr, .sln and .yht files.

13.3.1 The .asr file

This file contains
• an announcements box (outlined in asterisks) containing current messages,
• a summary of the data, for the user to confirm the data file has been interpreted correctly and to review the basic structure of the data and validate the specification of the model,
• the iteration sequence of REML log-likelihood values, to check convergence,
• a summary of the variance parameters: The Gamma column reports
320. l appears to be failing, then please send details of the problem to support@vsni.co.uk.

1.6 Typographic conventions

A hands-on approach is the best way to develop a working understanding of a new computing package. We therefore begin by presenting a guided tour of ASReml using a sample data set for demonstration (see Chapter 3). Throughout the guide, new concepts are demonstrated by example wherever possible.

In this guide you will find framed sample boxes to the right of the page, as shown here. These contain ASReml command file (sample) code. Note that the code under discussion is highlighted in bold type for easy identification, and that the continuation symbol is used to indicate that some of the original code is omitted from the display.

  An example ASReml code box
  bold type highlights sections of code currently under discussion
  remaining code is not highlighted
  the continuation symbol is used to indicate that some of the original code is omitted

Data examples are displayed in larger boxes in the body of the text; see, for example, page 40. Other conventions are as follows:
• keyboard key names appear in SMALLCAPS, for example, TAB and ESC,
• example code within the body of the text is in this size and font and is highlighted in bold type, see pages 33 and 49,
• in the presentation of general ASReml syntax, for example [path] asreml basename[.as] [arguments], typewriter font is used for text that must be typed verbatim, for example, asreml an
321. l effects. This example fits direct effects for two traits but maternal effects for the first trait only:

  str(Trait.animal at(Trait,1).dam us(3).nrm(animal))

A rather artificial example of using v greater than 1 is when we have 20 levels in a factor A and wish to use one variance for the first 8 levels and another for the last 12 levels. Then

  str(A idv(8) idv(12))

will do this.

7.3 Applying variance structures to the residual error term

In Release 4 the residual error term is also defined using a consolidated model term, and it now appears after a residual statement that has been introduced to specify the associated variance structure. We give five examples. Firstly, for the default situation of IID residual errors, the error model definition line would be

  residual idv(units)

This second example would specify a separable autoregressive spatial model of order 1 (AR1xAR1) for the observations from a trial arranged in a rectangular array indexed by the data variables column and row. To apply this variance structure the observations would need to cover the whole grid, but it would not be necessary to pre-order the data file as rows within columns, as ASReml uses the information in column and row to put the observations into the appropriate (row within column) order:

  residual ar1v(column).ar1(row)

If there were 3 columns and 23 rows in the previous example, then this third example

  residual ar1v(3).ar1(23)

would be an equivalent coding fo
322. l lines Maybe the string is too long the error model is not correctly specified the file did not exist or was of the wrong file type binary unformatted sequential The PREDICT statement cannot be parsed ASReml failed to form the PREDICT design matrix This usually indicates the model has not been properly parsed and part is misinterpreted as a variance header line old syntax where the residual statement was expected When the model statement is written over several lines incomplete lines must end with a PLUS or COMMA character Check old syntax variance structure specification Check the filename is correct and that the file is not open in another process ASReml has failed to determine an order for solving the mixed model equations See EQORDER for some discussion Try in creasing WORKSPACE This error comes from the main read routine or from the variable definition parsing routine There are several messages of this form where something is what ASReml is attempting to read Either there is an error telling ASReml to read something when it does not need to or there is an error in the way something is specified 260 14 5 Information Warning and Error messages Table 14 3 Alphabetical list of error messages and probable cause s remedies error message probable cause remedy Error reading the data Error reading the DATA FILENAME line Error reading the model factor list Error Ra
323. l w correlation CORB corb banded C 1 w l w 2w 1 correlation Cig b 1Sj lt w 1 l l lt 1 CORG corg general C 1 wld wt H1 acan correlation C i j w CORGH US l ere ij 148 7 12 Variance models available in ASReml Details of the variance models available in ASReml variance description algebraic number of parameters structure form name variance corr het model variance function name One dimensional unequally spaced EXP exp exponential C 1 1 l w Cy l iA j xi are coordinates 0 lt o lt 1 GAU gau gaussian C 1 1 LEWU Cy dO iF j xi are coordinates 0 lt lt 1 Two dimensional irregularly spaced x and y vectors of coordinates Oij min d 1 1 dij is euclidean distance IEXP iexp isotropic C 1 1 l w exponential Cy piti esltly usl ij 0 lt lt 1 IGAU igau isotropic C 1 1 1 w gaussian C glares Huy pany 0 lt lt 1 IEUC ieuc isotropic Er 1 1 1 w euclidean C pV imti Hui i ij 0 lt lt 1 LVR lvr linear variance C 1 64 1 l w 0 lt 149 7 12 Variance models available in ASReml Details of the variance models available in ASReml variance description algebraic number of parameters structure form name variance corr hom het model variance variance function name SPH sph spherical C 1 36 583 1 2 l w 0 lt CIR cir circular Web C 1 1 2 l w amp Oliver 7 ted i INE 2 6454 1 0 sin p 113 0 lt
324. laced before model terms to exclude them from y J the model placed at the end of a line to indicate that the model specification continues on the next line treated as a space J J I placed around some model terms when it is impor J 1 tant the terms not be reordered Section 6 4 commonly at f n condition on level n of factor f J J used n may be a list of level numbers functions at f forms conditioning covariables for all levels of fac J tor f fac v forms a factor from v with a level for each unique J value in v fac v y forms a factor with a level for each combination of J values in v and y lin f forms a variable from the factor f with values equal to 1 n corresponding to level 1 level n of the factor spl v k forms the design matrix for the random component J of a cubic spline for variable v other t n fits variable n from the G set of variables t This y y functions tin is a special case of the SUBGROUP qualifier func tion applied to G variables Note that the square parentheses are permitted alternative syntax abs v forms the absolute value of the variable v and t r adds r times the design matrix for model term t to J the previous design matrix r has a default value of 1 predefine it by saying t and t r c f factor fis fitted with sum to zero constraints J 90 6 2 Specifying model formulae in ASReml Table 6 1 Summary of reserved words operators and functions mod
325. lant number treatment identification and the 5 heights The ASReml input file for our first model is This is plant data multivariate tmt A Diseased Healthy plant 14 y1 y3 y5 y7 y10 grass asd skip 1 ASUV IY y1 G tmt JOIN Plot the data yl y3 y5 y7 y10 Trait tmt Tr tmt r idv units residual idv units Trait 279 15 5 Balanced repeated measures Height The focus is modelling of the error variance for the data Specifically we fit the multivariate regression model given by Y DT E 15 1 where Y is the matrix of heights D is the design matrix T is the matrix of fixed effects and E is the matrix of errors The heights taken on the same plants will be correlated and so we assume that var vec E I4 X 15 2 where is a symmetric positive definite matrix The variance models used for are given in Table 15 4 These represent some commonly used models for the analysis of repeated measures data see Wolfinger 1986 Note that we have specified the ASUV qualifier This is required to allow the fitting of all these models Without ASUV ASReml woul only allow us to fit the final UnStructured variance model which is the default R structure fo Table 15 4 Summary of variance models fitted to the plant data number of REML model parameters log likelihood BIC Uniform 2 196 88 401 95 Power 2 182 98 374 15 Heterogeneous Power 6 171 50 367 57 Antedependence order 1 9 160 37 357 5
326. le

14.5 Information, Warning and Error messages

Table 14.1: Some information messages and comments

  information message: comment

  Logl converged: the REML log-likelihood last changed less than 0.002 * iteration number and variance parameter values appear stable.

  BLUP run done: A full iteration has not been completed. See discussion of !BLUP.

  JOB ABORTED by USER: See discussion of ABORTASR.NOW.

  Logl converged, parameters not converged: the change in REML log-likelihood was small and convergence was assumed, but the parameters are in fact still changing.

  Logl not converged: the maximum number of iterations was reached before the REML log-likelihood converged. The user must decide whether to accept the results anyway, to restart with the !CONTINUE command line option (see Section 10.3 on job control), or to change the model and/or initial values before proceeding. The sequence of estimates is reported in the .res file. It may be necessary to simplify the model and estimate the dominant components before estimating other terms if the LogL is oscillating.

  Warning: Only one iteration performed
  Parameters unchanged after one iteration: Paramete

Messages beginning with the word Notice are not generally listed here. They provide information the user should be aware of, as it may affect the interpretation of results. They are not in themselves errors, in that the syntax is valid, but they may reflect errors in the
327. les setting n to 1 means the file is not formed modifies the appearance of the variogram calculated from the residuals obtained when the sampling coordinates of the spatial process are defined on a lattice The default form is based on absolute distance in each direction This form distinguishes same sign and different sign distances and plots the variances separately as two layers in the same figure specifies that n constraints are to be applied to the variance parameters The constraint lines occur after the G structures are defined The con straints are described in Section 7 8 2 The variance header line struc tural specification or residual line Section 6 2 must be present even if only O O 0 or residual units indicating there are no explicit R or G structures see Section 7 8 2 requests that the variogram formed with radial coordinates see page 18 be based on s 4 6 or 8 sectors of size 180 s degrees The default is 4 sectors if VGSECTORS is omitted and 6 sectors if it is specified without an argument The first sector is centred on the X direction Figure 5 1 is the variogram using radial coordinates obtained using pre dictors of random effects fitted as fac xsca ysca It shows low semi variance in xsca direction high semivariance in the ysca direction with intermediate values in the 45 and 135 degrees directions controls the form of the yht file YHTFORM 1 suppresses formation of the yht file YHTFORM 1
328. log likelihoods for models 1 and 2 are comparable and likewise for models 3 to 6 The REML log likelihoods are not comparable between these groups due to the inclusion of the fixed season term in the second set of models We begin by modelling the variance matrix for the intercept and slope for each tree as a diagonal matrix as there is no point including a covariance component between the intercept and slope if the variance component s for one or both is zero Model 1 also does not include a non smooth component at the overall level that is fac age Abbreviated output is shown below 6 LogL 97 8517 s2 7 2838 33 df 7 LogL 97 7837 S2 6 6673 33 dt 8 LogL 97 7792 S2 6 4634 33 df 9 LogL 97 7788 52 6 3911 33 df 10 LogL 97 7788 52 6 3615 33 df Results from analysis of circ Akaike Information Criterion 205 56 assuming 5 parameters Bayesian Information Criterion 213 04 Model_Term Gamma Sigma Sigma SE C 313 15 9 Balanced longitudinal data Random coefficients and cubic smoothing splines Oranges idv spl age 7 IDV_V 5 100 466 639 116 1 55 OF Residual SCA_V 35 1 000000 6 36154 1 74 GP idv Tree ID_V 1 4 78778 30 4577 1 24 OP idv Tree age ID_V 1 0 939009E 04 0 597354E 03 1 41 OP idv spl age 7 Tree ID_V 1 1 11619 7 10070 1 44 OP Wald F statistics Source of Variation NumDF DenDF F ine P ine 9 mu 1 4 0 47 05 0 002 3 age i 4 0 95 00 lt 001 217 Predicted values of circ
329. lower than the PCG method. ASReml prints its standard reports as if it had completed the iteration normally, but since it has not completed it, some of the information printed will be incorrect. In particular, variance information on the variance parameters will always be unavailable. Standard errors on the estimates will be wrong unless n=3. Residuals are not available if n=1. Use of n=3 or n=2 will halve the processing time when compared to the alternative of using !MAXIT 1 rather than a !BLUP n qualifier. However, !MAXIT 1 does result in complete and correct output.

sets the number of equations solved densely, up to a maximum of 5000. By default, sparse matrix methods are applied to the random effects and any fixed effects listed after random factors or whose equation numbers exceed 800. Use !DENSE n to apply sparse methods to effects listed before the !r (reducing the size of the DENSE block), or if you have large fixed model terms and want Wald F statistics calculated for them. Individual model terms will not be split so that only part is in the dense section. n should be kept small (< 100) for faster processing.

alters the error degrees of freedom from ν to ν + n. This qualifier might be used when analysing pre-adjusted data to reduce the degrees of freedom (n negative), or when weights are used in lieu of actual data records to supply error information (n positive). The degrees of freedom is only used in the calculation of the residual
330. ls column 3 and the diagonal elements of the hat matrix This final column can be used in tests involving the residuals see Section 2 4 under Diagnostics Record Yhat Residual Hat 1 30 442 1 192 13 01 2 27 955 3 595 13 01 3 32 380 2 670 13 01 4 23 092 7 008 13 01 5 31 317 1 733 1301 6 29 267 0 9829 13 01 T 26 155 9 045 13 01 8 24 567 6 167 13 01 9 23 530 0 8204 13 01 222 16 673 9 877 13 01 223 24 548 1 052 13 01 224 23 786 3 114 13 01 3 7 Tabulation predicted values and functions of the vari ance components It may take several runs of ASReml to determine an appropriate model for the data that is the fixed and random effects that are important During this process you may wish to explore the data by simple tabulation Having identified an appropriate model you may then wish to form predicted values or functions of the variance components The facilities in ASReml to form predicted values and functions of the variance components are described in Chapters 9 and 12 respectively Our example only includes tabulation and prediction The statement tabulate yield variety in nin89 as results in nin89 tab as follows NIN alliance trial 1989 11 Jul 2005 13355221 Simple tabulation of yield variety LANCER 28 56 BRULE 26 07 REDLAND 30 50 CODY 21 21 ARAPAHOE 29 44 NE83404 27 39 NE83406 24 28 37 3 7 Tabulation predicted values and functions of the variance components NE83407 22 69 CENTURA 21 65 SCOUT66 2
331. lue to highlight terms associated with A and B respectively in cov(ab, ab): if

$$ \mathrm{var}(A) = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix} \quad \text{and} \quad \mathrm{var}(B) = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} $$

then

$$ \mathrm{cov}(ab_{ij}, ab_{kl}) = A_{ik} \times B_{jl} $$

2.1.10 Direct product structures

Mathematically, the result (2.9) is known as a direct product structure and is written in full as

$$ \mathrm{var}(ab) = A \otimes B = \begin{bmatrix} A_{11}B & A_{12}B & A_{13}B \\ A_{21}B & A_{22}B & A_{23}B \\ A_{31}B & A_{32}B & A_{33}B \end{bmatrix} $$

Structures associated with direct product construction are known as separable variance structures, and we call the assumption that a separable variance structure is plausible the assumption of separability.

2.1.11 Direct products in R structures

Separable structures occur naturally in many practical situations. Consider a vector of common errors associated with an experiment. The usual least squares assumption (and the default in ASReml) is that these are independently and identically distributed (IID). However, if e was from a field experiment laid out in a rectangular array of r rows by c columns, we could arrange the residuals as a matrix and might consider that they were autocorrelated within rows and columns. Writing the residuals as a vector in field order, that is, by sorting the residuals rows within columns (plots within blocks), the variance of the residuals might then be

$$ \sigma^2 \, \Sigma_c(\rho_c) \otimes \Sigma_r(\rho_r) $$

where $\Sigma_r(\rho_r)$ and $\Sigma_c(\rho_c)$ are correlation matrices for the row model (order r, autocorrelation parameter $\rho_r$) and column model (o
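A separable residual structure of this kind can be built numerically as a Kronecker (direct) product. The Python sketch below is an illustration only: it assumes first-order autoregressive correlation matrices for both directions and residuals ordered rows within columns, so that the column matrix is the left factor; the function names and the parameter values are hypothetical.

```python
def ar1(order, rho):
    """AR1 correlation matrix: element (i, j) equals rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(order)] for i in range(order)]


def kronecker(A, B):
    """Direct product A (x) B: each element A[i][j] is replaced by A[i][j] * B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


# Hypothetical layout: c = 2 columns and r = 3 rows, rows sorted within columns,
# so the plot in column i and row k sits at position i * r + k.
n_cols, n_rows = 2, 3
V = kronecker(ar1(n_cols, 0.5), ar1(n_rows, 0.5))

print(len(V), len(V[0]))   # dimension of the separable correlation matrix
print(V[0][1])             # same column, adjacent rows
print(V[0][3])             # adjacent columns, same row
print(V[0][5])             # adjacent columns, rows two apart
```

The correlation between any two plots is the product of a column term and a row term, which is what the assumption of separability amounts to.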
332. ly referenced in the classify and average sets. For example,

  !GROUP Year YearLoc 1 1 1 2 2 3 3 3 4 4

forms a new factor Year with 4 levels from the existing factor YearLoc with 10 levels. The prediction must be in terms of YearLoc, not Year, even if YearLoc does not formally appear in the model. For default averaging in prediction, the weights for the levels of the grouped factor (Year) will be, in this example, 0.3 0.2 0.3 0.2, derived from the weights for the base factor (YearLoc). Use

  !AVE YearLoc {2 2 2 3 3 2 2 2 3 3}/24

to produce equal weighting of Year effects.

If !G sets of variables are included in the classify set, only the first variable is reported in labelling the predict values, except that for !G !MM sets the marker position is reported.

Having identified the explanatory variables in the classify set, the second step is to check the averaging set. The default averaging set is those explanatory variables involved in fixed effect model terms that are not in the classify set. By default, variables that are not in any !ASSOCIATE list and that only define random model terms are ignored. Use the !AVERAGE, !ASSOCIATE or !PRESENT qualifiers to force variables into the averaging set.

The third step is to check the linear model terms to use in prediction. The default is that all model terms based entirely on variables in the classifying and averaging sets are used. Two qualifiers allow this default to be modified by adding !USE or removing !IGN
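The weights quoted for the grouped factor follow from simple arithmetic on the base-factor weights, which the Python sketch below reproduces. It assumes the grouping list reads 1 1 1 2 2 3 3 3 4 4 (ten YearLoc levels mapped to four Year levels, consistent with the stated weights 0.3 0.2 0.3 0.2); the function name is hypothetical and the code only does the bookkeeping, not the prediction itself.

```python
from fractions import Fraction


def grouped_weights(group_of_level, base_weights=None):
    """Sum normalised base-factor weights within each level of the grouped factor.

    group_of_level : group index for each level of the base factor (YearLoc)
    base_weights   : averaging weights for the base levels; equal if omitted
    """
    n = len(group_of_level)
    if base_weights is None:
        base_weights = [1] * n
    total = sum(base_weights)
    weights = {}
    for group, w in zip(group_of_level, base_weights):
        weights[group] = weights.get(group, Fraction(0)) + Fraction(w, total)
    return weights


year_of_yearloc = [1, 1, 1, 2, 2, 3, 3, 3, 4, 4]

# Default: equal weights on the 10 YearLoc levels give unequal Year weights.
print(grouped_weights(year_of_yearloc))

# Weights 2 2 2 3 3 2 2 2 3 3 (summing to 24) give each Year a weight of 1/4.
print(grouped_weights(year_of_yearloc, [2, 2, 2, 3, 3, 2, 2, 2, 3, 3]))
```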
333. lysis of a field experiment

There has been a large amount of interest in developing techniques for the analysis of spatial data, both in the context of field experiments and geostatistical data (see, for example, Cullis and Gleeson, 1991; Cressie, 1991; Gilmour et al., 1997). This example illustrates the analysis of so-called regular spatial data, in which the data is observed on a lattice or regular grid. This is typical of most small-plot designed field experiments. Spatial data is often irregularly spaced, either by design or because of the observational nature of the study. The techniques we present in the following can be extended for the analysis of irregularly spaced spatial data, though larger spatial data sets may be computationally challenging, depending on the degree of irregularity or models fitted.

The data we consider is taken from Gilmour et al. (1995) and involves a field experiment designed to compare the performance of 25 varieties of barley. The experiment was conducted at Slate Hall Farm, UK, in 1976 and was designed as a balanced lattice square with replicates laid out as shown in Table 15.6. The data fields were Rep, RowBlk, ColBlk, row, column and yield. Lattice row and column numbering is typically within replicates, and so the terms specified in the linear model to account for the lattice row and lattice column effects would be Rep.latticerow Rep.latticecolumn. However, in this example lattice rows and columns are both numbered
334. m P grp 49 sex brr 4 litter 4871 age wwt IMO MO identifies missing values ywt MO gfw 1MO fdm MO fat MO pcoop fmt read pedigree from first three fields PATH 1 pcoop fmt PATH 2 pcoop fmt CONTINUE pcoopfi rsv MAXI 40 PATH 3 pcoop fmt CONTINUE pcoopf2 rsv MAXI 40 PATH 4 pcoop fmt CONTINUE pcoopf2 rsv MAXI 40 PATH 5 pcoop fmt CONTINUE pcoopf4 rsv MAXI 40 IPART O ISUBSET TrDam12 Trait 12000 ISUBSET TrLit1234 Trait 123 40 SUBSET TrAGi1245 Trait 1245 ISUBSET Tr8Gl2a Trait 123 0 0 ISUBSET IrDal2s Trait 1 23 0 0 USING ASSIGN TO MAKE SPECIFICATION CLEARER ASSIGN TDIAGI INIT 2 3759 6 2256 0 60075E 01 0 63086 0 13069 GP ASSIGN DDIAGI INIT 2 1584 2 3048 GP ASSIGN LDIAGI INIT 3 55265 2 55777 0 191238E 01 0 897272 IGP ASSIGN RUSI I lt INIT 13 390 9 0747 17 798 0 31961 0 87272 0 13452 0 71374 1 4028 0 23141 4 0677 0 72812 2 0831 0 75977E 01 0 25782 1 5337 GP gt ASSIGN VARF lt diag TrAG1245 INIT 0 0024 0 0019 0 0020 0 00026 age grp diag TrSG123 INIT 0 93 16 0 0 28 sex grp 1 gt PATH 1 wwt ywt gfw fdm fat Trait Trait age Trait brr Trait sex Trait age sex r VARF diag Trait TDIAGI nrm tag diag TrDam12 DDIAGI nrm dam diag TrLit1234 LDIAGI id lit lf Trait grp residual id units us Trait RUSI 328 15 10 Multivariate animal genetics data Sheep PATH 2 wwt ywt gfw fdm fat Trait Trait age Trait brr Trait sex Trait age sex r VARF xfal T
335. m http://www.vsni.co.uk/products/asreml as well as in the examples directory created under the standard installation. They remain the property of the authors or of the original source but may be freely distributed provided the source is acknowledged. The authors would appreciate feedback and suggestions for improvements to the program and this guide.

Proceeds from the licensing of ASReml are used to support continued development to implement new developments in the application of linear mixed models. The developmental version is available to supported licensees via a website upon request to VSN. Most users will not need to access the developmental version unless they are actively involved in testing a new development.

Acknowledgements

We gratefully acknowledge the Grains Research and Development Corporation of Australia for their financial support for our research since 1988. Brian Cullis and Arthur Gilmour wish to thank the NSW Department of Primary Industries, and more recently the University of Wollongong, for providing a stimulating and exciting environment for applied biometrical research and consulting. Rothamsted Research receives grant-aided support from the Biotechnology and Biological Sciences Research Council of the United Kingdom. We sincerely thank Ari Verbyla, Dave Butler and Alison Smith, the other members of the ASReml team. Ari contributed the cubic smoothing splines technology, information for the Marker map imputation
336. mand file it substitutes the nth argument string. n may take the values 1 to 9 to indicate up to 9 strings after the command file name. If the argument has 1 character, a trailing blank is attached to the character and inserted into the command file. If no argument exists, a zero is inserted. For example
    asreml rat.as alpha beta
tells ASReml to process the job in rat.as as if it read alpha wherever $1 appears in the command file, beta wherever $2 appears and 0 wherever $3 appears.
Table 10.2: The use of arguments in ASReml
    in command file    on command line       becomes in ASReml run
    abc$1def           no argument           abc0 def
    abc$1def           with argument X       abcX def
    abc$1def           with argument XY      abcXYdef
    abc$1def           with argument XYZ     abcXYZdef
    abc$1 def          with argument XX      abcXX def
    abc$1 def          with argument XXX     abcXXX def
    abc$1 def          with argument XXX     abcXXX def (multiple spaces)
10.4.2 Prompting for input
Another way to gain some interactive control of a job in the PC environment is to insert !?text in the .as file where you want to specify the rest of the line at run time. ASReml prompts with text and waits for a response, which is used to complete the line. The !? qualifier may be used anywhere in the job and the line is modified from that point.
Warning: Unfortunately the prompt may not appear on the top screen under some Windows operating systems, in which case it may not be obvious that ASReml is waiting for a keyboard response.
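The substitution rule in Table 10.2 can be sketched in a few lines of Python. This is only an illustration of the rule as described, not ASReml's own code: the function name substitute_args is hypothetical, and the placeholder is taken to be written as $n (with n from 1 to 9).

```python
import re


def substitute_args(line, args):
    """Replace $1..$9 in one command-file line with command-line arguments.

    Hypothetical helper illustrating the documented rule:
    a missing argument becomes "0", and a one-character value
    gets a trailing blank attached before insertion.
    """
    def repl(match):
        n = int(match.group(1))
        # A missing argument is replaced by a zero.
        value = args[n - 1] if n <= len(args) else "0"
        # One-character values are padded with a trailing blank.
        return value + " " if len(value) == 1 else value

    return re.sub(r"\$([1-9])", repl, line)


if __name__ == "__main__":
    for args in ([], ["X"], ["XY"], ["XYZ"]):
        print(repr(substitute_args("abc$1def", args)))
```

The padding of one-character values explains why a missing argument yields a zero followed by a blank in the first row of the table.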
337. matrices for sires dams and litters respectively The variance matrix for dams does not involve fibre diameter and fat depth while the variance matrix for litters does not involve fat depth The effects in each of the above vectors are ordered levels within traits Lastly we assume that the residual variance matrix is given by Ue Q I 7043 Table 15 15 presents the sequence variance models fitted to each of the four random terms sire dam litter and error in the ASReml job IRENAME 1 ARG 1 CHANGE 1 TO 2 OR 3 FOR OTHER PATHS Multivariate Sire amp Dam DOPATH 1 tag sire 92 II dam 3561 I grp 49 sex brr 4 litter 4871 age wwt IMO MO identifies missing values ywt MO gfw 1MO fdm M fat MO PATH 1 coop fmt PATH 2 coop fmt CONTINUE coopmfi rsv uses initial values from previous rsv file 319 15 10 Multivariate animal genetics data Sheep PATH 3 coop fmt CONTINUE coopmf2 rsv PATH O SETTING UP TRAIT COMBINATIONS FOR DIFFERENT MODEL TERMS SUBSET TrDam123 Trait 12300 SUBSET TrLit1234 Trait 123 40 SUBSET TrAG1245 Trait 1245 ISUBSET TrSG123 Trait 12300 USING ASSIGN TO MAKE SPECIFICATION CLEARER ASSIGN SIRE DAM LITTER AND RESIDUAL INITIAL VALUES FROM UNIVARIATE ANALYSES ASSIGN SDIAGI INIT 0 608 1 298 0 015 0 197 0 035 Initial sire variances ASSIGN DDIAGI INIT 2 2 4 14 0 018 ASSIGN LDIAGI INIT 3 74 0 97 0 019 0 941 ASSIGN RUSI lt INIT 9 27 0 0 16 48 0 0 0 0 0 14 0 0 0 0 0 0 3 37
338. ml to modify either
• the reading of the data and/or the output produced, see Table 5.2 below for a list of data file related qualifiers,
• the operation of ASReml, see Tables 5.3 to 5.6 for a list of job control qualifiers.
• the data file related qualifiers must appear on the data file line,
• the job control qualifiers may appear on the data file line or on following lines,
• the arguments to qualifiers are represented by the following symbols:
    f      a filename,
    n      an integer number, typically a count,
    p      a vector of real numbers, typically in increasing order,
    r      a real number,
    s      a character string,
    t      a model term label,
    v      the number or label of a data variable,
    vlist  a list of variable labels.
5.7 Data file qualifiers
Table 5.2 lists the qualifiers relating to data input. Use the Index to check for examples or further discussion of these qualifiers.
Table 5.2: Qualifiers relating to data input and output
    qualifier          action
    Frequently used data file qualifiers
    !SKIP n            causes the first n records of the (non-binary) data file to be ignored. Typically these lines contain column headings for the data fields.
    Other data file qualifiers
    !COLUMNFACTOR v    is used in combination with !ROWFACTOR and !SECTION to get ASReml
    !COLFAC v          to insert extra data records to complete the grid of plots defined by the RowFactor and the ColumnFactor for each Section so that a
339. model
    !r nrmv(Animal) !INIT 0.25    # random model
    residual idv(units)
is specified after all field definitions and before the datafile definition. See below for the first 20 lines of harvey.ped together with the corresponding lines of the data file harvey.dat. All individuals appearing in the data file must appear in the pedigree file. When all the pedigree information (individual, male_parent, female_parent) appears as the first three fields of the data file, the data file can double as the pedigree file. In this example the line harvey.ped !ALPHA could be replaced with harvey.dat !ALPHA. Often the pedigree file will include individuals for which there is no data, individuals that define genetic links between individuals with data.
The nrm in nrmv(Animal) indicates that an additive or numerator relationship matrix (nrm) variance structure is constructed from the pedigree associated with Animal. The v in nrmv indicates that the nrm matrix is scaled by a variance parameter.
8.6 The pedigree file
The pedigree file is used to construct the genetic relationships for fitting a genetic animal model and is required if the !P qualifier is associated with a data field. The pedigree file
• has three fields: the identities of an individual and its parents (or sire and maternal grand sire if the !MGS qualifier is specified, Table 8.1). Typically for animals the male parent is listed first, but for trees the
340. more, the REML estimates are consistent and asymptotically normal, though in small samples this approximation appears to be unreliable (see later).
A general method for comparing the fit of nested models fitted by REML is the REML likelihood ratio test, or REMLRT. The REMLRT is only valid if the fixed effects are the same for both models. In ASReml this requires not only the same fixed effects model, but also the same parameterisation. If $\ell_{R2}$ is the REML log-likelihood of the more general model and $\ell_{R1}$ is the REML log-likelihood of the restricted model (that is, the REML log-likelihood under the null hypothesis), then the REMLRT is given by
$$D = 2\left(\ell_{R2} - \ell_{R1}\right), \qquad (2.21)$$
that is, twice the log of the ratio of the two maximised REML likelihoods, which is non-negative. If $r_i$ is the number of parameters estimated in model $i$, then the asymptotic distribution of the REMLRT under the restricted model is $\chi^2_{r_2 - r_1}$.
The REMLRT is implicitly two-sided and must be adjusted when the test involves a hypothesis with the parameter on the boundary of the parameter space. It can be shown that for a single variance component, the theoretical asymptotic distribution of the REMLRT is a mixture of $\chi^2$ variates, where the mixing probabilities are 0.5, one with 0 degrees of freedom (spike at 0) and the other with 1 degree of freedom. The approximate P-value for the REMLRT statistic $D$ is $0.5\,(1 - \Pr(\chi^2_1 \le d))$, where $d$ is the observed value of $D$. This has a 5% critical value of 2.71 in contrast to the 3.84
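As a rough illustration of the boundary-adjusted test for a single variance component, the following Python sketch computes the approximate P-value from two REML log-likelihoods. The function names are hypothetical, the clamp of D at zero is an added guard against rounding in the reported log-likelihoods, and the chi-square tail probability for 1 degree of freedom is obtained from the identity Pr(chi-square_1 > d) = erfc(sqrt(d/2)).

```python
import math


def chi2_1_sf(d):
    """Pr(chi-square with 1 df > d), via the complementary error function."""
    return math.erfc(math.sqrt(d / 2.0))


def remlrt_boundary_pvalue(logl_general, logl_restricted):
    """Approximate P-value for testing one variance component on the boundary.

    D = 2 * (logl_general - logl_restricted); the reference distribution is
    an equal mixture of a spike at 0 and a chi-square with 1 df.
    """
    d = 2.0 * (logl_general - logl_restricted)
    d = max(d, 0.0)  # assumption: guard against tiny negative values from rounding
    return 0.5 * chi2_1_sf(d)


if __name__ == "__main__":
    # d = 2.71 is roughly the 5% point of the mixture;
    # d = 3.84 is roughly the 5% point of an unadjusted chi-square on 1 df.
    print(remlrt_boundary_pvalue(1.355, 0.0))
    print(chi2_1_sf(3.84))
```

Halving the tail probability is what moves the 5% critical value from about 3.84 down to about 2.71.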
341. mother tree may be first,
• an optional fourth field may supply inbreeding/selfing information, used if the !FGEN qualifier is specified (Table 8.1),
• an additional field specifying the sex of the individual is required if the !XLINK qualifier is specified (Table 8.1),
• is ordered by generation, so that the line giving the pedigree of an individual appears above any line where that individual appears as a parent,
• is read free format; it may be the same file as the data file if the data file is free format and has the necessary identities in the first three fields (see below),
• is specified on the line immediately after all field definitions and before the data file line in the command file,
• use 0 or * to represent unknown parents.
    harvey.ped        harvey.dat
    101 SIRE_1 0      101 SIRE_1 0 1 3 192 390 2241
    102 SIRE_1 0      102 SIRE_1 0 1 3 154 403 2651
    103 SIRE_1 0      103 SIRE_1 0 1 4 185 432 2411
    104 SIRE_1 0      104 SIRE_1 0 1 4 183 457 2251
    105 SIRE_1 0      105 SIRE_1 0 1 5 186 483 2581
    106 SIRE_1 0      106 SIRE_1 0 1 5 177 469 2671
    107 SIRE_1 0      107 SIRE_1 0 1 5 177 428 2711
    108 SIRE_1 0      108 SIRE_1 0 1 5 163 439 2471
    109 SIRE_2 0      109 SIRE_2 0 1 4 188 439 2292
    110 SIRE_2 0      110 SIRE_2 0 1 4 178 407 2262
    111 SIRE_2 0      111 SIRE_2 0 1 5 198 498 1972
    112 SIRE_2 0      112 SIRE_2 0 1 5 193 459 2142
    113 SIRE_2 0      113 SIRE_2 0 1 5 186 459 2442
    114 SIRE_2 0      114 SIRE_2 0 1 5 175 375 2522
    115 SIRE_2 0      115 SIRE_2 0 1 5 171 382 1722
    1
342. n uni f k n
calculates an expected marker state from flanking marker information at position r of the linkage group f (see !MM to define marker locations). r may be specified as TPn where TPn has been previously internally defined with a predict statement (see page 182). r should be given in Morgans.
sin(v,r)    forms sine from v with period r. Omit r if v is radians. If v is degrees, r is 360.
spl(v,k)    In order to fit spline models associated with a variate v and k knot points in ASReml, v is included as a covariate in the model and spl(v,k) as a random term. The knot points can be explicitly specified using the !SPLINE qualifier (Table 5.4). If k is specified but !SPLINE is not specified, equally spaced points are used. If k is not specified and there are fewer than 50 unique data values, they are used as knot points. If there are more than 50 unique points then 50 equally spaced points will be used. The spline design matrix formed is written to the .res file. An example of the use of spl() is
                price ~ mu week !r spl(week)
sqrt(v,r)   forms the square root of v + r. This may also be used to transform the response variable.
Trait       is used with multivariate data to fit the individual trait means. It is formally equivalent to mu but Trait is a more natural label for use with multivariate data. It is interacted with other factors to estimate their effects for all traits.
units       creates a factor with a level for every record in the data file. This is used to fit the nugget vari
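The default choice of knot points described for spline terms can be sketched as follows. This is a hypothetical Python helper, not ASReml's implementation: it assumes that "equally spaced" means evenly spread over the observed range of the variate, and it treats the undocumented case of exactly 50 unique values like the "fewer than 50" case.

```python
def default_knots(values, k=None, max_unique=50):
    """Return knot points for a variate following the documented defaults.

    - k given: k equally spaced points over the data range (assumed meaning).
    - k omitted, few unique values: the unique data values themselves.
    - k omitted, many unique values: max_unique equally spaced points.
    """
    lo, hi = min(values), max(values)
    unique = sorted(set(values))
    if k is None:
        if len(unique) <= max_unique:  # assumption: exactly 50 counts as "few"
            return unique
        k = max_unique
    if k < 2:
        raise ValueError("at least two knot points are needed")
    step = (hi - lo) / (k - 1)
    return [lo + i * step for i in range(k)]


if __name__ == "__main__":
    print(default_knots([1, 2, 2, 5]))            # unique values used directly
    print(default_knots([0, 10], k=3))            # three equally spaced points
    print(len(default_knots(list(range(200)))))   # capped at 50 points
```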
343. n
In genetic analysis using an animal model or sire model we have data on subjects that are genetically related. The relationships are defined via a pedigree. The subject effects are therefore correlated and, assuming normal modes of inheritance, the correlation expected from additive effects can be computed from the pedigree, provided all the direct links are in the pedigree. The matrix of such relationships is called the numerator relationship matrix. It is actually the inverse relationship matrix that is required for analysis and that is formed by ASReml. Users new to this subject might find the notes Mixed Models for Genetic Analysis [1] by Julius van der Werf helpful.
For the more general situation where the pedigree based relationship matrix is not the appropriate or required matrix, the user can provide a general relationship matrix (GRM) explicitly in a .grm file, or its inverse in a .giv file.
As an example for this chapter we consider data presented in Harvey (1977) using the command file harvey.as.
[1] http://www.vsni.co.uk/products/asreml/user/geneticanalysis.pdf
8.5 The command file
In ASReml the !P data field qualifier indicates that the corresponding data field has an associated pedigree. The file containing the pedigree (harvey.ped in the example) for animal
    Pedigree file example
     Animal !P
     Sire !A
     Dam
     Line 2
     AgeOfDam
     Y2 Y3
    harvey.ped !ALPHA
    harvey.dat
    adailygain ~ mu Line    # fixed
344. n adjusted variance matrix of $\hat{\tau}$. They argued that it is useful to consider an improved estimator of the variance matrix of $\hat{\tau}$ which has less bias and accounts for the variability in estimation of the variance parameters. There are two reasons for this. Firstly, the small sample distribution of Wald F statistics is simplified when the adjusted variance matrix is used. Secondly, if measures of precision are required for $\hat{\tau}$ or effects therein, those obtained from the adjusted variance matrix will generally be preferred. Unfortunately the Wald statistics are currently computed using an unadjusted variance matrix.
2.5.4 Approximate stratum variances
ASReml reports approximate stratum variances and degrees of freedom for simple variance components models. For the linear mixed effects model with variance components (setting $\sigma^2_H = 1$) where $G = \oplus_{j} \sigma^2_j I_{b_j}$, it is often possible to consider a natural ordering of the variance component parameters including $\sigma^2$. Based on an idea due to Thompson (1980), ASReml computes approximate stratum degrees of freedom and stratum variances by a modified Cholesky diagonalisation of the average information matrix. That is, if $F$ is the average information matrix for $\sigma$, let $U$ be an upper triangular matrix such that $F = U'U$. We define $U_c = D_c U$, where $D_c$ is a diagonal matrix whose elements are given by the inverse elements of the last column of $U$, i.e. $d_{cii} = 1/u_{ir}$, $i = 1, \ldots, r$. The matrix $U_c$ is therefore upper triangular with the elements i
345. n name and other fields are also duplicated:
    !MERGE file1 !KEY keya keyb !WITH file2 !TO newfile !CHECK
    !MERGE file1 !KEY key !KEEP !WITH file2 !TO newfile
will discard records from file2 that do not match records in file1, but all records in file1 are retained.
Omitting fields from the merged file:
    !MERGE file1 !KEY key !SKIP s1a s1b !WITH file2 !SKIP s2a s2b !TO newfile
Single insertion merging:
    !MERGE adult.txt !KEY ewe !KEEP !WITH birth.txt !KEEP !TO newfile !NODUP bwt
12 Functions of variance components
12.1 Introduction
ASReml includes a procedure to calculate certain functions of variance components, either as a final stage of an analysis or as a post-analysis procedure. These functions enable the calculation of heritabilities and correlations from simple variance components and when US, CORUH and XFA structures are used in the model fitting. A simple example is shown in the code box:
    y ~ mu !r idv(Sire)
    residual idv(units)
    VPREDICT !DEFINE
    F phenvar idv(Sire) + idv(units)
    F genvar idv(Sire) * 4
    H herit genvar phenvar
The instructions to perform the required operations are listed after the VPREDICT !DEFINE line and terminated by a blank line. ASReml holds the instructions in a .pin file until the end of the job, when it retrieves the relevant information from the .asr and .vvp files and performs the specified operations. The results are reported in the .pvc file. In Section 12.2 the syntax
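The arithmetic behind the three instructions in the code box (a phenotypic variance, a genetic variance and their ratio) can be sketched in Python as below. The function name and the numerical inputs are hypothetical, and the sketch covers only the point estimates; it does not attempt the standard errors that accompany functions of variance components.

```python
def sire_model_summary(var_sire, var_resid):
    """Point estimates for a simple sire model.

    phenvar = sire variance + residual variance
    genvar  = 4 * sire variance
    herit   = genvar / phenvar
    """
    phenvar = var_sire + var_resid
    genvar = 4.0 * var_sire
    return {"phenvar": phenvar, "genvar": genvar, "herit": genvar / phenvar}


if __name__ == "__main__":
    # Illustrative (made-up) variance components.
    print(sire_model_summary(var_sire=0.5, var_resid=4.5))
```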
346. n out of space to code records to sort them Error setting constraints VCC on variance components Error setting dependent variable Error setting MBF design matrix MBF mbf x k filename Error sorting X Y values Error structures are wrong size Error when reading knot point values Failed forming R G scores Failed ordering Level labels Failed to find Failed to open INCLUDE Failed to parse R G structure line Failed to read R G structure line Failed to process MYOWNGDG files Failed when sorting pedigree Failed when processing pedigree file the data file could not be interpreted alphanumeric fields need the A qualifier data file name may be wrong the model specification line is in error a variable is probably misnamed Declare the levels in the ROWFACTOR COLUMNFACTOR and SECTION variables more accurately The VCC constraints are specified last of all and require know ing the position of each parameter in the parameter vector the specified dependent variable name is not recognised It is likely that the covariate values do not match the values supplied in the file The values in the file should be in sorted order ROWFAC and COLFAC and SECTION as well as factors defining a residual structure must uniquely define grid points in the spatial array the declared size of the error structures does not match the actual number of data records There is some pro
347. n qualifiers is in Table 7 4 7 7 1 Parameter equality constraints s Parameters in a variance model can be set to be equal using the s qualifier Table 7 4 where s is a string of letters and or zeros Positions in the string correspond to the position of the parameters in the list of parameters for the particular variance model 124 7 7 Variance model function qualifiers Table 7 4 Variance model function qualifiers available in ASReml status qualifier description existing s s is a list of codes that link parameters sharing a common value details in Section 7 8 2 New R4 COORD v provides coordinates for mapping the effects so that a spatial model can be applied to the effects It is needed when the coordinates are not in the data file for example exp Trait COORD 1 2 5 3 5 5 8 see Section 7 7 2 existing Fi is used with the own variance model function see Section 7 7 3 The argument i is passed to your own program existing Gs s is a list of codes F Z P or U one for each parameter specifying whether the parameter is to be Fixed at it s initial value held at Zero if legal kept in the Parameter space or is Unrestricted see Section 7 7 4 New R4 INIT v v is the list of initial values for the variance structure parameters If initial values can be obtained from the msv rsv or tsv file they override these values see Section 7 7 5 existing SUBSECTION f f is a factor in the data that breaks th
348. n the last column equal to one. If the vector $\sigma$ is ordered in the natural way, with $\sigma^2$ being the last element, then we can define the vector of so-called pseudo stratum variance components by $\xi = U_c \sigma$. Thence $\mathrm{var}(\xi) = D_c^2$. The diagonal elements can be manipulated to produce effective stratum degrees of freedom (Thompson, 1980), viz. $\nu_i = 2\xi_i^2 / d_{cii}^2$. In this way the closeness to an orthogonal block structure can be assessed.
3 A guided tour
3.1 Introduction
This chapter presents a guided tour of ASReml, from data file preparation and basic aspects of the ASReml command file, to running an ASReml job and interpreting the output files. You are encouraged to read this chapter before moving to the later chapters;
• a real data example is used in this chapter for demonstration, see below,
• the same data are also used in later chapters,
• links to the formal discussion of topics are clearly signposted by margin notes.
This example is of a randomised block analysis of a field trial and is only one of many forms of analysis that ASReml can perform. It is chosen because it allows an introduction to the main ideas involved in running ASReml. However, some aspects of ASReml, in particular pedigree files (see Chapter 8) and multivariate analysis (see Chapter 8), are only covered in later chapters.
ASReml is essentially a batch program with some optional interactive features. The typical sequence of operations when using ASReml is
349. n two terms as in a diallel analysis male and female assuming the ith male is the same individual as the ith female at f n defines a binary variable which is 1 if the factor f has level n for the record For f n example to fit a row factor only for site 3 use the expression at site 3 row The string is equivalent to at for this function at f at f is expanded to a series of terms like at f i where i takes the values 01 Qf to the number of levels of factor f Since this command is interpreted before the data is read it is necessary to declare the number of levels of f correctly in its field definition This extended form may only be used as the first term in an interaction at f i 7 k is expanded to a series of terms at f i at f j at f k Sim at f m n ilarly at f i X at f j X at f k X can be written as at f i j7 k X pro Q f m n vided at f i j k is written as the first component of the interaction Any number of levels may be listed Contiguous sets of values can be specified as 7 7 cos v 7r forms cosine from v with period r Omit r if v is radians If v is degrees r is 360 con f apply sum to zero constraints to factor f It is not appropriate for random factors c f and fixed factors with missing cells ASReml assumes you specify the correct number of levels for each factor The formal effect of the con function is to form a model term with the highest level formally equal to minus the sum of the precedin
350. n which case ASReml automatically uses the gamma parameterization for estimation (see Section 7.6). Consequently both the sigmas and the gammas are reported. The user can force ASReml to use the sigma parameterization by placing !SIGMAP immediately after the dependent variable and before ~ on the model definition line:
    yield !SIGMAP ~ mu variety mv
!SIGMAP is a new qualifier with Release 4 (see also Section 7.6). In this case only the sigmas are reported, but they appear twice in the output, that is, in both of the columns headed Sigma in the .asr file; see Chapter 11 of the User Guide for detailed information on output formats in ASReml.
3b Two-dimensional separable autoregressive spatial model
This model extends 3a by specifying a first order autoregressive correlation structure for columns. The R structure in this case is the Kronecker product of two autoregressive correlation matrices, that is $\mathrm{var}(e) = \sigma^2_e\,\Sigma_c(\rho_c) \otimes \Sigma_r(\rho_r)$, giving an AR1×AR1 model for error. The consolidated model term in this case is ar1v(column).ar1(row) and includes ar1v(column) to model the $\sigma^2_e\,\Sigma_c(\rho_c)$ variance structure for columns.
    NIN Alliance Trial 1989
     variety !A
     id
     ...
     row 22
     column 11
     ...
    residual ar1v(column).ar1(row)
Important points
• the same residual variance structure could be achieved by specifying ar1(column).ar1v(row), which mirrors the alternate but equivalent algebraic form $\mathrm{var}(e) = \Sigma_c(\rho_c) \otimes \sigma^2_e\,\Sigma_r(\rho_r)$
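To make the separable AR1×AR1 structure concrete, the sketch below builds the residual variance matrix as the error variance times the Kronecker product of a column and a row autoregressive correlation matrix. It is a plain-Python illustration with hypothetical function names and made-up parameter values; plots are assumed to be ordered rows within columns.

```python
def ar1(n, rho):
    """n x n first-order autoregressive correlation matrix: rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


def ar1xar1_variance(n_col, n_row, rho_col, rho_row, sigma2):
    """sigma2 * (AR1 for columns) kron (AR1 for rows)."""
    corr = kron(ar1(n_col, rho_col), ar1(n_row, rho_row))
    return [[sigma2 * value for value in row] for row in corr]


if __name__ == "__main__":
    # Tiny 2-column by 3-row layout with illustrative parameter values.
    v = ar1xar1_variance(2, 3, rho_col=0.5, rho_row=0.2, sigma2=2.0)
    for row in v:
        print(["%.3f" % x for x in row])
```

With this ordering, the covariance between two plots is the error variance times the column correlation raised to the column separation times the row correlation raised to the row separation.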
351. natory variable which is a factor and appears in the model only in terms that are fitted as random. Covariates generally appear in fixed terms but may appear in random terms as well (random regression). In special cases they may appear only in random terms.
Random factors may contribute to predictions in several ways. They may be evaluated at levels specified by the user, they may be averaged over, or they may be ignored (omitting all model terms that involve the factor from the prediction). Averaging over the set of random effects gives a prediction specific to the random effects observed. We call this a 'conditional' prediction. Omitting the term from the prediction model produces a prediction at the population average (often zero), that is, substituting the assumed population mean for a predicted random effect. We call this a 'marginal' prediction. Note that in any prediction, some random factors (for example Genotype) may be evaluated as conditional and others (for example Blocks) at marginal values, depending on the aim of prediction.
For fixed factors there is no pre-defined population average, so there is no natural interpretation for a prediction derived by omitting a fixed term from the fitted values. Therefore any prediction will be either for specific levels of the fixed factor, or averaging in some way over the levels of the fixed factor. The prediction will therefore involve all fixed model terms.
Covariates must be predicted a
352. nd Hall.
Harvey, W. R. (1977). Users guide to LSML76, The Ohio State University, Columbus.
Harville, D. A. (1997). Matrix Algebra from a Statistician's Perspective, Springer-Verlag.
Harville, D. and Mee, R. (1984). A mixed model procedure for analysing ordered categorical data, Biometrics 40: 393-408.
Haskard, K. A. (2006). Anisotropic Matérn correlation and other issues in model-based geostatistics, PhD thesis, BiometricsSA, University of Adelaide.
Kammann, E. E. and Wand, M. P. (2003). Geoadditive models, Applied Statistics 52(1): 1-18.
Keen, A. (1994). Procedure IRREML, GLW-DLO Procedure Library Manual, Agricultural Mathematics Group, Wageningen, The Netherlands, pp. Report LWA-94-16.
Kenward, M. G. and Roger, J. H. (1997). The precision of fixed effects estimates from restricted maximum likelihood, Biometrics 53: 983-997.
Lane, P. W. and Nelder, J. A. (1982). Analysis of covariance and standardisation as instances of prediction, Biometrics 38: 613-621.
McCulloch, C. and Searle, S. R. (2001). Generalized Linear and Mixed Models, Wiley.
Meuwissen and Luo (1992). Forming inverse nrm, Genetics Selection and Evolution 24: 305-313.
Millar, R. and Willis, T. (1999). Estimating the relative density of snapper in and around a marine reserve using a log-linear mixed-effects model, Australian and New Zealand Journal of Statistics 41: 383-394.
Nelder, J. A. (1994). The statistics of linear models: back to
353. nd R variance structures, and the individual variance structure parameters in $\sigma_g$ and $\sigma_r$ will be referred to as sigmas. The variance models given by $G$ and $R$ are referred to as G structures and R structures respectively. We illustrate these concepts using the simplest linear mixed model, that is, the one-way classification.
Example 2.1 A simple example. Consider a one-way classification comprising a single random effect $u$ and a residual error term $e$. The two random components of this model, namely $u$ and $e$, are each assumed to be independent and identically distributed (IID) and to follow a normal distribution such that $u \sim N(0, \sigma^2_u I)$ and $e \sim N(0, \sigma^2_e I)$. Hence the variance of $y$ has the form
$$\mathrm{var}(y) = \sigma^2_u ZZ' + \sigma^2_e I. \qquad (2.4)$$
This model has two variance structure parameters or sigmas: the variance component $\sigma^2_u$ associated with $u$, and the variance component $\sigma^2_e$ associated with $e$. Mapping this equation back to (2.3), we have $\sigma_g = \sigma^2_u$, $G(\sigma_g) = \sigma^2_u I_q$, $\sigma_r = \sigma^2_e$ and $R(\sigma_r) = \sigma^2_e I_n$.
2.1.2 Partitioning the fixed and random model terms
Typically $\tau$ and $u$ are composed of several model terms, that is, $\tau$ can be partitioned as $\tau = [\tau_1' \ldots \tau_t']'$ and $u$ can be partitioned as $u = [u_1' \ldots u_b']'$, with $X$ and $Z$ partitioned conformably as $X = [X_1 \ldots X_t]$ and $Z = [Z_1 \ldots Z_b]$.
2.1.3 G structure for the random model terms
For $u$ partitioned as $u = [u_1' \ldots u_b']'$, we impose a direct sum structure on the matrix $G$, written $G = \oplus_{i=1}^{b} G_i$
354. nd varieties within runs defines a nested block structure of the form
    run/variety/tmt = run + run.variety + run.variety.tmt
                    ( = run + pair + pair.tmt )
                    ( = run + run.variety + units )
There is an additional blocking term, however, due to the fact that the bloodworms within a run are derived from the same batch of larvae, whereas between runs the bloodworms come from different sources. This defines a block structure of the form
    run/tmt/variety = run + run.tmt + run.tmt.variety
                    ( = run + run.tmt + pair.tmt )
Combining the two provides the full block structure for the design, namely
    run + run.variety + run.tmt + run.tmt.variety
    = run + run.variety + run.tmt + units
    = run + pair + run.tmt + pair.tmt
In line with the aims of the experiment, the treatment structure comprises variety and treatment main effects and treatment by variety interactions.
In the traditional approach the terms in the block structure are regarded as random and the treatment terms as fixed. The choice of treatment terms as fixed or random depends largely on the aims of the experiment. The aim of this example is to select the best varieties. The definition of best is somewhat more complex, since it does not involve the single trait sqrt(rootwt) but rather two traits, namely sqrt(rootwt) in the presence/absence of bloodworms. Thus, to minimise selection bias, the variety main effects and thence the tmt.variety interactions are taken as random. The main effect of treatment
355. ned by a mean variance function and a link function. In this context $y$ is the observation, $n$ is the count for grouped data specified by the !TOTAL qualifier, $\phi$ is a parameter set with the !PHI qualifier, $\mu$ is the mean on the data scale calculated using the inverse link function from the predicted value $\eta$ on the underlying scale, where $\eta = X\tau$, $v$ is the variance under some distributional assumption calculated as a function of $\mu$ and $n$, and $d$ is the deviance (twice the log-likelihood) for that distribution.
Table 6.3: Link qualifiers and functions
    Qualifier      Link                           Inverse link                     Available with
    !IDENTITY      $\eta = \mu$                   $\mu = \eta$                     All
    !SQRT          $\eta = \sqrt{\mu}$            $\mu = \eta^2$                   Poisson
    !LOGARITHM     $\eta = \ln(\mu)$              $\mu = \exp(\eta)$               Normal, Poisson, Negative Binomial, Gamma
    !INVERSE       $\eta = 1/\mu$                 $\mu = 1/\eta$                   Normal, Gamma, Negative Binomial
    !LOGIT         $\eta = \ln(\mu/(1-\mu))$      $\mu = 1/(1+\exp(-\eta))$        Binomial, Multinomial Threshold
    !PROBIT        $\eta = \Phi^{-1}(\mu)$        $\mu = \Phi(\eta)$               Binomial, Multinomial Threshold
    !COMPLOGLOG    $\eta = \ln(-\ln(1-\mu))$      $\mu = 1 - e^{-e^{\eta}}$        Binomial, Multinomial Threshold
where $\mu$ is the mean on the data scale and $\eta = X\tau$ is the linear predictor on the underlying scale.
GLMs are specified by qualifiers after the name of the dependent variable but before the ~ character. Table 6.3 lists the link function qualifiers which relate the linear predictor ($\eta$) scale to the observation ($\mu = E[y]$) scale. Table 6.4 lists the distribution and other qualifiers.
Table 6.4: GLM distribution qualifiers. The default link is
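The inverse link functions associated with the qualifiers of Table 6.3 are standard, and a short Python sketch may help when converting a value on the linear predictor scale back to the data scale by hand. The function name and the lowercase link labels are hypothetical conveniences, not ASReml syntax; the probit case uses the normal distribution function written in terms of the error function.

```python
import math


def inverse_link(link, eta):
    """Map a linear-predictor value eta to the mean mu on the data scale."""
    if link == "identity":
        return eta
    if link == "sqrt":
        return eta ** 2
    if link == "logarithm":
        return math.exp(eta)
    if link == "inverse":
        return 1.0 / eta
    if link == "logit":
        return 1.0 / (1.0 + math.exp(-eta))
    if link == "probit":
        # Standard normal distribution function Phi(eta).
        return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))
    if link == "comploglog":
        return 1.0 - math.exp(-math.exp(eta))
    raise ValueError("unknown link: %s" % link)


if __name__ == "__main__":
    for name in ("logit", "probit", "comploglog"):
        print(name, inverse_link(name, 0.0))
```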
356. nent. That is, the variance model for the plot errors is now given by
$$\sigma^2\left[\Sigma_c(\rho_c) \otimes \Sigma_r(\rho_r) + \psi I_{150}\right] \qquad (15.6)$$
where $\psi$ is the ratio of nugget variance to error variance ($\sigma^2$). The abbreviated output for this model is given below. There is a significant improvement in the REML log-likelihood with the inclusion of the nugget effect (see Table 15.7).
    AR1 x AR1
     1 LogL=-739.681  S2= 36034  125 df  1.000  0.1000  0.1000
     2 LogL=-714.340  S2= 28109  125 df  1.000  0.4049  0.1870
     3 LogL=-703.338  S2= 29914  125 df  1.000  0.5737  0.3122
     4 LogL=-700.371  S2= 37464  125 df  1.000  0.6789  0.4320
     5 LogL=-700.324  S2= 38602  125 df  1.000  0.6838  0.4542
     6 LogL=-700.322  S2= 38735  125 df  1.000  0.6838  0.4579
     7 LogL=-700.322  S2= 38754  125 df  1.000  0.6838  0.4585
     8 LogL=-700.322  S2= 38757  125 df  1.000  0.6838  0.4586
    Final parameter values  1.0000  0.68377  0.45861
    Results from analysis of yield
    Akaike Information Criterion    1406.64 (assuming 3 parameters)
    Bayesian Information Criterion  1415.13
[Figure 15.5: Sample variogram of the residuals from the AR1xAR1 model; axes: outer displacement, inner displacement]
    Model_Term                       Gamma      Sigma      Sigma/SE  % C
    ar1(column).ar1(row)   150 effects
    Residual      SCA_V  150         1.000000   38754.3     5.00     0 P
    column        AR_R     1         0.683769   0.683769   10.80     0 P
    row           AR_R     1         0.458594   0.458594    5.59     0 P
    Wald F statistics
    Source of Variati
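The two information criteria printed in the output above can be checked by hand. The sketch below uses the usual definitions, AIC = -2 logL + 2p and BIC = -2 logL + p ln(df), which is an assumption about how the printed values were obtained; with the final log-likelihood read as -700.322, 3 parameters and 125 residual degrees of freedom they do reproduce the reported 1406.64 and 1415.13.

```python
import math


def information_criteria(logl, n_params, resid_df):
    """AIC and BIC from a REML log-likelihood (usual definitions, assumed)."""
    aic = -2.0 * logl + 2.0 * n_params
    bic = -2.0 * logl + n_params * math.log(resid_df)
    return aic, bic


if __name__ == "__main__":
    aic, bic = information_criteria(-700.322, n_params=3, resid_df=125)
    print("AIC = %.2f, BIC = %.2f" % (aic, bic))
```

Because both criteria depend on the REML log-likelihood, they are comparable only between models with the same fixed effects.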
357. nents put them all in one list on one line where the relationship applies among simple model terms those without an explicit variance structure for example units the model term name may be given rather than the parameter number These examples are summarized in the following table ASReml code action 57 1 parameter 7 equals parameter 5 of simple coding for 5 7 1 57 Ji parameter 7 is a tenth of parameter 5 Da parameter 7 is the negative of parameter 5 22 24 25 BY 309 29 for a 4 x 4 US matrix given by parameters 31 40 the covari ances parameters 32 39 are forced to be equal 21 29 BLOCKSIZE 8 equates parameters 29 with 21 30 with 22 36 with 28 units uni check parameter associated with model term uni check has the same magnitude but opposite sign to the parameter associated with model term units 7 8 2 Fitting linear relationships among variance structure parameters The user may wish to define relationships between particular variance parameters For example consider an experiment in which two or more separate trials are sown adjacent to one another at the same trial site with trials sharing a common plot boundary In this case it might be sensible to fit the same spatial parameters and error variances for each trial In other situations it can be sensible to define the same variance structure over several model terms ASReml 3 catered for equality and multiplicative relationships among
358. nerated by the leg pol and spl functions are modified to include extra rows that are accessed by the PREDICT directive The default value of n is 21 if there is no PPOINTS qualifier The range of the data is divided by n 1 to give a step size i For each point p in the list a predict point is inserted at p iif there is no data value in the interval p p 1 1x 4i PPOINTS is ignored if PVAL is specified for the variable This process also effects the number of levels identified by the fac model term forces ASReml to attempt to produce the standard output report when there is a failure of the iteration algorithm Usually no report is produced unless the algorithm has at least produced estimates for the fixed and random effects in the model Note that residuals are not included in the output forced by this qualifier This option is primarily intended to help debugging a job that is not converging properly When forming a design matrix for the sp1 Q model term ASReml uses a standardized scale independent of the actual scale of the variable The qualifier SCALE 1 forces ASReml to use the scale of the variable The default standardised scale is appropriate in most circumstances requests ASReml write the SCORE vector and the Average Information matrix to files basename SCO and basename AIM The values written are from the last iteration 84 5 8 Job control qualifiers Table 5 6 List of very rarely used job control qualifiers
359. nes Names found in the data that are not included are simply appended to the list of levels as they are discovered by ASReml An example of this would be for a genotype factor with 6 levels appearing in the data file in the order genb6 genai gena5 genb2 genb4 gena3 In this case Genotype A L genal genb2 gena3 genb4 would result in the levels of Genotype being ordered genal genb2 gena3 genb4 genb6 genad I n is required if the data is numeric defining a factor but not 1 n I must be followed by n if more than 1000 codes are present Year LI 1995 1996 AS p is required if the data field has level names in common with a previous A or I factor p and is to be coded identically for example in a plant diallel experiment Male A 22 Female AS Male integrated coding IP indicates the special case of a pedigree factor ASReml will determine whether the identifiers are integer or alphanumeric from the pedigree file qualifiers and set the levels after reading the pedigree file see Section 8 6 Animal P coded according to pedigree file A warning is printed if the nominated value for n does not agree with the actual number of levels found in the data if the nominated value is too small the correct value is used for a group of m variates or factor variables 48 5 4 Specifying and reading the data G m 1 is used when m contiguous data fields comprise a set to be used together The variables will be treated as factor va
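The level-ordering rule in the Genotype example above (declared names first, in the order given, then any other names appended in order of discovery) can be sketched as below. The function name `order_levels` is hypothetical, and the labels assume the garbled names in the extracted text read gena1 and gena5.

```python
def order_levels(declared, data_values):
    """Order factor levels: declared names first, then unseen names
    appended in the order they are first met in the data."""
    levels = list(declared)
    seen = set(levels)
    for value in data_values:
        if value not in seen:
            levels.append(value)
            seen.add(value)
    return levels


# Order of appearance in the data file (assumed labels).
data_order = ["genb6", "gena1", "gena5", "genb2", "genb4", "gena3"]
# Names listed explicitly in the field definition.
declared = ["gena1", "genb2", "gena3", "genb4"]

levels = order_levels(declared, data_order)
```

The two undeclared names, genb6 and gena5, end up as levels 5 and 6 because genb6 is met first in the data.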
360. ng like In n 103 6 8 Generalized Linear Mixed Models Table 6 4 GLM distribution qualifiers qualifier action ITOTAL n is used especially with binomial and ordinal data where n is the field containing the total counts for each sample If omitted count is taken as 1 Residual qualifiers control the form of the residuals returned in the yht file The predicted values returned in the yht file will be on the linear predictor scale if the WORK or PVW qualifiers are used They will be on the observation scale if the DEVIANCE PEARSON RESPONSE or PVR qualifiers are used DEVIANCE produces deviance residuals the signed square root of d h from Table 6 4 where h is the dispersion parameter controlled by the DISP qualifier This is the default PEARSON writes Pearson residuals a in the yht file PVR writes fitted values on the response scale in the yht file This is the default PVW writes fitted values on the linear predictor scale in the yht file RESPONSE produces simple residuals y u WORK produces residuals on the linear predictor scale qa A second dependent variable may be specified except with a multinomial response MULTINOMIAL if a bivariate analysis is required but it will always be treated as a normal variate no syntax is provided for specifying GLM attributes for it The ASUV qualifier is required in this situation for the GLM weights to be utilized ASReml internally calcul
361. ng row column optional field labels LANCER 1 1101 585 1 4 29 25 4 3 19 2 16 1 data for sampling unit 1 BRULE 2 1102 631 1 4 31 55 4 3 20 4 17 1 data for sampling unit 2 REDLAND 3 1103 791 1 4 35 05 4 3 21 6 18 1 CODY 4 1104 602 1 4 30 1 4 3 22 8 19 ARAPAHOE 5 1105 661 1 4 33 05 4 3 24 20 NE83404 6 1106 605 1 4 30 25 4 3 2 21 1 NE83406 7 1107 704 1 4 35 2 4 3 26 4 22 1 NE83407 8 1108 388 1 4 19 4 8 6 1 2 1 2 CENTURA 9 1109 487 1 4 24 35 8 6 2 4 2 2 1 2 i Se w w 2 Gs SCOUT66 10 1110 511 1 4 25 55 8 6 3 6 3 2 COLT 11 1111 502 1 4 25 1 8 6 4 8 4 2 NE83498 12 1112 492 1 4 24 6 8 6 6 5 2 NE84557 13 1113 509 1 4 25 45 8 6 7 2 6 2 NE83432 14 1114 268 1 4 13 4 8 6 8 4 7 2 NE85556 15 1115 633 1 4 31 65 8 6 9 6 8 2 NE85623 16 1116 513 1 4 25 65 8 6 10 8 9 2 CENTURAK78 17 1117 632 1 4 31 6 8 6 12 10 2 NORKAN 18 1118 446 1 4 22 3 8 6 13 2 11 2 KS831374 19 1119 684 1 4 34 2 8 6 14 4 12 2 27 3 3 The ASReml data file These data are analysed again in Chapter 7 using spatial methods of analysis see model 3a in Section 7 5 For spatial analysis using a separable error structure see Chapter 2 the data file must first be augmented to specify the complete 22 row x 11 column array of plots These are the first 20 lines of the augmented data file nin89aug asd with 242 data rows Note that ASReml 4 can automatically augment spatial data see ROWFACTOR COLUMNF ACTOR variety id pid raw repl nloc yield lat long row col
362. ning them see Table 5 1 comma delimited files whose file name ends in csv or for which the CSV qualifier is set recognise empty fields as missing values a line beginning with a comma implies a preceding missing value consecutive commas imply a missing value a line ending with a comma implies a trailing missing value if the filename does not end in csv and the CSV qualifier is not set commas are treated as white space e TAB delimited files recognise empty fields as missing values e characters following on a line are ignored so this character may not be used except to flag trailing comments on the ends of lines or to comment out data records unless SPECIALCHAR is specified see Section 5 4 2 adjacent lines can be concatenated and written on one line using For example line_1 line_2 line_n 41 4 2 The data file can be written on one line as line 1 line 2 linen This can aid legibility of the input file Note that everything including after the first on a line is intepreted as a comment blank spaces tabs and commas must not be used embedded in alphanumeric fields unless the label is enclosed in quotes for example the name Willow Creek would need to be appear in the data file as Willow Creek to avoid an error the symbol must not be used in the data file alphanumeric factor level labels have a default size of 16 characters Use the LL size qualifier to extend
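The comma-delimited rules above (a leading comma, consecutive commas and a trailing comma each imply a missing value) amount to treating every empty field as missing. A minimal sketch, with `None` standing in for a missing value and `parse_csv_record` as a hypothetical helper name:

```python
def parse_csv_record(line):
    """Split one comma-delimited record; empty fields become None (missing)."""
    fields = line.rstrip("\n").split(",")
    return [f.strip() if f.strip() else None for f in fields]


# Leading comma, consecutive commas and trailing comma all yield missing values.
record = parse_csv_record(",1,,2,")
```

This sketch ignores quoted labels and comment handling, which the data file rules above treat separately.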
363. nstant 1 F 1 age age F 1 spl age 7 R 5 fac age R 7 tree tree RC 5 age tree x tree RC 5 spl age 7 tree R 25 error R slope for each tree are included as random coefficients denoted by RC in Table 15 11 Thus if U is the matrix of intercepts column 1 and slopes column 2 for each tree then we assume that var vec U X amp I where X is a 2 x 2 symmetric positive definite matrix Non smooth variation can be mod elled at the overall mean across trees level and this is achieved in ASReml by inclusion of fac age as a random term 312 15 9 Balanced longitudinal data Random coefficients and cubic smoothing splines Oranges Table 15 12 Sequence of models fitted to the Orange data model term 1 2 3 4 5 6 tree y y y y y y age tree y y y y y b4 covariance n n n n n y spl age 7 y y y y n tree spl age 7 y y y n y y fac age n y y n n n season n n y y y y REML log likelihood 97 78 94 07 87 95 91 22 90 18 87 43 An extract of the ASReml input file is circ mu age r str Tree age Tree us 2 INIT 4 6 00001 000094 id Tree idv spl age 7 idv spl age 7 Tree idv fac age predict age Tree IGNORE fac age We stress the importance of model building in these settings where we generally commence with relatively simple variance models and update to more complex variance models if ap propriate Table 15 12 presents the sequence of fitted models we have used Note that the REML
364. ntal crosses. Computational Statistics and Data Analysis 51, 3749-3764.

BIBLIOGRAPHY

Gilmour, A. R., Cullis, B. R. and Verbyla, A. P. (1997). Accounting for natural and extraneous variation in the analysis of field experiments. Journal of Agricultural, Biological, and Environmental Statistics 2, 269-273.

Gilmour, A. R., Cullis, B. R., Welham, S. J., Gogel, B. J. and Thompson, R. (2004). An efficient computing strategy for prediction in mixed linear models. Computational Statistics and Data Analysis 44, 571-586.

Gilmour, A. R., Thompson, R. and Cullis, B. R. (1995). AI, an efficient algorithm for REML estimation in linear mixed models. Biometrics 51, 1440-1450.

Gleeson, A. C. and Cullis, B. R. (1987). Residual maximum likelihood (REML) estimation of a neighbour model for field experiments. Biometrics 43, 277-288.

Gogel, B. J. (1997). Spatial analysis of multi-environment variety trials. PhD thesis, Department of Statistics, University of Adelaide.

Goldstein, H. and Rasbash, J. (1996). Improved approximations for multilevel models with binary response. Journal of the Royal Statistical Society A (General) 159, 505-513.

Goldstein, H., Rasbash, J., Plewis, I., Draper, D., Browne, W., Yang, M., Woodhouse, G. and Healy, M. (1998). A user's guide to MLwiN. Institute of Education, London. URL: http://multilevel.ioe.ac.uk

Green, P. J. and Silverman, B. W. (1994). Nonparametric regression and generalized linear models. Chapman a
365. o is the additive genetic variance the variance component for dams is denoted by oj 0o 02 where o is the maternal variance component and the variance component for litters is denoted by g and represents variation attributable to the particular mating For a multivariate analysis these variance components for sires dams and litters are in theory replaced by unstructured matrices one for each term Additionally we assume the residuals for each trait may be correlated Thus for this example we would like to fit a total of 4 unstructured variance models For such a situation it is sensible to commence 317 15 10 Multivariate animal genetics data Sheep Table 15 13 REML estimates of a subset of the variance parameters for each trait for the genetic example expressed as a ratio to their asymptotic s e term wwt ywt gfw fdm fat sire 3 68 3 57 3 95 1 92 1 92 dam 6 25 4 93 2 78 0 37 0 05 litter 8 79 0 99 2 23 1 91 0 00 age grp 2 29 1 39 0 31 1 15 1 74 sex grp 2 90 3 43 3 70 1 83 the modelling process with a series of univariate analyses These give starting values for the diagonals of the variance matrices but also indicate what variance components are estimable The ASReml job for the univariate analyses is IRENAME 1 ARG 1 2 3 4 5 Does 5 runs one for each trait Multivariate Sire amp Dam model DOPART 1 IF 1 1 ASSIGN YV wwt sets up dependent variable to each trait in turn IF 1 2 ASSIGN YV
366. o methods. In larger analyses, users can request the calculation be attempted using the !DDF qualifier (page 67). Use !DDF -1 to prevent the calculation, to save processing time when significance testing is not required.

7 Command file: Specifying the variance structures

In Chapter 2 we presented the general linear mixed model $y = X\tau + Zu + e$, where $y$ ($n \times 1$) is a vector of observations, $\tau$ ($p \times 1$) is a vector of fixed effects, $X$ ($n \times p$) is the design matrix of full column rank that associates observations with the appropriate combination of fixed effects, $u$ ($q \times 1$) is a vector of random effects, $Z$ ($n \times q$) is the design matrix that associates observations with the appropriate combination of random effects, and $e$ ($n \times 1$) is the vector of residual errors (see model 2.1). Among the key concepts regarding this model are:

• the sigma parameterization (Section 2.1.1):
$$\begin{bmatrix} u \\ e \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} G(\sigma_g) & 0 \\ 0 & R(\sigma_r) \end{bmatrix} \right)$$
where the matrices $G$ and $R$ are variance matrices for $u$ and $e$ and are functions of parameters $\sigma_g$ and $\sigma_r$. Under this parameterization, $\mathrm{var}(y) = ZG(\sigma_g)Z' + R(\sigma_r)$
• G structures for the random model terms (Section 2.1.3) and R structures for the residual error term (Section 2.1.5)
• direct sum structures for G and/or R (see below, Sections 2.1.3 and 2.1.5)
• direct product structures for terms composed of several component factors (Section 2.1.10)
• the gamma parameterization for estimation of variance structure parameters as ratios relative to the residual error v
367. o the number of levels identified in this se quential process see Other exam ples below Missing values remain missing changes the focus of subsequent transformations to variable field v replaces the variate with uniform random variables having range 0 v 57 treat ILCA B CYR treat ISET 1 1 1 group treat ISET 1 2293 4 Anorm A SETN 2 5 10 Aeff A SETU 5 10 year 3 SUB 66 67 68 plot V3 5EQ sqrtaA meanAB A 2 TARGET sqrtA 70 5 Udat 1 0 UNIFORM 4 5 5 5 Transforming the data Table 5 1 List of transformation qualifiers and their actions with examples qualifier argument action examples Vtarget value assigns value to data field tar V3 2 5 get overwriting previous contents subsequent transformation qualifiers will operate on data field target Vfield assigns the contents of data field V10 V3 field to data field target overwriting V1i1 block previous contents subsequent trans V12 VO formation qualifiers will operate on data field target If field is 0 the number of the data record is in serted 5 5 2 QTL marker transformations IMM s associates marker positions in the vector s based on the Haldane mapping function with marker variables and replaces missing values in a vector of marker states with expected values calculated using distances to non missing flanking markers This transformation will normally be used on a G n
368. oat varieties and nitrogen application 268 Rat data AOV decomposition e ac 6 a Rs 273 REML log likelihood ratio for the variance components in the voltage data 278 Summary of variance models fitted to the plant data 280 Summary of Wald F statistics for fixed effects for variance models fitted to iS ts ee a ee ee Pee ead w Be mi e ee 286 Field layout of Slate Hall Farm experiment 287 Summary of models for the Slate Hall data 0 292 Estimated variance components from univariate analyses of bloodworm data a Model with homogeneous variance for all terms and b Model with het erogeneous variance for interactions involving tmt 302 Equivalence of random effects in bivariate and univariate analyses 304 Estimated variance parameters from bivariate analysis of bloodworm data 306 Orange data AOV decomposition 0 00002 eae 312 Sequence of models fitted to the Orange data 0 4 313 REML estimates of a subset of the variance parameters for each trait for the genetic example expressed as a ratio to their asymptotic s e 318 Wald F statistics of the fixed effects for each trait for the genetic example 319 Variance models fitted for each part of the ASReml job in the analysis of the penelit erainple o oos sae oe mi Paaa E aua Re b e oe Ree 321 XIV List 5 1 13 1 13 2 13 3 13 4 13 5 13 1 15 2 15 3 15 4 15 5 15
369. obtained the maximum available workspace then use WORKSPACE to increase it The problem could be with the way the model is specified Try fitting a simpler model or using a reduced data set to discover where the workspace is being used The response variable nominated by the YVAR command line qualifier is not in the data 262 14 5 Information Warning and Error messages Table 14 3 Alphabetical list of error messages and probable cause s remedies error message probable cause remedy Invalid binary data Invalid Binomial Variable Invalid definition of factor Invalid error structure for Multivariate Analysis Invalid factor in model Invalid model factor Invalid SOURCE in R structure definition Invalid weight filter column number Iteration aborted because of singularities Iteration failed Matern Maximum number of special structures exceeded Maximum number of variance parameters exceeded Missing faulty SKIP or A needed for Missing values in design variables factors Missing Value Miscount forming design Missing values not allowed here Multiple trait mapping problem The data values are out of the expected range for bi nary binomial data there is a problem with forming one of the generated fac tors The most probable cause is that an interaction cannot be formed You must either use the US error structure or use the ASUV qualifier and maybe include mv in
370. occur in the primary data file and that there are no extraneous lines in the MERGE file A much more powerful merging facility is provided by the MERGE directive described in Chapter 11 For example assuming the field definitions define 10 fields PRIMARY DAT skip 1 IMERGE 6 SECOND DAT SKIP 1 MATCH 1 6 would obtain the first five fields from PRIMARY DAT and the next five from SECOND DAT checking that the first field in each file has the same value Thus each input record is obtained by combining information from each file before any transformations are performed formally instructs ASReml to read n data fields from the data file It is needed when there are extra columns in the data file that must be read but are only required for combination into earlier fields in transfor mations or when ASReml attempts to read more fields than it needs to is required when reading a binary data file with pedigree identifiers that have not been recoded according to the pedigree file It is not needed when the file was formed using the SAVE option but will be needed if formed in some other way see Section 4 2 is used in combination with COLUMNFACTOR and SECTION to get AS Reml to insert extra data records to complete the grid of plots defined by the RowFactor and the ColumnFactor for each Section so that a two dimensional error structure can be defined see SECTION on page 73 causes ASReml to read n records or to read up to a data
371. ode action yield MO yield 70 score 5 score ISET 0 5 1 5 2 5 score SUB 0 5 1 5 2 5 block 8 variety 20 yield plot variety SEQ Var 3 Nit 4 VxN 12 Var 1 4 4Nit YA V98 YA NA O YB V99 YB NA O V98 DO changes the zero entries in yield to missing values takes natural logarithms of the yield data subtracts 5 from all values in score replaces data values of 1 2 and 3 with 0 5 1 5 and 2 5 respectively replaces data values of 0 5 1 5 and 2 5 with 1 2 and 3 respectively a data value of 1 51 would be replaced by 0 since it is not in the list or very close to a number in the list in the case where there are multiple units per plot contiguous plots have different treatments and the records are sorted units within plots within blocks this code generates a plot factor assuming a new plot whenever the code in V2 variety changes whether this creates a variable or overwrites an input variable depends on whether any subsequent variables are input variables assuming Var is coded 1 3 and Nit is coded 1 4 this syntax could be used to create a new factor VxN with the 12 levels of the com posite Var by Nit factor will discard records where both YA and YB have missing values assuming neither have zero as valid data The first line sets the focus to variable 98 copies YA into V98 and changes any missing values in V98 to zero The second line sets the focus to variable 99 copies YB
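Two of the transformations described above lend themselves to a short sketch: the substitution that replaces listed values by their position in the list (with 0 for values not in the list), and the sequence transformation that starts a new level whenever the code changes. The function names `sub` and `seq` and the tolerance used for "very close" are assumptions made for illustration; they are not ASReml internals.

```python
def sub(values, listed, tol=1e-5):
    """Replace each value by its 1-based position in `listed`,
    or by 0 if it matches no listed value (within tol)."""
    out = []
    for v in values:
        position = 0
        for k, target in enumerate(listed, start=1):
            if abs(v - target) <= tol:
                position = k
                break
        out.append(position)
    return out


def seq(codes):
    """Number records sequentially, starting a new level whenever
    the code differs from the previous record's code."""
    out = []
    level = 0
    previous = object()  # sentinel that equals no real code
    for code in codes:
        if code != previous:
            level += 1
            previous = code
        out.append(level)
    return out


scores = sub([0.5, 2.5, 1.51, 1.5], [0.5, 1.5, 2.5])
plots = seq([3, 3, 7, 7, 7, 3])
```

The value 1.51 maps to 0 because it is neither in the list nor within the tolerance of a listed value. In `plots`, the final record starts a third plot even though its variety code was seen earlier, since only a change from the preceding record matters.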
372. ograms, run and then view the output before saving results. It is available on the following platforms:
• Windows 32-bit and 64-bit
• Linux 32-bit and 64-bit (various incantations).
ASReml-W has a built-in help system explaining its use.

1.3.2 ConTEXT

ConTEXT is a third-party freeware text editor, with programming extensions, which make it a suitable environment for running ASReml under Windows. The ConTEXT directory on the CD-ROM includes installation files and instructions for configuring it for use in ASReml. Full details of ConTEXT are available from http://www.contexteditor.org.

1.4 How to use this guide

The guide consists of 15 chapters. Chapter 1 introduces ASReml and describes the conventions used in the guide. Chapter 2 outlines some basic theory which you may need to come back to. New ASReml users are advised to read Chapter 3 before attempting to code their first job. It presents an overview of basic ASReml coding, demonstrated on a real data example. Chapter 15 presents a range of examples to assist users further. When coding your first job, look for an example to use as a model. Data file preparation is described in Chapter 4, and Chapter 5 describes how to input data into ASReml. Chapters 6 and 7 are key chapters which present the syntax for specifying the linear model and the variance models for the random effects in the linear mixed model. Variance modelling is a c
373. olidated model term generates a correlation matrix, for example the consolidated model term for A.B is specified as id(A).ar1(B), then it is usually the case that one wishes to fit a model with this correlation structure but to also allow the effects to have a common variance. When a correlation structure is specified for a consolidated term, either for an R or a G structure, ASReml will detect this and add a common scaled variance parameter. Some users might find it simpler, and less confusing, to specify terms as variance terms directly. For example, id(A).ar1(B) should become either idv(A).ar1(B) or id(A).ar1v(B); it is arbitrary which variable the common variance is attached to. If more than one variance model function in the consolidated model term generates a variance structure (either homogeneous or heterogeneous), for example idv(A).ar1v(B), then the parameters will not all be identifiable, and so the user must either change idv(A) to id(A) and leave ar1v(B) as it is, or change ar1v(B) to ar1(B) and leave idv(A) as it is.

7.5 A sequence of variance structures for the NIN data

Having outlined the theory and introduced the functional specification, we pause now to consider an example. The following is a series of six variance structures of increasing complexity for the NIN column trial data (see Chapter 3 for an introduction to these data). For each example we present a code box to the right that contains the functional specification
374. om the command prompt when attached to the appropriate folder is

ASReml nin89.as

However, if the path to ASReml is not specified in your system's PATH environment variable, the path must also be given, and the path is required when configuring ASReml-W or ConTEXT. In this guide we assume the command file has a filename extension .as. ASReml also recognises the filename extension .asc as an ASReml command file. When these are used, the extension (.as or .asc) may be omitted from basename.as in the command line if there is no file in the working directory with the name basename. The options and arguments that can be supplied on the command line to modify a job at run time are described in Chapter 10.

3.6 Description of output files

A series of output files are produced with each ASReml run. Nearly all files (all that contain user information) are ASCII files and can be viewed in any ASCII editor, including ConTEXT, ASReml-W and NotePad. The primary output from the nin89.as job is written to nin89.asr. This file contains a summary of the data, the iteration sequence, estimates of the variance parameters and a table of Wald F statistics for testing fixed effects. The estimates of all the fixed and random effects are written to nin89.sln. The residuals, predicted values of the observations and the diagonal elements of the hat matrix (see Chapter 2) are returned in nin89.yht (see Section 13.3). Other key
375. ome missing values in the contrast. Zero values in the factor (no level assigned) become zeros in the contrast. The user should check that the levels of the factor are in the order assumed by contrast (check the .ass, .sln or .tab files). It may also be used on the implicit factor Trait in a multivariate analysis, provided it implicitly identifies the number of levels of Trait; the number of traits is implied by the length of the list. Thus, if the analysis involves 5 traits: !CONTRAST Time Trait 1 3 5 10 20

!DDF i requests computation of the approximate denominator degrees of freedom according to Kenward and Roger (1997) for the testing of fixed effects terms in the dense part of the linear mixed model. There are three options for i: i = -1 suppresses computation; i = 1 and i = 2 compute the denominator d.f. using numerical and algebraic methods respectively. If i is omitted, then i = 2 is assumed. If !DDF i is omitted, i = -1 is assumed, except for small jobs (< 10 parameters, < 500 fixed effects, < 10,000 equations and < 100 Mbyte workspace) when i = 2.

Calculation of the denominator degrees of freedom is computationally expensive. Numerical derivatives require an extra evaluation of the mixed model equations for every variance parameter. Algebraic derivatives require a large dense matrix, potentially of order number of equations plus number of records, and are not available when !MAXIT is 1 or for multivariate analysis.

adds a
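The default rules for the denominator degrees of freedom option described above can be summarised as a small decision function. This is a sketch of the stated rules only, assuming the suppressing value is -1; the function name `effective_ddf` and its arguments are hypothetical and do not correspond to anything in ASReml itself.

```python
def effective_ddf(qualifier_given, i=None, n_params=0, n_fixed=0,
                  n_equations=0, workspace_mb=0):
    """Return the value of i that applies under the stated defaults.

    -1 suppresses the calculation, 1 uses numerical derivatives,
    2 uses algebraic derivatives.
    """
    if qualifier_given:
        # Qualifier present without an argument: algebraic method.
        return 2 if i is None else i
    # Qualifier absent: computed only for small jobs.
    small_job = (n_params < 10 and n_fixed < 500
                 and n_equations < 10_000 and workspace_mb < 100)
    return 2 if small_job else -1


small = effective_ddf(False, n_params=4, n_fixed=60,
                      n_equations=300, workspace_mb=32)
large = effective_ddf(False, n_params=25, n_fixed=60,
                      n_equations=300, workspace_mb=32)
explicit = effective_ddf(True, i=1)
```

A job fails the small-job test as soon as any one of the four limits is reached, in which case the calculation is skipped unless the qualifier is given explicitly.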
376. omplex aspect of analysis. We introduce variance modelling in ASReml by example in Chapter 15. Chapters 8 and 8.3.1 describe special commands for multivariate and genetic analyses respectively. Chapter 9 deals with prediction of fixed and random effects from the linear mixed model, and Chapter 12 presents the syntax for forming functions of variance components such as heritability. Chapter 10 discusses the operating system level command for running an ASReml job. Chapter 11 describes a new data merging facility. Chapter 13 gives a detailed explanation of the output files. Chapter 14 gives an overview of the error messages generated in ASReml and some guidance as to their probable cause.

1.5 Getting assistance and the ASReml forum

The ASReml help accessible through ASReml-W can also be linked to ConTEXT or accessed directly (ASReml.chm). There is a User Area on the website http://www.VSNi.co.uk (select ASReml and then User Area) which contains contributed material that may be of assistance. Users with a support contract with VSN should email support@asreml.co.uk for assistance with installation and running ASReml. When requesting help, please send the input command file, the data file and the corresponding primary output file, along with a description of the problem. All ASReml users (including unsupported users) are encouraged to join the ASReml forum; register now at http://www.vsni.co.uk/forum.

1.6 Typographic conventions

If ASRem
377. on. From Section 2.1.1, the variance matrix of $y$ is $\mathrm{var}(y) = ZG(\sigma_g)Z' + R(\sigma_r)$ (see model 2.3). This is referred to as the sigma parameterization, and the individual variance structure parameters in $\sigma_g$ and $\sigma_r$ are referred to as sigmas. For the case when the variance structure for the residual error term is a scaled correlation matrix, that is $R(\sigma_r) = \sigma^2 R(\gamma_r)$, the variance matrix of $y$ can be written alternatively as $\mathrm{var}(y) = \sigma^2\left(ZG(\gamma_g)Z' + R(\gamma_r)\right)$ (see 2.8). This is referred to as the gamma parameterization, and the variance structure parameters in $\gamma_g$ and $\gamma_r$ are referred to as gammas (see Section 2.1.6).

7.6.1 Which parameterization does ASReml use for estimation?

By default, ASReml uses either the gamma or sigma parameterization for estimation, depending on the residual specification. The current default for univariate, single section data sets is the gamma parameterization. It is possible to override this default, as discussed in the following section. ASReml reports both the gammas and the sigmas when the gamma parameterization is used for estimation. For historical reasons, the sigmas are presented twice (two identical columns) when the sigma parameterization is used for estimation.

ASReml uses the sigma parameterization for analyses other than univariate, single site analyses, examples including multi-section analyses, multivariate analyses and repeated measures analysis using R structures that are not the default variance model
378. on NumDF DenDF F_inc Prob 8 mu i 12 3 850 88 lt 001 6 variety 24 80 0 13 04 lt 001 AR1 x AR1 units 1 LogL 740 735 S2 33225 125 df 2 components constrained 2 LogL 723 595 S2 11661 125 df 1 components constrained 3 LogL 698 498 S2 46239 125 df 4 LogL 696 847 S2 44725 125 df 5 LogL 696 823 S2 45563 125 df 6 LogL 696 823 S2 45753 125 df 7 LogL 696 823 S2 45796 125 df Results from analysis of yield Akaike Information Criterion 1401 65 assuming 4 parameters Bayesian Information Criterion 1412 96 Model_Term Gamma Sigma Sigma SE C idv units IDV_V 150 0 106152 4861 06 212 OP 289 15 6 Spatial analysis of a field experiment Barley ari column ari row Residual column row SCA_V AR_R AR_R Source of Variation 8 mu 6 variety 150 effects 150 1 000000 1 0 843791 1 0 682682 45793 4 0 843791 0 682682 Wald F statistics NumDF DenDF 1 3 6 24 TST FP ine 259 83 10 21 O O Y SG P ine lt 001 lt 001 The lattice analysis with recovery of between block information is presented below This variance model is not competitive with the preceding spatial models The models can be formally compared using the BIC values for example IB analysis LogL 707 LogL 707 NO OP WNEH LogL 734 LogL 720 LogL 711 LogL 707 LogL 707 786 786 184 060 119 937 786 52 26778 125 52 16591 125 52 11173 125 S2 8562 4 125 52
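The information criteria reported for the AR1 x AR1 + units model above can be reproduced from the converged log-likelihood. The sketch below assumes the final LogL is -696.823 (the minus sign appears to have been lost in the extracted output), with 4 variance parameters and 125 residual degrees of freedom, and uses the usual definitions AIC = -2 LogL + 2p and BIC = -2 LogL + p ln(df).

```python
import math


def information_criteria(logl, n_params, resid_df):
    """Return (AIC, BIC) from a REML log-likelihood.

    AIC = -2*logl + 2*p
    BIC = -2*logl + p*ln(df), with df the residual degrees of freedom.
    """
    deviance = -2.0 * logl
    aic = deviance + 2.0 * n_params
    bic = deviance + n_params * math.log(resid_df)
    return aic, bic


# Values taken from the output above; the sign of LogL is an assumption.
aic, bic = information_criteria(logl=-696.823, n_params=4, resid_df=125)
print(f"AIC = {aic:.2f}, BIC = {bic:.2f}")
```

The results agree with the reported values of 1401.65 and 1412.96 to two decimal places, which supports the assumed sign. When comparing models in this way, the fixed effects must be the same in every model, since REML log-likelihoods are not comparable otherwise.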
379. on of varieties to plots in the NIN field trial 26 List of transformation qualifiers and their actions with examples 54 Qualifiers relating to data input and output 62 List of commonly used job control qualifiers 66 List of occasionally used job control qualifiers 69 List of rarely used job control qualifiers 204 74 List of very rarely used job control qualifiers 83 Summary of reserved words operators and functions 90 Alphabetic list of model functions and descriptions 97 Link queers and functions eos a ae ee ee ee ee 102 GLM distribution qualifiers The default link is listed first followed by per Nee CeCe ee ee ele ot ee ote ee kee oe OE ce S 102 Examples of aliassing in ASReml 0 0000040 107 List of common variance model functions their type correlation or variance the form of the variance matrix generated C for correlation V for variance matrix S for scaled variance matrix and a brief description Parameters o gt 0 are variances 1 lt p lt 1 are correlations Subscipt c denotes parameter held in common across all rows columns 111 Building consolidated model terms in ASReml 4 112 G structure for the random terms magenta and R structure for the residual error term cyan under both the sigma and gamma parameterizations and the correspondin
380. ons including special functions to be included in the table row 22 of Wald F statistics column 11 e generally begins with the reserved word mu which fits nin89 asd skip 1 mvinclude a constant term mean or intercept see Table 6 1 93 yield mu variety r idv repl If mv residual idv units 6 4 Random and residual terms in the variance component model 6 3 2 Sparse fixed terms The f sparse_fixed terms in model formula NIN Alliance Trial 1989 variety e are the fixed covariates for example the fixed lin row covariate now included in the model for eae mula factors and interactions including special func column 11 tions and reserved words for example mv see Table nin89 asd skip 1 6 1 for which Wald F statistics are not required yield mu variety r idv repl If mv lin row include large gt 100 levels terms lasa iav unii 6 4 Random and residual terms in the variance component model The r conrandom functions have arguments that NIN Alliance Trial 1989 variety e comprise random covariates factors and interactions including special functions and reserved words see row 29 Table 6 1 Note that idvQ may not enclose a column 11 contracted at function an at function that is nin89 asd skip 1 expanded by ASReml to form multiple model terms yield mu variety r idv repl f because the result is ambiguous T s residual idv units In Chapter 7 we discuss possible
381. onsistent comparisons between check varieties and test lines Given the large amount of replication afforded to check varieties there will be very little shrinkage irrespective of the realised heritability We consider an initial analysis with spatial correlation in one direction and fitting the variety effects check replicated and unreplicated lines as random We present three further spatial models for comparison The ASReml input file is EPS RENAME ARG 1 2 3 Tullibigeal trial DOPART 1 linenum yield weed column 10 row 67 variety 532 testlines 1 525 check lines 526 532 wheat asd SKIP 1 PATH 1 ARI x I y mu weed mv r idv variety residual ariv row id col PATH 2 ARI x AR1 y mu weed mv r variety residual ariv row ar1 col PATH 3 AR1 x AR1 column trend y mu weed pol column 1 mv r idv variety residual ariv row ar1 col PATH 4 AR1 x AR1 Nugget column trend y mu weed pol column 1 mv r idv variety idv units residual ar1 row ar1i col predict var The data fields represent the factors variety row and column a covariate weed and the plot yield yield There are four paths in the ASReml file We begin with the one dimensional spatial model which assumes the variance model for the plot effects within columns is described by a first order autoregressive process The abbreviated output file is 1 LogL 4280 75 S2 0 12850E 06 666 df 2 LogL 4268 58 82 0 12139E 06 666 df 3 LogL 4255 89 S
382. onth year IPRWTS 56 36 70 53 0 556 0 O 56 22 BG 0 21 22 0 53 53 17 92 53 57 23 O 19 70 63 24 0 44 22 054 0 0 0 0 54 70 O 51 0 43 0 36 16 035 0 O 51 0 053 0 U O 0 0 49 0 5 predict crop 1 pasture lime PRES year month PRWIS YMprwts txt where YMprwts txt contains 19 2 11 0 11 2 10 6 11 4 1226 0 0 0 0 0 0 0 0 0 0 O20 faz 0 0 0 0 10 6 4 6 4 8 10 8 10 8 8 6 7 0 0 0 0 0 14 0 0 4 2 3 4 0 0 0 0 0 0 14 0 0 0 0 10 6 0 0 10 6 11 2 4 4 16 4 3 8 6 8 0 G fie W 0 9 8 189 9 3 Prediction 0 4 4 0 10 6 14 4 4 0 10 2 3 2 10 29 0 We have presented both sets of predict statements to show how the weights were derived and presented Notice that the order in PRESENT year month implies that the weight coefficients are presented in standard order with the levels for months cycling within levels for years There is a check which reports if non zero weights are associated with cells that have no data The weights are reported in the pvs file PRESENT counts are reported in the res file 9 3 6 Examples Examples are as follows yield mu variety r idv repl predict variety is used to predict variety means in the NIN field trial analysis Random rep1 is ignored in the prediction yield mu x variety r idv repl predict variety predicts variety means at the average of x ignoring random repl yield mu x variety repl predict variety x 2 forms the hyper table based on variety and repl at the covariate value of
383. or fields can be skipped and superfluous rows before the regressor information can be skipped. The syntax for specifying and reading the grr file is: M.grr !CSKIP c Factor f !NOID !CSKIP c2 Regressors m !NONAMES !SKIP s, where: M.grr is the name of the file to be read; !CSKIP c indicates that c fields are to be skipped before the factor identifiers are read; Factor is the name of the variable in the data that is associated with the regressors; f sets the maximum number of levels of Factor with regressor data (default 1000; ASReml will count the actual number); !NOID indicates that the factor identifiers are not present in the grr file; !CSKIP c2 indicates that c2 fields are to be skipped before the regressor variables are read; Regressors is the name for the set of regressor variables; m sets the number of regressor variables (the default is the number of names found; it must be set if there are extraneous fields to be ignored); !SKIP s specifies how many lines are to be skipped before reading the regressor data; !NONAMES indicates that there is no line containing the individual names of the regressor variables (otherwise names are taken from the first non-skipped line in the file). If the factor identifiers are not present (!NOID), ASReml assumes that the order of the factor classes in the data file matches the order in the grr file. If the factor identifiers are present
384. or messages in the .asr file. The major information messages are in Table 14.1. A list of warning messages, together with their likely meaning(s), is presented in Table 14.2. Other error messages, with their probable cause(s), are presented in Table 14.3. Not all messages are listed here. If yours is not listed, identify whether the problem is syntactical (as in the previous section), a processing problem (the job starts to process but does not complete) or a reporting problem. For a syntax problem, note that the actual problem may be in an earlier line, and the current message is indicating an inconsistency with what ASReml has already read. Scan the output for other messages which might indicate the problem. If the problem is not evident, simplify the job until the simpler version runs and then build back to the required model. Remember that the model statement is parsed before the data file is read, but any following statements (e.g. residual, predict) are parsed after the data is read. Processing errors are indicated if the .asr file contains lines like: Forming 18211 equations: 42 dense. Initial updates will be shrunk by factor 0.316. Simple things to try are increasing WORKSPACE and simplifying the model. Reporting problems are indicated if the LogL has converged or ASReml has completed the specified number of iterations. Do not hesitate to seek help on the forum and to report problems to support@vsni.co.uk. Often a simple solution is availab
385. or name in RESIDUAL declaration After correcting the spelling of Repl we qin Alliance Trial 1989 get the following abbreviated output The variety A problem here is essentially the same as error id pid raw 5 The spatial residual model was declared TeP using Row and Col but the relevant variables T are in fact row and column Note that in Jing9 asa lakip 4 this case column could be truncated to colin yield mu variety the model formulae as this does not cause any R repl ambiguity but often it is clearer to use the full residual ar1 Row ar1 Col variable name predict varierty Summary of 224 records retained of 224 read Model term Size miss zero MinNonO Mean MaxNonO StndDevn 1 variety 56 0 0 1 28 5000 56 10 row 22 0 0 1 11 7321 22 11 column 11 0 0 1 6 3304 if 12 mu 1 ari Row in ar1 Row ar1 Col has size 0 parameters 5 5 ari Col in ar1 Row ar1 Col has size 0 parameters 6 6 ar1 Row ar1 Co1 4 6 initialized Error There are 224 data records but RESIDUAL model implies 0 data records Error Unrecognised argument in ari Row Error Unrecognised argument in ar1 Col Fault RESIDUAL structure does not match records in data Last line read was Residual ari Row ari Col ninerr6 variety id pid raw rep nloc yield lat Model specification TERM LEVELS GAMMAS variety 56 mu 1 repl 4 0 100 3 SECTIONS 0 4 1 STRUCT 0 1 1 5 il 1 1 17 factors defined max5000 6 variance pa
386. or the response variable(s) to be analysed; multivariate analysis is discussed in Chapter 8. qualifiers allow for weighted analysis (Section 6.7) and Generalized Linear Models (Section 6.8). ~ is read as 'modelled as' and separates response from the list of fixed and random terms in the linear mixed model. fixed represents the list of primary fixed explanatory terms, that is, variates, factors, interactions and special terms for which Wald F statistics are required. See Table 6.1 for a brief definition of reserved model terms, operators and commonly used functions. The full definition is in Section 6.6. conrandom represents the list of consolidated model terms (see Chapter 7) specifying both random effects and variance structures. In this chapter the consolidated model terms are of the form idv(), with arguments being the explanatory terms to be fitted as random effects (see Table 6.1 and Section 6.6). Specifying idv(term) indicates that the term effects are IID with a common variance. sparse_fixed are additional fixed terms not included in the table of Wald F statistics. The residual statement allows specification of the residual error variance structure. conresidual is the list of residual consolidated terms (see Chapter 7) specifying both random effects and variance structures. In this chapter we are assuming that the residual errors are IID, hence the specification idv(units) in the code box, where units is the reserved w
387. ord specifying a factor with a level for every experimental unit. 6.2.1 General rules. The following general rules apply in specifying the linear mixed model: all elements in the model must be space separated; the character ~ (modelled as) separates the response variable(s) from the explanatory variables in the model; elements in the model may be separated by which is ignored except when it is at the end of a line, which implies the model continues onto the next line; the ~ sign must appear on the first line of the model statement when the model statement is written over several lines; data fields are identified in the model by their labels; labels are case sensitive; labels may be abbreviated (truncated) when used in the model line, but care must be taken that the truncated form is not ambiguous. If the truncated form matches more than one label, the term associated with the first match is assumed. For example, dens is an abbreviation for density, but spl(dens,7) is a different model term to spl(density,7) because it does not represent a simple truncation; model terms may only appear once in the model line, and repeated occurrences are ignored; model terms other than the original data fields are defined the first time they appear on the model line. They may be abbreviated (truncated) if they are referred to again, provided no ambiguity is introduced. Important: It is often clear
388. oup constraints (see GROUPS below) is to shrink the group effects by adding the constant o > 0 to the diagonal elements of A pertaining to groups. When a constant is added, no adjustment of the degrees of freedom is made for genetic groups. Use GOFFSET 1 to add no offset but to suppress insertion of constraints where empty groups appear. The empty groups are then not counted in the DF adjustment. GROUPS g includes genetic groups in the pedigree. The first g lines of the pedigree identify genetic groups (with zero in both parent fields). All other lines must specify one of the genetic groups as parent if the actual parent is unknown. You may insert group identifiers with no members to define constraints on groups, that is, to associate groups into supergroups where the supergroup fixed effect is formally fitted separately in the model. A constraint is added to the inverse which causes the preceding set of groups which have members to have effects which sum to zero. The issue is to get the degrees of freedom correct and to get the correct calculation of the Likelihood, especially in bivariate cases where the DF associated with groups may differ between traits. The LAST qualifier (see page 80) is designed to help, as without it reordering may associate singularities in the A matrix with random effects, which at the very least is confusing. When the A matrix incorporates fixed effects, the number of DF involved may not be obvious, especially if there is also a
389. ows Print Manager; 11 WMF: Windows Meta File (.wmf); 12 HPGL 2: HP-GL2 (.hgl); 21 PNG: PNG (.png); 22 EPS: Encapsulated PostScript (.eps). 10.3.5 Job control command line options (C, F, O, R). C (CONTINUE) indicates that the job is to continue iterating from the values in the .rsv file. This is equivalent to setting !CONTINUE on the datafile line; see Table 5.4 (page 66) for details. F (FINAL) indicates that the job is to continue for one more iteration from the values in the .rsv file. This is useful when using predict (see Chapter 9). O (ONERUN) is used with the R option to make ASReml perform a single analysis when the R option would otherwise attempt multiple analyses. The R option then builds some arguments into the output file name while other arguments are not. For example, ASReml -nor2 mabphen 2 TWT out(621) out(929) results in one run with output files mabphen2_TWT.*. R[r] (RENAME [r]) is used in conjunction with at least r argument(s) and does two things: it modifies the output filename to include the first r arguments, so the output is identified by these arguments, and, if there are more than r arguments, the job is rerun moving the extra arguments up to position r (unless ONERUN (O) is also set). If r is not specified, it is taken as 1. For example, ASReml -r2 job wwt gfw fd fat is equivalent to running three jobs: ASReml -r2 job wwt gfw (producing jobwwt_gfw.asr), ASReml -r2 job wwt fd (producing jobwwt_fd.asr), ASReml -r2 job wwt fat (producing jobwwt_fa
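The rerun rule described for the R (RENAME) option can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the entry above, not ASReml's own code: the function name `expand_rename_runs` is hypothetical, and the output basename pattern (job name followed by the used arguments joined with underscores) is inferred from the two examples in the text.

```python
def expand_rename_runs(job, args, r=1, onerun=False):
    """List the (arguments, output basename) pairs implied by RENAME r.

    Hypothetical helper: the naming pattern is an assumption inferred from
    the examples 'mabphen2_TWT' and 'jobwwt_gfw' in the text.
    """
    if r < 1 or len(args) < r:
        raise ValueError("RENAME r needs at least r arguments")
    fixed = args[:r - 1]       # arguments that stay in place for every run
    movable = args[r - 1:]     # each of these is moved up to position r in turn
    if onerun:                 # ONERUN: only the first r arguments are used
        movable = movable[:1]
    runs = []
    for arg in movable:
        used = fixed + [arg]
        runs.append((used, job + "_".join(used)))
    return runs


for used, basename in expand_rename_runs("job", ["wwt", "gfw", "fd", "fat"], r=2):
    print(used, basename + ".asr")
```

With r = 2 the first argument stays fixed and each later argument takes the second position in turn, which reproduces the three jobs listed in the example.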
390. p/SE ratios of zero sometimes indicate poor scaling. Consider rescaling the design matrix in such cases. Wald F statistics (Source of Variation, NumDF, F-inc): 7 mu, 1, 1405.14; 4 tmt, 1, 441.72. The estimated variance components from this analysis are given in column (b) of table 15.8. There is no significant variance heterogeneity at the residual or tmt.run level. This indicates that the square root transformation of the data has successfully stabilised the error variance. There is, however, significant variance heterogeneity for tmt.variety interactions, with the variance being much greater for the control group. This reflects the fact that, in the absence of bloodworms, the potential maximum root area is greater. Note that the tmt.variety interaction variance for the treated group is negative. The negative component is meaningful (and in fact necessary, and obtained by use of the GU option) in this context, since it should be considered as part of the variance structure for the combined variety main effects and treatment by variety interactions. That is, \mathrm{var}(\mathbf{1}_2 \otimes \mathbf{u}_1 + \mathbf{u}_2) = \begin{bmatrix} \sigma_1^2 + \sigma_{2c}^2 & \sigma_1^2 \\ \sigma_1^2 & \sigma_1^2 + \sigma_{2t}^2 \end{bmatrix} \otimes \mathbf{I} (15.8). Using the estimates from table 15.8, this structure is estimated as \begin{bmatrix} 3.84 & 2.33 \\ 2.33 & 1.96 \end{bmatrix} \otimes \mathbf{I}. Thus the variance of the variety effects in the control group (also known as the genetic variance for this group) is 3.84. The genetic variance for the treated group is much lower (1.96). The genetic correlation is 2.33/\sqrt{3.84 \times 1.96} = 0.85, which
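The genetic correlation quoted above follows directly from the three estimates in the 2 x 2 structure. The short Python check below is only a sketch of that arithmetic; the variable and function names are illustrative and not part of ASReml.

```python
import math

# Estimates quoted in the text (from table 15.8)
var_control = 3.84   # genetic variance, control group
var_treated = 1.96   # genetic variance, treated group
cov_ct = 2.33        # covariance between control and treated variety effects


def genetic_correlation(var_a, var_b, cov_ab):
    """Correlation implied by two variances and their covariance."""
    return cov_ab / math.sqrt(var_a * var_b)


r = genetic_correlation(var_control, var_treated, cov_ct)
print(f"genetic correlation = {r:.2f}")
```

The same three numbers also give the implied treated interaction component, 1.96 - 2.33 = -0.37, which is the negative component the text refers to.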
391. p information vy at position p sin v r forms sine from v with period r OS sqrt v 7r forms square root of v r uni f forms a factor with a level for each record where J factor f is non zero 91 6 2 Specifying model formulae in ASReml Table 6 1 Summary of reserved words operators and functions model term brief description common usage fixed random uni f n vect v forms a factor with a level for each record where factor f has level n is used in a multivariate analysis on a multivariate set of covariates v to pair them with the variates 92 v 6 3 Fixed terms in the model 6 2 2 Examples ASReml code action yield mu variety residual idv units yield mu variety r idv block residual idv units yield mu time variety time variety residual idv units livewt mu breed sex breed sex r idv sire residual idv units fits a model with a constant and fixed variety effects fits a model with a constant term fixed variety effects and random block effects fits a saturated model with fixed time and variety main effects and time by va riety interaction effects fits a model with fixed breed sex and breed by sex interaction effects and ran dom sire effects 6 3 Fixed terms in the model 6 3 1 Primary fixed terms The fixed list in the model formula NIN Alliance Trial 1989 variety e describes the fixed covariates factors and interacti
392. path name of the data subfile and !SKIP n is an optional qualifier indicating that the first n lines of the subfile are to be skipped. After reading each subfile, input reverts to the primary data file. Typically the primary data file will just contain !INCLUDE statements identifying the subfiles to include. For example, you may have data from a series of related experiments in separate data files for individual analysis. The primary data file for the subsequent combined analysis would then just contain a set of !INCLUDE statements to specify which experiments were being combined. If the subfiles have CSV format, they should all have it, and the CSV file should be declared on the primary datafile line. This option is not available in combination with !MERGE. 5.8 Job control qualifiers. The following tables list the job control qualifiers. These change or control various aspects of the analysis. Job control qualifiers may be placed on the datafile line and following lines. They may also be defined using an environment variable called ASREML_QUAL. The environment variable is processed immediately after the datafile line is processed. All qualifier settings are reported in the .asr file. Use the Index to check for examples or further discussion of these qualifiers. Important: Many of these are only required in very special circumstances, and new users should not attempt to understand all of them. You do need
393. phanumeric variable. The qualifier !SPECIALCHAR cancels the normal meaning of the character in an input file so that it can be included in the name of a level of an alphanumeric or pedigree variable. If class names are being predefined, the qualifier !SPECIALCHAR must appear before the class names are read in. 5.4.3 Ordering factor levels. The default order for factor levels when factors are declared with !I and !A is the order the levels are encountered in the data file. !SORT declared after !A or !I on a field definition line will cause ASReml to fit the levels in numeric/alphabetical order, although they are defined in some other order. To control the order in which levels are defined, the level names must be prespecified using the !L s qualifier (applies only to factors declared !A). Thus, for a variable SEX coded as Male and Female and declared SEX !A, the user cannot know whether it will be coded 1=Male, 2=Female or 1=Female, 2=Male without looking to see which occurs first in the data file. However, declaring it as SEX !A !L Male Female will mean Male is coded 1 and Female is coded 2. If it is declared as SEX !A !SORT, the coding order is unspecified, but ASReml creates a lookup table after reading the data to arrange levels in sorted order and uses this sorted order when forming the design matrices. Consequently, with the !SORT qualifier, the order of fitted effects will be 1=Female, 2=Male in the analysis, regardless of which appears first in the file
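The three orderings described above (encounter order, prespecified order, sorted order) can be mimicked in Python. This is a sketch of the behaviour as the entry describes it, under the assumption that levels are compared as plain strings; the helper name `encounter_order` and the sample data are hypothetical.

```python
def encounter_order(values):
    """Levels in the order they are first met in the data (the default)."""
    levels = []
    for value in values:
        if value not in levels:
            levels.append(value)
    return levels


# Hypothetical data column in which Male happens to occur first
sex = ["Male", "Female", "Female", "Male"]

default_levels = encounter_order(sex)      # depends on the data file
predefined_levels = ["Male", "Female"]     # order fixed in advance by the user
sorted_levels = sorted(default_levels)     # order used for fitting when sorted

# Code each level from 1, as in the 1=Male, 2=Female example
sorted_codes = {level: i for i, level in enumerate(sorted_levels, start=1)}
print(default_levels, sorted_levels, sorted_codes)
```

Only the default ordering changes if the first record in the file changes; the prespecified and sorted orderings are the same for any arrangement of the records.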
394. plicit in the variance structure for the trait by variety effects. The variance structure can arise from a regression of treated variety effects on control effects, namely u_t = \beta u_c + \epsilon, where the slope \beta = \sigma_{ct}/\sigma_c^2. Tolerance can be defined in terms of the deviations from regression, \epsilon. Varieties with large positive deviations have greatest tolerance to bloodworms. Note that this is similar to the researcher's original intentions, except that the regression has been conducted at the genotypic rather than the phenotypic level. In Figure 15.9 the BLUPs for treated have been plotted against the BLUPs for control for each variety, and the fitted regression line (slope = 0.61) has been drawn. Varieties with large positive deviations from the regression line include YRK3, Calrose, HR19 and WC1403. Figure 15.10: Estimated deviations from regression of treated on control for each variety plotted against estimate for control (axes: control BLUP; BLUP regression residual). An alternative definition of tolerance is the simple difference between treated and control BLUPs for each variety, namely \delta = u_t - u_c. Unless \beta = 1, the two measures \epsilon and \delta have very different interpretations. The key difference is that \epsilon is a measure which is independent of inherent vigour, whereas \delta is not. To see this, consider cov(\epsilon, u_c) = cov(u_t - \beta u_c, u_c) = \sigma_{ct} - \beta \sigma_c^2 = 0, whereas
395. ports the terms in the conditional statistics Marginality pattern for F con calculation Model terms Model Term DF 1 2 3 4 5 6 7 8 1 mu 1 a NE re 2 water 1 I C C c 3 variety WO E k Cu a C 4 sow 2 PA Db G x 5 water variety 7 I I I I C C 6 water sow 2 1 oe Al E 2 7 variety sow 14 I I I I I I 8 water variety sow 14 I I I I I I I F inc tests the additional variation explained when the term is added to a model con sisting of the I terms F con tests the additional variation explained when the term is added to a model consisting of the I and C c terms Any c terms are ignored in calculating DenDF for F con using numerical derivatives for computational reasons The terms are ignored for both F inc and F con tests Consider now a nested model which might be represented symbolically by y 1 REGION REGION SITE For this model the incremental and conditional Wald F statistics will be the same However 21 2 5 Inference Fixed effects it is not uncommon for this model to be presented to ASReml as y 1 REGION SITE with SITE identified across REGION rather than within REGION Then the nested structure is hidden but ASReml will still detect the structure and produce a valid conditional Wald F statistic This situation will be flagged in the M code field by changing the letter to lower case Thus in the nested model the three M codes would be A and B because REGION SITE is obviously an interac
396. priate qualifier (!ND, !PSD or !NSD) is supplied. These qualifiers do not modify the matrix; they just instruct ASReml to proceed regardless. If the matrix has positive and negative eigenvalues, !ND instructs ASReml to ignore the condition and proceed anyway. If the matrix is positive semi-definite (positive and zero eigenvalues), !PSD allows ASReml to introduce Lagrangian multipliers to accommodate linear dependencies and rows with zero elements, and allows ASReml to proceed. Linear dependencies occur, for example, when the list of individuals includes clones. Rows with zero elements occur when the GRM represents a dominance matrix and the list of individuals includes fully inbred individuals which, by definition, have zero dominance variance. If the matrix has positive, zero and negative eigenvalues, !NSD may be used to allow ASReml to continue. The zero eigenvalues are handled as for !PSD. Sometimes with negative eigenvalues the iteration sequence may fail, as some parameter values will result in a negative residual sum of squares. If the specified .giv file does not exist but there is a .grm file of the same name, ASReml will read and invert the .grm file, and write the inverse to the .giv file if !SAVEGIV f is specified. It is written in DENSE format unless f = 1. !SAVEGIV 3 writes the GIV matrix as an .sgiv file; !SAVEGIV 4 writes the GIV matrix as a .dgiv file, where sgi
397. publication may be reproduced by any process, electronic or otherwise, without specific written permission of the copyright owner. Neither may information be stored electronically in any form whatever without such permission. Published by: VSN International Ltd, 5 The Waterhouse, Waterhouse Street, Hemel Hempstead, HP1 1ES, UK. Email: info@asreml.co.uk. Website: http://www.vsni.co.uk/. The correct bibliographical reference for this document is: Gilmour, A. R., Gogel, B. J., Cullis, B. R., Welham, S. J. and Thompson, R. (2014). ASReml User Guide Release 4.1 Functional Specification, VSN International Ltd, Hemel Hempstead, HP1 1ES, UK. www.vsni.co.uk. Preface. ASReml is a statistical package that fits linear mixed models using Residual Maximum Likelihood (REML). It has been under development since 1993 and arose out of collaboration between Arthur Gilmour and Brian Cullis (NSW Department of Primary Industries) and Robin Thompson and Sue Welham (Rothamsted Research) to research into the analysis of mixed models and to develop appropriate software, building on their wide expertise in relevant areas, including the development of methods that are both statistically and computationally efficient, the analysis of animal and plant breeding data, the analysis of spatial and longitudinal data and the production of widely used statistical software. More recently, VSN International acquired the right to ASReml from these sponsoring organizations and
398. qualifiers qualifier action FOR forlist DO command IFOR Markern DO MBF IFOR Markers DO MBF The argument n is often given as 1 indicating that the actual path to use is specified as the first argument on the command line see Section 10 4 See Sections 15 7 and 15 10 for examples The default value of n is 1 DOPATH n can be located anywhere in the job but if placed on the top job control line it cannot have the form DOPATH 1 unless the arguments are on the command line as the DOPATH qualifier will be parsed before any job arguments on the same line are parsed New R4 The FOR DO command is intended to simplify coding when a series of similar lines are required in the command file which differ in a single argument The list of arguments is placed after FOR and the command is written after DO with S indicating where the argument is to be inserted list may be an assign string since they are processed before the FOR statement is expanded Furthermore if list is entirely integer numbers 7 7 notation can be used For example ASSIGN Markern 35 75 125 ASSIGN Markers M35 M75 M125 mbf Geno 1 markers csv key 1 RFIELD S RENAME M S me Ir Markers is expanded to IMBF mbf Geno 1 markers csv key 1 RFIELD 35 RENAME M35 IMBF mbf Geno 1 markers csv key 1 RFIELD 75 RENAME M75 IMBF mbf Geno 1 markers csv key 1 RFIELD 125 RENAME M125 Ir M35 M75 M125 The aim here is to generate the
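The FOR ... DO expansion described above is a plain text substitution: each item of the list is inserted wherever the placeholder appears in the command. The Python sketch below mimics that behaviour. The function name `expand_for` is hypothetical, and the placeholder is assumed to be written `$S`, because the extracted text shows only a bare `S` and some punctuation may have been lost.

```python
def expand_for(items, command, placeholder="$S"):
    """Return one copy of `command` per item, with the placeholder replaced.

    The placeholder spelling is an assumption; pass a different token if
    the command file uses another one.
    """
    return [command.replace(placeholder, str(item)) for item in items]


# Marker numbers taken from the example in the text
markers = [35, 75, 125]
template = "RFIELD $S RENAME M$S"   # shortened form of the command in the example

for line in expand_for(markers, template):
    print(line)
```

The three generated lines differ only in the marker number, which is the situation the text says FOR ... DO is intended to simplify.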
399. qualifiers that allow specification of initial values and constraints We have given an explicit specification for these variance component models to emphasise the form of the syntax However an alternative more concise implicit specification for these models is to note that idv is a default function and the random terms can be placed after r without explicitly specifying idv Furthermore residual idv units is the default residual specification and may be omitted from the model specification This is precisely the form used in Release 3 for these models 94 6 5 Interactions and conditional factors 6 5 Interactions and conditional factors 6 5 1 Interactions e interactions are formed by joining two or more terms with a or a which is replaced with for example a b is the interaction of factors a and b interaction levels are arranged with the levels of the second factor nested within the levels of the first labels of factors including interactions are restricted to 47 characters of which only the first 20 are ever displayed Thus for interaction terms it is often necessary to shorten the names of the component factors in a systematic way for example if Time and Treatment are defined in this order the interaction between Time and Treatment could be specified in the model as Time Treat remember that the first match is taken so that if the label of each field begins with a different letter the first letter i
400. r each trait. Trait by itself fits the mean for each variate. In an interaction, Trait.Fac fits the factor Fac for each variate and Trait.Cov fits the covariate Cov for each variate. An explanatory factor or covariable associated with Trait i can be fitted using at(Trait,i).Fac or at(Trait,i).Cov. ASReml internally arranges the data so that n data records containing t traits each become n sets of t analysis records indexed by the internal factor Trait, i.e. nt analysis records ordered Trait within data record. If the data is already in this long form, use the !ASMV t qualifier to indicate that a multivariate analysis is required. 8.3 Residual variance structures. Using the notation of Section 2.1.11, consider a multivariate analysis with t traits and n units in which the data are ordered traits within units. An algebraic expression for the residual variance matrix in this case is \mathbf{I}_n \otimes \Sigma, where \Sigma is an unstructured variance matrix. This is the general form of residual variance structures required for multivariate analysis. 8.3.1 Specifying multivariate variance structures in ASReml. A standard multivariate analysis is achieved using the us() variance model function for the two random components and specifying the R structure for the residual error term as residual id(units).us(Trait). Code box (Orange Wether Trial 1984-8): SheepID !I; TRIAL; BloodLine !I; TEAM; GFW YLD FDIAM; wether.dat !skip 1
401. r effects mu takes 1 variety then takes 2 linNitr takes 1 nitrogen takes 2 variety linNitr takes 2 and there are four degrees of freedom left This information is used to make sure that the conditional Wald F statistic does not contradict marginality principles The next table indicates the details of the conditional Wald F statistic The conditional Wald F statistic is based in the reduction in Sums of Squares from dropping the particular term indicated by from the model also including the terms indicated by I C and c The next two tables based on incremental and conditional sums of squares report the model term the number of effects in the term the numerator degrees of freedom the Wald F statistic an adjusted Wald F statistic scaled by a constant reported in the next column and finally the computed denominator degrees of freedom The scaling constant is discussed by Kenward and Roger 1997 Table showing the reduction in the numerator degrees of freedom for each term as higher terms are absorbed Model Term 6 a 2 J mu 12 3 1 variety 1 1 2 LinNitr 1 4 4 1 3 9 3 nitrogen 8 2 6 4 ONWW Ww variety LinNitr variety nitrogen OoanRFRWNE Marginality pattern for F con calculation Model terms Model Term DF 12 3 4 5 6 1 mu Lm g G a 2 variety 2 T G 3 LinNitr Lt I T 3 4 nitrogen 2 ai I L 5 variety LinNitr 2 i I I I amp 6 variety nitrogen 4 I I I I I Model codes b A a A bB F
402. r mixed models. Journal of the American Statistical Association 88, 9-25.
Breslow, N. E. and Lin, X. (1995). Bias correction in generalised linear mixed models with a single component of dispersion. Biometrika 82, 81-91.
Cox, D. R. and Hinkley, D. V. (1974). Theoretical Statistics. Chapman and Hall.
Cox, D. R. and Snell, E. J. (1981). Applied Statistics: Principles and Examples. Chapman and Hall.
Cressie, N. A. C. (1991). Statistics for Spatial Data. John Wiley and Sons.
Cullis, B. R. and Gleeson, A. C. (1991). Spatial analysis of field experiments - an extension to two dimensions. Biometrics 47, 1449-1460.
Cullis, B. R., Gleeson, A. C., Lill, W. J., Fisher, J. A. and Read, B. J. (1989). A new procedure for the analysis of early generation variety trials. Applied Statistics 38, 361-375.
Cullis, B. R., Gogel, B. J., Verbyla, A. P. and Thompson, R. (1998). Spatial analysis of multi-environment early generation trials. Biometrics 54, 1-18.
Dempster, A. P., Selwyn, M. R., Patel, C. M. and Roth, A. J. (1984). Statistical and computational aspects of mixed model analysis. Applied Statistics 33, 203-214.
Draper, N. R. and Smith, H. (1998). Applied Regression Analysis, 3rd Edition. John Wiley and Sons, New York.
Fernando, R. and Grossman, M. (1990). Genetic evaluation with autosomal and X-chromosomal inheritance. Theoretical and Applied Genetics 80, 75-80.
Gilmour, A. R. (2007). Mixed model regression mapping for QTL detection in experime
403. r the AR x AR1 model using the dimensions of the factors rather than the factor names. In this case the data records would need to be sorted in the order rows within columns, because ASReml does not reorder the data internally when dimensions are used but instead assumes that the specified variance structure matches the order of the data as presented in the data file. The fourth example assumes variance heterogeneity among the data observations, that is, that the three groups comprising observations 1-23, 24-50 and 51-70 have unequal variances: residual idv 23 idv 27 idv 20. The fifth and final example is the default residual variance in a multivariate analysis. Specifying units as the first component is crucial, as ASReml extracts the trait values by trait within unit: residual id(units).us(Trait). 7.3.1 Special properties and rules in defining the residual error term. There are certain properties and associated rules for this term that require special consideration. Rule 1: The number of effects in the residual term must be equal to the number of data units included in the analysis. Rule 2: Where a compound model term is specified for the residuals, each combination of levels of the single model terms comprising this term must uniquely identify one unit of the data. For example, in the spatial analysis of a column trial comprising 4 replicates of 24 variet
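The link between the three group sizes in the fourth example (23, 27 and 20) and the observation ranges quoted with it can be checked with a short Python sketch, together with Rule 1. The function names are hypothetical, and the check covers only the counting argument, not ASReml's actual validation.

```python
from itertools import accumulate


def section_ranges(sizes):
    """Convert consecutive group sizes into 1-based (first, last) ranges."""
    ends = list(accumulate(sizes))
    starts = [1] + [end + 1 for end in ends[:-1]]
    return list(zip(starts, ends))


def satisfies_rule_1(sizes, n_records):
    """Rule 1: the residual term must account for every data unit."""
    return sum(sizes) == n_records


sizes = [23, 27, 20]
print(section_ranges(sizes))
print(satisfies_rule_1(sizes, 70))
```

Because the sizes are applied to the records in file order, the data must already be sorted so that each group's records are contiguous.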
404. r the progeny records Assume our data file ramdbh txt has fields tree mum dad row column plot DBH Aldiag OP parent and we have deleted the non parent rows from the full pedigree file to form ParentPed txt If you have a pedigree file for all trees processing that pedigree with the GIV 2 qualifier will create a pedigree file just containing the parents and also the Q giv file for the non parent referred to below If we assume a heritability of 0 1111 so that the ratio of genetic variance to residual variance is 0 125 the following model will estimate the breeding values for the parents directly RAM BLUP model tree mum P V21 dad P V21 row column plot DBH AIdiag V21 NP A L Nonparent Parent parent P filter NP 1 create Nonparent filter mum filter dad filter AIdiag filter WT 0 125 AIdiag 1 AIdiag 1 filter ParentPed txt ramdbh txt DBH WT WT mu Ir str parent and mum 0 5 and dad 0 5 id 1 nrmv parent 0 125 plot ariv column ar1 row residual idv units In this model e NP A L Nonparent Parent ensures the NP data field is coded 1 for non parents and 2 for parents e filter NP 1 creates a variable that is 1 for non parents and zero for parents e The filter transformations put mum dad and AIdiag information to zero for parents e WT 0 125 AIdiag 1 AIdiag 1 V21 creates a weight variable which is 1 for parent records g q 7 for a non pa
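The entry above moves from a heritability of 0.1111 to a genetic-to-residual variance ratio of 0.125. A minimal Python sketch of that conversion follows, assuming heritability is defined as genetic variance divided by the sum of genetic and residual variance, with no other components; the function name is hypothetical.

```python
def variance_ratio_from_heritability(h2):
    """Genetic/residual variance ratio implied by heritability h2.

    Assumes h2 = var_g / (var_g + var_e), so var_g / var_e = h2 / (1 - h2).
    """
    if not 0 < h2 < 1:
        raise ValueError("heritability must lie strictly between 0 and 1")
    return h2 / (1 - h2)


ratio = variance_ratio_from_heritability(0.1111)
print(f"variance ratio = {ratio:.3f}")
```

Under this definition a heritability of exactly 1/9 gives a ratio of exactly 1/8, which is the 0.125 used as the starting value in the model.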
405. r values are not at the REML solution Parameters appear to be at the REML solution in that the parameter values are stable sense that the user may have intended something different Messages beginning with the word Warning highlight information that the user should check Again it may reflect an error if the user has intended something different Messages beginning with the word Error indicate that something is inconsistent as far as ASReml is concerned It may be a coding error that the user can fix easily or a processing error which will generally be harder to diagnose Often the error reported is a symptom of something else being wrong 255 They provide 14 5 Information Warning and Error messages Table 14 2 List of warning messages and likely meaning s warning message likely meaning Notice ASReml has merged design points closer than Warning e missing values generated by transformation Warning ti singularities in AI matrix Warning m variance structures were modified Warning n missing values were detected in the design Warning n negative weights Warning r records were read from multiple lines WARNING term has more levels than expected Warning term in the predict TGNORE list Warning term in the predict USE list Warning term is ignored for prediction Warning Check if you need the RECODE qualifier Warning Code B fixed ata boundary GP
406. rait nrm tag xfai TrDam12 nrm dam xfai TrLit1234 id lit lf Trait erp residual id units us Trait PATH 3 wwt ywt gfw fdm fat Trait Trait age Trait brr Trait sex Trait age sex r VARF us Trait nrm tag xfai TrDam12 nrm dam us TrLit1234 id lit I Trait grp residual id units us Trait PATH 4 wwt ywt gfw fdm fat Trait Trait age Trait brr Trait sex Trait age sex r VARF xfa2 Trait nrm tag xfai TrDam12 nrm dam us TrLit1234 id lit lf Traat grp residual id units us Trait PART 5 wwt ywt gfw fdm fat Trait Trait age Trait brr Trait sex Trait age sex r VARF xfa3 Trait nrm tag xfai TrDam12 nrm dam us TrLit1234 id lit It Trait grp residual id units us Trait The term function Tr nrm tag now replaces the function Tr id sire and picks up part of function TrDam12 id dam variation present in the half sib analysis This analysis uses information from both sires and dams to estimate additive genetic variance The dam variance component is this analysis estimates the maternal variance component It is only significant for the weaning and yearling weights The litter variation remains unchanged Notice again how the maternal effect is only fitted for the first 2 trait and the litter effect for the first 4 traits The critical detail is that SUBSET is used to setup TrDam12 a variable using the first two traits ASReml uses the relationship matrix for the dam dimension since dam is d
407. rameters max2500 2 special structures Final parameter values 3 6 0 10000E 00 1 00000 0 10000E 00 0 10000E 00 Last line read was Residual ari Row ari Col Finished 23 Apr 2014 09 17 20 179 RESIDUAL structure does not match records in data 252 14 4 An example 7 Missing plots in field layout The variables row and column define a 22x11 NIN Alliance Trial 1989 grid that is 242 plots but there are only 224 plots in the data We could manually work out which are missing and construct extra data lines to complete the grid but ASReml will do this for us if we add the qualifiers repl and add the model term mv to estimate miss r repl ing values for the missing plots So this prob residual ar1 row ar1 col predict varierty lem is resolved by changing the model lines to variety A id pid raw row column nin89 asd skip 1 ROWFAC row COLFAC column yield mu variety read nin89 asd skip 1 ROWFAC row COLFAC column yield mu variety mv R repl residual ari row ari col This output also flags the 8th error which is the misspelling of variety in the predict line That error does not stop the job running but does mean the predicted means for variety will not be formed QUALIFIERS SKIP 1 Reading nin89 asd FREE FORMAT skipping 1 lines 12 mu 1 ari row in ar1 row ar1i col has size 22 parameters 5 5 ari col in ar1 row ari col has size 11 parameters 6 6 ari
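The entry above notes that a 22x11 layout has 242 cells while the data hold only 224 plots, and that ASReml can fill the gaps itself. For readers who want to see which cells are absent before relying on that, the manual check can be sketched as follows. The function name and the (row, column) tuple representation are assumptions made for illustration only.

```python
from itertools import product


def missing_cells(observed, n_rows, n_cols):
    """List the (row, column) cells of an n_rows x n_cols grid that have no record.

    `observed` is any iterable of (row, column) pairs, numbered from 1.
    """
    seen = set(observed)
    grid = product(range(1, n_rows + 1), range(1, n_cols + 1))
    return [cell for cell in grid if cell not in seen]


# Toy 3 x 2 layout with two plots absent.
plots = [(1, 1), (1, 2), (2, 2), (3, 1)]
print(missing_cells(plots, n_rows=3, n_cols=2))  # [(2, 1), (3, 2)]

# For the trial in the text, a complete grid would have 22 * 11 cells.
print(22 * 11 - 224)  # 18 cells without data
```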
408. raphics file type to HP GL 2 allows the user to temporarily fix the parameters listed Parameter num bers have been added to the reporting of input values to facilitate use of this and other parameter number dependent qualifiers The list should be in increasing order using colon to indicate a sequence step size is 1 For example HOLD 1 20 30 40 79 5 8 Job control qualifiers Table 5 5 List of rarely used job control qualifiers qualifier action ILAST lt factor gt lt lev gt Kfacz gt lt levg gt lt fac3 gt lt lev3 gt OUTLIER OWN f PRINT n PNG IPS PVSFORM n RESIDUALS 2 limits the order in which equations are solved in ASReml by forcing equations in the sparse partition involving the first lt lev gt equations of lt factor gt to be solved after all other equations in the sparse partition It is intended for use when there are multiple fixed terms in the sparse equations so that ASReml will be consistent in which effects are identified as singular The test example had Ir Anim Litter f HYS where genetic groups were included in the definition of Anim Consequently there were 5 singularities in Anim The default reordering allows those singularities to appear anywhere in the Anim and HYS terms Since 29 genetic groups were defined in Anim LAST Anim 29 forces the genetic group equations to be absorbed last and therefore incorporate any singularities In the more general
409. rder c autocorrelation parameter $\rho_c$ respectively. More specifically, a two-dimensional separable autoregressive spatial structure (AR1 $\otimes$ AR1) is sometimes assumed for the common errors in a field trial analysis (see Gogel (1997) and Cullis et al. (1998) for examples). In this case $\Sigma_r$ and $\Sigma_c$ are AR1 correlation matrices with $(i,j)$ elements $\rho_r^{|i-j|}$ and $\rho_c^{|i-j|}$ respectively. Alternatively, the residuals might relate to a multivariate analysis with $n_t$ traits and $n$ units and be ordered traits within units. In this case an appropriate variance structure might be $I_n \otimes \Sigma$, where $\Sigma$ is a general or unstructured variance matrix. See Chapter 7 for details on specifying separable R structures in ASReml. 2.1 The general linear mixed model. 2.1.12 Direct products in G structures. Likewise, the random model terms in u may have a direct product variance structure. For example, for a field trial with s sites, g varieties and the effects ordered varieties within sites, the random model term site.variety may have the variance structure $\Sigma \otimes I_g$, where $\Sigma$ is the variance matrix for sites. This would imply that the varieties are independent random effects within each site, have different variances at each site, and are correlated across sites. Important: Whenever a random term is formed as the interaction of two factors, you should consider whether the IID assumption is sufficient or if a direct product structure might be more appropriate. See Chapter 7 for details on
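The separable AR1 x AR1 structure described above can be made concrete with a small numerical sketch. It assumes the plots are ordered rows within columns, so that the full correlation matrix is the Kronecker product of the column matrix with the row matrix; that ordering, and both function names, are assumptions for illustration, and pure Python lists are used instead of a matrix library.

```python
def ar1_matrix(n, rho):
    """n x n first-order autoregressive correlation matrix: element (i, j) is rho**|i-j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def kronecker(a, b):
    """Kronecker (direct) product of two matrices given as lists of rows."""
    rows_b, cols_b = len(b), len(b[0])
    return [
        [a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
         for j in range(len(a[0]) * cols_b)]
        for i in range(len(a) * rows_b)
    ]


# Two columns (rho_c = 0.5) by three rows (rho_r = 0.2), rows within columns.
sigma_c = ar1_matrix(2, 0.5)
sigma_r = ar1_matrix(3, 0.2)
sigma = kronecker(sigma_c, sigma_r)

# Correlation between plot (row 1, col 1) and plot (row 3, col 2)
# is rho_c**1 * rho_r**2.
print(len(sigma), sigma[0][5])
```

The separability assumption is visible in the last line: the correlation between any two plots factors into a row part and a column part.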
410. red by the VCM process (see Section 7.8.2 above). For example, using a control file vcmdes.as containing: Create VCM Design for H F model Row Col Off Y vo vcmdes asd DESIGN Y Row and Row 0 5 and Col 0 5 Off and a data file vcmdes.asd containing anananrPRPRPRBPWWWNYN EB ORPWNHRFPBPWNHRWNHENF 1 ps then the file vcmdes.des will be generated, which contains the values used in fitting the variance model for the Huynh-Feldt model given in Section 7.8.2. 7.9 Ways to present initial values to ASReml. In complex models, the Average Information algorithm can have difficulty maximising the REML log likelihood when starting values are not reasonably close to the REML solution. ASReml has several internal strategies to cope with this problem. When the user needs to provide better starting values than those generated by ASReml, three of the methods are: inserting explicit initial values in the .as file, for example using INIT; doing a preliminary run to obtain .tsv or .msv files and then modifying the parametric information in one of those files (Section 7.9.1); fitting a simpler model and using parameter values derived from the simpler model through the .rsv file (Section 7.9.2). 7.9.1 Using templates to set parametric information associated with variance structures using .tsv and .msv files. ASReml 3 needed initial values for most variance structure parameters an
411. rent record with q the respective diagonal element 167 8 11 Factor effects with large Random Regression models of AIdiag with q 2 for non inbred non parents and y is the variance ratio o o 0 125 in this case This weighting corresponds to a residual variance for a non parent record of 05 4 o e If there is no direct information on parents the parent term is replaced by zero where zero is a variable with zero elements e If dad is unknown the and dad term is dropped e The BLUPs of a non parent will need to be calculated outside ASReml by adding y q 7 times its residual to the average of the parental BLUPs Prediction of parental values with assumed heritability was the main motivation for the development of the reduced animal model Estimation of genetic variance parameters is a little more complicated and the computational gains of removing non parent genetic values from the estimation procedure only apply if it is reasonable to form a small number of groups with roughly similar AIdiag values If AIG is this group factor then one can estimate residual variances in each group using sat AIG idv units and use the variance parameter linear model facilities to constrain the residual variances and the parent variance to be a function of the genetic and residual variances 8 11 Factor effects with large Random Regression models One use of the GRM matrix is to allow more computationally efficient fitting of random re
412. respond to these singularities are zero in the .sln file. Singularities in the sparse_fixed terms of the model may change with changes in the random terms included in the model. If this happens, it will mean that changes in the REML log likelihood are not valid for testing the changes made to the random model. This situation is not easily detected, as the only evidence will be in the .sln file, where different fixed effects are singular. A likelihood ratio test is not valid if the fixed model has changed. 6.10.4 Examples of aliassing. The sequence of models in Table 6.5 is presented to facilitate an understanding of over-parameterised models. It is assumed that var is a factor with 4 levels, trt with 3 levels and rep with 3 levels, and that all var.trt combinations are present in the data.
Table 6.5: Examples of aliassing in ASReml
model; number of singularities; order of fitting; comment
yield var r idv rep; 0; rep var;
yield mu var r idv rep; 1; rep mu var; first level of var is aliassed and set to zero
yield var trt r idv rep; 1; rep var trt; var fully fitted, first level of trt is aliassed and set to zero
yield mu var trt var.trt r idv rep; 8; rep mu var trt var.trt; first levels of both var and trt are aliassed and set to zero, together with subsequent interactions
6.11 Wald F Statistics
yield mu var trt r idv rep
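The singularity counts quoted for the aliassing examples (0, 1, 1 and 8) follow from comparing the number of fixed-effect columns with the rank of the design matrix. The sketch below reproduces those counts for a factor var with 4 levels and trt with 3 levels, all combinations present. It is an independent illustration using exact rational arithmetic, not ASReml's own algorithm; the term labels and function names are assumptions, and only the fixed terms are considered.

```python
from fractions import Fraction
from itertools import product

N_VAR, N_TRT = 4, 3


def indicator(level, n_levels):
    """One indicator (0/1) column per level of a factor."""
    return [1 if k == level else 0 for k in range(n_levels)]


def design_row(terms, var, trt):
    """Design-matrix row for one var x trt cell under full indicator coding."""
    row = []
    for term in terms:
        if term == "mu":
            row.append(1)
        elif term == "var":
            row.extend(indicator(var, N_VAR))
        elif term == "trt":
            row.extend(indicator(trt, N_TRT))
        elif term == "var.trt":
            row.extend(indicator(var * N_TRT + trt, N_VAR * N_TRT))
        else:
            raise ValueError(f"unknown term: {term}")
    return row


def rank(matrix):
    """Matrix rank by Gauss-Jordan elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    found = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(found, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[found], m[pivot] = m[pivot], m[found]
        for r in range(len(m)):
            if r != found and m[r][col] != 0:
                factor = m[r][col] / m[found][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[found])]
        found += 1
        if found == len(m):
            break
    return found


def count_singularities(terms):
    """Number of fixed-effect columns minus the rank of the design matrix."""
    rows = [design_row(terms, v, t) for v, t in product(range(N_VAR), range(N_TRT))]
    return len(rows[0]) - rank(rows)


for model in (["var"], ["mu", "var"], ["var", "trt"], ["mu", "var", "trt", "var.trt"]):
    print(model, count_singularities(model))
```

The rank only says how many columns are redundant; which particular levels are set to zero depends on the order of fitting, as the table describes.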
413. riables if the second argument 1 setting the number of levels is present it may be For example is equivalent to X1 X2 X3 X4 X5 y X IGDy data dat data dat y mu X1 X2 X3 X4 X5 y mu X. DATE specifies the field has one of the date formats dd mm yy, dd mm ccyy, dd Mon yy, dd Mon ccyy and is to be converted into a Julian day, where dd is a 1 or 2 digit day of the month, mm is a 1 or 2 digit month of the year, Mon is a three letter month name (Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec), yy is the year within the century (00 to 99), cc is the century (19 or 20). The separators and must be present as indicated. The dates are converted to days starting 1 Jan 1900. When the century is not specified, yy of 0 to 32 is taken as 2000 to 2032, and 33 to 99 is taken as 1933 to 1999. DMY specifies the field has one of the date formats dd mm yy or dd mm ccyy and is to be converted into a Julian day. MDY specifies the field has one of the date formats mm dd yy or mm dd ccyy and is to be converted into a Julian day. TIME specifies the field has the time format hh mm ss and is to be converted to seconds past midnight, where hh is hours (0 to 23), mm is minutes (0 to 59) and ss is seconds (0 to 59). The separator must be present. e transformations are described below 5.4.2 Storage of alphabetic factor labels. Space is allocated dynamically for the storage of alphabetic factor labels with a default allocation being 2000
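The two-digit year rule and the time conversion described above can be written out as a short sketch. The function names are hypothetical, and the colon used as the time separator is an assumption: the separator characters did not survive in the text above.

```python
def expand_two_digit_year(yy: int) -> int:
    """Apply the stated rule: 0 to 32 means 2000 to 2032, 33 to 99 means 1933 to 1999."""
    if not 0 <= yy <= 99:
        raise ValueError("yy must be between 0 and 99")
    return 2000 + yy if yy <= 32 else 1900 + yy


def seconds_past_midnight(text: str, separator: str = ":") -> int:
    """Convert an hh<sep>mm<sep>ss string to seconds past midnight.

    The separator is an assumption; pass another character if the data differ.
    """
    hh, mm, ss = (int(part) for part in text.split(separator))
    if not (0 <= hh <= 23 and 0 <= mm <= 59 and 0 <= ss <= 59):
        raise ValueError(f"time out of range: {text}")
    return hh * 3600 + mm * 60 + ss


print(expand_two_digit_year(32), expand_two_digit_year(33))  # 2032 1933
print(seconds_past_midnight("01:02:03"))  # 3723
```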
414. ricem asd skip 1 X syc Y sye sqrt yc sqrt ye Trait r us Trait id variety us Trait id run residual id units us Trait predict variety A portion of the output from this analysis is 8 LogL 343 220 S2 1 00000 262 df 9 LogL 343 220 S2 1 00000 262 df Results from analysis of sqrt yc sqrt ye Akaike Information Criterion 704 44 assuming 9 parameters Bayesian Information Criterion 736 56 Model_Term Sigma Sigma Sigma SE C id units us Trait 264 effects Trait USV 1 1 2 14370 2 14370 4 44 OP Trait US_C 2 1 0 987342 0 987342 2 59 OP Trait USV 2 2 2 34744 2 34744 4 62 OP us Trait id variety 88 effects Trait usy a 1 3 83911 3 83911 3 47 0P Trait USC 2 1 2 33352 2 33352 2 01 OP Trait USV 2 2 1 96136 1 96136 269 OP us Trait id run 132 effects Trait USV 1 1 1 70810 1 70810 2 61 OP Trait US_C 2 1 0 319444 0 319444 0 59 OP Trait USV 2 2 2 54360 2 54360 3 20 OP Covariance Variance Correlation Matrix US Residual 2 144 0 4401 0 9873 2 347 Covariance Variance Correlation Matrix US us Trait id variety 3 839 0 8504 2 334 1 961 Covariance Variance Correlation Matrix US us Trait id run 1 708 0 1533 0 3194 2 544 The resultant REML log likelihood is identical to that of the heterogeneous univariate analysis column b of table 15 8 The estimated variance parameters are given in Table 15 10 The predicted variety means in the pvs file are used in the following section on interpretation of res
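The Covariance/Variance/Correlation matrices printed above show variances on the diagonal, the covariance below it and the correlation above it. The correlations can be recovered from the reported US parameters, as in this minimal sketch (the function name is hypothetical; the numbers are the parameter estimates quoted in the output).

```python
import math


def correlation_from_us(v11: float, c21: float, v22: float) -> float:
    """Correlation implied by a 2 x 2 unstructured (US) variance matrix."""
    return c21 / math.sqrt(v11 * v22)


# Residual, variety and run structures, in the order reported above.
estimates = {
    "residual": (2.14370, 0.987342, 2.34744),
    "variety": (3.83911, 2.33352, 1.96136),
    "run": (1.70810, 0.319444, 2.54360),
}
for name, (v11, c21, v22) in estimates.items():
    print(name, round(correlation_from_us(v11, c21, v22), 4))
```

These agree with the upper-triangle entries 0.4401, 0.8504 and 0.1533 in the printed matrices.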
415. riety plotted against estimate for Control. Trellis plot of trunk circumference for each tree. Fitted cubic smoothing spline for tree 1. Plot of fitted cubic smoothing spline for model 1. Trellis plot of trunk circumference for each tree at sample dates (adjusted for season effects), with fitted profiles across time and confidence intervals. Plot of the residuals from the nonlinear model of Pinheiro and Bates. 1 Introduction. 1.1 What ASReml can do. ASReml (pronounced A S Rem el) is used to fit linear mixed models to quite large data sets with complex variance models. It extends the range of variance models available for the analysis of experimental data. ASReml has application in the analysis of: (un)balanced longitudinal data; repeated measures data (multivariate analysis of variance and spline type models); (un)balanced designed experiments; multi-environment trials and meta analysis; univariate and multivariate animal breeding and genetics data (involving a relationship matrix for correlated effects); regular or irregular spatial data. The engine of ASReml underpins the REML procedure in GENSTAT. An interface for R called ASReml-R is available and runs under the same license as the ASReml program. While these interfaces will be adequate for many analyses, some large problems will need to use ASReml. The ASR
416. rm is confounded with a fixed term, and when there is no information in the data on a particular component. Another common cause is when fitting an animal model and there is excessive sire dam variance, so that heritability from a sire model would exceed 1 and the residual variance under the animal model has approached zero. In this case the data contradicts the assumptions of the animal model. The best solution is to reform the variance model so that the ambiguity is removed, or to fix one of the parameters in the variance model so that the model can be fitted. Only rarely will it be reasonable to specify the AISINGULARITIES qualifier. sets hardcopy graphics file type to bmp. suppresses some of the information written to the .asr file. The data summary and regression coefficient estimates are suppressed. This qualifier should not be used for initial runs of a job until the user has confirmed from the data summary that the data is correctly interpreted by ASReml. Use BRIEF 2 to cause the predicted values to be written to the .asr file instead of the .pvs file. Use BRIEF 1 to get BLUE fixed effect estimates reported in the .asr file. The BRIEF qualifier may be set with the B command line option. is used to calculate the effects reported in the .sln file without calculating any derived quantities such as predicted values or updated variance parameters. For argument values 1 3 ASReml solves for the effects directly while for
417. rmation so that the transformations apply to the same set of variables Y1 Y2 Y3 Y4 Y5 Repeat 5 times incrementing just Ymean 0 DO 5 O 1 Y1 ENDDO 5 the argument is equivalent to Y1 Y2 Y3 Y4 Y5 Ymean 0 Y1 Y2 Y3 Y4 Y5 5 YO Y1 Y2 Y3 Y4 Y5 TARGET Y1 do 5 1 O YO ENDDO Take YO from rest Markers G 12 do D ENDDO Delete records with missing marker values The default arguments 12 1 0 are used The initial target is the first marker 5 5 3 Remarks concerning transformations Note the following variables that are created should be listed after all variables that are read in unless the intention is to overwrite an input field missing values are unaffected by arithmetic operations that is missing values in the current or target column remain missing after the transformation has been performed except in assignment 3 will leave missing values NA and as missing 3 will change missing values to 3 multiple arithmetic operations cannot be expressed in a complex expression but must be given as separate operations that are performed in sequence as they appear for example yield 120 0 0333 would calculate 0 0333 yield 120 59 5 5 Transforming the data e Most transformations only operate on a single field and will not therefore be performed on all variables in a G factor set The only transformations that apply to the whole set are DOM MM and RESCALE ASReml c
418. ro ae Ee hee hee hw a e g ek we ea ee he E a 8 5 The gommand ME eoc et eS OM SEE OES HERS 8 6 Thep edgee ie e ce bee hee WR we ee eee ee he 8 7 Reading in the pedigree file 00000 ee eee eee 8 8 Genetic groups gt cc sc ow oe oe ASR EMG eb ERE S EES e 8 9 Reading a user defined inverse relationship matrix 8 9 1 Genetic groups in GIV matrices 2 2 ee ee 8 9 2 The example continued bk ke ee ee eR Ree eS 8 10 The reduced animal model RAM 22000 0 8 11 Factor effects with large Random Regression models Tabulation of the data and prediction from the model 9 1 IOUNCNIORG 0 oe Ze ee Poe hae Se BEER EES Ee ESS SS 9 2 WIE ce oe ee eae ee ee eee eee eee we 9 3 PPOGIOMON oe t 444 426 4 o 6 44 6 KE od amp He aoi He tae BS ee 9 3 1 Underlying principles s os c cs ce bev debe atiae EES 9 3 2 Predictsyntak o o p ea alee ke es i e a a eh eee e e a 9 33 Freedict Tailte soes oe Bab a ra do eee a ea He a y 9 3 4 Associated factors 2 co ke Pek ee ee eee a Eo eee Ee wd 9 3 5 Complicated weighting with PRESENT 03 6 Examples oei chk ee heb eee hehe bbe ERE REG HS 9 3 7 New R4 Prediction using two way interaction effects 10 Command file Running the job Di Tniroduchon so me BH EE EEE EE EE SOG PE EERE RS RG 10 2 Th gomiand line e ow eae a ok ee E ee RH 10 21 Normal run 6 64 44S 4 EER ES EREEER regda SD HDA 10 2 2 Processing a pin We o s sa sacca
419. rocessing arguments. 10.4.3 Paths and Loops. ASReml was designed to analyse just one model per run. However, the analysis of a data set typically requires many runs, fitting different models to different traits. It is often convenient to have all these runs coded into a single .as file and control the details from the command line (or top job control line) using arguments. The high-level qualifiers CYCLE and DOPATH enable multiple analyses to be defined and run in one execution of ASReml. Table 10.3: High level qualifiers. Columns: qualifier; action. ASSIGN list: New R4. An ASSIGN string qualifier has been added to extend coding options. It is a high level qualifier command which may appear anywhere in the job. Each occurrence of ASSIGN must start on its own input line. The syntax is ASSIGN name string or ASSIGN name <string> and the defined string is substituted into the job where name appears. string is the rest of the line and may include blanks. If < > encloses string, string may extend over several lines, which are concatenated. For example, ASSIGN TVS xfa1 Treat TVS geno is interpreted as xfa1 Treat geno. Restrictions: a maximum of 50 assign strings may be defined; the combined length of all strings is 5000 characters; name may have up to 8 characters but should not begin with a number (see command line arguments); dollar substitution occurs before most other high level act
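The ASSIGN rules listed above (at most 50 strings, 5000 characters in total, names of up to 8 characters that do not begin with a number, and dollar substitution of the name) can be modelled with a small table. This is only a sketch of those stated rules, not ASReml's parser: the class, its method names, the `$name` pattern and the parenthesised form of the example string are all assumptions made for illustration.

```python
import re

MAX_STRINGS = 50          # stated limit on the number of assign strings
MAX_TOTAL_LENGTH = 5000   # stated limit on the combined length of all strings
MAX_NAME_LENGTH = 8       # stated limit on the length of a name


class AssignTable:
    """Hypothetical model of named strings and their dollar substitution."""

    def __init__(self):
        self.strings = {}

    def assign(self, name, value):
        if not 1 <= len(name) <= MAX_NAME_LENGTH or name[0].isdigit():
            raise ValueError(f"invalid name: {name!r}")
        updated = {**self.strings, name: value}
        if len(updated) > MAX_STRINGS:
            raise ValueError("too many assign strings")
        if sum(len(v) for v in updated.values()) > MAX_TOTAL_LENGTH:
            raise ValueError("combined string length exceeded")
        self.strings = updated

    def substitute(self, line):
        """Replace each $name with its string; unknown names are left alone."""
        def replace(match):
            return self.strings.get(match.group(1), match.group(0))
        return re.sub(r"\$([A-Za-z_]\w*)", replace, line)


table = AssignTable()
table.assign("TVS", "xfa1(Treat)")
print(table.substitute("$TVS.geno"))  # xfa1(Treat).geno
```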
420. row ar1 col 4 6 initialized Forming 61 equations 57 dense Initial updates will be shrunk by factor 0 400 Notice Invalid argument unrecognised qualifier or vector space exhausted at varierty Error R structures do not match records in data Error Spatial Layout is not rectangular grid Fault Variance structure does not match data Last line read was STOP ninerr7 variety id pid raw rep nloc yield lat Model specification TERM LEVELS GAMMAS variety 56 mu al repl 4 0 100 3 SECTIONS 242 4 1 STRUCT 22 1 1 5 1 1 10 10 1 1 6 1 1 11 15 factors defined max5000 6 variance parameters max2500 2 special structures Final parameter values 3 6 0 10000 1 0000 0 10000 0 10000 Last line read was STOP Finished 23 Apr 2014 09 17 23 354 Variance structure does not match data. 8. A misspelt factor name in the predict statement. The final error in the job is that a factor name is misspelt in the predict statement. This is a non-fatal error. The .asr file contains the messages: Notice: Invalid argument, unrecognised qualifier or vector space exhausted at varierty. Warning: Extra lines on the end of the input file are ignored from predict varierty. The faulty statement is otherwise ignored by ASReml and no .pvs file is produced. To rectify this statement, correct varierty to variety. 14.5 Information, Warning and Error messages. ASReml prints information, warning and err
421. roximate stratum variance decomposition Stratum Degrees Freedom Variance Component Coefficients idv dam 22 50 1271762 11 5 1 0 Residual Variance 292 44 0 165300 0 0 1 0 Model_Term Gamma Sigma Sigma SE C idv dam IDV_V 27 0 586674 0 969770E 01 2 92 OP idv units 322 effects Residual SCA_V 322 1 000000 0 165300 12 09 OP Wald F statistics Source of Variation NumDF DenDF_con F_inc F_con M P_con 7 mu 1 32 0 9049 48 1099 20 b <.001 3 littersize 1 316 27 99 46 25 B <.001 1 dose 2 29 9 12 15 11 51 A <.001 2 sex 1 299 8 57 96 57 96 A <.001 8 dose sex 2 302 1 0 40 0 40 B 0 673 Notice: The DenDF values are calculated ignoring fixed boundary singular variance parameters using algebraic derivatives. 4 dam 27 effects fitted SLOPES FOR LOG ABS RES on LOG PV for Section 1 zo 3 possible outliers see .res file. 15.3 Unbalanced nested design: Rats. The iterative sequence has converged, and the variance component parameter for dam hasn't changed for the last three iterations. The incremental Wald F statistics indicate that the interaction between dose and sex is not significant. The F_con column helps us to assess the significance of the other terms in the model. It confirms that littersize is significant after the other terms, that dose is significant when adjusted for littersize and sex but ignoring dose.sex, and that sex is significant when adjusted for littersize and dose but ignoring dose.sex. These tests respect marginali
422. rs identifies the factors to be used for classifying the data. Only factors, not covariates, may be nominated, and no more than six may be nominated. ASReml prints the multiway table of means, omitting empty cells, to a file with extension .tab. 9.3 Prediction. 9.3.1 Underlying principles. Our approach to prediction is a generalization of that of Lane and Nelder (1982), who only consider fixed effects models. They form fitted values for all combinations of the explanatory variables in the model, then take marginal means across the explanatory variables not relevant to the current prediction. Our case is more general in that we also consider the case of associated factors (see below) and options for random effects that appear in our mixed models. A formal description can be found in Gilmour et al. (2004) and Welham et al. (2004). Associated factors have a particular one-to-many association such that the levels of one factor (say Region) define groups of the levels of another factor (say Location). In prediction it is necessary to correctly associate the levels of associated factors. Terms in the model may be fitted as fixed or random, and are formed from explanatory variables which are either factors or covariates. For this exposition, we define a fixed factor as an explanatory variable which is a factor and appears in the model in terms that are fixed (it may also appear in random terms), a random factor as an expla
423. rt CORUH and APA to US o ac cc hha eH RRR REG HS 1223 Correlation i i sa hee ee ee Eee eR ee SES 12 2 4 A more detailed example 0 22 0000 123 VPREDICT PIN file processing gt lt o ao ce Se ee eee eee ee 13 Description of output files 131 NOOO ce bh ee ag eenaa e GEE E SE OEE ES 132 Aneampl sa s sss s eaor kw aa et ap E os aa ee a eUa 133 WP Ce les 2 sieca ee Bed RS Ot e Re a ea ey 1331 The asr file oe ee eh oe oe oe BEES eA Ree ee 1332 TOR cE n ers era pe Gh erde eee Eee ee 133 3 The yht fE o oi ecaa a a a a a N a a aa 13 4 Other ASReml output files a ey hw wd ww a 1341 The aor ile orek seges deei e dies EERO YESS 1342 The sasl iile ene ieder he CEE SE g i ee eS p 1343 The dpr tle e ca adera ea ea a Ba e SEAS a 1344 The msy tile 5 6 soos Gao Be ee eee ewe ea eS 228 134 5 The BURT e sac doreir ay oe PY ORS eR OEE EKG de 229 1460 Thep file oero Be eee Pw EEE ES eh Ge ek ee hs 230 1347 THe restile oo 2s ceded wRERE Oe RSE eERESD EASES YS 230 134 8 The rey file o o psc eee eh ee eh Oe ee EAS 237 BAS The stabile eo air ati a ee eh eee ai i ee ai i eed 238 BAW CS e e bed ae b OS eH pia E ee ee A 238 134 11 The vrb file o ae eh ds podia rrara Yaa ra Aoo arada 239 13412 The WS ee ca samea edee e a REE EE ae ED 240 13 5 ASReml output objects and where to find them 240 14 Error messages 244 MI lntrodichian o se e cerra ee a E ae wee RE pe e RE SR eee EY 244 142 Common problems ec ss ssa se Ki
424. ructure when kis w 1 initial values for US CHOL and ANTE structures are given in the form of a US matrix which is specified lower triangle row wise viz On Fn Oa 0 0 0 31 32 33 that is initial values are given in the order 1 0 2 0 3 0 the US model is associated with several special features of ASReml There is an process to update its values by EM see EMFLAG rather than Al when its Al updates make the matrix non positive definite Also when used in the R structure for multivariate data ASReml automatically recognises patterns of missing values in the responses see Chapter 8 142 7 11 Variance model functions available in ASReml 7 11 4 Notes on Mat rn The Mat rn class of isotropic covariance models is now described ASReml uses an extended Mat rn class which accomodates geometric anisotropy and a choice of metrics for random fields observed in two dimensions This extension described in detail in Haskard 2006 is given by where h hy hy is the spatial separation vector 6 a governs geometric anisotropy A specifies the choice of metric and v are the parameters of the Mat rn correlation function The function is puldiov 2r 2 x S 7 1 where gt 0 is a range parameter v gt 0 is a smoothness parameter T is the gamma function K is the modified Bessel function of the third kind of order v Abramowitz and Stegun 1965 section 9 6 and d is the distanc
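The lower-triangle, row-wise ordering used for US, CHOL and ANTE initial values can be illustrated by expanding such a list into the full symmetric matrix. This is a sketch only: the function name is hypothetical and the values in the example are arbitrary.

```python
def us_from_lower_triangle(values):
    """Build a symmetric matrix from its lower triangle listed row by row.

    For a 3 x 3 matrix the order is s11, s21, s22, s31, s32, s33.
    """
    n = 0
    while n * (n + 1) // 2 < len(values):
        n += 1
    if n * (n + 1) // 2 != len(values):
        raise ValueError("number of values is not n(n+1)/2 for any n")
    matrix = [[0.0] * n for _ in range(n)]
    remaining = iter(values)
    for i in range(n):
        for j in range(i + 1):
            matrix[i][j] = matrix[j][i] = float(next(remaining))
    return matrix


for row in us_from_lower_triangle([1.0, 0.2, 2.0, 0.3, 0.4, 3.0]):
    print(row)
```

Reading the list this way makes it easy to check that the starting matrix is positive definite before it is passed to the job.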
425. ructure for a term has changed, ASReml will take results from some structures as supplying starting values for other structures. The transitions recognised are CORUH to FA1 and XFA1, CORGH to US, DIAG to CORUH, DIAG to FA1, DIAG to XFA1, FAi to CORGH, FAi to FAi+1, FAi to US, XFAi to XFAi+1, XFAi to US, US to XFA1 XFA2 XFA3. Users may wish to keep output from a series of runs. This can be done by using RENAME 1 ARG runnumber on the first line of the command file, or alternatively R1 basename runnumber on the command line. This ensures that the output from the various parts has runnumber appended to the base filename. If an .rsv file does not exist for the particular runnumber you are running, ASReml will retrieve starting values from the most recent .rsv file formed by that job. You can of course copy an .rsv file, building the new runnumber into its name, so that ASReml uses that particular set of values. The .asr file keeps track of which .rsv files have been formed. If the user wishes to use different models with different runs, then using DOPART 1 and specifying the different models in different parts will achieve this aim. 7.10 Default variance structures in ASReml. There are default variance structures in ASReml that allow the linear mixed model to be specified more succinctly. IDV is the default variance structure for random model terms and for the residual error terms. For
426. running the job with CONTINUE 3 You may not change values in the first 3 fields or RP fields where RP_GN is negative HHHH Fields are GN Term Type PSpace Initial_value RP_GN RP_scale 4 Variance 1 V P 1 00000000 4 1 5 ari row ar1i column ariv row _1 R P 0 65547976 5 si 6 ari row ar1 column ari column _1i R P 0 43750453 s 6 i Valid values for Pspace are F P U and maybe Z RP_GN and RP_scale define simple parameter relationships RP_GN links related parameters by the first GN number RP_scale must be 1 0 for the first parameter in the set and otherwise specifies the size relative to the first parameter Multivalue RP_scale parameters may not be altered here HH H H Notice that this file is overwritten if not being read 13 4 5 The pvc file The pvc file contains functions of the variance components produced by running a pin file on the results of an ASReml run as described in Chapter 12 The pin and pvc files for a half sib analysis of the Coopworth data are presented in Section 15 10 229 13 4 Other ASReml output files 13 4 6 The pvs file The pvs file contains the predicted values formed when a predict statement is included in the job Below is an edited version of nin89a pvs See Section 3 6 for the pvs file for the simple RCB analysis of the NIN data considered in that chapter NIN Alliance Trial 1989 03 Feb 2014 06 23 03 title line nin89a Ecode is E
427. s ASReml will also include the denominator degrees of freedom (DenDF, denoted by $\nu_2$; Kenward and Roger, 1997) and a probability value if these can be computed. They will be for the conditional Wald F statistic if it is reported. The DDF i (see page 67) qualifier can be used to suppress the DenDF calculation (DDF -1) or request a particular algorithmic method: DDF 1 for numerical derivatives, DDF 2 for algebraic derivatives. The value in the probability column (either P_inc or P_con) is computed from an $F_{\nu_1,\nu_2}$ reference distribution. An approximation is used for computational convenience when calculating the DenDF for conditional F statistics using numerical derivatives. The DenDF reported then relates to a maximal conditional incremental model (MCIM) which, depending on the model order, may not always coincide with the maximal conditional model (MCM) under which the conditional F statistic is calculated. The MCIM model omits terms fitted after any terms ignored for the conditional test (I after in marginality pattern). In the example above, MCIM ignores variety.sow when calculating DenDF for the test of water, and ignores water.sow when calculating DenDF for the test of variety. 2.5 Inference: Fixed effects. When DenDF is not available, it is often possible, though anti-conservative, to use the residual degrees of freedom for the denominator. Kenward and Roger (1997) pursued the concept of construction of Wald-type test statistics through a
428. s use the I option instead. Otherwise you will have to convert a factor with alphanumeric labels to numeric sequential codes external to ASReml so that an A option can be avoided. The data file may need to be rewritten with some factors re-coded as sequential integers. This is an internal limit. Reduce the number of response variables. Response variables may be grouped using the G factor definition qualifier so that more than 20 actual variables can be analysed. This message occurs when there is an error forming the inverse of a variance structure. The probable cause is a non positive definite initial variance structure (US, CHOL and ANTE models). It may also occur if an identity by unstructured (ID US) error variance model is not specified in a multivariate analysis (including ASMV), see Chapter 8. If the failure is on the first iteration, the problem is with the starting values. If on a subsequent iteration, the updates have caused the problem. You can specify GP to force the matrix positive definite and try reducing the updates by using the STEP qualifier. Otherwise, you could try fitting an alternative parameterisation. Generally refers to a problem setting up the mixed model equations. Most commonly it is caused by a non positive definite matrix. Table 14.3: Alphabetical list of error messages and probable cause(s)/remedies. error message; probable caus
429. s zero MinNonO Mean MaxNonO StndDevn 1 variety 56 0 0 1 28 5000 56 2 id 0 O 1 000 28 50 56 00 16 20 3 pid 0 Q 1101 2628 4156 1121 4 raw 0 Q 21 00 510 5 840 0 149 0 5 repl 0 0 1 2 5000 4 6 nloc 0 O 4 000 4 000 4 000 0 000 7 yield Variate 0 O 1 050 25 53 42 00 7 450 34 3 6 Description of output files 8 lat 0 O 4 300 ah 22 47 30 12 90 9 long 0 1 200 14 08 26 40 7 698 10 row 22 0 0 1 11 7321 22 11 column 11 0 0 1 6 3304 11 12 mu 1 Forming 61 equations 57 dense Initial updates will be shrunk by factor 0 400 Notice 1 singularities detected in design matrix 1 LogL 454 807 S2 50 329 168 df 0 1000 2 LogL 454 635 S2 50 073 168 df 0 1219 3 LogL 454 513 S2 49 818 168 df 0 1537 4 LogL 454 471 S2 49 622 168 df 0 1899 5 LogL 454 469 S2 49 584 168 df 0 1989 6 LogL 454 469 S2 49 582 168 df 0 1993 Final parameter values 0 1993 Results from analysis of yield Akaike Information Criterion 912 94 assuming 2 parameters Bayesian Information Criterion 919 19 Approximate stratum variance decomposition Stratum Degrees Freedom Variance Component Coefficients idv rep1 3 00 603 100 56 0 1 0 Residual Variance 165 00 49 5824 0 0 10 Model_Term Gamma Sigma Sigma SE C idv repl IDV_V 4 0 199323 9 88291 1 12 O P parameter idv units 224 effects estimates Residual SCA_V 224 1 000000 49 5824 9 08 OF Wald F statistics Source of Variation NumDF DenDF F ing P inc testing 12 mu 1 3 0 242 05 lt 001 fixed
430. s formally appears in this hyper-table regardless of whether it is fitted as fixed or random. Note that variables evaluated at only one value (for example, a covariate at its mean value) can be formally introduced as part of the classify or averaging set. (c) Determine which terms from the linear mixed model are to be used when predicting the cells in the multiway hyper-table in order to obtain either conditional or marginal predictions. That is, you may choose to ignore some random terms in addition to those ignored because they involve variables in the ignored set. All terms involving associated factors are by default included. (d) Choose the weights to be used when averaging cells in the hyper-table to produce the multiway table to be reported. The multiway table may require partial and/or sequential averaging over associated factors. Operationally, ASReml does the averaging in the prediction design matrix rather than actually predicting the cells of the hyper-table and then averaging them. The main difference in this prediction process compared to that described by Lane and Nelder (1982) is the choice of whether to include or exclude model terms when forming predictions. In linear models, since all terms are fixed, factors not in the classify set must be in the averaging set and all terms must contribute to the predictions. 9.3.2 Predict syntax. The first step is to specify the classify set of NIN alliance
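The remark that averaging is done in the prediction design matrix, rather than on predicted cells, rests on the linearity of prediction. The following is a minimal sketch of that equivalence only; it is not ASReml's implementation, and the column layout, estimates and weights are hypothetical values chosen for illustration.

```python
def predict(row, estimates):
    """Linear prediction for one design-matrix row."""
    return sum(x * b for x, b in zip(row, estimates))


def average_rows(rows, weights):
    """Weighted average of design-matrix rows, column by column."""
    total = sum(weights)
    n_cols = len(rows[0])
    return [
        sum(w * row[j] for w, row in zip(weights, rows)) / total
        for j in range(n_cols)
    ]


# Hypothetical columns: mu, variety2, site2, site3.
estimates = [20.0, 3.0, -1.5, 2.5]

# Hyper-table cells for variety 2 at each of three sites (site is averaged over).
cells = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
]
weights = [1.0, 1.0, 1.0]  # equal weights, assumed for this sketch

# Route 1: average the design rows first, then predict once.
via_design = predict(average_rows(cells, weights), estimates)

# Route 2: predict every cell, then average the predictions.
via_cells = sum(w * predict(r, estimates) for w, r in zip(weights, cells)) / sum(weights)

print(via_design, via_cells)
```

Both routes give the same value up to rounding, which is why the cells of the hyper-table never need to be predicted individually.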
431. s not specified the value of v is 1. is used to join lines in plots (see X). Table 5.4: List of occasionally used job control qualifiers (qualifier, action). !MBF mbf(v,n) f !FACTOR !FIELD s !KEY k !NOKEY !RENAME t !RFIELD r !SKIP k !SPARSE, specified on a separate line after the datafile line, predefines the model term mbf(v,n) as a set of n covariates indexed by the data values in variable v. MBF stands for My Basis Function and uses the same mechanism as the leg, pol and spl model functions but with covariates supplied by the user. It is used for reading in specialized design matrices indexed by a factor in the data, including genetic marker covariables. By default the file f should contain 1+n fields, where the first field (the key field) contains the values which are in the data variable or at which prediction is required, and the remaining n fields define the corresponding covariate values. If n is omitted, all fields after the key field are taken unless FACTOR is specified, for which n is 1 and the covariate values are treated as coding for a multilevel factor. Set n to 1 to read just one field from the data file. Also note that the file may be a binary file, e.g. formed in a previous run using SAVE. RENAME changes the name of the term from mbf to the new name t. This is necessary when several mbf terms are being defined which would otherwise
432. s sufficient to identify the term
• interactions can involve model functions.
6.5.2 Expansions
• is ignored except at the end of the line, where it indicates the model is continued on the next line
• makes sure the following term is defined but does not include it in the model
• * indicates factorial expansion (up to 5 way): a*b is expanded to a b a.b; a*b*c*d is expanded to a b c d a.b a.c a.d b.c b.d c.d a.b.c a.b.d a.c.d b.c.d a.b.c.d
• / indicates nested expansion: a/b is expanded to a a.b
• a.(b c d) e is expanded to a.b a.c a.d e. This syntax is detected by the string .( and the closing parenthesis must occur on the same line and before any comma indicating continuation. Any number of terms may be enclosed. Each may have prepended to suppress it from the model.
6.5.3 Conditional factors
A conditional factor is a factor that is present only when another factor has a particular level.
• individual components are specified using the at(f,n) function (see Table 6.2); for example, at(site,1).row will fit row as a factor only for site 1
• a complete set of conditional terms is specified by omitting the level specification in the at(f) function, provided the correct number of levels of f is specified in the field definitions
• otherwise a list of levels may be specified (see Table 6.2)
• where variable f is coded with alphanumeric level names, the level name m
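The factorial and nested expansions listed under Expansions can be reproduced with a few lines of code. The sketch below is illustrative only and is not ASReml's parser. It assumes the conventional notation in which `.` joins the factors of an interaction, and the function names are hypothetical. The extension of nesting to more than two factors is also an assumption, since the text gives only the two-factor case.

```python
from itertools import combinations


def factorial_expand(factors, max_order=5):
    """Expand a factorial term into main effects and interactions,
    lowest order first, keeping the factors in their written order."""
    if len(factors) > max_order:
        raise ValueError("factorial expansion is limited to 5-way terms")
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(".".join(combo))
    return terms


def nested_expand(factors):
    """Expand a nested term: each factor is nested within those before it."""
    return [".".join(factors[: i + 1]) for i in range(len(factors))]


print(" ".join(factorial_expand(["a", "b"])))
print(" ".join(factorial_expand(["a", "b", "c", "d"])))
print(" ".join(nested_expand(["a", "b"])))
```

The four-factor case yields 15 terms (4 main effects, 6 two-way, 4 three-way and 1 four-way interaction), in the same order as the expansion quoted in the text.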
433. s to plots in field plan order with replicates 1 and 3 in rrazics and replicates 2 and 4 in BOLD 25 Table 3 1 Trial layout and allocation of varieties to plots in the NIN field trial column row 1 2 3 4 5 6 7 8 9 10 11 1 NE83407 BUCKSKIN NE87612 VONA NE87512 NES87408 CODY BUCKSKIN NE87612 KS831374 2 CENTURA NE86527 NE87613 NE87463 NE83407 NE83407 NE87612 NE83406 BUCKSKIN NE86482 3 SCOUT66 NE86582 NE87615 NE86507 NE87403 NORKAN NE87457 NE87409 NE85556 NE85623 4 COLT NE86606 NES87619 BUCKSKIN NE87457 REDLAND NE84557 NE87499 BRULE NE86527 5 NE83498 NE86607 NE87627 ROUGHRIDER NE83406 KS8313874 NE838T12 CENTURA NE86507 NE87451 6 NE84557 ROUGHRIDER NE86527 COLT COLT NE86507 NE83432 ROUGHRIDER NE87409 7 NES88482 VONA CENTURA SCOUT66 NE87522 NE86527 TAM200 NE87512 VONA GAGE 8 NE85556 SIOUXLAND NE85623 NE86509 NORKAN VONA NE87613 ROUGHRIDER NE83404 NE83407 9 NE85623 GAGE CODY NE86606 NE87615 TAM107 ARAPAHOE NE83498 CODY NE87615 10 CENTURAK78 NE88T12 NE86582 NE84557 NE85556 CENTURAK78 SCOUT66 NE87463 ARAPAHOE 11 NORKAN NES86T666 NE87408 KS831374 TAM200 NE87627 NE87403 NE86T666 NE86582 CHEYENNE 12 KS8313874 NES87403 NE87451 GAGE LANCOTA NE86T666 NE85623 NE87403 NE87499 REDLAND 13 TAM200 NE87408 NE83432 NE87619 NE86503 NE87615 NE86509 NE87512 NORKAN NE83432 14 B NES86482 NES87409 CENTURAK78 NE87499 NE86482 NE86501 NE85556 NE87446 SCOUT66 NE87619 15 HOMESTEAD NES87446 NE83T12 CHEYENNE BRULE NE87522 HOMESTEAD CENTURA NE8751
434. se created or read but not labelled (intermediate calculations not required for subsequent analysis). When listing variables in the field definitions, list those read from the data file first. After them, list and define the labelled variables that are to be created. The number of variables read can be explicitly set using the READ qualifier described in Table 5.5. Otherwise, if the first transformation on a field overwrites its contents (for instance using ), ASReml recognises that the field does not need to be read in unless a subsequent field does need to be read. For example, A B C A B reads two fields (A and B) and constructs C as A B. All three are available for analysis. However, A B C A B D E D B reads four fields (A, B, C and D) because the fourth field is not obviously created and must therefore be read, even though the third field (C) is overwritten. The fifth field (E) is not read but just created. Variables that have an explicit label may be referenced by their explicit label or their internal label. Therefore, to avoid confusion, do not use explicit labels of the form Vi (where i is a number) for variables to be referred to in a transformation. Vi always refers to field/variable i in a transformation statement. Variables that are not initialized from the data file are initialized to missing value for the first record and otherwise to the values from the preceding re
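The rule for how many fields are read can be stated compactly: every field up to and including the last one that is not obviously created must be read. The sketch below applies that rule to the two examples in the text. It is an illustration of the rule as described here, not ASReml code, and the `(label, is_created)` representation is a hypothetical simplification in which `is_created` means the first transformation on the field overwrites its contents.

```python
def fields_read(definitions):
    """Return how many leading fields are read from the data file.

    definitions: list of (label, is_created) pairs in field-definition order.
    A created field is still read if any later field has to be read.
    """
    last_read = 0
    for position, (_label, is_created) in enumerate(definitions, start=1):
        if not is_created:
            last_read = position
    return last_read


# First example: A and B are read, C is constructed from them.
example_1 = [("A", False), ("B", False), ("C", True)]

# Second example: D must be read, so the overwritten field C is read as well;
# E is only created.
example_2 = [("A", False), ("B", False), ("C", True), ("D", False), ("E", True)]

print(fields_read(example_1), fields_read(example_2))
```

An explicit READ qualifier would override this inference; the sketch covers only the default behaviour.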
435. ships between animals. This is an alternate method of estimating additive genetic variance for these data. The data file has been modified by adding 10000 to the dam ID (now 10001 to 13561) so that the lamb, sire and dam IDs are distinct. They appear as the first genetic relationships are available for this data so the data file doubles as the pedigree file. The multi-trait additive genetic variance matrix $\Sigma_A$ of the animals (sires, dams and lambs) is given by $\mathrm{var}(u_A) = \Sigma_A \otimes A$, where $A$ is the genetic relationship matrix and $u_A$ are the trait BLUPs ordered animals within traits. There are a total of 10696 (= 92 + 3561 + 7043) animals in the pedigree. Multivariate analysis involving several strata (here animal (direct additive genetic), dam (maternal) and litter) typically involves several runs. The ASReml input file presented below has five parts, which show the use of FA structures to get initial values for estimation of unstructured matrices and their use when estimated unstructured matrices are not positive definite (as is the case with the tag matrix here), but omits earlier runs involved with linear model selection and obtaining initial values. This model is not equivalent to the sire/dam/litter model with respect to the animal/litter components for gfw, fd and fat. !RENAME 1 ARG 1 CHANGE 1 TO 2 3 4 OR 5 FOR OTHER PATHS Multivariate Animal model DOPART 1 tag P sire 92 lII da
436. sing values as zero in covariates is usually only acceptable if the covariate is centred (has mean of zero). Design factors: Where the factor level is zero (or missing) and the MVINCLUDE qualifier is specified, no level is assigned to the factor for that record. This effectively defines an extra level (class) in the factor, which becomes a reference level. 6.10 Some technical details about model fitting in ASReml. 6.10.1 Sparse versus dense. ASReml partitions the terms in the linear model into two parts: a dense set and a sparse set. The partition is at the r point unless explicitly set with the DENSE data line qualifier or mv is included before r (see Table 5.5). The special term mv is always included in sparse. Thus random and sparse terms are estimated using sparse matrix methods, which result in faster processing. The inverse coefficient matrix is fully formed for the terms in the dense set. The inverse coefficient matrix is only partially formed for terms in the sparse set. Typically the sparse set is large, and sparse storage results in savings in memory and computing. A consequence is that the variance matrix for estimates is only available for equations in the dense portion. 6.10.2 Ordering of terms in ASReml. The order in which estimates for the fixed and random effects in the linear mixed model are reported will usually differ from the order in which the model terms are specified. Solutions to the mixed model equations are obtained using the methods
437. skip 1 yield mu variety r idv repl If mv residual idv units emphasise that it is always included in the sparse equations. If mv is listed in the fixed effects section, it and any following fixed effect terms are processed as sparse (see Section 6.10.1). Formally, mv creates a factor with a covariate for each missing value. The covariates are coded 0 except in the record where the particular missing value occurs, where it is coded 1. The action when mv is omitted from the model depends on whether a univariate or multivariate analysis is being performed. For a univariate analysis, ASReml discards records which have a missing response. In multivariate analyses, all records are retained and the R matrix is modified to reflect the missing value pattern. 6.9.2 Missing values in the explanatory variables. ASReml will abort the analysis if it finds missing values in the design matrix which are not directly associated with missing values for the response (or logically excluded from the model by being in combination with an at term which evaluates to zero), unless MVINCLUDE or MVREMOVE is specified (see Section 5.8). MVINCLUDE causes the missing value to be treated as a zero. MVREMOVE causes ASReml to discard the whole record. Records with missing values in particular fields can be explicitly dropped using the DV transformation (Table al). Covariates: Treating mis
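The coding of the mv term described above (one covariate per missing response, coded 0 everywhere except in the record where that missing value occurs) can be sketched as follows. This only illustrates the coding as stated in the text; the function name and the use of `None` to mark a missing response are assumptions made for the example.

```python
def mv_covariates(response):
    """Build one indicator covariate per missing response value.

    response: list of numbers, with None marking a missing value.
    Returns (missing_rows, columns), where columns[k][i] is 1 only when
    record i is the record holding the k-th missing value.
    """
    missing_rows = [i for i, y in enumerate(response) if y is None]
    columns = [
        [1 if i == row else 0 for i in range(len(response))]
        for row in missing_rows
    ]
    return missing_rows, columns


yield_values = [5.0, None, 7.2, None]  # hypothetical data, two missing responses
rows, covariates = mv_covariates(yield_values)
print(rows)
for column in covariates:
    print(column)
```

Fitting one such covariate per missing record lets every record stay in the analysis, with each missing response absorbed by its own effect.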
438. so the output files do not reflect the latest program output. In this case, use the Unix script screen.log command before running ASReml with the DEBUG qualifier but without the LOGFILE qualifier, to capture all the debugging information in the file screen.log. The debug information pertains particularly to the first iteration and includes timing information reported in lines beginning >>>>>>>>>>>>. These lines also mark progress through the iteration. 13.4.3 The .dpr file. The .dpr file contains the data and residuals from the analysis in double precision binary form. The file is produced when the RES qualifier (Table 4.3) is invoked. The file could be renamed with filename extension .dbl and used for input to another run of ASReml. Alternatively, it could be used by another Fortran program or package. Factors will have level codes if they were coded using A or I. All the data from the run, plus an extra column of residuals, is in the file. Records omitted from the analysis are omitted from the file. 13.4.4 The .msv file. The .msv file contains the variance parameters from the most recent iteration of a model in a form that is relatively easy to edit if the values need to be reset. The file is read when MSV or CONTINUE 3 is specified. This is nin89a.msv. This .msv file is a mechanism for resetting initial parameter values by changing the values here and re
439. sociate the variance structure with the appropriate component of a model term, a brief description, the algebraic form of the model and the number of associated variance structure parameters. The models span correlation base models (diagonal elements equal to 1 and correlations on the off-diagonals), the extension of these to variance models (variances on the diagonals and covariances on the off-diagonals), additional models that are parameterized as variance matrices rather than as correlation matrices, and some special cases where the covariance structure is known except for the scale. See Sections 7.2 and 7.10 for important points to note in defining variance structures in ASReml. 7.11.1 Forming variance models from correlation models. The variance function models presented under correlation models in Table 7.6 (id to matk) are used to specify the correlation models for the corresponding variance structures. The corresponding homogeneous and heterogeneous variance models are specified by appending v and h to the variance model function names respectively, and appending the corresponding variance parameters to the corresponding list of parameters. This convention holds for most models. It does not make sense to append v or h to the variance model function names for the heterogeneous variance models (from diag to xfak). In summary: • to specify a correlation model, provide the varian
440. sparsely fitted fixed HYS factor. The number of fixed effects (degrees of freedom) associated with GROUPS is taken as the declared number less twice the number of constraints applied. This assumes all groups are represented in the data, and that degrees of freedom associated with group constraints will be fitted elsewhere in the model. Each cross is assumed to be selfed several times to stabilize as an inbred line, as is usual for cereals such as wheat, before being evaluated or crossed with another line. Since inbreeding is usually associated with strong selection, it is not obvious that a pedigree assumption of covariance of 0.5 between parent and offspring actually holds. Do not use the INBRED qualifier with the MGS or SELF qualifiers. indicates the identifiers are numeric integer with less than 16 digits. The default is integer values with less than 9 digits. The alternative is alphanumeric identifiers with up to 255 characters, indicated by ALPHA. forces ASReml to make the A-inverse rather than trying to retrieve it from the ainverse.bin file. The default method for forming A is based on the algorithm of Meuwissen and Luo (1992). indicates that the third identity is the sire of the dam rather than the dam. The original routine for calculating A in ASReml was based on Quaas (1976). tells ASReml to ignore repeat occurrences of lines in the pedigree file. Warning: Use of this option will avoid the check that animals occur in generation
441. specifying separable G structures in ASReml.
2.1.13 Range of variance models for R and G structures
A range of models is available for the components of both R and G structures. They include correlation (C) models, that is, where the diagonals are 1, or covariance (V) models, and are discussed in detail in Chapter 7. Among the range of correlation models are
• identity (that is, independent and identically distributed with variance 1)
• autoregressive (order 1 or 2)
• moving average (order 1 or 2)
• ARMA(1,1)
• uniform
• banded
• general correlation.
Among the range of covariance models are
• scaled identity (that is, independent and identically distributed with homogeneous variances)
• diagonal (that is, independent with heterogeneous variances)
• antedependence
• unstructured
• factor analytic.
There is also the facility to define models based on relationship matrices, including additive relationship matrices generated by pedigrees and using user specified variance matrices.
2.1.14 Combining variance models in R and G structures
The combination of variance models in separable G and R structures is a difficult and important concept. This is discussed in detail in Chapter 7.
2.2 Estimation
Consider the sigma parameterization of Section 2.1.1. Estimation involves two processes that are closely linked. They are performed within the engine of ASReml. One process involves estimation
442. splines Oranges analysis. The individual curves for each tree are not convincingly modelled by a logistic function. Figure 15.16 presents a plot of the residuals from the nonlinear model fitted on p340 of Pinheiro and Bates (2000). The distinct pattern in the residuals, which is the same for all trees, is taken up in our analysis by the season term. [Figure 15.16: Plot of the residuals from the nonlinear model of Pinheiro and Bates (residual versus age)] 15.10 Multivariate animal genetics data: Sheep. The analysis of incomplete or unbalanced multivariate data often presents computational difficulties. These difficulties are exacerbated by either the number of random effects in the linear mixed model, the number of traits, the complexity of the variance models being fitted to the random effects or the size of the problem. In this section we illustrate two approaches to the analysis of a complex set of incomplete multivariate data. Much of the difficulty in conducting such analyses in ASReml centres on obtaining good starting values. Derivative based algorithms such as the AI algorithm can be unreliable when fitting complex variance structures unless good starting values are available. Poor starting values may result in divergence of the algorithm or slow convergence. A particular problem with fitting unstructured variance models
443. stStat IDV_V 4 0.642752E-01 0.328704E-02 0.98 0 P
idv Setstat IDV_V 10 0.233416 0.119369E-01 1.35 0 P
idv Regulator Set IDV_V 80 0.601817 0.307770E-01 3.64 0 P
idv units 256 effects
Residual SCA_V 256 1.000000 0.511400E-01 9.72 0 P
Table 15.3: REML log-likelihood ratio for the variance components in the voltage data
terms, REML log-likelihood, 2 x difference, P-value
setstat, 200.31, 5.864, .0077
setstat.regulator, 184.15, 38.19, .0000
teststat, 199.71, 7.064, .0039
15.5 Balanced repeated measures: Height
The data for this example is taken from the GENSTAT manual. It consists of a total of 5 measurements of height (cm) taken on 14 plants. The 14 plants were either diseased or healthy and were arranged in a glasshouse in a completely random design. The heights were measured 1, 3, 5, 7 and 10 weeks after the plants were placed in the glasshouse. There were 7 plants in each treatment. The data are depicted in Figure 15.3, obtained by the qualifier line !Y y1 !G tmt !JOIN in the following multivariate ASReml job. [Figure 15.3: Trellis plot of the height for each of 14 plants (Y-axis 21.0 to 130.5, X-axis 0.5 to 5.5)] In the following we illustrate how various repeated measures analyses can be conducted in ASReml. For these analyses it is convenient to arrange the data in a multivariate form, with 7 fields representing the p
444. stat term was 203.242, the same as the REML log-likelihood for the previous model. Table 15.3 presents a summary of the REML log-likelihood ratio for the remaining terms in the model. The summary of the ASReml output for the current model is given below. The column labelled Sigma/SE is printed by ASReml to give a guide as to the significance of the variance component for each term in the model. The statistic is simply the REML estimate of the variance component divided by the square root of the diagonal element (for each component) of the inverse of the average information matrix. The diagonal elements of the expected (not the average) information matrix are the asymptotic variances of the REML estimates of the variance parameters. These Sigma/SE statistics cannot be used to test the null hypothesis that the variance component is zero. If we had used this crude measure, then the conclusions would have been inconsistent with the conclusions obtained from the REML log-likelihood ratio test. 15.4 Source of variability in unbalanced data: Volts. [Figure 15.2: Residual plot for the voltage data (residuals versus fitted values), voltage example 5.3.6 from the GENSTAT REML manual] (see Table 15.3) Model_Term Gamma Sigma Sigma/SE C idv Te
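The entries of Table 15.3 can be checked by hand. The sketch below recomputes twice the difference in REML log-likelihood for each dropped term. It then halves the chi-square (1 df) tail probability, the usual adjustment when the null value of a variance component lies on the boundary of the parameter space. That halving is an assumption about how the tabulated P-values were obtained; the text does not state it, although the results agree with the table to the precision shown. The function name is hypothetical.

```python
from math import erfc, sqrt

FULL_LOGL = 203.242  # REML log-likelihood of the current model (from the text)

# REML log-likelihoods after dropping each term (Table 15.3)
reduced_logl = {
    "setstat": 200.31,
    "setstat.regulator": 184.15,
    "teststat": 199.71,
}


def boundary_lrt(full, reduced):
    """Return (2 x log-likelihood difference, halved chi-square(1) tail probability)."""
    statistic = 2.0 * (full - reduced)
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    tail = erfc(sqrt(statistic / 2.0))
    return statistic, 0.5 * tail


for term, logl in reduced_logl.items():
    statistic, p_value = boundary_lrt(FULL_LOGL, logl)
    print(f"{term:18s} {statistic:7.3f} {p_value:.4f}")
```

The statistic for setstat.regulator comes out as 38.184 rather than the tabulated 38.19, presumably because the table was computed from unrounded log-likelihoods.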
445. stics (see OUTLIER). If a job is being run a large number of times, significant gains in processing time can sometimes be made by reorganising the data so that reading of irrelevant data is avoided, using binary data files, use of CONTINUE to reduce the number of iterations, and avoiding unnecessary output (see SLNFORM, YHTFORM and NOGRAPHICS). 10.5.3 Timing processes. The elapsed time for the whole job can be calculated approximately by comparing the start time with the finish time. Timings of particular processes can be obtained by using the DEBUG LOGFILE qualifiers on the first line of the job. This requests that the .asl file be created and hold some intermediate results, especially from data setup and the first iteration. Included in that information is timing information on each phase of the job. 11 Command file: Merging data files. 11.1 Introduction. The MERGE directive described in this chapter is designed to combine information from two files into a third file, with a range of qualifiers to accommodate various scenarios. It was developed with assistance from Chandrapal Kailasanathan to replace the MERGE qualifier (see page 64), which had very limited functionality. The MERGE directive is placed BEFORE the data filename lines. It is an independent part of the ASReml job in the sense that none of the files are necessarily involved in the subsequent analyses performed by the job, and there may be multiple MERGE directives. Indeed th
446. stimable (aliased) cell(s) may be omitted because ASReml checks that predictions are of estimable functions in the sense defined by Searle (1971, p160) and are invariant to any constraint method used. Immediate things to check include whether every level of every fixed factor in the averaging set is present, and whether all cells in every fixed interaction are filled. For example, in the previous example, no variety predictions would be obtained if site was declared as having 4 levels but only three were present in the data. The message is also likely if any fixed model terms are IGNOREd. The TABULATE command may be used to see which treatment combinations occur and in what order. More formally, there are often situations in which the fixed effects design matrix X is not of full column rank. This aliasing has three main causes:
• linear dependencies among the model terms due to over-parameterisation of the model
• no data present for some factor combinations, so that the corresponding effects cannot be estimated
• linear dependencies due to other, usually unexpected, structure in the data.
The first type of aliasing is imposed by the parameterisation chosen and can be determined from the model. The second type of aliasing can be detected when setting up the design matrix for parameter estimation (which may require revision of imposed constraints). All types are detected in ASReml during the absorption process used to obtain the predicted val
447. sv Notice LogL values are reported relative to a base of 20000 000 Note XFA model lower loadings initially held fixed Notice 29764 singularities detected in design matrix 1 LogL 1558 44 S2 1 00000 18085 df i 1 components restrained 2 LogL 1541 77 S2 1 00000 18085 df 8 components restrained 3 LoghL 1538 27 S2 1 00000 18085 df 1 components restrained 4 LogL 1534 53 S2 1 00000 18085 df 1 components restrained 5 LogL 1532 53 S2 1 00000 18085 df 1 components restrained 6 LogL 1531 90 S2 1 00000 18085 df 1 components restrained Note XFA model fitted with rotation 7 LogL 1531 73 S2 1 00000 18085 df 1 components restrained 8 LogL 1531 66 S2 1 00000 18085 df 9 LogL 1531 64 S2 1 00000 18085 df 10 LogL 1531 64 S2 1 00000 18085 df Results from analysis of wwt ywt gfw fdm fat Akaike Information Criterion 43151 28 assuming 44 parameters Bayesian Information Criterion 43494 60 Model_Term Sigma Sigma Sigma SE C id units us Trait 35200 effects Trait USV 1 1 8 73848 8 73848 30 29 OP Trait usc 2 1 7 28418 7 28418 20 19 0 P Trait USV 2 2 17 7519 17 7519 26 87 0 P Trait US_C 3 1 0 247701 0 247701 5 87 OP Trait US_C 3 2 0 705206 0 705206 14 31 OP Trait US_V 3 3 0 109534 0 109534 14 21 0 P Trait US_C 4 1 0 816946 0 816946 aeee 0 P Trait US_C 4 2 2 03823 2 03823 3 68 OP Trait US_C 4 3 0 252623 0 252625 3 82 0 P Trait US_V 4 4 3 31364 3 31364 7 50 0 P Trait USC amp 1 0 871291 0 871291 6 95 0 P Trait US_C 5 2 2 53
448. t xfai TrDam123 id dam id dam xfa1 TrDam123 id dam xfa1 TrDam123 id dam xfai TrDam123 id dam xfai TrDam123 id dam xfai TrDam123 id dam xfai TrDam123 us TrLit1234 id lit 44 us TrLit1234 id lit us TrLit1234 38 39 40 41 42 43 xfai TrDami23 xfai TrDami23 xfai TrDami23 xfai TrDami23 xfai TrDami23 xfai TrDam123 14244 effects 19484 effects OQ oa na sianasdsoon a lt sac so qn as QOannstanasas sasaac Pre lt lt lt 4 lt 325 Dam 14 12 751 annnn nwFrrRPRPRBPWWWNYNND KE annnn nFrRRRPWWWNHN KB PrRPROOO S w N PWN OP WNP BONE WN N EE OPWNHRFPFRPRPWNHRPWNHRENFR KE QNeEF WON ywt gfw fdm fat 9 46109 7 34181 17 6050 0 272536 0 668009 0 141595 0 963017 1 9977 1 0 286984 3 64374 0 850282 2 48313 0 786089E 01 0 115894 1 63175 1 01106 16 0229 0 280259 0 0 0 0 0 0 0 0 0 0 l O 0 0 132755E 02 976533E 03 176684E 02 208076E 03 593942 677334 1 55632 280482E 01 287861E 02 150192F 01 596227E 01 657014E 01 477561E 02 157854 407282E 01 133338 B77 122E 03 472300E 01 326718E 01 126746E 01 0 00000 661114E 02 1 46479 1 51911 11077 3 55275 oo oOo 0 0 Oo CO O00 OO Oo 0 Oo Oo 0 0 oO OGO oOo oO Oo Oo 2 CO Oo oo Oo Oo 0 0 0 0 0 0 0 284202 357266 649871 325222E 01 477490E 01 597447E 02 333224 548821 564929F 01
449. t.001
4 variety 2 10.0 1.49 0.272
2 nitrogen 3 45.0 37.69 <.001
8 variety.nitrogen 6 45.0 0.30 0.932
For simple variance component models such as the above, the default parameterisation for the variance component parameters is as the ratio to the residual variance. Thus ASReml prints the variance component ratio and variance component for each term in the random model in the columns labelled Gamma and Component respectively. A table of Wald F statistics is printed below this summary. The usual decomposition has three strata, with treatment effects separating into different strata as a consequence of the balanced design and the allocation of variety to whole plots. In this balanced case, it is straightforward to derive the ANOVA estimates of the stratum variances from the REML estimates of the variance components. That is,
blocks: $12\hat{\sigma}_b^2 + 4\hat{\sigma}_w^2 + \hat{\sigma}^2 = 3175.1$
blocks.wplots: $4\hat{\sigma}_w^2 + \hat{\sigma}^2 = 601.3$
residual: $\hat{\sigma}^2 = 177.1$
The default output for testing fixed effects used by ASReml is a table of so-called incremental Wald F statistics. These Wald F statistics are described in Section 6.11. They are simply the Wald test statistics divided by the number of estimable effects for that term. In this example there are four terms included in the summary. The overall mean (denoted by mu) is of no interest for these data. The tests are sequential, that is, the effect of each term is assessed by the change in sums of squares achieved by adding the term to the current model
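The relation between stratum variances and variance components in this balanced split plot can be worked through numerically. The sketch below goes in the reverse direction, recovering the components from the three stratum variances quoted above. The coefficients 12 and 4 are assumptions taken from the design (3 whole plots of 4 subplots in each block, hence 12 plots per block), and the variable names are illustrative.

```python
# Stratum variances quoted in the text
STRATUM = {"blocks": 3175.1, "blocks.wplots": 601.3, "residual": 177.1}

SUBPLOTS_PER_WPLOT = 4   # nitrogen levels within each whole plot (assumed from the design)
PLOTS_PER_BLOCK = 12     # 3 whole plots x 4 subplots per block (assumed from the design)

# residual stratum:      sigma^2                               = 177.1
sigma2 = STRATUM["residual"]

# blocks.wplots stratum: 4 sigma_w^2 + sigma^2                 = 601.3
sigma2_wplots = (STRATUM["blocks.wplots"] - sigma2) / SUBPLOTS_PER_WPLOT

# blocks stratum:        12 sigma_b^2 + 4 sigma_w^2 + sigma^2  = 3175.1
sigma2_blocks = (STRATUM["blocks"] - STRATUM["blocks.wplots"]) / PLOTS_PER_BLOCK

print(f"residual      {sigma2:8.2f}")
print(f"blocks.wplots {sigma2_wplots:8.2f}")
print(f"blocks        {sigma2_blocks:8.2f}")
```

Because the stratum variances are quoted to one decimal place, the recovered components are only accurate to about that precision.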
450. t asr. Yy (YVAR y) overrides the value of response, the variate to be analysed (see Section 6.2), with the value y, where y is the number of the data field containing the trait to be analysed. This facilitates analysis of several traits under the same model. The value of y is appended to the basename so that output files are not overwritten when the next trait is analysed. 10.3.6 Workspace command line options (S, W). The workspace requirements depend on problem size and may be quite large. On 32bit computers, the maximum is 2000 Mbyte under Linux and 1600 Mbyte under Windows. On 64bit systems, the maximum is 32 Gbyte but may be less depending on the machine configuration. The default allocation is 32 Mbyte (4 million double precision words). An increased workspace allocation may be requested on the command line with the Wm option. Wm (WORKSPACE m) sets the initial size of the workspace in Mbytes. For example, W1600 requests 1600 Mbytes of workspace, the maximum typically available under Windows. W2000 is the maximum available on 32bit Unix (Linux) systems. On 64bit systems, the argument, if less than 33, is taken as Gbyte. If your system cannot provide the requested workspace, the request will be diminished until it can be satisfied. On multi-user systems, do not unnecessarily request the maximum or other users may complain. Having started with an initial allocation, if ASReml realises more space is require
451. t file names Table 15 6 Field layout of Slate Hall Farm experiment Column Replicate levels Row 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 I 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 2 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 5 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 6 4 4 4 4 4 5 5 5 5 5 6 6 6 6 6 7 4 4 4 4 4 5 5 5 5 5 6 6 6 6 6 8 4 4 4 4 4 5 5 5 5 5 6 6 6 6 6 9 4 4 4 4 4 5 5 5 5 5 6 6 6 6 6 10 4 4 4 4 4 5 5 5 5 5 6 6 6 6 6 Column Rowblk levels Row 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 I 1 1 1 1 1 11 11 11 11 11 2 21 21 21 21 2 2 2 2 2 2 12 12 12 12 12 22 22 22 22 22 3 3 3 3 3 3 13 13 13 13 13 23 23 23 23 23 4 4 4 4 4 4 14 14 14 14 14 24 24 24 24 4 5 5 5 5 5 5 15 15 15 15 15 25 25 25 25 25 6 6 6 6 6 6 16 16 16 16 16 26 26 26 26 26 7 7 7 7 7 7 17 17 17 17 17 27 27 237 27 27 8 8 8 8 8 8 18 18 18 18 18 28 28 28 28 28 9 9 9 9 9 9 19 19 19 19 19 29 29 29 29 29 10 10 10 10 10 10 20 20 20 20 20 30 30 30 30 30 Column Colblk levels Row 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 3 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 4 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 5 1 2 3 4 5 6 T 8 9 10 11 12 13 14 15 6 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 7 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 8 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 9 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 10 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 EPS
452. t relate to labelled variables to the internal data array. Note that
• there may be up to 10000 variables and these are internally labelled V1, V2, ..., V10000 for transformation purposes. Values from the data file (ignoring any !SKIPed fields) are read into the leading variables;
• alpha (!A), integer (!I), pedigree (!P) and date (!DATE) fields are converted to real numbers (level codes) as they are read and before any transformations are applied;
• transformations may be applied to any variable (since every variable is numeric), but it may not be sensible to change factor level codes;
• transformations operate on a single variable (not a !G group of variables) unless it is explicitly stated otherwise;
• transformations are performed in order for each record in turn;
• variables that are created by transformation should be defined after (below) variables that are read from the data file, unless it is the explicit intention to overwrite an input variable (see below);
• after completing the transformations for each record, the values in the record for variables associated with a label are held for analysis, or the record (all values) is discarded (see !D transformation and Section 6.9).
Thus variables form three classes: those read from the data file (possibly modified) and labelled are available for subsequent use in analysis; those created and labelled are also available for subsequent use in the analysis; and tho
453. t specified values. If interest lies in the relationship of the response variable to the covariate, predict a suitable grid of covariate values to reveal the relationship. Otherwise, predict at an average or typical value of the covariate. The default is to predict at the mean covariate value. Omission of a covariate from the prediction model is equivalent to predicting at a zero covariate value, which is often not appropriate (unless the covariate is centred). Before considering the syntax, it is useful to consider the conceptual steps involved in the prediction process. Given the explanatory variables (fixed factors, random factors and covariates) used to define the linear mixed model, the four main steps are:
(a) Choose the explanatory variable(s) and their respective level(s)/value(s) for which predictions are required; the variables involved will be referred to as the classify set and together define the multiway table to be predicted. Include only one from any set of associated factors in the classify set.
(b) Note which of the remaining variables will be averaged over (the averaging set) and which will be ignored (the ignored set). The averaging set will include all variables involved in the fixed model but not in the classify set. Ignored variables may be explicitly added to the averaging set. The combination of the classify set with these averaging variables defines a multiway hyper-table. Only the base factor in a set of associated factor
454. t what combinations are present from the design matrix It may have trouble with complicated models such as those involving and terms A second PRESENT qualifier is allowed on a predict statement but not with PRWTS The two lists must not overlap is used in conjunction with the first PRESENT v list to specify the weights that ASReml will use for averaging that PRESENT table More details are given below 181 9 3 Prediction Table 9 1 List of prediction qualifiers qualifier action Controlling inclusion of model terms EXCEPT t IGNORE t ONLYUSE t USE t Printing IDEC n PLOT z PRINTALL SED TDIFF TURNINGPOINTS n causes the prediction to include all fitted model terms not in t causes ASReml to set up a prediction model based on the default rules and then removes the terms in t This might be used to omit the spline Lack of fit term IGNORE fac x from predictions as in yield mu x variety r spl x fac x predict x IGNORE fac x which would predict points on the spline curve averaging over variety causes the prediction to include only model terms in t It can be used for example to form a table of slopes as in HI mu X variety X variety predict variety X 1 onlyuse X X variety causes ASReml to set up a prediction model based on the default rules and then adds the terms listed in t gives the user control of the number of decimal places reported in the t
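455. t $y'Py = (y - X\hat{\tau})' H^{-1} (y - X\hat{\tau})$. The log-likelihood (2.11) depends on X and not on the particular non-unique transformation defined by L. The log residual likelihood (ignoring constants) can be written as

$$\ell_R = -\tfrac{1}{2}\left\{\log\det C + \log\det R + \log\det G + y'Py\right\}. \qquad (2.12)$$

We can also write $P = R^{-1} - R^{-1} W C^{-1} W' R^{-1}$ with $W = [X \;\; Z]$. Letting $\kappa$ denote the vector of variance parameters, the REML estimates of $\kappa$ are found by calculating the score

$$U(\kappa_i) = \frac{\partial \ell_R}{\partial \kappa_i} = -\tfrac{1}{2}\left\{\operatorname{tr}(P H_i) - y' P H_i P y\right\} \qquad (2.13)$$

and equating to zero. Note that $H_i = \partial H / \partial \kappa_i$. The elements of the observed information matrix are

$$-\frac{\partial^2 \ell_R}{\partial \kappa_i \, \partial \kappa_j} = \tfrac{1}{2}\operatorname{tr}(P H_{ij}) - \tfrac{1}{2}\operatorname{tr}(P H_i P H_j) + y' P H_i P H_j P y - \tfrac{1}{2}\, y' P H_{ij} P y, \qquad (2.14)$$

where $H_{ij} = \partial^2 H / \partial \kappa_i \, \partial \kappa_j$. The elements of the expected information matrix are

$$E\!\left(-\frac{\partial^2 \ell_R}{\partial \kappa_i \, \partial \kappa_j}\right) = \tfrac{1}{2}\operatorname{tr}(P H_i P H_j). \qquad (2.15)$$

Given an initial estimate $\kappa^{(0)}$, an update of $\kappa$ using the Fisher scoring (FS) algorithm is

$$\kappa^{(m+1)} = \kappa^{(m)} + I\!\left(\kappa^{(m)}, \kappa^{(m)}\right)^{-1} U\!\left(\kappa^{(m)}\right), \qquad (2.16)$$

where $U(\kappa^{(m)})$ is the score vector (2.13) and $I(\kappa^{(m)}, \kappa^{(m)})$ is the expected information matrix (2.15) of $\kappa$ evaluated at $\kappa^{(m)}$. For large models or large data sets, the evaluation of the trace terms in either (2.14) or (2.15) is either not feasible or is very computer intensive. To overcome this problem ASReml uses the AI algorithm (Gilmour, Thompson and Cullis, 1995). The matrix denoted by $\mathcal{I}_A$ is obtained by averaging (2.14) and (2.15) and approximating $y' P H_{ij} P y$ by its expectation, $\operatorname{tr}(P H_{ij})$, in those cases when $H_{ij} \neq 0$. For var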
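The averaging step described at the end of this entry can be written out explicitly. The short derivation below is a sketch that uses the standard REML forms of the observed and expected information, with $H_i = \partial H/\partial\kappa_i$ and $H_{ij} = \partial^2 H/\partial\kappa_i\,\partial\kappa_j$; it shows why the resulting average information matrix contains no trace terms.

```latex
\begin{align*}
\text{observed:}\quad
 & \tfrac12\operatorname{tr}(PH_{ij}) - \tfrac12\operatorname{tr}(PH_iPH_j)
   + y'PH_iPH_jPy - \tfrac12\,y'PH_{ij}Py \\
\text{expected:}\quad
 & \tfrac12\operatorname{tr}(PH_iPH_j) \\[4pt]
\text{average:}\quad
 & \tfrac14\operatorname{tr}(PH_{ij}) + \tfrac12\,y'PH_iPH_jPy
   - \tfrac14\,y'PH_{ij}Py \\[4pt]
\text{with } y'PH_{ij}Py \approx \operatorname{tr}(PH_{ij}):\quad
 & \mathcal{I}_A(\kappa_i,\kappa_j) = \tfrac12\,y'PH_iPH_jPy
\end{align*}
```

The two $\operatorname{tr}(PH_iPH_j)$ terms cancel on averaging, and the remaining trace cancels once the quadratic form in $H_{ij}$ is replaced by its expectation, which follows from $PX = 0$ and $PHP = P$.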
456. tatistics for each term is the so-called "incremental" form. For this method, Wald statistics are computed from an incremental sum of squares in the spirit of the approach used in classical regression analysis (see Searle, 1971). For example, if we consider a very simple model with terms relating to the main effects of two qualitative factors A and B, given symbolically by $y \sim 1 + A + B$, where the 1 represents the constant term ($\mu$), then the incremental sums of squares for this model can be written as the sequence

$$R(1)$$
$$R(A \mid 1) = R(1, A) - R(1)$$
$$R(B \mid 1, A) = R(1, A, B) - R(1, A)$$

where the $R(\cdot)$ operator denotes the reduction in the sums of squares due to a model containing its argument and $R(\cdot \mid \cdot)$ denotes the difference between the reductions in the sums of squares for any pair of (nested) models. Thus $R(B \mid 1, A)$ represents the difference in the reduction in sums of squares between the so-called maximal model $y \sim 1 + A + B$ and $y \sim 1 + A$. Implicit in these calculations is that
• we only compute Wald statistics for estimable functions (Searle, 1971, page 408);
• all variance parameters are held fixed at the current REML estimates from the maximal model.
In this example, it is clear that the incremental Wald statistics may not produce the desired test for the main effect of A, as in many cases we would like to produce a Wald statistic for A based on

$$R(A \mid 1, B) = R(1, A, B) - R(1, B).$$

The issue is further complicated w
457. ted in estimating the correlations between distinct traits for example fleece weight and fibre diameter in sheep and for repeated measures of a single trait 8 1 1 Repeated measures on rats Wolfinger 1996 summarises a range of Wolfinger rat data variance structures that can be fitted to treat A repeated measures data and demonstrates wtO wt1 wt2 wt3 wt4 the models using five weights taken weekly T t 4at l on 27 rats subjected to 3 treatments This e ni Ea command file demonstrates a multivari a us Trait GP ate analysis of the five repeated measures Note that the two dimensional structure for residual errors meets the requirement of inde pendent units and corresponds to the data being ordered traits within units 153 8 2 Model specification 8 1 2 Wether trial data Three key traits for the Australian wool in Orange Wether Trial 1984 8 dustry are the weight of wool grown per SheepID I year the cleanness and the diameter of TRIAL that wool Much of the wool is produced BloodLine I from wethers and most major producers post oe have traditionally used a particular strain o ner dat batty 4 or bloodline To assess the importance of GFW FDIAM Trait Trait YEAR bloodline differences many wether trials r us Trait id TEAM us Trait id SheepID were conducted One trial conducted from residual id units us Trait GP 1984 to 1988 at Borenore near Orange in Predict YEAR
458. ter space but do not always work well when there are several matrices on the boundary. The options are:
!EMFLAG 1   Standard EM plus 10 local EM steps
!EMFLAG 2   Standard EM plus 10 local PXEM steps
!PXEM 2     Standard EM plus 10 local PXEM steps
!EMFLAG 3   Standard EM plus 10 local EM steps
!EMFLAG 4   Standard EM plus 10 local EM steps
!EMFLAG 5   Standard EM only
!EMFLAG 6   Single local PXEM
!EMFLAG 7   Standard EM plus 1 local EM step
!EMFLAG 8   Standard EM plus 10 local EM steps
Options 3 and 4 cause all US structures to be updated by (PX)EM if any particular one requires EM updates. The test of whether the AI updated matrix is positive definite is based on absorbing the matrix to check that all pivots are positive. Repeated EM updates may bring the matrix closer to being singular. This is assessed by dividing the pivot of the first element by the first diagonal element of the matrix. If the ratio is less than 10^-7 (this value is consistent with the multiple partial correlation of the first variable with the rest being greater than 0.9999999), ASReml fixes the matrix at that point and estimates any other parameters conditional on these values. To proceed with further iterations without fixing the matrix values would ultimately make the matrix such that it would be judged singular, resulting in the analysis being aborted.
Table 5.5: List of rarely used job control qualifiers (qualifier, action): !EQORDER o
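The near-singularity check described in this entry can be illustrated with a small pure-Python sketch. It is not ASReml's implementation: the function names are hypothetical, and it is an assumption that the "pivot of the first element" is what remains of the first diagonal element after all other rows and columns have been absorbed (eliminated). The ratio of that pivot to the original diagonal element is then compared with the 10^-7 threshold quoted in the text.

```python
def first_pivot_ratio(matrix):
    """Absorb rows/columns n-1, ..., 1 of a symmetric matrix and return
    (pivot of element 0) / (original first diagonal element).

    Raises ValueError if a non-positive pivot is met, i.e. the matrix is
    not positive definite.
    """
    a = [row[:] for row in matrix]          # work on a copy
    n = len(a)
    for k in range(n - 1, 0, -1):           # eliminate last variable first
        pivot = a[k][k]
        if pivot <= 0.0:
            raise ValueError("non-positive pivot: matrix is not positive definite")
        for i in range(k):
            factor = a[i][k] / pivot
            for j in range(k):
                a[i][j] -= factor * a[k][j]
    if a[0][0] <= 0.0:
        raise ValueError("non-positive pivot: matrix is not positive definite")
    return a[0][0] / matrix[0][0]


def is_near_singular(matrix, threshold=1e-7):
    """True when the first-pivot ratio falls below the threshold."""
    return first_pivot_ratio(matrix) < threshold


if __name__ == "__main__":
    moderate = [[1.0, 0.5], [0.5, 1.0]]
    extreme = [[1.0, 0.99999999], [0.99999999, 1.0]]
    print(first_pivot_ratio(moderate), is_near_singular(moderate))
    print(is_near_singular(extreme))
```

For a 2 x 2 correlation matrix with correlation r the ratio is 1 - r^2, so a ratio below 10^-7 means the first variable is almost perfectly predicted by the others.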
459. tes Hybridi aif which contains the identifier names IPART 2 reads in inverse additive relationship matrix generated in PART 1 Mline A L Hybridi aif SKIP 1 associates identifier names with levels of Mline used in giv file Pline IP 164 8 9 Reading a user defined inverse relationship matrix Fline ped GIV DIAG Hybridi_A giv formed in part 1 from Mline ped Hybrid asd SKIP 1 grm1i Mline nrm Fline using new synonyms and functions 8 9 1 Genetic groups in GIV matrices If a user creates a GIV file outside ASReml which has fixed degrees of freedom associated with it a GROUPSDF n qualifier is provided to specify the number of fixed degrees of freedom n incorporated into the GIV matrix The GROUPSDF qualifier is written into the first line of the giv matrix produced by the GIV qualifier of the pedigree line if the pedigree includes genetic groups and will be honoured from there when reusing a GIV matrix formed from a pedigree with genetic groups in ASReml When groups are constrained then it will be the number of groups less number of constraints For example if the pedigree file qualified by GROUPS 7 begins AOO Q w gt B O ABC is not present in the subsequent pedigree lines mo oO Om lt 2 2 Cr lt gt gt ti DE 0 O DE is not present in the subsequent pedigree lines there are actually only 5 genetic groups and two constraints so that the fixed effects for A B and C sum to zero and for D and
460. the !RENAME !ARG argument from the most recent run so that ASReml can retrieve restart values from the most recent run when !CONTINUE is specified but there is no particular .rsv file for the current !ARG argument.
.asp  contains transformed data, see !PRINT in Table 5.2.
.ass  contains the data summary created by the !SUM qualifier (see page 68).
.dbr/.dpr/.spr  contains the data and residuals in a binary form for further analysis (see !RESIDUALS, Table 5.5).
.veo  holds the equation order to speed up re-running big jobs when the model is unchanged. This binary file is of no use to the user.
.vll  holds factor level names when data residuals are saved in binary form. See !SAVE on page 81.
.vrb  contains the estimates of the fixed effects and their variance if the !VRB qualifier is specified.
.vvp  contains the approximate variances of the variance parameters. It is designed to be read back for calculating functions of the variance parameters, see VPREDICT in Chapter 12.
.was  basename.was is open while ASReml is running and deleted when it finishes. It will normally be invisible to the user unless the job crashes. It is used by ASReml-W to tell when the job finishes.
.xml  contains key information from the .asr, .pvs and .res files in a form easier for computers to parse.
An ASReml run generates many files, and the .sln and .yht files in particular are often quite large and could fill up your disk space. You should therefore regularly tidy your wor
461. the R structure definition lines,
• there is an error in the G structure definition lines,
• there is a factor name error,
• there is a missing parameter,
• there are too many/few initial values.
The most common problem in running ASReml is that a variable label is misspelt. The primary file to examine for diagnostic messages is the .asr file. When ASReml finds something atypical or inconsistent, it prints a diagnostic message. If it fails to successfully parse the input, it dumps the current information to the .asr file. Below is the output for a job that has been terminated due to a coding error. If a job has an error you should
• read the whole .asr file, looking at all messages to see whether they identify the problem,
• focus particularly on any error message in the Fault: line and the text of the Last line read: (this line appears twice in the file to make it easier to find),
• check that all variables have been defined and are referenced with the correct case,
• remember that some errors arise from conflicting information; the error may point to something that appears valid but is inconsistent with something earlier in the file,
• reduce to a simpler model and gradually build up to the desired analysis; this should help to identify the exact location of the problem.
If the problem is not resolved after these checks, you may need to email Customer Support at support@asreml.co.uk. Please send the .as file a
462. the actual parameter fitted; the Sigma column reports the gamma converted to a variance scale if appropriate. Sigma/SE is the ratio of the component relative to the square root of the diagonal element of the inverse of the average information matrix. Warning: Sigma/SE should not be used for formal testing. The % column shows the percentage change in the parameter at the last iteration.
• use VPREDICT (see Chapter 12) to calculate meaningful functions of the variance components,
• a table of Wald F statistics for testing fixed effects (Section 6.11). The table contains the numerator degrees of freedom for the terms and incremental F statistics for approximate testing of effects. It may also contain denominator degrees of freedom, a conditional Wald F statistic and a significance probability,
• estimated effects, their standard errors and t values for equations in the DENSE portion of the SSP matrix are reported if !BRIEF 1 is invoked; the T-prev column tests the difference between successive coefficients in the same factor.
The reported log-likelihood value may be positive or negative and typically excludes some constants from its calculation. It is sometimes reported relative to an offset (when its magnitude exceeds 10000); any offset is reported in the .asr file. Twice the difference in the log-likelihoods for two models is commonly used as the basis for a likelihood ratio test (see page 16). This
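As an illustration of the likelihood ratio calculation mentioned at the end of this entry, the sketch below computes twice the difference between two log-likelihoods and refers it to a chi-square distribution with one degree of freedom. The log-likelihood values are hypothetical, the function names are not part of ASReml, and the one-degree-of-freedom reference distribution is an assumption that suits two nested models differing by a single variance parameter. REML log-likelihoods are only comparable when both models have the same fixed effects.

```python
import math


def lrt_statistic(logl_full, logl_reduced):
    """Twice the difference between the two reported log-likelihoods."""
    return 2.0 * (logl_full - logl_reduced)


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with 1 df.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)) for x > 0.
    """
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


if __name__ == "__main__":
    # Hypothetical log-likelihoods for a full and a reduced model.
    statistic = lrt_statistic(-87.43, -91.14)
    print(round(statistic, 2), chi2_sf_1df(statistic))
```

When the parameter being tested lies on the boundary of its parameter space under the null hypothesis, the reference distribution is usually adjusted; that refinement is outside this sketch.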
463. the data file with alphabetic level names but ASReml row column is expecting integer level codes Changing the ae 7 variety line to read variety A resolves Pp ARRE EEE this problem residual ari Row ar1 Col predict varierty Folder C Users Public ASRem1 Docs Manex4 ERR QUALIFIERS SKIP 1 Reading nin89 asd FREE FORMAT skipping 1 lines Univariate analysis of yield Notice Maybe you want A L qualifiers for this factor LANCER Error at field 1 LANCER of record 1 line 1 Since this is the first data record you may need to skip some header lines see SKIP or append the A qualifier to the definition of factor variety Fault Missing faulty SKIP or A needed for variety Last line read was LANCER 1 1101 585 1 4 29 25 4 3 19 2 16 1 Currently defined structures COLS and LEVELS 1 variety 1 2 2 0 0 0 11 column 1 2 2 0 10 0 12 mu 0 1 8 0 i 0 ninerr3 variety id pid raw rep nloc yield lat Model specification TERM LEVELS GAMMAS mu 0 variety 0 12 factors defined max5000 O variance parameters max2500 2 special structures Last line read was LANCER 1 1101 585 1 4 29 25 4 3 19 2 16 1 Finished 23 Apr 2014 09 17 05 540 Missing faulty SKIP or A needed for variety 249 14 4 An example 4 A missing comma After correcting the definition of variety we NIN Alliance Trial 1989 get the following abbreviated output We variety A have at least now read the data file as indi cate
464. the factor it is applied to and the order of the rows must match the order of the levels nrm scaled vari SS specified applies a generated relationship matrix derived from ance the functions argument associated pedigree file us variance Vig ij general unstructured symmetric positive definite co variance matrix xfak variance V AATHY factor analytic model of order k with A of size n x k structure name column 4 and a corresponding variance model function name column 5 giving the associated component variance structures column 6 The consolidated model term is the term presented in the final column of the table In contrast in ASReml 3 the linear model terms are defined on the model line and subsequently a G structure line is given for each linear model term which specifies the component terms and their associated structures The simplest form of a consolidated model term is a single model term with a variance model function applied eg idv repl in Table 7 2 and the next simplest is a compound model term with a variance model function applied eg idv A B in Table 7 2 In summary the following are rules in forming consolidated model terms and applying vari ance model functions to random model terms variance model functions can be applied to single model terms see example 1 in Table 7 2 the components in compound model terms examples 4 to 6 and single model terms with a constructed linear model function example 2
465. the field in this case If the covariate values are irregular you would leave the field as a covariate and use the fac function to derive a factor version forms the natural log of v r This may also be used to transform the response variable creates a first differenced by rows design matrix which when defining a random effect is equivalent to fitting a moving average variance structure in one dimension In the mat form the first difference operator is coded across all data points assuming they are in time space order Otherwise the coding is based on the codes in the field indicated is a term that is predefined by using the MBF qualifier see page 71 98 6 6 Alphabetic list of model functions Table 6 2 Alphabetic list of model functions and descriptions model function action mu mv out n out n t pol v n p v n pow z pL o is used to fit the intercept constant term It is normally present and listed first in the model It should be present in the model if there are no other fixed factors or if all fixed terms are covariates or contrasts except in the special case of regression through the origin is used to estimate missing values in the response variable Formally this creates a model term with a column for each missing value Each column contains zeros except for a solitary 1 in the record containing the corresponding missing value This is used in spatial analyses so that computing a
466. the graphics to files whose names are built up as <basename> [<args>] <type> [<pass>] [<section>] <ext>, where square parentheses indicate elements that might be omitted:
<basename> is the name portion of the .as file,
<args> is any argument strings built into the output names by use of the !RENAME qualifier,
<type> indicates the contents of the figure (as given in the following table),
<pass> is inserted when the job is repeated (!RENAME or !CYCLE) to ensure filenames are unique across repeats,
<section> is inserted to distinguish files produced from different sections of data (for example, from multisite spatial analysis), and
<ext> indicates the file graphics format.

<type>   file contents
R        marginal means of residuals from spatial analysis of a section
V        variogram of residuals from spatial analysis for a section
S        residuals in field plan for a section
H        histogram of residuals for a section
_RvE     residuals plotted against expected values
XYGi     figure produced by !X, !Y and !G qualifiers
PV_i     predicted values plotted for PREDICT directive i

The graphics file format is specified by following the G or H option by a number g, or by specifying the appropriate qualifier on the top job control line, as follows:

g    qualifier   description            <ext>
1    !HPGL       HP-GL                  pgl
2    !PS         Postscript (default)   ps
6    !BMP        BMP                    bmp
10   !WPM        Wind
467. the size of factor labels stored,
• extra data fields on a line are ignored,
• if there are fewer data items on a line than ASReml expects, the remainder are taken from the following line(s), except in .csv files, where they are taken as missing. If you end up with half the number of records you expected, this is probably the reason,
• all lines beginning with ! followed by a blank are copied to the .asr file as comments for the output; their contents are ignored.
4.2.2 Fixed format data files. The format must be supplied with the !FORMAT qualifier, which is described in Table 5.5. However, if all fields are present and are separated, the file can be read free format.
4.2.3 Preparing data files in Excel. Many users find it convenient to prepare their data in EXCEL, ACCESS or some other database. Such data must be exported from these programs into either .csv (comma-separated values) or .txt (TAB-separated values) form for ASReml to read it. ASReml can convert an .xls file to a .csv file. When ASReml is invoked with an .xls file as the filename argument and there is no .csv file or .as file with the same basename, it exports the first sheet as a .csv file and then generates a template .as command file from any column headings it finds (see page 194). It will also convert a Genstat .gsh spreadsheet file to .csv format. The data extracted from the .xls file are labels, numerical values and the results from formulae. Empty rows at the start and end of a block ar
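The free-format reading rules listed in this entry explain why a mismatch between the declared and actual number of fields can halve the record count. The sketch below mimics that behaviour under stated assumptions: `read_free_format` is a hypothetical helper, not ASReml's reader, and it assumes whitespace-separated fields, that blank lines are skipped, and that surplus fields on a continuation line are discarded in the same way as surplus fields on any other line.

```python
def read_free_format(lines, n_fields):
    """Group whitespace-separated tokens into records of n_fields values.

    A line with too few items borrows the remainder from the following
    line(s); items beyond what the current record still needs are ignored.
    """
    records = []
    current = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue                          # assumption: skip blank lines
        needed = n_fields - len(current)
        current.extend(tokens[:needed])       # surplus fields are ignored
        if len(current) == n_fields:
            records.append(current)
            current = []
    return records


if __name__ == "__main__":
    data = ["1 2 3", "4 5 6", "7 8 9", "10 11 12"]
    # Three fields declared: one record per line.
    print(len(read_free_format(data, 3)))
    # Four fields declared by mistake: every second line is consumed as a
    # continuation, so only half the expected records are produced.
    print(read_free_format(data, 4))
```

With four fields declared for three-column data, each record swallows the first item of the next line and the rest of that line is lost, which is the "half the number of records" symptom described above.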
468. the symbol Exceptions to this rule are single components F id v F and nrm v F terms which are reduced to the corresponding single term F id v F and nrm v F So for example with the random model and residual specification model terms Ir idv A ariv B nrm C us Trait D residual id units us Trait The covariance functions with parameters idv A ariv B us Trait in nrm C us Trait and us Trait in id units us Trait are named idv A ariv B ariv B nrm C us Trait us Trait id units us Trait us Trait If the resulting name is not ambiguous the name can be contracted by reducing the con solidated model term to a unique substring or leaving out the consolidated model term completely For example in the example the covariance functions can be represented by idv A ariv B C us Trait and units us Trait respectively Individual parameters within a covariance component can be specified by number or sequence of numbers n m by appending these in square braces for example C us Trait 3 or units us Trait 4 6 If the residual directive is not used the default R structure parameters are effectively named Residual The orphan term D with no explicit variance function is treated as idv D struc ture with name D If the user is in doubt of the name or number of a parameter then running the program with VPREDICT DEFINE and a blank line will construct a pvc file with the names and numbers of parameters iden
469. ther than the working folder. This qualifier must be placed on the top command line as it needs to be processed before any output files are opened. Most files produced by ASReml have a filename structure <basename> <subname> <extension>, where <subname> is a command line argument value. If !OUTFOLDER is specified without path, the output filename pattern becomes <basename> <subname> <basename> <extension>. If path is specified, the output filename pattern becomes <path> <basename> <subname> <extension>. There are a few files written by ASReml that do not follow this naming pattern, for example ainverse.bin and asrdata.bin. These remain unchanged, that is, they are not written to the output folder.
!XML requests that the primary tables reported in the .asr file and key output from the .pvs and .sln files are written to an .xml file in xml format. The output is presented in the order of computation. The first block written is a .asr block and includes start and finish times, the data summary, the iteration sequence summary and information criteria; then from the .pvs file, the tables and associated information; then the summary of estimated variance structure parameters from the .asr file; then information from the .sln file; and then finally the Wald F statistics and completion information from the .asr file. The process is repeated for each cycle of analysis. The intended use of this
470. tical Society 82, 605-610.
Smith, A. B., Cullis, B. R., Gilmour, A. and Thompson, R. (1998). Multiplicative models for interaction in spatial mixed model analyses of multi-environment trial data. Proceedings of the International Biometrics Conference.
Smith, A., Cullis, B. R. and Thompson, R. (2001). Analysing variety by environment data using multiplicative mixed models and adjustments for spatial field trend. Biometrics 57, 1138-1147.
Smith, A., Cullis, B. R. and Thompson, R. (2005). The analysis of crop cultivar breeding and evaluation trials: an overview of current mixed model approaches (review). Journal of Agricultural Science 143, 449-462.
Stein, M. L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. Springer-Verlag, New York.
Stevens, M. M., Fox, K. M., Warren, G. N., Cullis, B. R., Coombes, N. E. and Lewin, L. G. (1999). An image analysis technique for assessing resistance in rice cultivars to root-feeding chironomid midge larvae (Diptera: Chironomidae). Field Crops Research 66, 25-26.
Stroup, W. W., Baenziger, P. S. and Mulitze, D. K. (1994). Removing spatial variation from wheat yield trials: a comparison of methods. Crop Sci. 86, 62-66.
Thompson, R. (1980). Maximum likelihood estimation of variance components. Math. Operationsforsch. Statistics, Series Statistics 11, 545-561.
Thompson, R., Cullis, B., Smith, A. and Gilmour, A. (2003). A sparse implementation of the average informat
471. tified The original implementation was based entirely on the numbers but it will generally be better to use the names since the order model terms are reported cannot always be predicted 211 12 2 Syntax Critical change For generalised linear models in ASReml Release 4 the pvc file reports and numbers for completeness a residual or dispersion parameter both when the parameter is estimated or when it is fixed By contrast ASReml 3 does not report nor number if the parameter is fixed by default at 1 Hence the parameters might be numbered differently in ASReml 4 and ASReml 3 12 2 1 Functions of components First ASReml extracts the variance compo a A y mu Ir idv Sire nents from the asr file and their variance besidual idv units matrix from the vvp file The F S V and VPREDICT DEFINE X functions create new components which are F phenvar idv Sire idv units appended to the list For example the F func F genvar idv Sire 4 tion appends component k c v and forms berit genvar phenvar cov c v v and var c v where v is the vec tor of existing variance components c is the vector of coefficients for the linear function and k is an optional offset which is usually omitted but would be 1 to represent the residual variance in a probit analysis and 3 289 to represent the residual variance in a logit analysis The general form of the directive is F labela bxcqp t etd mxk where a b c and
472. tion dependent on REGION In the second model REGION and SITE appear to be independent factors so the initial M codes are A and A However they are not independent because REGION removes additional degrees of freedom from SITE so the M codes are changed from A and A to a and A When using the conditional Wald F statistic it is important to know what the maximal conditional model MCM is for that particular statistic It is given explicitly in the aov file The purpose of the conditional Wald F statistic is to facilitate inference for fixed effects It is not meant to be prescriptive of the appropriate test nor is the algorithm for determining the MCM foolproof The Wald statistics are collectively presented in a summary table in the asr file The basic table includes the numerator degrees of freedom 1 and the incremental Wald F statistic for each term To this is added the conditional Wald F statistic and the M code if FCON is specified A conditional Wald F statistic is not reported for mu in the asr but is in the aov file adjusted for covariates The FOWN qualifier page 78 allows the user to replace any all of the conditional Wald F statistics with tests of the same terms but adjusted for other model terms as specified by the user the FOWN test is not performed if it implies a change in degrees of freedom from that obtained by the incremental model 2 5 3 Kenward and Roger adjustments In moderately sized analyse
473. tistic e Wald F statistic scaled by A e as defined in Kenward amp Roger denominator degrees of freedom Source Size NumDF F value lLambda F Lambda DenDF mu 1 1 3831 9252 331 9252 1 0000 25 0143 variety 56 55 2 2257 2 2245 0 9995 110 8370 225 13 4 Other ASReml output files A more useful example is obtained by adding gpiit plot analysis oat a linear nitrogen contrast to the oats example blocks Section 15 2 nitrogen A subplots variety A The basic design is six replicates of three geike d whole plots to which variety was randomised yield and four subplots which received 4 rates of oats asd skip 2 nitrogen A CONTRAST qualifier defines the CONTRAST linNitr nitrogen 6 4 2 0 model term linNitr as the linear covariate FCON l B l representing ntrogen applied Fitting this be Yt814 mu variety linNitr nitrogen i variety linNitr variety nitrogen fore the model term nitrogen means that this i Ir idv blocks idv blocks wplots latter term represents lack of fit from a linear residual idv units response The FCON qualifier requests conditional Wald F statistics As this is a small example denominator degrees of freedom are reported by default An extract from the asr file is followed by the contents of the aov file Results from analysis of yield Akaike Information Criterion 415 10 assuming 3 parameters Bayesian Information Criterion 421 38 Approximate s
474. to indicate a Pedigree factor A to indicate a 29 3 4 The ASReml command file alphanumerically coded factor I to indicated a factor where the numbers are to be treated as labels for the levels and where the numbers are the actual levels e If none of the names are indicated as factors using the mechanism ASReml will scan the first few lines of data and try and identify alphanumeric integer and simple factors Always check the template as it is likely some variates have been misclassified as factors The template file created by running ASReml on the nin89 asd file looks like WORKSPACE 100 RENAME ARGS DOPART 1 Title nin89 variety id pid raw rep nloc yield lat long row column LANCER 1 1101 585 1 4 29 25 4 3 19 2 16 1 BRULE 2 1102 631 1 4 31 55 4 3 20 4 17 1 REDLAND 3 1103 701 1 4 35 05 4 3 21 6 18 1 CODY 4 1104 602 1 4 30 1 4 3 22 8 19 1 variety A CODY id 4 pid 4I 1104 raw I 602 rep 1 nloc 4 yield 30 1 lat 4 3 long 22 8 row I 19 column 1 Check Correct these field definitions nin89 asd SKIP 1 yield m Specify fixed model Ir Specify random model residual units We need to change the I associated with row to because the row numbers are actually positions not just labels which could be taken in any order Note that ASReml displays a data value beside each name to make it easier to confirm the labelling 30 3 4 The ASReml command file
475. tratum variance decomposition

Stratum               Degrees-Freedom   Variance    Component Coefficients
idv(blocks)                 5.00        3175.06     12.0   4.0   1.0
idv(blocks.wplots)         10.00         601.331     0.0   4.0   1.0
Residual Variance          45.00         177.083     0.0   0.0   1.0

Model_Term                            Gamma      Sigma     Sigma/SE   % C
idv(blocks)          IDV_V    6     1.21116    214.477       1.27     0 P
idv(blocks.wplots)   IDV_V   18     0.598937   106.062       1.56     0 P
idv(units)                   72 effects
Residual             SCA_V   72     1.000000   177.083       4.74     0 P

Wald F statistics
    Source of Variation   NumDF   DenDF_con    F-inc    F-con   M   P-con
  8 mu                      1        6.0      245.14   138.14       <.001
  4 variety                 2       10.0        1.49     1.49   A   0.272
  7 linNitr                 1       45.0      110.32   110.32   a   <.001
  2 nitrogen                2       45.0        1.37     1.37   A   0.265
  9 variety.linNitr         2       45.0        0.48     0.48   b   0.625
 10 variety.nitrogen        4       45.0        0.22     0.22   B   0.928

The analysis shows that there is a significant linear response to nitrogen level, but the lack of fit term and the interactions with variety are not significant. In this example, the conditional Wald F statistic is the same as the incremental one because the contrast must appear before the lack of fit and the main effect before the interaction, and otherwise it is a balanced analysis. The first part of the .aov file, the FMAP table, only appears if the job is run in !DEBUG mode. There is a line for each model term showing the number of non-singular effects in the terms before the current term is absorbed. For example, variety.nitrogen initially has 12 degrees of freedom (non-singula
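The stratum variance decomposition in this entry can be checked by hand: each stratum variance is the sum of the variance components weighted by that stratum's component coefficients. The sketch below repeats that arithmetic with the figures reported in the output above; the function name `stratum_variance` is only illustrative.

```python
# Variance components (Sigma column): blocks, blocks.wplots, residual.
COMPONENTS = (214.477, 106.062, 177.083)

# Component coefficients for each stratum, in the same order.
COEFFICIENTS = {
    "idv(blocks)": (12.0, 4.0, 1.0),
    "idv(blocks.wplots)": (0.0, 4.0, 1.0),
    "Residual Variance": (0.0, 0.0, 1.0),
}


def stratum_variance(coefficients, components=COMPONENTS):
    """Weighted sum of variance components for one stratum."""
    return sum(c * s for c, s in zip(coefficients, components))


if __name__ == "__main__":
    for stratum, coefficients in COEFFICIENTS.items():
        print(stratum, round(stratum_variance(coefficients), 3))
```

The results agree with the reported stratum variances (3175.06, 601.331 and 177.083) to the precision shown. The same output also links the Gamma and Sigma columns: each gamma multiplied by the residual variance gives the corresponding sigma, for example 1.21116 x 177.083 is approximately 214.48.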
476. explanatory variables after the predict directive. The predict statement(s) may appear immediately after the model line (before or after any tabulate statements) or after the R and G structure lines. The syntax is

    predict factors [qualifiers]

• predict must be the first element of the predict statement, in upper or lower case,
• factors is a list of the variables defining a multiway table to be predicted; each variable may be followed by a list of specific levels/values to be predicted or the name of the file that contains those values,
• the qualifiers, listed in Table 9.1, modify the predictions in some way,
• a predict statement may be continued on subsequent lines by terminating the current line with a comma,
• several predict statements may be specified.

Code box (fragment):

    trial 1989
     variety !A
     column 11
    nin89.asd !skip 1
    yield ~ mu variety !r idv(repl)
    predict variety

ASReml parses each predict statement before fitting the model. If any syntax problems are encountered, these are reported in the .pvs file, after which the statement is ignored: the job is completed as if the erroneous prediction statement did not exist.

The predictions are formed as an extra process in the final iteration and are reported to the .pvs file. Consequently, aborting a run by creating the ABORTASR.NOW file (see page 68) will cause any predict statements to be ignored. Create FINALASR.NOW instead of ABORTASR.NOW to make the next it
477. two dimensional error structure can be defined (see !SECTION on page 73).

Table 5.2: Qualifiers relating to data input and output

!CSV  is used to make consecutive commas imply a missing value; this is automatically set if the file name ends with .csv or .CSV (see Section 4.2). Warning: this qualifier is ignored when reading binary data.

!DATAFILE f  specifies the datafile name, replacing the one obtained from the datafile line. It is required when different PATHS (see !DOPATH in Table 10.3) of a job must read different files. The !SKIP qualifier, if specified, will be applied when reading the file.

!FILTER v, optionally followed by !SELECT n or !EXCLUDE n  (new in Release 4) enables a subset of the data to be analysed. v is the number or name of a data field. When reading data, the value in field v is checked after any transformations are performed. If !SELECT and !EXCLUDE are omitted, records with zero in field v are omitted from the analysis. If !SELECT n is specified, records with n in field v are retained and all other records are omitted. Conversely, if !EXCLUDE n is specified, records with n in field v are ignored.

!FOLDER s  specifies an alternative folder for ASReml to find input files. This qualifier is usually placed on a separate line BEFORE the data filename line (and any pedigree/giv/grm filename lines). For example, placing !FOLDER Data on the line before data.asd !SKIP 1 is equivalent to giving the data file together with its Data folder on the datafile line itself, followed by !SKIP 1.

!FORMAT s  s
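The three behaviours of the FILTER qualifier described above (default, SELECT n, EXCLUDE n) can be summarised as a single predicate. The sketch below is our own Python illustration of the rule as stated in the text; the function name and the sample records are hypothetical and nothing here is ASReml code.

```python
def keep_record(value, select=None, exclude=None):
    """Return True if a record whose filter field holds `value` is retained.

    select=n  : keep only records whose filter field equals n
    exclude=n : drop records whose filter field equals n
    neither   : drop records with zero in the filter field
    """
    if select is not None:
        return value == select
    if exclude is not None:
        return value != exclude
    return value != 0


# Hypothetical records: (plot id, value of the filter field after transformation)
records = [("p1", 0), ("p2", 1), ("p3", 2), ("p4", 2)]

default_kept = [pid for pid, v in records if keep_record(v)]
select_kept = [pid for pid, v in records if keep_record(v, select=2)]
exclude_kept = [pid for pid, v in records if keep_record(v, exclude=2)]
print(default_kept, select_kept, exclude_kept)
```

Note that the check is made on the transformed value, so in a real job the filter field can itself be constructed by earlier transformations.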
478. ty to the dose.sex interaction. We also note the comment "3 possible outliers: see .res file". Checking the .res file, we discover unit 66 has a standardised residual of -8.80 (see Figure 15.1). The weight of this female rat, within litter 9, is only 3.68, compared to weights of 7.26 and 6.58 for two other female sibling pups. This weight appears erroneous, but without knowledge of the actual experiment we retain the observation in the following. However, part 2 shows one way of dropping unit 66 by fitting an effect for it with out(66).

[Figure 15.1: Residual plot for the rat data. Rats example, residuals vs fitted values; residuals (Y) from -3.02 to 1.22, fitted values (X) from 5.04 to 7.63.]

We refit the model without the dose.sex term. Note that the variance parameters are re-estimated, though there is little change from the previous analysis.

    Model_Term                        Gamma       Sigma         Sigma/SE   % C
    idv(dam)       IDV_V   27      0.595157   0.979179E-01       2.93      0 P
    idv(units)     322 effects
    Residual       SCA_V  322      1.000000   0.164524          12.13      0 P

    Wald F statistics
        Source of Variation   NumDF   DenDF_con    F_inc    F_con   M   P_con
     7  m
479. u larities occur. ASReml runs more efficiently if no constraints are applied. Following is an example of Helmert and sum-to-zero covariables for a factor with 5 levels.

         H1   H2   H3   H4     C1   C2   C3   C4
    F1   -1   -1   -1   -1      1    0    0    0
    F2    1   -1   -1   -1      0    1    0    0
    F3    0    2   -1   -1      0    0    1    0
    F4    0    0    3   -1      0    0    0    1
    F5    0    0    0    4     -1   -1   -1   -1

ide(f)  is used to take a copy of a pedigree factor f and fit it without the genetic relationship covariance. This facilitates fitting a second animal effect. Thus, to form a direct, maternal genetic and maternal environment model, the maternal environment is defined as a second animal effect coded the same as dams, viz. !r animal dam ide(dam).

inv(v[,r])  forms the reciprocal of v + r. This may also be used to transform the response variable.

leg(v,[-]n)  forms n+1 Legendre polynomials of order 0 (intercept), 1 (linear), ..., n from the values in v; the intercept polynomial is omitted if n is preceded by the negative sign. The actual values of the coefficients are written to the .res file. This is similar to the pol() function described below.

lin(f)  takes the coding of factor f as a covariate. The function is defined for f being a simple factor, Trait and units. The lin(f) function does not centre or scale the variable. Motivation: sometimes you may wish to fit a covariate as a random factor as well. If the coding is, say, 1...n, then you should define the field as a factor in the field definition and use the lin() function to include it as a covariate in the model. Do not centre
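The Helmert and sum-to-zero covariables tabulated above for a 5-level factor follow a simple pattern that extends to any number of levels. The Python sketch below is our own illustration of that pattern (the function names are hypothetical and it is not how ASReml builds the covariables internally); it regenerates the 5-level case row by row.

```python
def helmert(levels):
    """Rows F1..Fk, columns H1..H(k-1).

    Column Hj contrasts level j+1 (coefficient j) with the j levels
    before it (coefficient -1 each); later levels get 0.
    """
    rows = []
    for i in range(1, levels + 1):        # factor level F_i
        row = []
        for j in range(1, levels):        # covariable H_j
            if i <= j:
                row.append(-1)
            elif i == j + 1:
                row.append(j)
            else:
                row.append(0)
        rows.append(row)
    return rows


def sum_to_zero(levels):
    """Rows F1..Fk, columns C1..C(k-1): identity rows, last level all -1."""
    rows = [[1 if i == j else 0 for j in range(levels - 1)]
            for i in range(levels - 1)]
    rows.append([-1] * (levels - 1))
    return rows


H = helmert(5)
C = sum_to_zero(5)
for label, h_row, c_row in zip(["F1", "F2", "F3", "F4", "F5"], H, C):
    print(label, h_row, c_row)
```

Every column of both sets sums to zero over the levels, which is the property that removes the singularity with the overall mean.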
480. u                       1       32.0     8981.48   1093.05       <.001
     3  littersize              1       31.4       27.85     46.43   A   <.001
     1  dose                    2       24.0       12.05     11.42   A   <.001
     2  sex                     1      301.7       58.27     58.27   A   <.001

Part 4 shows what happens if we (wrongly) drop dam from this model. Even if a random term is not significant, it should not be dropped from the model when we are testing fixed effects or desire standard errors of adjusted means, if it represents a stratum of the design, as in this case.

    Model_Term                        Gamma       Sigma      Sigma/SE   % C
    idv(units)     322 effects
    Residual       SCA_V  322      1.000000    0.253182       12.59     0 P

    Wald F statistics
        Source of Variation   NumDF   DenDF_con     F_inc     F_con   M   P_con
     7  mu                      1      317.0     47077.31   3309.42       <.001
     3  littersize              1      317.0        68.48    146.50   A   <.001
     1  dose                    2      317.0        60.99     58.43   A   <.001
     2  sex                     1      317.0        24.52     24.52   A   <.001

15.4 Source of variability in unbalanced data - Volts

In this example we illustrate an analysis of unbalanced data in which the main aim is to determine the sources of variation rather than assess the significance of imposed treatments. The data are taken from Cox and Snell (1981) and involve an experiment to examine the variability in the production of car voltage regulators. Standard production of regulators involves two steps. Regulators are taken from the production line to a setting station and adjusted to operate within a specified voltage range. From the setting station the regulator is then passed to a testing station where it is tested and returned if outside t
481. ualifier and T ke would be degrees of freedom in the typical application to mean squares The default gue value of is 1 a INEGBIN LOGARITHM IDENTITY INVERSE PHI v ptp fits the Negative Binomial distribution Natural logarithms are the default link 2 y 1n 4 gt 5 function The default value of is 1 yin S General qualifiers AOD requests an Analysis of Deviance table be generated This is formed by fitting a series of sub models for terms in the DENSE part building up to the full model and comparing the deviances An example if its use is LS BIN TOT COUNT AOD mu SEX GROUP AOD may not be used in association with PREDICT IDISP A includes an overdispersion scaling parameter h in the weights If DISP is specified with no argument ASReml estimates it as the residual variance of the working variable Traditionally it is estimated from the deviance residuals reported by ASReml as Variance heterogeneity An example if its use is count POIS DISP mu group OFFSET o is used especially with binomial data to include an offset in the model where o is the number or name of a variable in the data The offset is only included in binomial and Poisson models for Normal models just subtract the offset variable from the response variable for example count POIS OFFSET base DISP mu group The offset is included in the model as n X7 0 The offset will often be somethi
482. uare root scale. Figure 15.8 presents a plot of the treated and the control root area (on the square root scale) for each variety. There is a strong dependence between the treated and control root area, which is not surprising. The aim of the experiment was to determine the tolerance of varieties to bloodworms and thence identify the most tolerant varieties. The definition of tolerance should allow for the fact that varieties differ in their inherent seedling vigour (Figure 15.8). The original approach of the scientist was to regress the treated root area against the control root area and define the index of vigour as the residual from this regression. This approach is clearly inefficient since there is error in both variables. We seek to determine an index of tolerance from the joint analysis of treated and control root area.

[Figure 15.8: Rice bloodworm data. Plot of square root of root weight for treated versus control (paired data); Y axis 1.8957 to 14.8835, X axis 8.2675 to 23.5051.]

15.8.1 Standard analysis

The allocation of bloodworm treatments within varieties a
483. uced more computationally efficiently than it would be using PREDICT For example TPREDICT Animal AVE Trait 2 1 1 2 7 4 ONLYUSE us Trait nrm Animal Part of the motivation for this is the calculation of selection indices The index coefficients are typically derived as w a Gon G gt 1 Where Gmm is the variance matrix for the measured traits corresponding to C in the example Gom is the genetic covariance matrix between the objective traits and the measured traits and a is the vector of economic values for the objective traits The results are given in a sli selection index file This directive should be placed after the model specification 191 10 Command file Running the job 10 1 Introduction The command line its options and arguments are discussed in this chapter Command line options enable more workspace to be accessed to run the job control some graphics output and control advanced processing options Command line arguments are substituted into the job at run time As Windows likes to hide the command line most command line options can be set on an optional initial line of the as file we call the top job control line to distinguish it from the other job control lines discussed in Chapter 6 If the first line of the as file contains a qualifier other than DOPATH it is interpreted as setting command line options and the Title is taken as the next line 10 2 The command line 10 2 1 Normal run The basic command
484. ues ASReml doesn t print predictions of non estimable functions unless the PRINTALL qualifier is specified However using PRINTALL is rarely a satisfactory solution Failure to report predicted values normally means that the predict statement is averaging over some cells of the hyper table that have no information and therefore cannot be averaged in a meaningful way Appropriate use of the AVERAGE and or PRESENT qualifiers will usually resolve the problem The PRESENT qualifier enables the construction of means by averaging only the estimable cells of the hyper table where this is appropriate Table 9 1 is a list of the prediction qualifiers with the following syntax e fis an explanatory variable which is a factor e tis a list of terms in the fitted model e nis an integer number e vis a list of explanatory variables 180 9 3 Prediction Table 9 1 List of prediction qualifiers qualifier action Controlling formation of tables ASSOCIATE v IAVERAGE f weights AVERAGE f gt file n ASAVERAGE f weights ASAVERAGE f gt file n PARALLEL vu PRESENT v PRWTS v facilitates prediction when the levels of one factor are grouped by the levels of another in a hierarchical manner More details are given below Two independent associate lists may be specified is used to formally include a variable in the averaging set and to explicitly set the weights for averaging Variables
485. ults. A portion of the file is presented below. There is a wide range in SED, reflecting the imbalance of the variety concurrence within runs.

    Assuming Power transformation was (Y + 0.000)^0.500
    The ignored set: run

    variety     Trait      Power_value  Stand_Error  Ecode  Retransformed_value  approx_SE
    AliCombo    sqrt(yc)     14.9531      0.9181       E        223.5962          27.4568
    AliCombo    sqrt(ye)      7.9941      0.7992       E         63.9050          12.7784
    Bluebelle   sqrt(yc)     13.1036      0.9310       E        171.7046          24.3987
    Bluebelle   sqrt(ye)      6.6302      0.8062       E         43.9598          10.6903
    C22         sqrt(yc)     16.6676      0.9181       E        277.8096          30.6050
    C22         sqrt(ye)      8.9541      0.7992       E         80.1756          14.3130
    YRK1        sqrt(yc)     15.1857      0.9549       E        230.6068          29.0018
    YRK1        sqrt(ye)      8.3355      0.8190       E         69.4806          13.6531
    YRK3        sqrt(yc)     13.3058      0.9549       E        177.0431          25.4114
    YRK3        sqrt(ye)      8.1134      0.8190       E         65.8265          13.2892

[Figure 15.9: BLUPs for treated for each variety plotted against BLUPs for control.]

Table 15.10: Estimated variance parameters from bivariate analysis of bloodworm data

    source               control variance   treated variance   covariance
    us(trait).variety          3.84               1.96             2.33
    us(trait).run              1.71               2.54             0.32
    us(trait).pair             2.14               2.35             0.99

15.8.3 Interpretation of results

Recall that the researcher is interested in varietal tolerance to bloodworms. This could be defined in various ways. One option is to consider the regression im
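The Retransformed_value and approx_SE columns in the prediction table above are consistent with squaring the square-root-scale prediction and scaling its standard error by the derivative of the back-transformation (a first-order, delta-method approximation). This is our reading of the printed numbers rather than a documented statement of the algorithm, and the function name below is hypothetical. A minimal Python check on four of the printed rows:

```python
# (variety, trait, power-scale value, standard error, printed retransformed value, printed approx SE)
rows = [
    ("AliCombo",  "sqrt(yc)", 14.9531, 0.9181, 223.5962, 27.4568),
    ("AliCombo",  "sqrt(ye)",  7.9941, 0.7992,  63.9050, 12.7784),
    ("Bluebelle", "sqrt(yc)", 13.1036, 0.9310, 171.7046, 24.3987),
    ("C22",       "sqrt(yc)", 16.6676, 0.9181, 277.8096, 30.6050),
]


def back_transform(value, se, power=0.5):
    """Invert y**power (offset assumed zero) and approximate the SE by the delta method."""
    inverse = 1.0 / power
    retransformed = value ** inverse
    # d/dv of v**inverse, evaluated at the predicted value, times the SE.
    approx_se = inverse * value ** (inverse - 1.0) * se
    return retransformed, approx_se


for variety, trait, value, se, printed_value, printed_se in rows:
    retransformed, approx_se = back_transform(value, se)
    print(f"{variety:10s} {trait}: {retransformed:9.4f} (printed {printed_value}), "
          f"SE {approx_se:8.4f} (printed {printed_se})")
```

Small differences in the third or fourth decimal place are expected because the inputs are themselves rounded in the printed file.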
486. umn optional field labels LANCER 1 NA NA 1 4 NA 4 31 21 1 file augmented by missing values LANCER 1 NA NA 1 4 NA 4 3 2 4 2 1 for first 15 plots and 3 buffer LANCER 1 NA NA 1 4 NA 4 3 3 63 1 plots and variety coded LANCER LANCER 1 NA NA 1 4 NA 4 34 8 41 to complete 22x11 array LANCER 1 NA NA 1 4 NA 4 365 1 LANCER 1 NA NA 1 4 NA 4 3 7 2 6 1 LANCER 1 NA NA 1 4 NA 4 3 8 4 7 1 LANCER 1 NA NA 1 4 NA 4 3 9 6 8 1 LANCER 1 NA NA 1 4 NA 4 3 10 8 9 1 LANCER 1 NA NA 1 4 NA 4 3 12 10 1 LANCER 1 NA NA 1 4 NA 4 3 13 2 11 1 LANCER 1 NA NA 1 4 NA 4 3 14 4 12 1 LANCER 1 NA NA 1 4 NA 4 3 15 6 13 1 LANCER 1 NA NA 1 4 NA 4 3 16 8 14 1 buffer plots LANCER 1 NA NA 1 4 NA 4 3 18 15 1 between reps LANCER 1 NA NA 2 4 NA 17 2 7 264 LANCER 1 NA NA 3 4 NA 25 8 22 8 19 6 LANCER 1 NA NA 4 4 NA 38 7 12 0 10 9 LANCER 1 1101 585 1 4 29 25 4 3 19 2 16 1 original data BRULE 2 1102 631 1 4 31 55 4 3 20 4 17 1 REDLAND 3 1103 701 1 4 35 05 4 3 21 6 18 1 CODY 4 1104 602 1 4 30 1 4 3 22 8 19 1 Note that e the pid raw repl and yield data for the missing plots have all been made NA one of the three missing value indicators in ASReml see Section 4 2 e variety is coded LANCER for all missing plots one of the variety names must be used but the particular choice is arbitrary 28 3 4 The ASReml command file 3 4 The ASReml command file By convention an ASReml command file has a as extension The file defines e a title line to describe the job
487. upplies a Fortran-like FORMAT statement for reading fixed format files. A simple example is !FORMAT(3I4,5F6.2), which reads 3 integer fields and 5 floating point fields from the first 42 characters of each data line. A format statement is enclosed in parentheses and may include 1 level of nested parentheses, for example !FORMAT(4x,3(I4,f8.2)). Field descriptors are

• rX to skip r character positions,
• rAw to define r consecutive fields of w characters width,
• rIw to define r consecutive fields of w characters width, and
• rFw.d to define r consecutive fields of w characters width; d indicates where to insert the decimal point if it is not explicitly present in the field,

where r is an optional repeat count. In ASReml, the A and I field descriptors are treated identically and simply set the field width. Whether the field is interpreted alphabetically or as a number is controlled by the !A qualifier.

Other legal components of a format statement are

• the , character, required to separate fields; blanks are not permitted in the format,
• the / character, which indicates that the next field is to be read from the next line; however, a / on the end of a format to skip a line is not honoured,
• BZ: the default action is to read blank fields as missing values. * and NA are also honoured as missing values. If you wish to read blank fields as zeros, include the string BZ,
• the string BM, which switches back to blank missing mode,
• the string T
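To see why the simple format with 3 I4 fields and 5 F6.2 fields consumes the first 42 characters of a line, the field descriptors can be expanded into a list of widths. The Python sketch below is our own illustration and handles only the unnested rX, rAw, rIw and rFw.d descriptors described above; the function names and the sample data line are hypothetical, and the code is not ASReml's parser.

```python
import re

# Optional repeat count, descriptor letter, optional width, optional ".d" part.
DESCRIPTOR = re.compile(r"(\d*)([AIFX])(\d*)(?:\.(\d+))?", re.IGNORECASE)


def field_widths(fmt):
    """Expand an unnested format such as "(3I4,5F6.2)" into (kind, width) pairs."""
    body = fmt.strip().strip("()")
    fields = []
    for item in body.split(","):
        match = DESCRIPTOR.fullmatch(item.strip())
        if match is None:
            raise ValueError(f"unsupported descriptor: {item!r}")
        repeat = int(match.group(1) or 1)
        kind = match.group(2).upper()
        if kind == "X":
            fields.append(("skip", repeat))        # rX skips r positions
        else:
            fields.extend([(kind, int(match.group(3)))] * repeat)
    return fields


def read_fixed(line, fmt):
    """Slice one data line into its fields, returned as stripped strings."""
    values, position = [], 0
    for kind, width in field_widths(fmt):
        text = line[position:position + width]
        position += width
        if kind != "skip":
            values.append(text.strip())
    return values


fmt = "(3I4,5F6.2)"
total_width = sum(width for _, width in field_widths(fmt))   # 3*4 + 5*6
line = "".join(f"{i:4d}" for i in (1, 2, 3)) + \
       "".join(f"{x:6.2f}" for x in (1.5, 2.25, 3.0, 4.75, 5.5))
values = read_fixed(line, fmt)
print(total_width, values)
```

Here the total width is 3 x 4 + 5 x 6 = 42, matching the count given in the text.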
488. ution Used EMFLAG O Single standard EM update when AI update unacceptable You could try GU negative definite US or use XFA instead Akaike Information Criterion Bayesian Information Criterion 43471 52 Model_Term id units us Trait Trait Trait Trait Trait Trait 2 2 US_C 3 3 Sigma 35200 effects US V 1 1 1 2 1 2 9 46109 7 34181 17 6050 0 272536 0 668009 322 43065 77 assuming 52 parameters Sigma Sigma SE C 9 46109 33 29 OP 7 34181 20 55 OP 17 6050 2709 OP 0 272536 3 38 0P 0 668009 13 99 0P 15 10 Multivariate animal genetics data Sheep Trait US_V Trait US_C Trait US_C Trait US_C Trait US_V Trait US_C Trait US_C Trait US_C Trait US_C Trait US_V diag TrSG123 sex grp TrS G123 DIAG_V TrSG123 DIAG_V TrSG123 DIAG_V diag TrAG1245 age grp TrAG1245 DIAG_V TrAG1245 DIAG_V TrAG1245 DIAG_V TrAG1245 DIAG_V us Trait id sire Trait US_V Trait US_C Trait US_V Trait US_C Trait US_C Trait US_V Trait US_C Trait US_C Trait US_C Trait US_V Trait US_C Trait US_C Trait US_C Trait US_C Trait US_V xfai TrDam123 id dam TrDam123 XFA_V TrDam123 XFA_V TrDam123 XFA_V TrDam123 XFA_L TrDam123 XFA_L TrDam123 XFA_L us TrLit1234 id lit TrLit1234 US_V TrLit1234 US_C TrLit1234 US_V TrLit1234 US_C TrLit1234 US_C TrLit1234 US_V TrLit1234 US_C TrLit1234 US_C TrLit1234 US_C TrLit1234 uS ana n Sk A A WA w anannnF AFP BP WWWNN KE Bee FE OO CO BRR wWwWWNNE
489. v is a single precision lower triangle row wise binary file and dgiv is a double precision lower triangle row wise binary file PRECISION n changes the value used to declare a singularity when inverting a GRM file from 1D 7 to 1D n A GRM can be associated with a factor i by using the variance model function grm f which associates the ith GRM with factor f for example grmiv animal INIT 0 12 or coruh site grm2 variety It is imperative that the GIV GRM matrix be defined with the correct row column order the order that matches the order of the levels in the factor it is associated with The easiest way to check this is to compare the order used in the GIV GRM file with the order reported in the sln file when the model is fitted Another example of L Section 5 4 1 is in analysis on data with 2 relationship matrices based on two separate pedigrees ASReml only allows one pedigree file to be specified but can create an inverse relationship matrix and store the result in a GIV file So 2 relationship matrices based on two separate pedigrees may be used by generating a GIV file from one pedigree and then using that GIV file and the other pedigree in a subsequent run To process the GIV file properly we must also generate a file with identities as required for the GIV matrix An example of this is if the file Hybrid as includes IPART 1 Mline P Fline A Mline ped GIV DIAG GIV generates the file HybridiA giv and DIAG genera
490. v pair diag tmt id run idv uni tmt 2 residual idv units The two paths in the input file define the two univariate analyses we will conduct We consider the results from the analysis defined in PATH 1 first A portion of the output file is 5 LogL 345 306 S2 1 3216 262 df 6 LogL 345 267 52 1 3155 262 df 7 LogL 345 264 S2 1 3149 262 df 8 LogL 345 263 S2 1 3149 262 df Results from analysis of sqrt rootwt Akaike Information Criterion 702 53 assuming 6 parameters Bayesian Information Criterion 723 94 Approximate stratum variance decomposition Stratum Degrees Freedom Variance Component Coefficients idv variety 44 40 26 0156 4238 3 0 3 6 2 0 1 5 1 idv run 45 17 7 41702 0 0 3 5 0 0 2 0 iT 1 idv variety tmt 39 53 2 99833 0 0 0 0 Zot 0 0 0 2 1 idv pair 41 43 3 26838 0 0 0 0 0 0 20 O 0 1 idv run tmt 52 38 5 12369 0 0 0 0 O20 CO 2 2 1 Residual Variance 39 09 1 31486 0 0 0 0 0 0 0 0 020 i Model_Term Gamma Sigma Sigma SE C idv variety IDV_V 44 1 80947 2 37920 3 01 GP idv run IDV_V 66 0 244243 0 321145 0 59 OP idv variety tmt IDV_V 88 0 374220 0 492047 1 78 OP idv pair IDV_V 132 0 742328 0 976056 2 51 OP idv run tmt IDV_V 132 1 32973 1 74841 3 65 OP idv units 264 effects Residual SCA_V 264 1 000000 1 31486 4 42 OP Wald F statistics Source of Variation NumDF DenDF F ine P in 7 mu 1 53 6 1484 96 lt 001 4 tmt 1 60 4 469 35 lt 001 301 oO OD Oo Oo Oo 15 8 Paired Case Control study Rice
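The information criteria reported for PATH 1 above can be reproduced from the converged log-likelihood. The Python sketch below assumes the usual definitions, AIC = -2 LogL + 2p and BIC = -2 LogL + p ln(df), and assumes that the converged value is LogL = -345.263 (the minus sign does not survive in this copy of the output), with p = 6 parameters and 262 residual degrees of freedom. The function name is our own.

```python
import math


def information_criteria(logl, n_params, resid_df):
    """AIC and BIC from a REML log-likelihood, under the definitions assumed above."""
    aic = -2.0 * logl + 2.0 * n_params
    bic = -2.0 * logl + n_params * math.log(resid_df)
    return aic, bic


aic, bic = information_criteria(logl=-345.263, n_params=6, resid_df=262)
print(f"AIC = {aic:.2f}, BIC = {bic:.2f}")
```

Under these assumptions the results agree with the printed values of 702.53 and 723.94, which supports reading the LogL values in the iteration sequence as negative.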
491. v units NIN Alliance Trial 1989 variety A id pid raw repl 4 row 22 column 11 nin89 asd skip 1 yield mu variety r idv repl residual idv units 7 5 A sequence of variance structures for the NIN data 3a Two dimensional spatial model with spatial correlation in one direc tion The NIN trial was actually laid out in as a rectangular NIN Alliance Trial 1989 array indexed in the data file by row and column We variety A can therefore consider fitting a spatial model for the id residual term where we allow for autocorrelated errors P14 in the row and or column direction see Section 7 3 a i However there are missing plots in the original data E Before fitting a spatial analysis we therefore need to Be row fill out the data file to contain records for the miss ieee ing plots ASReml can now fill out the data file using aap a ipi IROWFACTOR and COLUMNFACTOR see Table 5 2 This yield mu variety allows us to define a separable variance structure for r idv repl f mv the residual error term that is the kronecker product residual idv column ar1 row of a structure for rows and a structure for columns The example in the code box specifies e N 0 o2 I p that is a two dimensional first order separable autoregressive spatial structure for error but with spatial correlation in the row direction only IDVxAR1 ar1 row models the p correlation stru
492. values 4 19 it solves the mixed model equations by it eration allowing larger models to be fitted With direct solution the estimation REML iteration routine is aborted after n 1 forming the estimates of the vector of fixed and random effects by matrix inversion n 2 forming the estimates of the vector of fixed and random effects REML log likelihood and residuals this is the default n 3 forming the estimates of the vector of fixed and random effects REML log likelihood residuals and inverse coefficient matrix For arguments 4 10 19 ASReml forms the mixed model equations and solves them iteratively to obtain solutions for the fixed and random effects The options are n 4 forming the estimates of the vector of fixed and random effects using the Preconditioned Conjugate Gradient PCG Method Mrode 2005 15 5 8 Job control qualifiers Table 5 5 List of rarely used job control qualifiers qualifier action DENSE n IDF n n 10 19 forming the estimates of the vector of fixed and random effects by Gauss Seidel iteration of the mixed model equations with relaxation factor n 10 The default maximum number of iterations is 12000 This can be re set by supplying a value greater than 100 with the MAXIT qualifier in conjunction with the BLUP qualifier Iteration stops when the average squared update divided by the average squared effect is less than le Gauss Seidel iteration is generally much s
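The Gauss-Seidel option described above can be sketched in a few lines of Python. This illustrates the general technique only and is not ASReml's implementation: the function name, the tolerance value and the small test system are our assumptions. The relaxation factor is taken to be n/10, on the reading that n = 10 to 19 gives factors of 1.0 to 1.9, and the stopping rule follows the text (average squared update divided by average squared effect).

```python
def gauss_seidel(A, b, n=10, tol=1e-10, max_iter=12000):
    """Solve A x = b by Gauss-Seidel sweeps with relaxation factor n/10.

    tol is an assumed value; the text's own threshold is not legible here.
    Returns the solution and the number of sweeps used.
    """
    omega = n / 10.0
    size = len(b)
    x = [0.0] * size
    for sweep in range(1, max_iter + 1):
        squared_update = 0.0
        for i in range(size):
            off_diagonal = sum(A[i][j] * x[j] for j in range(size) if j != i)
            new = (1.0 - omega) * x[i] + omega * (b[i] - off_diagonal) / A[i][i]
            squared_update += (new - x[i]) ** 2
            x[i] = new                      # updated value is used immediately
        squared_effect = sum(v * v for v in x)
        # Ratio of averages equals ratio of sums because both use the same count.
        if squared_effect > 0 and squared_update / squared_effect < tol:
            return x, sweep
    return x, max_iter


# Small symmetric, diagonally dominant system whose exact solution is (1, 1, 1).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [5.0, 5.0, 3.0]

plain, plain_sweeps = gauss_seidel(A, b, n=10)      # relaxation factor 1.0
relaxed, relaxed_sweeps = gauss_seidel(A, b, n=12)  # relaxation factor 1.2
print(plain, plain_sweeps)
print(relaxed, relaxed_sweeps)
```

Because each sweep touches only one equation at a time, the coefficient matrix never has to be inverted or factorised, which is what allows larger models to be handled.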
493. variance in a univariate single site analysis. The option will have no effect in analyses with multiple error variances (for sites or traits) other than in the reported degrees of freedom. Use !ADJUST r rather than !DF n if r is not a whole number. Use with !YSS r to supply variance when data fully fitted.

!EMFLAG [n], !PXEM [n]  requests ASReml use Expectation-Maximization (EM) rather than Average Information (AI) updates when the AI updates would make a US structure non-positive definite. This only applies to US structures and is still under development.

When !GP is associated with a US structure, ASReml checks whether the updated matrix is positive definite (PD). If not, it replaces the AI update with an EM update. If the non-PD characteristic is transitory, then the EM update is only used as necessary. If the converged solution would be non-PD, there will be an EM update each iteration even though !EM is omitted.

EM is notoriously slow at finding the solution, and ASReml includes several modified schemes, discussed by Cullis et al. (2004), that are particularly relevant when the AI update is consistently outside the parameter space. These include optionally performing extra local EM or PXEM (Parameter Expanded EM) iterates. These can dramatically reduce the number of iterates required to find a solution near the boundary of the parame
494. variance matrices In this case var y 02 ZG y Z R 7 2 8 which we will refer to as the gamma parameterization and the individual variance structure parameters in y and y will be referred to as gammas ASReml switches between the sigma and gamma parameterizations for estimation This is discussed in Section 7 6 2 1 7 Parameter types Each sigma in o and and each gamma in y and y has a parameter type for ex ample variance components variance component ratios autocorrelation parameters factor loadings Furthermore the parameters in Og Or Yg and y can span multiple types For example the spatial analysis of a simple column trial would involve variance components sigma parameterization or variance component ratios gamma parameterization and spa tial autocorrelation parameters 2 1 8 Variance structures for the random model terms The random model terms u in u define the random effects and associated design matrices Zi Z but additional information is required before the model can be fitted This extra 8 2 1 The general linear mixed model step involves defining the G structure for each term In Release 4 this is achieved by using functions to directly apply variance models to the individual component factors in a random model term to define G This produces a consolidated model term that simultaneously defines both the design matrix Z and variance model G This process is described in det
495. variance model functions can also be applied to compound model terms example 3 111 7 2 Process to define a consolidated model term Table 7 2 Building consolidated model terms in ASReml linear model term component s variance variance covariance consolidated model type of term structure model component term name function name 1 repl repl IDV idv idv rep1 idv rep1 single 2 fac x fac x EXPV expv expv fac x expv fac x single 3 A B A B IDV idvQ idv A B idv A B compound 4 column row column IDV idv idv column idv column ari row compound row ARI ar1Q ari row 5 site variety site DIAG diag diag site diag site id variety compound variety ID id id variety 6 Trait animal Trait US us us Trait us Trait nrm animal compound animal NRM nrm nrm animal e variance model functions cannot be applied to expandable model terms for example to A B which expands to A B A B A B which expands to A A B at A i j B which expands to at A i B at A j B e a variance function must be specified for one but only one component in a compound model term Correlation functions must be defined for the remaining terms This is due to the identifiability issues that occur when multiple variance structures are specified This is explained in NIN example 3a see Section 7 5 The defined variance function may be homogeneous name ending in v or heterogeneous variance name ending in h This is discuss
496. ve differing results depending on the order in which the averaging is performed. We explore this with the following extended example. Consider the mean yields from 15 trials classified by region and location in Table 9.4.

Table 9.3: Trials classified by region and location

    Region   location (L1 L2 L3 L4 L5 L6 L7 L8)
    R1       T1 T2 T3 T4 T5 T6
    R2       T7 T8 T9 T10 T11 T12 T13 T14 T15

Table 9.4: Trial means

    T1  T2  T3  T4  T5  T6  T7  T8  T9  T10  T11  T12  T13  T14  T15
    10  12  11  12  13  13  11  13  11   12   13   10   12   10   10

Assuming a simplified linear model

    yield ~ mu region location trial

the predict statement

    predict trial !ASSOCIATE region location trial

will reconstruct the 15 trial means from the fitted mu, region, location and trial effects. Given these trial means, it is fairly natural to form location means by averaging the trials in each location to get the location means in Table 9.5.

Table 9.5: Location means

    L1  L2  L3  L4  L5  L6  L7  L8
    11  12  13  12  12  11  10  10

These are given by

    predict location !ASSOCIATE region location trial !ASAVERAGE trial

or equivalently

    predict location !ASSOCIATE region location trial

since the default is to average the base associate factor (trial) within the associated classify factor (location).

By contrast, by specifying

    predict location

or equivalently

    predict location !AVERAGE region !AVERAGE trial

ASReml would add the average of all the trial effects and the
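The effect of the order of averaging in this example can be checked with a short Python sketch. The allocation of trials to locations used below is an assumption: it is the consecutive grouping that reproduces the location means of Table 9.5 from the trial means of Table 9.4, since the column placement of Table 9.3 is not legible in this copy. The variable names are our own.

```python
from statistics import mean

trial_means = dict(zip(
    [f"T{i}" for i in range(1, 16)],
    [10, 12, 11, 12, 13, 13, 11, 13, 11, 12, 13, 10, 12, 10, 10],
))

# ASSUMED allocation of trials to locations (consistent with Table 9.5).
location_trials = {
    "L1": ["T1", "T2"],
    "L2": ["T3", "T4", "T5"],
    "L3": ["T6"],
    "L4": ["T7", "T8"],
    "L5": ["T9", "T10", "T11"],
    "L6": ["T12", "T13"],
    "L7": ["T14"],
    "L8": ["T15"],
}

# Average trials within each location (the ASSOCIATE-style location means).
location_means = {
    location: mean(trial_means[t] for t in trials)
    for location, trials in location_trials.items()
}

# Two routes to an overall mean: locations first, or all trials at once.
mean_of_location_means = mean(location_means.values())
mean_of_trial_means = mean(trial_means.values())

print(location_means)
print(mean_of_location_means, mean_of_trial_means)
```

The two overall means differ because locations with more trials carry more weight when trials are averaged directly, which is the kind of order dependence the text describes.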
497. wing lines are honoured if any one of the listed path numbers is active The PATH qualifier must appear at the beginning of its own line after the DOPATH qualifier A sequence of path numbers can be written using a b notation For example mydata asd DOPATH 4 PATH 2 4 6 10 One situation where this might be useful is where it is necessary to run simpler models to get reasonable starting values for more complex variance models The more complex models are specified in later parts and the CONTINUE command is used to pick up the previous estimates Example The following code will run through 1000 models fitting 1000 different marker variables to some data For processing efficiently the 1000 marker variables are held in 1000 separate files in subfolder MLIB and indexed by Genotype Marker screen Genotype yield PhenData txt ICYCLE 1 1000 IMBF mbf Genotype MLIB Marker I csv RENAME Marker I yld mu r Marker I 204 10 5 Performance issues Having completed the run the Unix command sequence grep LogL screen asr sort gt screen srt sorts a summary of the results to identify the best fit The best fit can then be added to the model and the process repeated Assuming Marker35 was best the revised job could be Marker screen Genotype yield PhenData txt ICYCLE 1 1000 IMBF mbf Genotype MLIB Marker I csv RENAME Marker I IMBF mbf Genotype MLIB Marker35 csv RENAME MKRO35 yld mu r MKRO35 Marker I We have giv
498. x causes ASReml to write the design matrix not including the response variable to a des file It allows ASReml to create the design matrix required by the VCM process see Section 7 8 2 69 5 8 Job control qualifiers Table 5 4 List of occasionally used job control qualifiers qualifier action IDISPLAY n IEPS IG v IGKRIGE p GROUPFACTOR tv p is used to select particular graphic displays In spatial analysis of field trials four graphic displays are possible see Section 13 4 Coding these 1 variogram 2 histogram 4 row and column trends 8 perspective plot of residuals set n to the sum of the codes for the desired graphics The default is 9 1 8 These graphics are only displayed in versions of ASReml linked with Winteracter that is LINUX MAc and PC versions Line printer ver sions of these graphics are written to the res file See the G command line option Section 10 3 on graphics for how to save the graphs in a file for printing Use NODISPLAY to suppress graphic displays sets hardcopy graphics file type to eps is used to set a grouping variable for plotting see X controls the expansion of PVAL lists for fac X Y model terms For kriging prediction in 2 dimensions X Y the user will typically want to predict at a grid of values not necessarily just at data combinations The values at which the prediction is required can be specified separately for X and Y using two PVAL stat
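The DISPLAY argument described above is a sum of codes (1, 2, 4, 8), so each requested graphic corresponds to one bit of n. The Python sketch below is our own illustration of that coding, with hypothetical function names; it decodes the default value 9 into its two graphics and encodes a chosen set back into n.

```python
DISPLAY_CODES = {
    1: "variogram",
    2: "histogram",
    4: "row and column trends",
    8: "perspective plot of residuals",
}


def decode_display(n):
    """List the graphics selected by a DISPLAY argument n (a sum of codes)."""
    if not 0 <= n <= sum(DISPLAY_CODES):
        raise ValueError("n must be a sum of the codes 1, 2, 4 and 8")
    return [name for code, name in DISPLAY_CODES.items() if n & code]


def encode_display(names):
    """Return the DISPLAY argument that selects the named graphics."""
    lookup = {name: code for code, name in DISPLAY_CODES.items()}
    return sum(lookup[name] for name in set(names))


default_graphics = decode_display(9)
histogram_and_trends = encode_display(["histogram", "row and column trends"])
print(default_graphics, histogram_and_trends)
```

For example, requesting the histogram together with the row and column trends gives n = 2 + 4 = 6.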
499. x axis variable is numeric Predictions involving two or more factors xaxis factor superimpose factors condition factors Layout goto n saveplot filename layout rows cols pycols plankpanels n extrablanks n and extraspan p Improving the graphical labcharsize n panelcharsize n vertxlab abbrdlab n abbrxlab n If these arguments are used all prediction factors except for those specified with only one prediction level must be listed once and only once otherwise these arguments are ignored specifies the prediction factor to be plotted on the x axis specifies the prediction factors to be superimposed on the one panel specifies the conditioning factors which define the panels These should be listed in the order that they will be used specifies the page to start at for multi page predictions specifies the name of the file to save the plot to specifies the panel layout on each page specifies that the panels be arranged by columns default is by rows specifies that each page contains n blank panels This sub option can only be used in combination with the layout sub option specifies that an additional n blank panels be used every p pages These can only be used with the layout sub option appearance and readability specifies the relative size of the data points labels default 0 4 specifies the relative size of the labels used for the panels default 1
500. xample of the display produced when an XFA structure is fitted. The output from a small example with 9 environments and 2 factors is

[Figure 13.3: Plot of residuals in field plan order]

[Figure 13.4: Plot of the marginal means of the residuals]

[Figure 13.5: Histogram of residuals]

DISPLAY of variance partitioning for XFA structure in xfa(Env,2)

[Display with columns Lvl, TotalVar, %expl and Loadings and an Average row; the values are not recoverable from this extract]

In the figure, 1 indicates the proportion of TotalVar explained by the first loading, and 2 indicates the proportion explained by the first and second, provided it plots right of 1. Consequently, the distance from 2 to the right margin represents PsiVar. %expl reports the percentage of
501. y dimensioned. Further, when the model term mv is included in the model and !ROWFACTOR and !COLUMNFACTOR are defined, ASReml will check that the observations in each section form a complete grid; if not, the grid will be completed by adding the appropriate extra data records. If only one grid is required from all the data, then the !SECTION variable does not need specifying. The following is a basic example assuming 5 sites (sections):

 Basic multi envt trial analysis filling out row and column grid
  site 5        sites coded 1 to 5
  column        columns coded from 1
  row           rows coded from 1
  variety !A    variety names
  yield
 met.dat !SECTION site !ROWFACTOR row !COLUMNFACTOR col
 yield ~ site !r variety site.variety !f mv
 residual sat(site).ar1(row).ar1(column)

The SPLINE qualifier defines a spline model term with an explicit set of knot points. The basic form of the spline model term, spl(v), is defined in Table 6.1, where v is the underlying variate. The basic form uses the unique data values as the knot points. The extended form is spl(v,n), which uses n knot points. Use this !SPLINE qualifier to supply an explicit set of n knot points (p) for the model term. Using the extended form without using this qualifier results in n equally spaced knot points being used. The !SPLINE qualifier may only be used on a line by itself, after the datafile line and before the model line. When knot points are explicitly supplied, they should be in increasing order and adequately cover t
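The knot-point rules in excerpt 501 can be illustrated with a short Python sketch. It rests on two assumptions that the excerpt does not confirm: that "n equally spaced knot points" means points spread evenly between the smallest and largest observed value of the variate, and that "adequately cover" means the knots span that observed range. The function names are hypothetical and the code does not call ASReml.

```python
def equally_spaced_knots(values, n):
    """Return n knot points spread evenly from min(values) to max(values).

    Assumption: the spacing runs over the observed data range.
    """
    if n < 2:
        raise ValueError("need at least two knot points")
    lo, hi = min(values), max(values)
    step = (hi - lo) / (n - 1)
    knots = [lo + i * step for i in range(n - 1)]
    knots.append(hi)  # set the last knot exactly, avoiding rounding drift
    return knots


def knots_are_valid(knots, values):
    """Check that knots are strictly increasing and span the data range."""
    increasing = all(a < b for a, b in zip(knots, knots[1:]))
    covers = knots[0] <= min(values) and knots[-1] >= max(values)
    return increasing and covers


if __name__ == "__main__":
    ages = [0, 3, 10]
    knots = equally_spaced_knots(ages, 5)
    print(knots)
    print(knots_are_valid(knots, ages))
```

A check of this kind is useful before supplying an explicit knot list, since an unordered list or one that stops short of the data would contradict the stated requirements.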
502. ypically coded 0/1/2, being counts of the minor allele. However, if they are imputed, they will take real values between 0 and 2. Since marker files may be huge,

8.11 Factor effects with large Random Regression models

!SMODE b sets the storage mode for the regressor data, indicating whether it is marker data:
  b = 2 sets 2-bit storage for strictly 0/1/2 marker data;
  b = 8 (the default) sets 8-bit storage, useful for marker data with imputed values having 2 digits after the decimal;
  b = 16 sets 16-bit storage, useful for marker data with imputation with more than 2 digits; and
  b = 32 sets 32-bit real storage and should be used for non-marker data.

!RANGE l h indicates that the marker scores range from l to h and are to be transformed to have a range of 0 to 2.

!GSCALE s controls the scaling of the GRM matrix. If unspecified, s = 2p(1 - p) is used for marker data and s = 1 for non-marker data (!SMODE 32). Scaling is often used with centred marker data to scale the MM matrix so that it is a genomic matrix.

Example

 !WORK 1
 Nassau Clone Data
  Nfam 71 !A
  Nfemale 26 !A
  Nmale 37 !A
  Clone !A 860
  rep 8
  iblk 80
  culture !A
  DBH6
 snpData.grr Clone Marker
 nassau.csv !MAXIT 30 !SKIP 1 !DFF 1
 DBH6 ~ mu culture rep !r grmiv Clon 0.27 Clone 0.15 rep.iblk 0.31

where snpData.grr is first used to declare Clone identifiers (taken from the first field) in the correct order, and then contains the marker scores; it looks like Genotype 0 10024 01 114 0 10037 01 25
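The RANGE description in excerpt 502 says that scores in the range l to h are transformed to the range 0 to 2 but does not give the formula. The Python sketch below assumes a simple linear rescaling, which is the most direct reading; the function name is hypothetical and this is not necessarily how ASReml performs the transformation internally.

```python
def rescale_scores(scores, low, high):
    """Map marker scores from [low, high] onto [0, 2].

    Assumption: the transformation is linear, 2 * (x - low) / (high - low).
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    for x in scores:
        if not low <= x <= high:
            raise ValueError(f"score {x} lies outside [{low}, {high}]")
    return [2.0 * (x - low) / (high - low) for x in scores]


if __name__ == "__main__":
    # Markers coded -1/0/1 become the 0/1/2 coding described in the excerpt.
    print(rescale_scores([-1, 0, 1], -1, 1))
```

Under this assumption a -1/0/1 coding becomes 0/1/2, matching the minor-allele count coding described at the start of the excerpt.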
503. ys:
 • in the variance structure specification, for example ar1(row !INIT 0.35) sets the initial value of the autocorrelation parameter for ar1(row) at 0.35; when this form is used, all of the values required by the structure must be specified
 • by modifying the .tsv or .msv file created in a preliminary run (Section 7.9.1)
 • by supplying an .rsv file using !CONTINUE (Section 7.9.2).

Important points
 • when initial values are supplied using !INIT, there must be the correct number of values and they must be in the appropriate order; for example, for us the initial values need to be supplied in the order lower triangle row-wise
 • for the gamma parameterization (Section 7.6), the variance structure parameters will be gammas; in this case, the initial values for the gammas that are variance component ratios will be interpreted by ASReml as ratios.

7.7 Variance model function qualifiers

7.7.6 About subsections

!SUBSECTION f

The SUBSECTION qualifier provides an extension to the sat() function of Section 7.3.2 for modelling the residual variance. It allows the case of modelling multiple independent sections of correlated observations with a common variance structure and common parameters within sections. The sections can be of different sizes, and any homogeneous variance correlation model in Table 7.6 may be used for the variance structure. This gives an R structure of the form [equation garbled in this extract] so R may have
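The ordering rule stated under "Important points" in excerpt 503 (for us, initial values are supplied lower triangle row-wise) can be made concrete with a small Python sketch. The function name and the example matrix are hypothetical; the code only flattens a symmetric matrix into that order and does not write ASReml syntax.

```python
def lower_triangle_rowwise(matrix):
    """Flatten a symmetric matrix into lower-triangle, row-wise order.

    Row 1 contributes 1 value, row 2 contributes 2, and so on,
    giving n * (n + 1) / 2 values for an n x n matrix.
    """
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    return [matrix[i][j] for i in range(n) for j in range(i + 1)]


if __name__ == "__main__":
    # Illustrative 3 x 3 covariance matrix (values are made up).
    sigma = [
        [1.0, 0.5, 0.2],
        [0.5, 2.0, 0.3],
        [0.2, 0.3, 3.0],
    ]
    print(lower_triangle_rowwise(sigma))
```

For a 3 x 3 structure this yields six values in the order (1,1), (2,1), (2,2), (3,1), (3,2), (3,3), which is also a quick way to confirm that the correct number of values has been supplied.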
504. ywt
 !IF $1 == 3 !ASSIGN YV gfw
 !IF $1 == 4 !ASSIGN YV fdm
 !IF $1 == 5 !ASSIGN YV fat
  tag
  sire 92 !I
  dam 3561 !I
  grp 49
  sex
  brr 4
  litter 4871
  age
  wwt !M0    !M0 identifies missing values
  ywt !M0
  gfw !M0
  fdm !M0
  fat !M0
 coop.fmt
 !PART 1 2 3 5
 $YV ~ mu age brr sex age.sex !r idv(sire) idv(dam) idv(lit) idv(age.grp) idv(sex.grp) !f grp    traits are substituted for YV
 !PART 4    leaves out sex.grp for fdm
 $YV ~ mu age brr sex age.sex !r idv(sire) idv(dam) idv(lit) idv(age.grp) !f grp    fdm is substituted for YV

Tables 15.13 and 15.14 present the summary of these analyses. Fibre diameter was measured on only 2 female lambs and so interactions with sex were not fitted. The dam variance component was quite small for both fibre diameter and fat. The REML estimate of the variance component associated with litters was effectively zero for fat.

15.10 Multivariate animal genetics data: Sheep

Table 15.14: Wald F statistics of the fixed effects for each trait for the genetic example

 term       wwt     ywt     gfw     fdm     fat
 age        331.3   67.1    52.4    2.6     7.5
 brr        554.6   73.4    14.9    0.3     13.9
 sex        196.1   123.3   0.2     2.9     0.6
 age.sex    10.3    1.7     1.9             5.0

Thus in the multivariate analysis we consider fitting the following models to the sire, dam and litter effects:

 $\mathrm{var}(u_s) = \Sigma_s \otimes I_{92}$
 $\mathrm{var}(u_d) = \Sigma_d \otimes I_{3561}$
 $\mathrm{var}(u_l) = \Sigma_l \otimes I_{4891}$

where $\Sigma_s$, $\Sigma_d$ and $\Sigma_l$ are positive definite symmetric matrices corresponding to the between traits variance
