The Survival Kit V3.12 User's Manual
Contents
1.
      PARAMETER (NDIMAX2=1)
      PARAMETER (NDIMAX2=NDIMAX)
c  NTIMMAX: largest possible value of the dependent time variable.
c  This value is necessary as an upper limit for an efficient
c  computation of log(time) and exp(time), and when calculating
c  functional values of the survivor curve (SURVIVAL statement of COX and WEIBULL)
      PARAMETER (NTIMMAX=6000)
c  NSTIMAX: maximum number of distinct times or quantiles that can be
c  defined in the SURVIVAL statement (options SPECIFIC and QUANTILE)
      PARAMETER (NSTIMAX=100)
c  INTE_ST = 5 gives the possibility to integrate out a log-gamma
c  random effect with the option INTEGRATE_OUT variable WITHIN variable
c  while a STRATA statement is specified
c  INTE_ST = 4 -> usual case
c  INTE_ST = 5 -> STRATA + INTEGRATE_OUT ... WITHIN
      PARAMETER (INTE_ST=4)
      INTEGER BLKSIZE, BLKSIZE0, BLKSIZE2
The block sizes BLKSIZE0, BLKSIZE and BLKSIZE2 are then defined from MXEF (with the constant 3), from MXEF_USED, INTE_ST and NRECMAX, and from MXEF (with the constant 4) and NRECMAX2, respectively.
c  Parameters used in the LBFGS maximisation routine
      INTEGER ITER_BF, MVEC_BF, NO_LOG
      REAL*8 EPS_BFDEF
c  MVEC_BF: number of vectors used to store the approximation of the Hessian (rank of approx
2.
      INTEGER MAXWORD, MAXLENG, MAXFIG, LRECL, NBUG_2000
      INTEGER NKEYFE, NKEYC, NKEYW
c  MAXWORD: max number of words in the parameter file
c  MAXLENG: max number of characters in a word
c  MAXFIG: max number of figures (i.e. numbers) in the parameter file
c  LRECL: number of columns read per record
c  NBUG_2000: maximum year value yy (from 00 to NBUG_2000) under which a D6 date
c  is transformed from ddmmyy to ddmm(100+yy), to avoid the year 2000 bug
c  parameter files should be automatically transformed to upper case names
c  NKEYFE: number of keywords (i.e. statement types) used in the parameter file for PREPARE
c  NKEYC: number of keywords used in the parameter file for COX
c  NKEYW: number of keywords used in the parameter file for WEIBULL
      PARAMETER (MAXWORD=1000)
      PARAMETER (MAXLENG=40)
      PARAMETER (MAXFIG=400)
      PARAMETER (LRECL=80)
      PARAMETER (NBUG_2000=40)
      PARAMETER (NKEYFE=17)
      PARAMETER (NKEYC=61)
      PARAMETER (NKEYW=61)

Chapter 4 A small example
The following completely artificial example will be used to illustrate how to analyze survival data using the Survival Kit.
1 Initial data file small.dat
Consider the data file:
 1  5 1 1 0 1 5  4 10
 2 10 1 1 0 1 5  6 10
 3 11 1 1 0 1 5  6 10
 4 11 0 1 0 0
 5 15 1 2 0 0
 6 15 1 2 0 1 5  6 10
 7 17 1 1 0 0
 8 19 1 1 0 0
 9 21 1 2 0 0
10 21 0 2 0 1 5 15 10
11 28 1 2 0 1 5 15 10
As specified in the INPUT
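The NBUG_2000 rule can be illustrated with a short Python sketch. It assumes that the bound is inclusive (years 00 to NBUG_2000 are shifted) and that a transformed year 100+yy stands for 2000+yy; the helper `d6_to_date` is hypothetical and is not part of the Survival Kit.

```python
import datetime

NBUG_2000 = 40  # value used in the parinclu listing


def d6_to_date(d6: str, nbug_2000: int = NBUG_2000) -> datetime.date:
    """Convert a D6 (ddmmyy) string into a calendar date.

    Hypothetical helper: years yy <= nbug_2000 are shifted to 100 + yy
    (i.e. read as 20yy), following the NBUG_2000 comment; the inclusive
    bound is an assumption.
    """
    dd, mm, yy = int(d6[0:2]), int(d6[2:4]), int(d6[4:6])
    if yy <= nbug_2000:
        yy += 100  # ddmm(100+yy): dates from 2000 on sort after 1999 dates
    return datetime.date(1900 + yy, mm, dd)


# A record observed from 31 Dec 1999 to 1 Jan 2000 lasts one day, instead of
# the large negative interval obtained by naive two-digit year arithmetic.
start = d6_to_date("311299")
end = d6_to_date("010100")
print((end - start).days)  # 1
```

With NBUG_2000 = 40, a D6 year 41 is still read as 1941, so the parameter should be chosen according to the oldest dates present in the data.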
3.
11.16 SURVIVAL Statement
11.17 RESIDUALS Statement
11.18 CONSTRAINTS Statement
11.19 CONVERGENCE_CRITERION Statement
11.20 STORAGE Statement
11.21 STORE_SOLUTIONS and READ_OLD_SOLUTIONS Statement
11.22 STORE_BINARY Statement
3 The parinclu file
4 A small example
5 Analysis of very large data sets
12 PREPARE program
13 Before running the COX or WEIBULL programs
14 COX and WEIBULL programs

Preface
The Survival Kit is primarily intended to fill a gap in the software available to animal breeders, who generally tend to use extremely large data sets and want to estimate random effects. Methods of survival analysis have primarily been developed in the area of clinical biometrics, where data sets and numbers of levels of effects are usually smaller. Theory about random effects in survival analysis (frailty models) is less well developed than for linear models, and there are no programs publicly available which deal with such models. Although developed by animal breeders for animal breeders, the programs of the Survival Kit should therefore be interesting for people from other areas encountering similar problems of large models and random effects. To make the Survival Kit user friendly, commands u
4. BASELINE CUMULATIVE HAZARD FUNCTION AND SURVIVOR FUNCTION FOR STRATUM 1
SURVIVOR FUNCTION .63361 .44745 .26246 .01292 .00318 .00043 .00001 .00000

Residuals for the Cox model
The generalized residuals of Cox and Snell (1966) are calculated for all records of the unsorted small2.dat file. If the model is correct, they are distributed according to a censored exponential distribution with parameter 1. This can be checked globally by plotting the cumulated hazard distribution (-log S(RES)) or the expected order statistics of the unit exponential distribution against the values of RESIDUALS. This should result in a straight line going through the origin with slope 1. Note that expected order statistics are computed only when STORAGE IN_CORE is used.

ANIMALS TOTAL 11 FAILED 9
FAILURE  RESIDUALS   S(RES)     -LOG S(RES)  EXPECTED ORDER STAT
1        .1208301    .9090909   .0953102     .0909091
1        .3103519    .8080808   .2130932     .2020202
1        .4439302    .7070707   .3466246     .3270202
1        .4563280    .6060606   .5007753     .4698773
1        .5264675    .5050505   .6830968     .6365440
1        .9773750    .4040404   .9062404     .8365440
1        1.2108388   .2693603   1.3117055    1.16…
1        1.6299547   .1346801   2.0048527
1        2.0211752   .0000000
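The columns of this table can be reproduced with a short Python sketch. In the example output, S(RES) behaves like a Kaplan-Meier estimate computed on the residuals, and the EXPECTED ORDER STAT column matches a running sum of 1/(number at risk) at each failure; the sketch uses exactly those two rules. The two censored residuals (0.25 and 1.10) are hypothetical values, placed where the jumps in S(RES) show that censored records lie; function and variable names are illustrative only.

```python
import math


def residual_diagnostics(residuals, failed):
    """Kaplan-Meier-type diagnostics for Cox-Snell residuals.

    Returns one tuple per failure: (residual, S(RES), -log S(RES), cumulated
    1/number-at-risk). Residuals are assumed distinct (ties are not handled).
    """
    pairs = sorted(zip(residuals, failed))
    at_risk = len(pairs)
    surv, cum = 1.0, 0.0
    rows = []
    for res, is_failure in pairs:
        if is_failure:
            surv *= (at_risk - 1) / at_risk
            cum += 1.0 / at_risk
            neg_log = -math.log(surv) if surv > 0 else math.inf
            rows.append((res, surv, neg_log, cum))
        at_risk -= 1  # failed and censored records both leave the risk set
    return rows


failures = [0.1208301, 0.3103519, 0.4439302, 0.4563280, 0.5264675,
            0.9773750, 1.2108388, 1.6299547, 2.0211752]
censored = [0.25, 1.10]  # HYPOTHETICAL censored residuals
rows = residual_diagnostics(failures + censored,
                            [True] * len(failures) + [False] * len(censored))
for res, s, neg_log, cum in rows:
    print(f"{res:10.7f} {s:10.7f} {neg_log:10.7f} {cum:10.7f}")
```

Plotting the third or fourth column against the residuals gives the check described above: points close to a line through the origin with slope 1 support the model.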
5.   - TIME variable in ascending order (the TIME variable is always the first in the recoded file).
• A LOGGAMMA random variable is defined in the WEIBULL parameter file. This random variable is algebraically integrated out in order to reduce the number of parameters to estimate or to obtain an exact marginal posterior distribution. This is done using the INTEGRATE_OUT statement in the parameter file (see below). Again, two situations exist:
  - If the INTEGRATE_OUT statement is followed by the name of one variable only (e.g. INTEGRATE_OUT sire), the sorting order should be:
    - the variable to be integrated out, in ascending order;
    - IDREC variable in ascending order (the IDREC variable is always the third variable in the recoded file);
    - TIME variable in ascending order (the TIME variable is always the first in the recoded file).
    Note that if the variable to be integrated out is time-dependent (e.g. herd-year-season), the statement IDREC STORE_PREVIOUS_TIME must have been used.
  - If the INTEGRATE_OUT statement is followed by the names of two variables (e.g. INTEGRATE_OUT year WITHIN herd) and if the original records were already sorted by herd, the statement IDREC STORE_PREVIOUS_TIME is not needed and there is no need to sort according to the variable year. Records should simply be sorted by IDREC variable and ascending TIME variable. This is the natural way records are recoded, and therefore the need to sort the elementary r
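The single-variable INTEGRATE_OUT ordering amounts to one composite sort key, as in this Python sketch. The record type and its `sire` field are hypothetical stand-ins for columns of the recoded file; the Survival Kit itself does not include a sorting routine.

```python
from typing import NamedTuple


class ElementaryRecord(NamedTuple):
    time: int   # column 1 of the recoded file
    code: int   # column 2: censoring / event code
    idrec: int  # column 3: IDREC
    sire: int   # HYPOTHETICAL recoded level of the variable to integrate out


def sort_for_integrate_out(records):
    """Integrated variable, then IDREC, then TIME, all in ascending order."""
    return sorted(records, key=lambda r: (r.sire, r.idrec, r.time))


recs = [
    ElementaryRecord(time=5, code=1, idrec=1, sire=2),
    ElementaryRecord(time=4, code=-1, idrec=1, sire=2),
    ElementaryRecord(time=11, code=0, idrec=4, sire=1),
    ElementaryRecord(time=10, code=1, idrec=2, sire=1),
]
for r in sort_for_integrate_out(recs):
    print(r)
```

All elementary records of one animal stay together and in time order, grouped by the level of the integrated variable.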
6. The structure of this file is given in file small.paa, which is an output file of the PREPARE program and an input file for the programs COX and WEIBULL (see below). The first column is the time of failure, of censoring, of change in at least one time-dependent covariate, or of truncation for truncated records. These four categories are indicated by a code in column 2 with value 1, 0, -1 or 2, respectively. The third column indicates the id number. The remaining ones correspond to the covariates used, individually recoded from 1 to the total number of levels for discrete covariates. When a covariate is time-dependent, two columns are used: one corresponding to the value of the covariate before the change and the next one for after the change. For example, for animal 1 (first elementary record), at time 4 the level of the treatment effect changes (code -1) from recoded value 1 to recoded value 2 (columns 4 and 5), while the level for the s_by_t effect also changes from 1 to 2 (columns 7 and 8). At time 5 (second elementary record), the animal fails (code 1).
5 Parameter file 1 for COX and WEIBULL: small.paa
This file is created by the PREPARE program. It explains the structure of the small2.dat recoded file:
TIME 1;
CODE 2;
ID 3;
COVARIATE class va
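The way one input record becomes several elementary records can be sketched in Python for the small example. This is an illustration, not PREPARE's code: it assumes the recoding 0 -> 1 and 10 -> 2 for treat, a change code of -1, and equal before/after values in the record that ends with failure or censoring. The s_by_t columns are omitted, so each output tuple holds time, code, id, treat before, treat after and sex.

```python
# small.dat: id, time, code, sex, treat, number of changes,
# then one (column, time of change, new value) triplet per change
SMALL_DAT = [
    (1, 5, 1, 1, 0, 1, 5, 4, 10), (2, 10, 1, 1, 0, 1, 5, 6, 10),
    (3, 11, 1, 1, 0, 1, 5, 6, 10), (4, 11, 0, 1, 0, 0),
    (5, 15, 1, 2, 0, 0), (6, 15, 1, 2, 0, 1, 5, 6, 10),
    (7, 17, 1, 1, 0, 0), (8, 19, 1, 1, 0, 0), (9, 21, 1, 2, 0, 0),
    (10, 21, 0, 2, 0, 1, 5, 15, 10), (11, 28, 1, 2, 0, 1, 5, 15, 10),
]
TREAT_CODES = {0: 1, 10: 2}  # recoding stated in the text
TREAT_COLUMN = 5             # treat is the 5th variable of an input record


def elementary_records(fields):
    """Split one input record into (time, code, id, treat_before, treat_after, sex)."""
    rid, time, code, sex, treat, nchg = fields[:6]
    changes = sorted(
        (fields[7 + 3 * k], fields[8 + 3 * k])
        for k in range(nchg)
        if fields[6 + 3 * k] == TREAT_COLUMN
    )
    current = TREAT_CODES[treat]
    out = []
    for t, new_value in changes:  # assumes every change precedes the end time
        new = TREAT_CODES[new_value]
        out.append((t, -1, rid, current, new, sex))  # assumed change code -1
        current = new
    out.append((time, code, rid, current, current, sex))  # failure or censoring
    return out


all_records = [e for rec in SMALL_DAT for e in elementary_records(rec)]
print(len(all_records))                   # 17
print(elementary_records(SMALL_DAT[0]))   # [(4, -1, 1, 1, 2, 1), (5, 1, 1, 2, 2, 1)]
```

The count of 17 agrees with the number of elementary records reported for the small example: 11 records plus one extra record for each of the 6 treatment changes.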
7. The text between the two delimiters may be on more than one line.
Sometimes the statement or option names are longer than 8 characters (e.g. the statement read_old_solutions shown below). In this case only the first 8 characters (i.e. read_old in our example) are significant to the programs; the rest may or may not be written by the user.
FILES file1 file2 file3 <file4 file5 file6>;
TITLE title of analysis;
STRATA variable;
RHO_FIXED value;
MODEL <variables> <COEFFICIENT variable=value>;
ONLY_STATISTICS;
RANDOM variable <ESTIMATE <MOMENTS>> distribution <rules> parameter(s) <repeat sequence for next variable(s)>;
INTEGRATE_OUT <JOINT_MODE> variable <WITHIN variable>;
INCLUDE_ONLY variable value(s);
TEST <type(s) of test>;
STD_ERROR;
DENSE_HESSIAN;
BASELINE;
KAPLAN;
SURVIVAL file8 option;
RESIDUALS file7;
CONSTRAINTS options;
CONVERGENCE_CRITERION value;
STORAGE ON_DISK (or IN_CORE);
STORE_SOLUTIONS;
READ_OLD_SOLUTIONS;
STORE_BINARY;
11.2 FILES Statement
FILES file1 file2 file3 <file4 file5 file6>;
The FILES statement is used to define the names of the files needed for running COX or WEIBULL.
file1 is the name of the recoded data file (file2 of the FILES statement in PREPARE). This file must exist in your directory.
file2 is the name of the file containing the original and new codes after internal
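A minimal Python sketch of the 8-character rule, for illustration only. The helper `same_statement` is hypothetical, and upper-casing both names before comparing is an assumption that mimics the UPPER_CASE flag.

```python
SIGNIFICANT = 8  # only the first 8 characters of a statement or option name count


def same_statement(written: str, keyword: str) -> bool:
    """True if `written` is accepted as `keyword` under the 8-character rule.

    Upper-casing both names is an assumption (it mimics UPPER_CASE = TRUE).
    A name written with fewer than 8 characters matches only a keyword that
    is exactly that name.
    """
    return written.upper()[:SIGNIFICANT] == keyword.upper()[:SIGNIFICANT]


print(same_statement("read_old", "READ_OLD_SOLUTIONS"))   # True
print(same_statement("read_ol", "READ_OLD_SOLUTIONS"))    # False
print(same_statement("STORE_SOLUTIONS", "STORE_BINARY"))  # False
```

A consequence of this rule is that two keywords must differ within their first 8 characters, as STORE_SOLUTIONS (STORE_SO) and STORE_BINARY (STORE_BI) do.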
8. TIMEDEP treat I4;
IDENTIFICATION VARIABLE FOR EACH RECORD
IDREC id;
DISCRETE VARIABLES
CLASS treat sex;
DEFINITION OF AN INTERACTION
COMBINE s_by_t sex treat;
LIST OF RECODED VARIABLES
OUTPUT longev id code treat sex s_by_t;
FORMAT OF THE OUTPUT FILE
FORMOUT FREE_FORMAT;
The treatment variable is explicitly defined as a time-dependent covariate, with changes indicated as integers (I4), not dates. The COMBINE statement defines a new variable s_by_t representing the interaction of sex by treatment. The id does not appear in the CLASS statement and therefore will not be recoded.
3 Log file from the PREPARE program
The file name small.par is typed when PREPARE asks for an input file. The log file indicates that 17 elementary recoded records were created (more than 11, because treatment is time-dependent). The number of classes is also indicated for the discrete (class) variables.
Records read from input data file 11
Elementary records written 17
Class variable treat with 2 classes
Class variable sex with 2 classes
Class variable s_by_t with 4 classes
REMINDER: File small2.dat has to be sorted for the use of COX. Sorting order:
Strata variable ascending (only if strata are used)
Time variable descending
Censoring variable ascending
4 Recoded data set small2.dat
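The recoding of class variables and the COMBINE interaction can be sketched in Python. The sketch assumes that codes 1..n follow the ascending order of the original values (consistent with treat 0 -> 1 and 10 -> 2 in this example, but not confirmed as PREPARE's rule); the helpers and the short sex/treat lists are hypothetical.

```python
def recode(values):
    """Map distinct values to consecutive codes 1..n (ascending order assumed)."""
    return {v: i for i, v in enumerate(sorted(set(values)), start=1)}


def combine(first, second):
    """Interaction of two class variables: one code per observed (first, second) pair."""
    pairs = list(zip(first, second))
    codes = recode(pairs)
    return [codes[p] for p in pairs], codes


# illustrative sex and treat values (treat is 0 before and 10 after a change)
sex = [1, 1, 1, 2, 2, 2]
treat = [0, 10, 0, 0, 10, 0]
s_by_t, codes = combine(sex, treat)
print(recode(treat))  # {0: 1, 10: 2}
print(codes)          # {(1, 0): 1, (1, 10): 2, (2, 0): 3, (2, 10): 4}
print(s_by_t)         # [1, 2, 1, 3, 4, 3]
```

With both sexes observed with and without treatment, the interaction has the 4 classes reported in the log file.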
9. 41, 73; strata; SUPPLY_INPUT 52, 53; SUPPLY_OUTPUT 53; SURVIVAL 41; syntax 27; TEMPDIR 53, 72; TEST 39, 59, 73; TIME 15, 28, 57; time-dependent 6, 7, 13, 17, 18, 20, 24, 29, 55, 69; time unit 10, 17; TIMECOV 16, 18; TIMEDEP 16, 20; TITLE 33; triplet 20, 55; TRUNCATE 17; truncated record 17, 57; UNFORMATTED 22, 24, 25, 30, 44, 72; UPPER_CASE 15, 32; USUAL_RULES 36; variable name 15; variable type 15; variance parameter 36; Wald test 65; WEIBULL 8, 13, 24, 27, 30, 67, 73; Weibull distribution 6, 7, 17; WEIBULLC 8, 9, 27, 30; WITHIN 17, 37, 39, 71
10. As already mentioned, for extremely large applications for which the number of elementary records is huge, a modified version of WEIBULL (weibullc.f, denoted hereafter WEIBULLC) was written using public domain C subroutines for compressing and decompressing data during I/O operations (a similar version for the Cox model was not implemented, as the Cox model is not suited for huge applications). Except for the format specification in parameter file 1, which is the direct result of the use of PREPAREC (see above), the parameter files for WEIBULLC are exactly the same as for WEIBULL.
10 Parameter file 1
As pointed out above, this parameter file is produced by PREPARE (file4 of the FILES statement of PREPARE). Note that you normally will not want to change anything in this file. This section is written to aid the user in understanding the file.
10.1 Syntax
The following statements are used in this parameter file:
TIME position;
CODE position;
ID position; (or PREVIOUS_TIME position;)
COVARIATE variable nlevels pos1 <pos2> <variable nlevels pos1 <pos2> ...>;
DISCRETE_SCALE;
JOINCODE variable variable;
format_type;
10.2 TIME Statement
TIME position;
position is a figure which labels the position of the dependent TIME variable on the recoded data file (file2 of the FILES statement of PREPARE). Note: the position of the time variable is always 1.
10.3 CODE Statement
CODE posi
11. elementary records possibly stored in core at the same time is specified by the parameter NRECMAX in the parinclu file.
11.21 STORE_SOLUTIONS and READ_OLD_SOLUTIONS Statement
STORE_SOLUTIONS;
READ_OLD_SOLUTIONS;
These two statements are only useful for very large applications (applications running for hours or days). If the STORE_SOLUTIONS statement has been used to write the final solutions to file6 of the FILES statement, the READ_OLD_SOLUTIONS statement may be used in the next run to retrieve those solutions from file5 of the FILES statement. The model does not need to be the same in both runs. This is mainly useful when one is fitting a complex model starting from a simpler one, or in connection with the ESTIMATE option of the RANDOM statement: when the results of the estimation indicate that the true value of the distributional parameter (for the log-gamma or normal distributions) lies outside the prespecified interval, the values of the interval may be changed and, using READ_OLD_SOLUTIONS, the optimisation may proceed from the point already reached. For storage of solutions not only at the end of the optimisation procedure but after each iteration, an alternative approach has to be taken. In the parinclu file, a character parameter BACKUP is defined. This parameter may be used to define the name of a file where solutions will be stored at each iteration. If the solutions are needed (e.g. in a restart after a s
12. NEW/OLD CODES small.nco
OUTPUT small.rwe
****************************************
FAILURE TIME READ IN COLUMN 1
CENSORING CODE READ IN COLUMN 2
IDENTIFICATION NUMBER READ IN COLUMN 3
TOTAL NUMBER OF COVARIATES
CONTINUOUS COVARIATES INCLUDED IN THE ANALYSIS
DISCRETE COVARIATES INCLUDED IN THE ANALYSIS
COVARIATES READ BUT NOT INCLUDED IN THE ANALYSIS
POWERS OF CONTINUOUS COVARIATES
MODEL COVARIATE CHARACTERISTICS
1 sex    DISCRETE TIME INDEPENDENT READ IN COLUMN 6
2 treat  DISCRETE TIME DEPENDENT   READ IN COLUMNS 4 AND 5
SIMPLE STATISTICS
TOTAL NUMBER OF WEIBULL PARAMETERS TO COMPUTE 1
TOTAL NUMBER OF RECORDS 11
TOTAL NUMBER OF ELEMENTARY RECORDS 17
RIGHT CENSORED RECORDS 2 (18.182%)
MINIMUM CENSORING TIME 11
MAXIMUM CENSORING TIME 21
AVERAGE CENSORING TIME 16.000
UNCENSORED RECORDS 9
MINIMUM FAILURE TIME 5
MAXIMUM FAILURE TIME 28
AVERAGE FAILURE TIME 15.667
EFFECT sex   MIN 1 MAX 2
EFFECT treat MIN 1 MAX 2
1 WEIBULL PARAMETER(S) (RHO) WILL BE ESTIMATED
CONVERGENCE CRITERION USED .10000D-07 (DEFAULT VALUE)
NUMBER OF ELEMENTARY RECORDS KEPT 17
STATISTICS: LEVEL, EFFECT, NUMBER OF INDIVIDUALS, OBSERVED FAILURES, AGE AT FAILURE, TOTAL TIME, AVERAGE TIME
13. (1972). Regression models and life-tables (with discussion). J. Royal Stat. Soc. Series B 34: 187-220.
Cox, D. R. and Oakes, D. (1984). Analysis of survival data. Chapman and Hall, London, UK.
Cox, D. R. and Snell, E. J. (1966). A general definition of residuals (with discussion). J. Royal Stat. Soc. Series B 30: 248-275.
Ducrocq, V. and Casella, G. (1996). A Bayesian analysis of mixed survival models. Genet. Sel. Evol. 28: 505-529.
Ducrocq, V. and Sölkner, J. (1994). The Survival Kit, a FORTRAN package for the analysis of survival data. In: 5th World Cong. Genet. Appl. Livest. Prod., Volume 22, pages 51-52. Dep. Anim. Poultry Sci., Univ. of Guelph, Guelph, Ontario, Canada.
Ducrocq, V. and Sölkner, J. (1998). The Survival Kit V3.0, a package for large analyses of survival data. In: 6th World Cong. Genet. Appl. Livest. Prod., Volume 27, pages 447-448. Anim. Genetics and Breeding Unit, Univ. of New England, Armidale, Australia.
Kalbfleisch, J. D. and Prentice, R. L. (1980). The statistical analysis of failure time data. John Wiley and Sons, New York, USA.
Klein, J. P. and Moeschberger, M. (1997). Survival analysis. Springer-Verlag, New York, USA.
Liu, D. C. and Nocedal, J. (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming 45: 503-528.
Maddala, G. S. (1983). Limited-dependent and qualitative variables in econometrics. Cambridge Univ. Press, UK.
Perez
14. 11.5 RHO_FIXED Statement
RHO_FIXED value;
This statement can be used only with the WEIBULL program (with COX, a warning message is printed and the statement is ignored). It specifies a fixed value for the Weibull parameter ρ. For example, RHO_FIXED 1.0; will constrain ρ to be 1, therefore defining an exponential regression model instead of a more general Weibull model. This statement is ignored when a grouped data model is fitted: ρ is then always assumed to be 1.
11.6 MODEL Statement
MODEL <variables>;
The MODEL statement specifies the independent variables affecting the dependent variable described in the TIME statement (remember that only one dependent variable may be specified there). The variables listed must be a subset of the variables defined in the COVARIATE statement of Parameter file 1.
Model building capabilities: discrete (class) and continuous variables may be included.
• No covariate name at all is also accepted (no covariate): if no variable is specified between MODEL and the semicolon, the COX program can be used simply to compute a nonparametric estimate of the survivor curve (KAPLAN statement, see below) for each stratum. The WEIBULL program with no covariate specified will lead to the fit of a 2-parameter Weibull function for each stratum.
• Discrete covariates: they have to be stated in the CLASS and COMBINE statements of the parameter file for PREPARE.
• Continuous cov
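Why RHO_FIXED 1.0 yields an exponential model can be seen from a common parameterization of the Weibull proportional hazards model, assumed here (the manual's own notation may differ), with scale λ, shape ρ, covariates x and effects β:

```latex
\lambda(t;\mathbf{x}) = \lambda\rho(\lambda t)^{\rho-1}\exp(\mathbf{x}'\boldsymbol{\beta}),
\qquad
S(t;\mathbf{x}) = \exp\!\left[-(\lambda t)^{\rho}\exp(\mathbf{x}'\boldsymbol{\beta})\right]

\rho = 1:\quad
\lambda(t;\mathbf{x}) = \lambda\exp(\mathbf{x}'\boldsymbol{\beta}),
\qquad
S(t;\mathbf{x}) = \exp\!\left[-\lambda t\,\exp(\mathbf{x}'\boldsymbol{\beta})\right]
```

With ρ = 1 the hazard no longer depends on t, which is the defining property of the exponential distribution.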
15. 7, 36; MVEC_BF 50; MXEF 48; MXEF_USED 48, 72; MXSTRA 48; NBUG_2000 10, 15, 54; NDIMAX 49; NDIMAX2 40, 49; NDIMSPAR 51; new code 32, 58; NITER_GAUSS 50; NONE 43; NORMAL 36; normal distribution 7, 36; NPGAUSS 37, 50; NRECBLOC 23, 45, 52, 72; NRECMAX 43, 48, 72; NRECMAX2 48; NSTIMAX 49; NTIMMAX 49; number of corrections 50; NZE 40, 51; ON_DISK 43, 72; ONLY_STATISTICS 35; OUTPUT 20; OUTPUT FILE 53; partial likelihood 6; PC 23; PEDIGREE 21, 32; polynomial regression 34; prediction 21; Prentice and Gloeckler's model 6, 9, 10, 16, 24, 29, 30, 33, 34, 37; PREPARE 8, 13, 56, 71; PREPAREC 8, 9, 23, 30, 71; PREVIOUS_TIME 28; proportional hazards model 6, 7, 33; QUANTILES 41; Quasi-Newton 8, 61; R2 of Maddala 64; RANDOM 35, 38; random effect 7, 17, 24, 35, 37, 40; random effects 6, 8, 17, 24; rank 42; READ_BINARY 45; READ_OLD_SOLUTIONS 32, 36, 44, 73; relationship matrix 7, 19, 21, 36; RESIDUALS 42, 59, 66; results 32; RHO_FIXED 33, 73; risk ratio 65; second (timing subroutine) 9; SEQUENTIAL 39, 59; sire model 21, 36; sire-dam model 19, 21, 35, 36; sire-mgs model 19, 21, 30, 35, 36; SIRE_DAM_MODEL 36; sort 23, 33, 38, 44, 58; sparse matrix 8, 40, 51, 61, 73; SPECIFIC 42; standard error 8, 40, 65; STATISTICS 35, 67; STD 40; STD_ERROR 40; STORAGE 43, 72; STORE_BINARY 72; STORE_PREVIOUS_TIME 17, 25, 28, 38; STORE_SOLUTIONS 32, 36, 44, 73; STRATA 23, 24, 33
16. ****************************************
SMALL EXAMPLE MODEL
****************************************
FILES USED
RECODED DATASET small2s.dat
NEW/OLD CODES small.nco
OUTPUT small.rco
RESIDUALS
RECODED DATASET small2.dat
OUTPUT small.res
****************************************
NO STRATIFICATION
FAILURE TIME READ IN COLUMN 1
CENSORING CODE READ IN COLUMN 2
IDENTIFICATION NUMBER READ IN COLUMN 3
TOTAL NUMBER OF COVARIATES
CONTINUOUS COVARIATES INCLUDED IN THE ANALYSIS
DISCRETE COVARIATES INCLUDED IN THE ANALYSIS
COVARIATES READ BUT NOT INCLUDED IN THE ANALYSIS
POWERS OF CONTINUOUS COVARIATES
MODEL COVARIATE CHARACTERISTICS
1 sex    DISCRETE TIME INDEPENDENT READ IN COLUMN 6
2 treat  DISCRETE TIME DEPENDENT   READ IN COLUMNS 4 (BEFORE T) AND 5 (AFTER)
CONVERGENCE CRITERION USED .10000D-07 (DEFAULT VALUE)
SIMPLE STATISTICS
TOTAL NUMBER OF STRATA 1
TOTAL NUMBER OF ELEMENTARY RECORDS 17
RIGHT CENSORED RECORDS 2 (18.182%)
MINIMUM CENSORING TIME 11
MAXIMUM CENSORING TIME 21
AVERAGE CENSORING TIME 16.000
UNCENSORED RECORDS 9
MINIMUM FAILURE TIME 5
MAXIMUM FAILURE TIME 28
AVERAGE FAILURE TIME 15.667
EFFECT sex   MIN 1 MAX 2
EFFECT treat MIN 1 MAX 2
Then the first few recoded elementary record
17. Enciso, M., Misztal, I. and Elzo, M. A. (1994). FSPAK: an interface for public domain sparse matrix subroutines. In: 5th World Cong. Genet. Appl. Livest. Prod., Volume 22, pages 87-88. Dep. Anim. Poultry Sci., Univ. of Guelph, Guelph, Ontario, Canada.
Prentice, R. and Gloeckler, L. (1978). Regression analysis of grouped survival data with application to breast cancer data. Biometrics 34: 57-67.
Schemper, M. (1992). Further results on the explained variation in proportional hazards regression. Biometrika 79: 202-204.

Index
parinclu file 9, 13, 40, 45, 72; algebraic integration 36, 37; ALL_TIMES 41; animal id 17; animal model 21, 36; AVAIL 40, 51; BACKUP 44, 53, 73; BASELINE 41, 59, 66; baseline 6, 7; binary file 23, 44; BLOCKED_UNFORMATTED 22, 24, 25, 30, 38, 44, 72; Bug year 2000 10, 15, 54; C subroutines 8, 27, 71; case sensitive 15, 32; CENSCODE 16; censored record 16; CHECK 43; CLASS 19, 34, 56; class variable 7, 29, 34; CODE 28, 57; COEFFICIENT 35; COMBINE 20, 34, 38, 56; comment 13, 31; compile 9; COMPRESSED 8, 22, 25, 30, 38, 71; CONSTRAINT 8, 42, 60, 65; continuous variable 7, 19, 29, 34; CONVDT 19; convergence 43, 50, 61, 75; convergence criterion 73; CONVERGENCE_CRITERION 43, 50, 61; COVARIATE 29; COX 8, 13, 23, 27, 30, 59, 73; Cox model 6; data preparation 56; date 13, 15, 18, 20; degrees of freedom 42; DENSE_HESSIAN 40, 73; DETAIL_IT 53; deviance
18. The JOINCODE statement specifies that the two variables indicated must be recoded together, as if they pertained to a single list. This is necessary, e.g., when sires and maternal grandsires are specified as different variables in the INPUT statement but must appear in a single recoded list (sires can also appear as maternal grandsires, and their additive genetic effect as grandsires is 0.5 times their effect as sires). This is also one of the two ways to define sire-dam models: sires and dams are recoded together using the JOINCODE option, and an overall relationship matrix is used for both sires and dams. Both variable names must have been declared previously in the CLASS statement.
8.12 TIMEDEP Statement
TIMEDEP variable typediff <variable typediff ...>;
Time-dependent covariates, i.e. independent variables whose value may change during the time an individual is observed (they may be CLASS or continuous covariates), are defined with this statement. Variable names must be defined in the INPUT statement.
In the input file (file1), any changes of value of these variables with time have to be given at the end of each record as triplets in the following way (see also the section about files):
• consecutive number of the time-dependent covariate: this means that one has to count on which position in the record the variable with its initial value is located, and this value has to be suppli
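The effect of JOINCODE can be sketched in Python: both variables are recoded against one shared list, so an animal that appears both as a sire and as a maternal grandsire receives the same code in both columns. This illustration assumes that codes follow the ascending order of the original IDs; the function `joincode` is hypothetical and does not reproduce PREPARE's internals.

```python
def joincode(first, second):
    """Recode two ID columns against one shared list of codes 1..n.

    Assumption: codes follow the ascending order of the original IDs.
    """
    shared = sorted(set(first) | set(second))
    code = {orig: new for new, orig in enumerate(shared, start=1)}
    return [code[x] for x in first], [code[x] for x in second], code


sires = [300, 100]
mgs = [200, 300]  # sire 300 also appears as a maternal grandsire
sire_codes, mgs_codes, code = joincode(sires, mgs)
print(sire_codes, mgs_codes)  # [3, 1] [2, 3]
```

Recoding each column separately would give animal 300 the code 2 in both lists while also giving code 2 to animal 200 among grandsires; the shared list avoids such conflicts, which is what allows a single relationship matrix to be used.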
19. a typical approach to describe the additive genetic values of individuals in animal breeding. The log-gamma distribution was chosen because exp(u) is then gamma distributed, a usual assumption for frailty models. The two parameters of the gamma distribution are taken to be equal, so that the expectation E[exp(u)] = 1. With the normal distribution, E[u] = 0. The distribution parameters (the gamma parameter for the log-gamma distribution and the variance for the normal or multivariate normal distribution) may be either prespecified or estimated alongside the effects in the model (Ducrocq and Casella, 1996). Several random effects may be specified in the same model, but only one may involve a relationship matrix. Random effects may also be time-dependent. Although the expression "frailty term" is used to describe random effects in survival analysis generally, it was originally introduced to account for individual heterogeneity of observations, as is done by the error term in the linear model. Such a term may also follow one of the distributions mentioned above and is technically treated in the same way as the other random effects.
Strata
Stratification may be used to separate groups of individuals with different baseline hazard functions λj(t), where j serves as group indicator. Together with time-dependent covariates, this is another means of relaxing the required assumption of proportional hazards for all individuals over the total observation period. Only one va
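The claim that equal gamma parameters give E[exp(u)] = 1 can be checked directly. Writing γ for the common value of the shape and rate parameters (a symbol chosen here for illustration), and obtaining the log-gamma density by the change of variable w = e^u:

```latex
w = e^{u} \sim \mathrm{Gamma}(\gamma,\gamma):\quad
f_W(w) = \frac{\gamma^{\gamma}}{\Gamma(\gamma)}\, w^{\gamma-1} e^{-\gamma w},
\qquad
\mathrm{E}[w] = \frac{\gamma}{\gamma} = 1,
\qquad
\mathrm{Var}[w] = \frac{\gamma}{\gamma^{2}} = \frac{1}{\gamma}

f_U(u) = f_W(e^{u})\, e^{u}
       = \frac{\gamma^{\gamma}}{\Gamma(\gamma)} \exp\!\left(\gamma u - \gamma e^{u}\right)
```

Larger values of γ therefore mean a smaller variance of the frailty exp(u) around its mean of 1.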
20. alternate formulation IDREC STORE_PREVIOUS_TIME is needed in the special case when, later on in the WEIBULL program, a log-gamma random time-dependent covariate is integrated out, and only then. The beginning of each interval is then stored at the place where the identity of the record is normally stored. Note: if the INTEGRATE_OUT variable WITHIN variable statement is used in the WEIBULL parameter file, the basic form IDREC variable is maintained (STORE_PREVIOUS_TIME is not needed).
8.7 TRUNCATE Statement
The TRUNCATE statement provides information for handling left-truncated records. For these records, the origin point (e.g. date of birth) is known but occurs before the beginning of the study period, so that the covariate status between the origin point and the beginning of the study period is unknown. It has the following form:
TRUNCATE variable;
variable holds the time of the truncation point. It must be of type I4, D6 or D8. If of type I4, it specifies the time between the beginning of the observation and the point in time when the individual actually entered the study (truncation point). If of type D6 or D8, it holds the truncation date, and the time will be calculated as the difference between this date and the date specified in the first argument of the TIME statement (the types must be identical). If the variable is coded 0 in the data set, the record is treated as untruncated, independently fro
21. case it may make sense
c  to have MXEF_USED smaller than MXEF (it must not be larger), to save core memory space
c     PARAMETER (MXEF_USED=10)
c  MXSTRA: max number of strata, i.e. levels of the strata variable
c  defined in the STRATA statement
      PARAMETER (MXSTRA=25)
c  NDIMAX: max total number of levels of effects in the model
c  (size of the vector of solutions)
      PARAMETER (NDIMAX=200)
c  NDIMAX2: this parameter is used to indicate whether the Hessian
c  matrix is to be stored or not. It is used in COX and WEIBULL with the
c  DENSE_HESSIAN statement.
c  The Hessian is needed to calculate standard deviations of estimated
c  effects (STD_ERROR statement) and to find dependencies (option FIND
c  in the CONSTRAINT statement). In WEIBULL, the Hessian is normally stored
c  in sparse form (see parameters AVAIL and NZE below), except when stating DENSE_HESSIAN.
c     ndimax2 = ndimax -> Hessian is needed and fully stored
c     ndimax2 = 1      -> Hessian is not needed or sparsely stored
c  CAUTION: this is likely to be the limiting parameter for the size of your
c  problem. An incorrect specification of this parameter is also one of the
c  most frequent sources of errors when using the Survival Kit.
c
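A rough sense of why NDIMAX2 is likely to be the limiting parameter: if the dense Hessian is held as a full square matrix of 8-byte reals (an assumption; a symmetric matrix could also be stored as one triangle, and the actual layout is not stated here), memory grows with the square of NDIMAX2. The helper below is hypothetical.

```python
def dense_hessian_bytes(ndimax2: int, bytes_per_value: int = 8) -> int:
    """Bytes for a full NDIMAX2 x NDIMAX2 Hessian.

    Assumes every element is stored as an 8-byte REAL*8 value; this is an
    illustration, not the Survival Kit's documented storage scheme.
    """
    return ndimax2 * ndimax2 * bytes_per_value


for n in (1, 200, 10_000, 100_000):
    print(f"NDIMAX2 = {n:>7}: {dense_hessian_bytes(n):>15,} bytes")
```

With NDIMAX = 200, as in the listing, a fully stored Hessian needs 320,000 bytes, but 100,000 levels of effects (not unusual in animal breeding) would need 80 GB, which is why the sparse storage is the default in WEIBULL.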
22. • file1 is the name of the original data file (it must exist in your directory).
• file2 is the name of the recoded file produced by the program, to be used by subsequent programs (COX or WEIBULL). This file may be much larger than the original data file when time-dependent covariates are used in the analysis, as so-called elementary records are written spanning the times between any changes in time-dependent covariates.
• file3 is the name of a file containing the original and new codes after internal recoding for each variable in the CLASS statement. The file name is required even when no CLASS statement is defined; in this case it will be an empty file.
• file4 is the name of a file containing information about the variables in the data set, to be used by subsequent COX or WEIBULL runs. It is one of the two parameter files required by those programs.
Note for UNIX users: if the UPPER_CASE flag in file parinclu (see later for a detailed description of the parameters in parinclu) is set to TRUE, all lowercase letters in the parameter file are transformed into upper case internally, to avoid troubles with variable names typed in different ways by the user. This implies that on systems where file names are case sensitive (mainly UNIX), the file name of file1 has to be upper case to be recognised by the program. Also, the names of the files produced by PREPARE (file2 to file4) will be upper case, even if stated in lower case in the FILES statement. If t
23. is possible to transform the grouped data model into a form close to an exponential regression model, including a particular time-dependent covariate time_unit with changes at every time point. The PREPARE and WEIBULL programs have been modified to accommodate such a model (note, however, that the resulting model is not a Weibull model). The only change required from the user is the specification that the time scale is discrete (keyword DISCRETE_SCALE in the first parameter file). The use of D6 (ddmmyy) formats with years greater than or equal to 2000 is now possible (option NBUG_2000 in the parinclu file).
• In version 3.12 (April 2000), another set of small bugs was corrected. The statement SIRE_DAM_MODEL was made obsolete: JOINCODE does the same thing and should be preferred. It is now possible to use the statements STRATA and INTEGRATE_OUT ... WITHIN at the same time.
6 Disclaimer
The Survival Kit can be freely used for non-commercial purposes, provided its use is credited (Ducrocq and Sölkner, 1994; Ducrocq and Sölkner, 1998), and can be freely distributed. Use it at your own risk. There is no technical support, but questions can be directed to Dr Vincent Ducrocq, Station de Génétique Quantitative et Appliquée, Institut National de la Recherche Agronomique, 78352 Jouy-en-Josas, France. Phone: 33 1 34 65 22 04; Fax: 33 1 34 65 22 10; email: Vincent.Ducrocq@dga.jouy.inra.fr. Or Dr Johann Sölkner, Univ
24. line of the data preparation parameter file small.par (see below), the first variable on each line corresponds to the animal identification number (not necessarily from 1 to n, as is the case here). The second represents the failure or censoring time. The third is the censoring code (here 0 = censored). Column 4 refers to the level of the sex effect and column 5 to the level of the treatment effect. It is assumed that treatment does not start at time 0 but later on; hence the treatment variable is considered as a time-dependent covariate. This explains why the fifth column is always 0 here: the animal is not treated at t = 0. The next column indicates the number of changes in time-dependent covariates. Here there is at most 1 change per animal. When there is a change (column 6 = 1), a triplet follows, indicating the column number of the relevant covariate (column 5: treatment), the time of change (e.g. t = 4 for record 1) and the new value of the covariate (10 here: at time 4, the treatment level for animal 1 changes from the value 0 to the value 10).
2 Data preparation parameter file small.par
This is the file that will be used as input for the PREPARE program:
FILE NAMES
FILES small.dat small2.dat small.nco small.paa;
VARIABLE NAMES
INPUT id I4 longev I4 code I4 sex I4 treat I4;
TIME VARIABLE
TIME longev;
CENSORING CODE
CENSCODE code 0;
TIME DEPENDENT COVARIATES
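The column description above can be checked with a short Python sketch that parses the small.dat records and recomputes a few of the SIMPLE STATISTICS printed by WEIBULL and COX for this example. The function names are hypothetical, and counting one elementary record per covariate change plus one per record is an assumption that holds here because every change occurs before the record ends.

```python
SMALL_DAT = """\
 1  5 1 1 0 1 5  4 10
 2 10 1 1 0 1 5  6 10
 3 11 1 1 0 1 5  6 10
 4 11 0 1 0 0
 5 15 1 2 0 0
 6 15 1 2 0 1 5  6 10
 7 17 1 1 0 0
 8 19 1 1 0 0
 9 21 1 2 0 0
10 21 0 2 0 1 5 15 10
11 28 1 2 0 1 5 15 10
"""


def parse_record(line):
    """id, time, code, sex, treat, number of changes, then (column, time, value) triplets."""
    f = [int(x) for x in line.split()]
    rid, time, code, sex, treat, nchg = f[:6]
    triplets = [tuple(f[6 + 3 * k: 9 + 3 * k]) for k in range(nchg)]
    return {"id": rid, "time": time, "code": code, "sex": sex,
            "treat": treat, "changes": triplets}


def simple_statistics(records):
    censored = [r["time"] for r in records if r["code"] == 0]
    failed = [r["time"] for r in records if r["code"] != 0]
    return {
        "records": len(records),
        # one elementary record per change plus one for the end of each record
        "elementary": len(records) + sum(len(r["changes"]) for r in records),
        "censored": len(censored),
        "pct_censored": 100.0 * len(censored) / len(records),
        "censoring_time": (min(censored), max(censored), sum(censored) / len(censored)),
        "uncensored": len(failed),
        "failure_time": (min(failed), max(failed), sum(failed) / len(failed)),
    }


records = [parse_record(line) for line in SMALL_DAT.splitlines()]
stats = simple_statistics(records)
print(stats["records"], stats["elementary"])                # 11 17
print(stats["censored"], round(stats["pct_censored"], 3))   # 2 18.182
print(f"{stats['failure_time'][2]:.3f}")                    # 15.667
```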
25. make all appropriate changes to the include statements (include 'parinclu') of all subroutines of all programs. It is also often preferable to use fixed formats instead of free formats. Programs PREPAREC and WEIBULLC require the compiled C subroutines of zlib. These are supplied with the Survival Kit.
5 Last Changes
• In version 3.0, the programs PREPAREC and WEIBULLC were made available for very large applications. A few bugs were corrected, and some features were added, such as the JOINCODE, COEFFICIENT, RESIDUALS and INTEGRATE_OUT ... WITHIN statements, the choice of input/output formats, the inclusion of groups of unknown parents, and the use of left-truncated records with COX.
• In version 3.1, the main addition was the possibility to properly analyze discrete failure time data. Discrete failure times with very few distinct observed values occur, for example, when survival time is expressed in years or parities. It then becomes more difficult to find proper parametric proportional hazards models, and the semi-parametric approach of Cox (Cox model) is no longer suitable: an exact calculation of the partial likelihood is not feasible in general in the presence of many ties. The approach of Prentice and Gloeckler (1978) for grouped data is much more satisfying: a full likelihood is written, involving a limited number of parameters describing the baseline hazard function. Using a reparameterization of Prentice and Gloeckler's model, it
26. of the recoded file

For use in COX (or sometimes in WEIBULL), the recoded file produced by PREPARE (file2 of the FILES statement) has to be sorted. The sorting order is different for COX and WEIBULL. A sorting routine is not included in the Survival Kit, as it is often machine dependent. The specification of a fixed format (FORMOUT FIXED_FORMAT) may be necessary when compiling the Survival Kit with some PC compilers. These compilers put the output of a single free-format record onto separate lines when it exceeds 80 characters. You have to find out

9.1 COX Program

This is the sorting order (from highest to lowest level) for COX:
- STRATA variable in ascending order (necessary only when a STRATA statement is used in COX, see below)
- TIME variable in descending order (the TIME variable is always the first variable in the recoded file)
- CENSCODE variable in ascending order (the censoring variable is always the second variable in the recoded file)

This sort prevents the use of the UNFORMATTED (on most computers), BLOCKED_UNFORMATTED or COMPRESSED formats.

9.2 WEIBULL Program

Several special cases exist. These cases are mutually exclusive:
- A STRATA statement is used in WEIBULL. The sorting order should be:
  - STRATA variable in ascending order
  - IDREC variable in ascending order (the IDREC variable is always the third variable in the recoded file)
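The sorting orders above can be expressed as sort keys. The sketch assumes records have already been parsed into tuples whose first three fields are TIME, CENSCODE and IDREC, as stated above. The `stratum` field is a hypothetical name added for illustration. In practice the sort is done on the recoded file with an external, machine-dependent routine. Only the two WEIBULL criteria quoted above are encoded.

```python
from typing import NamedTuple


class RecodedRecord(NamedTuple):
    time: float       # 1st variable of the recoded file
    censcode: int     # 2nd variable
    idrec: int        # 3rd variable
    stratum: int = 1  # hypothetical field, relevant only with a STRATA statement


def cox_sort(records):
    """COX order: STRATA ascending, TIME descending, CENSCODE ascending."""
    return sorted(records, key=lambda r: (r.stratum, -r.time, r.censcode))


def weibull_strata_sort(records):
    """WEIBULL with STRATA: STRATA ascending, then IDREC ascending.

    Records sharing an IDREC keep their input order, because Python's
    sort is stable.
    """
    return sorted(records, key=lambda r: (r.stratum, r.idrec))


recs = [RecodedRecord(5, 1, 1), RecodedRecord(11, 0, 4),
        RecodedRecord(11, 1, 3), RecodedRecord(28, 1, 11)]
print([r.idrec for r in cox_sort(recs)])
```

With equal times, the censored record (CENSCODE 0 in this example) is placed before the failure.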
27. recoding for each CLASS variable (file3 of the FILES statement in PREPARE). The file name is required even when no CLASS variables were defined in PREPARE; in this case it will be an empty file.

file3 is the name of the output file where the results of the analysis are listed.

file4 is the recoded pedigree file (file8 of the PEDIGREE statement of PREPARE). This file is not obligatory. It is only needed for analyses that consider the additive genetic relationship structure between individuals.

file5 is only needed in very large applications, when solutions from a previous run are read in with the READ OLD SOLUTIONS statement (see below).

file6 is only needed in very large applications, when solutions are stored for a subsequent run with the STORE SOLUTIONS statement (see below).

Files 1 and 2 must exist in your directory. Files 4 and 5 must exist when subsequent statements make use of them. Files 3 and 6 will be new files. Note: existing files with the same name are overwritten without warning. When files 5 or 6 are required but file 4 (or files 4 and 5) are not, dummy names have to be given for the latter.

Note for UNIX users: unless the UPPER CASE flag in file parinclu is set to TRUE, all lowercase letters in the parameter file are internally transformed into upper case (see later for a detailed description of the parameters in parinclu). This avoids trouble with variable names typed in different
28. residual 8, 42, 66
discrete scale 6, 9, 10, 16, 29, 30
discrete variable 7, 19, 29, 34
DISCRETE_SCALE 10, 16, 29, 30
EFFECT 40, 73
elementary record 13, 17, 56, 71
elementary records 29
EPS_BFDEF 43, 50
EQUAL SPACE 42
ESTIMATE 35, 36, 38, 78
explained variation 64
FILES 14, 32
FIND 42
FIXED_FORMAT 22, 30
format 22, 30
FORMOUT 22
frailty 6, 7, 36
FREE_FORMAT 22, 30
fspak 40, 51
FUTURE 21, 25, 41, 76
Gauss-Hermite quadrature 37
GENERAL 39
generalized residual 8, 27, 42, 66
genetic parameter estimation 36
goodness of fit test 66
group of unknown parents 22
grouped data 6, 9, 10, 16, 24, 29, 30, 33, 34, 37
hazard function 6
herd-year effect 38
ID 28
IDREC 17, 21, 25, 28, 38, 56, 72
IMPOSE 43, 65
IN_CORE 43, 72
INCLUDE_ONLY 39
INPUT 15, 55
INPUT FILE 52, 53
INTE_ST 47
INTEGRATE_OUT 17, 24, 36, 38, 71, 73
interaction 20, 34, 38, 56, 71
intercept 34
ITER_BF 50
JOINCODE 19, 21, 30
JOINT_MODE 36, 38, 73
KAPLAN 34, 41
Kaplan-Meier estimate 41
Laplacian integration 36
large dataset 6, 8, 27, 38, 44, 71
LARGEST 42
LAST 40, 59
likelihood ratio test 8, 39, 59, 64
LINE SEARCH FAILED 48
log-gamma distribution 7, 17, 24, 27, 35, 38, 40
LOGGAMMA 24, 35
Maddala 64
martingale residual 8, 42, 66
MAXFIG 18
MGS_RULES 36
minimization 8, 61
MODEL 34, 59
MOMENTS 36, 73
MULTINORMAL 36
multivariate normal distribution
29. the recoded file 21
9.1 COX Program 21
9.2 WEIBULL Program 22
9.3 Sorting of future records 23
2 Programs COX and WEIBULL 25
10 Parameter file 1 25
10.1 25
10.2 TIME Statement 26
10.3 CODE Statement 26
10.4 ID Statement and PREVIOUS_TIME Statement 26
10.5 COVARIATE Statement 27
10.6 DISCRETE_SCALE Statement 27
10.7 JOINCODE Statement 27
10.8 Format Statement 28
11 Parameter file 2 28
11.1 29
11.2 FILES Statement 30
11.3 TITLE Statement 30
11.4 STRATA 31
11.5 RHO_FIXED Statement 31
11.6 MODEL Statement 31
11.7 COEFFICIENT Statement 32
11.8 ONLY gt 5 5 Statement 33
11.9 RANDOM Statement 33
11.10 INTEGRATE_OUT Statement 35
11.11 INCLUDE_ONLY Statement 37
11.12 TEST Statement 37
11.13 STD_ERROR and DENSE_HESSIAN Statements 38
11.14 BASELINE Statement 38
11.15 KAPLAN Statement
30. treatment, while the restricted model includes sex only. The corresponding likelihood ratio statistic has a value of 3.1395. Under H0 (treatment effects = 0), this statistic follows a chi-square distribution with 1 degree of freedom, giving a p-value of 0.07.

The R2 OF MADDALA is a measure of the proportion of variation explained by the model (Maddala, 1983, quoted by Schemper, 1992). Its formal definition is

$$R^2 = 1 - \left(\frac{L_R}{L_U}\right)^{2/n}$$

where n denotes the total sample size, and $L_R$ and $L_U$ represent the restricted and unrestricted (full model) maximum likelihoods, respectively. Under moderate censoring, $R^2$ can be roughly interpreted as the usual $R^2$ for linear models.

Then the estimates of the regression coefficients are presented, followed by:
- their asymptotic standard error;
- a chi-square statistic (square of estimate/standard error) for a Wald test of each regression coefficient β, which tests β = 0;
- its associated p-value;
- the risk ratio, which is the exponential of the estimate of the regression coefficient;
- the number of uncensored records for each level (last column; it is related to the standard error of the estimate).

    SOLUTIONS
    COVARIATE        ESTIMATE   STANDARD   CHI2   PROB    RISK    UNCENSORED
                                ERROR             >CHI2   RATIO   FAILURES
    1 sex   DISCRETE
         1             .0000                              1.000   5
         2           -2.0251    1.0316     3.85   .0496    .132   4
    2 treat DISCRETE
         0           -1.5583     .9027     2.98   .0843    .210   4
        10             .0000                              1.000   5

For ex
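The Wald chi-square, its p-value and the risk ratio in a SOLUTIONS row can all be recomputed from the estimate and its standard error. The sketch below shows this in Python. It assumes the usual 1-df chi-square reference distribution, whose upper tail is erfc(sqrt(c/2)). The function name `wald_row` is hypothetical.

```python
import math


def wald_row(estimate, std_error):
    """Return (chi2, p_value, risk_ratio) for one regression coefficient.

    chi2 = (estimate / standard error)^2, referred to a chi-square
    distribution with 1 df, for which P(chi2 > c) = erfc(sqrt(c / 2)).
    """
    chi2 = (estimate / std_error) ** 2
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value, math.exp(estimate)


# Level 2 of sex and level 0 of treat in the table above.
print(wald_row(-2.0251, 1.0316))
print(wald_row(-1.5583, 0.9027))
```

Rounded to the precision printed by COX, these calls reproduce the CHI2, PROB and RISK RATIO columns above.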
31. valid. Because the baseline hazard function can then be described with few parameters, these can be estimated together with β and u, using an approach due to Prentice and Gloeckler (1978). The second case corresponds to a Weibull model. This common type of regression model has been shown to be flexible and often adequate for biological data.

Fixed covariates: Any number of fixed covariates is supported. They may be either discrete (class variables) or continuous. There is no explicit limit to the number of levels of a discrete covariate; the limit will usually be a function of the available computer memory. An intercept (grand mean) is always implicitly included in the Weibull model. When there is only one stratum and no covariate is specified, this intercept is equal to $\rho \log \lambda$, where $\rho$ and $\lambda$ are the two Weibull parameters. Covariates may be time-dependent, $x(t)$. The dependency is modelled through piecewise constant hazard functions with jumps at given times. These times may correspond to calendar dates (e.g., January 1st) or be linked to the individual itself (e.g., starting or stopping of smoking when the effect of smoking on survival is investigated).

Random covariates: The random discrete covariates in vector $z(t)$ may be defined to follow a log-gamma or a normal distribution. They may also follow a multivariate normal distribution, where the covariance structure between individuals is modelled by the matrix of genetic relationships
32. ...0211D+01   1.696D-03   1.000D+00   1.202D-04
       3    4   1.411350124D+01    2.899D-07   1.000D+00    2.054D-08
       4    5   1.411350124D+01    2.847D-12   1.000D+00    2.017D-13
     THE MINIMIZATION TERMINATED WITHOUT DETECTING ERRORS
    CONVERGENCE AFTER 4 EVALUATIONS OF FUNCTION FCOX IN .00 SECONDS
    ***************************************************************
     N=    4   NUMBER OF CORRECTIONS=15
           INITIAL VALUES
     F=  1.411D+01   GNORM=  2.542D+00
    ***************************************************************
       I  NFN   FUNCT              GNORM       STEPLENGTH   CONV CRITERION
       1    2   1.230257068D+01    1.122D+00   3.934D-01    6.511D-02
       2    3   1.172877696D+01    2.155D-01   1.000D+00    8.071D-03
       3    4   1.170091695D+01    6.500D-02   1.000D+00    2.212D-03
       4    5   1.169961281D+01    2.672D-02   1.000D+00    8.924D-04
       5    6   1.169943368D+01    3.665D-04   1.000D+00    1.226D-05
       6    7   1.169943364D+01    9.265D-06   1.000D+00    3.099D-07
       7    8   1.169943364D+01    1.296D-07   1.000D+00    4.334D-09
       8    9   1.169943364D+01    2.662D-10   1.000D+00    8.905D-12
     THE MINIMIZATION TERMINATED WITHOUT DETECTING ERRORS
    CONVERGENCE AFTER 8 EVALUATIONS OF FUNCTION FCOX IN .01 SECONDS
    VALUE OF F 11.6994336385
    TIME FOR THE COMPUTATION OF THE HESSIAN .00 SECONDS
    TIME FOR NUMERICAL FACTORIZATION .00 SECONDS
    TOTAL TIME .05 SECONDS
    BYE BYE

10 Results from COX

The beginning is exactly the same as in the log file (file names, model characteristics, simple statistics, constraints).

    ***************************************************************
33. ...1
    sex          3   57.93613232   3.9439   1   .0470   .3013
    treat        4   55.40054940   2.5356   1   .1113   .4451
    LIKELIHOOD RATIO TESTS LAST
    VARIABLE   TOTAL   -2 LOG LIK    CHI2     DELTA   PROB    R2 OF
               Σ DF    EXCLUDING              DF      >CHI2   MADDALA
    sex          3   61.28410415   5.8836   1   .0153   .0527
    treat        3   57.93613232   2.5356   1   .1113   .3013

The main difference between WEIBULL and COX is that WEIBULL also estimates the Weibull parameter $\rho$ and $\rho \log \lambda$ (called INTERCEPT here). These two parameters fully describe the baseline hazard function and the baseline survivor function $S_0(t) = \exp\{-(\lambda t)^{\rho}\}$.

    SOLUTIONS
    WEIBULL PARAMETER(S) FOR STRATUM 1
    RHO          3.38225   STE   .97667
    INTERCEPT   -8.07086   STE  2.65373
    COVARIATE        ESTIMATE   STANDARD   CHI2   PROB    RISK    UNCENSORED
                                ERROR             >CHI2   RATIO   FAILURES
    1 sex   DISCRETE
         1             .0000                              1.000   5
         2           -2.1895    1.0001     4.79   .0286    .112   4
    2 treat DISCRETE
         0           -1.2901     .8108     2.53   .1116    .275   4
        10             .0000                              1.000   5
    TOTAL TIME .10 SECONDS

Chapter 5 Analysis of very large data sets

This chapter gives some advice on how to analyze very large data sets efficiently.

12 PREPARE program

Usually, computing time and memory requirements related to the PREPARE program are not limiting factors. The only potential problem is the size of the output recoded data file. The use of time-dependent covariates may lead to a much larger number of elementary record
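The two printed Weibull parameters can be turned back into baseline curves. The sketch below assumes the parameterization implied above: $S_0(t) = \exp\{-(\lambda t)^\rho\}$, hence baseline hazard $\lambda \rho (\lambda t)^{\rho - 1}$, and INTERCEPT $= \rho \log \lambda$. The function name `weibull_baseline` and the numeric inputs in the usage line are hypothetical.

```python
import math


def weibull_baseline(rho, intercept):
    """Baseline hazard and survivor functions from RHO and INTERCEPT.

    Assumes INTERCEPT = rho * log(lambda), so lambda = exp(INTERCEPT / rho),
    with S_0(t) = exp(-(lambda t)^rho) and
    h_0(t) = lambda * rho * (lambda t)^(rho - 1) = -d log S_0(t) / dt.
    """
    lam = math.exp(intercept / rho)

    def hazard(t):
        return lam * rho * (lam * t) ** (rho - 1.0)

    def survivor(t):
        return math.exp(-((lam * t) ** rho))

    return hazard, survivor


h0, s0 = weibull_baseline(2.0, 2.0 * math.log(0.1))  # hypothetical rho = 2, lambda = 0.1
print(h0(7.0), s0(10.0))
```

The hazard is written as minus the derivative of log S_0, so the two functions stay consistent whatever values of rho and INTERCEPT are supplied.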
34. 1 6 54 55 5 55 56 12 40 42 20 12 17 2 5 45 45 4 44 44 19 75 100 57 80 20 00 0 11 100 00 4 44 44 18 00 135 78 03 12 27 10 6 54 55 55 56 13 80 38 21 97 6 33

Note that these statistics should be interpreted with caution for time-dependent covariates. For example, here 11 individuals had a record with treatment level 0 (all animals at t = 0), while 6 of them also had a record with treatment level 10.

    CONSTRAINTS
    THE SOLUTION FOR THE FOLLOWING RECODED LEVELS IS SET TO
    LEVEL WITH LARGEST NUMBER OF UNCENSORED FAILURES FOR EACH EFFECT
    WARNING THE VALIDITY OF THESE CONSTRAINTS IS NOT CHECKED
    IF THEY ARE MORE DEPENDENCIES DEGREES OF FREEDOM IN TEST(S) BELOW ARE INCORRECT
    EFFECT sex    LEVEL  1
    EFFECT treat  LEVEL 10
    CONVERGENCE AFTER 19 EVALUATIONS OF FUNCTION FWEIB IN .02 SECONDS
    CONVERGENCE AFTER 16 EVALUATIONS OF FUNCTION FWEIB IN .03 SECONDS
    CONVERGENCE AFTER 17 EVALUATIONS OF FUNCTION FWEIB IN .02 SECONDS
    RESULTS
    -2 LOG LIKELIHOOD 55.40054940
    STANDARDIZED NORM OF GRAD -2 LOG L .00000
    TESTS
    LIKELIHOOD RATIO TEST FOR H0: BETA = 0
    MODEL CHI2 6.479457 WITH 4 DF   PROB > CHI2 .1661
    R2 OF MADDALA MEASURE OF EXPLAINED VARIATION .4451
    LIKELIHOOD RATIO TESTS SEQUENTIAL
    VARIABLE   TOTAL   -2 LOG LIK    CHI2     DELTA   PROB    R2 OF
               Σ DF    INCLUDING              DF      >CHI2   MADDALA
    H0 COVARIATE 2   61.8800066
35. 98773 1 6698773 2 6698773

Individual residuals (i.e., one per observation), including martingale and deviance residuals, can also be obtained. To get them, simply specify STORAGE ON_DISK in the small.cox file:

    TIME  ANIMAL  CENSORING  STRATUM  GENERALIZED  MARTINGALE   DEVIANCE
                  CODE                RESIDUAL     RESIDUAL     RESIDUAL
       5      1       1         1      .4563280      .5436720     .6940770
      10      2       1         1      .4439302      .5560698     .7155672
      11      3       1         1      .9773750      .0226250     .0227979
      11      4       0         1      .2815734     -.2815734    -.7504310
      15      5       1         1      .1208301      .8791699    1.5711144
      15      6       1         1      .5264675      .4735325     .5797123
      17      7       1         1     1.2108388     -.2108388    -.1976130
      19      8       1         1     1.6299547     -.6299547    -.5317941
      21      9       1         1      .3103519      .6896481     .9802044
      21     10       0         1     1.0211752    -1.0211752   -1.4291083
      28     11       1         1     2.0211752    -1.0211752    -.7968640

12 Parameter file for Weibull

The following parameter file, small.wei, now replaces small.cox:

    FILES
    FILES small2.dat small.nco small.rwe
    TITLE
    TITLE SMALL EXAMPLE WEIBULL MODEL
    MODEL
    MODEL sex treat
    STATISTICS
    TEST
    TEST SEQUENTIAL LAST
    STANDARD ERROR
    STD_ERROR

13 Results from WEIBULL

The beginning is similar to the Cox output, except that here raw statistics are given by level of each discrete factor included in the model.

    ***************************************************************
    SMALL EXAMPLE WEIBULL MODEL
    ***************************************************************
    FILES USED
    RECODED DATASET small2.dat
    NEW OLD CODES
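The three residual columns are linked to each other. The sketch assumes the standard definitions: martingale residual M = δ − (generalized residual), where δ is 1 for a failure and 0 for a censored record; deviance residual d = sign(M)·sqrt(−2[M + δ log(δ − M)]). COX's own formulas are not shown in this excerpt, but these definitions reproduce the rows of the table above. The function name `cox_residuals` is hypothetical.

```python
import math


def cox_residuals(generalized, censcode):
    """Martingale and deviance residuals for one observation.

    generalized: the generalized residual (estimated cumulative hazard).
    censcode:    1 for an observed failure, 0 for a right-censored record,
                 matching the coding of this small example.
    """
    martingale = censcode - generalized
    # delta * log(delta - M) vanishes for censored records (delta = 0).
    log_term = censcode * math.log(censcode - martingale) if censcode else 0.0
    inner = max(0.0, -2.0 * (martingale + log_term))  # guard tiny rounding below 0
    deviance = math.copysign(math.sqrt(inner), martingale)
    return martingale, deviance


# Animals 1 and 4 of the table above.
print(cox_residuals(0.4563280, 1))
print(cox_residuals(0.2815734, 0))
```

The deviance residual takes the sign of the martingale residual. This is why censored animals, whose martingale residual is always negative, get negative deviance residuals.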
36. till death or censoring of an individual; it must be of type I4.
- TIME variable1 variable2: variable1 is the time variable, as above. variable2 is a date (D6 or D8) indicating the beginning of the observation; it must also be contained in the list of input variables. This form is necessary for models with time-dependent covariates when changes in risk are given by dates rather than by times in the TIMECOV or TIMEDEP statements.
- TIME variable1 variable2 variable3: variable1 is the name of the time variable. It may or may not be contained in the list of variables in the INPUT statement. variable2 and variable3 are dates (D6 or D8) indicating the beginning and end of the observation. They must have been defined in the INPUT statement. If variable1 is not found in the INPUT statement, its value is calculated as the difference between the end and the beginning of the observation.

Note: only one dependent variable may be specified. If different dependent variables are of interest, separate analyses have to be run, each starting with the data preparation step.

8.4 CENSCODE Statement

    CENSCODE variable value

The CENSCODE statement defines the censoring variable. This is an indicator variable. Its name must be identical to one of the names specified in the INPUT statement. It must contain the censoring code and must be of type I4. value defines the number indicating right-censored records. All other values of the
37. ...LEVELS IS SET TO
    LEVEL WITH LARGEST NUMBER OF UNCENSORED FAILURES FOR EACH EFFECT
    WARNING THE VALIDITY OF THESE CONSTRAINTS IS NOT CHECKED
    IN CASE THEY ARE MORE DEPENDENCIES
    THE DEGREES OF FREEDOM IN THE TEST(S) BELOW ARE INCORRECT
    EFFECT sex    LEVEL  1
    EFFECT treat  LEVEL 10
    CONVERGENCE AFTER 6 EVALUATIONS OF FUNCTION FCOX IN .00 SECONDS
    CONVERGENCE AFTER 4 EVALUATIONS OF FUNCTION FCOX IN .00 SECONDS
    CONVERGENCE AFTER 8 EVALUATIONS OF FUNCTION FCOX IN .01 SECONDS

The first part of the results includes all the likelihood ratio tests:

    RESULTS
    -2 LOG LIKELIHOOD 23.39886728
    STANDARDIZED NORM OF GRAD -2 LOG L .00000
    TESTS
    LIKELIHOOD RATIO TEST FOR H0
    MODEL CHI2 6.368872 WITH 2 DF   PROB > CHI2 .0414
    R2 OF MADDALA MEASURE OF EXPLAINED VARIATION .4395
    LIKELIHOOD RATIO TESTS SEQUENTIAL
    VARIABLE       TOTAL   -2 LOG LIK    CHI2     DELTA   PROB    R2 OF
                   Σ DF    INCLUDING              DF      >CHI2   MADDALA
    H0 COVARIATE     0     29.76773961
    sex              1     26.53835455   3.2294   1       .0723   .2544
    treat            2     23.39886728   3.1395   1       .0764   .4395
    LIKELIHOOD RATIO TESTS LAST
    VARIABLE       TOTAL   -2 LOG LIK    CHI2     DELTA   PROB    R2 OF
                   Σ DF    EXCLUDING              DF      >CHI2   MADDALA
    sex              1     28.22700247   4.8281   1       .0280   .1307
    treat            1     26.53835455   3.1395   1       .0764   .2544

For example, when the treatment effect is added to a model with sex as the only covariate, the full model includes sex
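The SEQUENTIAL table can be rebuilt from the successive −2 log L values. Each chi-square is the drop in −2 log L, and the R2 OF MADDALA is $1 - \exp\{-(D_0 - D)/n\}$, which is the formula above rewritten with $D = -2 \log L$. The sketch uses n = 11, the number of animals in this example. It computes chi-square tail probabilities in closed form for integer degrees of freedom, so no statistics library is needed. The function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with a positive integer df."""
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    for i in range(1, df // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def sequential_lr(steps, n):
    """steps: [(label, sum_df, -2 log L), ...], starting with the null model.

    Returns (label, chi2, delta_df, p_value, r2_maddala) for each added effect.
    """
    null = steps[0][2]
    rows = []
    for (_, df_prev, prev), (label, df, cur) in zip(steps, steps[1:]):
        chi2 = prev - cur
        r2 = 1.0 - math.exp(-(null - cur) / n)
        rows.append((label, chi2, df - df_prev, chi2_sf(chi2, df - df_prev), r2))
    return rows


for row in sequential_lr([("H0", 0, 29.76773961), ("sex", 1, 26.53835455),
                          ("treat", 2, 23.39886728)], n=11):
    print(row)
```

Rounded to four decimals, the output matches the CHI2, PROB and R2 OF MADDALA columns of the COX table above.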
38.
c     Hessian MVEC_BF. Values between 3 and 20 might be
c     used
      PARAMETER (MVEC_BF = 15)
c     EPS_BFDEF: default convergence criterion; may be overridden using
c     the CONVERGENCE CRITERION statement in COX or WEIBULL
c     ||GRADIENT(FUNCTION)|| < EPS_BFDEF * max(1, ||BETA||)
      PARAMETER (EPS_BFDEF = 1.D-8)
c     maximum number of iterations
      PARAMETER (ITER_BF = 1000)
c     NO_LOG relates to the parameter rho of the WEIBULL distribution
c     no_log = 1 -> rho is used in the maximisation routine
c     no_log = 0 -> log(rho) is used in the maximisation routine
      PARAMETER (NO_LOG = 1)
c
c     Parameters used in the numerical integration to get moments of the
c     marginal posterior of a random effect variance parameter
c     when the MOMENTS option in the RANDOM statement is used
      INTEGER NPGAUSS, NITER_GAUSS
c     NPGAUSS: number of points used during Gauss-Hermite integration;
c     has to be between 3 and 5
      PARAMETER (NPGAUSS = 5)
c     NITER_GAUSS: number of iterations in the Gauss-Hermite integration process
      PARAMETER (NITER_GAUSS = 5)
c     Parameters relating to sparse storage of the Hessian matrix in WEIBULL
      INTEGER NDIMSPAR
      INTEGER AVAIL NZE
c     NDIMSPAR: working vector to store intermediate elements used in the
c     recoding in PREPARE or during the sparse computation of
c     the HESSIAN in WEIBULL; this should be a few times bigger than NDIMAX
c     (no genera
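The stopping rule quoted in the comments above can be stated as a one-line test. The sketch assumes Euclidean norms for both the gradient and the parameter vector; the parinclu comment does not say which norm is used. The name `converged` is hypothetical, and the default tolerance is the EPS_BFDEF value of the listing.

```python
import math

EPS_BFDEF = 1e-8  # default value in the parinclu listing above


def converged(gradient, beta, eps=EPS_BFDEF):
    """Stopping rule: ||gradient|| < eps * max(1, ||beta||).

    Euclidean norms are an assumption of this sketch.
    """
    g_norm = math.sqrt(sum(g * g for g in gradient))
    b_norm = math.sqrt(sum(b * b for b in beta))
    return g_norm < eps * max(1.0, b_norm)


print(converged([1e-9, 0.0], [0.5, 0.2]), converged([1e-7, 0.0], [0.5, 0.2]))
```

Because of the max(1, ...) term, the tolerance is absolute for small parameter vectors and becomes relative once their norm exceeds 1.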
39. The Survival Kit V3.12 User's Manual
Vincent Ducrocq and Johann Sölkner
July 12, 2001

Contents
1 Class of models supported 6
2 Main 6
3 The programs 8
4 Hardware 9
5 Last Changes 9
6 Disclaimer 10
7 Availability 10
1 Data Preparation Program PREPARE 11
8 Syntax 11
8.1 FILES Statement 12
8.2 INPUT Statement 13
8.3 TIME Statement 13
8.4 CENSCODE Statement 14
8.5 DISCRETE_SCALE Statement 14
8.6 IDREC Statement 15
8.7 TRUNCATE 15
8.8 TIMECOV Statement 16
8.9 CONVDT Statement 16
8.10 CLASS Statement 17
8.11 JOINCODE Statement 17
8.12 TIMEDEP Statement 17
8.13 COMBINE Statement 18
8.14 OUTPUT Statement 18
8.15 FUTURE Statement 19
8.16 PEDIGREE 19
8.17 FORMOUT 20
9 Sorting of
40. als with 12 digits, including 5 decimal figures. The other admitted types are FREE_FORMAT, UNFORMATTED, BLOCKED_UNFORMATTED and COMPRESSED.
- FREE_FORMAT is the default option.
- FIXED_FORMAT is useful when you need to sort output files but your sorting routine does not handle records that are not strictly presented in columns (especially on PCs). Be careful: if the formats for integers and/or reals are not big enough, the results will be incorrect. Note: the previous way (version 2.0) of producing a recoded file in fixed format through a parameter in the parinclu file is no longer available.
- UNFORMATTED writes binary files. This saves time and some space when writing.
- BLOCKED_UNFORMATTED reads and writes NRECBLOC binary records together. NRECBLOC is specified in the parinclu file. This is the most efficient option: it is much faster than writing or reading records one at a time.
- With COMPRESSED, blocks are compressed before being written and decompressed after being read. COMPRESSED is available only when the program PREPAREC (which calls subroutines for compression/decompression) is used. Reading and writing compressed files takes about 3 times longer, but the files are substantially smaller.

The BLOCKED_UNFORMATTED and COMPRESSED options should not be used if the output file(s) must be sorted. In this same situation, the UNFORMATTED option can be used only if a binary sort subroutine is available.

9 Sorting
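The difference between blocked and compressed storage can be shown with a small round trip. The sketch packs fixed-size binary records NRECBLOC at a time and optionally compresses each block with zlib. This mirrors the description above, but the record layout (one double and three ints), the block size of 4 and the function names are hypothetical. It does not reproduce the Survival Kit's actual file format.

```python
import struct
import zlib

RECORD = struct.Struct("<d3i")  # hypothetical layout: time, censcode, idrec, one class level
NRECBLOC = 4                    # illustrative; the real value is set in parinclu


def write_blocks(records, compress=False):
    """Pack records NRECBLOC at a time; optionally zlib-compress each block."""
    blocks = []
    for start in range(0, len(records), NRECBLOC):
        chunk = b"".join(RECORD.pack(*r) for r in records[start:start + NRECBLOC])
        blocks.append(zlib.compress(chunk) if compress else chunk)
    return blocks


def read_blocks(blocks, compress=False):
    """Inverse of write_blocks: decompress if needed, then unpack every record."""
    records = []
    for block in blocks:
        raw = zlib.decompress(block) if compress else block
        records.extend(RECORD.iter_unpack(raw))
    return records


recs = [(float(t), 1, i, 2) for i, t in enumerate([5, 10, 11, 11, 15, 15], start=1)]
print(len(write_blocks(recs)), read_blocks(write_blocks(recs, True), True) == recs)
```

Because blocks are read back whole, a file written this way cannot be sorted record by record. This matches the restriction stated above.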
41. ample, females (sex 2) are at a 1/.132 = 7.577 times lower risk of dying than males (sex 1).

For ease of interpretation, it is a good idea to impose constraints that force the computation of specific contrasts. For example, one may want to evaluate the increase in risk of males compared to females and of treated compared to non-treated animals, by choosing non-treated females as the reference. This is done with the CONSTRAINTS IMPOSE statement in the parameter file small.cox. To define the proper constraints, one has to look in the small.nco file: sex 2 (old code) is recoded as 2 (new code) and treatment 0 (old code) is recoded as 1 (new code). If we add

    CONSTRAINTS IMPOSE sex 2 treat 1

to small.cox and run COX, we now get:

    SOLUTIONS
    COVARIATE        ESTIMATE   STANDARD   CHI2   PROB    RISK    UNCENSORED
                                ERROR             >CHI2   RATIO   FAILURES
    1 sex   DISCRETE
         1            2.0251    1.0316     3.85   .0496   7.577   5
         2             .0000                              1.000   4
    2 treat DISCRETE
         0             .0000                              1.000   4
        10            1.5583     .9027     2.98   .0843   4.751   5

It now appears directly that males are 7.577 times more likely to fail than females, and that treated animals are at a 4.751 times higher risk than non-treated ones. The contrasts between males and females, or between treated and non-treated animals, remain the same. The baseline, however, is changed. The one reported below corresponds to the situation without explicit constraints
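Imposing a different reference level only shifts each effect's solutions by a constant. The sketch below illustrates this for the point estimates. It does not replace re-running COX with CONSTRAINTS IMPOSE, because the standard errors of the new contrasts cannot be obtained this way. The function name `rereference` is hypothetical.

```python
import math


def rereference(estimates, reference):
    """Shift one effect's solutions so that `reference` becomes the zero level.

    estimates: {original level: estimate} as printed under the old constraint.
    Differences between levels (the contrasts) are unchanged by the shift.
    """
    shift = estimates[reference]
    return {level: est - shift for level, est in estimates.items()}


sex = rereference({1: 0.0, 2: -2.0251}, reference=2)
treat = rereference({0: -1.5583, 10: 0.0}, reference=0)
print({lvl: round(math.exp(b), 3) for lvl, b in sex.items()})    # risk ratios vs. females
print({lvl: round(math.exp(b), 3) for lvl, b in treat.items()})  # risk ratios vs. non-treated
```

The resulting risk ratios match the 7.577 and 4.751 printed above.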
42. ams
1. To limit memory requirements, it is essential to specify some parameters in the parinclu file accurately. In particular:
- NRECMAX: maximum number of elementary records stored in core at one time. This number should be relatively large. In particular, it should be larger than or equal to the exact number of elementary records if the statement STORAGE IN_CORE is used. However, for obvious reasons, it should not exceed the memory capacity.
- MXEF_USED: maximum number of covariates actually used in the MODEL statement of COX or WEIBULL. The recoded files created with PREPARE often contain many variables that are never used together in COX or WEIBULL. In such a case, MXEF_USED should be set to the strict maximum number of effects fitted together.
2. For fast input/output:
- I/O operations on free-formatted files are relatively slow. To read and write the data file efficiently, use UNFORMATTED or, even better, BLOCKED_UNFORMATTED in PREPARE. Alternatively, use the STORE_BINARY statement the first time COX or WEIBULL is run. In that case, the recoded data file is copied as a blocked binary file, and the format in the first parameter file must be changed to BLOCKED_UNFORMATTED. NRECBLOC records are written and read together. This parameter is specified in the parinclu file. It should be rather large (several thousands) in order to play the role of a buffer bu
43. ...ams of the SURVIVAL KIT
c***********************************************************************
      IMPLICIT NONE
c     PARAMETERS THAT MAY BE MODIFIED BY THE USER
c     Remember: when you change any of the parameters, you have to recompile
c     your programs
      INTEGER MXSTRA, NTIMMAX, NSTIMAX
      INTEGER NRECMAX, NRECMAX2, MXEF, MXEF_USED, NDIMAX, NDIMAX2
c     NRECMAX: maximum number of elementary records
c     Caution: with time-dependent variables, this number may be much larger
c     than the number of records in your original data file
c
      PARAMETER (NRECMAX = 100000)
c
c     NRECMAX2: maximum number of elementary records in the future
c     data file (FUTURE statement in PREPARE). These will normally be
c     much fewer than in the data file
c
      PARAMETER (NRECMAX2 = 100)
c     MXEF: max number of covariates. These are discrete
c     (CLASS) covariates and continuous covariates.
c     Time-dependent covariates have to be counted twice, because the
c     states before and after change occupy two positions on
c     the recoded data file
c
      PARAMETER (MXEF = 20)
c     MXEF_USED: max number of covariates actually used in the MODEL
c     statement of COX or WEIBULL. There may be situations where you
c     create recoded files with very many covariates with PREPARE and
c     never use all of them together in COX/WEIBULL. In this
44. ariates: all variables in the MODEL that have not been specified via CLASS and COMBINE. Polynomial regression models may be fitted by stating variable N, where N is the power to which the covariate value is raised. For example, for a third-order polynomial in varz, the statement would be

    MODEL varz varz 2 varz 3 (any number of other discrete and continuous covariates)

Covariables are always centered automatically for computations, i.e., their mean value is subtracted from each observation. For polynomial terms, the values are first raised to the power N; the mean of this new variable is then calculated and subtracted. The mean value (or the sum of mean values, if there is more than one continuous covariate) is printed in the output of COX and is incorporated into the intercept in the output of WEIBULL. This is important to know when you want to draw the regression curves.
- interactions between discrete covariates: these may only be fitted by combining the original values of two or three variables into a new one via the COMBINE statement of PREPARE. Interactions between class and continuous covariates (i.e., individual regressions and nested models) are currently not supported.

Variables are treated as fixed unless they are stated in the RANDOM statement described below. When a grouped data model (Prentice and Gloeckler's model) is fitted, an extra time-dependent covariate, time_unit, is generated automatically and will ap
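The order of operations for polynomial terms matters: the value is raised to the power first and centered afterwards. Squaring an already centered covariate would give a different column. The sketch below shows the described order. The name `centered_powers` is hypothetical.

```python
def centered_powers(values, max_power):
    """Columns x, x^2, ..., x^max_power, each centered after raising to the power."""
    columns = {}
    for n in range(1, max_power + 1):
        powered = [v ** n for v in values]
        mean = sum(powered) / len(powered)
        columns[n] = [p - mean for p in powered]
    return columns


cols = centered_powers([1.0, 2.0, 3.0], 2)
print(cols[1])  # centered x
print(cols[2])  # x^2 centered on its own mean, not (centered x)^2
```

Because each column has its own mean, the quantity printed by COX or absorbed into the WEIBULL intercept refers to these means. Keep this in mind when drawing the fitted curve on the original scale.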
45. clu. Then you must change the format statement in parameter file 1 from its current value to BLOCKED_UNFORMATTED. In the subsequent runs, the STORE_BINARY statement is deleted and the data file name is replaced by its new name (i.e., with bin at the end). Note, however, that the blocked binary file can no longer be sorted. This means that the models should not differ in the definition of the STRATA statement or of the INTEGRATE_OUT statement, because these are sorting criteria. Note: the statement READ_BINARY used in previous versions of the Survival Kit is now obsolete. The fact that the file to be read is unformatted is indicated in parameter file 1 used in COX or WEIBULL.

Chapter 3 The parinclu file

The parinclu file defines the parameters used for dimensioning vectors and arrays. They may be redefined to fit your personal needs and computational facilities. Criteria like iteration numbers and stopping rules are defined here and may also be redefined. Finally, file and directory names may be supplied for redirecting special input and output files. The variables and their settings are explained below. If you make any changes to parinclu, you have to recompile prepare.f, cox.f and weibull.f (or preparec.f and weibullc.f).

c***********************************************************************
c     Definition of parameters to be used in the various progr
46. e analysis should be repeated with different values of the bounds. Example: you stated the parameters 0.01 0.1 0.001 as lower bound, upper bound and tolerance. The program must then look for the mode of the marginal posterior density in the interval [0.01, 0.1], and the search stops when the current interval for the estimate is smaller than 0.001. Suppose the program output tells you that the mode is between 0.0100 and 0.0109, with a best value of 0.0104. The mode is then at the lower bound, so you need to reset the bounds (e.g., to 0.005 and 0.011) and start again. The STORE SOLUTIONS and READ OLD SOLUTIONS statements described below may be used to avoid starting from scratch. The INTEGRATE_OUT JOINT_MODE statement can only be used with a Weibull model and a log-gamma random effect. When it is used, the ESTIMATE statement should not be used: the gamma parameter is then estimated jointly with the other effects, after exact algebraic integration of the log-gamma random effect.
- MOMENTS: this option may be used together with ESTIMATE to compute the estimated mean, standard deviation and skewness of the marginal posterior distribution of the distributional parameter. The computation is based on an iterated Gauss-Hermite quadrature of the approximate marginal posterior densities. The number of points used for the integration is NPGAUSS, which is specified in the parinclu file. When the amount of information available for the estimation is limited, unpredictable results (if any) may be obta
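The MOMENTS computation relies on Gauss-Hermite quadrature. The sketch below shows only the basic technique, not the Survival Kit's iterated scheme. It builds an n-point rule (n = 5, matching the NPGAUSS default), with nodes found by bisection on the Hermite polynomial. It then uses the rule to compute the mean, standard deviation and skewness of f(θ) when θ is approximately normal. The function names and the normal approximation in the usage line are illustrative assumptions.

```python
import math


def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via H_{k+1} = 2x H_k - 2k H_{k-1}."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h


def gauss_hermite(n, lo=-8.0, hi=8.0, steps=4000):
    """Nodes and weights of the n-point rule for integrals of exp(-x^2) f(x).

    Nodes are the roots of H_n, bracketed on a grid and refined by bisection.
    Weights use w_i = 2^(n-1) n! sqrt(pi) / (n^2 H_{n-1}(x_i)^2).
    """
    h = (hi - lo) / steps
    grid = [lo + (i + 0.37) * h for i in range(steps)]  # offset avoids landing on 0
    nodes = []
    for a, b in zip(grid, grid[1:]):
        fa = hermite(n, a)
        if fa * hermite(n, b) < 0.0:
            for _ in range(100):
                mid = 0.5 * (a + b)
                fm = hermite(n, mid)
                if fa * fm <= 0.0:
                    b = mid
                else:
                    a, fa = mid, fm
            nodes.append(0.5 * (a + b))
    const = 2.0 ** (n - 1) * math.factorial(n) * math.sqrt(math.pi) / n ** 2
    weights = [const / hermite(n - 1, x) ** 2 for x in nodes]
    return nodes, weights


def moments(f, mu, sigma, n=5):
    """Mean, standard deviation and skewness of f(theta), theta ~ N(mu, sigma^2)."""
    nodes, weights = gauss_hermite(n)
    values = [f(mu + math.sqrt(2.0) * sigma * x) for x in nodes]

    def expect(g):
        return sum(w * g(v) for w, v in zip(weights, values)) / math.sqrt(math.pi)

    mean = expect(lambda v: v)
    var = expect(lambda v: (v - mean) ** 2)
    skew = expect(lambda v: (v - mean) ** 3) / var ** 1.5
    return mean, math.sqrt(var), skew


# Hypothetical: log of a variance parameter approximately N(0, 0.3^2).
print(moments(math.exp, 0.0, 0.3))
```

An n-point rule integrates polynomials up to degree 2n - 1 exactly against exp(-x^2). This is why a handful of points (3 to 5) is usually enough for smooth, nearly normal posteriors.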
47. e handled by the program in version 3.0 and above.

8.9 CONVDT Statement

    CONVDT variable1 = variable2 - variable3 variable4 = variable5 - variable6

The CONVDT statement calculates the difference between two date variables and writes it onto a new time variable. The variable to the left of the equality sign must be a new name. The variables on both sides of the minus sign must be dates of the same type (D6 or D8). This statement is used for the calculation of covariates (e.g., age at first calving from date of birth and date of calving), but not for the dependent variable, which is processed in the TIME statement. Note: the variable to the left of the minus sign is the later date (e.g., date of first calving); the one to the right is the earlier date (e.g., date of birth).

8.10 CLASS Statement

    CLASS variables

The CLASS statement lists the names of the classification variables to be used in the analysis. Independent variables in the MODEL statements of the COX or WEIBULL programs that are not defined in the CLASS statement are treated as continuous covariates. All variables defined in the CLASS statement are recoded internally (list of codes in file3). Codes are transformed back to original values for the listing of the results from COX and WEIBULL. The recoded values are needed, though, in the CONSTRAINTS statement of COX and WEIBULL (see there).

8.11 JOINCODE Statement

    JOINCODE variable variable
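The sketch below illustrates the CONVDT rule "later date minus earlier date", with the result counted in days. It relies on several assumptions. D6 dates are taken to be ddmmyy. Two-digit years below a pivot are read as 20yy, using the NBUG_2000 value (40) of the distributed parinclu. Days are assumed to be the time unit. The parser name `parse_d6` is hypothetical, and D8 parsing is not attempted.

```python
from datetime import date

NBUG_2000 = 40  # assumed pivot for two-digit years, as in the distributed parinclu


def parse_d6(text):
    """Parse a D6 date, assumed to be ddmmyy.

    Two-digit years below NBUG_2000 are read as 20yy, others as 19yy.
    """
    day, month, yy = int(text[0:2]), int(text[2:4]), int(text[4:6])
    year = 2000 + yy if yy < NBUG_2000 else 1900 + yy
    return date(year, month, day)


def convdt(later, earlier):
    """CONVDT-style difference: later date minus earlier date, in days."""
    return (later - earlier).days


birth = parse_d6("150398")    # hypothetical date of birth
calving = parse_d6("020100")  # hypothetical date of first calving
print(convdt(calving, birth))
```

Swapping the arguments gives a negative value. This is why the order of the two dates around the minus sign matters.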
48. ecords disappears. This may lead to a substantial saving of time and allows the use of the UNFORMATTED, BLOCKED_UNFORMATTED or COMPRESSED formats.
- General case: in all other situations there is no need to sort. The recoded data set is naturally ordered by IDREC variable and ascending TIME variable. This allows the use of the UNFORMATTED, BLOCKED_UNFORMATTED or COMPRESSED formats.

1 Also applies to the grouped data model of Prentice and Gloeckler (1978).

9.3 Sorting of future records

The sorting of future records (file6 of the FUTURE statement) is identical to the sorting order for WEIBULL (general case), irrespective of whether COX or WEIBULL is used later on.

Chapter 2 Programs COX and WEIBULL

Depending on the parameterisation of the baseline hazard function, either program COX or program WEIBULL may be called after data preparation with PREPARE. Using the same recoded data set, various alternative survival analyses may be carried out with both programs. The programs require two different parameter files. The first is produced by PREPARE. The second essentially describes the model of analysis. Most statements used in the second parameter file may be called by both programs. A few, mainly those relating to the estimation of the distributional parameters of log-gamma distributed random effects, may only be called by WEIBULL. Generalized residuals are computed only by COX
49. ed
- date or time of change in covariate value
- new covariate value

typediff specifies how the difference between the beginning of the observation and the actual change of the covariate is given. If it is I4, the difference is given as a time. If it is D6 or D8, it is calculated as the difference between the date given in (b) and the date describing the beginning of the observation in the TIME statement; both dates must be of the same type. One of the most frequent problems that users encounter with the Survival Kit is an inadequate preparation of these triplets. Typical errors are a time of change equal to 0 (or even < 0), and dates of change before the origin or the truncation point, or after the failure date.

8.13 COMBINE Statement

    COMBINE variable variable variable variable variable variable variable

The COMBINE statement is used to produce interaction effects between two or three variables defined in the CLASS statement. Continuous variables are not allowed. The variable to the left of the equality sign must be a new name. If one of the combined variables is time-dependent (defined in the TIMEDEP statement), the resulting variable will also be time-dependent.

8.14 OUTPUT Statement

    OUTPUT variables

In the OUTPUT statement, the variables defined in the variables list are written to the output file (file2 of the FILES statement). This file is then used for the analysis with the COX or WEIBULL
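Before running PREPARE, it can be worth screening the triplets for the problems listed above. The sketch checks times already expressed on the time scale (type I4). It flags change times that are zero or negative, before the truncation point, or after the failure or censoring time. The function name and the optional `truncation` argument are hypothetical.

```python
def check_triplets(time, triplets, truncation=0):
    """Return messages for suspicious (column, time of change, new value) triplets.

    time:       failure or censoring time of the record.
    truncation: optional truncation point (0 when there is none).
    """
    problems = []
    for column, t_change, _value in triplets:
        if t_change <= 0:
            problems.append(f"column {column}: time of change {t_change} <= 0")
        elif t_change < truncation:
            problems.append(f"column {column}: change at {t_change} before truncation point {truncation}")
        if t_change > time:
            problems.append(f"column {column}: change at {t_change} after end time {time}")
    return problems


print(check_triplets(5, [(5, 4, 10)]))  # animal 1 of the small example: no problem
print(check_triplets(5, [(5, 0, 10)]))  # change at time 0 is flagged
```

A change exactly at the failure time is not flagged, because the text above only mentions changes after the failure date.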
50. effect at a time. This is done for each effect separately.
- EFFECT list of variables: the same type of test as with LAST is performed, but only for the effects stated in the list.

SEQUENTIAL and LAST may be stated together. EFFECT may be stated alone, with SEQUENTIAL or with LAST. When EFFECT is used with LAST, it is redundant. When it is used with SEQUENTIAL, the effects are included in the model one at a time. The sequence starts with the first effect of the MODEL statement that appears in the list following EFFECT. For example,

    MODEL var1 var2 var3 var4 var5 var6
    TEST SEQUENTIAL EFFECT var4 var5 var6

will lead to likelihood ratio tests corresponding to the sequential introduction of variables var4, var5 and var6.

11.13 STD_ERROR and DENSE_HESSIAN Statements

Asymptotic standard errors of estimates may be requested by the following statement:

    STD_ERROR

Be cautious with the STD_ERROR statement in large models: the Hessian matrix has to be calculated and stored to obtain the standard errors of the estimates. The full square Hessian matrix is stored by the COX program, and by the WEIBULL program when the DENSE_HESSIAN statement is given. Its dimension is specified by the parameter NDIMAX2, which is defined in the parinclu file and may become limiting with respect to storage capacity. In the case of the WEIBULL program, the Hessian matrix is stored
51. ersität für Bodenkultur, Gregor-Mendel-Strasse 33, 1180 Vienna, Austria. Tel: +43 1 47654 3272, Fax: +48 1 3105175, email: soelkner@mail.boku.ac.at 7 Availability The programs and this manual can be retrieved on the Web, in compressed form, at http://www.boku.ac.at/nuwi/popgen or at http://www.sgqa.jouy.inra.fr/diffusions.htm Chapter 1 Data Preparation Program PREPARE When called, PREPARE will ask for the name of a parameter file, or a default name can be prespecified in the parinclu file. This parameter file provides the program with information about the files and variables to be used. In particular, the dependent and censoring variables are defined, and class variables and variables that may be combined into new variables are stated. Information relating to the time scale may be given either in absolute values relative to the start of the individual observation or via dates. In the latter case, the date of the start of the observation also has to be given, and the program will transform the information given through dates into information about time relative to the starting point of the observation. Levels of variables defined in the CLASS statement will be recoded from 1 to the number of levels for use by the analysis programs COX and WEIBULL. When time-dependent variables are defined, or when it is specified that the time scale is discrete with few categories, the program will cut the observation into pieces called elementary records, each
52. gence criterion. A careful examination of the variation of the FUNCT value during the last iterations will help to reveal that. If FUNCT is still varying substantially, the program failed to find a proper minimum, often because the model is incorrect or inconsistent. If FUNCT is stable, one can safely assume that the results are correct and that a minimum has been found.
 N= 4   NUMBER OF CORRECTIONS=15
 INITIAL VALUES  F= 1.488D+01   GNORM= 2.289D+00
 I  NFN  FUNCT            GNORM      STEPLENGTH  CONV CRITERION
 1   2   1.341197562D+01  6.570D-01  4.369D-01   4.898D-02
 2   3   1.327040139D+01  5.823D-02  1.000D+00   3.128D-03
 3   4   1.326918347D+01  4.121D-03  1.000D+00   2.154D-04
 4   5   1.326917728D+01  3.349D-05  1.000D+00   1.747D-06
 5   6   1.326917728D+01  1.974D-08  1.000D+00   1.030D-09
 6   7   1.326917728D+01  9.451D-14  1.000D+00   4.930D-15
 THE MINIMIZATION TERMINATED WITHOUT DETECTING ERRORS.
 CONVERGENCE AFTER 6 EVALUATIONS OF FUNCTION FCOX IN 0.00 SECONDS
 N= 4   NUMBER OF CORRECTIONS=15
 INITIAL VALUES  F= 1.488D+01   GNORM= 1.534D+00
 I  NFN  FUNCT            GNORM      STEPLENGTH  CONV CRITERION
 1   2   1.411382471D+01  3.266D-02  6.517D-01   2.314D-03
 2   3   1.41135
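A log like the one above can be checked for FUNCT stability after reading the Fortran D-exponent numbers as ordinary floats. The Python sketch below treats a small relative spread over the last few iterations as "stable". The window of 3 and the tolerance of 1e-8 are arbitrary illustrative choices, not Survival Kit defaults, and both function names are hypothetical.

```python
def fortran_float(text: str) -> float:
    """Read Fortran D-exponent notation, e.g. '1.326917728D+01' -> 13.26917728."""
    return float(text.strip().upper().replace("D", "E"))

def funct_is_stable(history, window: int = 3, rel_tol: float = 1e-8) -> bool:
    """True if the last `window` FUNCT values agree to within rel_tol (relative).

    window and rel_tol are illustrative choices, not Survival Kit defaults.
    """
    if len(history) < window:
        return False
    tail = history[-window:]
    scale = max(1.0, abs(tail[-1]))
    return (max(tail) - min(tail)) / scale <= rel_tol

# FUNCT column of the first run shown above
funct = [fortran_float(s) for s in ("1.341197562D+01", "1.327040139D+01",
                                    "1.326918347D+01", "1.326917728D+01",
                                    "1.326917728D+01", "1.326917728D+01")]
print(funct_is_stable(funct))
```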
53. he INTEGRATE_OUT statement is used with the JOINT_MODE option, as INTEGRATE_OUT JOINT_MODE effect1; in the example above, then the gamma parameter is estimated jointly with the other effects, as the mode of the marginal posterior distribution of the gamma parameter, effect2, effect3 and the Weibull parameters. The value 10.0 specified in the RANDOM statement is then only used as a starting value in the optimisation subroutine. The advantage of this approach is that it performs the marginalisation of the random effect exactly, whereas otherwise it is done only approximately via Laplacian integration (Ducrocq and Casella, 1996). Note, however, that the fixed and other random effects, as well as the Weibull parameters, still have to be integrated out if one wants the full marginal posterior density for the gamma parameter of the log-gamma distribution. The combined use of RANDOM ... ESTIMATE and INTEGRATE_OUT JOINT_MODE offers an easy solution to the joint estimation of the distribution parameters of two random variables. Note: in order to perform the algebraic integration, the recoded data file (file1) must be sorted by increasing levels of the random effect to integrate out. For very large datasets, this sorting according to the levels of the random effect to integrate out may be cumbersome, in particular because it is not compatible with the use of the BLOCKED_UNFORMATTED and COMPRESSED options. An alternative appr
54. he UPPER_CASE flag is set to FALSE, all names other than keywords (which can be either upper or lower case) used in parameter files are case sensitive: they are not transformed. 8.2 INPUT Statement INPUT variable type variable type variable type; In the INPUT statement, variable names and their respective types are defined. The input file (file1) is read in free format, implying that variables have to be separated by blanks and that all variables in the data set, whether needed or not in the current analysis, have to be specified. Variable names may be up to 8 characters long (otherwise the name is truncated) and may be comprised of any combination of characters, numbers and symbols, except for the following symbols: The following variable types are allowed: I4 integer number, R4 real number, D6 date with 6 digits (ddmmyy), D8 date with 8 digits (ddmmyyyy). Note that the year 2000 bug is avoided for D6 dates by adding 100 to the value of yy when yy is less than NBUG_2000 (set in the parinclu file). 8.3 TIME Statement TIME variable < variable < variable > >; In the TIME statement, the dependent variable is defined. This is usually time, but it may be something else, like lifetime production. Time values of 0 should be avoided. The TIME statement may have three different forms: • TIME variable; The variable must have been specified in the INPUT statement and contains the time
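The D6 rule above can be sketched in Python as follows. Note that a date such as 010105 read as an integer loses its leading zero (10105). The sketch assumes that time differences are counted in days. NBUG_2000 = 40 is the value shown for this parameter in the parinclu file, and the helper names are hypothetical.

```python
import datetime

NBUG_2000 = 40  # value of NBUG_2000 shown in the parinclu file

def parse_d6(value: int, nbug_2000: int = NBUG_2000) -> datetime.date:
    """ddmmyy -> date; yy < nbug_2000 becomes yy + 100 (i.e. a year 20yy)."""
    dd, mm, yy = value // 10000, (value // 100) % 100, value % 100
    if yy < nbug_2000:
        yy += 100
    return datetime.date(1900 + yy, mm, dd)

def parse_d8(value: int) -> datetime.date:
    """ddmmyyyy -> date."""
    return datetime.date(value % 10000, (value // 10000) % 100, value // 1000000)

def days_between(start: datetime.date, end: datetime.date) -> int:
    """Elapsed time in days (the time unit is an assumption of this sketch)."""
    return (end - start).days

# 31 Dec 1999 to 2 Jan 2000, both given as D6 dates
print(days_between(parse_d6(311299), parse_d6(20100)))
```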
55. he definition of the time-unit time-dependent variable, it may be more efficient to start again by running PREPARE after deleting the DISCRETE_SCALE statement in its own input parameter file. 10.7 JOINCODE Statement JOINCODE variable1 variable2; The JOINCODE statement indicates that the two variables variable1 and variable2 were recoded together in a single list. This is the case, e.g., when sires and maternal grandsires are specified as different variables but used in a sire-maternal grandsire model: sires may appear as maternal grandsires, and their additive genetic effect as grandsires is 0.5 times their effect as sires. This statement is the direct result of the usage of the JOINCODE statement in the PREPARE parameter file. 10.8 Format Statement format_type; format_type displays the format of the recoded output files from PREPARE (in the PREPARE parameter file, these are file2 of the FILES statement and file6 of the FUTURE statement). If a fixed format is chosen, format_type is FIXED_FORMAT and is followed by the Fortran description of formats for integers and reals (In, Fn.d), e.g. FIXED_FORMAT I8 F12.5 for integers with 8 digits and reals with 12 digits, including 5 decimal figures. The other possible expressions for format_type are FREE_FORMAT (default), UNFORMATTED, BLOCKED_UNFORMATTED and COMPRESSED (only with PREPAREC/WEIBULLC). 11 Parameter file 2 In th
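"Recoded together in a single list" means that sire and maternal grandsire codes share one numbering. The Python sketch below shows the idea. It assumes levels are numbered 1..n in increasing order of the original codes, which matches the small.nco file of the small example but is not documented as PREPARE's rule. The function name is hypothetical.

```python
def joincode(first, second):
    """Recode two columns (e.g. sire and mgs) with one common list of levels.

    Levels are numbered 1..n in increasing order of the original codes
    (an assumption; PREPARE may order them differently).
    """
    levels = sorted(set(first) | set(second))
    code = {old: new for new, old in enumerate(levels, start=1)}
    return [code[v] for v in first], [code[v] for v in second], code

sire_codes, mgs_codes, table = joincode([7, 3, 7], [3, 9, 9])
print(sire_codes, mgs_codes, table)
```

With COEFFICIENT mgs 0.5; the linear predictor of a record then uses the same joint solution vector twice, roughly b[sire code] + 0.5 * b[mgs code]. This is a simplified reading of the COEFFICIENT statement.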
56. ile is supplied: true if the name of the input parameter file
c     is supplied at running time
c       supply_file0 = true  -> parameter file name is supplied at running time
c       supply_file0 = false -> name of parameter file is defined in parinclu
      PARAMETER (SUPPLY_INPUT0 = .true.)
c     INPUT_FILE0: name of the parameter file, used only if supply_file0 = false
      PARAMETER (INPUT_FILE0 = 'PARM1')
c     Parameters related to the parameter and report files (COX and WEIBULL)
      LOGICAL UPPER_CASE, DETAIL_IT
      LOGICAL SUPPLY_INPUT1, SUPPLY_INPUT2, SUPPLY_OUTPUT
      CHARACTER*30 INPUT_FILE1, INPUT_FILE2, OUTPUT_FILE
      CHARACTER*100 TEMPDIR, BACKUP
c     upper_case: TRUE if all words (including file names) used in the
c     parameter file are to be converted to upper case; otherwise the
c     variables in the statements are case sensitive
      PARAMETER (UPPER_CASE = .false.)
c     SUPPLY_FILE1: logical variable, true if the name of parameter file 1
c     (provided by PREPARE) is supplied at running time, false if supplied in parinclu
      PARAMETER (SUPPLY_INPUT1 = .true.)
c     INPUT_FILE1: name of parameter file 1, used only if supply_file1 = false
      PARAMETER (INPUT_FILE1 = 'para')
c     SUPPLY_FILE2: logical variable, true if the name of parameter file 2
c     is supplied at running time, false if supplied in parinclu
      PARAMETER (SUPPLY_INPUT2 = .true.)
c     INPUT_FILE2: name of parameter file 2, used only if supply_file2 = false
      PARAMETER (INPUT_FILE2 = 'PARM2')
c     SUPPLY_OUTPUT: logical va
57. in sparse form, unless the DENSE_HESSIAN statement is specified. The vector space required for sparse storage is specified by the parameters AVAIL and NZE in the parinclu file. DENSE_HESSIAN; The full storage of the square Hessian matrix is the only option when the COX program is used, so the DENSE_HESSIAN statement is not required there. The default option for the WEIBULL program is the sparse storage of the Hessian, unless the DENSE_HESSIAN statement is used. In cases where memory is not limiting, or when a log-gamma random effect is algebraically integrated out (leading to a smaller but denser Hessian), the DENSE_HESSIAN option may save substantial computing time. 11.14 BASELINE Statement BASELINE; The statement specifies that the baseline hazard, the baseline cumulative hazard and the baseline survivor functions should be computed and printed in the COX program. These values are computed at each distinct value of the TIME variable. In stratified analyses (use of the STRATA statement), separate baseline functions are calculated for each stratum. With the WEIBULL program, the computation of the baseline Weibull parameters is always implicit, and the BASELINE statement is ignored. 11.15 KAPLAN Statement KAPLAN; The statement specifies that the product-limit (Kaplan-Meier) estimate of the survivor function should be computed and printed. The Kaplan-Meier estimator is a non-parametric population estimator of the survivor function that does n
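The product-limit estimate requested by KAPLAN can be sketched in a few lines of Python. This sketch handles one sample of (time, failure indicator) pairs, without strata, left truncation or time-dependent covariates; these are simplifying assumptions, and the function name is hypothetical. Following the usual convention, individuals censored at a failure time are kept in the risk set at that time.

```python
from collections import Counter
from typing import List, Tuple

def kaplan_meier(times: List[float], failed: List[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate S(t) at each distinct failure time."""
    deaths = Counter(t for t, f in zip(times, failed) if f)
    exits = Counter(times)          # failures and censorings leaving at time t
    at_risk = len(times)
    surv, curve = 1.0, []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk   # S(t) = prod (1 - d_j / n_j)
            curve.append((t, surv))
        at_risk -= exits[t]             # remove everyone observed up to t
    return curve

print(kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False]))
```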
58. ined, although the mode of the distribution may have been successfully computed. 11.10 INTEGRATE_OUT Statement INTEGRATE_OUT < JOINT_MODE > variable < WITHIN variable2 >; This option is only available in the WEIBULL program (the COX program prints an error message and stops) and is not applicable to a grouped data model. When a random effect is assumed to follow a log-gamma distribution in a Weibull model, it is possible to algebraically integrate it out from the joint posterior density. This technique decreases the number of parameters to estimate, sometimes drastically. Algebraic integration is equivalent here to the absorption of a group of equations in a system of linear equations. The consequences are similar: a smaller system, but usually a much less sparse one, and no direct availability of the estimates of the effects integrated out (other than the gamma parameter). The INTEGRATE_OUT statement can be used alone, i.e. followed only by the name of the random variable to integrate out (which must also appear in the RANDOM statement). The gamma parameter of the log-gamma distribution is then assumed to be known. For example, MODEL effect1 effect2 effect3; RANDOM effect1 LOGGAMMA 10.0; INTEGRATE_OUT effect1; will lead to the integration of the log-gamma distributed random variable effect1 and the estimation of effects effect2 and effect3 and of the Weibull parameters, assuming a value of 10.0 for the gamma parameter of effect1. If t
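The algebraic integration has a closed form. Take one level of the random effect with elementary records j, failure indicators delta_j, hazards h_j(t) and cumulative hazards H_j computed without the random effect. If exp(w) ~ Gamma(gamma, gamma), with mean 1, integrating w out gives:

log L = sum_j delta_j log h_j + gamma log gamma - log Gamma(gamma) + log Gamma(gamma + d) - (gamma + d) log(gamma + sum_j H_j), where d = sum_j delta_j.

The Python sketch below evaluates this contribution. The Gamma(gamma, gamma) parameterisation is an assumption stated here, not taken from this section, and the function name is hypothetical.

```python
import math
from typing import Sequence

def loggamma_marginal_loglik(gamma: float,
                             failed: Sequence[bool],
                             log_hazard: Sequence[float],
                             cum_hazard: Sequence[float]) -> float:
    """Log-likelihood of one level of a log-gamma random effect, integrated out.

    Assumes exp(w) ~ Gamma(shape=gamma, rate=gamma); log_hazard holds log h_j
    at the failure time and cum_hazard holds H_j for each record of the level,
    both computed without w.
    """
    d = sum(failed)                       # failures in this level
    total_h = sum(cum_hazard)             # summed cumulative hazards
    loglik = sum(lh for f, lh in zip(failed, log_hazard) if f)
    loglik += gamma * math.log(gamma) - math.lgamma(gamma)
    loglik += math.lgamma(gamma + d) - (gamma + d) * math.log(gamma + total_h)
    return loglik
```

Because each level contributes through d and sum_j H_j only, its w disappears from the parameter vector. This is why the resulting system is smaller but denser.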
59. integrated in the programs. PREPARE is used to prepare the data for the actual analysis, done with either COX or WEIBULL. Data preparation includes recoding of class variables and, in the presence of time-dependent covariates, splitting up individual records into so-called elementary records, each elementary record covering only the time span from one change in any time-dependent covariate to the next. The recoded file may therefore have many more records than the original one. The estimation of effects under the proportional hazards model described above is then performed by COX or WEIBULL, depending on whether the baseline hazard function is assumed to be unspecified (Cox model) or to follow the two-parameter Weibull hazard distribution. Specifications for both models are similar, but it is computationally easier and less time consuming to estimate the parameters of the distributions of the random effects under the Weibull model. For extremely large applications (e.g. national genetic evaluations), the number of elementary records may become huge (sometimes > 100 million) when time-dependent covariates are used in the model. In the Survival Kit version 3.0 and above, modified versions of the programs PREPARE and WEIBULL (preparec.f and weibullc.f, denoted hereafter as PREPAREC and WEIBULLC) were written using public domain C subroutines for compressing and decompressing data during I/O operations (zlib general purpose compres
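The splitting into elementary records described above can be illustrated with a short Python sketch. It cuts one observation (0, end_time] at every time where any time-dependent covariate changes. It ignores dates, truncation and the TIMECOV statement, which are simplifying assumptions. The function name and record layout are hypothetical, not PREPARE's internal format.

```python
from typing import Dict, List, Tuple

def elementary_records(end_time: int, failed: bool, initial: Dict[str, int],
                       changes: List[Tuple[int, str, int]]):
    """Cut one observation (0, end_time] at every change of any time-dependent covariate.

    initial: covariate -> value at time 0; changes: (time, covariate, new value).
    Returns (start, stop, covariate values, failed) tuples; only the last piece
    can carry the failure. Changes outside (0, end_time) are ignored here.
    """
    pieces = []
    start, state = 0, dict(initial)
    for cut in sorted({t for t, _, _ in changes if 0 < t < end_time}):
        pieces.append((start, cut, dict(state), False))
        for t, name, value in changes:      # apply every change made at this cut
            if t == cut:
                state[name] = value
        start = cut
    pieces.append((start, end_time, dict(state), failed))
    return pieces

print(elementary_records(20, True, {"treat": 0}, [(5, "treat", 10)]))
```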
60. is parameter file, the statistical model used in the data analysis is described, together with various options regarding storage, statistical tests and output. Most statements and options are valid for both the COX and WEIBULL programs. Therefore, the same parameter file can be used with both programs. However, a few statements apply to only one of these programs and should be commented out when the other program is run, in order to avoid warning or error messages. These will be indicated below. The grouped data model of Prentice and Gloeckler (1978) can also be fitted using the WEIBULL program, although it is not a Weibull model. For the user, parameter file 2 has exactly the same form as for the Weibull model, the use of the grouped data model being already specified in parameter file 1 through the DISCRETE_SCALE keyword. The latter will automatically generate an extra time-dependent covariate, time_unit, which will appear in the model (do not repeat it in the MODEL statement). Note that a few statements are not applicable to such a model (e.g. RHO_FIXED or INTEGRATE_OUT). 11.1 Syntax The following statements are used in this parameter file. Statements written in capitals are obligatory. Names between < > are omitted when they are not needed. The sequence of statements should be as shown. Comments may be included in the parameter file: the start of a comment is /*, the end is */, i.e. /* This text is a comment */
61. ividual: two figures are needed, the time interval and the time limit. For example, SURVIVAL EQUAL_SPACE 5 100 will compute the value of the survivor curve at times 5, 10, 15, ..., 95, 100. SPECIFIC time < s >: The estimate of the survivor curve will be computed at the specified times, indicated for each individual. Note: The statements SURVIVAL and RESIDUALS are mutually exclusive and should not appear in the same parameter file. 11.17 RESIDUALS Statement RESIDUALS file7 file8; The use of the RESIDUALS statement is so far limited to the COX program. It specifies that the generalized residuals (Cox and Snell, 1966) of the records in file7 should be computed and printed in the output file file8. If all generalized residuals are requested (general case), file7 is the unsorted recoded data file (direct output of PREPARE). If the option STORAGE ON DISK is used, the output will also include the martingale residuals and the deviance residuals (Klein and Moeschberger, 1997). Note: The statements RESIDUALS and SURVIVAL are mutually exclusive and should not appear in the same parameter file. 11.18 CONSTRAINTS Statement CONSTRAINTS option; For models not of full rank (all models with fixed discrete (class) covariates), constraints can be imposed upon the effects to be estimated to get a set of meaningful, easily interpreted, estimable effects. The program offers different options for setting those co
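For a single record, the three residuals mentioned above follow the standard formulas:

- Cox-Snell: r = H(t)
- martingale: m = delta - r
- deviance: sign(m) sqrt(-2 [m + delta log(delta - m)])

The Python sketch below computes them. It assumes the estimated cumulative hazard of the record at its end time is already available. For an individual split into elementary records, that hazard would be summed over the pieces, which is also an assumption here. The function name is hypothetical.

```python
import math

def residuals(cum_hazard: float, failed: bool):
    """Cox-Snell, martingale and deviance residuals for one record.

    cum_hazard is the estimated cumulative hazard of the record at its end
    time (it must be > 0 for a failed record).
    """
    delta = 1.0 if failed else 0.0
    cox_snell = cum_hazard
    martingale = delta - cox_snell
    inner = martingale + (delta * math.log(delta - martingale) if failed else 0.0)
    # max(0, .) guards against tiny positive rounding errors near cum_hazard = 1
    deviance = math.copysign(math.sqrt(max(0.0, -2.0 * inner)), martingale)
    return cox_snell, martingale, deviance

print(residuals(0.5, True), residuals(2.0, False))
```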
62. SMALL EXAMPLE COX MODEL
 FILES USED
   RECODED DATASET              small2s.dat
   NEW OLD CODES                small.nco
   OUTPUT                       small.rco
   RESIDUALS RECODED DATASET    small2.dat
   OUTPUT                       small.res
 NO STRATIFICATION
 FAILURE TIME READ IN COLUMN 1
 CENSORING CODE READ IN COLUMN 2
 IDENTIFICATION NUMBER READ IN COLUMN 3
 TOTAL NUMBER OF COVARIATES                          3
   CONTINUOUS COVARIATES INCLUDED IN THE ANALYSIS    0
   DISCRETE COVARIATES INCLUDED IN THE ANALYSIS      2
   COVARIATES READ BUT NOT INCLUDED IN THE ANALYSIS  1
   POWERS OF CONTINUOUS COVARIATES                   0
 MODEL COVARIATE CHARACTERISTICS
   1 sex    DISCRETE  TIME INDEPENDENT  READ IN COLUMN 6
   2 treat  DISCRETE  TIME DEPENDENT    READ IN COLUMNS 4 (BEFORE T) AND 5 (AFTER T)
 SIMPLE STATISTICS
 TOTAL NUMBER OF STRATA                  1
 TOTAL NUMBER OF ELEMENTARY RECORDS     17
 RIGHT CENSORED RECORDS                  2  (18.182 %)
   MINIMUM CENSORING TIME               11
   MAXIMUM CENSORING TIME               21
   AVERAGE CENSORING TIME           16.000
 UNCENSORED RECORDS                      9
   MINIMUM FAILURE TIME                  5
   MAXIMUM FAILURE TIME                 28
   AVERAGE FAILURE TIME             15.667
 EFFECT sex    MIN 1  MAX 2
 EFFECT treat  MIN 1  MAX 2
 NUMBER OF ELEMENTARY RECORDS KEPT      17
 CONSTRAINTS THE SOLUTION FOR THE FOLLOWING RECODED L
63. l rule
      PARAMETER NDIMSPAR NDIMAX 100
c     AVAIL: declared dimension of working vector IS used in fspak.f, the
c     sparse matrix package of MISZTAL and PEREZ-ENCISO; the required number
c     is a function of the number of nonzero elements of the HESSIAN matrix
c     note: this is the dimension of a VECTOR, not a MATRIX
      PARAMETER (AVAIL = 10000)
c     NZE: declared dimension of the vector HESSIAN, which includes all
c     nonzero elements of the matrix of second derivatives.
c     Often one can take NZE = AVAIL
c     note: this is the dimension of a VECTOR, not a MATRIX
      PARAMETER (NZE = 10000)
c     Parameters relating to the binary storage of the initial file
c     when the statement STORE_BINARY is used
      INTEGER NRECBLOC
c     nrecbloc: number of records per block when writing/reading
c     a binary file for fast I/O (statements STORE_BINARY and READ_BINARY)
      PARAMETER (NRECBLOC = 10000)
c     Parameters related to the parameter file of PREPARE
      LOGICAL SUPPLY_INPUT0
      CHARACTER*30 INPUT_FILE0
c     Normally the name of the parameter file will be requested by
c     program PREPARE. The following parameters are designed to
c     circumvent this request and start the program using a defined name
c     for the parameter file. SUPPLY_INPUT0 is a logical variable defining
c     the way the name of the parameter f
64. m data type. Left-truncated records are now (in version 3.0 and above) handled by both the WEIBULL and the COX programs. 8.8 TIMECOV Statement TIMECOV variable type values; The TIMECOV statement is needed when the hazard changes with time (calendar time or time measured from the beginning of each observation), independently of special stochastic events. • variable defines the name of the timecov variable. This must be a new name, not specified previously in the INPUT or TIME statements. • type may be either integer (I4) or a date (D6 or D8). • values is a list of values defining points in time when the hazard changes. Their maximum number is specified through the parameter MAXFIG in the parinclu file. If type is integer, the values indicate changes in the hazard rate relative to the beginning of each observation. If type is a date, the values determine points in calendar time when the risk changes for all individuals. The change on the time axis of each individual is then calculated as the difference between the respective date given and the date indicating the starting point of the observation in the TIME statement. Therefore, when using a TIMECOV statement with a date type, the TIME statement must hold the date of the beginning of the observation as second argument and the date of the end of the observation as third argument. The variable types of those arguments must be identical. Note: Only one TIMECOV statement can b
65. model. MGS_RULES relate to a sire model or a sire-maternal grandsire model accounting for male relationships. Note that SIRE_DAM_MODEL, which was used in earlier versions to relate sires and dams in a sire-dam model in which sires and dams had been recoded separately, is now obsolete and no longer accepted. Instead, simply use the JOINCODE option in PREPARE and MULTINORMAL USUAL_RULES sire for sire-dam models. When MULTINORMAL is used, information about the covariances between individuals has to be provided via a pedigree file (file4 of the FILES statement). parameter is the distribution parameter (the gamma parameter with the log-gamma distribution, and the variance with normal or multivariate normal distributions). It may be preset, in which case one value of the parameter has to be provided, or it may be estimated (see the ESTIMATE option below), in which case three values have to be given. ESTIMATE: when stated after the variable name, the distribution parameter is estimated as the mode of its marginal posterior density, which is approximated by Laplacian integration. The parameter value is replaced by three values: the first two values are the bounds of the interval to be searched, and the third value gives the final tolerance (accuracy). Using the ESTIMATE option, the parameter of only one random variable may be estimated at a time. If the estimated value is at one of the prespecified bounds, a warning message is printed and th
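The three ESTIMATE values (two bounds and a final tolerance) are exactly what a bracketed one-dimensional search needs. The manual does not say which search WEIBULL uses. The sketch below swaps in a golden-section search for the maximum of a log marginal posterior supplied as a Python function, and it warns when the result sits at a bound. The function name is hypothetical.

```python
import math
import warnings
from typing import Callable

def estimate_mode(log_density: Callable[[float], float],
                  lower: float, upper: float, tol: float) -> float:
    """Golden-section search for the maximum of log_density on [lower, upper]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0     # 1 / golden ratio, about 0.618
    a, b = lower, upper
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = log_density(c), log_density(d)
    while b - a > tol:
        if fc >= fd:                # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = log_density(c)
        else:                       # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = log_density(d)
    mode = (a + b) / 2.0
    if mode - lower <= tol or upper - mode <= tol:
        warnings.warn("estimated value is at a bound of the search interval")
    return mode

print(estimate_mode(lambda g: -(g - 3.0) ** 2, 0.0, 10.0, 1e-6))
```

As the text advises for ESTIMATE, a narrow interval and a moderate tolerance keep the number of (expensive) density evaluations small.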
66. nstraints. LARGEST: the program will set to zero the level of each discrete covariate with the largest number of uncensored failures. This is the default procedure for handling constraints. FIND: the constraints are found by the program. This guarantees the correct number of constraints to produce a full-rank Hessian matrix, so that the tests for the full model, as well as for individual effects, are based on the correct degrees of freedom. However, the constraints found do not guarantee an easy interpretation of the results. Linear dependencies in the Hessian matrix are found by performing a Cholesky decomposition. For very large problems, the storage of this matrix may be a limiting factor, as for the computation of standard errors (see the STD_ERROR statement). NONE: no constraints are specified, but implicit ones will appear if standard errors are to be computed (i.e. a generalized inverse of the Hessian matrix will be used if needed). IMPOSE < > variable rec_level variable rec_level: the constraints are supplied by the user. variable defines the effect for which a constraint is to be set (it must be a CLASS variable that has been stated in the MODEL statement); rec_level is the recoded value of the level of the effect that should be constrained. Note: You have to look into the file that holds the original and recoded levels of the CLASS effects (defined in the PREPARE parameter file) to find the recoded value of the le
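The default LARGEST rule can be sketched as follows: for each discrete covariate, count uncensored failures per recoded level and pick the level with the most. How WEIBULL or COX breaks ties is not stated, so the smallest recoded level is used here as an assumption. The function name and data layout are hypothetical.

```python
from collections import Counter
from typing import Dict, List

def largest_constraints(columns: Dict[str, List[int]], failed: List[bool]) -> Dict[str, int]:
    """Recoded level with the most uncensored failures, per discrete covariate."""
    chosen = {}
    for name, levels in columns.items():
        failures = Counter(level for level, f in zip(levels, failed) if f)
        if not failures:
            continue  # no uncensored failure: nothing to choose from
        # ties broken by the smallest recoded level (an assumption)
        chosen[name] = min(failures, key=lambda lvl: (-failures[lvl], lvl))
    return chosen

print(largest_constraints({"sex": [1, 2, 2, 1, 2], "treat": [1, 1, 2, 2, 2]},
                          [True, True, True, False, False]))
```

The solutions of the other levels are then read as deviations from the level set to zero.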
67. o need to restrict the output format to free format, and no need to sort the recoded file. The data preparation parameter file looks like this, for example: INPUT herd I4 year I4 IDREC animal I4 CLASS herd year OUTPUT herd year FORMOUT BLOCKED_UNFORMATTED, and parameter file 2 for WEIBULL: MODEL year RANDOM year LOGGAMMA 10.0 INTEGRATE_OUT JOINT_MODE year WITHIN herd. The output of these two approaches would be exactly the same. 11.11 INCLUDE_ONLY Statement INCLUDE_ONLY variable value value; This option was developed in version 2.0 of the Survival Kit almost exclusively for the estimation of the gamma parameter of a log-gamma time-dependent random variable. It was needed because the Weibull model did not accept left-truncated records. Now that left-truncated records are handled by WEIBULL as well, INCLUDE_ONLY can be regarded as obsolete. 11.12 TEST Statement TEST < type < s > of test >; Hypothesis testing is generally performed via likelihood ratio tests. The following types of tests may be requested by the TEST statement: GENERAL: test of the full model vs. the model with no covariate. This is the default option and will be used even without request. SEQUENTIAL: test of the effects included in the model in sequential order, i.e. depending on their sequence in the MODEL statement. LAST: likelihood ratio test comparing the full model with models excluding one
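A sort by herd alone is sufficient when herd-year levels are nested within herds. All records of one herd-year then arrive inside the block of their herd, so per-level totals can be accumulated in a small table that is emptied at each herd boundary. The Python sketch below only illustrates this idea; how WEIBULL implements WITHIN is not described here. The record layout (herd, year, failed, cumulative hazard) and the function names are hypothetical. The totals produced are the quantities d and sum H that enter the integrated likelihood of each level.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Record = Tuple[int, int, bool, float]   # (herd, year, failed, cumulative hazard)
Total = Tuple[Tuple[int, int], int, float]

def _flush(herd: Optional[int], totals: Dict[int, List[float]]) -> Iterator[Total]:
    for year in sorted(totals):
        failures, cum_hazard = totals[year]
        yield (herd, year), int(failures), cum_hazard

def herd_year_totals(records: Iterable[Record]) -> Iterator[Total]:
    """Failures and summed cumulative hazard per herd-year, in one pass.

    Records only need to be sorted by herd: the herd-year levels of the
    current herd are kept in a small dictionary that is flushed when the
    herd changes, so no sort by herd-year level is required.
    """
    current: Optional[int] = None
    totals: Dict[int, List[float]] = {}
    for herd, year, failed, cum_hazard in records:
        if herd != current:
            yield from _flush(current, totals)
            current, totals = herd, {}
        acc = totals.setdefault(year, [0, 0.0])
        acc[0] += int(failed)
        acc[1] += cum_hazard
    yield from _flush(current, totals)
```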
68. oach which avoids this sorting, using the WITHIN statement, has been developed for the case where the effect to integrate out has a hierarchical structure. This is best illustrated through the example of a random herd-year(-season) interaction effect. The hard-work approach is to have the herd and year effects included in the original data file. The data preparation parameter file for PREPARE then includes the following statements: INPUT herd I4 year I4 IDREC STORE_PREVIOUS_TIME CLASS herd year COMBINE hy herd year OUTPUT FORMOUT FREE_FORMAT. Each possible herd-year combination is recoded separately, and the recoded data file has to be sorted by increasing hy levels. The use of IDREC STORE_PREVIOUS_TIME is necessary because the effect hy to integrate out is a time-dependent covariate (see the description of the IDREC STORE_PREVIOUS_TIME statement and the sorting rules). The need to sort also forces the use of a free format for the output file. Then, in WEIBULL, the integration of a random herd-year-season effect implies the following statements (assuming a log-gamma distribution with initial parameter 10.0): MODEL hy RANDOM hy LOGGAMMA 10.0 INTEGRATE_OUT JOINT_MODE hy. The alternative, much easier way simply requires that the initial data set be sorted by herd, and by herd only. There is no need to recode each herd-year interaction separately, no need to specify STORE_PREVIOUS_TIME, n
69. olutions at the end of each run (STORE_SOLUTIONS statement) or use backup files (BACKUP parameter in the parinclu file). Start the next run from the old solutions (READ_OLD_SOLUTIONS statement). If only a few effects are added to a well-known model, don't use the time-consuming TEST SEQUENTIAL or TEST LAST statements, which specify that all the effects in the model should be tested starting from scratch. Use the TEST < SEQUENTIAL > EFFECT statement instead. Check that the convergence criterion is not too strict by looking at the evolution of the minus log-likelihood, the function that is minimized to get the parameter estimates (CONVERGENCE statement). Be careful, however: an incorrect convergence criterion may lead to incorrect results. With WEIBULL, the use of the RHO_FIXED statement should lead to a faster convergence rate. 4. When estimating the hyperparameter(s) of the distribution(s) of random effects: • When a log-gamma prior is used with the INTEGRATE_OUT statement, the JOINT_MODE option (which computes the gamma parameter jointly with the other effects) should be preferred (faster convergence). • Don't choose huge initial intervals or too strict final tolerances with the ESTIMATE statement. • Use the MOMENTS statement only if necessary, or at least only after a run to find the mode of the marginal posterior distribution of the hyperparameter. Bibliography Cox D
70. ot take into account any of the effects stated in the MODEL statement. The KAPLAN statement is permitted only with the COX program; it is ignored otherwise. 11.16 SURVIVAL Statement SURVIVAL file7 file8 option < s >; The SURVIVAL statement may be used to get information about the survivor function of individuals with specific covariate values. It is very useful for relating the estimated regression coefficients to a more conventional scale, like the median survival time or the probability of survival to a certain age. file7 is the name of the recoded file produced by program PREPARE on request in the FUTURE statement of PREPARE (called file6 there); file8 holds the results requested by the SURVIVAL statement. The following options may be used: QUANTILES value < s >: With this option, the times at which specific values (quantiles) of the survivor curve are reached are calculated and printed for each individual specified in file7. The values must lie between 0 and 100 and will be divided by 100 to produce the value of the survivor function, e.g. with a value of 25, S(t) = 0.25. The median survival time can therefore be requested by the option QUANTILE 50. ALL_TIMES: The estimate of the survivor curve will be computed at all times (failure time or change in time-dependent covariates) indicated for each individual in file7. EQUAL_SPACE interval limit: The estimate of the survivor curve will be computed at equally spaced times for each ind
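For a Weibull model, the QUANTILES and EQUAL_SPACE values have closed forms. The Python sketch below assumes the proportional hazards parameterisation h(t) = lambda rho (lambda t)^(rho - 1) exp(eta), with eta the linear predictor of the individual and no time-dependent covariates. This parameterisation is an assumption stated here, not taken from this section. The COX program would instead use its estimated baseline. The function names are hypothetical.

```python
import math
from typing import List

def weibull_survival(t: float, lam: float, rho: float, eta: float) -> float:
    """S(t) = exp(-(lam * t) ** rho * exp(eta))."""
    return math.exp(-((lam * t) ** rho) * math.exp(eta))

def weibull_quantile(value: float, lam: float, rho: float, eta: float) -> float:
    """Time t with S(t) = value / 100, value strictly between 0 and 100."""
    q = value / 100.0
    return (-math.log(q) * math.exp(-eta)) ** (1.0 / rho) / lam

def equal_space(interval: float, limit: float) -> List[float]:
    """Times interval, 2 * interval, ..., up to limit (as with EQUAL_SPACE)."""
    n = int(limit // interval)
    return [interval * k for k in range(1, n + 1)]

median = weibull_quantile(50, lam=0.05, rho=1.5, eta=0.3)   # like QUANTILE 50
print(median, weibull_survival(median, 0.05, 1.5, 0.3))
```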
71. pear in the model (do not repeat it in the MODEL statement). The purpose of this covariate is the estimation of the baseline hazard function at the same time as the regression parameters. 11.7 COEFFICIENT Statement COEFFICIENT variable value < variable value >; The COEFFICIENT statement specifies that the covariate variable will always be multiplied by the coefficient value. This is especially useful in conjunction with the JOINCODE statement in PREPARE. For example, the typical sire-maternal grandsire model used in animal breeding is obtained by specifying JOINCODE sire mgs; in the data preparation parameter file for PREPARE and COEFFICIENT mgs 0.5;. Also, the best way to define a sire-dam model is JOINCODE sire dam; in the data preparation parameter file for PREPARE and COEFFICIENT sire 0.5 dam 0.5;. 11.8 < ONLY_ > STATISTICS Statement ONLY_STATISTICS; STATISTICS; These statements can be used only with the WEIBULL program; they are ignored otherwise. They request the computation and printing of a number of elementary statistics related to each level of the class variables specified in the MODEL statement: number of observations, number of observed failures, age at failure, average value of continuous covariates for all observations and for the uncensored ones, etc. If STATISTICS is preceded by ONLY, the program will stop as soon as these values have been printed. 11.9 RANDOM S
72. programs. The variables must be of type I4 or R4 when defined in the INPUT statement. All variables newly defined in subsequent statements (TIME, CONVDT, COMBINE) are also allowed. No date variables (D6 or D8) may be included in the output file. 8.15 FUTURE Statement FUTURE file5 file6; In the FUTURE statement, you provide information about a set of records for which you want a printout of predicted values of the survivor curve (quantile values or functional values of the survival function at given times). • file5 is the name of the original file holding the future records (it must exist in your directory). The structure of the file must be exactly identical to that of file1. • file6 is the name of the recoded file produced by the program, identical in structure to the file2 data file. It will be used by COX or WEIBULL to produce values related to the corresponding survival functions. Note: if you have a FUTURE statement, you also need an IDREC statement, as elementary records have to be sorted by identity of the original records. 8.16 PEDIGREE Statement In the PEDIGREE statement, you give information about a pedigree file for including relationships between individuals in the model. Depending on the model to be used later, it can have two alternative forms: PEDIGREE file7 file8 variable; for sire or animal models, or for sire-dam models if JOINCODE sire dam is used; PEDIGREE file7 file8 variable variable2; for
73. ranging from one change in any of the time-dependent covariates to the next. 8 Syntax The following statements may be used with PREPARE. Statements written in capitals are obligatory. Names between < > are omitted when they are not needed. The sequence of statements should be as shown. Comments may be included in the parameter file: the start of a comment is /*, the end is */, i.e. /* This text is a comment */. The text between the two delimiters may be on more than one line. No more than 80 characters should appear on one line, but the same statement can cover several lines; each statement ends with a semicolon.
FILES file1 file2 file3 file4;
INPUT variable type variable type variable type;
TIME variable variable variable;
CENSCODE variable value;
discrete_scale;
idrec variable;
truncate variable;
timecov variable type values;
convdt variable variable variable variable variable variable;
class variables;
joincode variable variable;
timedep variable typediff variable typediff;
combine variable variable variable variable variable variable variable;
OUTPUT variables;
future file5 file6;
pedigree file7 file8 variable variable;
formout type;
8.1 FILES Statement FILES file1 file2 file3 file4; The FILES statement is used to define the names of the files needed in the process of data preparation for the actual survival analysis.
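The syntax rules above are simple enough to pre-check a parameter file outside the Survival Kit. The rules are: /* ... */ comments, statements ending with a semicolon and possibly spanning lines, keywords in any case, and at most 80 characters per line. The Python sketch below is a hypothetical checker written from those rules, not the Survival Kit's own parser. The 80-column limit mirrors LRECL = 80 in the parinclu file.

```python
import re
from typing import List

def parse_parameter_file(text: str, lrecl: int = 80) -> List[List[str]]:
    """Split a parameter file into statements, each a list of words.

    Assumes /* ... */ comments (possibly over several lines), statements
    ending with ';' and possibly spanning lines, and keywords in any case.
    """
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > lrecl:
            raise ValueError(f"line {number} is longer than {lrecl} characters")
    text = re.sub(r"/\*.*?\*/", " ", text, flags=re.DOTALL)  # drop comments
    statements = []
    for chunk in text.split(";"):
        words = chunk.split()
        if words:
            words[0] = words[0].upper()  # keyword, e.g. 'class' -> 'CLASS'
            statements.append(words)
    return statements

example = """/* small example */
FILES small.dat small2.dat
      small.nco small.rco;
class sex treat;  /* two effects */
"""
print(parse_parameter_file(example))
```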
74. riable: true if the name of the detailed output (with information
c     about iterations) is supplied at running time
      PARAMETER (SUPPLY_OUTPUT = .true.)
c     OUTPUT_FILE: name of output file, used only if supply_output = false
      PARAMETER OUTPUT_FILE
c     TEMPDIR: name of the directory used for temporary files
c     (' ' if current directory)
      PARAMETER TEMPDIR
c     BACKUP: name of the file used to backup solutions. Solutions will be
c     stored after each iteration and may be used to restart the iteration
c     procedure in case the machine breaks down during the maximisation
c     (you then have to use the READ_OLD_SOLUTIONS statement of COX or
c     WEIBULL). Only useful for very large applications
c     (backup = ' ' if no backup file should be produced)
      PARAMETER (BACKUP = 'backup')
c     DETAIL_IT: switch on (if true) or off (if false) the printing of
c     detailed information for each effect at each iteration of the
c     maximization subroutine (average, mean or max absolute change
c     between iterations)
      PARAMETER (DETAIL_IT = .false.)
c     PARAMETERS THAT MAY _NOT_ BE MODIFIED BY THE USER
c     Don't you dare look beyond this point
75. riable may be chosen as the strata variable. The number of strata is not restricted.
In addition to the estimation of fixed and random effects and their distribution parameters, the Survival Kit offers options for calculating asymptotic standard errors of effects (only for smaller models, where the matrix of second derivatives can actually be set up), a sequence of likelihood ratio tests, and different ways of setting constraints to deal with dependencies in the model. As a special feature, different values of the survivor function may be estimated for individuals with a preset covariate structure. This way it is possible to calculate, for example, the estimated median survival time or the survival probability to a specified age for combinations of covariate values of special interest. Generalized, martingale and deviance residuals (Cox and Snell 1966; Klein and Moeschberger 1997) can also be computed.
3 The programs
The Survival Kit mainly consists of a set of three Fortran programs called prepare.f, cox.f and weibull.f (subsequently denoted PREPARE, COX and WEIBULL) and a file parinclu holding parameter definitions, which is included in each of the programs via a Fortran INCLUDE statement. The package works stand-alone, i.e. it does not rely on any subroutines from mathematical subroutine libraries. The optimisation routines used are partly taken from public domain subroutine libraries (Liu and Nocedal 1989; Perez-Enciso et al. 1994) and are
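To show how a median survival time or a survival probability to a given age follows from a fitted Weibull model, here is a minimal Python sketch. It assumes the Weibull baseline $\lambda_0(t) = \lambda\rho(\lambda t)^{\rho-1}$, whose cumulative hazard is $(\lambda t)^\rho$, so that $S(t) = \exp\{-(\lambda t)^\rho e^{\eta}\}$ with $\eta = \mathbf{x}'\mathbf{b}$ (plus $\mathbf{z}'\mathbf{u}$) for the covariate combination of interest. The parameter values are hypothetical, and this is not the SURVIVAL statement's implementation.

```python
import math

def weibull_survival(t, lam, rho, eta):
    """S(t) = exp(-(lam * t) ** rho * exp(eta)) for a Weibull baseline
    lambda_0(t) = lam * rho * (lam * t) ** (rho - 1) and linear predictor eta."""
    return math.exp(-((lam * t) ** rho) * math.exp(eta))

def weibull_quantile(p, lam, rho, eta):
    """Time t at which S(t) = p (0 < p < 1); p = 0.5 gives the median."""
    return (-math.log(p) * math.exp(-eta)) ** (1.0 / rho) / lam


lam, rho, eta = 0.05, 1.4, 0.3          # hypothetical estimates
print(f"median survival time: {weibull_quantile(0.5, lam, rho, eta):.2f}")
print(f"survival probability to t = 10: {weibull_survival(10.0, lam, rho, eta):.3f}")
```

The quantile is obtained by solving $S(t) = p$ in closed form, which is possible because the Weibull cumulative hazard is a power of $t$.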
76. riables
treat 2 4 5
sex 2 6 6
s_by_t 4 7 8
FREE_FORMAT
For example, there are 2 levels of treatment, stored in columns 4 (before any change) and 5 (after the change). The sex effect is time-independent, so only one column (6) is needed.
File small.nco
This file is also an output file from the PREPARE program. It gives the correspondence between old and new codes:
treat 2 1 0 0 0 11 1
treat 2 1 10 0 0 6 2
sex 2 1 1 0 0 6 1
sex 2 1 2 0 0 2
s_by_t 4 2 1 0 0 6 1
s_by_t 4 2 1 10 0 3 2
s_by_t 4 2 2 0 0 3
s_by_t 4 2 2 10 0 3 4
For each level of a discrete covariate, the name of the covariate (e.g. s_by_t for the last line) is followed by the total number of levels of this effect (4), the number of columns involved in its description (2, because it is a two-term interaction) and the corresponding original codes (sex 2 x treatment 10). Up to 3 effects can be combined into an interaction. Column 7 indicates the number of elementary records found with this particular level of the covariate (3), and column 8 (the last one) gives the new code (4).
Recoded data set for COX: small2s.dat
This is the recoded data set small2.dat after sorting by decreasing time (column 1) and increasing censoring code (column 2); no stratification is assumed.
parame
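The correspondence in small.nco can be mimicked with a short Python sketch. It assumes, consistently with the example above, that new codes 1, 2, ... are assigned in increasing order of the original code (of the code tuple for an interaction) and that unused original-code columns are filled with 0. The helper names and the input records are hypothetical.

```python
from collections import Counter

def recode(values):
    """Return (original, count, new_code) triples; new codes 1..n are assigned
    in increasing order of the original code (tuples for interactions)."""
    counts = Counter(values)
    return [(orig, counts[orig], new) for new, orig in enumerate(sorted(counts), start=1)]

def nco_lines(name, table):
    """Format triples like the lines of the .nco file: name, number of levels,
    number of columns, three original-code columns, count, new code."""
    lines = []
    for orig, count, new in table:
        codes = orig if isinstance(orig, tuple) else (orig,)
        padded = list(codes) + [0] * (3 - len(codes))   # up to 3 combined effects
        fields = [name, len(table), len(codes), *padded, count, new]
        lines.append(" ".join(str(f) for f in fields))
    return lines


# Hypothetical elementary records: (sex, treatment) for each record.
records = [(1, 0), (1, 0), (1, 10), (2, 0), (2, 10), (2, 10)]
for line in nco_lines("s_by_t", recode(records)):
    print(line)
```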
77. s corresponding to each time interval during which all time-dependent covariates remain constant, than the actual number of records in the initial data set. In order to limit the size of the recoded data set:
1. in the OUTPUT statement, specify only the covariates that you will need later;
2. because at this stage the PREPARE program requires all main effects to be included in the OUTPUT statement when an interaction is created with the COMBINE statement, it may be advantageous to recode the interactions beforehand if the main effects are not to be used later. In contrast, if the interaction effect is going to be treated as a random log-gamma effect that will be integrated out in a Weibull model, the recoding of the interaction in PREPARE can be avoided by using INTEGRATE_OUT WITHIN later on;
3. for extremely large applications, the program PREPAREC with the statement FORMOUT COMPRESSED should be preferred, as the recoded output file will then be written in compressed format using C subroutines. In some applications the compressed file may be 20 to 30 times smaller, but computing times are typically increased by a factor of 2.5 to 3.
CHAPTER 5 ANALYSIS OF VERY LARGE DATA SETS
Also, unless an individual random effect is to be fitted later on, the ID number should not be recoded, i.e. it should not appear in the CLASS statement, only in the IDREC and OUTPUT statements.
13 Before running the COX or WEIBULL progr
78. s are printed, as well as the constraints used: here the COX program chose to set to zero level 1 of sex and level 10 of treatment, because they are the levels with the largest number of uncensored observations.
DATA
1 2810 20
2 2100 20
3 2110 10
4 1910 10
5 1710 10
6 15 10 12
7 15 10 12
8 1510 10
9 1510 20
NUMBER OF ELEMENTARY RECORDS KEPT 17
CONSTRAINTS
THE SOLUTION FOR THE FOLLOWING RECODED LEVELS IS SET TO LEVEL WITH LARGEST NUMBER OF UNCENSORED FAILURES FOR EACH EFFECT
WARNING THE VALIDITY OF THESE CONSTRAINTS IS NOT CHECKED IN CASE THEY ARE MORE DEPENDENCIES THE DEGREES OF FREEDOM IN THE TEST(S) BELOW ARE INCORRECT
EFFECT sex LEVEL 1
EFFECT treat LEVEL 10
Then the limited-storage quasi-Newton minimization (Liu and Nocedal 1989) of FUNCT (minus the likelihood function) starts for each sub-model. GNORM is the norm of the vector of first derivatives of FUNCT. CONV CRITERION is equal to GNORM / max(1, FUNCT). Let CC be the requested convergence criterion. The program stops the minimization process in the following situations:
• CONV CRITERION is less than 0.01 CC;
• CONV CRITERION is less than CC and the total number of FUNCT evaluations is at least equal to the number of corrections (15, see below);
• after many attempts (usually 20), the program fails to find a smaller value of FUNCT. This is very often due to too strict a conver
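The three stopping situations can be combined into a single test. This Python sketch follows the description above literally (criterion = GNORM / max(1, FUNCT)); the function name, its arguments and the defaults of 15 corrections and 20 failed attempts are illustrative assumptions, not the Survival Kit's code.

```python
def should_stop(gnorm, funct, cc, n_evaluations, n_corrections=15,
                failed_attempts=0, max_failed=20):
    """Return (stop, reason) following the stopping rules listed above.

    gnorm           : norm of the vector of first derivatives of FUNCT
    funct           : current value of FUNCT
    cc              : requested convergence criterion
    n_evaluations   : total number of FUNCT evaluations so far
    failed_attempts : attempts that failed to find a smaller FUNCT
    """
    criterion = gnorm / max(1.0, funct)
    if criterion < 0.01 * cc:
        return True, "criterion below 0.01 * CC"
    if criterion < cc and n_evaluations >= n_corrections:
        return True, "criterion below CC after enough evaluations"
    if failed_attempts >= max_failed:
        return True, "no smaller FUNCT found; the criterion may be too strict"
    return False, "continue"


stop, reason = should_stop(gnorm=5e-7, funct=100.0, cc=1e-8, n_evaluations=20)
print(stop, reason)
```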
79. sed in the parameter files mimic the SAS command language.
1 Class of models supported
The models supported by the Survival Kit belong to the following class of univariate proportional hazards models with a single continuous or discrete response time:
$$\lambda(t; \mathbf{x}(t), \mathbf{z}(t)) = \lambda_0(t)\,\exp\{\mathbf{x}'(t)\,\mathbf{b} + \mathbf{z}'(t)\,\mathbf{u}\} \qquad (1)$$
where $\lambda(t; \mathbf{x}(t), \mathbf{z}(t))$ is the hazard function of an individual depending on time $t$, $\lambda_0(t)$ is the baseline hazard function, $\mathbf{x}(t)$ is a vector of (possibly time-dependent) fixed covariates with corresponding parameter vector $\mathbf{b}$, and $\mathbf{z}(t)$ is a vector of random (possibly time-dependent) covariates with corresponding parameter vector $\mathbf{u}$. For a detailed statistical presentation of the methodology of survival analysis, see Cox (1972), Prentice and Gloeckler (1978), Cox and Oakes (1984), Kalbfleisch and Prentice (1980) and Klein and Moeschberger (1997).
2 Main features
Baseline hazard function: the (possibly stratified) baseline hazard function $\lambda_0(t)$ may either be unspecified or follow a Weibull hazard distribution, $\lambda_0(t) = \lambda\rho(\lambda t)^{\rho-1}$. In the first case, if the failure time variable $t$ is continuous, (1) defines a Cox model. Estimates of $\mathbf{b}$ and $\mathbf{u}$ are obtained using what is known as a partial likelihood, a part of the full likelihood in which the baseline hazard function does not appear. When the failure time variable is discrete with few categories and many observations with the same failure time (ties), Cox's partial likelihood approach is no longer
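To make the idea of a partial likelihood concrete, here is a minimal Python sketch of the Cox partial log-likelihood for time-independent covariates and no tied failure times; the baseline hazard indeed never appears. It deliberately ignores ties, strata, time-dependent covariates and random effects, all of which the Survival Kit handles, and the function name is hypothetical.

```python
import math

def cox_partial_loglik(times, events, x, beta):
    """Cox partial log-likelihood, assuming no tied failure times.

    times  : observed failure or censoring times
    events : 1 if the failure was observed, 0 if censored
    x      : one list of covariate values per individual
    beta   : regression coefficients b
    """
    eta = [sum(b * v for b, v in zip(beta, row)) for row in x]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i:
            # Risk set: individuals still under observation just before t_i.
            denom = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
            loglik += eta[i] - math.log(denom)
    return loglik


print(cox_partial_loglik([5, 8, 3, 12], [1, 0, 1, 1], [[1], [0], [1], [0]], [0.4]))
```

Each observed failure contributes the log of its own relative risk over the summed relative risks of its risk set; censored individuals only enter through the risk sets.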
80. sion library, version 1.0.4 (J. Gailly and M. Adler; web page http://quest.jpl.nasa.gov/zlib). Our experience is that the resulting programs are about 3 times slower, but compression is extremely efficient, since the compressed files may take up to 20 or 30 times less disk space. A similar version for the Cox model or for the grouped data model of Prentice and Gloeckler (1978) was not implemented, as these are not really suited for huge applications.
4 Hardware requirements
The programs have been written in Fortran 77 and have been tested on PCs using Lahey's Fortran compiler and on several UNIX platforms. No system routines are used except a timing subroutine (second) for UNIX platforms, which can be replaced or switched off without any consequence. The size of the program may be varied through changes in parameters affecting the maximum number of records and the maximum number of levels of effects to be estimated. These parameters are defined in a single file called parinclu and included in each program using a Fortran INCLUDE statement. To make the changes effective (and only then), it is necessary to recompile the programs. For PCs, compilers making use of extended memory are favorable. To be used on most PCs, it is important to change the extension .f into .for for all programs (e.g. prepare.for, cox.for, weibull.for). At least with Lahey's Fortran compiler, one should also rename the parinclu file into parinclu.for and
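The trade-off described above (much smaller files, slower reading) can be tried out with any zlib binding. This Python sketch uses the standard zlib module on hypothetical fixed-format records; it is not the C code called by PREPAREC, and the ratio it prints depends entirely on the data, so it says nothing about the 20 to 30 times quoted above.

```python
import zlib

def compression_ratio(lines, level=9):
    """Compress text records with zlib and return raw size / compressed size."""
    raw = "\n".join(lines).encode("ascii")
    packed = zlib.compress(raw, level)
    if zlib.decompress(packed) != raw:          # lossless round trip
        raise RuntimeError("decompressed data differ from the original")
    return len(raw) / len(packed)


# Hypothetical recoded records in fixed format (I8 fields): time, code, id, sex, treat.
records = [f"{10 + i % 50:8d}{i % 2:8d}{i + 1:8d}{1 + i % 2:8d}{0 if i % 3 else 10:8d}"
           for i in range(10_000)]
print(f"compression ratio: {compression_ratio(records):.1f}")
```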
81. sire-dam models.
• file7 is the name of the original pedigree file (must exist);
• file8 is the name of the recoded pedigree file produced by the program;
• variable1 is the name of the random variable (animal or sire) in the data file (must be stated in INPUT and OUTPUT);
• variable1 and variable2 (for sire-dam models, unless JOINCODE is used) are the names of the sire and dam variables in the data file (must be stated in INPUT and OUTPUT).
FORM OF THE PEDIGREE FILE
The pedigree file ALWAYS has to consist of four columns:
1. identity of animal;
2. sex code (in sire-dam models: 1 = male, 2 = female, -1 for a group of unknown parents; may be used for other purposes in other situations);
3. identity of sire;
4. identity of dam or maternal grandsire.
Unknown parents are defined as 0, or with negative values corresponding to groups of unknown parents. Important: there should be one line defined for each animal, each parent and each group of unknown parents.
8.17 FORMOUT Statement
FORMOUT type;
FORMOUT FIXED_FORMAT In Fn.d;
With the FORMOUT statement, the format of the recoded output files is specified (these are file2 of the FILES statement and file6 of the FUTURE statement). If a fixed format is chosen, the type FIXED_FORMAT must be followed by the Fortran description of formats for integers and reals, In and Fn.d, e.g. I8 and F12.5 for integers with 8 digits and re
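The rules for the pedigree file lend themselves to a simple consistency check before running PREPARE. This Python sketch is a hypothetical helper, not part of the Survival Kit; it assumes that a group of unknown parents gets its own line with, in column 1, the same negative identifier that is used for it in the parent columns. The sex code is not checked.

```python
def check_pedigree(lines):
    """List problems in a four-column pedigree:
    animal, sex code, sire, dam (or maternal grandsire)."""
    problems, rows = [], []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 4:
            problems.append(f"line {number}: expected 4 columns, found {len(fields)}")
            continue
        rows.append([int(f) for f in fields])
    defined = {row[0] for row in rows}
    for animal, _sex, sire, dam in rows:
        for parent in (sire, dam):
            # 0 = unknown parent; a negative value = group of unknown parents,
            # which also needs a line of its own (assumed identifier convention).
            if parent != 0 and parent not in defined:
                problems.append(f"animal {animal}: parent {parent} has no line of its own")
    return problems


pedigree = ["-1 1 0 0", "1 1 0 -1", "2 2 0 0", "3 1 1 2", "4 2 1 5"]
print(check_pedigree(pedigree))
```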
82. t not unrealistically large: e.g. NRECBLOC = 100000 when the actual number of records is 100001 will lead to the writing and reading of two blocks of size 100000.
• When the STORAGE ON_DISK statement is used, a blocked binary temporary file is created with NRECMAX records per block. Again, NRECMAX should be large, but not too large. If space is limited in the directory where the programs are run, the name of a work directory should be specified in the parinclu file (parameter TEMPDIR).
14 COX and WEIBULL programs
1. If the assumption of a Weibull baseline hazard function is reasonable, it is highly recommended to use a Weibull model instead of a Cox model, in particular when standard errors or distribution parameters of random effects are requested.
2. In order to decrease the size of the vectors of parameters:
• use stratification if possible (STRATA statement);
• with WEIBULL, when a random effect whose estimates are not really needed is included in the model, a log-gamma prior should be used if possible and the effect should be integrated out (INTEGRATE_OUT statement). This is analogous to the absorption of the equations corresponding to the specified effect in a linear system of equations. Note that, as with an absorption, the resulting Hessian matrix is no longer very sparse; hence the DENSE_HESSIAN statement should also be used.
3. To limit the number of iterations or function evaluations required:
• Store s
83. tatement
RANDOM variable <ESTIMATE <MOMENTS>> distribution <rules> parameter<s> <repeat previous sequence for next variable<s>>;
The RANDOM statement gives information on variables to be treated as random. The distribution parameters of the random covariates are either assumed to be known or they may be estimated (optional parameter ESTIMATE). distribution allows the user to specify the distribution that the random variable is assumed to follow. The user may choose from 3 alternatives:
• LOGGAMMA: the levels of the random effect follow a log-gamma distribution. This is identical to assuming a gamma frailty term. The frailty term, say $w$, is defined as a multiplicative term to the usual hazard function with fixed effects only, i.e. $w = \exp(\mathbf{z}'(t)\,\mathbf{u})$.
• NORMAL: the levels of the random effect are independently normally distributed. This assumption is not so common in survival analysis, but for rather large gamma parameters ($\gamma > 10$) the log-gamma and normal distributions are similar, and the normal distribution is needed to make the step to the multivariate normal distribution described next.
• MULTINORMAL rules: the levels of the random effect follow a multivariate normal distribution, with covariances between levels being induced by genetic relationships.
CHAPTER 2 PROGRAMS COX AND WEIBULL
Two types of relationships are allowed and may be stated via the rules parameter:
USUAL RULES are the relationships under an animal
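The remark that the log-gamma and normal distributions become similar for large gamma parameters can be checked numerically. This Python sketch assumes the common parametrization in which the frailty $w = \exp(u)$ is gamma distributed with shape $\gamma$ and scale $1/\gamma$ (mean 1); that parametrization is an assumption, not taken from the Survival Kit code. The sample skewness of $u$ should be clearly negative for $\gamma = 1$ and much closer to 0, the normal value, for $\gamma = 20$.

```python
import math
import random

def loggamma_effects(gamma, n, rng):
    """Draw n random effects u = log(w), assuming the frailty w follows a
    gamma distribution with shape gamma and scale 1/gamma (mean 1)."""
    return [math.log(rng.gammavariate(gamma, 1.0 / gamma)) for _ in range(n)]

def skewness(values):
    """Sample skewness m3 / m2 ** 1.5 (0 for a symmetric distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(12345)
results = {}
for gamma in (1.0, 20.0):
    results[gamma] = skewness(loggamma_effects(gamma, 50_000, rng))
    print(f"gamma = {gamma:4.1f}: skewness of u = {results[gamma]:+.3f}")
```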
84. ter file 2 for COX
/* FILES */
FILES small2s.dat small.nco small.rco;
/* TITLE */
TITLE SMALL EXAMPLE COX MODEL;
/* MODEL */
MODEL sex treat;
/* TEST */
TEST SEQUENTIAL LAST;
/* STANDARD ERROR */
STD_ERROR;
/* OTHER COMPUTATIONS */
BASELINE;
RESIDUAL small2.dat small.res;
The input data file is the sorted recoded data file small2s.dat. The model simply includes the sex and treatment effects (no interaction here). The results will be stored in small.rco. Likelihood ratio tests will be performed comparing the models with no covariates, with sex only, and with sex and treatment (TEST SEQUENTIAL), and comparing the full model with the restricted models excluding sex (treatment only) and excluding treatment (sex only) (TEST LAST). After the regression coefficients have been computed using Cox's partial likelihood, the baseline will be computed (BASELINE), and generalized residuals will be calculated for all records in the unsorted file small2.dat and stored in the file small.res.
Log file from COX
This file appears on the screen (terminal) if the answer NONE is given when a name for the detailed output file is requested. The file starts with the names of the 2 input parameter files and other file names, followed by a brief list of characteristics of the data set and of the model, and some elementary statistics.
parameter file small.paa
parameter file small.cox
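For reference, the three residuals can be computed from the estimated cumulative hazard of a record. This Python sketch uses the textbook definitions (Klein and Moeschberger 1997): generalized (Cox-Snell) residual $r$ = estimated cumulative hazard at the record's time, martingale residual $m = \delta - r$, and deviance residual $\operatorname{sign}(m)\sqrt{-2[m + \delta\log(\delta - m)]}$. The cumulative hazard values are hypothetical inputs; how COX handles elementary records is not reproduced.

```python
import math

def residuals(cum_hazard, delta):
    """Generalized (Cox-Snell), martingale and deviance residuals of one
    record, given its estimated cumulative hazard and its event indicator
    delta (1 = failure observed, 0 = censored)."""
    r = cum_hazard                  # generalized residual
    m = delta - r                   # martingale residual
    inner = m + (delta * math.log(delta - m) if delta else 0.0)
    # inner <= 0 mathematically; max() guards against rounding noise.
    d = math.copysign(math.sqrt(max(0.0, -2.0 * inner)), m)
    return r, m, d


for cum_hazard, delta in [(0.5, 1), (2.0, 0)]:
    print(residuals(cum_hazard, delta))
```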
85. tion
position is a figure which labels the position of the censoring code on the recoded data file (file2 of the FILES statement of PREPARE). In this file, censored records are always coded 0 and uncensored records are coded 1, but other codes are also used: 2 for the first elementary record referring to a truncated observation, and -1 for an elementary record indicating a change in a time-dependent covariate.
Note: the position of the censoring code is always 2.
10.4 ID Statement and PREVIOUS TIME Statement
The two statements are mutually exclusive:
ID position;
PREVIOUS TIME position;
position labels the position of the record identification code (ID statement) or of the beginning of the time period for an elementary record (PREVIOUS TIME statement) in the special case described in the IDREC STORE PREVIOUS TIME statement of program PREPARE (required when a time-dependent covariate is integrated out in the WEIBULL program, and only then).
Note: the position of the identification variable is always 3.
10.5 COVARIATE Statement
COVARIATE variable nlevels pos1 pos2 variable nlevels pos1 pos2;
The COVARIATE statement gives information about the covariates that might be used in the model of analysis. These are all variables in the OUTPUT statement of the PREPARE parameter file except for the TIME and CODE variables described above. The covariates are listed one per line, first continuous and then class covariates. variable is the name of the covariable; nle
86. variable indicate death (failure).
8.5 DISCRETE_SCALE Statement
DISCRETE_SCALE;
The DISCRETE_SCALE statement specifies that the failure time variable (variable in the TIME statement) is expressed in discrete units with few (say 20) distinct values. This is in preparation for a later analysis using a grouped data model (Prentice and Gloeckler 1978) with the WEIBULL program, although it is not a Weibull model. The immediate consequence of the DISCRETE_SCALE statement is the implicit definition and calculation of a time-dependent covariate called time_unit, taking distinct values at time 0, 1, 2, 3, ..., failure time - 1. An elementary record is created for each value of this time_unit variable.
8.6 IDREC Statement
The statement can have two mutually exclusive alternative forms:
IDREC variable;
IDREC STORE PREVIOUS TIME;
• In the first form, a variable is normally defined that is unique to each record. This variable may be needed for sorting the elementary records when using a Weibull model and is therefore obligatory when using WEIBULL as the analysis program. The variable must be of type I4. Note: the same variable may also appear in the CLASS statement (e.g. animal identification for an animal model). In this case the variable will appear twice in the recoded file2 produced by this program: first unrecoded, as the 3rd variable on the record, and second recoded, as one of the class variables.
• The
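The effect of DISCRETE_SCALE on the number of records can be illustrated with a Python sketch that expands one record into one elementary record per value of time_unit: a record with discrete time T becomes T elementary records. It assumes unit-length intervals starting at time 0, with the failure indicator carried only by the last interval; the field names are hypothetical, and the actual layout and codes of the recoded file written by PREPARE differ.

```python
def expand_discrete(end_time, failed):
    """Split one record with discrete failure (or censoring) time end_time
    into elementary records, one per value of the time_unit covariate.
    Assumes unit intervals (k, k + 1] for k = 0, ..., end_time - 1."""
    records = []
    for unit in range(end_time):
        last = unit == end_time - 1
        records.append({
            "time_unit": unit,              # covariate value on this interval
            "start": unit,
            "end": unit + 1,
            "event": 1 if (failed and last) else 0,
        })
    return records


for rec in expand_discrete(4, failed=True):
    print(rec)
```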
87. vel you actually want to put the constraint on. To do so, look for the name of the effect of interest. Columns 4 (when there is no interaction), 4 and 5 (when two effects were combined into an interaction) or 4, 5 and 6 (when three effects were combined) display the original code. The last column defines the new code. By use of the option check, the program will check whether the constraints specified by the user make sense.
11.19 CONVERGENCE_CRITERION Statement
CONVERGENCE_CRITERION real number;
The real number in the CONVERGENCE_CRITERION statement provides the termination point for the optimisation routine used in the likelihood maximisation. Its default value is the parameter EPS_BFDEF set in parinclu (usually 1.D-8). Alternative values may be invoked either by using this statement or by changing the value of EPS_BFDEF in parinclu. Warning: an unnecessarily strict convergence criterion set by the user will lead to a warning message from the optimisation routine stating, amongst other things, LINE SEARCH FAILED, although the results are printed and are usually valid.
11.20 STORAGE Statement
STORAGE option;
The data file (file1) may either be read in and held in core (option IN_CORE) or, while being read for the first time, be written out to a temporary binary file on disk (option ON_DISK) for faster access in subsequent readings of the file, in cases where the available core memory is not big enough to hold the whole data file. The largest number of
88. vels gives the number of different levels found for class variables (zero for continuous covariates), and pos1 and pos2 give the position(s) of the covariate on the recoded file (file2 in the FILES statement of the PREPARE parameter file). When pos1 and pos2 are identical, the covariate is time-independent. When pos1 and pos2 are different (subsequent numbers), the covariate is time-dependent, and the covariate values at pos1 and pos2 give the status of the covariate before and after the change of the hazard function of an individual due to a change in this or another time-dependent covariate.
10.6 DISCRETE_SCALE Statement
DISCRETE_SCALE;
The DISCRETE_SCALE statement indicates that the recoded file is prepared to be used with the WEIBULL program to fit the grouped data model of Prentice and Gloeckler (1978). It is the direct result of the use of the DISCRETE_SCALE statement in the PREPARE parameter file. The consequence has been the definition of a specific time-dependent covariate called time_unit, which changes value at each time point 1, 2, ... between the origin and the observed failure or censoring time; a corresponding number of elementary records was therefore created. The statement also avoids the need to repeat it in the user-supplied parameter file 2. If one wants to run a regular Weibull model, one can simply comment out this statement in parameter file 1. However, if a lot of elementary records were created because of t
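Using the COVARIATE entries of the small example (treat 2 4 5, sex 2 6 6, s_by_t 4 7 8), the rule above can be sketched in Python: equal positions mean a time-independent covariate, and different positions a time-dependent covariate whose values before and after the change are read from the two positions. The record is hypothetical, positions are taken as 1-based column numbers, and the helper names are illustrative.

```python
def parse_covariates(entries):
    """Turn 'name nlevels pos1 pos2' entries into a dictionary."""
    table = {}
    for entry in entries:
        name, nlevels, pos1, pos2 = entry.split()
        table[name] = (int(nlevels), int(pos1), int(pos2))
    return table

def before_after(record, table, name):
    """Covariate values before and after the change on one recoded record."""
    _nlevels, pos1, pos2 = table[name]
    return record[pos1 - 1], record[pos2 - 1]   # positions are 1-based


covariates = parse_covariates(["treat 2 4 5", "sex 2 6 6", "s_by_t 4 7 8"])
for name, (_n, pos1, pos2) in covariates.items():
    kind = "time-independent" if pos1 == pos2 else "time-dependent"
    print(name, kind)

# Hypothetical record: time, code, id, treat before/after, sex, s_by_t before/after.
record = [15, 1, 6, 1, 2, 1, 1, 2]
print(before_after(record, covariates, "treat"))   # (1, 2)
```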
89. ways by the user. This implies that on systems where file names are case-sensitive (mainly UNIX), the file name of file1 has to be upper case to be recognised by the program. Also, the names of the files produced by PREPARE (file2 onwards) will be upper case, even if stated in lower case in the FILES statement. If UPPER_CASE is set to FALSE in the file parinclu, file and variable names will not be recognized if typed differently.
11.3 TITLE Statement
TITLE Title of analysis;
The title of the analysis may fill the rest of the line, up to position 80, after the statement name. It will be written at the beginning of the output files (file3 of the FILES statement above and file8 of the SURVIVAL statement below).
11.4 STRATA Statement
STRATA variable;
Stratification is an extension of the proportional hazards model. With stratification, the assumption of proportionality of hazards is restricted to individuals within subgroups of the population, where grouping is defined by the variable indicated in the STRATA statement. Only one variable can be used as the strata variable. If a combination of variables is more sensible (e.g. year x season), the individual variables may be combined in the COMBINE statement of PREPARE.
Note: do not forget that in models with strata, data have to be sorted by strata (ascending) and, within strata, by the time variable (descending) and the censoring variable (ascending). See the section on Files, records and sorting.
11
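The sort order required with strata can be expressed as one composite key. This Python sketch assumes numeric fields named stratum, time and code (hypothetical names); negating the time gives the descending order within each stratum, while stratum and censoring code stay ascending.

```python
def sort_for_strata(records):
    """Sort by stratum (ascending), then time (descending), then
    censoring code (ascending)."""
    return sorted(records, key=lambda r: (r["stratum"], -r["time"], r["code"]))


records = [
    {"stratum": 2, "time": 5, "code": 1},
    {"stratum": 1, "time": 3, "code": 1},
    {"stratum": 1, "time": 7, "code": 1},
    {"stratum": 1, "time": 7, "code": 0},
]
for r in sort_for_strata(records):
    print(r["stratum"], r["time"], r["code"])
```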
90. ystem shut-down, you only have to give the BACKUP file name as file5 in the FILES statement (they have exactly the same structure) and include the READ OLD SOLUTIONS statement in your parameter file.
11.22 STORE_BINARY Statement
STORE_BINARY;
This statement is only useful for very large applications (applications running for hours and days). Reading binary files is much faster than reading files in free format. If you have very large data sets which you may use for different types of repeated analyses, it may therefore be useful to create, and repeatedly read, a binary version of the recoded data file. This is easily done by specifying FORMOUT UNFORMATTED or FORMOUT BLOCKED_UNFORMATTED in the data preparation parameter file for PREPARE when the initial data file is recoded. However, this strategy is often (in the case of FORMOUT UNFORMATTED) or always (in the case of FORMOUT BLOCKED_UNFORMATTED) incompatible with the sorting of the recoded file, for example when stratification or integration of a random effect is envisioned. The STORE_BINARY statement is a way to avoid this difficulty: with STORE_BINARY in your parameter file, the program only reads file1 in free format and writes it under the same name, with the suffix .bin, as a binary blocked unformatted file. Then the program stops. In the new file, records are blocked together in groups of size NRECBLOC, a parameter that is set and may be changed in parin
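The block structure of such a file, and the NRECBLOC sizing remark in the advice on very large data sets (100001 records with NRECBLOC = 100000 give two blocks), can be sketched in Python with the struct module. The layout used here, a 4-byte record count before each block followed by fixed-size records of two integers, is a hypothetical choice for illustration and is not the Fortran unformatted layout written by STORE_BINARY.

```python
import io
import struct

COUNT = struct.Struct("<i")     # number of records in the block
RECORD = struct.Struct("<ii")   # hypothetical record: time, censoring code

def write_blocked(records, nrecbloc, stream):
    """Write records in blocks of at most nrecbloc records.
    Returns the number of blocks written."""
    nblocks = 0
    for start in range(0, len(records), nrecbloc):
        block = records[start:start + nrecbloc]
        stream.write(COUNT.pack(len(block)))
        for rec in block:
            stream.write(RECORD.pack(*rec))
        nblocks += 1
    return nblocks

def read_blocked(stream):
    """Read back all records, one block at a time."""
    records = []
    while header := stream.read(COUNT.size):
        (count,) = COUNT.unpack(header)
        for _ in range(count):
            records.append(RECORD.unpack(stream.read(RECORD.size)))
    return records


data = [(t, t % 2) for t in range(100_001)]
buffer = io.BytesIO()
print(write_blocked(data, 100_000, buffer))   # 2 blocks: 100000 + 1 records
buffer.seek(0)
print(read_blocked(buffer) == data)
```

Reading a whole block at a time is what makes a well-chosen block size pay off; a block size just below the number of records, as in the example, doubles the number of blocks for a single extra record.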