
SAS/STAT 9.22 User's Guide: The GENMOD Procedure (Book Excerpt)

1. Example 37.4: Ordinal Model for Multinomial Data

Output 37.4.2 displays estimates of the intercept terms and covariates and associated statistics. The intercept terms correspond to the four cumulative logits defined on the taste categories in the order shown in Output 37.4.1. That is, Intercept1 is the intercept for the first cumulative logit, $\log\left(\frac{p_1}{1-p_1}\right)$, Intercept2 is the intercept for the second cumulative logit, $\log\left(\frac{p_1+p_2}{1-(p_1+p_2)}\right)$, and so forth.

Output 37.4.2 Parameter Estimates

Analysis Of Maximum Likelihood Parameter Estimates

                             Standard    Wald 95% Confidence      Wald
Parameter     DF  Estimate      Error          Limits          Chi-Square  Pr > ChiSq
Intercept1     1   -1.8578     0.1219    -2.0967    -1.6189       232.35      <.0001
Intercept2     1   -0.8646     0.1056    -1.0716    -0.6576        67.02      <.0001
Intercept3     1    0.9231     0.1060     0.7154     1.1308        75.87      <.0001
Intercept4     1    1.8078     0.1191     1.5743     2.0413       230.32      <.0001
brand ice1     1    0.3847     0.1370     0.1162     0.6532         7.89      0.0050
brand ice2     1   -0.6457     0.1397    -0.9196    -0.3719        21.36      <.0001
brand ice3     0    0.0000     0.0000     0.0000     0.0000
Scale          0    1.0000     0.0000     1.0000     1.0000

NOTE: The scale parameter was held fixed.

The Type 1 test displayed in Output 37.4.3 indicates that Brand is highly significant; that is, there are significant differences among the brands. The log odds ratios and odds ratios in the
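The relationship between the cumulative-logit intercepts and the category probabilities can be sketched in Python. This is an illustrative calculation, not PROC GENMOD output: it assumes the model form logit(P(Y <= r)) = Intercept_r + x'beta, uses the estimates from Output 37.4.2 with signs inferred from the confidence limits, and the function names are hypothetical.

```python
import math

def expit(eta):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def category_probabilities(intercepts, shift=0.0):
    """Category probabilities implied by ordered cumulative-logit intercepts.

    shift is the covariate contribution x'beta added to every cumulative logit.
    """
    cumulative = [expit(a + shift) for a in intercepts] + [1.0]
    previous = [0.0] + cumulative[:-1]
    return [c - p for c, p in zip(cumulative, previous)]

# Intercept1..Intercept4 from Output 37.4.2 (signs are an assumption).
intercepts = [-1.8578, -0.8646, 0.9231, 1.8078]

reference = category_probabilities(intercepts)             # brand ice3 (reference level)
brand1 = category_probabilities(intercepts, shift=0.3847)  # brand ice1

print([round(p, 3) for p in reference])
print([round(p, 3) for p in brand1])
```

Under these assumptions, a positive brand coefficient moves probability toward the lower-numbered taste categories, which is how the odds ratios for this example are read.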
2. "Correlation Matrix: Model-Based" and "Correlation Matrix: Empirical" tables.

GEE Working Correlation Matrix
If you specify the REPEATED statement and the CORRW option, PROC GENMOD displays the "Working Correlation Matrix" table.

GEE Fit Criteria
If you specify the REPEATED statement, PROC GENMOD displays the quasi-likelihood information criteria for model fit, QIC and QICu, in the "GEE Fit Criteria" table.

Analysis of GEE Parameter Estimates
If you specify the REPEATED statement, PROC GENMOD uses empirical standard error estimates to compute and display the "Analysis Of GEE Parameter Estimates Empirical Standard Error Estimates" table, which contains the parameter names as follows:
• the variable name for continuous regression variables
• the variable name and level for classification variables and interactions involving classification variables
• "Scale" for the scale variable related to the dispersion parameter
In addition, the parameter estimate, the empirical standard error, a 95% confidence interval, and the Z score and p-value are displayed for each parameter.
If you specify the MODELSE option in the REPEATED statement, the "Analysis Of GEE Parameter Estimates Model-Based Standard Error Estimates" table, which is based on model-based standard errors, is also produced.

Chapter 37: The GENMOD Procedure

GEE Observation Statistics
If you specify the OBSTATS option in the REPEATED statement, PROC GENMO
3. Contents (partial; the first entry is illegible in this excerpt)
Multinomial Models
Zero-Inflated Models
Generalized Estimating Equations
Assessment of Models Based on Aggregates of Residuals
Case Deletion Diagnostic Statistics
Bayesian Analysis
Exact Logistic and Poisson Regression
Missing Values
Displayed Output for Classical Analysis
Displayed Output for Bayesian Analysis
Displayed Output for Exact Analysis
ODS Table Names
ODS Graphics
Examples: GENMOD Procedure
Example 37.1: Logistic Regression
Example 37.2: Normal Regression, Log Link
Example 37.3: Gamma Distribution Applied to Life Data
Example 37.4: Ordinal Model for Multinomial Data
Example 37.5: GEE for Binary Data with Logit Link Function
Example 37.6: Log Odds Ratios and the ALR Algorithm
Example 37.7: Log-Linear Model for Count Data
Example 37.8: Model Assessment of Multiple Regression Using Aggrega
4. [Analysis Of Maximum Likelihood Parameter Estimates table, continued: rows for Intercept, Heat (three levels), Soak (four levels), and Scale; columns Wald 95% Confidence Limits, Wald Chi-Square, and Pr > ChiSq. The column values are interleaved in this excerpt and cannot be matched to rows reliably. The Pr > ChiSq values appear in this order: 0.1780, 0.9999, 0.0027, 0.0255, 0.8304, 0.6193, 0.7394, 0.9272.]

NOTE: The scale parameter was held fixed.

Example 37.11: Exact Poisson Regression

Following the output from the asymptotic analysis, the exact conditional Poisson regression results are displayed, as shown in Output 37.11.3.

Output 37.11.3 Exact Tests

The GENMOD Procedure
Exact Conditional Analysis

Conditional Exact Tests

                                       p-Value
Effect   Test          Statistic    Exact      Mid
Joint    Score           18.3665   0.0137   0.0137
         Probability    1.294E-6   0.0471   0.0471
Heat     Score           15.8259   0.0023   0.0022
         Probability    0.000175   0.0063   0.0062
Soak     Score            1.4612   0.8683   0.8646
         Probability     0.00735   0.8176   0.8139

The Joint test in the "Conditional Exact Tests" table in Output 37.11.3 is produced by specifying the JOINT option in the EXACT statement. The p-values for this test indicate that the parameters for Heat and Soak are jointly significant as explanatory effects in the model. If the Heat variable is the only explanatory variable i
5. • Deviance information criterion (DIC):

$$\mathrm{DIC} = \overline{LL(\theta)} + p_D$$

where

$$\overline{LL(\theta)} = \frac{1}{n}\sum_{i=1}^{n} LL(\theta_i)$$

PROC GENMOD uses the full log likelihoods defined in the section "Log-Likelihood Functions," with all terms included, for computing the DIC.

Posterior Distribution
Denote the observed data by D. The posterior distribution is

$$\pi(\beta \mid D) \propto L_P(D \mid \beta)\, p(\beta)$$

where $L_P(D \mid \beta)$ is the likelihood function with regression coefficients $\beta$ as parameters.

Starting Values of the Markov Chains
When the BAYES statement is specified, PROC GENMOD generates one Markov chain that contains the approximate posterior samples of the model parameters. Additional chains are produced when the Gelman-Rubin diagnostics are requested. Starting values (or initial values) can be specified in the INITIAL= data set in the BAYES statement. If the INITIAL= option is not specified, PROC GENMOD picks its own initial values for the chains.

Denote $[x]$ as the integral value of $x$. Denote $\hat{s}(X)$ as the estimated standard error of the estimator $X$.

Regression Coefficients
For the first chain, on which the summary statistics and regression diagnostics are based, the default initial values are estimates of the mode of the posterior distribution. If the INITIALMLE option is specified, the initial values are the maximum likelihood estimates; that is, $\beta_i^{(0)} = \hat{\beta}_i$. Initial values for the rth chain ($r \ge 2$) are given by
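The DIC arithmetic can be illustrated with a short Python sketch. It uses the widely cited definition in terms of the deviance (minus twice the log likelihood): the effective number of parameters is the posterior mean deviance minus the deviance at the posterior mean, and DIC adds that penalty to the posterior mean deviance. Mapping this onto the excerpt's LL notation is an assumption, the function name is hypothetical, and the input numbers are made up.

```python
def dic_from_deviances(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, p_D) from deviances evaluated at each posterior sample.

    deviance_draws: -2 * log likelihood at each saved posterior sample.
    deviance_at_posterior_mean: -2 * log likelihood at the posterior mean.
    """
    mean_deviance = sum(deviance_draws) / len(deviance_draws)
    p_d = mean_deviance - deviance_at_posterior_mean  # effective number of parameters
    return mean_deviance + p_d, p_d

# Made-up deviances for three posterior draws and for the posterior mean.
dic, p_d = dic_from_deviances([102.0, 98.0, 100.0], 97.0)
print(dic, p_d)
```

A smaller DIC indicates a better trade-off between fit and complexity when comparing models fit to the same data.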
6. Data Set              WORK.LIFDAT
Distribution          Gamma
Link Function         Log
Dependent Variable    lifetime

Class Level Information
Class   Levels   Values
mfg          2   A B

Criteria For Assessing Goodness Of Fit

Analysis Of Maximum Likelihood Parameter Estimates

                               Standard       Wald 95%              Wald
Parameter      DF  Estimate       Error   Confidence Limits     Chi-Square  Pr > ChiSq
Intercept       1    6.1302      0.1043    5.9257     6.3347       3451.61      <.0001
mfg        A    1    0.0199      0.1559   -0.2857     0.3255          0.02      0.8985
mfg        B    0    0.0000      0.0000    0.0000     0.0000
Scale           1    0.8275      0.0714    0.6987     0.9800

NOTE: The scale parameter was estimated by maximum likelihood.

LR Statistics For Type 3 Analysis
                 Chi-
Source   DF    Square   Pr > ChiSq
mfg       1      0.02       0.8985

The p-value of 0.8985 for the chi-square statistic in the Type 3 table indicates that there is no significant difference in the part life between the two manufacturers.

Using the following statements, you can refit the model without using the manufacturer as an effect. The LRCI option in the MODEL statement is specified to compute profile-likelihood confidence intervals for the mean life and scale parameters.

   proc genmod data = lifdat;
      model lifetime = / dist=gamma
                         link=log
                         lrci;
   run;

Output 37.3.2 displays the results of fitting the model with the mfg effect omitted.

Output 37.3.2 Refitting of the Gamma Model: Omitting the mfg Effect

The GENMOD Procedure

Analysis Of Maximum Likelihood Parameter Estimates Likelihood Ratio Standard 95 Confidence Wald Par
7. "Geweke Diagnostics" table displays the Geweke statistic and its p-value for each parameter.

Raftery and Lewis Diagnostics
The "Raftery Diagnostics" table is produced if you include the RAFTERY suboption in the DIAGNOSTIC= option in the BAYES statement. This table displays the Raftery and Lewis diagnostics for each variable.

Heidelberger and Welch Diagnostics
The "Heidelberger and Welch Diagnostics" table is displayed if you include the HEIDELBERGER suboption in the DIAGNOSTIC= option in the BAYES statement. This table shows the results of a stationarity test and a halfwidth test for each parameter.

Effective Sample Size
The "Effective Sample Size" table displays, for each parameter, the effective sample size, the correlation time, and the efficiency.

Monte Carlo Standard Errors
The "Monte Carlo Standard Errors" table displays, for each parameter, the Monte Carlo standard error, the posterior sample standard deviation, and the ratio of the two.

Displayed Output for Exact Analysis
If an exact analysis is requested with an EXACT statement, the displayed output includes the following tables. If the METHOD=NETWORKMC option is specified, the test and estimate tables are renamed "Monte Carlo" tables and a Monte Carlo standard error column ($\sqrt{\hat{p}(1-\hat{p})/n}$) is displayed.

Sufficient Statistics
Displays if you request an OUTDIST= data set in an EXACT statement. The tab
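How the effective sample size, correlation time, efficiency, and Monte Carlo standard error relate to one another can be sketched in Python. This uses common textbook definitions (correlation time as one plus twice the sum of the lag autocorrelations, and a Monte Carlo standard error of the posterior standard deviation divided by the square root of the effective sample size). Whether PROC GENMOD uses exactly these estimators, and how it truncates the sum, is an assumption here; the function name and inputs are hypothetical.

```python
import math

def chain_efficiency(n_samples, autocorrelations, posterior_sd):
    """Summaries built from the lag-1..lag-K autocorrelations of one parameter."""
    correlation_time = 1.0 + 2.0 * sum(autocorrelations)
    ess = n_samples / correlation_time
    return {
        "ess": ess,                             # effective sample size
        "correlation_time": correlation_time,
        "efficiency": ess / n_samples,          # equals 1 / correlation_time
        "mcse": posterior_sd / math.sqrt(ess),  # Monte Carlo standard error
    }

# Made-up inputs: 10,000 saved draws, two nonnegligible lags, posterior SD 0.5.
summary = chain_efficiency(10000, [0.25, 0.0], posterior_sd=0.5)
print(summary)
```

A chain with negligible autocorrelation has a correlation time near 1, so its effective sample size is close to the number of saved iterations.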
9. <effects> </ options> ;

The MODEL statement specifies the response, or dependent variable, and the effects, or explanatory variables. If you omit the explanatory variables, the procedure fits an intercept-only model. An intercept term is included in the model by default. The intercept can be removed with the NOINT option.

You can specify the response in the form of a single variable or in the form of a ratio of two variables denoted events/trials. The first form is applicable to all responses. The second form is applicable only to summarized binomial response data. When each observation in the input data set contains the number of events (for example, successes) and the number of trials from a set of binomial trials, use the events/trials syntax.

In the events/trials model syntax, you specify two variables that contain the event and trial counts. These two variables are separated by a slash (/). The values of both events and (trials - events) must be nonnegative, and the value of the trials variable must be greater than 0 for an observation to be valid. The variable events or trials can take noninteger values.

When each observation in the input data set contains a single trial from a binomial or multinomial experiment, use the first form of the preceding MODEL statements. The response variable can be numeric or character. The ordering of response levels is critical in these models. You can use the RORDER= option in the PROC GENMOD stat
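The validity rule for events/trials observations can be restated as a small Python check. This is only a restatement of the rule quoted above, not code from the procedure, and the function name is hypothetical.

```python
def is_valid_events_trials(events, trials):
    """Apply the stated rule: events >= 0, trials - events >= 0, and trials > 0."""
    return events >= 0 and (trials - events) >= 0 and trials > 0

# Noninteger counts are allowed by the rule, so 2.5 events in 4.0 trials passes.
checks = {
    (3, 10): is_valid_events_trials(3, 10),
    (2.5, 4.0): is_valid_events_trials(2.5, 4.0),
    (6, 5): is_valid_events_trials(6, 5),   # more events than trials
    (0, 0): is_valid_events_trials(0, 0),   # trials must be greater than 0
}
print(checks)
```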
10. ... are approximately independent.

This F statistic is computed for the Type 1 analysis, Type 3 analysis, and hypothesis tests specified in CONTRAST statements when the dispersion parameter is estimated by either the deviance or Pearson's chi-square divided by degrees of freedom, as specified by the DSCALE or PSCALE option in the MODEL statement. In the case of a Type 1 analysis, model 0 is the higher-order model obtained by including one additional effect in model 1. For a Type 3 analysis and hypothesis tests, model 0 is the full specified model and model 1 is the submodel obtained from constraining the Type III contrast or the user-specified contrast to be 0.

Lagrange Multiplier Statistics
When you select the NOINT or NOSCALE option, restrictions are placed on the intercept or scale parameters. Lagrange multiplier, or score, statistics are computed in these cases. These statistics assess the validity of the restrictions, and they are computed as

$$\chi^2 = \frac{s^2}{V}$$

where $s$ is the component of the score vector evaluated at the restricted maximum corresponding to the restricted parameter and $V = I_{11} - I_{12} I_{22}^{-1} I_{21}$. The matrix $I$ is the information matrix, 1 refers to the restricted parameter, and 2 refers to the rest of the parameters.

Under regularity conditions, this statistic has an asymptotic chi-square distribution with one degree of freedom, and p-values are computed based on this limiting distribution.

If you set k = 0 in a negative binomial mo
11. ... $(X_i - x)\,S_1(x)]$, where

$$S_1(x) = \sum_{i=1}^{n} K_i(x)(X_i - x)$$

$$S_2(x) = \sum_{i=1}^{n} K_i(x)(X_i - x)^2$$

Then the loess estimate of $Y$ at $x$ is defined by

$$\hat{Y}(x) = \sum_{i=1}^{n} \frac{w_i(x)}{\sum_{i=1}^{n} w_i(x)}\, Y_i$$

Loess smoothed residuals for checking the functional form of the jth covariate are defined by replacing $Y_i$ with $e_i$ and $X_i$ with $x_{ij}$. To implement the graphical and numerical assessment methods, $I(x_{ij} \le x)$ is replaced with $\frac{w_i(x)}{\sum_{i=1}^{n} w_i(x)}$ in the formulas for $W_j(x)$ and $\hat{W}_j(x)$.

You can perform the model checking described earlier for marginal models for dependent responses fit by generalized estimating equations (GEEs). Let $y_{ik}$ denote the kth measurement on the ith cluster, $i = 1, \ldots, K$, $k = 1, \ldots, n_i$, and let $x_{ik}$ denote the corresponding vector of covariates. The marginal mean of the response $\mu_{ik} = E(y_{ik})$ is assumed to depend on the covariate vector by

$$g(\mu_{ik}) = x_{ik}'\beta$$

where $g$ is the link function.

Define the vector of residuals for the ith cluster as

$$e_i = (e_{i1}, \ldots, e_{in_i})' = (y_{i1} - \hat{\mu}_{i1}, \ldots, y_{in_i} - \hat{\mu}_{in_i})'$$

You use the following extension of $W_j(x)$ defined earlier to check the functional form of the jth covariate:

$$W_j(x) = \frac{1}{\sqrt{K}} \sum_{i=1}^{K} \sum_{k=1}^{n_i} I(x_{ikj} \le x)\, e_{ik}$$

where $x_{ikj}$ is the jth component of $x_{ik}$.

The null distribution of $W_j(x)$ can be approximated by the conditional distribution of

$$\hat{W}_j(x) = \frac{1}{\sqrt{K}} \sum_{i=1}^{K} \left\{ \sum_{k=1}^{n_i} I(x_{ikj} \le x)\, e_{ik} + \eta'(x, \hat{\beta})\, I_0^{-1} \hat{D}_i' \hat{V}_i^{-1} e_i \right\} Z_i$$

where $\hat{D}_i$ and $\hat{V}_i$ are defined as in the section "Generalized Estimating Equations" with the unknown
12. [Final rows of a parameter estimates table; the label of the first row is cut off in this excerpt.]

                    -0.3016    0.0697    -0.4383    -0.1649    18.70    <.0001
Scale          0     1.0000    0.0000     1.0000     1.0000

NOTE: The scale parameter was held fixed.

The GEE solution is requested with the REPEATED statement in the GENMOD procedure. The SUBJECT=ID option indicates that the variable id describes the observations for a single cluster, and the CORRW option displays the working correlation matrix. The TYPE= option specifies the correlation structure; the value EXCH indicates the exchangeable structure.

The following statements perform the analysis:

   proc genmod data=new;
      class id;
      model y=x1 | trt / d=poisson offset=ltime;
      repeated subject=id / corrw covb type=exch;
   run;

These statements first fit a generalized linear model (GLM) to these data by maximum likelihood. The estimates are not shown in the output, but they are used as initial values for the GEE solution.

Information about the GEE model is displayed in Output 37.7.3. The results of fitting the model are displayed in Output 37.7.4. Compare these with the model of independence displayed in Output 37.7.2. The parameter estimates are nearly identical, but the standard errors for the independence case are underestimated. The coefficient of the interaction term, $\beta_3$, is highly significant under the independence model and marginally significant with the exchangeable correlations model.

Output 37.7.3 GEE Model Information

The GENMOD Procedure

GEE Mod
13. Obs    id     y    visit    trt    bline    age
  1   104     5      1       0      11      31
  2   104     3      2       0      11      31
  3   104     3      3       0      11      31
  4   104     3      4       0      11      31
  5   106     3      1       0      11      30
  6   106     5      2       0      11      30
  7   106     3      3       0      11      30
  8   106     3      4       0      11      30
  9   107     2      1       0       6      25
 10   107     4      2       0       6      25
 11   107     0      3       0       6      25
 12   107     5      4       0       6      25
 13   114     4      1       0       8      36
 14   114     4      2       0       8      36

Some further data manipulations create an observation for the baseline measures, a log time interval variable for use as an offset, and an indicator variable for whether the observation is for a baseline measurement or a visit measurement. Patient 207 is deleted as an outlier, as in the Diggle, Liang, and Zeger (1994) analysis.

The following statements prepare the data for analysis with PROC GENMOD:

   data new;
      set thall;
      output;
      if visit=1 then do;
         y=bline;
         visit=0;
         output;
      end;
   run;

   data new;
      set new;
      if id ne 207;
      if visit=0 then do;
         x1=0;
         ltime=log(8);
      end;
      else do;
         x1=1;
         ltime=log(2);
      end;
   run;

For comparison with the GEE results, an ordinary Poisson regression is first fit. The results are shown in Output 37.7.2.

Output 37.7.2 Maximum Likelihood Estimates

The GENMOD Procedure

Analysis Of Maximum Likelihood Parameter Estimates

                            Standard       Wald 95%              Wald
Parameter    DF  Estimate      Error   Confidence Limits     Chi-Square  Pr > ChiSq
Intercept     1    1.3476     0.0341    1.2809     1.4144       1565.44      <.0001
x1            1    0.1108     0.0469    0.0189     0.2027          5.58      0.0181
trt           1   -0.1080     0.0486   -0.2034    -0.0127          4.93      0.0264
x1*trt        1
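The effect of the two DATA steps can be traced in a Python sketch. It mirrors the logic described above (duplicate each visit-1 row as a baseline row, drop patient 207, then set the period indicator and the log offset) and uses the four rows for patient 104 from the listing. The function name and the dictionary layout are hypothetical, and this is not a substitute for the SAS statements.

```python
import math

def prepare(rows, excluded_id=207):
    """Mirror the two DATA steps: add baseline rows, then set x1 and ltime."""
    expanded = []
    for row in rows:
        expanded.append(dict(row))          # OUTPUT the visit row as is
        if row["visit"] == 1:
            baseline = dict(row)
            baseline["y"] = row["bline"]    # baseline count becomes the response
            baseline["visit"] = 0
            expanded.append(baseline)       # OUTPUT the extra baseline row
    prepared = []
    for row in expanded:
        if row["id"] == excluded_id:        # subsetting IF: drop patient 207
            continue
        if row["visit"] == 0:
            row["x1"], row["ltime"] = 0, math.log(8)   # 8-week baseline period
        else:
            row["x1"], row["ltime"] = 1, math.log(2)   # 2-week visit period
        prepared.append(row)
    return prepared

# Patient 104 from the partial listing of the seizure data.
seizure_rows = [
    {"id": 104, "y": y, "visit": visit, "trt": 0, "bline": 11, "age": 31}
    for visit, y in enumerate([5, 3, 3, 3], start=1)
]
prepared = prepare(seizure_rows)
for row in prepared:
    print(row["id"], row["visit"], row["y"], row["x1"], round(row["ltime"], 4))
```

Each patient therefore contributes five observations to the analysis data set: one baseline row with an offset of log(8) and four visit rows with an offset of log(2).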
16. DCLS | CLUSTERCOOKD | CLUSTERCOOKSD
A measure of the standardized influence of the subset m of observations on the overall fit is

$$\frac{(\hat{\beta} - \hat{\beta}_{[m]})'\,(X'WX)\,(\hat{\beta} - \hat{\beta}_{[m]})}{p\,\hat{\phi}}$$

For deletion of cluster i, this is approximated by

$$\mathrm{DCLS}_i = \frac{E_i'\,(W_i^{-1} - Q_i)^{-1}\, Q_i\, (W_i^{-1} - Q_i)^{-1}\, E_i}{p\,\hat{\phi}}$$

DOBS | COOKD | COOKSD
The measure of overall fit in the section "DCLS | CLUSTERCOOKD | CLUSTERCOOKSD" for the deletion of the tth observation in the ith cluster is approximated by

$$\mathrm{DOBS}_{it} = \frac{\tilde{E}_{it}^{2}\, \tilde{Q}_{it}}{p\,\hat{\phi}\,(\tilde{W}_{it}^{-1} - \tilde{Q}_{it})^{2}}$$

where $\tilde{E}_{it}$, $\tilde{Q}_{it}$, and $\tilde{W}_{it}$ are defined in the section "DFBETAO." In the case of the independence working correlation, this is equal to the measure for ordinary generalized linear models defined in the section "DOBS | COOKD | COOKSD."

MCLS | CLUSTERDFIT
A studentized distance measure of the type defined in the section "DCLS | CLUSTERCOOKD | CLUSTERCOOKSD" of the influence of the ith cluster is given by

$$\mathrm{MCLS}_i = \frac{E_i'\,(W_i^{-1} - Q_i)^{-1}\, H_i\, E_i}{p\,\hat{\phi}}$$

Bayesian Analysis
In generalized linear models, the response has a probability distribution from a family of distributions of the exponential form. That is, the probability density of the response Y for continuous response variables, or the probability function for discrete responses, can be expressed as

$$f(y) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}$$

for some functions a, b, and c that determine the specific distribution. The canonical par
17. [Option column from the end of the preceding table: ESTIMATE=ODDS, ESTIMATE=BOTH, ESTIMATE, ESTIMATE=PARM, ESTIMATE=BOTH, Default, Default, Default, INFO, OUTDIST=]

ODS Graphics
To request graphics with PROC GENMOD, you must first enable ODS Graphics by specifying the ODS GRAPHICS ON statement. See Chapter 21, "Statistical Graphics Using ODS," for more information. Some graphs are produced by default; other graphs are produced by using statements and options. You can reference every graph produced through ODS Graphics with a name. The names of the graphs that PROC GENMOD generates are listed in Table 37.11, along with the required statements and options.

ODS Graph Names
PROC GENMOD assigns a name to each graph it creates using ODS. You can use these names to reference the graphs when using ODS. The names are listed in Table 37.11.
To request these graphs, you must specify the ODS GRAPHICS ON statement in addition to the options indicated in Table 37.11.

Table 37.11 ODS Graphics Produced by PROC GENMOD

ODS Graph Name        Description                                  Statement   Option
ADPanel               Autocorrelation function and density panel   BAYES       PLOTS=(AUTOCORR DENSITY)
AutocorrPanel         Autocorrelation function panel               BAYES       PLOTS=AUTOCORR
AutocorrPlot          Autocorrelation function plot                BAYES       PLOTS(UNPACK)=AUTOCORR
ClusterCooksDPlot     Cluster Cook's D by cluster number           PROC        PLOTS
ClusterDFFITPlot      Cluster DFFIT by cluster number              PROC        PLOTS
ClusterLeveragePlot   Cluster
18. Subject Effect                  id(center) (111 levels)
Number of Clusters              111
Correlation Matrix Dimension    4
Maximum Cluster Size            4
Minimum Cluster Size            4

Output 37.5.3 Results of Model Fitting

Working Correlation Matrix
          Col1      Col2      Col3      Col4
Row1    1.0000    0.3351    0.2140    0.2953
Row2    0.3351    1.0000    0.4429    0.3581
Row3    0.2140    0.4429    1.0000    0.3964
Row4    0.2953    0.3581    0.3964    1.0000

Output 37.5.3 continued

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates

                           Standard     95% Confidence
Parameter       Estimate      Error         Limits             Z    Pr > |Z|
Intercept        -0.8882     0.4568   -1.7835     0.0071   -1.94     0.0519
treatment  A      1.2442     0.3455    0.5669     1.9214    3.60     0.0003
center     2      0.6558     0.3512   -0.0326     1.3442    1.87     0.0619
sex        F      0.1128     0.4408   -0.7512     0.9768    0.26     0.7981
age              -0.0175     0.0129   -0.0427     0.0077   -1.36     0.1728
baseline   1      1.8981     0.3441    1.2237     2.5725    5.52     <.0001

Output 37.5.4 Model Fit Criteria

GEE Fit Criteria
QIC     512.3416
QICu    499.6081

The nonsignificance of age and sex makes them candidates for omission from the model.

Example 37.6: Log Odds Ratios and the ALR Algorithm

Since the respiratory data in Example 37.5 are binary, you can use the ALR algorithm to model the log odds ratios instead of using working correlations to model associations. In this example, a "fully parameterized cluster" model for the log odds ratio is fit. That is, there is a log o
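The Z score, p-value, and confidence limits in the GEE parameter table follow from each estimate and its empirical standard error, which a short Python check can reproduce. This sketch assumes a two-sided normal test and 95% Wald limits; the function name is hypothetical, and small differences from the printed values come from rounding of the displayed inputs.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

def wald_summary(estimate, std_error):
    """Return the Z score, two-sided normal p-value, and 95% Wald limits."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    half_width = Z_975 * std_error
    return z, p_value, (estimate - half_width, estimate + half_width)

# "treatment A" row: estimate 1.2442 with empirical standard error 0.3455.
z, p_value, limits = wald_summary(1.2442, 0.3455)
print(round(z, 2), round(p_value, 4), [round(v, 4) for v in limits])
```

Applying the same function to the age and sex rows gives p-values well above 0.05, which is the basis for treating those effects as candidates for omission.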
19. options> <plot-request <options>>>
specifies plots to be created using ODS Graphics. Many of the observational statistics in the output data set can be plotted using this option. You are not required to create an output data set in order to produce a plot. When you specify only one plot request, you can omit the parentheses around the plot request. Here are some examples:

   PLOTS=ALL
   PLOTS=PREDICTED
   PLOTS=(PREDICTED RESCHI)
   PLOTS(UNPACK)=DFBETA

You must enable ODS Graphics before requesting plots, for example, like this:

   ods graphics on;
   proc genmod plots=all;
      model y=x;
   run;
   ods graphics off;

Any specified global plot options apply to all plots that are specified with plot requests. The following global plot options are available:

CLUSTERLABEL
displays formatted levels of the SUBJECT= effect instead of plot symbols. This option applies only to diagnostic statistics for models fit by GEEs that are plotted against cluster number, and it provides a way to identify cluster-level names with corresponding ordered cluster numbers.

UNPACK
displays multiple plots individually. The default is to display related multiple plots in a panel.

See the section "OUTPUT Statement" for definitions of the statistics specified with the plot requests. The plot requests include the following:

ALL
produces all available plots.

COOKSD | DOBS
plots the Cook's distance
20. statement, it is also related to the exponential family dispersion parameter in the same way.

Link Function
For distributions other than the zero-inflated Poisson or zero-inflated negative binomial, the mean $\mu_i$ of the response in the ith observation is related to a linear predictor through a monotonic differentiable link function g:

$$g(\mu_i) = x_i'\beta$$

Here, $x_i$ is a fixed known vector of explanatory variables, and $\beta$ is a vector of unknown parameters.

There are two link functions and linear predictors associated with zero-inflated distributions: one for the zero-inflation probability $\omega$, and another for the mean parameter $\lambda$. See the section "Zero-Inflated Models" for more details about zero-inflated distributions.

Log-Likelihood Functions
Log-likelihood functions for the distributions that are available in the procedure are parameterized in terms of the means $\mu_i$ and the dispersion parameter $\phi$. Zero-inflated log likelihoods are parameterized in terms of two parameters, $\lambda$ and $\omega$. The parameter $\omega$ is the zero-inflation probability, and $\lambda$ is a function of the distribution mean. The relationship between the mean of the zero-inflated Poisson and zero-inflated negative binomial distributions and the parameter $\lambda$ is defined in the section "Response Probability Distributions." The term $y_i$ represents the response for the ith observation, and $w_i$ represents the known dispersion weight. The log-likelihood functions are of the form L
21. the value of the _SEED_ variable is used as the seed of the random number generator for the corresponding chain.

INITIALMLE
specifies that maximum likelihood estimates of the model parameters be used as initial values of the Markov chain. If this option is not specified, estimates of the mode of the posterior distribution obtained by optimization are used as initial values.

METROPOLIS=YES
METROPOLIS=NO
specifies the use of a Metropolis step to generate Gibbs samples for posterior distributions that are not log concave. The default value is METROPOLIS=YES.

NBI=number
specifies the number of burn-in iterations before the chains are saved. The default is 2000.

NMC=number
specifies the number of iterations after the burn-in. The default is 10000.

OUTPOST=SAS-data-set
OUT=SAS-data-set
names the SAS data set that contains the posterior samples. See the section "OUTPOST= Output Data Set" for more information. Alternatively, you can create the output data set by specifying an ODS OUTPUT statement as follows:

   ODS OUTPUT PosteriorSample=SAS-data-set;

PRECISIONPRIOR=GAMMA<(options)> | IMPROPER
PPRIOR=GAMMA<(options)> | IMPROPER
specifies that Gibbs sampling be performed on the generalized linear model precision parameter and the prior distribution for the precision parameter, if there is a precision parameter in the model. For models that do not have a precision parameter, the Poisso
22. [Table fragment cut off at the start of this excerpt: ...0834, 0.1538, 0.2086, 0.2653]

Example 37.10: Bayesian Analysis of a Poisson Regression Model

[In the three tables that follow, minus signs were lost in extraction; the values are shown as magnitudes only.]

Output 37.10.6 Interval Statistics

Posterior Intervals
Parameter   Alpha    Equal-Tail Interval        HPD Interval
Intercept   0.050    2.0169     2.9056      2.0069     2.8923
X1          0.050    0.0210     0.0106      0.0212     0.0103
X2          0.050    0.0181     0.00878     0.0181     0.00885
X3          0.050    0.00757    0.00109     0.00764    0.000989
X4          0.050    0.4250     0.1132      0.4232     0.1119
X5          0.050    0.1552     0.4821      0.1647     0.4905
X6          0.050    0.0477     0.3749      0.0490     0.3758

Output 37.10.7 Posterior Sample Correlation Matrix

Posterior Correlation Matrix
Parameter   Intercept      X1      X2      X3      X4      X5      X6
Intercept       1.000   0.705   0.430   0.046   0.225   0.180   0.415
X1              0.705   1.000   0.211   0.013   0.068   0.067   0.128
X2              0.430   0.211   1.000   0.006   0.070   0.057   0.118
X3              0.046   0.013   0.006   1.000   0.016   0.055   0.089
X4              0.225   0.068   0.070   0.016   1.000   0.011   0.089
X5              0.180   0.067   0.057   0.055   0.011   1.000   0.042
X6              0.415   0.128   0.118   0.089   0.089   0.042   1.000

Posterior sample autocorrelations for each model parameter are shown in Output 37.10.8. The autocorrelation after 10 lags is negligible for all parameters, indicating good mixing in the Markov chain.

Output 37.10.8 Posterior Sample Autocorrelations

The GENMOD Procedure
Bayesian Analysis

Posterior Autocorrelations
Parameter    Lag 1     Lag 5    Lag 10    Lag 50
Intercept   0.0551    0.0134    0.0101    0.0012
X1          0.0894    0.0054    0.0080    0.0019
X2          0.1197    0.0170    0.0061    0.000
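The lag autocorrelations reported for a posterior sample can be illustrated with a minimal Python function. It uses the usual sample autocorrelation (lagged cross-products of the centered chain divided by the total sum of squares); whether PROC GENMOD normalizes in exactly this way is an assumption, the function name is hypothetical, and the chain here is artificial.

```python
def autocorrelation(chain, lag):
    """Sample autocorrelation of a sequence of draws at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    total = sum((x - mean) ** 2 for x in chain)
    lagged = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag))
    return lagged / total

# An artificial chain that alternates between two values mixes poorly at lag 1.
alternating = [1.0, -1.0] * 50
lag0 = autocorrelation(alternating, 0)
lag1 = autocorrelation(alternating, 1)
print(lag0, lag1)
```

For a well-mixing chain, values computed this way shrink toward zero as the lag grows, which is the pattern the text describes for lags of 10 and beyond.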
23. [Rows from the seizure data table; the header is cut off in this excerpt. The columns appear to be patient ID, treatment, baseline count, and counts for visits 1 through 4.]

106   Placebo      11    3    5    3    3
107   Placebo       6    2    4    0    5
101   Progabide    76   11   14    9    8
102   Progabide    38    8    7    9    4
103   Progabide    19    0    4    3    0

Model the data as a log-linear model with $V(\mu) = \mu$ (the Poisson variance function) and

$$\log(E(Y_{ij})) = \beta_0 + x_{i1}\beta_1 + x_{i2}\beta_2 + x_{i1}x_{i2}\beta_3 + \log(t_{ij})$$

where
$Y_{ij}$ = number of epileptic seizures in interval j
$t_{ij}$ = length of interval j
$x_{i1}$ = 1 for weeks 8-16 (treatment), 0 for weeks 0-8 (baseline)
$x_{i2}$ = 1 for the progabide group, 0 for the placebo group

The correlations between the counts are modeled as $r_{ij} = \alpha$, $i \ne j$ (exchangeable correlations). For comparison, the correlations are also modeled as independent (identity correlation matrix). In this model, the regression parameters have the interpretation in terms of the log seizure rate displayed in Table 37.13.

Table 37.13 Interpretation of Regression Parameters

Treatment    Visit       $\log(E(Y_{ij})/t_{ij})$
Placebo      Baseline    $\beta_0$
Placebo      1-4         $\beta_0 + \beta_1$
Progabide    Baseline    $\beta_0 + \beta_2$
Progabide    1-4         $\beta_0 + \beta_1 + \beta_2 + \beta_3$

The difference between the log seizure rates in the pretreatment (baseline) period and the treatment periods is $\beta_1$ for the placebo group and $\beta_1 + \beta_3$ for the Progabide group. A value of $\beta_3 < 0$ indicates a reduction in the seizure rate.

Output 37.7.1 lists the first 14 observations of the data, which are arranged as one visit per observation.

Output 37.7.1 Partial Listing of the Seizure Data

Obs    id     y    visit
  1   104     5      1
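The interpretation in Table 37.13 can be checked with a few lines of Python that evaluate the linear predictor without the offset. The coefficient values below are hypothetical placeholders chosen only to make the arithmetic visible, and the function name is likewise an assumption.

```python
import math

def log_seizure_rate(beta, treatment_period, progabide):
    """log(E(Y)/t) = b0 + x1*b1 + x2*b2 + x1*x2*b3, as laid out in Table 37.13."""
    b0, b1, b2, b3 = beta
    x1 = 1 if treatment_period else 0   # 1 for visits 1-4, 0 for baseline
    x2 = 1 if progabide else 0          # 1 for progabide, 0 for placebo
    return b0 + x1 * b1 + x2 * b2 + x1 * x2 * b3

beta = (1.0, 0.1, -0.1, -0.3)  # hypothetical values, not estimates from the data

placebo_change = log_seizure_rate(beta, True, False) - log_seizure_rate(beta, False, False)
progabide_change = log_seizure_rate(beta, True, True) - log_seizure_rate(beta, False, True)
rate_ratio = math.exp(progabide_change - placebo_change)  # equals exp(b3)
print(placebo_change, progabide_change, rate_ratio)
```

With a negative interaction coefficient, the ratio printed last is below 1, which corresponds to a reduction in the seizure rate for the progabide group relative to the change seen under placebo.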
25. Data: Generalized Estimating Equations and Connections with Weighted Least Squares," Biometrics, 49, 1033-1044.

Myers, R. H., Montgomery, D. C., and Vining, G. (2002), Generalized Linear Models with Applications in Engineering and the Sciences, New York: John Wiley & Sons.

Nelder, J. A. and Wedderburn, R. W. M. (1972), "Generalized Linear Models," Journal of the Royal Statistical Society, Series A, 135, 370-384.

Nelson, W. (1982), Applied Life Data Analysis, New York: John Wiley & Sons.

Neter, J., Kutner, M. H., Nachtsheim, C. J., and Wasserman, W. (1996), Applied Linear Statistical Models, Fourth Edition, Chicago: Irwin.

Pan, W. (2001), "Akaike's Information Criterion in Generalized Estimating Equations," Biometrics, 57, 120-125.

Pregibon, D. (1981), "Logistic Regression Diagnostics," Annals of Statistics, 9, 705-724.

Preisser, J. S. and Qaqish, B. F. (1996), "Deletion Diagnostics for Generalised Estimating Equations," Biometrika, 83, 551-562.

Rao, C. R. (1973), Linear Statistical Inference, New York: John Wiley & Sons.

Rotnitzky, A. and Jewell, N. P. (1990), "Hypothesis Testing of Regression Parameters in Semiparametric Generalized Linear Models for Cluster Correlated Data," Biometrika, 77, 485-497.

Royall, R. M. (1986), "Model Robust Inference Using Maximum Likelihood Estimators," International Statistical Review, 54, 221-226.

Searle, S. R. (1971), Linear Models, New York: J
26. ESTIMATE Statement Results table indicate the relative differences among the brands. For example, the odds ratio of 2.8 in the "Exp(LogOR12)" row indicates that the odds of brand 1 being in lower taste categories is 2.8 times the odds of brand 2 being in lower taste categories. Since, in this ordering, the lower categories represent the more favorable taste results, this indicates that brand 1 scored significantly better than brand 2. This is also apparent from the data in this example.

Output 37.4.3 Type 1 Tests and Odds Ratios

LR Statistics For Type 1 Analysis

Source       Deviance   DF   Chi-Square   Pr > ChiSq
Intercepts   65.9576
brand         9.8654     2        56.09       <.0001

Contrast Estimate Results

               Mean       Mean                 L'Beta     Standard
Label          Estimate   Confidence Limits    Estimate   Error      Alpha
LogOR12        0.7370     0.6805   0.7867       1.0305    0.1401     0.05
Exp(LogOR12)                                    2.8024    0.3926     0.05
LogOR13        0.5950     0.5290   0.6577       0.3847    0.1370     0.05
Exp(LogOR13)                                    1.4692    0.2013     0.05
LogOR23        0.3439     0.2850   0.4081      -0.6457    0.1397     0.05
Exp(LogOR23)                                    0.5243    0.0733     0.05

Contrast Estimate Results

               L'Beta
Label          Confidence Limits    Chi-Square   Pr > ChiSq
LogOR12         0.7559    1.3050         54.11       <.0001
Exp(LogOR12)    2.1295    3.6878
LogOR13         0.1162    0.6532          7.89       0.0050
Exp(LogOR13)    1.1233    1.9217
LogOR23        -0.9196   -0.3719         21.36       <.0001
Exp(LogOR23)    0.3987    0.6894

Example 37.5: GEE for Binary Data with Logit Link Function

Output 37.5.1 displays a partial listing of
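The relationship between a log odds ratio row and its exponentiated row in the ESTIMATE results can be checked by hand. The following Python sketch is only an illustration: it uses the LogOR12 values printed in Output 37.4.3, and it assumes (as the printed numbers suggest) that the standard error on the odds-ratio scale is the delta-method value, the odds ratio times the standard error of the log odds ratio. The variable names are hypothetical.

```python
import math

# Values copied from the LogOR12 row of Output 37.4.3.
log_or = 1.0305                 # L'Beta estimate (log odds ratio)
se_log_or = 0.1401              # standard error of the log odds ratio
lower, upper = 0.7559, 1.3050   # 95% confidence limits for the log odds ratio

# Exponentiate the estimate and its limits to move to the odds-ratio scale.
odds_ratio = math.exp(log_or)
or_limits = (math.exp(lower), math.exp(upper))

# Delta-method standard error on the odds-ratio scale (an assumption here).
or_se = odds_ratio * se_log_or
```

Up to rounding of the printed inputs, these reproduce the Exp(LogOR12) row: an odds ratio near 2.80 with limits near 2.13 and 3.69 and a standard error near 0.39.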
27. In Bayesian analysis, the model parameters are treated as random variables, and inference about parameters is based on the posterior distribution of the parameters, given the data. The posterior distribution is obtained using Bayes' theorem as the likelihood function of the data weighted with a prior distribution. The prior distribution enables you to incorporate knowledge or experience of the likely range of values of the parameters of interest into the analysis. If you have no prior knowledge of the parameter values, you can use a noninformative prior distribution, and the results of the Bayesian analysis will be very similar to a classical analysis based on maximum likelihood. A closed form of the posterior distribution is often not feasible, and a Markov chain Monte Carlo method by Gibbs sampling is used to simulate samples from the posterior distribution. See Chapter 7, "Introduction to Bayesian Analysis Procedures," for an introduction to the basic concepts of Bayesian statistics. Also see the section "Bayesian Analysis: Advantages and Disadvantages" on page 147 for a discussion of the advantages and disadvantages of Bayesian analysis. See Ibrahim, Chen, and Sinha (2001) for a detailed description of Bayesian analysis.

In a Bayesian analysis, a Gibbs chain of samples from the posterior distribution is generated for the model parameters. Summary statistics (mean, standard deviation, quartiles, HPD and
28. Model-Based Standard Error Estimates" output table. If a fixed scale parameter is specified with the NOSCALE option in the MODEL statement, then the fixed value is used in estimating the model-based covariance matrix and standard errors.

Fitting Algorithm

The following is an algorithm for fitting the specified model by using GEEs. Note that this is not in general a likelihood-based method of estimation, so that inferences based on likelihoods are not possible for GEE methods.

1. Compute an initial estimate of $\beta$ with an ordinary generalized linear model, assuming independence.
2. Compute the working correlations $R$ based on the standardized residuals, the current $\beta$, and the assumed structure of $R$.
3. Compute an estimate of the covariance:
   $V_i = \phi A_i^{1/2} W_i^{-1/2} \hat{R}(\alpha) W_i^{-1/2} A_i^{1/2}$
4. Update $\beta$:
   $\beta_{r+1} = \beta_r + \left[ \sum_{i=1}^{K} \left(\frac{\partial \mu_i}{\partial \beta}\right)' V_i^{-1} \frac{\partial \mu_i}{\partial \beta} \right]^{-1} \left[ \sum_{i=1}^{K} \left(\frac{\partial \mu_i}{\partial \beta}\right)' V_i^{-1} (Y_i - \mu_i) \right]$
5. Repeat steps 2-4 until convergence.

Missing Data

See Diggle, Liang, and Zeger (1994, Chapter 11) for a discussion of missing values in longitudinal data. Suppose that you intend to take measurements $Y_{i1}, \ldots, Y_{in}$ for the ith unit. Missing values for which $Y_{ij}$ are missing whenever $Y_{ik}$ is missing for all $j > k$ are called dropouts. Otherwise, missing values that occur intermixed with nonmissing values are intermittent missing values. The GENMOD procedure can estimate the working correlation from data containing both types of missing values by using the all available pa
29. NONE. If you want some but not all of the diagnostics, or if you want to change certain settings of these diagnostics, specify a subset of the following keywords. The default is DIAGNOSTICS=(AUTOCORR ESS GEWEKE).

AUTOCORR <(LAGS=numeric-list)>
computes the autocorrelations of lags given by LAGS= list for each parameter. Elements in the list are truncated to integers and repeated values are removed. If the LAGS= option is not specified, autocorrelations of lags 1, 5, 10, and 50 are computed for each variable. See the section "Autocorrelations" on page 168 for details.

ESS
computes Carlin's estimate of the effective sample size, the correlation time, and the efficiency of the chain for each parameter. See the section "Effective Sample Size" on page 168 for details.

GELMAN <(gelman-options)>
computes the Gelman and Rubin convergence diagnostics. You can specify one or more of the following gelman-options:

NCHAIN | N=number
specifies the number of parallel chains used to compute the diagnostic, and must be 2 or larger. The default is NCHAIN=3. If an INITIAL= data set is used, NCHAIN defaults to the number of rows in the INITIAL= data set. If any number other than this is specified with the NCHAIN= option, the NCHAIN= value is ignored.

ALPHA=value
specifies the significance level for the upper bound. The default is ALPHA=0.05, resulting in a 97.5% bound.

See the section Ge
30. ODS Table Name     Description                                         Statement   Option
IterParms           Iteration history for parameter estimates           MODEL       ITPRINT
LastGradHess        Last evaluation of the gradient and Hessian          MODEL       ITPRINT
                    for maximum likelihood estimation
MCError             Monte Carlo standard errors                          BAYES       DIAG=MCSE
ModelInfo           Model information                                    PROC        Default
NObs                Number of observations                                           Default
ParameterEstimates  Maximum likelihood estimates of model parameters     MODEL       Default
ParmInfo            Parameter indices                                    MODEL       Default
ParmPrior           Prior distribution for scale and shape               BAYES       Default
PostIntervals       HPD and equal-tail intervals of the posterior        BAYES       Default
                    samples
PosteriorSample     Posterior samples (for ODS output data set only)     BAYES
PostSummaries       Summary statistics of the posterior samples          BAYES       Default
Raftery             Raftery and Lewis convergence diagnostics            BAYES       DIAG=RAFTERY

Table 37.10 ODS Tables Produced in PROC GENMOD for an Exact Analysis

ODS Table Name      Description                                         Statement   Option
ExactOddsRatio      Exact odds ratios                                    EXACT
ExactParmEst        Parameter estimates                                  EXACT
ExactTests          Conditional exact tests                              EXACT
NStrataIgnored      Number of uninformative strata                       STRATA
StrataSummary       Number of strata with specific response              STRATA
                    frequencies
StrataInfo          Event and nonevent frequencies for each stratum      STRATA
SuffStats           Sufficient statistics                                EXACT
31. Output 37.1.1 Model Information

The GENMOD Procedure

Model Information
Data Set                      WORK.DRUG
Distribution                  Binomial
Link Function                 Logit
Response Variable (Events)    r
Response Variable (Trials)    n

The five levels of the CLASS variable DRUG are displayed in Output 37.1.2.

Output 37.1.2 CLASS Variable Levels

Class Level Information
Class   Levels   Values
drug    5        A B C D E

In the "Criteria For Assessing Goodness Of Fit" table displayed in Output 37.1.3, the value of the deviance divided by its degrees of freedom is less than 1. A p-value is not computed for the deviance; however, a deviance that is approximately equal to its degrees of freedom is a possible indication of a good model fit. Asymptotic distribution theory applies to binomial data as the number of binomial trials parameter n becomes large for each combination of explanatory variables. McCullagh and Nelder (1989) caution against the use of the deviance alone to assess model fit. The model fit for each observation should be assessed by examination of residuals. The OBSTATS option in the MODEL statement produces a table of residuals and other useful statistics for each observation.

Output 37.1.3 Goodness-of-Fit Criteria

Criteria For Assessing Goodness Of Fit
Criterion            DF   Value       Value/DF
Deviance             12      5.2751   0.4396
Scaled Deviance      12      5.2751   0.4396
Pearson Chi-Square   12      4.5133   0.3761
Scaled Pearson X2    12      4.5133   0.3761
Log Likelihood            -114.7732
Full Log Lik
32. Row2   0.5941   1.0000   0.5941   0.5941   0.5941
Row3   0.5941   0.5941   1.0000   0.5941   0.5941
Row4   0.5941   0.5941   0.5941   1.0000   0.5941
Row5   0.5941   0.5941   0.5941   0.5941   1.0000

If you specify the COVB option, you produce both the model-based (naive) and the empirical (robust) covariance matrices. Output 37.7.6 contains these estimates.

Output 37.7.6 Covariance Matrices

Covariance Matrix (Model-Based)
        Prm1       Prm2       Prm3       Prm4
Prm1    0.01223    0.001520   0.01223    0.001520
Prm2    0.001520   0.01519    0.001520   0.01519
Prm3    0.01223    0.001520   0.02495    0.005427
Prm4    0.001520   0.01519    0.005427   0.03748

Covariance Matrix (Empirical)
        Prm1       Prm2       Prm3       Prm4
Prm1    0.02476    0.001152   0.02476    0.001152
Prm2    0.001152   0.01348    0.001152   0.01348
Prm3    0.02476    0.001152   0.03751    0.002999
Prm4    0.001152   0.01348    0.002999   0.02931

The two covariance estimates are similar, indicating an adequate correlation model.

Example 37.8: Model Assessment of Multiple Regression Using Aggregates of Residuals

This example illustrates the use of cumulative residuals to assess the adequacy of a normal linear regression model. Neter et al. (1996, Section 8.2) describe a study of 54 patients undergoing a certain kind of liver operation in a surgical unit. The data consist of the survival time and certain covariates. After a model selection procedure
33. Structures

Keyword               Log Odds Ratio Regression Structure
EXCH                  Exchangeable
FULLCLUST             Fully parameterized clusters
LOGORVAR(variable)    Indicator variable for specifying block effects
NESTK                 k-nested
NEST1                 1-nested
ZFULL                 Fully specified z matrix specified in ZDATA= data set
ZREP                  Single cluster specification for replicated z matrix specified in ZDATA= data set
ZREP(matrix)          Single cluster specification for replicated z matrix

MAXITER=number
MAXIT=number
specifies the maximum number of iterations allowed in the iterative GEE estimation process. The default number is 50.

MCORRB
displays the estimated regression parameter model-based correlation matrix.

MCOVB
displays the estimated regression parameter model-based covariance matrix.

MODELSE
displays an analysis of parameter estimates table that uses model-based standard errors for inference. By default, an "Analysis of Parameter Estimates" table based on empirical standard errors is displayed.

PRINTMLE
displays an analysis of maximum likelihood parameter estimates table. The maximum likelihood estimates are not displayed unless this option is specified.

RUPDATE=number
specifies the number of iterations between updates of the working correlation matrix. For example, RUPDATE=5 specifies that the working correlation is updated once for every five regression parameter updates. The default value of number is 1; that is, the working correlation is updated every time the regress
34. Y = X1 X2 X3 / scale=Pearson;
   assess var=(X1) / resample=10000 seed=603708000 crpanel;
run;

Output 37.8.2 Regression Model for Linear X1

The GENMOD Procedure

Analysis Of Maximum Likelihood Parameter Estimates
                            Standard   Wald 95%             Wald
Parameter   DF   Estimate   Error      Confidence Limits    Chi-Square   Pr > ChiSq
Intercept   1    0.4836     0.0426     0.4001   0.5672      128.71       <.0001
X1          1    0.0692     0.0041     0.0612   0.0772      288.17       <.0001
X2          1    0.0093     0.0004     0.0085   0.0100      590.45       <.0001
X3          1    0.0095     0.0003     0.0089   0.0101      966.07       <.0001
Scale       0    0.0469     0.0000     0.0469   0.0469

NOTE: The scale parameter was estimated by the square root of Pearson's Chi-Square/DOF.

See Lin, Wei, and Ying (2002) for details about model assessment that uses cumulative residual plots. The RESAMPLE= keyword specifies that a p-value be computed based on a sample of 10,000 simulated residual paths. A random number seed is specified by the SEED= keyword for reproducibility. If you do not specify the seed, one is derived from the time of day. The keyword CRPANEL specifies that the panel of four cumulative residual plots shown in Output 37.8.4 be created, each with two simulated paths. The single residual plot with 20 simulated paths in Output 37.8.3 is created by default.

These graphical displays are requested by specifying the ODS GRAPHICS statement and the ASSESS statement. For general information about ODS Graphics, see Chapter 21, "Statistical Graphics Using ODS." For specific i
35. a 1:5 matched study, this table enables you to verify that every stratum in the analysis has exactly one event and five non-events. Strata that contain only events or only non-events are reported in this table, but such strata are uninformative and are not used in the analysis. For an exact Poisson regression, the "Strata Summary" table displays the number of strata that contain a specific number of observations, which enables you to check whether every stratum in the analysis has the same number of observations.

The ASSESSMENT, BAYES, CONTRAST, EFFECTPLOT, ESTIMATE, LSMEANS, LSMESTIMATE, OUTPUT, SLICE, and STORE statements are not available with a STRATA statement. Exact analyses are not performed when you specify a WEIGHT statement, a model other than LINK=LOGIT with DIST=BIN or LINK=LOG with DIST=POISSON, or an offset variable.

The following option can be specified for a stratification variable by enclosing the option in parentheses after the variable name, or it can be specified globally for all STRATA variables after a slash (/).

MISSING
treats missing values (., .A, ..., .Z for numeric variables and blanks for character variables) as valid STRATA variable values.

The following strata options are also available after the slash.

CHECKDEPENDENCY | CHECK=keyword
specifies which variables are to be tested for dependency before the analysis is performed. The available keywords are as follows:

NONE performs no dependence c
36. a G(c, c) prior.

ISCALE=c
when specified alone, results in a G(c, c) prior.

An inverse gamma prior IG(a, b) with density $f(t) = \frac{b^a}{\Gamma(a)} t^{-(a+1)} e^{-b/t}$ is specified by DISPERSIONPRIOR=IGAMMA, which can be followed by one of the following inverse-gamma-options enclosed in parentheses. The hyperparameters a and b are the shape and scale parameters of the inverse gamma distribution, respectively. See the section "Inverse Gamma Prior" on page 2550 for details. The default is IG(2.001, 0.001).

RELSHAPE<=c>
specifies an independent inverse gamma distribution whose hyperparameters are determined by c and by the MLE of the dispersion parameter.

SHAPE=a
SCALE=b
when both specified, results in an IG(a, b) prior.

SHAPE=c
when specified alone, results in an IG(c, c) prior.

SCALE=c
when specified alone, results in an IG(c, c) prior.

An improper prior with density f(t) proportional to $t^{-1}$ is specified with DISPERSIONPRIOR=IMPROPER.

INITIAL=SAS-data-set
specifies the SAS data set that contains the initial values of the Markov chains. The INITIAL= data set must contain all the variables of the model. You can specify multiple rows as the initial values of the parallel chains for the Gelman-Rubin statistics, but posterior summaries, diagnostics, and plots are computed only for the first chain. If the data set also contains the variable SEED_
37. a SAS data set of clinical trial data comparing two treatments for a respiratory disorder. See "Gee Model for Binary Data" in the SAS/STAT Sample Program Library for the complete data set. These data are from Stokes, Davis, and Koch (2000).

Patients in each of two centers are randomly assigned to groups receiving the active treatment or a placebo. During treatment, respiratory status, represented by the variable outcome (coded here as 0=poor, 1=good), is determined for each of four visits. The variables center, treatment, sex, and baseline (baseline respiratory status) are classification variables with two levels. The variable age (age at time of entry into the study) is a continuous variable.

Explanatory variables in the model are Intercept ($x_{ij1}$), treatment ($x_{ij2}$), center ($x_{ij3}$), sex ($x_{ij4}$), age ($x_{ij5}$), and baseline ($x_{ij6}$), so that $x_{ij} = [x_{ij1}, x_{ij2}, \ldots, x_{ij6}]'$ is the vector of explanatory variables. Indicator variables for the classification explanatory variables can be automatically generated by listing them in the CLASS statement in PROC GENMOD. To be consistent with the analysis in Stokes, Davis, and Koch (2000), the four classification explanatory variables are coded as follows via options in the CLASS statement:

Treatment: 0 = placebo, 1 = active
Center: 0 = center 1, 1 = center 2
Sex: 0 = male, 1 = female
Baseline: 0 = 0, 1 = 1

Suppose $y_{ij}$ represents the respiratory status of patient i at the jth v
38. analyzed are the 16 selected cases in Lipsitz et al. (1994). The binary response is the wheezing status of 16 children at ages 9, 10, 11, and 12 years. The mean response is modeled as a logistic regression model by using the explanatory variables city of residence, age, and maternal smoking status at the particular age. The binary responses for individual children are assumed to be equally correlated, implying an exchangeable correlation structure.

The data set and SAS statements that fit the model by the GEE method are as follows:

data six;
   input case city$ @@;
   do i=1 to 4;
      input age smoke wheeze @@;
      output;
   end;
   datalines;
   (16 data lines, one per case: case number, city of residence (portage or kingston), then the smoking status and wheezing status at ages 9, 10, 11, and 12)
;
run;

proc genmod data=six;
   class case city;
   model wheeze = city age smoke / dist=bin;
   repeated sub
39. and E. J. Snell, eds., Statistical Theory and Modelling, London: Chapman & Hall.

Diggle, P. J., Liang, K. Y., and Zeger, S. L. (1994), Analysis of Longitudinal Data, Oxford: Clarendon Press.

Dobson, A. (1990), An Introduction to Generalized Linear Models, London: Chapman & Hall.

Firth, D. (1991), "Generalized Linear Models," in D. V. Hinkley, N. Reid, and E. J. Snell, eds., Statistical Theory and Modelling, London: Chapman & Hall.

Fischl, M. A., Richman, D. D., and Hansen, N. (1990), "The Safety and Efficacy of Zidovudine (AZT) in the Treatment of Subjects with Mildly Symptomatic Human Immunodeficiency Virus Type I (HIV) Infection," Annals of Internal Medicine, 112, 727-737.

Gamerman, D. (1997), "Efficient Sampling from the Posterior Distribution in Generalized Linear Models," Statistics and Computing, 7, 57-68.

Gilks, W. (2003), "Adaptive Metropolis Rejection Sampling (ARMS)," software from MRC Biostatistics Unit, Cambridge, UK, http://www.maths.leeds.ac.uk/~wally.gilks/adaptive.rejection/web_page/Welcome.html.

Gilks, W. R., Best, N. G., and Tan, K. K. C. (1995), "Adaptive Rejection Metropolis Sampling with Gibbs Sampling," Applied Statistics, 44, 455-472.

Gilks, W. R., Richardson, S., and Spiegelhalter, D. J. (1996), Markov Chain Monte Carlo in Practice, London: Chapman & Hall.

Gilks, W. R. and Wild, P. (1992), "Adaptive Rejection Sampling for Gibbs Sampling," Applied Statist
40. and 75th percentile points, but you can use the global PERCENT= option to request specific percentile points.

INTERVAL
produces equal-tail credible intervals and HPD intervals. The default is to produce the 95% equal-tail credible intervals and 95% HPD intervals, but you can use the global ALPHA= option to request intervals of any probabilities.

THINNING=number
THIN=number
controls the thinning of the Markov chain. Only one in every k samples is used when THINNING=k, and if NBI=$n_0$ and NMC=$n$, the number of samples kept is

$\left[\frac{n_0 + n}{k}\right] - \left[\frac{n_0}{k}\right]$

where $[a]$ represents the integer part of the number a. The default is THINNING=1.

BY Statement

BY variables ;

You can specify a BY statement with PROC GENMOD to obtain separate analyses on observations in groups that are defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables. If you specify more than one BY statement, only the last one specified is used.

If your input data set is not sorted in ascending order, use one of the following alternatives:

- Sort the data by using the SORT procedure with a similar BY statement.
- Specify the NOTSORTED or DESCENDING option in the BY statement for the GENMOD procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessar
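The effect of the THINNING= option on the number of retained posterior samples can be sketched in a few lines of Python. This is an illustration only: it assumes that the retained count is the integer part of (NBI + NMC)/k minus the integer part of NBI/k, and the function name samples_kept is hypothetical rather than part of any SAS interface.

```python
def samples_kept(nbi, nmc, thin):
    """Retained sample count, assuming floor((nbi + nmc) / thin) - floor(nbi / thin)."""
    if thin < 1:
        raise ValueError("thin must be a positive integer")
    # Integer division gives the integer part for nonnegative arguments.
    return (nbi + nmc) // thin - nbi // thin

# With no thinning, every post-burn-in iteration is kept.
no_thinning = samples_kept(2000, 10000, 1)

# Keeping one in every five samples retains a fifth of the iterations.
every_fifth = samples_kept(2000, 10000, 5)
```

When the burn-in length is not a multiple of k, the two integer parts do not cancel evenly, which is why the count is written as a difference rather than as NMC/k.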
41. chi-square divided by the degrees of freedom. Covariances, standard errors, and p-values are computed for the estimated parameters based on the asymptotic normality of maximum likelihood estimators.

A number of popular link functions and probability distributions are available in the GENMOD procedure. The built-in link functions are as follows:

- identity: $g(\mu) = \mu$
- logit: $g(\mu) = \log(\mu / (1 - \mu))$
- probit: $g(\mu) = \Phi^{-1}(\mu)$, where $\Phi$ is the standard normal cumulative distribution function
- power: $g(\mu) = \mu^{\lambda}$ if $\lambda \ne 0$, and $g(\mu) = \log(\mu)$ if $\lambda = 0$
- log: $g(\mu) = \log(\mu)$
- complementary log-log: $g(\mu) = \log(-\log(1 - \mu))$

The available distributions and associated variance functions are as follows:

- normal: $V(\mu) = 1$
- binomial (proportion): $V(\mu) = \mu(1 - \mu)$
- Poisson: $V(\mu) = \mu$
- gamma: $V(\mu) = \mu^2$
- inverse Gaussian: $V(\mu) = \mu^3$
- negative binomial: $V(\mu) = \mu + k\mu^2$
- geometric: $V(\mu) = \mu + \mu^2$
- multinomial
- zero-inflated Poisson
- zero-inflated negative binomial

The negative binomial and zero-inflated negative binomial are distributions with an additional parameter k in the variance function. PROC GENMOD estimates k by maximum likelihood, or you can optionally set it to a constant value. See McCullagh and Nelder (1989), Hilbe (1994), Hilbe (2007), Long (1997), Cameron and Trivedi (1998), or Lawless (1987) for discussions of the negative binomial distribution.

The multinomial distribution is sometimes used to
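For readers who want to experiment with the link and variance functions listed in this excerpt, the following Python sketch writes them out using their standard textbook forms. It is an illustration under that assumption, not PROC GENMOD code; the dictionary keys and function names are hypothetical.

```python
import math
from statistics import NormalDist

# Link functions g(mu), written in their standard forms.
LINKS = {
    "identity": lambda mu: mu,
    "logit": lambda mu: math.log(mu / (1.0 - mu)),
    "probit": lambda mu: NormalDist().inv_cdf(mu),   # inverse standard normal CDF
    "log": lambda mu: math.log(mu),
    "cloglog": lambda mu: math.log(-math.log(1.0 - mu)),
}

def power_link(mu, lam):
    """Power link: mu**lam when lam is nonzero, log(mu) when lam is zero."""
    return math.log(mu) if lam == 0 else mu ** lam

# Variance functions V(mu) for distributions without an extra parameter.
VARIANCE = {
    "normal": lambda mu: 1.0,
    "binomial": lambda mu: mu * (1.0 - mu),
    "poisson": lambda mu: mu,
    "gamma": lambda mu: mu ** 2,
    "inverse_gaussian": lambda mu: mu ** 3,
    "geometric": lambda mu: mu + mu ** 2,
}

def negbin_variance(mu, k):
    """Negative binomial variance function with dispersion parameter k."""
    return mu + k * mu ** 2
```

For example, LINKS["logit"](0.5) is 0 and VARIANCE["binomial"](0.25) is 0.1875; setting k to 1 in negbin_variance gives the geometric variance function.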
42. computed from the chi-square distribution with the numerator's degrees of freedom.

Wald Statistics for Type 3 Analysis

If you specify the TYPE3 and WALD model options, a table is displayed that contains the name of the effect, the degrees of freedom of the effect, the Wald statistic for testing the significance of the effect, and the p-value computed from the chi-square distribution.

Parameter Information

If you specify the ITPRINT, COVB, CORRB, WALDCI, or LRCI option in the MODEL statement, or if you specify a CONTRAST statement, a table is displayed that identifies parameters with numbers, rather than names, for use in tables and matrices where a compact identifier for parameters is helpful. For each parameter, the table contains an index number that identifies the parameter, and the parameter name, including level information for effects containing classification variables.

Observation Statistics

If you specify the OBSTATS option in the MODEL statement, PROC GENMOD displays a table containing miscellaneous statistics. Residuals and case deletion diagnostic statistics are not available for the multinomial distribution. Case deletion diagnostics are not available for zero-inflated models. For each observation in the input data set, the following are displayed:

- the value of the response variable
- the predicted value of the mean
- the value of the linear predictor. The value of an OFFSET variabl
43. effect; X= CLASS variable or effect

SLICEFIT plot-type
Displays a curve of predicted values versus a continuous variable, grouped by the levels of a CLASS effect. Options: PLOTBY= variable or CLASS effect; SLICEBY= variable or CLASS effect; X= continuous variable.

For full details about the syntax and options of the EFFECTPLOT statement, see the section "EFFECTPLOT Statement" on page 436 of Chapter 19, "Shared Concepts and Topics."

ESTIMATE Statement

ESTIMATE 'label' effect values <, ... effect values> </ options> ;

The ESTIMATE statement is similar to a CONTRAST statement, except only one-row L matrices are permitted. In the case of zero-inflated (ZI) models, the statement syntax is

ESTIMATE 'label' effect values <, ... effect values> @ZERO effect values <, ... effect values> </ options> ;

where sets of effects values before the @ZERO separator correspond to the regression part of the model, and effects values after the @ZERO separator correspond to the zero-inflation part of the model. In the case of ZI models, a one-row L matrix is created for the regression part of the model, another one-row L matrix is created for the zero-inflation part of the model, and separate estimates for the two L matrices are computed and displayed.

If you use the default less-than-full-rank GLM CLASS variable parameterization, each row is checked for estimability. If PROC GENMO
44. estimator of the covariance matrix of $\hat{\beta}$, where

$I_1 = \sum_{i=1}^{K} \left(\frac{\partial \mu_i}{\partial \beta}\right)' V_i^{-1} \mathrm{Cov}(Y_i) V_i^{-1} \frac{\partial \mu_i}{\partial \beta}$

It has the property of being a consistent estimator of the covariance matrix of $\hat{\beta}$, even if the working correlation matrix is misspecified, that is, if $\mathrm{Cov}(Y_i) \ne V_i$. See Zeger, Liang, and Albert (1988), Royall (1986), and White (1982) for further information about the robust variance estimate. In computing $\Sigma_e$, $\beta$ and $\phi$ are replaced by estimates, and $\mathrm{Cov}(Y_i)$ is replaced by the estimate

$(Y_i - \mu_i(\hat{\beta}))(Y_i - \mu_i(\hat{\beta}))'$

Multinomial GEEs

Lipsitz, Kim, and Zhao (1994) and Miller, Davis, and Landis (1993) describe how to extend GEEs to multinomial data. Currently, only the independent working correlation is available for multinomial models in PROC GENMOD.

Alternating Logistic Regressions

If the responses are binary (that is, they take only two values), then there is an alternative method to account for the association among the measurements. The alternating logistic regressions (ALR) algorithm of Carey, Zeger, and Diggle (1993) models the association between pairs of responses with log odds ratios, instead of with correlations, as ordinary GEEs do.

For binary data, the correlation between the jth and kth response is, by definition,

$\mathrm{Corr}(Y_{ij}, Y_{ik}) = \frac{\Pr(Y_{ij} = 1, Y_{ik} = 1) - \mu_{ij}\mu_{ik}}{\sqrt{\mu_{ij}(1 - \mu_{ij})\mu_{ik}(1 - \mu_{ik})}}$

The joint probability in the numerator satisfies the following bounds, by elementary properties of pro
45. for X1 is more appropriate. Under the revised model, the p-values for testing the functional forms of X2 and X3 are 0.20 and 0.63, respectively, and the p-value for testing the linearity of the model is 0.65. Thus, the revised model seems reasonable.

Output 37.8.7 Multiple Regression Model with Log(X1)

The GENMOD Procedure

Analysis Of Maximum Likelihood Parameter Estimates
                            Standard   Wald 95%             Wald
Parameter   DF   Estimate   Error      Confidence Limits    Chi-Square   Pr > ChiSq
Intercept   1    0.1844     0.0504     0.0857   0.2832        13.41      0.0003
LogX1       1    0.9121     0.0491     0.8158   1.0083       345.05      <.0001
X2          1    0.0095     0.0004     0.0088   0.0102       728.62      <.0001
X3          1    0.0096     0.0003     0.0090   0.0101      1139.73      <.0001
Scale       0    0.0434     0.0000     0.0434   0.0434

NOTE: The scale parameter was estimated by the square root of Pearson's Chi-Square/DOF.

Output 37.8.8 Cumulative Residual Plot with Log(X1)
(Figure: "Checking Functional Form for LogX1", observed path and first 20 simulated paths plotted against LogX1; Pr > MaxAbsVal: 0.4777, 10000 simulations.)

Example 37.9: Assessment of a Marginal Model for Dependent Data

This example illustrates the use of cumulative residuals to assess the adequacy of a marginal model for dependent data fit by generalized estimating equations (GEEs). The assessment methods are applied to CD4 count data from an AIDS clinical trial reported by Fisch
46. for numeric variables with no explicit format, which are sorted by their unformatted (internal) values

FREQ        Descending frequency count; levels with more observations come earlier in the order
INTERNAL    Unformatted value

For more information about sorting order, see the chapter on the SORT procedure in the Base SAS Procedures Guide and the discussion of BY-group processing in SAS Language Reference: Concepts.

PARAM=keyword
specifies the parameterization method for the classification variable or variables. You can specify any of the keywords shown in the following table. Design matrix columns are created from CLASS variables according to the corresponding coding schemes.

Value of PARAM=             Coding
EFFECT                      Effect coding
GLM                         Less-than-full-rank reference cell coding (this keyword can be used only in a global option)
ORDINAL | THERMOMETER       Cumulative parameterization for an ordinal CLASS variable
POLYNOMIAL | POLY           Polynomial coding
REFERENCE | REF             Reference cell coding
ORTHEFFECT                  Orthogonalizes PARAM=EFFECT coding
ORTHORDINAL | ORTHOTHERM    Orthogonalizes PARAM=ORDINAL coding
ORTHPOLY                    Orthogonalizes PARAM=POLYNOMIAL coding
ORTHREF                     Orthogonalizes PARAM=REFERENCE coding

All parameterizations are full rank, except for the GLM parameterization. The REF= option in the CLASS statement determines the reference level for EFFECT and REFERENCE coding and for their orthogonal parameterizations.

If PARAM=ORTHPOLY or PARAM=POLY, and the classification
47. is less than 8.

LOESS<(number)>
LOWESS<(number)>
requests model assessment based on loess smoothed residuals with optional number, the fraction of data used; number must be between zero and one. If number is not specified, the default value one-third is used.

NPATHS=number
NPATH=number
PATHS=number
PATH=number
specifies the number of simulated paths to plot in the default aggregate residuals plot. The default value of number is twenty.

RESAMPLE<=number>
RESAMPLES<=number>
specifies that a p-value be computed based on 1,000 simulated paths, or number paths, if number is specified.

SEED=number
specifies a seed for the normal random number generator used in creating simulated realizations of aggregates of residuals for plots and estimating p-values. Specifying a seed enables you to produce identical graphs and p-values from one run of the procedure to the next run. If a seed is not specified, or if number is negative or zero, a random number seed is derived from the time of day.

WINDOW<(number)>
requests assessment based on a moving sum window of width number. If number is not specified, a value of one-half of the range of the x-coordinate is used.

BAYES Statement

BAYES <options> ;

The BAYES statement requests a Bayesian analysis of the regression model by using Gibbs sampling. The Bayesian posterior samples (also known as the chain) for the regression
48. model as $e_i = y_i - \hat{\mu}_i$, and let $x_{ij}$ be the value of the jth covariate in the model for observation i. Then to check the functional form of the jth covariate, consider the cumulative sum of residuals with respect to $x_{ij}$,

$W_j(x) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} I(x_{ij} \le x) e_i$

where $I(\cdot)$ is the indicator function. For any x, $W_j(x)$ is the sum of the residuals with values of $x_{ij}$ less than or equal to x.

Denote the score, or gradient vector, by

$U(\beta) = \sum_{i=1}^{n} h(x_i'\beta) x_i (y_i - \nu(x_i'\beta))$

where $\nu(r) = g^{-1}(r)$, and

$h(r) = \frac{1}{g'(\nu(r)) V(\nu(r))}$

Let J be the Fisher information matrix

$J(\beta) = -\frac{\partial U(\beta)}{\partial \beta'}$

Define

$\hat{W}_j(x) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left[ I(x_{ij} \le x) + \eta'(x; \hat{\beta}) J^{-1}(\hat{\beta}) x_i h(x_i'\hat{\beta}) \right] e_i Z_i$

where

$\eta(x; \beta) = -\sum_{i=1}^{n} I(x_{ij} \le x) \frac{\partial \nu(x_i'\beta)}{\partial \beta}$

and $Z_i$ are independent N(0, 1) random variables. Then the conditional distribution of $\hat{W}_j(x)$, given $(y_i, x_i)$, $i = 1, \ldots, n$, under the null hypothesis $H_0$ that the model for the mean is correct, is the same asymptotically as $n \to \infty$ as the unconditional distribution of $W_j(x)$ (Lin, Wei, and Ying 2002).

You can approximate realizations from the null hypothesis distribution of $W_j(x)$ by repeatedly generating normal samples $Z_i$, $i = 1, \ldots, n$, while holding $(y_i, x_i)$, $i = 1, \ldots, n$, at their observed values and computing $\hat{W}_j(x)$ for each sample.

You can assess the functional form of covariate j by plotting a few realizations of $\hat{W}_j(x)$ on the same plot as the observed $W_j(x)$ and visually comparing to see ho
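The observed cumulative residual process described in this excerpt, the sum of the residuals whose covariate value does not exceed x, scaled by the square root of n, is simple to compute directly. The following Python sketch is illustrative only: the function names and the four-observation data set are hypothetical, and the simulated null paths are not attempted here because they require the fitted model's score and information quantities.

```python
import math

def cumulative_residual(x_values, residuals, x):
    """W_j(x): sum of residuals e_i with x_ij <= x, divided by sqrt(n)."""
    n = len(residuals)
    total = sum(e for xi, e in zip(x_values, residuals) if xi <= x)
    return total / math.sqrt(n)

def observed_path(x_values, residuals):
    """W_j evaluated at each distinct covariate value, in increasing order."""
    return [(x, cumulative_residual(x_values, residuals, x))
            for x in sorted(set(x_values))]

def max_abs(path):
    """Largest absolute value along a path (the supremum-type summary)."""
    return max(abs(w) for _, w in path)

# Hypothetical covariate values and raw residuals, for illustration only.
x_values = [1.0, 2.0, 3.0, 4.0]
residuals = [0.5, -1.0, 0.25, 0.25]

path = observed_path(x_values, residuals)
peak = max_abs(path)
```

Because these made-up residuals sum to zero, the path returns to zero at the largest covariate value; a path that wanders far from zero in between is what suggests a misspecified functional form.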
49. or score statistic. The Wald statistic for testing $L'\beta = 0$ is defined by $S = (L'\hat{\beta})' (L'\hat{\Sigma} L)^{-} (L'\hat{\beta})$, where $\hat{\beta}$ is the maximum likelihood estimate and $\hat{\Sigma}$ is its estimated covariance matrix. The asymptotic distribution of S is $\chi^2_r$, where r is the rank of L. Computed p-values are based on this distribution. If you specify a GEE model with the REPEATED statement, $\hat{\Sigma}$ is the empirical covariance matrix estimate.

DEVIANCE Statement

DEVIANCE variable = expression ;

You can specify a probability distribution other than those available in PROC GENMOD by using the DEVIANCE and VARIANCE statements. You do not need to specify the DEVIANCE or VARIANCE statement if you use the DIST= MODEL statement option to specify a probability distribution. The variable identifies the deviance contribution from a single observation to the procedure, and it must be a valid SAS variable name that does not appear in the input data set. The expression can be any arithmetic expression supported by the DATA step language, and it is used to define the functional dependence of the deviance on the mean and the response. You use the automatic variables _MEAN_ and _RESP_ to represent the mean and response in the expression.

Alternatively, the deviance function can be defined using programming statements (see the section Programming Statements on page 2502) and assigned to a variable, which is then listed as the expression. This for
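As an illustration of the programming-statement form, the following sketch defines a Poisson-type variance function and deviance by hand. The data set Work.Claims, the response c, and the covariate x are hypothetical; the names mu, y, d, var, and dev are illustrative and are assumed not to exist in the input data set.

```sas
/* Hypothetical data set Work.Claims with count response c and covariate x. */
proc genmod data=Work.Claims;
   /* Programming statements: _MEAN_ and _RESP_ are automatic variables. */
   mu = _MEAN_;
   y  = _RESP_;
   /* Poisson deviance contribution of one observation. The y = 0 case   */
   /* is handled separately because y*log(y/mu) cannot be evaluated      */
   /* there; its limiting value is 0.                                    */
   if y > 0 then d = 2 * (y * log(y / mu) - (y - mu));
   else d = 2 * mu;
   variance var = mu;   /* variance function V(mu) = mu            */
   deviance dev = d;    /* deviance contribution assigned above    */
   model c = x / link=log;
run;
```

No DIST= option is given here, so the distribution is described entirely by the VARIANCE and DEVIANCE statements.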
50. parameters are not tabulated. The Bayesian posterior samples (also known as the chain) for the regression parameters can be output to a SAS data set.

Table 37.1 summarizes the options available in the BAYES statement.

Table 37.1 BAYES Statement Options

Option              Description

Monte Carlo Options
INITIAL=            Specifies the initial values of the chain
INITIALMLE          Specifies that maximum likelihood estimates be used as initial values of the chain
METROPOLIS=         Specifies the use of a Metropolis step in the ARMS algorithm
NBI=                Specifies the number of burn-in iterations
NMC=                Specifies the number of iterations after burn-in
SAMPLING=           Specifies the algorithm used to sample the posterior distribution
SEED=               Specifies the random number generator seed
THINNING=           Controls the thinning of the Markov chain

Model and Prior Options
COEFFPRIOR=         Specifies the prior of the regression coefficients
DISPERSIONPRIOR=    Specifies the prior of the dispersion parameter
PRECISIONPRIOR=     Specifies the prior of the precision parameter
SCALEPRIOR=         Specifies the prior of the scale parameter

Summary Statistics and Convergence Diagnostics
DIAGNOSTICS=        Displays convergence diagnostics
PLOTS=              Displays diagnostic plots
STATISTICS=         Displays summary statistics of the posterior samples

Posterior Samples
OUTPOST=            Names a SAS data set for the posterior samples

The following list descr
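Several of the options in Table 37.1 might be combined as in the following sketch. The data set Work.MyData and the variables y, x1, and x2 are hypothetical, and the numeric settings are arbitrary values chosen for illustration rather than recommendations.

```sas
/* Hypothetical data set and variables. */
proc genmod data=Work.MyData;
   model y = x1 x2 / dist=normal;
   bayes seed=1             /* fixes the random number stream            */
         nbi=2000           /* number of burn-in iterations              */
         nmc=10000          /* number of iterations after burn-in        */
         thinning=2         /* thins the Markov chain                    */
         coeffprior=normal  /* normal prior for regression coefficients  */
         outpost=Work.Post; /* data set that receives posterior samples  */
run;
```

The OUTPOST= data set can then be processed with other procedures or DATA steps.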
51. produce a data set that contains predicted values and residuals for each observation. This data set can be useful for further analysis, such as residual plotting. The results from these statements are displayed in Output 37.2.1.

Output 37.2.1 Log-Linked Normal Regression

The GENMOD Procedure

Model Information
Data Set              WORK.NOR
Distribution          Normal
Link Function         Log
Dependent Variable    y

Criteria For Assessing Goodness Of Fit
Criterion                   DF       Value    Value/DF
Deviance                    14     52.3000      3.7357
Scaled Deviance             14     16.0000      1.1429
Pearson Chi-Square          14     52.3000      3.7357
Scaled Pearson X2           14     16.0000      1.1429
Log Likelihood                    -32.1783
Full Log Likelihood               -32.1783
AIC (smaller is better)            70.3566
AICC (smaller is better)           72.3566
BIC (smaller is better)            72.6743

Analysis Of Maximum Likelihood Parameter Estimates
                        Standard    Wald 95% Confidence      Wald
Parameter  DF  Estimate    Error          Limits          Chi-Square  Pr > ChiSq
Intercept   1    1.7214   0.0894    1.5461    1.8966        370.76      <.0001
x           1    0.3496   0.0206    0.3091    0.3901        286.64      <.0001
Scale       1    1.8080   0.3196    1.2786    2.5566
NOTE: The scale parameter was estimated by maximum likelihood.

The PROC GENMOD scale parameter, in the case of the normal distribution, is the standard deviation. By default, the scale parameter is estimated by maximum likelihood. You can specify a fixed standard deviation by using the NOSCALE and SCALE= options in the MO
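A fixed standard deviation might be requested as in the following sketch. The data set WORK.NOR and the variables y and x are the names shown in Output 37.2.1; the value SCALE=2 is an arbitrary number chosen only for illustration.

```sas
/* SCALE=2 is an arbitrary illustrative value, not taken from the example. */
proc genmod data=Work.Nor;
   /* NOSCALE holds the scale parameter fixed at the SCALE= value, so  */
   /* the standard deviation is not estimated by maximum likelihood.   */
   model y = x / dist=normal link=log noscale scale=2;
run;
```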
52. seed=1 coeffprior=normal;
run;

Maximum likelihood estimates of the model parameters are computed by default. These are shown in the "Analysis of Maximum Likelihood Parameter Estimates" table in Output 37.10.1.

Output 37.10.1 Maximum Likelihood Parameter Estimates

The GENMOD Procedure
Bayesian Analysis

Analysis Of Maximum Likelihood Parameter Estimates
                        Standard    Wald 95% Confidence
Parameter  DF  Estimate    Error          Limits
Intercept   1    2.4508   0.2284    2.0032    2.8984
X1          1   -0.0044   0.0080   -0.0201    0.0114
X2          1   -0.0135   0.0024   -0.0181   -0.0088
X3          1   -0.0029   0.0022   -0.0072    0.0014
X4          1   -0.2715   0.0795   -0.4272   -0.1157
X5          1    0.3215   0.0832    0.1585    0.4845
X6          1    0.2077   0.0827    0.0456    0.3698
Scale       0    1.0000   0.0000    1.0000    1.0000
NOTE: The scale parameter was held fixed.

Noninformative independent normal prior distributions with zero means and variances of $10^6$ were used in the initial analysis. These are shown in Output 37.10.2.

Output 37.10.2 Regression Coefficient Priors

The GENMOD Procedure
Bayesian Analysis

Independent Normal Prior for Regression Coefficients
Parameter    Mean    Precision
Intercept       0         1E-6
X1              0         1E-6
X2              0         1E-6
X3              0         1E-6
X4              0         1E-6
X5              0         1E-6
X6              0         1E-6

Initial values for the Markov chain are listed in the "Initial Values and Seeds" table in Output 37.10.3. The random number seed is also listed so that you can reproduce the analysis. Since no seed was specified th
54. statement computes and plots, using ODS Graphics, model-checking statistics based on aggregates of residuals. See the section Assessment of Models Based on Aggregates of Residuals on page 2541 for details about the model assessment methods available in GENMOD.

The types of aggregates available are cumulative residuals, moving sums of residuals, and loess-smoothed residuals. If you do not specify which aggregate to use, the assessments are based on cumulative sums. PROC GENMOD uses ODS Graphics for graphical displays. For specific information about the graphics available in PROC GENMOD, see the section ODS Graphics on page 2572.

You must specify either LINK or VAR= in order to create an analysis.

LINK
requests the assessment of the link function by performing the analysis with respect to the linear predictor.

VAR=(effect)
specifies that the functional form of a covariate be checked by performing the analysis with respect to the variable identified by the effect. The effect must be specified in the MODEL statement and must contain only continuous variables (variables not listed in a CLASS statement).

You can specify the following options after the slash (/).

CRPANEL
requests a plot with four panels, each showing just a few of the paths from the default aggregate plot, to make it easier to compare simulated and observed paths. The plot in each panel contains aggregates of the observed residuals and two simulated curves, fewer if NPATHS
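A link-function check with the paneled display might look like the following sketch. The data set Work.Counts and the variables y, x1, and x2 are hypothetical names.

```sas
ods graphics on;

/* Hypothetical Poisson regression; data set and variables are placeholders. */
proc genmod data=Work.Counts;
   model y = x1 x2 / dist=poisson link=log;
   /* LINK assesses the link function with respect to the linear predictor. */
   /* CRPANEL adds the four-panel plot of observed and simulated paths.     */
   assess link / crpanel;
run;

ods graphics off;
```

Replacing LINK with VAR=(x1) would instead check the functional form of the continuous covariate x1.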
55. statistic as a function of observation number.

DFBETA
plots the β deletion statistic as a function of observation number for each regression parameter in the model.

DFBETAS
plots the standardized β deletion statistic as a function of observation number for each regression parameter in the model.

LEVERAGE
plots the leverage as a function of observation number.

PREDICTED<(option)>
plots predicted values with confidence limits as a function of observation number. The PREDICTED plot request has the following option:
   CLM     includes confidence limits in the predicted value plot.

PZERO
plots the zero-inflation probability for zero-inflated Poisson and negative binomial models as a function of observation number.

RESCHI<(options)>
plots Pearson residuals. The RESCHI plot request has the following options:
   INDEX   plots as a function of observation number.
   XBETA   plots as a function of linear predictor.
If you do not specify an option, Pearson residuals are plotted as a function of observation number.

RESDEV<(options)>
plots deviance residuals. The RESDEV plot request has the following options:
   INDEX   plots as a function of observation number.
   XBETA   plots as a function of linear predictor.
If you do not specify an option, deviance residuals are plotted as a function of observation number.

RESLIK<(options)>
plots likelihood residuals. The RESLIK plot request has the following options
56. the GEE model fit. Figure 37.27 displays general information about the GEE model fit.

Figure 37.27 GEE Model Information

The GENMOD Procedure

GEE Model Information
Correlation Structure           Exchangeable
Subject Effect                  case (16 levels)
Number of Clusters              16
Correlation Matrix Dimension    4
Maximum Cluster Size            4
Minimum Cluster Size            4

Figure 37.28 displays the parameter estimate covariance matrices specified by the COVB option. Both model-based and empirical covariances are produced.

Figure 37.28 GEE Parameter Estimate Covariance Matrices

Covariance Matrix (Model-Based)
           Prm1        Prm2        Prm4        Prm5
Prm1    5.74947     0.22257     0.53472     0.01655
Prm2    0.22257     0.45478    0.002410     0.01876
Prm4    0.53472    0.002410     0.05300     0.01658
Prm5    0.01655     0.01876     0.01658     0.19104

Covariance Matrix (Empirical)
           Prm1        Prm2        Prm4        Prm5
Prm1    9.33994     0.85104     0.83253     0.16534
Prm2    0.85104     0.47368     0.05736     0.04023
Prm4    0.83253     0.05736     0.07778    0.002364
Prm5    0.16534     0.04023    0.002364     0.13051

The exchangeable working correlation matrix specified by the CORRW option is displayed in Figure 37.29.

Figure 37.29 GEE Working Correlation Matrix

Working Correlation Matrix
          Col1      Col2      Col3      Col4
Row1    1.0000    0.1648    0.1648    0.1648
Row2    0.1648    1.0000    0.1648    0.1648
Row3    0.1648    0.1648    1.0000    0.1648
Row4    0.1648    0.1648    0.1648    1.0000

The parameter estimates table displayed in Figure 37.30 contain
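Output of this kind might be requested with statements along the following lines. Only the subject effect case, the exchangeable structure, and the COVB and CORRW options come from the figures above; the data set Work.Long, the response resp, and the covariates x1 and x2 are hypothetical.

```sas
/* Hypothetical long-format data: one row per subject (case) and occasion. */
proc genmod data=Work.Long;
   class case;
   model resp = x1 x2 / dist=bin;
   repeated subject=case / type=exch  /* exchangeable working correlation   */
                           covb       /* model-based and empirical          */
                                      /* covariance matrices                */
                           corrw;     /* working correlation matrix         */
run;
```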
57. the EXP option, an additional row is displayed with statistics for the exponentiated value of the contrast.

CONTRAST Coefficients
If you specify the CONTRAST or ESTIMATE statement and you specify the E option, a table titled "Coefficients For Contrast label" is displayed, where label is the label specified in the CONTRAST statement. The table contains the contrast label and the rows of the contrast matrix.

Iteration History for Contrasts
If you specify the ITPRINT option, an iteration history table is displayed for fitting the model with contrast constraints for each effect. The table contains the contrast label, the iteration number, the ridge value, the log likelihood, and values of all parameters.

CONTRAST Statement Results
If you specify a REPEATED statement, the CONTRAST statement results apply to the specified GEE model. Otherwise, they apply to the specified generalized linear model. A table is displayed that contains the contrast label, the degrees of freedom for the contrast, and the likelihood ratio, score, or Wald statistic for testing the significance of the contrast. Score statistics are used in GEE models, likelihood ratio statistics are used in generalized linear models, and Wald statistics are used in both. Also displayed are the p-value computed from the chi-square distribution and the type of statistic computed for this contrast: Wald, LR, or score. If you specify either the SCALE=DEVIANCE or SCALE=PEARSON option for
58. the input data set
FORMATTED   External formatted value, except for numeric variables with no explicit format, which are sorted by their unformatted (internal) value
FREQ        Descending frequency count; levels with the most observations come first in the order
INTERNAL    Unformatted value

By default, RORDER=FORMATTED. For RORDER=FORMATTED and RORDER=INTERNAL, the sort order is machine dependent. The DESCENDING option in the PROC GENMOD statement causes the response variable to be sorted in the reverse of the order displayed in the previous table. For more information about sorting order, refer to the chapter on the SORT procedure in the Base SAS Procedures Guide.

The NOPRINT option, which suppresses displayed output in other SAS procedures, is not available in the PROC GENMOD statement. However, you can use the Output Delivery System (ODS) to suppress all displayed output, store all output on disk for further analysis, or create SAS data sets from selected output. You can suppress all displayed output with the statement ODS SELECT NONE; and turn displayed output back on with the statement ODS SELECT ALL;. See Table 37.8 and Table 37.9 for the names of output tables available from PROC GENMOD. For more information about ODS, see Chapter 20, Using the Output Delivery System.

ASSESS Statement

ASSESS VAR=(effect) | LINK </ options> ;
ASSESSMENT VAR=(effect) | LINK </ options> ;

The ASSESS
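The ODS approach might be used as in the following sketch, which suppresses displayed output while saving one table as a data set. The data set Work.MyData, the variables y and x, and the output data set name Work.PE are hypothetical; ParameterEstimates is used here as the ODS name of the parameter estimates table.

```sas
ods select none;                                /* suppress all displayed output */
ods output ParameterEstimates=Work.PE;          /* save one table as a data set  */

/* Hypothetical data set and variables. */
proc genmod data=Work.MyData;
   model y = x / dist=poisson link=log;
run;

ods select all;                                 /* turn displayed output back on */
```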
59. they arrived at the following model:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \epsilon$$

where $Y$ is the logarithm (base 10) of the survival time; $X_1$, $X_2$, $X_3$ are blood-clotting score, prognostic index, and enzyme function, respectively; and $\epsilon$ is a normal error term.

A listing of the SAS data set containing the data is shown in Output 37.8.1. The variables Y, X1, X2, and X3 correspond to $Y$, $X_1$, $X_2$, and $X_3$, and LogX1 is log(X1). The PROC GENMOD fit of the model is shown in Output 37.8.2. The analysis first focuses on the adequacy of the functional form of X1, blood-clotting score.

Output 37.8.1 Surgical Unit Example Data
[Data listing; columns on this page: Obs, Y, X1]
60. with the ith observation deleted, or in the case of correlated data, with the ith cluster deleted.

p is the dimension of the regression parameter vector β.

$r_{pi} = \dfrac{y_i - \mu_i}{\sqrt{v_i (1 - h_i)}}$ is the standardized Pearson residual, where $v_i$ is the variance of the ith response and $h_i$ is the leverage defined in the section H = LEVERAGE on page 2545.

$v_i$ is the variance of response i, $\mathrm{Var}(Y_i) = \phi V(\mu_i)$, where $V(\mu)$ is the variance function and $\phi$ is the dispersion parameter.

$w_i$ is the prior weight of the ith observation specified with the WEIGHT statement. If there is no WEIGHT statement, $w_i = 1$ for all i.

All unknown quantities are replaced by their estimated values in the following two sections.

Diagnostics for Ordinary Generalized Linear Models

The following statistics are available for generalized linear models.

DFBETA
The DFBETA statistic for measuring the influence of the ith observation is defined as the one-step approximation to the difference in the MLE of the regression parameter vector and the MLE of the regression parameter vector without the ith observation. This one-step approximation assumes a Fisher scoring step, and is given by

$$\mathrm{DFBETA}_i = (X'WX)^{-1} x_i\, W_i^{1/2}\, (1 - h_i)^{-1/2}\, r_{pi}$$

where $h_i$ is the leverage defined in the section H = LEVERAGE on page 2545.

DFBETAS
The standardized DFBETA statistic for assessing the influence of the ith observation on the jth regression par
61. [Diagnostic plots for Intercept: trace against Iteration, autocorrelation against Lag, posterior density]

Figure 37.21 Diagnostic Plots for logX1
[Panel title: Diagnostics for Logx1; trace against Iteration, autocorrelation against Lag, posterior density]

Figure 37.22 Diagnostic Plots for X2
[Panel title: Diagnostics for x2; trace against Iteration, autocorrelation against Lag, posterior density]

Figure 37.23 Diagnostic Plots for X3
[Panel title: Diagnostics for x3; trace against Iteration, autocorrelation against Lag, posterior density]

Figure 37.24 Diagnostic Plots for X4
[Panel title: Diagnostics for x4; trace against Iteration, autocorrelation against Lag, posterior density]

Figure 37.25 Diagnostic Plots for X5
[Panel title: Diagnostics for Dispersion]
62. 1 2 3
 5   0 0 0 0 1 0   1 1 2 4
 6   0 0 0 0 0 1   1 1 3 4
 7   1 0 0 0 0 0   1 2 1 2
 8   0 1 0 0 0 0   1 2 1 3
 9   0 0 1 0 0 0   1 2 1 4
10   0 0 0 1 0 0   1 2 2 3
11   0 0 0 0 1 0   1 2 2 4
12   0 0 0 0 0 1   1 2 3 4

The following statements fit the model for fully parameterized clusters by fully specifying the z matrix. The results are identical to those shown previously.

   proc genmod data=resp descend;
      class id treatment(ref="P") center(ref="1") sex(ref="M")
            baseline(ref="0") / param=ref;
      model outcome=treatment center sex age baseline / dist=bin;
      repeated subject=id(center) / logor=zfull
                                    zdata=zin
                                    zrow=(z1-z6)
                                    ypair=(y1 y2);
   run;

Example 37.7: Log-Linear Model for Count Data

In this example, the data, from Thall and Vail (1990), concern the treatment of people suffering from epileptic seizure episodes. These data are also analyzed in Diggle, Liang, and Zeger (1994). The data consist of the number of epileptic seizures in an eight-week baseline period, before any treatment, and in each of four two-week treatment periods, in which patients received either a placebo or the drug Progabide in addition to other therapy. A portion of the data is displayed in Table 37.12. See "GEE Model for Count Data, Exchangeable Correlation" in the SAS/STAT Sample Program Library for the complete data set.

Table 37.12 Epileptic Seizure Data

Patient ID   Treatment   Baseline   Visit1   Visit2   Visit3   Visit4
104          Placebo     11         5        3        3        3
63. 4 Alpha3
(2,3)   Alpha4
(2,4)   Alpha5
(3,4)   Alpha6

LOGORVAR(variable)
specifies log odds ratios by cluster. The argument variable is a variable name that defines the block effects between clusters. The log odds ratios are constant within clusters, but they take a different value for each different value of the variable. For example, if Center is a variable in the input data set taking a different value for k treatment centers, then specifying LOGOR=LOGORVAR(Center) requests a model with different log odds ratios for each of the k centers, constant within center.

NESTK
specifies k-nested log odds ratios. You must also specify the SUBCLUST=variable option to define subclusters within clusters. Within each cluster, PROC GENMOD computes a log odds ratio parameter for pairs having the same value of variable for both members of the pair and one log odds ratio parameter for each unique combination of different values of variable.

NEST1
specifies 1-nested log odds ratios. You must also specify the SUBCLUST=variable option to define subclusters within clusters. There are two log odds ratio parameters for this model. Pairs having the same value of variable correspond to one parameter; pairs having different values of variable correspond to the other parameter. For example, if clusters are hospitals and subclusters are wards within hospitals, then patients within th
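The hospital and ward illustration for NEST1 might be coded as in the following sketch. The data set Work.Patients and the variables hospital, ward, outcome, and age are hypothetical names introduced for this illustration.

```sas
/* Hypothetical clustered binary data: one row per patient, with */
/* patients grouped in wards and wards grouped in hospitals.      */
proc genmod data=Work.Patients descend;
   class hospital;
   model outcome = age / dist=bin;
   /* 1-nested log odds ratios: one parameter for pairs of patients  */
   /* with the same value of ward, and another for pairs with        */
   /* different values of ward within the same hospital.             */
   repeated subject=hospital / logor=nest1 subclust=ward;
run;
```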
64. 5566     3.2109     5.3929
x3       1     4.0309     0.4996     3.0517     5.0100
x4       1    18.1377    12.0721    -5.5232    41.7986
Scale    1    59.8591     5.7599    49.5705    72.2832
NOTE: The scale parameter was estimated by maximum likelihood.

Since no prior distributions for the regression coefficients were specified, the default noninformative uniform distributions shown in the "Uniform Prior for Regression Coefficients" table in Figure 37.10 are used. Noninformative priors are appropriate if you have no prior knowledge of the likely range of values of the parameters, and if you want to make probability statements about the parameters or functions of the parameters. See, for example, Ibrahim, Chen, and Sinha (2001) for more information about choosing prior distributions.

Figure 37.10 Regression Coefficient Priors

The GENMOD Procedure
Bayesian Analysis

Uniform Prior for Regression Coefficients
Parameter    Prior
Intercept    Constant
Logx1        Constant
x2           Constant
x3           Constant
x4           Constant

The default noninformative gamma prior distribution for the normal scale parameter is shown in the "Independent Prior Distributions for Model Parameters" table in Figure 37.11.

Figure 37.11 Scale Parameter Prior

Independent Prior Distributions for Model Parameters
                                     Hyperparameters
Parameter     Prior Distribution     Shape     Scale
Dispersion    Inverse Gamma          2.001     0.0001

By default, the maximum likelihood estimates of the regress
65. 6
X3    0.0324    0.0036    0.0033    0.0160
X4    0.0309    0.0056    0.0053    0.0115
X5    0.0402    0.0015    0.0111    0.0123
X6    0.0696    0.0047    0.0024    0.0006

The p-values for the Geweke test statistics shown in Output 37.10.9 all indicate convergence of the MCMC. See the section Assessing Markov Chain Convergence on page 155 for more information about convergence diagnostics and their interpretation.

Output 37.10.9 Geweke Diagnostic Statistics

Geweke Diagnostics
Parameter         z    Pr > |z|
Intercept    0.9855      0.3244
X1           1.0835      0.2786
X2           0.3847      0.7005
X3           0.6715      0.5019
X4           0.1328      0.8943
X5           1.0698      0.2847
X6           0.1647      0.8692

The effective sample sizes for each parameter are shown in Output 37.10.10.

Output 37.10.10 Effective Sample Sizes

Effective Sample Sizes
                         Autocorrelation
Parameter        ESS                Time    Efficiency
Intercept     9245.8              1.0816        0.9246
X1            8179.5              1.2226        0.8179
X2            8067.8              1.2395        0.8068
X3            9390.6              1.0649        0.9391
X4            9157.6              1.0920        0.9158
X5            9665.2              1.0346        0.9665
X6            8778.7              1.1391        0.8779

Trace, autocorrelation, and density plots for the seven model parameters are shown in Output 37.10.11 through Output 37.10.17. All indicate satisfactory convergence of the Markov chain.

Output 37.10.11 Diagnostic Plots for Intercept
[Panel title: Diagnostics for Intercept; trace against Iteration]
66. 6 2.00148

Obs    X1    X2    X3     X4      Y    log10(Y)     LogX1
  4   6.5    73    41   2.01    101      2.0043   1.87180
  5   7.8    65   115   4.30    509      2.7067   2.05412
  6   5.8    38    72   1.42     80      1.9031   1.75786
  7   5.7    46    63   1.91     80      1.9031   1.74047
  8   3.7    68    81   2.57    127      2.1038   1.30833
  9   6.0    67    93   2.50    202      2.3054   1.79176
 10   3.7    76    94   2.40    203      2.3075   1.30833
 11   6.3    84    83   4.13    329      2.5172   1.84055
 12   6.7    51    43   1.86     65      1.8129   1.90211
 13   5.8    96   114   3.95    830      2.9191   1.75786
 14   5.8    83    88   3.95    330      2.5185   1.75786
 15   7.7    62    67   3.40    168      2.2253   2.04122
 16   7.4    74    68   2.40    217      2.3365   2.00148
 17   6.0    85    28   2.98     87      1.9395   1.79176
 18   3.7    51    41   1.55     34      1.5315   1.30833
 19   7.3    68    74   3.56    215      2.3324   1.98787
 20   5.6    57    87   3.02    172      2.2355   1.72277

Consider the model

$$Y = \beta_0 + \beta_1 \mathrm{LogX1} + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \epsilon$$

where Y is the survival time, LogX1 is log(blood-clotting score), X2 is a prognostic index, X3 is an enzyme function test score, X4 is a liver function test score, and $\epsilon$ is an $N(0, \sigma^2)$ error term.

A question of scientific interest is whether blood-clotting score has a positive effect on survival time. Using PROC GENMOD, you can obtain a maximum likelihood estimate of the coefficient and construct a null point hypothesis to test whether $\beta_1$ is equal to 0. However, if you are interested in finding the probability that the coefficient is positive, Bayesian analysis offers a convenient alternative. You can use Bayesian analysis to directly estimate the conditional probability $\Pr(\beta_1 > 0 \mid Y)$ using the posterior distribution samples, which are produced a
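One way to estimate $\Pr(\beta_1 > 0 \mid Y)$ from posterior samples is sketched below. The data set name Surg, the response name y, and the output data set names PostSurg and Prob are assumptions made for this illustration; the sketch also assumes that the OUTPOST= data set stores the samples of each coefficient in a variable named after the corresponding parameter (here LogX1).

```sas
/* Assumed names: input data set Surg with variables y, LogX1, X2, X3, X4. */
proc genmod data=Surg;
   model y = LogX1 X2 X3 X4 / dist=normal;
   bayes seed=1 outpost=PostSurg;   /* save the posterior samples */
run;

/* Flag each posterior draw in which the LogX1 coefficient is positive. */
data Prob;
   set PostSurg;
   Indicator = (LogX1 > 0);
run;

/* The mean of the 0/1 indicator estimates Pr(beta1 > 0 | Y). */
proc means data=Prob(keep=Indicator) n mean;
run;
```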
67. 6 Diagnostic Plots for X5
[Panel title: Diagnostics for X5; trace against Iteration, autocorrelation against Lag, posterior density]

Output 37.10.17 Diagnostic Plots for X6
[Panel title: Diagnostics for X6; trace against Iteration, autocorrelation against Lag, posterior density]

In order to illustrate the use of an informative prior distribution, suppose that researchers expect that a unit increase in body mass index (X1) will be associated with an increase in the mean number of nodes of between 10% and 20%, and they want to incorporate this prior knowledge in the Bayesian analysis. For log-linear models, the mean and linear predictor are related by $\log(\mu_i) = x_i'\beta$. If $X_{11}$ and $X_{12}$ are two values of body mass index, $\mu_1$ and $\mu_2$ are the two mean values, and all other covariates remain equal for the two values of X1, then

$$\frac{\mu_1}{\mu_2} = \exp\bigl(\beta (X_{11} - X_{12})\bigr)$$

so that for a unit change in X1,

$$\frac{\mu_1}{\mu_2} = \exp(\beta)$$

If $1.1 \le \mu_1/\mu_2 \le 1.2$, then $1.1 \le \exp(\beta) \le 1.2$, or $0.095 \le \beta \le 0.182$. This gives you guidance in specifying a prior distribution for the $\beta$ for body mass index. Taking the mean of the prior normal distribution to be the m
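The conversion from the multiplicative bounds on the mean to bounds on the coefficient is a natural logarithm, which can be checked with a short DATA step such as the following sketch. The variable names lower and upper are illustrative.

```sas
/* Convert the expected 10% to 20% increase in the mean into */
/* bounds for the log-linear coefficient of X1.              */
data _null_;
   lower = log(1.1);   /* LOG is the natural logarithm */
   upper = log(1.2);
   put lower= upper=;  /* writes both values to the SAS log */
run;
```

Rounded to three decimal places, the two values are the bounds 0.095 and 0.182 quoted above.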
68. 60 23 406 1054 1935 561 348 130 13 230 250 317
304 79 1793 536 12 9 256 201 733 510 660
122 27 273 1231 182 289 667 761 1096 43 44 87
405 998 1409 61 278 407 113 25 940 28 848 41
646 575 219 303 304 38 195 1061 174 377 388 10
246 323 198 234 39 308 55 729 813 1216 1618 539
6 1566 459 946 764 794 35 181 147 116 141 19
380 609 546
;
run;

data lifdat;
   set A B;
run;

The following SAS statements use PROC GENMOD to compute Type 3 statistics to test for differences between the two manufacturers in machine part life. Type 3 statistics are identical to Type 1 statistics in this case, since there is only one effect in the model. The log link function is selected to ensure that the mean is positive.

   proc genmod data=lifdat;
      class mfg;
      model lifetime = mfg / dist=gamma
                             link=log
                             type3;
   run;

The output from these statements is displayed in Output 37.3.1.

Output 37.3.1 Gamma Model of Life Data

Criteria For Assessing Goodness Of Fit
Criterion                    DF        Value    Value/DF
Deviance                    199     287.0591      1.4425
Scaled Deviance             199     237.5335      1.1936
Pearson Chi-Square          199     211.6870      1.0638
Scaled Pearson X2           199     175.1652      0.8802
Log Likelihood                    -1432.4177
Full Log Likelihood               -1432.4177
AIC (smaller is better)            2870.8353
AICC (smaller is better)           2870.9572
BIC (smaller is better)            2880.7453

The GENMOD Procedure
Model Information
69. 86 0 1 1 61
28.7201 52.9178 37.286 1 0 1 6
21.3669 61.6603 54.143 0 1 1 6
23.7332 42.2904 0.571 1 0 1 21
... More lines ...
19.1327 65.3425 2.571 1 0 0 1
17.3010 51.4493 4.429 1 0 0 6
;
run;

The primary interest is in prediction of the number of cancerous liver nodes when a patient enters the trials, by using six other baseline characteristics. The number of nodes is modeled by a Poisson regression model with the six baseline characteristics as covariates. The response and regression variables are as follows:

Y     Number of Cancerous Liver Nodes
X1    Body Mass Index
X2    Age in Years
X3    Time Since Diagnosis of Disease in Weeks
X4    Two Biochemical Markers (each classified as normal=1 or abnormal=0)
X5    Anti Hepatitis B Antigen
X6    Associated Jaundice (yes=1, no=0)

Two analyses are performed using PROC GENMOD. The first analysis uses noninformative normal prior distributions, and the second analysis uses an informative normal prior for one of the regression parameters.

In the following BAYES statement, COEFFPRIOR=NORMAL specifies a noninformative independent normal prior distribution with zero mean and variance $10^6$ for each parameter.

The initial analysis is performed using PROC GENMOD to obtain Bayesian estimates of the regression coefficients by using the following SAS statements:

   ods graphics on;
   proc genmod data=Liver;
      model Y = X1-X6 / dist=Poisson link=log;
      bayes
70. [Output 37.8.1 Surgical Unit Example Data, continued; columns on this page: X2, X3, LogX1]

In order to assess the adequacy of the fitted multiple regression model, the ASSESS statement in the following SAS statements is used to create the plots of cumulative residuals against X1 shown in Output 37.8.3 and Output 37.8.4 and the summary table in Output 37.8.5:

   ods graphics on;
   proc genmod data=Surg;
      model
71. C Procedure. The following discussion of exact Poisson regression, also called exact conditional Poisson regression, uses the notation given in that section.

Note that in exact logistic regression, the coefficients $C(t)$ are the counts of the number of possible response vectors y that generate t: $C(t) = \|\{y : y'X = t'\}\|$. However, when performing an exact Poisson regression, this value is replaced by

$$C(t) = \sum_{y \in \Omega} \prod_{i=1}^{n} \frac{N_i^{y_i}}{y_i!}$$

where $\Omega = \{y : y'X = t'\}$ and $N_i = \exp(o_i)$ is the exponential of the offset $o_i$ for observation i.

The probability density function (pdf) for T is created by summing over all binary sequences y that generate an observable t:

$$\Pr(T = t) = \frac{C(t)\, \exp(t'\beta)}{\prod_{i=1}^{n} \exp\bigl(N_i e^{x_i'\beta}\bigr)}$$

However, the conditional likelihood of $T_I$ given $T_N = t_N$ is the same as that for exact logistic regression. For details about hypothesis testing and estimation, see the sections Hypothesis Tests on page 3976 and Inference for a Single Parameter on page 3977 of Chapter 51, The LOGISTIC Procedure. See the section Computational Resources for Exact Logistic Regression on page 3985 of Chapter 51, The LOGISTIC Procedure, for some computational notes about exact analyses.

The offset variable $o_i$ is required for exact Poisson regression computationally, to provide a stopping point for the algorithm. Denote $N_i = \exp(o_i)$. In exact logistic (binary) regression, there are a finite
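An exact Poisson regression with an offset might be requested as in the following sketch. The data set Work.Counts and its values are invented for illustration, and the variable names x, y, n, and logn are hypothetical; the EXACT statement and its ESTIMATE option are assumed to be used in the form shown.

```sas
/* Invented illustrative data: counts y, exposure n, binary covariate x. */
data Work.Counts;
   input x y n;
   logn = log(n);   /* offset o_i, so that N_i = exp(o_i) = n */
   datalines;
0 2 100
0 1 120
1 5 110
1 7  90
;

proc genmod data=Work.Counts;
   model y = x / dist=poisson link=log offset=logn;
   exact x / estimate;   /* exact conditional test and estimate for x */
run;
```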
73. D displays a table containing miscellaneous statistics. For each observation in the input data set, the following are displayed:
• the value of the response variable and all other variables in the model, denoted by the variable names
• the predicted value of the mean
• the value of the linear predictor
• the standard error of the linear predictor
• confidence limits for the predicted values
• raw residual
• Pearson residual
• cluster number
• leverage
• cluster leverage
• cluster Cook's distance statistic
• studentized cluster Cook's distance statistic
• individual observation Cook's distance statistic
• cluster DFBETA statistic for each parameter
• cluster standardized DFBETA statistic for each parameter
• individual observation DFBETA statistic for each parameter
• individual observation standardized DFBETA statistic for each parameter

Displayed Output for Bayesian Analysis

If a Bayesian analysis is requested with a BAYES statement, the displayed output includes the following.

Model Information

The Model Information table displays the two-level data set name, the number of burn-in iterations, the number of iterations after the burn-in, the number of thinning iterations, the response distribution, the link function, the response variable name, the offset variable name, the frequency variable name, the scale weight variable name, the number of observations used, the number of events if events t
74. D finds a contrast to be nonestimable, it displays missing values in corresponding rows in the results. See Searle (1971) for a discussion of estimable functions.

The actual estimate, $\mathbf{L}'\hat{\boldsymbol{\beta}}$, its approximate standard error, and its confidence limits are displayed. A Wald chi-square test that $\mathbf{L}'\boldsymbol{\beta} = 0$ is also displayed.

The approximate standard error of the estimate is computed as the square root of $\mathbf{L}'\hat{\boldsymbol{\Sigma}}\mathbf{L}$, where $\hat{\boldsymbol{\Sigma}}$ is the estimated covariance matrix of the parameter estimates. If you specify a GEE model in the REPEATED statement, $\hat{\boldsymbol{\Sigma}}$ is the empirical covariance matrix estimate.

If you specify the EXP option, then $\exp(\mathbf{L}'\hat{\boldsymbol{\beta}})$, its standard error, and its confidence limits are also displayed.

The construction of the L vector and the checking for estimability for an ESTIMATE statement follow the same rules as listed under the CONTRAST statement.

You can specify the following options in the ESTIMATE statement after a slash (/).

ALPHA=number
requests that a confidence interval be constructed with confidence level 1 - number. The value of number must be between 0 and 1; the default value is 0.05.

E
requests that the L matrix coefficients be displayed.

EXP
requests that $\exp(\mathbf{L}'\hat{\boldsymbol{\beta}})$, its standard error, and its confidence limits be computed. If you specify the EXP option, standard errors and confidence intervals are computed using the delta method.

SINGULAR=number
EPSILON=number
tunes the estimability checking as described for the CONTRAST state
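The quantities described for the ESTIMATE statement can be illustrated numerically. The following Python sketch is not SAS code and is not how PROC GENMOD is implemented; it only mirrors the formulas in the text. The function name, the coefficient values, and the covariance matrix are hypothetical, and only the delta-method standard error of the exponentiated estimate is shown.

```python
import math
from statistics import NormalDist

def estimate_linear_function(L, beta, cov, alpha=0.05):
    """Summarize L'beta from coefficient estimates and their covariance matrix."""
    k = len(L)
    est = sum(L[i] * beta[i] for i in range(k))
    # The variance of L'beta is the quadratic form L' * cov * L.
    var = sum(L[i] * cov[i][j] * L[j] for i in range(k) for j in range(k))
    se = math.sqrt(var)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # normal quantile for level 1 - alpha
    return {
        "estimate": est,
        "stderr": se,
        "lower": est - z * se,
        "upper": est + z * se,
        "wald_chisq": (est / se) ** 2,      # Wald test that L'beta = 0
        "exp_estimate": math.exp(est),
        "exp_stderr": math.exp(est) * se,   # delta method: |d exp(x)/dx| * se
    }

# Hypothetical two-parameter example: contrast of the first and second coefficients.
result = estimate_linear_function(
    L=[1.0, -1.0],
    beta=[2.0, 0.5],
    cov=[[0.04, 0.01], [0.01, 0.09]],
)
print(result)
```

With these made-up inputs the estimate is 1.5 and the variance is 0.04 - 0.01 - 0.01 + 0.09 = 0.11, so the Wald chi-square is 2.25 / 0.11.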
75. DEL statement.

   proc print data=Residuals;
   run;

Output 37.2.2 Data Set of Predicted Values and Residuals

Obs  x   y     Pred    Reschi    Resraw    Resdev  Stdreschi  Stdresdev    Reslik
  1  0   5   5.5921  -0.59212  -0.59212  -0.59212   -0.34036   -0.34036  -0.34036
  2  0   7   5.5921   1.40788   1.40788   1.40788    0.80928    0.80928   0.80928
  3  0   9   5.5921   3.40788   3.40788   3.40788    1.95892    1.95892   1.95892
  4  1   7   7.9324  -0.93243  -0.93243  -0.93243   -0.54093   -0.54093  -0.54093
  5  1  10   7.9324   2.06757   2.06757   2.06757    1.19947    1.19947   1.19947
  6  1   8   7.9324   0.06757   0.06757   0.06757    0.03920    0.03920   0.03920
  7  2  11  11.2522  -0.25217  -0.25217  -0.25217   -0.14686   -0.14686  -0.14686
  8  2   9  11.2522  -2.25217  -2.25217  -2.25217   -1.31166   -1.31166  -1.31166
  9  3  16  15.9612   0.03878   0.03878   0.03878    0.02249    0.02249   0.02249
 10  3  13  15.9612  -2.96122  -2.96122  -2.96122   -1.71738   -1.71738  -1.71738
 11  3  14  15.9612  -1.96122  -1.96122  -1.96122   -1.13743   -1.13743  -1.13743
 12  4  25  22.6410   2.35897   2.35897   2.35897    1.37252    1.37252   1.37252
 13  4  24  22.6410   1.35897   1.35897   1.35897    0.79069    0.79069   0.79069
 14  5  34  32.1163   1.88366   1.88366   1.88366    1.22914    1.22914   1.22914
 15  5  32  32.1163  -0.11634  -0.11634  -0.11634   -0.07592   -0.07592  -0.07592
 16  5  30  32.1163  -2.11634  -2.11634  -2.11634   -1.38098   -1.38098  -1.38098

The data set of predicted values and residuals (Output 37.2.2) is created by the OUTPUT statement. You can use the PLOTS= option in the PROC GENMOD statement to create plots of
76. EAN_ to represent the mean in the preceding expression.

INVLINK Statement

INVLINK variable = expression ;

If you define a link function in the FWDLINK statement, then you must define the inverse link function by using the INVLINK statement. If you use the MODEL statement option LINK= to specify a link function, you do not need to use the INVLINK statement. The variable identifies the inverse link function to the procedure. The expression can be any arithmetic expression supported by the DATA step language, and it is used to define the functional dependence on the linear predictor.

Alternatively, the inverse link function can be defined using programming statements (see the section "Programming Statements" on page 2502) and assigned to a variable, which is then listed as the expression. The second form is convenient for using complex statements such as IF-THEN/ELSE clauses.

The automatic variable _XBETA_ represents the linear predictor in the preceding expression.

LSMEANS Statement

LSMEANS < model-effects > < / options > ;

The LSMEANS statement computes and compares least squares means (LS-means) of fixed effects. LS-means are predicted population margins; that is, they estimate the marginal means over a balanced population. In a sense, LS-means are to unbalanced designs as class and subclass arithmetic means are to balanced designs.

Table 37.3 summarizes important options in the LSMEANS statement. If you specify the BAYE
77. ENMOD procedure, 2479
estimation
   dispersion parameter (GENMOD), 2432
   maximum likelihood (GENMOD), 2516
   regression parameters (GENMOD), 2432
events/trials format for response
   GENMOD procedure, 2491, 2513
exact conditional logistic regression, see exact logistic regression
exact conditional Poisson regression, see exact Poisson regression
exact logistic regression
   GENMOD procedure, 2482, 2553
exact Poisson regression
   GENMOD procedure, 2482, 2507, 2553, 2622
exponential distribution
   GENMOD procedure, 2582
F statistics
   GENMOD procedure, 2526, 2527
Fisher's scoring method
   GENMOD procedure, 2498, 2517
gamma distribution
   GENMOD procedure, 2512
GEE, see generalized estimating equations
Generalized Estimating Equations (GEE), 2453
generalized estimating equations (GEE), 2503, 2532, 2586, 2592
generalized linear model
   GENMOD procedure, 2431
   theory (GENMOD), 2510
GENMOD procedure
   adjusted residuals, 2529
   AIC, 2519
   Akaike's information criterion, 2519
   aliasing, 2438
   Bayesian analysis (linear regression), 2440
   Bayesian information criterion, 2519
   BIC, 2519
   binomial distribution, 2513
   built-in link function, 2432
   built-in probability distribution, 2433
   case deletion diagnostics, 2544
   classification variables, 2522
   confidence intervals, 2492
   continuous variables, 2522
   contrasts, 2481
   convergence criterion, 2484, 2485, 2492, 2503
   correlated data, 2429, 2532
   correlation matrix, 2492, 2517
   covariance matrix, 2492, 2517
   crossed effec
78. INDEX plots as a function of observation number
XBETA plots as a function of linear predictor
If you do not specify an option, likelihood residuals are plotted as a function of observation number.

RESRAW <(options)>
plots raw residuals. The RESRAW plot request has the following options:
   INDEX plots as a function of observation number
   XBETA plots as a function of linear predictor
If you do not specify an option, raw residuals are plotted as a function of observation number.

STDRESCHI <(options)>
plots standardized Pearson residuals. The STDRESCHI plot request has the following options:
   INDEX plots as a function of observation number
   XBETA plots as a function of linear predictor
If you do not specify an option, standardized Pearson residuals are plotted as a function of observation number.

STDRESDEV <(options)>
plots standardized deviance residuals. The STDRESDEV plot request has the following options:
   INDEX plots as a function of observation number
   XBETA plots as a function of linear predictor
If you do not specify an option, standardized deviance residuals are plotted as a function of observation number.

If you fit a model by using generalized estimating equations (GEEs), the following additional plot requests are available:

CLEVERAGE
plots the cluster leverage as a function of ordered cluster.

CLUSTERCOOKSD | DCLS
plots the cluster Cook s
79. L statement (GENMOD), 2493
INFO option, STRATA statement (GENMOD), 2509
INITIAL option, MODEL statement (GENMOD), 2493; REPEATED statement (GENMOD), 2504
INTERCEPT option, MODEL statement (GENMOD), 2494; REPEATED statement (GENMOD), 2504
INVLINK statement, GENMOD procedure, 2488, 2503
ITPRINT option, MODEL statement (GENMOD), 2494
JOINT option, EXACT statement (GENMOD), 2483
JOINTONLY option, EXACT statement (GENMOD), 2483
keyword option, OUTPUT statement (GENMOD), 2499
LINK option, MODEL statement (GENMOD), 2494; ZEROMODEL statement (GENMOD), 2510
LOGOR= option, REPEATED statement (GENMOD), 2504
LPREFIX option, CLASS statement (GENMOD), 2474
LRCI option, MODEL statement (GENMOD), 2495
LSMESTIMATE statement, GENMOD procedure, 2489
MAXIT option, MODEL statement (GENMOD), 2495
MAXITER option, REPEATED statement (GENMOD), 2505
MCORRB option, REPEATED statement (GENMOD), 2505
MCOVB option, REPEATED statement (GENMOD), 2505
MIDPFACTOR option, EXACT statement (GENMOD), 2483
MISSING option, CLASS statement (GENMOD), 2474; STRATA statement (GENMOD), 2508
MODEL statement, GENMOD procedure, 2491
MODELSE option, REPEATED statement (GENMOD), 2505
NAMELEN= option, PROC GENMOD statement, 2457
NOINT option, MODEL statement (GENMOD), 2495
NOLOGSCALE option, MODEL statement (GENMOD), 2486
NOSCALE option, MODEL statement (GENMOD), 2495
NOSUMMARY option, STRATA statement (GENMOD), 2509
OBSTATS option, MODEL statem
80. NMOD Procedure

Bayesian Analysis

Posterior Autocorrelations
Parameter     Lag 1    Lag 5   Lag 10   Lag 50
Intercept    0.0050   0.0023   0.0138   0.0032
Logx1        0.0030   0.0063   0.0070   0.0034
x2           0.0113   0.0046   0.0235   0.0139
x3           0.0019   0.0064   0.0073   0.0047
x4           0.0001   0.0084   0.0050   0.0084
Dispersion   0.0019   0.0088   0.0297   0.0025

Figure 37.18 Geweke Diagnostic Statistics

Geweke Diagnostics
Parameter         z   Pr > |z|
Intercept    0.8783     0.3798
Logx1        1.4800     0.1389
x2           0.0438     0.9651
x3           0.1000     0.9204
x4           0.8893     0.3739
Dispersion   0.1011     0.9195

Figure 37.19 Effective Sample Sizes

Effective Sample Sizes
Parameter        ESS   Autocorrelation Time   Efficiency
Intercept    10000.0                 1.0000       1.0000
Logx1        10000.0                 1.0000       1.0000
x2           10232.2                 0.9773       1.0232
x3           10000.0                 1.0000       1.0000
x4           10000.0                 1.0000       1.0000
Dispersion   10000.0                 1.0000       1.0000

Trace, autocorrelation, and density plots for the six model parameters, shown in Figure 37.20 through Figure 37.25, are useful in diagnosing whether the Markov chain of posterior samples has converged. These plots show no evidence that the chain has not converged. See the section "Visual Analysis via Trace Plots" on page 155 for help with interpreting these diagnostic plots.

Figure 37.20 Diagnostic Plots for Intercept
81. NMOD procedure, 2433
working correlation matrix
   GENMOD procedure, 2504, 2506, 2532
zero-inflated models
   GENMOD, 2530
zero-inflated negative binomial distribution
   GENMOD, 2513
zero-inflated Poisson distribution
   GENMOD, 2513

Syntax Index

ABSFCONV option, MODEL statement (GENMOD), 2484
AGGREGATE option, MODEL statement (GENMOD), 2491
ALPHA option, ESTIMATE statement (GENMOD), 2481; EXACT statement (GENMOD), 2482; MODEL statement (GENMOD), 2492
ALPHAINIT option, REPEATED statement (GENMOD), 2503
ASSESS statement, GENMOD procedure, 2462
BAYES statement, GENMOD procedure, 2463
BY statement, GENMOD procedure, 2473
CHECKDEPENDENCY option, STRATA statement (GENMOD), 2508
CICONV option, MODEL statement (GENMOD), 2492
CL option, MODEL statement (GENMOD), 2492
CLASS statement, GENMOD procedure, 2474
CLTYPE option, EXACT statement (GENMOD), 2482
CODING option, MODEL statement (GENMOD), 2492
CONTRAST statement, GENMOD procedure, 2477
CONVERGE option, MODEL statement (GENMOD), 2492; REPEATED statement (GENMOD), 2503
CONVH option, MODEL statement (GENMOD), 2492
CORR option, REPEATED statement (GENMOD), 2506
CORRB option, MODEL statement (GENMOD), 2492; REPEATED statement (GENMOD), 2504
CORRW option, REPEATED statement (GENMOD), 2504
COVB option, MODEL statement (GENMOD), 2492; REPEATED statement (GENMOD), 2504
CPREFIX option, CLASS statement (GENMOD), 2474
DATA option, PROC GENMOD statement, 2457
DES
82. FREQ Statement 2487
FWDLINK Statement 2487
INVLINK Statement 2488
LSMEANS Statement 2488
LSMESTIMATE Statement 2489
MODEL Statement 2491
OUTPUT Statement 2499
Programming Statements 2502
REPEATED Statement 2503
SLICE Statement 2507
STORE Statement 2507
STRATA Statement 2507
VARIANCE Statement 2509
WEIGHT Statement 2509
ZEROMODEL Statement 2510
Details: GENMOD Procedure 2510
   Generalized Linear Models Theory 2510
   Specification of Effects 2522
   Parameterization Used in PROC GENMOD 2523
   Type 1 Analysis 2523
   Type 3 Analysis 2524
   Confidence Intervals for Parameters 2525
   F Statistics 2526
   Lagrange Multiplier Statistics 2527
   Predicted Values of the Mean 2527
   Residual
83. Residuals and other diagnostic statistics are not available for the multinomial distribution.

The estimated linear predictor, its standard error estimate, and the predicted values and their confidence intervals are computed for all observations in which the explanatory variables are all nonmissing, even if the response is missing. By adding observations with missing response values to the input data set, you can compute these statistics for new observations or for settings of the explanatory variables not present in the data without affecting the model fit.

The following list explains specifications in the OUTPUT statement.

OUT=SAS-data-set
specifies the output data set. If you omit the OUT= option, the output data set is created and given a default name that uses the DATAn convention.

keyword=name
specifies the statistics to be included in the output data set and names the new variables that contain the statistics. Specify a keyword for each desired statistic (see the following list of keywords), an equal sign, and the name of the new variable or variables to contain the statistic. You can list only one variable after the equal sign for all the statistics, except for the case deletion diagnostics for individual parameter estimates: DFBETA, DFBETAS, DFBETAC, and DFBETACS. You can list variables enclosed in parentheses to correspond to the variables in the model, or you can specify the keyword _all_
84. S statement, the ADJUST=, STEPDOWN, and LINES options are ignored. The PLOTS= option is not available for a maximum likelihood analysis; it is available only for a Bayesian analysis.

If you specify a zero-inflated model (that is, a model for either the zero-inflated Poisson or the zero-inflated negative binomial distribution), then the least squares means are computed only for effects in the model for the distribution mean, and not for effects in the zero-inflation probability part of the model.

Table 37.3 Important LSMEANS Statement Options

Option      Description

Construction and Computation of LS-Means
AT          Modifies the covariate value in computing LS-means
BYLEVEL     Computes separate margins
DIFF        Requests differences of LS-means
OM          Specifies the weighting scheme for LS-means computation as determined by the input data set
SINGULAR    Tunes estimability checking

Degrees of Freedom and p-values
ADJUST      Determines the method for multiple comparison adjustment of LS-means differences
ALPHA=α     Determines the confidence level (1 - α)
STEPDOWN    Adjusts multiple comparison p-values further in a step-down fashion

Statistical Output
CL          Constructs confidence limits for means and mean differences
CORR        Displays the correlation matrix of LS-means
COV         Displays the covariance matrix of LS-means
E           Prints the L matrix
LINES       Produces a "Lines" display for pairwise LS-means differences
MEANS       Prints the LS m
85. Trivedi (1998) for more information about zero-inflated models.

The population is considered to consist of two types of individuals. The first type gives Poisson or negative binomial distributed counts, which might contain zeros. The second type always gives a zero count. Let $\lambda$ be the underlying distribution mean and $\omega$ be the probability of an individual being of the second type. The parameter $\omega$ is called here the zero-inflation probability, and is the probability of zero counts in excess of the frequency predicted by the underlying distribution. You can request that the zero-inflation probability be displayed in an output data set with the PZERO keyword.

The probability distribution of a zero-inflated Poisson random variable $Y$ is given by

$$\Pr(Y = y) = \begin{cases} \omega + (1 - \omega) e^{-\lambda} & \text{for } y = 0 \\ (1 - \omega) \dfrac{\lambda^{y} e^{-\lambda}}{y!} & \text{for } y = 1, 2, \ldots \end{cases}$$

and the probability distribution of a zero-inflated negative binomial random variable $Y$ is given by

$$\Pr(Y = y) = \begin{cases} \omega + (1 - \omega)(1 + k\lambda)^{-1/k} & \text{for } y = 0 \\ (1 - \omega) \dfrac{\Gamma(y + 1/k)}{\Gamma(y + 1)\,\Gamma(1/k)} \dfrac{(k\lambda)^{y}}{(1 + k\lambda)^{y + 1/k}} & \text{for } y = 1, 2, \ldots \end{cases}$$

where $k$ is the negative binomial dispersion parameter.

You can model the parameters $\omega$ and $\lambda$ in GENMOD with the regression models

$$h(\omega_i) = \mathbf{z}_i'\boldsymbol{\gamma}$$
$$g(\lambda_i) = \mathbf{x}_i'\boldsymbol{\beta}$$

where $h$ is one of the binary link functions: logit, probit, or complementary log-log. The link function $h$ is the logit link by default, or the link function option specified in the ZEROMODEL statement. The link function $g$ is the log link function by default, or the
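As a numeric check on the two zero-inflated distributions described above, the following Python sketch evaluates both probability functions. It assumes the standard forms with mean `lam`, zero-inflation probability `omega`, and negative binomial dispersion `k`; the function names and the parameter values are hypothetical and are not part of PROC GENMOD.

```python
import math

def zip_pmf(y, lam, omega):
    """Zero-inflated Poisson probability of count y."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        return omega + (1 - omega) * poisson
    return (1 - omega) * poisson

def zinb_pmf(y, lam, omega, k):
    """Zero-inflated negative binomial probability of count y (mean lam, dispersion k)."""
    # Work on the log scale so the gamma functions do not overflow.
    log_nb = (math.lgamma(y + 1 / k) - math.lgamma(y + 1) - math.lgamma(1 / k)
              + y * math.log(k * lam) - (y + 1 / k) * math.log1p(k * lam))
    nb = math.exp(log_nb)
    if y == 0:
        return omega + (1 - omega) * nb
    return (1 - omega) * nb

# Hypothetical parameter values.
zip_total = sum(zip_pmf(y, lam=2.0, omega=0.3) for y in range(100))
zinb_total = sum(zinb_pmf(y, lam=2.0, omega=0.3, k=0.5) for y in range(400))
print(zip_pmf(0, 2.0, 0.3), zip_total, zinb_total)
```

Both totals should be very close to 1, and the probability of a zero count exceeds the plain Poisson or negative binomial value because of the extra mass `omega`.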
86. Parameter     Estimate   Standard Error   95% Confidence Limits        Z   Pr > |Z|
Intercept      -0.9266           0.4513   -1.8111   -0.0421      -2.05     0.0400
treatment A     1.2611           0.3406    0.5934    1.9287       3.70     0.0002
center 2        0.6287           0.3486   -0.0545    1.3119       1.80     0.0713
sex F           0.1024           0.4362   -0.7526    0.9575       0.23     0.8144
age            -0.0162           0.0125   -0.0407    0.0084      -1.29     0.1977
baseline 1      1.8980           0.3404    1.2308    2.5652       5.58     <.0001
Alpha1          1.6109           0.4892    0.6522    2.5696       3.29     0.0010
Alpha2          1.0771           0.4834    0.1297    2.0246       2.23     0.0259
Alpha3          1.5875           0.4735    0.6594    2.5155       3.35     0.0008
Alpha4          2.1224           0.5022    1.1381    3.1068       4.23     <.0001
Alpha5          1.8818           0.4686    0.9634    2.8001       4.02     <.0001
Alpha6          2.1046           0.4949    1.1347    3.0745       4.25     <.0001

Output 37.6.2 Model Fit Criteria

GEE Fit Criteria
QIC     511.8589
QICu    499.6516

You can fit the same model by fully specifying the z matrix. The following statements create a data set containing the full z matrix:

   data zin;
      keep id center z1-z6 y1 y2;
      array zin(6) z1-z6;
      set resp;
      by center id;
      if first.id then do;
         t = 0;
         do m = 1 to 4;
            do n = m+1 to 4;
               do j = 1 to 6;
                  zin(j) = 0;
               end;
               y1 = m;
               y2 = n;
               t + 1;
               zin(t) = 1;
               output;
            end;
         end;
      end;
   run;

   proc print data=zin (obs=12);
   run;

Output 37.6.3 displays the full z matrix for the first two clusters. The z matrix is identical for all clusters in this example.

Output 37.6.3 Full z Matrix Data Set

Obs  z1  z2  z3  z4  z5  z6  center  id  y1  y2
  1   1   0   0   0   0   0       1   1   1   2
  2   0   1   0   0   0   0       1   1   1   3
  3   0   0   1   0   0   0       1   1   1   4
  4   0   0   0   1   0   0       1
87. Parameter    Alpha    Equal-Tail Interval       HPD Interval
Intercept    0.050      908.7      551.0      906.2      549.2
Logx1        0.050    92.4773      251.6    94.2813      253.0
x2           0.050     3.1062     5.4839     3.1747     5.5328
x3           0.050     2.9812     5.1041     2.9532     5.0612
x4           0.050     7.2646    43.6506     5.9839    44.6427
Dispersion   0.050     2569.0     5548.5     2389.4     5308.8

Figure 37.16 Posterior Sample Correlation Matrix

Posterior Correlation Matrix
Parameter    Intercept   Logx1      x2      x3      x4   Dispersion
Intercept        1.000   0.856   0.580   0.712   0.579        0.002
Logx1            0.856   1.000   0.285   0.490   0.636        0.009
x2               0.580   0.285   1.000   0.302   0.492        0.007
x3               0.712   0.490   0.302   1.000   0.616        0.004
x4               0.579   0.636   0.492   0.616   1.000        0.002
Dispersion       0.002   0.009   0.007   0.004   0.002        1.000

Since noninformative prior distributions were used, the posterior sample means, standard deviations, and interval statistics shown in Figure 37.13 and Figure 37.14 are consistent with the maximum likelihood estimates shown in Figure 37.9.

By default, PROC GENMOD computes three convergence diagnostics: the lag1, lag5, lag10, and lag50 autocorrelations (Figure 37.17); Geweke diagnostic statistics (Figure 37.18); and effective sample sizes (Figure 37.19). There is no indication that the Markov chain has not converged. See the section "Assessing Markov Chain Convergence" on page 155 for more information about convergence diagnostics and their interpretation.

Figure 37.17 Posterior Sample Autocorrelations

The GE
88. a status line in the SAS log after every number of Monte Carlo samples when the METHOD=NETWORKMC option is specified. The number of samples taken and the current exact p-value for testing the significance of the model are displayed. You can use this status line to track the progress of the computation of the exact conditional distributions.

STATUSTIME=seconds
specifies the time interval (in seconds) for printing a status line in the SAS log. You can use this status line to track the progress of the computation of the exact conditional distributions. The time interval you specify is approximate; the actual time interval varies. By default, no status reports are produced.

XCONV=value
specifies the relative parameter convergence criterion. Convergence requires a small relative parameter change in subsequent iterations,

$$\max_j |\delta_j^{(i)}| < \text{value}$$

where

$$\delta_j^{(i)} = \begin{cases} \beta_j^{(i)} - \beta_j^{(i-1)} & \text{if } |\beta_j^{(i-1)}| < 0.01 \\ \dfrac{\beta_j^{(i)} - \beta_j^{(i-1)}}{\beta_j^{(i-1)}} & \text{otherwise} \end{cases}$$

and $\beta_j^{(i)}$ is the estimate of the $j$th parameter at iteration $i$. By default, XCONV=1E-4.

You can also specify the ABSFCONV= and FCONV= criteria; if more than one criterion is specified, then optimizations are terminated as soon as one criterion is satisfied.

FREQ Statement

FREQ variable ;
FREQUENCY variable ;

The variable in the FREQ statement identifies a variable in the input data set containing the frequency of occurrence of each observation. PROC GENMOD treats each observation as if it appears n times, where n is the valu
89. aluated at $\mu$
• the standardized Pearson residual
• deviance residual, defined as the square root of the deviance contribution for the observation, with sign equal to the sign of the raw residual
• the standardized deviance residual
• the likelihood residual
• a Cook distance type statistic for assessing the influence of individual observations on overall model fit
• observation leverage
• DFBETA, defined as an approximation to $\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}_{[i]}$ for each parameter estimate $\hat{\boldsymbol{\beta}}$, where $\hat{\boldsymbol{\beta}}_{[i]}$ is the parameter estimate with the $i$th observation deleted
• standardized DFBETA, defined as DFBETA normalized by its standard deviation
• zero-inflation probability for zero-inflated models
• the mean of a zero-inflated response

The following additional cluster deletion diagnostic statistics are created and displayed for each cluster if a REPEATED statement is specified:
• a Cook distance type statistic for assessing the influence of entire clusters on overall model fit
• a studentized Cook distance for assessing influence of clusters
• cluster leverage
• cluster DFBETA for assessing the influence of entire clusters on individual parameter estimates
• cluster DFBETA normalized by its standard deviation

If you specify the multinomial distribution, only regression variable values, response values, predicted values, confidence limits for the predicted values, and the linear predictor are displayed in the table. Residuals and other diagnostic statis
90. alue of the response variable for the multinomial model with ordinal data. If there is an offset, it is included in $\mathbf{x}_i'\boldsymbol{\beta}$.

The keywords in the following list apply only to models specified with a REPEATED statement, fit by generalized estimating equations (GEEs).

CH | CLUSTERH | CLEVERAGE
represents the leverage of a cluster.

CLUSTER
represents the numerical cluster index, in order of sorted clusters.

DCLS | CLUSTERCOOKD | CLUSTERCOOKSD
represents the Cook distance type statistic to measure the influence of deleting an entire cluster on the overall model fit.

DFBETAC | DBETAC
represents the effect of deleting an entire cluster on parameter estimates. If you specify the keyword _all_ after the equal sign, variables named DFBETAC_ParameterName will be included in the output data set to contain the values of the diagnostic statistic to measure the influence of deleting the cluster on the individual parameter estimates. ParameterName is the name of the regression model parameter formed from the input variable names concatenated with the appropriate levels, if classification variables are involved.

DFBETACS | DBETACS
represents the effect of deleting an entire cluster on normalized parameter estimates. If you specify the keyword _all_ after the equal sign, variables named DFBETACS_ParameterName will be included in the output data set to contain the values of the diagnostic statistic to measure the in
91. alue of zero for both the parameter estimate and its standard error.

This table includes a row for a scale parameter, even though there is no free scale parameter in the Poisson distribution. See the section "Response Probability Distributions" on page 2510 for the form of the Poisson probability distribution. PROC GENMOD allows the specification of a scale parameter to fit overdispersed Poisson and binomial distributions. In such cases, the SCALE row indicates the value of the overdispersion scale parameter used in adjusting output statistics. See the section "Overdispersion" on page 2521 for more about overdispersion and the meaning of the SCALE parameter output by the GENMOD procedure. PROC GENMOD displays a note indicating that the scale parameter is fixed, that is, not estimated by the iterative fitting process.

It is usually of interest to assess the importance of the main effects in the model. Type 1 and Type 3 analyses generate statistical tests for the significance of these effects. You can request these analyses with the TYPE1 and TYPE3 options in the MODEL statement, as follows:

   proc genmod data=insure;
      class car age;
      model c = car age / dist   = poisson
                          link   = log
                          offset = ln
                          type1
                          type3;
   run;

The results of these analyses are summarized in the figures that follow.

Figure 37.5 Type 1 Analysis

The GENMOD Procedure

LR Statistics For Type 1 Analysis
Chi Source Deviance DF Square Pr g
92. ames concatenated with the appropriate levels, if classification variables are involved.

DFBETAS | DBETAS
represents the effect of deleting an observation on standardized parameter estimates. If you specify the keyword _all_ after the equal sign, variables named DFBETAS_ParameterName will be included in the output data set to contain the values of the diagnostic statistic to measure the influence of deleting a single observation on the individual parameter estimates. ParameterName is the name of the regression model parameter formed from the input variable names concatenated with the appropriate levels, if classification variables are involved.

DOBS | COOKD | COOKSD
represents the Cook distance type statistic to measure the influence of deleting a single observation on the overall model fit.

HESSWGT
represents the diagonal element of the weight matrix used in computing the Hessian matrix.

H | LEVERAGE
represents the leverage of a single observation.

LOWER | L
represents the lower confidence limit for the predicted value of the mean, or the lower confidence limit for the probability that the response is less than or equal to the value of Level or Value. The confidence coefficient is determined by the ALPHA=number option in the MODEL statement as (1 - number) × 100%. The default confidence coefficient is 95%.

PREDICTED | PRED | PROB | P
represents the predicted value of the mean of the response or the predicted probability that the response va
93. Parameter   DF   Estimate   Standard Error   95% Confidence Limits   Chi-Square   Pr > ChiSq
Intercept    1     6.1391           0.0775     5.9904      6.2956      6268.10       <.0001
Scale        1     0.8274           0.0714     0.6959      0.9762

NOTE: The scale parameter was estimated by maximum likelihood.

The intercept is the estimated log mean of the fitted gamma distribution, so that the mean life of the parts is

$$\hat{\mu} = \exp(\text{INTERCEPT}) = \exp(6.1391) = 463.64$$

The SCALE parameter used in PROC GENMOD is the inverse of the gamma dispersion parameter, and it is sometimes called the gamma index parameter. See the section "Response Probability Distributions" on page 2510 for the definition of the gamma probability density function. A value of 1 for the index parameter corresponds to the exponential distribution. The estimated value of the scale parameter is 0.8274. The 95% profile likelihood confidence interval for the scale parameter is (0.6959, 0.9762), which does not contain 1. The hypothesis of an exponential distribution for the data is, therefore, rejected at the 0.05 level. A confidence interval for the mean life is

$$(\exp(5.99), \exp(6.30)) = (399.57, 542.18)$$

Example 37.4: Ordinal Model for Multinomial Data

This example illustrates how you can use the GENMOD procedure to fit a model to data measured on an ordinal scale. The following statements create a SAS data set called Icecream. The data set contains the results of a hypothetical taste test of three brands of ice cream. The three brands are rated for taste on a five p
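The arithmetic in the gamma example can be reproduced directly from the reported estimates. The short Python sketch below only re-derives the mean life and its interval from the numbers in the parameter table; the variable names are illustrative and nothing here is produced by SAS.

```python
import math

# Values copied from the parameter estimates table in the gamma example.
intercept = 6.1391
intercept_limits = (5.9904, 6.2956)
scale_limits = (0.6959, 0.9762)

# Log link: the mean is the exponential of the intercept.
mean_life = math.exp(intercept)
mean_limits = tuple(math.exp(v) for v in intercept_limits)

# An exponential distribution corresponds to a scale (index) parameter of 1.
exponential_rejected = not (scale_limits[0] <= 1.0 <= scale_limits[1])

print(round(mean_life, 2), [round(v, 2) for v in mean_limits], exponential_rejected)
```

The endpoints of the interval for the mean come from exponentiating the unrounded limits 5.9904 and 6.2956, which is why they match 399.57 and 542.18 rather than exp(5.99) and exp(6.30) evaluated literally.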
94. ameter is defined as the DFBETA statistic for the $j$th parameter divided by its estimated standard deviation, where the standard deviation is estimated from all the data:

$$\mathrm{DFBETAS}_j = \frac{\mathrm{DFBETA}_j}{\hat{\sigma}(\hat{\beta}_j)}$$

DOBS | COOKD | COOKSD

In normal linear regression, the influence of observation $i$ can be measured by Cook's distance (Cook and Weisberg 1982). A measure of influence of observation $i$ for generalized linear models that is equivalent to Cook's distance for normal linear regression is given by

$$\mathrm{DOBS}_i = p^{-1} h_i (1 - h_i)^{-1} r_{pi}^{2}$$

where $h_i$ is the leverage defined in the section "H | LEVERAGE" on page 2545. This measure is the one-step approximation to $2p^{-1}\left(L(\hat{\boldsymbol{\beta}}) - L(\hat{\boldsymbol{\beta}}_{[i]})\right)$, where $L(\boldsymbol{\beta})$ is the log likelihood evaluated at $\boldsymbol{\beta}$.

H | LEVERAGE

The Fisher scores, or expected weight, for observation $i$ is

$$w_{ei} = \frac{w_i}{\phi V(\mu_i)\,(g'(\mu_i))^{2}}$$

Let $\mathbf{W}_e$ be the diagonal matrix with $w_{ei}$ as the $i$th diagonal. The leverage $h_i$ of the $i$th observation is defined as the $i$th diagonal element of the hat matrix

$$\mathbf{H} = \mathbf{W}_e^{1/2} \mathbf{X} (\mathbf{X}'\mathbf{W}_e\mathbf{X})^{-1} \mathbf{X}' \mathbf{W}_e^{1/2}$$

Diagnostics for Models Fit by Generalized Estimating Equations (GEEs)

The diagnostic statistics in this section were developed by Preisser and Qaqish (1996). See the section "Generalized Estimating Equations" on page 2532 for further information and notation for generalized estimating equations (GEEs). The following additional notation is used in this section.

Partition the design matrix X and response vector Y by clu
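The leverage and Cook-type influence measure above can be sketched for the simplest case, a model with an intercept and one covariate, where the 2 x 2 matrix X'WX can be inverted by hand. This Python sketch is illustrative only: the function names, the unit weights, and the residual values are hypothetical, and the residuals passed in are assumed to be already standardized.

```python
def leverages(x, w):
    """Diagonal of W^(1/2) X (X'WX)^(-1) X' W^(1/2) for design rows (1, x_i)."""
    s0 = sum(w)
    s1 = sum(wi * xi for xi, wi in zip(x, w))
    s2 = sum(wi * xi * xi for xi, wi in zip(x, w))
    det = s0 * s2 - s1 * s1
    # (1, x_i) times the inverse of [[s0, s1], [s1, s2]] times (1, x_i)', scaled by w_i.
    return [wi * (s2 - 2 * s1 * xi + s0 * xi * xi) / det for xi, wi in zip(x, w)]

def cook_type_distance(h, r_std, p):
    """h_i * r_i^2 / (p * (1 - h_i)) for each observation."""
    return [hi * ri * ri / (p * (1 - hi)) for hi, ri in zip(h, r_std)]

# Hypothetical inputs: six covariate values, unit expected weights, made-up residuals.
x = [0.0, 0.0, 1.0, 2.0, 3.0, 5.0]
w = [1.0] * len(x)
r_std = [-0.3, 0.8, 1.2, -1.3, 0.0, 1.4]

h = leverages(x, w)
dobs = cook_type_distance(h, r_std, p=2)
print([round(v, 4) for v in h])
print([round(v, 4) for v in dobs])
```

A quick sanity check on any leverage calculation is that the values sum to the number of parameters (here 2), because the trace of the hat matrix equals its rank.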
95. ameters $\theta$ depend only on the means of the response $\mu_i$, which are related to the regression parameters $\boldsymbol{\beta}$ through the link function $g(\mu_i) = \mathbf{x}_i'\boldsymbol{\beta}$. The additional parameter $\phi$ is the dispersion parameter.

$$f(y) = \exp\left\{\frac{y\theta - b(\theta)}{\phi} + c(y, \phi)\right\}$$

The GENMOD procedure estimates the regression parameters and the scale parameter $\sigma = \phi^{1/2}$ by maximum likelihood. However, the GENMOD procedure can also provide Bayesian estimates of the regression parameters and either the scale $\sigma$, the dispersion $\phi$, or the precision $\tau = \phi^{-1}$ by sampling from the posterior distribution. Except where noted, the following discussion applies to either $\sigma$, $\phi$, or $\tau$, although $\phi$ is used to illustrate the formulas. Note that the Poisson and binomial distributions do not have a dispersion parameter, and the dispersion is considered to be fixed at $\phi = 1$.

The ASSESS, CONTRAST, ESTIMATE, OUTPUT, and REPEATED statements, if specified, are ignored. Also ignored are the PLOTS= option in the PROC GENMOD statement and the following options in the MODEL statement: ALPHA=, CORRB, COVB, TYPE1, TYPE3, SCALE=DEVIANCE (DSCALE), SCALE=PEARSON (PSCALE), OBSTATS, RESIDUALS, XVARS, PREDICTED, DIAGNOSTICS, and SCALE= for Poisson and binomial distributions. The multinomial and zero-inflated Poisson distributions are not available for Bayesian analysis.

See the section "Assessing Markov Chain Convergence" on page 155 for information about assessing the convergence of the chain of posterior samples. Several algorithms, specif
96. an trials with binomial distribution.

Class Level Information

If you use classification variables in the model, PROC GENMOD displays the levels of classification variables specified in the CLASS statement and in the MODEL statement. The levels are displayed in the same sorted order used to generate columns in the design matrix.

Response Profile

If you specify an ordinal model for the multinomial distribution, a table titled "Response Profile" is displayed containing the ordered values of the response variable and the number of occurrences of the values used in the model.

Iteration History for Parameter Estimates

If you specify the ITPRINT model option, PROC GENMOD displays a table containing the following for each iteration in the Newton-Raphson procedure for model fitting: the iteration number, the ridge value, the log likelihood, and values of all parameters in the model.

Criteria for Assessing Goodness of Fit

In the "Criteria for Assessing Goodness of Fit" table, PROC GENMOD displays the degrees of freedom for deviance and Pearson's chi-square, equal to the number of observations minus the number of regression parameters estimated, the deviance, the deviance divided by degrees of freedom, the scaled deviance, the scaled deviance divided by degrees of freedom, Pearson's chi-square, Pearson's chi-square divided by degrees of freedom, the scaled Pearson's chi-square, the scale
97. any of the full-rank PARAM= CLASS variable options, all parameters are directly estimable and rows of L are not checked for estimability. If an effect is not specified in the CONTRAST statement, all of its coefficients in the L matrix are set to 0. If too many values are specified for an effect, the extra ones are ignored. If too few values are specified, the remaining ones are set to 0. PROC GENMOD handles missing level combinations of classification variables in the same manner as the GLM and MIXED procedures. Parameters corresponding to missing level combinations are not included in the model. This convention can affect the way in which you specify the L matrix in your CONTRAST statement. If you specify the WALD option, the test of hypothesis is based on a Wald chi-square statistic. If you omit the WALD option, the test statistic computed depends on whether an ordinary generalized linear model or a GEE-type model is specified. For an ordinary generalized linear model, the CONTRAST statement computes the likelihood ratio statistic. This is defined to be twice the difference between the log likelihood of the model unconstrained by the contrast and the log likelihood with the model fitted under the constraint that the linear function of the parameters defined by the contrast is equal to 0. A p-value is computed based on the asymptotic chi-square distribution of the chi-square statistic. If you specify a GEE model
98. arameters requested with the LRCI option. Wald confidence intervals are displayed by default. Profile likelihood confidence intervals are considered to be more accurate than Wald intervals (see Aitkin et al. 1989), especially with small sample sizes. You can specify the confidence coefficient with the ALPHA= option in the MODEL statement. The default value of 0.05, corresponding to 95% confidence limits, is used here. See the section Confidence Intervals for Parameters on page 2525 for a discussion of profile likelihood confidence intervals. Example 37.2: Normal Regression, Log Link. Consider the following data, where x is an explanatory variable and y is the response variable. It appears that y varies nonlinearly with x and that the variance is approximately constant. A normal distribution with a log link function is chosen to model these data; that is, $\log(\mu_i) = x_i'\beta$, so that $\mu_i = \exp(x_i'\beta)$. data nor; input x y; datalines; [data lines illegible in this excerpt] ; run; The following SAS statements produce the analysis with the normal distribution and log link: proc genmod data=nor; model y = x / dist=normal link=log; output out=Residuals pred=Pred resraw=Resraw reschi=Reschi resdev=Resdev stdreschi=Stdreschi stdresdev=Stdresdev reslik=Reslik; run; The OUTPUT statement is specified to
99. arately. The plot requests include the following. ALL specifies all types of plots; PLOTS=ALL is equivalent to specifying PLOTS=(TRACE AUTOCORR DENSITY). AUTOCORR displays the autocorrelation function plots for the parameters. DENSITY displays the kernel density plots for the parameters. NONE suppresses all diagnostic plots. TRACE displays the trace plots for the parameters. See the section Visual Analysis via Trace Plots on page 155 for details. SAMPLING=option specifies an algorithm used to sample the posterior distribution. The following options are available. ARMS | GIBBS: use the ARMS algorithm. This is the default method, except for the normal distribution with a conjugate prior. In this case a closed form for the posterior distribution is available, and samples are obtained directly from the posterior distribution. GAMERMAN | GAM: use the Gamerman algorithm. IM: use the independent Metropolis algorithm. SCALEPRIOR=GAMMA<(options)> | IMPROPER, SPRIOR=GAMMA<(options)> | IMPROPER specifies that Gibbs sampling be performed on the generalized linear model scale parameter and specifies the prior distribution for the scale parameter, if there is a scale parameter in the model. For models that do not have a scale parameter (the Poisson and binomial), this option is ignored. Note that you can specify Gibbs sampling on either the dispersion parameter $\phi$, the scale parameter $\sigma = \phi^{1/2}$, or the p
100. are explanatory variables in this experiment. The number of responses r is modeled as a binomial random variable for each combination of the explanatory variable values, with the binomial number of trials parameter equal to the number of subjects n and the binomial probability equal to the probability of a response. The following DATA step creates the data set: data drug; input drug$ x r n @@; datalines; A .1 1 10 A .23 2 12 A .67 1 9 B .2 3 13 B .3 4 15 B .45 5 16 B .78 5 13 C .04 0 10 C .15 0 11 C .56 1 12 C .7 2 12 D .34 5 10 D .6 5 9 D .7 8 10 E .2 12 20 E .34 15 20 E .56 13 15 E .8 17 20 ; run; A logistic regression for these data is a generalized linear model with response equal to the binomial proportion r/n. The probability distribution is binomial, and the link function is logit. For these data, drug and x are explanatory variables. The probit and the complementary log-log link functions are also appropriate for binomial data. PROC GENMOD performs a logistic regression on the data in the following SAS statements: proc genmod data=drug; class drug; model r/n = x drug / dist=bin link=logit lrci; run; Since these data are binomial, you use the events/trials syntax to specify the response in the MODEL statement. Profile likelihood confidence intervals for the regression parameters are computed using the LRCI option. General model and data information is produced in Output 37.1.1
101. as if they were reference- or dummy-coded. The REFERENCE or GLM parameterization might be more appropriate for such problems. CONTRAST Statement: CONTRAST 'label' contrast-specification </options>; The CONTRAST statement provides a means of obtaining a test of a specified hypothesis concerning the model parameters. This is accomplished by specifying a matrix L for testing the hypothesis $L'\beta = 0$. You must be familiar with the details of the model parameterization that PROC GENMOD uses. For more information, see the section Parameterization Used in PROC GENMOD on page 2523 and the section CLASS Statement on page 2474. Computed statistics are based on the asymptotic chi-square distribution of the likelihood ratio statistic, or the generalized score statistic for GEE models, with degrees of freedom determined by the number of linearly independent rows in the L matrix. You can request Wald chi-square statistics with the WALD option in the CONTRAST statement. There is no limit to the number of CONTRAST statements that you can specify, but they must appear after the MODEL statement and after the ZEROMODEL statement for zero-inflated models. Statistics for multiple CONTRAST statements are displayed in a single table. The elements of the CONTRAST statement are as follows. label identifies the contrast on the output. A label is required for every contrast specified. Labels can be up to 20 characters and must be enclosed in s
102. [Table residue, ODS Graphics: graph descriptions (ation and density function panel; trace and density panel; trace panel; trace plot; zero-inflation probabilities) with Statement and Option columns (PROC, MODEL, REPEATED, and BAYES statements; PLOTS= options including TRACE, AUTOCORR, DENSITY, and UNPACK); the row alignment is not recoverable.] Examples: GENMOD Procedure. The following examples illustrate some of the capabilities of the GENMOD procedure. These are not intended to represent definitive analyses of the data sets presented here. You should refer to the texts cited in the references for guidance on complete analysis of data by using generalized linear models. Example 37.1: Logistic Regression. In an experiment comparing the effects of five different drugs, each drug is tested on a number of different subjects. The outcome of each experiment is the presence or absence of a positive response in a subject. The following artificial data represent the number of responses r in the n subjects for the five different drugs, labeled A through E. The response is measured for different levels of a continuous covariate x for each drug. The drug type and the continuous covariate x
103. ation accuracy is the standard error of the posterior mean estimate and is calculated as the posterior standard deviation divided by the square root of the effective sample size. See the section Standard Error of the Mean Estimate on page 169 for details. RAFTERY<(raftery-options)> computes the Raftery and Lewis diagnostics that evaluate the accuracy of the estimated quantile $\hat\theta_Q$ for a given $Q \in (0, 1)$ of a chain. $\hat\theta_Q$ can achieve any degree of accuracy when the chain is allowed to run for a long time. A stopping criterion is when the estimated probability $\hat P_Q = \Pr(\theta \le \hat\theta_Q)$ reaches within $\pm R$ of the value $Q$ with probability $S$; that is, $\Pr(Q - R < \hat P_Q < Q + R) = S$. The following raftery-options enable you to specify Q, R, S, and a precision level $\epsilon$ for the test. QUANTILE | Q=value specifies the order (a value between 0 and 1) of the quantile of interest. The default is 0.025. ACCURACY | R=value specifies a small positive number as the margin of error for measuring the accuracy of estimation of the quantile. The default is 0.005. PROBABILITY | S=value specifies the probability of attaining the accuracy of the estimation of the quantile. The default is 0.95. EPSILON | EPS=value specifies the tolerance level (a small positive number) for the stationary test. The default is 0.001. See the section Raftery and Lewis Diagnostics on page 165 for details. DISPERSIONPRIOR=GAMMA <g-optio
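The two quantities described in this excerpt can be illustrated numerically. The following is a minimal Python sketch, not SAS code and not the procedure's implementation: it computes the standard error of the posterior mean as the posterior standard deviation divided by the square root of the effective sample size, and the Raftery and Lewis target interval $(Q - R, Q + R)$ using the defaults quoted above. The function names and the input values (a standard deviation of 2.0 and an effective sample size of 400) are hypothetical.

```python
import math


def monte_carlo_standard_error(posterior_sd, effective_sample_size):
    """Posterior standard deviation divided by sqrt(effective sample size)."""
    if effective_sample_size <= 0:
        raise ValueError("effective sample size must be positive")
    return posterior_sd / math.sqrt(effective_sample_size)


def raftery_lewis_interval(q=0.025, r=0.005):
    """Interval (Q - R, Q + R) that the estimated probability should fall in."""
    return (q - r, q + r)


# Hypothetical inputs, chosen only for illustration.
mcse = monte_carlo_standard_error(posterior_sd=2.0, effective_sample_size=400)
low, high = raftery_lewis_interval()
print(mcse, low, high)
```

With the default QUANTILE and ACCURACY values, the interval runs from 0.02 to 0.03, which is the band the estimated probability must reach with probability S.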
104. ays the degrees of freedom for the parameter, the estimate value, the standard error, the Wald chi-square value, the p-value based on the chi-square distribution, and the confidence limits (Wald or profile likelihood) for parameters. Lagrange Multiplier Statistics: If you specify that either the model intercept or the scale parameter is fixed, for those distributions that have a distribution scale parameter, the GENMOD procedure displays a table of Lagrange multiplier, or score, statistics for testing the validity of the constrained parameter that contains the test statistic and the p-value. Estimated Covariance Matrix: If you specify the model option COVB, the GENMOD procedure displays the estimated covariance matrix, defined as the inverse of the information matrix at the final iteration. This is based on the expected information matrix if the EXPECTED option is specified in the MODEL statement. Otherwise, it is based on the Hessian matrix used at the final iteration. This is, by default, the observed Hessian unless altered by the SCORING option in the MODEL statement. Estimated Correlation Matrix: If you specify the CORRB model option, PROC GENMOD displays the estimated correlation matrix. This is based on the expected information matrix if the EXPECTED option is specified in the MODEL statement. Otherwise, it is based on the Hessian matrix used at the final iteration. This is, by default, the observed Hessian unless altered by the SCORING optio
105. bability, since $\mu_{ij} = \Pr(Y_{ij} = 1)$, $\max(0, \mu_{ij} + \mu_{ik} - 1) \le \Pr(Y_{ij} = 1, Y_{ik} = 1) \le \min(\mu_{ij}, \mu_{ik})$. The correlation, therefore, is constrained to be within limits that depend in a complicated way on the means of the data. The odds ratio, defined as $\mathrm{OR}(Y_{ij}, Y_{ik}) = \frac{\Pr(Y_{ij} = 1, Y_{ik} = 1)\,\Pr(Y_{ij} = 0, Y_{ik} = 0)}{\Pr(Y_{ij} = 1, Y_{ik} = 0)\,\Pr(Y_{ij} = 0, Y_{ik} = 1)}$, is not constrained by the means and is preferred, in some cases, to correlations for binary data. The ALR algorithm seeks to model the logarithm of the odds ratio, $\gamma_{ijk} = \log \mathrm{OR}(Y_{ij}, Y_{ik})$, as $\gamma_{ijk} = z_{ijk}'\alpha$, where $\alpha$ is a $q \times 1$ vector of regression parameters and $z_{ijk}$ is a fixed, specified vector of coefficients. The parameter $\gamma_{ijk}$ can take any value in $(-\infty, \infty)$, with $\gamma_{ijk} = 0$ corresponding to no association. The log odds ratio, when modeled in this way with a regression model, can take different values in subgroups defined by $z_{ijk}$. For example, $z_{ijk}$ can define subgroups within clusters, or it can define block effects between clusters. You specify a GEE model for binary data that uses log odds ratios by specifying a model for the mean, as in ordinary GEEs, and a model for the log odds ratios. You can use any of the link functions appropriate for binary data in the model for the mean, such as logistic, probit, or complementary log-log. The ALR algorithm alternates between a GEE step to update the model for the mean and a logistic regression step to update the log odds ratio model. Upon c
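The contrast drawn in this excerpt between the bounded joint probability and the unconstrained odds ratio can be checked with a small calculation. The Python sketch below is illustrative only (it is not part of PROC GENMOD, and the function names and probabilities are hypothetical): given two marginal means and a joint probability $\Pr(Y_{ij} = 1, Y_{ik} = 1)$, it computes the bounds on the joint probability and the log odds ratio.

```python
import math


def joint_probability_bounds(mu_j, mu_k):
    """Bounds max(0, mu_j + mu_k - 1) <= P(Yj=1, Yk=1) <= min(mu_j, mu_k)."""
    return max(0.0, mu_j + mu_k - 1.0), min(mu_j, mu_k)


def log_odds_ratio(mu_j, mu_k, p11):
    """Log odds ratio of two binary responses from marginals and P(1, 1)."""
    lower, upper = joint_probability_bounds(mu_j, mu_k)
    if not (lower < p11 < upper):
        raise ValueError("joint probability must lie strictly inside its bounds")
    p10 = mu_j - p11                 # P(Yj = 1, Yk = 0)
    p01 = mu_k - p11                 # P(Yj = 0, Yk = 1)
    p00 = 1.0 - mu_j - mu_k + p11    # P(Yj = 0, Yk = 0)
    return math.log((p11 * p00) / (p10 * p01))


# Hypothetical marginal means; p11 = mu_j * mu_k means independence.
bounds = joint_probability_bounds(0.3, 0.5)
gamma_independent = log_odds_ratio(0.3, 0.5, 0.15)
gamma_positive = log_odds_ratio(0.3, 0.5, 0.25)
print(bounds, gamma_independent, gamma_positive)
```

Under independence the log odds ratio is 0, matching the statement that $\gamma = 0$ corresponds to no association, while any joint probability strictly inside the bounds maps to a finite value on the whole real line.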
106. be performed. This consists of sequentially fitting models, beginning with the null (intercept term only) model and continuing up to the model specified in the MODEL statement. The likelihood ratio statistic between each successive pair of models is computed and displayed in a table. A Type 1 analysis is not available for GEE models, since there is no associated likelihood. TYPE3 requests that statistics for Type 3 contrasts be computed for each effect specified in the MODEL statement. The default analysis is to compute likelihood ratio statistics for the contrasts or score statistics for GEEs. Wald statistics are computed if the WALD option is also specified. WALD requests Wald statistics for Type 3 contrasts. You must also specify the TYPE3 option in order to compute Type 3 Wald statistics. WALDCI requests that two-sided Wald confidence intervals for all model parameters be computed based on the asymptotic normality of the parameter estimators. This computation is not as time-consuming as the LRCI method, since it does not involve an iterative procedure. However, it is thought to be less accurate, especially for small sample sizes. The confidence coefficient can be selected with the ALPHA= option in the same way as for the LRCI option. XVARS requests that the regression variables be included in the OBSTATS table. OUTPUT Statement: OUTPUT <OUT=SAS-data-set> <keyword=name ... keyword=name>; The OUTPUT sta
107. category probability. Pearson's chi-square statistic is defined as $X^2 = \sum_i w_i (y_i - \mu_i)^2 / V(\mu_i)$, and the scaled Pearson's chi-square is $X^2/\phi$. The scaled version of both of these statistics, under certain regularity conditions, has a limiting chi-square distribution with degrees of freedom equal to the number of observations minus the number of parameters estimated. The scaled version can be used as an approximate guide to the goodness of fit of a given model. Use caution before applying these statistics to ensure that all the conditions for the asymptotic distributions hold. McCullagh and Nelder (1989) advise that differences in deviances for nested models can be better approximated by chi-square distributions than the deviances can themselves. In cases where the dispersion parameter is not known, an estimate can be used to obtain an approximation to the scaled deviance and Pearson's chi-square statistic. One strategy is to fit a model that contains a sufficient number of parameters so that all systematic variation is removed, estimate $\phi$ from this model, and then use this estimate in computing the scaled deviance of submodels. The deviance or Pearson's chi-square divided by its degrees of freedom is sometimes used as an estimate of the dispersion parameter $\phi$. For example, since the limiting chi-square distribution of the scaled deviance $D^* = D/\phi$ has $n - p$ degrees of freedom, where $n$ is the number of observations and $p$ is the number of
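As a worked illustration of the Pearson statistic and the "divide by degrees of freedom" estimate of dispersion described in this excerpt, the following Python sketch computes $X^2 = \sum_i w_i (y_i - \mu_i)^2 / V(\mu_i)$ and then $X^2/(n - p)$. It is a sketch under stated assumptions, not SAS output: the responses, fitted means, unit weights, the Poisson variance function $V(\mu) = \mu$, and the parameter count p = 2 are all hypothetical.

```python
def pearson_chi_square(y, mu, variance_function, weights=None):
    """Sum of w_i * (y_i - mu_i)^2 / V(mu_i) over all observations."""
    if weights is None:
        weights = [1.0] * len(y)
    return sum(w * (yi - mi) ** 2 / variance_function(mi)
               for yi, mi, w in zip(y, mu, weights))


def dispersion_from_statistic(statistic, n_obs, n_params):
    """Deviance or Pearson chi-square divided by its degrees of freedom."""
    return statistic / (n_obs - n_params)


# Hypothetical responses and fitted means for a Poisson model, V(mu) = mu.
y = [2.0, 3.0, 6.0, 7.0, 8.0]
mu = [2.5, 3.5, 5.0, 7.0, 8.0]
x2 = pearson_chi_square(y, mu, variance_function=lambda m: m)
phi_hat = dispersion_from_statistic(x2, n_obs=len(y), n_params=2)
scaled_x2 = x2 / phi_hat  # scaled Pearson chi-square, X^2 / phi
print(x2, phi_hat, scaled_x2)
```

By construction, scaling by this estimate makes the scaled Pearson chi-square equal to its degrees of freedom, which is why the scaled statistic is no longer informative about fit once the dispersion has been estimated this way from the same model.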
108. cates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies. Chapter 37: The GENMOD Procedure. Contents: Overview: GENMOD Procedure; What Is a Generalized Linear Model?; Examples of Generalized Linear Models; The GENMOD Procedure; Getting Started: GENMOD Procedure; Poisson Regression; Bayesian Analysis of a Linear Regression Model; Generalized Estimating Equations; Syntax: GENMOD Procedure; PROC GENMOD Statement; ASSESS Statement; BAYES Statement; BY Statement; CLASS Statement; CONTRAST Statement; DEVIANCE Statement; EFFECTPLOT Statement; ESTIMATE Statement; EXACT Statement; EXACTOPTIONS Statement; FREQ Statement
109. credible intervals, correlation matrix, and convergence diagnostics. Autocorrelations, Gelman-Rubin, Geweke, Raftery-Lewis, and Heidelberger and Welch tests, the effective sample size, and Monte Carlo standard errors are computed for each parameter, as well as the correlation matrix and the covariance matrix of the posterior sample. Trace plots, posterior density plots, and autocorrelation function plots that are created using ODS Graphics are also provided for each parameter. The GENMOD procedure enables you to perform exact logistic regression, also called exact conditional binary logistic regression, and exact Poisson regression, also called exact conditional Poisson regression, by specifying one or more EXACT statements. You can test individual parameters or conduct a joint test for several parameters. The procedure computes two exact tests: the exact conditional score test and the exact conditional probability test. You can request exact estimation of specific parameters and corresponding odds ratios where appropriate. Point estimates, standard errors, and confidence intervals are provided. The GENMOD procedure now uses ODS Graphics to create graphs as part of its output. For general information about ODS Graphics, see Chapter 21, Statistical Graphics Using ODS. What Is a Generalized Linear Model? A traditional linear model is of the form $y_i = x_i'\beta + \varepsilon_i$, where $y_i$ is the response variable for the ith observation. The quantity $x_i$ is a c
110. ct results, you can obtain an estimate for the intercept parameter by specifying the INTERCEPT keyword in the EXACT statement. You should also remove the JOINT option to reduce the amount of time and memory consumed. References: Agresti, A. (2002), Categorical Data Analysis, Second Edition, New York: John Wiley & Sons. Aitkin, M., Anderson, D., Francis, B., and Hinde, J. (1989), Statistical Modelling in GLIM, Oxford: Oxford Science Publications. Akaike, H. (1979), A Bayesian Extension of the Minimum AIC Procedure of Autoregressive Model Fitting, Biometrika, 66, 237-242. Akaike, H. (1981), Likelihood of a Model and Information Criteria, Journal of Econometrics, 16, 3-14. Boos, D. (1992), On Generalized Score Tests, The American Statistician, 46, 327-333. Cameron, A. C. and Trivedi, P. K. (1998), Regression Analysis of Count Data, Cambridge: Cambridge University Press. Carey, V., Zeger, S. L., and Diggle, P. (1993), Modelling Multivariate Binary Data with Alternating Logistic Regressions, Biometrika, 80, 517-526. Collett, D. (2003), Modelling Binary Data, Second Edition, London: Chapman & Hall. Cook, R. D. and Weisberg, S. (1982), Residuals and Influence in Regression, New York: Chapman & Hall. Cox, D. R. and Snell, E. J. (1989), The Analysis of Binary Data, Second Edition, London: Chapman & Hall. Davison, A. C. and Snell, E. J. (1991), Residuals and Diagnostics, in D. V. Hinkley, N. Reid
111. curves were created from simple forms of model misspecification by using simulated data. The mean models of the data and the fitted model are shown in Table 37.15. [Figure: Output 37.8.6, Typical Cumulative Residual Patterns, Covariate Misspecification. Panels: (a) Data: log(X), Model: X; (b) Data: {X, X^2}, Model: X; (c) Data: {X, X^2, X^3}, Model: {X, X^2}; (d) Data: I(X > 5), Model: X. Horizontal axis: x.] Table 37.15, Model Misspecifications (Plot; Data E(Y); Fitted Model E(Y)): (a) log(X); X. (b) {X, X^2}; X. (c) {X, X^2, X^3}; {X, X^2}. (d) I(X > 5); X. The observed cumulative residual pattern in Output 37.8.3 and Output 37.8.4 most resembles the behavior of the curve in plot (a) of Output 37.8.6, indicating that log(X1) might be a more appropriate term in the model than X1. The following SAS statements fit a model with LogX1 in place of X1 and request a model assessment: proc genmod data=Surg; model Y = LogX1 X2 X3 / scale=Pearson; assess var=(LogX1) / resample=10000 seed=603708000; run; ods graphics off; The revised model fit is shown in Output 37.8.7, the p-value from the simulation is 0.4777, and the cumulative residuals plotted in Output 37.8.8 show no systematic trend. The log transformation
112. d Pearson's chi-square divided by degrees of freedom, the log likelihood (excludes factorial terms), the full log likelihood, the Akaike information criterion, the corrected Akaike information criterion, and the Bayesian information criterion. The information in this table is valid only for maximum likelihood model fitting, and the table is not printed if the REPEATED statement is specified. Last Evaluation of the Gradient: If you specify the model option ITPRINT, the GENMOD procedure displays the last evaluation of the gradient vector. Last Evaluation of the Hessian: If you specify the model option ITPRINT, the GENMOD procedure displays the last evaluation of the Hessian matrix. Analysis of Initial Parameter Estimates: The Analysis of Initial Parameter Estimates table contains the results from fitting a generalized linear model to the data. If you specify the REPEATED statement, these GLM parameter estimates are used as initial values for the GEE solution and are displayed only if the PRINTMLE option in the REPEATED statement is specified. For each parameter in the model, PROC GENMOD displays the parameter name as follows: the variable name for continuous regression variables; the variable name and level for classification variables and interactions involving classification variables; SCALE for the scale variable related to the dispersion parameter. In addition, PROC GENMOD displ
113. dds ratio parameter for each unique pair of responses within clusters, and all clusters are parameterized identically. The following statements fit the same regression model for the mean as in Example 37.5 but use a regression model for the log odds ratios instead of a working correlation. The LOGOR=FULLCLUST option specifies a fully parameterized log odds ratio model. proc genmod data=resp descend; class id treatment(ref='P') center(ref='1') sex(ref='M') baseline(ref='0') / param=ref; model outcome = treatment center sex age baseline / dist=bin; repeated subject=id(center) / logor=fullclust; run; The results of fitting the model are displayed in Output 37.6.1, along with a table that shows the correspondence between the log odds ratio parameters and the within-cluster pairs. Model goodness-of-fit criteria are shown in Output 37.6.2. The QIC for the ALR model shown in Output 37.6.2 is 511.86, whereas the QIC for the unstructured working correlation model shown in Output 37.5.4 is 512.34, indicating that the ALR model is a slightly better fit. Output 37.6.1, Results of Model Fitting. The GENMOD Procedure. Log Odds Ratio Parameter Information (Parameter; Group): Alpha1 (1, 2); Alpha2 (1, 3); Alpha3 (1, 4); Alpha4 (2, 3); Alpha5 (2, 4); Alpha6 (3, 4). Analysis Of GEE Parameter Estimates, Empirical Standard Error Estimates (Parameter; Estimate; Standard Error; 95% Confidence Limits; Z; Pr >
114. del s is the score statistic of Cameron and Trivedi (1998) for testing for overdispersion in a Poisson model against alternatives of the form $V(\mu) = \mu + k\mu^2$. See Rao (1973, p. 417) for more details. Predicted Values of the Mean. Predicted Values: A predicted value, or fitted value, of the mean $\mu_i$ corresponding to the vector of covariates $x_i$ is given by $\hat\mu_i = g^{-1}(x_i'\hat\beta)$, where $g$ is the link function, regardless of whether $x_i$ corresponds to an observation or not. That is, the response variable can be missing and the predicted value is still computed for valid $x_i$. In the case where $x_i$ does not correspond to a valid observation, $x_i$ is not checked for estimability. You should check the estimability of $x_i$ in this case in order to ensure the uniqueness of the predicted value of the mean. If there is an offset, it is included in the predicted value computation. Confidence Intervals on Predicted Values: Approximate confidence intervals for predicted values of the mean can be computed as follows. The variance of the linear predictor $\eta_i = x_i'\hat\beta$ is estimated by $\sigma_x^2 = x_i'\hat\Sigma x_i$, where $\hat\Sigma$ is the estimated covariance of $\hat\beta$. The robust estimate of the covariance is used for $\hat\Sigma$ in the case of models fit with GEEs. Approximate $100(1 - \alpha)\%$ confidence intervals are computed as $g^{-1}(x_i'\hat\beta \pm z_{1-\alpha/2}\,\sigma_x)$, where $z_p$ is the 100p-th percentile of the standard normal distribution and $g$ is the link function. If either endpoint in the argument i
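The interval described in this excerpt is formed on the linear-predictor scale and then mapped back through the inverse link. The Python sketch below illustrates that calculation; it is not the procedure's code. It assumes the interval has the form $g^{-1}(x'\hat\beta \pm z_{1-\alpha/2}\,\sigma_x)$ with $\sigma_x^2 = x'\hat\Sigma x$, and the covariate vector, coefficient estimates, covariance matrix, and log link are hypothetical.

```python
import math
from statistics import NormalDist


def predicted_mean_interval(x, beta, cov, inverse_link, alpha=0.05):
    """Predicted mean and approximate 100(1 - alpha)% limits via the inverse link."""
    eta = sum(xi * bi for xi, bi in zip(x, beta))            # linear predictor
    var_eta = sum(x[i] * cov[i][j] * x[j]
                  for i in range(len(x)) for j in range(len(x)))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)               # normal percentile
    half_width = z * math.sqrt(var_eta)
    # Sorting keeps lower <= upper even when the inverse link is decreasing.
    lower, upper = sorted((inverse_link(eta - half_width),
                           inverse_link(eta + half_width)))
    return inverse_link(eta), lower, upper


# Hypothetical estimates for a log-link model (inverse link is exp).
x = [1.0, 2.0]
beta_hat = [0.5, 0.25]
cov_hat = [[0.04, -0.01], [-0.01, 0.01]]
mean, lower, upper = predicted_mean_interval(x, beta_hat, cov_hat, math.exp)
print(mean, lower, upper)
```

Because the limits are transformed rather than computed directly on the mean scale, the interval is generally asymmetric around the predicted mean but always stays inside the range of the inverse link.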
115. del and is therefore applicable to a wider range of data analysis problems. A generalized linear model consists of the following components. The linear component is defined just as it is for traditional linear models: $\eta_i = x_i'\beta$. A monotonic differentiable link function $g$ describes how the expected value of $y_i$ is related to the linear predictor $\eta_i$: $g(\mu_i) = x_i'\beta$. The response variables $y_i$ are independent for $i = 1, 2, \ldots$ and have a probability distribution from an exponential family. This implies that the variance of the response depends on the mean $\mu$ through a variance function $V$: $\mathrm{Var}(y_i) = \phi V(\mu_i)/w_i$, where $\phi$ is a constant and $w_i$ is a known weight for each observation. The dispersion parameter $\phi$ is either known (for example, for the binomial or Poisson distribution, $\phi = 1$) or must be estimated. See the section Response Probability Distributions on page 2510 for the form of a probability distribution from the exponential family of distributions. As in the case of traditional linear models, fitted generalized linear models can be summarized through statistics such as parameter estimates, their standard errors, and goodness-of-fit statistics. You can also make statistical inference about the parameters by using confidence intervals and hypothesis tests. However, specific inference procedures are usually based on asymptotic considerations, since exact distribution theory is not available or is not practical for all generalized l
116. distance statistic as a function of ordered cluster. CLUSTERDFIT | MCLS plots the studentized cluster Cook's distance statistic as a function of ordered cluster. DFBETAC plots the cluster deletion statistic as a function of ordered cluster for each regression parameter in the model. DFBETACS plots the standardized cluster deletion statistic as a function of ordered cluster for each regression parameter in the model. RORDER=keyword specifies the sorting order for the levels of the response variable. This order determines which intercept parameter in the model corresponds to each level in the data. If RORDER=FORMATTED for numeric variables for which you have supplied no explicit format, the levels are ordered by their internal values. Note that this represents a change from previous releases for how class levels are ordered. Before SAS 8, numeric class levels with no explicit format were ordered by their BEST12. formatted values, and to revert to the previous order you can specify this format explicitly for the response variable. The change was implemented because the former default behavior for RORDER=FORMATTED often resulted in levels not being ordered numerically and usually required the user to intervene with an explicit format or RORDER=INTERNAL to get the more natural ordering. The following table displays the valid keywords and describes how PROC GENMOD interprets them. RORDER=keyword; Levels Sorted by: DATA: Order of appearance in
117. e In some cases the marginal sample size can be too small to admit accurate estimation of a particular statistic; a note is printed in the SAS log when a marginal sample size is less than 100. Increasing n increases the number of samples used in a marginal distribution; however, if you want to control the sample size exactly, you can either specify the BUILDSUBSETS option or do both of the following: remove the JOINT option from the EXACT statement; create dummy variables in a DATA step to represent the levels of a CLASS variable, and specify them as independent variables in the MODEL statement. NOLOGSCALE specifies that computations for the exact conditional models be computed by using normal scaling. Log scaling can handle numerically larger problems than normal scaling; however, computations in the log scale are slower than computations in normal scale. ONDISK uses disk space instead of random access memory to build the exact conditional distribution. Use this option to handle larger problems at the cost of slower processing. SEED=seed specifies the initial seed for the random number generator used to take the Monte Carlo samples when the METHOD=NETWORKMC option is specified. The value of the SEED= option must be an integer. If you do not specify a seed, or if you specify a value less than or equal to zero, then PROC GENMOD uses the time of day from the computer's clock to generate an initial seed. STATUSN=number prints
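118. e functions of the Pearson residual $e_{ij} = (y_{ij} - \mu_{ij}) / \sqrt{V(\mu_{ij})/w_{ij}}$. If you specify the working correlation as $R_0 = I$, which is the identity matrix, the GEE reduces to the independence estimating equation. Following are the structures of the working correlation supported by the GENMOD procedure and the estimators used to estimate the working correlations.
Working Correlation Structure | Estimator
Fixed: $\mathrm{Corr}(Y_{ij}, Y_{ik}) = r_{jk}$, where $r_{jk}$ is the jkth element of a constant, user-specified correlation matrix $R_0$ | The working correlation is not estimated in this case.
Independent: $\mathrm{Corr}(Y_{ij}, Y_{ik}) = 1$ if $j = k$, and $0$ if $j \ne k$ | The working correlation is not estimated in this case.
m-dependent: $\mathrm{Corr}(Y_{ij}, Y_{i,j+t}) = 1$ if $t = 0$; $\alpha_t$ if $t = 1, 2, \ldots, m$; $0$ if $t > m$ | $\hat\alpha_t = \frac{1}{(K_t - p)\phi} \sum_{i=1}^{K} \sum_{j \le n_i - t} e_{ij} e_{i,j+t}$, with $K_t = \sum_{i=1}^{K} (n_i - t)$
Exchangeable: $\mathrm{Corr}(Y_{ij}, Y_{ik}) = 1$ if $j = k$, and $\alpha$ if $j \ne k$ | $\hat\alpha = \frac{1}{(N^* - p)\phi} \sum_{i=1}^{K} \sum_{j < k} e_{ij} e_{ik}$, with $N^* = 0.5 \sum_{i=1}^{K} n_i (n_i - 1)$
Unstructured: $\mathrm{Corr}(Y_{ij}, Y_{ik}) = 1$ if $j = k$, and $\alpha_{jk}$ if $j \ne k$ | $\hat\alpha_{jk} = \frac{1}{(K - p)\phi} \sum_{i=1}^{K} e_{ij} e_{ik}$
Autoregressive AR(1): $\mathrm{Corr}(Y_{ij}, Y_{i,j+t}) = \alpha^t$ for $t = 0, 1, 2, \ldots, n_i - j$ | $\hat\alpha = \frac{1}{(K_1 - p)\phi} \sum_{i=1}^{K} \sum_{j \le n_i - 1} e_{ij} e_{i,j+1}$, with $K_1 = \sum_{i=1}^{K} (n_i - 1)$
Dispersion Parameter: The dispersion parameter $\phi$ is estimated by $\hat\phi = \frac{1}{N - p} \sum_{i=1}^{K} \sum_{j=1}^{n_i} e_{ij}^2$, where $N = \sum_{i=1}^{K} n_i$ is the total number of measurements and $p$ is the number of regression parameters. The square root of $\hat\phi$ is reported by PROC GENMOD as the scale parameter in the Analysis of GEE Parameter Estimates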
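To make the moment estimators in this excerpt concrete, the following Python sketch computes the dispersion estimate and the exchangeable working correlation from Pearson residuals grouped by cluster. It assumes the estimators read as $\hat\phi = \sum_i \sum_j e_{ij}^2 / (N - p)$ and $\hat\alpha = \sum_i \sum_{j<k} e_{ij} e_{ik} / ((N^* - p)\hat\phi)$ with $N^* = 0.5 \sum_i n_i (n_i - 1)$, which is a reading of the garbled table rather than a confirmed transcription. The residual values and p = 1 are hypothetical, and the code is an illustration, not the GENMOD implementation.

```python
def dispersion_estimate(residuals, p):
    """Sum of squared Pearson residuals divided by (N - p)."""
    n_total = sum(len(cluster) for cluster in residuals)
    sum_sq = sum(e * e for cluster in residuals for e in cluster)
    return sum_sq / (n_total - p)


def exchangeable_alpha(residuals, p, phi):
    """Moment estimator of the common within-cluster correlation."""
    n_star = 0.5 * sum(len(c) * (len(c) - 1) for c in residuals)
    cross = sum(c[j] * c[k]
                for c in residuals
                for j in range(len(c))
                for k in range(j + 1, len(c)))
    return cross / ((n_star - p) * phi)


# Hypothetical Pearson residuals for three clusters of sizes 3, 2, and 3.
clusters = [[1.0, 0.5, -0.5], [-1.0, -0.5], [0.5, 1.0, 0.5]]
phi_hat = dispersion_estimate(clusters, p=1)
alpha_hat = exchangeable_alpha(clusters, p=1, phi=phi_hat)
scale = phi_hat ** 0.5  # the square root is what is reported as the scale
print(phi_hat, alpha_hat, scale)
```

Clusters of unequal size are handled naturally, because each cluster contributes only the residual pairs it actually contains.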
119. Descriptive Statistics of the Posterior Samples: The Descriptive Statistics of the Posterior Sample table contains the size of the sample, the mean, the standard deviation, and the quartiles for each model parameter. Interval Estimates for Posterior Sample: The Interval Estimates for Posterior Sample table contains the HPD intervals and the credible intervals for each model parameter. Correlation Matrix of the Posterior Samples: The Correlation Matrix of the Posterior Samples table is produced if you include the CORR suboption in the SUMMARY= option in the BAYES statement. This table displays the sample correlation of the posterior samples. Covariance Matrix of the Posterior Samples: The Covariance Matrix of the Posterior Samples table is produced if you include the COV suboption in the SUMMARY= option in the BAYES statement. This table displays the sample covariance of the posterior samples. Autocorrelations of the Posterior Samples: The Autocorrelations of the Posterior Samples table displays the lag1, lag5, lag10, and lag50 autocorrelations for each parameter. Gelman and Rubin Diagnostics: The Gelman and Rubin Diagnostics table is produced if you include the GELMAN suboption in the DIAGNOSTIC= option in the BAYES statement. This table displays the estimate of the potential scale reduction factor and its 97.5% upper confidence limit for each parameter. Geweke Diagnostics: The
120. Figure 37.6, Type 3 Analysis. LR Statistics For Type 3 Analysis (Source; DF; Chi-Square; Pr > ChiSq): car; 2; 72.82; <.0001. age; 1; 104.64; <.0001. The Type 3 analysis results in the same conclusions as the Type 1 analysis. The Type 3 chi-square value for the car variable, for example, is twice the difference between the log likelihood for the model with the variables Intercept, car, and age included and the log likelihood for the model with the car variable excluded. The hypothesis tested in this case is the significance of the variable car given that the variable age is in the model. In other words, it tests the additional contribution of car in the model. The values of the Type 3 likelihood ratio statistics for the car and age variables indicate that both of these factors are highly significant in determining the claims performance of the insurance policyholders. Bayesian Analysis of a Linear Regression Model: Neter et al. (1996) describe a study of 54 patients undergoing a certain kind of liver operation in a surgical unit. The data set Surg contains survival time and certain covariates for each patient. Observations for the first 20 patients in the data set Surg are shown in Figure 37.7. Figure 37.7, Surgical Unit Data (Obs; x1; x2; x3; x4; y; logy; Logx1): 1; 6.7; 62; 81; 2.59; 200; 2.3010; 1.90211. 2; 5.1; 59; 66; 1.70; 101; 2.0043; 1.62924. 3; 7.4; 57; 83; 2.16; 204; 2.309
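The Type 3 chi-square values reported in this excerpt (72.82 on 2 degrees of freedom for car, 104.64 on 1 degree of freedom for age) can be converted to p-values to confirm the "<.0001" entries. The Python sketch below does this with a small chi-square upper-tail function built from the standard library; the function names are hypothetical, and the recurrence used for the tail probability is a textbook identity rather than anything taken from the procedure.

```python
import math


def likelihood_ratio_statistic(loglik_full, loglik_reduced):
    """Twice the difference between the two maximized log likelihoods."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi_square_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2 start the recurrence
    # Q(df + 2) = Q(df) + half**(df/2) * exp(-half) / Gamma(df/2 + 1).
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half
                      - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


# Chi-square values and degrees of freedom as printed in Figure 37.6.
p_car = chi_square_sf(72.82, 2)
p_age = chi_square_sf(104.64, 1)
print(p_car, p_age)
```

Both tail probabilities are far below 0.0001, consistent with the table and with the conclusion that car and age are highly significant.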
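121. • Inverse Gaussian:
$$ll_i = -\frac{1}{2}\left[\frac{w_i (y_i - \mu_i)^2}{y_i \mu_i^2 \phi} + \log\left(\frac{\phi y_i^3}{w_i}\right) + \log(2\pi)\right]$$

• Gamma:
$$ll_i = \frac{w_i}{\phi}\log\left(\frac{w_i y_i}{\phi \mu_i}\right) - \frac{w_i y_i}{\phi \mu_i} - \log(y_i) - \log\Gamma\left(\frac{w_i}{\phi}\right)$$

• Negative binomial:
$$ll_i = y_i \log\left(\frac{k\mu_i}{w_i}\right) - \left(y_i + \frac{w_i}{k}\right)\log\left(1 + \frac{k\mu_i}{w_i}\right) + \log\left(\frac{\Gamma(y_i + w_i/k)}{\Gamma(y_i + 1)\,\Gamma(w_i/k)}\right)$$

• Poisson:
$$l_i = \frac{w_i}{\phi}\left[y_i \log(\mu_i) - \mu_i\right]$$
$$ll_i = w_i\left[y_i \log(\mu_i) - \mu_i - \log(y_i!)\right]$$

• Binomial:
$$l_i = \frac{w_i}{\phi}\left[r_i \log(p_i) + (n_i - r_i)\log(1 - p_i)\right]$$
$$ll_i = w_i\left[\log\binom{n_i}{r_i} + r_i \log(p_i) + (n_i - r_i)\log(1 - p_i)\right]$$

• Multinomial ($k$ categories):
$$l_i = \frac{w_i}{\phi}\sum_{j=1}^{k} y_{ij}\log(\mu_{ij})$$
$$ll_i = w_i\left[\log(m_i!) + \sum_{j=1}^{k}\left(y_{ij}\log(\mu_{ij}) - \log(y_{ij}!)\right)\right]$$

• Zero-inflated Poisson:
$$l_i = \begin{cases} w_i \log\left[\omega_i + (1 - \omega_i)\exp(-\lambda_i)\right] & y_i = 0 \\ w_i\left[\log(1 - \omega_i) + y_i \log(\lambda_i) - \lambda_i - \log(y_i!)\right] & y_i > 0 \end{cases}$$

• Zero-inflated negative binomial:
$$l_i = \begin{cases} \log\left[\omega_i + (1 - \omega_i)\left(1 + \frac{k}{w_i}\lambda_i\right)^{-1/k}\right] & y_i = 0 \\ \log(1 - \omega_i) + y_i \log\left(\frac{k}{w_i}\lambda_i\right) - \left(y_i + \frac{w_i}{k}\right)\log\left(1 + \frac{k}{w_i}\lambda_i\right) + \log\left(\frac{\Gamma(y_i + w_i/k)}{\Gamma(y_i + 1)\,\Gamma(w_i/k)}\right) & y_i > 0 \end{cases}$$

Maximum Likelihood Fitting

The GENMOD procedure uses a ridge-stabilized Newton-Raphson algorithm to maximize the log-likelihood function $L(y, \mu, \phi)$ with respect to the regression parameters. By default, the procedure also produces maximum likelihood estimates of the scale parameter, as defined in the section "Response Probability Distributions" on page 2510, for the normal, inverse Gaussian, negative binomial, and gamma distributions. On the rth iteration, the algorithm updates th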
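As a rough check on the zero-inflated Poisson contribution listed above, the sketch below evaluates it in Python under the usual definition of the model (a point mass at zero with probability omega, mixed with a Poisson count with mean lam). The function name and argument names are assumptions made for illustration; nothing here is PROC GENMOD syntax.

```python
import math

def zip_loglik(y, lam, omega, weight=1.0):
    """Log-likelihood contribution of one zero-inflated Poisson observation.

    y: observed count, lam: Poisson mean, omega: zero-inflation probability,
    weight: prior weight (assumed to multiply the contribution).
    """
    if y == 0:
        return weight * math.log(omega + (1.0 - omega) * math.exp(-lam))
    return weight * (math.log(1.0 - omega) + y * math.log(lam)
                     - lam - math.lgamma(y + 1.0))

# With weight 1, exponentiating the contributions gives probabilities that sum to 1.
total = sum(math.exp(zip_loglik(y, lam=2.0, omega=0.3)) for y in range(100))
print(round(total, 10))
```

Setting omega to 0 reduces the contribution to the ordinary Poisson log likelihood, which is a quick way to sanity-check an implementation.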
122. e also computed for each effect.

Options for handling the dispersion parameter are the same as for a Type 1 analysis. The dispersion parameter can be specified to be a known value, estimated from the deviance or Pearson's chi-square divided by degrees of freedom, or estimated by maximum likelihood individually for the unconstrained and constrained models. By default, PROC GENMOD estimates scale by maximum likelihood for each model fit.

The results of this type of analysis do not depend on the order in which the terms are specified in the MODEL statement.

A Type 3 analysis can consume considerable computation time, since a constrained model is fitted for each effect. Wald statistics for Type 3 contrasts are computed if you specify the WALD option. Wald statistics for contrasts use less computation time than likelihood ratio statistics but might be less accurate indicators of the significance of the effect of interest. The Wald statistic for testing $L'\beta = 0$, where $L$ is the contrast matrix, is defined by

$$S = (L'\hat{\beta})'(L'\hat{\Sigma}L)^{-1}(L'\hat{\beta})$$

where $\hat{\beta}$ is the maximum likelihood estimate and $\hat{\Sigma}$ is its estimated covariance matrix. The asymptotic distribution of $S$ is chi-square with $r$ degrees of freedom, where $r$ is the rank of $L$.

See Chapter 39, "The GLM Procedure," and Chapter 15, "The Four Types of Estimable Functions," for more information about Type III estimable functions. Also refer to Littell, Freund, and Spec
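The Wald statistic is a quadratic form: the estimated contrasts, weighted by the inverse of their estimated covariance. The following Python sketch shows that arithmetic with a small hand-written linear solver so that it needs no external libraries. The estimates, covariance matrix, and contrasts are made-up numbers, and each entry of `contrasts` is assumed to hold one column of the contrast matrix L.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def wald_statistic(contrasts, beta, cov):
    """S = (L'b)' (L' Sigma L)^{-1} (L'b), one contrast per entry of `contrasts`."""
    p = len(beta)
    lb = [sum(c[j] * beta[j] for j in range(p)) for c in contrasts]
    lsl = [[sum(ca[i] * cov[i][j] * cb[j] for i in range(p) for j in range(p))
            for cb in contrasts] for ca in contrasts]
    z = solve(lsl, lb)
    return sum(x * y for x, y in zip(lb, z))

# Hypothetical estimates with a diagonal covariance matrix
beta = [0.5, 1.2, -0.3]
cov = [[0.04, 0.0, 0.0], [0.0, 0.09, 0.0], [0.0, 0.0, 0.01]]
one_df = wald_statistic([[0, 1, 0]], beta, cov)
two_df = wald_statistic([[0, 1, 0], [0, 0, 1]], beta, cov)
print(round(one_df, 6), round(two_df, 6))
```

For a single contrast the statistic is simply the squared estimate divided by its variance, which is why the one-degree-of-freedom value equals 1.2 squared over 0.09.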
123. e is added to the linear predictor the estimated standard error of the linear predictor the value of the negative of the weight in the Hessian matrix at the final iteration This is the expected weight if the EXPECTED option is specified in the MODEL statement Otherwise it is the weight used in the final iteration That is it is the observed weight unless the SCORING option has been specified approximate lower and upper endpoints for a confidence interval for the predicted value of the mean raw residual e Pearson residual deviance residual standardized Pearson residual standardized deviance residual likelihood residual leverage Cook s distance statistic e DFBETA statistic for each parameter standardized DFBETA statistic for each parameter zero inflation probability for zero inflated models response mean for zero inflated models Displayed Output for Classical Analysis 2561 ESTIMATE Statement Results If you specify a REPEATED statement the ESTIMATE statement results apply to the specified GEE model Otherwise they apply to the specified generalized linear model For each ESTIMATE statement the table contains the contrast label the estimated value of the contrast the standard error of the estimate the significance level a 1 a x 100 confidence intervals for contrast the Wald chi square statistic for the contrast and the p value computed from the chi square distribution If you specify
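124. e mean $\mu$ and dispersion parameter $\phi$ instead of the natural parameter $\theta$.

The probability distributions that are available in the GENMOD procedure are shown in the following list. The zero-inflated Poisson and zero-inflated negative binomial distributions are not generalized linear models. However, the zero-inflated distributions are included in PROC GENMOD since they are useful extensions of generalized linear models. See Long (1997) for a discussion of the zero-inflated Poisson and zero-inflated negative binomial distributions. The PROC GENMOD scale parameter and the variance of $Y$ are also shown.

• Normal:
$$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2\right] \quad \text{for } -\infty < y < \infty$$
$\phi = \sigma^2$, scale $= \sigma$, $\mathrm{Var}(Y) = \sigma^2$

• Inverse Gaussian:
$$f(y) = \frac{1}{\sqrt{2\pi y^3}\,\sigma}\exp\left[-\frac{1}{2y}\left(\frac{y - \mu}{\mu\sigma}\right)^2\right] \quad \text{for } 0 < y < \infty$$
$\phi = \sigma^2$, scale $= \sigma$, $\mathrm{Var}(Y) = \sigma^2\mu^3$

• Gamma:
$$f(y) = \frac{1}{\Gamma(\nu)\,y}\left(\frac{y\nu}{\mu}\right)^{\nu}\exp\left(-\frac{y\nu}{\mu}\right) \quad \text{for } 0 < y < \infty$$
$\phi = 1/\nu$, scale $= \nu$, $\mathrm{Var}(Y) = \mu^2/\nu$

• Geometric: This is a special case of the negative binomial with $k = 1$.
$$f(y) = \frac{\mu^y}{(1 + \mu)^{y+1}} \quad \text{for } y = 0, 1, 2, \ldots$$
$\phi = 1$, $\mathrm{Var}(Y) = \mu(1 + \mu)$

• Negative binomial:
$$f(y) = \frac{\Gamma(y + 1/k)}{\Gamma(y + 1)\,\Gamma(1/k)}\,\frac{(k\mu)^y}{(1 + k\mu)^{y + 1/k}} \quad \text{for } y = 0, 1, 2, \ldots$$
$\phi = 1$, dispersion $= k$, $\mathrm{Var}(Y) = \mu + k\mu^2$

• Poisson:
$$f(y) = \frac{\mu^y e^{-\mu}}{y!} \quad \text{for } y = 0, 1, 2, \ldots$$
$\phi = 1$, $\mathrm{Var}(Y) = \mu$

• Binomial:
$$f(y) = \binom{n}{r}\mu^r (1 - \mu)^{n - r} \quad \text{for } y = \frac{r}{n},\ r = 0, 1, 2, \ldots, n$$
$\phi = 1$, $\mathrm{Var}(Y) = \frac{\mu(1 - \mu)}{n}$

• Multinomial:
$$f(y_1, y_2, \ldots, y_k) = \frac{m!}{y_1!\,y_2!\cdots y_k!}\,\mu_1^{y_1}\mu_2^{y_2}\cdots\mu_k^{y_k}$$
$\phi = 1$

• Zero-inflated Poi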
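The mean and variance stated for the negative binomial and geometric distributions can be checked numerically. This Python sketch evaluates the negative binomial probability function in the mean and dispersion parameterization (mean mu, dispersion k) and sums over a long but finite range of counts; the truncation point `upper` is an assumption that is adequate only for moderate values of mu and k.

```python
import math

def negbin_pmf(y, mu, k):
    """Negative binomial probability with mean mu and dispersion k (log-gamma form)."""
    log_p = (math.lgamma(y + 1.0 / k) - math.lgamma(y + 1.0) - math.lgamma(1.0 / k)
             + y * math.log(k * mu) - (y + 1.0 / k) * math.log(1.0 + k * mu))
    return math.exp(log_p)

def moments(mu, k, upper=2000):
    """Total probability, mean, and variance summed over y = 0, ..., upper - 1."""
    probs = [negbin_pmf(y, mu, k) for y in range(upper)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var

total, mean, var = moments(mu=3.0, k=0.5)
print(round(total, 8), round(mean, 8), round(var, 8))  # variance should be mu + k*mu^2
geo_total, geo_mean, geo_var = moments(mu=3.0, k=1.0)  # geometric special case
```

With k = 1 the same function gives the geometric distribution, whose variance mu(1 + mu) is the k = 1 case of mu + k mu squared.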
125. e of the FREQ variable for the observation. If it is not an integer, the frequency value is truncated to an integer. If it is less than 1 or missing, the observation is not used. In the case of models fit with generalized estimating equations (GEEs), the frequencies apply to the subject/cluster and therefore must be the same for all observations within each subject.

FWDLINK Statement

FWDLINK variable = expression ;

You can define a link function other than a built-in link function by using the FWDLINK statement. If you use the MODEL statement option LINK= to specify a link function, you do not need to use the FWDLINK statement. The variable identifies the link function to the procedure. The expression can be any arithmetic expression supported by the DATA step language, and it is used to define the functional dependence on the mean.

Alternatively, the link function can be defined by using programming statements (see the section "Programming Statements" on page 2502) and assigned to a variable, which is then listed as the expression. The second form is convenient for using complex statements such as IF-THEN/ELSE clauses.

The GENMOD procedure automatically computes derivatives of the link function required for iterative fitting. You must specify the inverse of the link function in the INVLINK statement when you specify the FWDLINK statement to define the link function. You use the automatic variable _M
126. e parameter vector $\beta_r$ with

$$\beta_{r+1} = \beta_r - H^{-1}s$$

where $H$ is the Hessian (second derivative) matrix and $s$ is the gradient (first derivative) vector of the log-likelihood function, both evaluated at the current value of the parameter vector. That is,

$$s = [s_j] = \left[\frac{\partial L}{\partial \beta_j}\right]$$

and

$$H = [h_{ij}] = \left[\frac{\partial^2 L}{\partial \beta_i\,\partial \beta_j}\right]$$

In some cases, the scale parameter is estimated by maximum likelihood. In these cases, elements corresponding to the scale parameter are computed and included in $s$ and $H$.

If $\eta_i = x_i'\beta$ is the linear predictor for observation $i$ and $g$ is the link function, then $\eta_i = g(\mu_i)$, so that $\hat{\mu}_i = g^{-1}(x_i'\hat{\beta})$ is an estimate of the mean of the ith observation, obtained from an estimate $\hat{\beta}$ of the parameter vector $\beta$.

The gradient vector and Hessian matrix for the regression parameters are given by

$$s = \sum_i \frac{w_i (y_i - \mu_i)\,x_i}{V(\mu_i)\,g'(\mu_i)\,\phi}$$
$$H = -X'W_oX$$

where $X$ is the design matrix, $x_i$ is the transpose of the ith row of $X$, and $V$ is the variance function. The matrix $W_o$ is diagonal with its ith diagonal element

$$w_{oi} = w_{ei} + w_i (y_i - \mu_i)\,\frac{V(\mu_i)\,g''(\mu_i) + V'(\mu_i)\,g'(\mu_i)}{(V(\mu_i))^2\,(g'(\mu_i))^3\,\phi}$$

where

$$w_{ei} = \frac{w_i}{\phi\,V(\mu_i)\,(g'(\mu_i))^2}$$

The primes denote derivatives of $g$ and $V$ with respect to $\mu$. The negative of $H$ is called the observed information matrix. The expected value of $W_o$ is a diagonal matrix $W_e$ with diagonal values $w_{ei}$. If you replace $W_o$ with $W_e$, then the negative of $H$ is called the expected information matrix. $W_e$ is the weight matrix for the Fisher scoring method of fitting. Either $W_o$ o
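To make the Newton-Raphson update concrete, here is a minimal Python sketch for one simple case: a Poisson response with a log link, an intercept, and one covariate. For that case the variance function is V(mu) = mu and the link derivative is 1/mu, so the gradient reduces to X'(y - mu) and the negative Hessian to X'WX with W = diag(mu). The sketch uses plain Newton steps without the ridge stabilization that the procedure applies, and the data are invented so that the exact solution is known.

```python
import math

def fit_poisson_log(x, y, iterations=25):
    """Plain Newton-Raphson for log(mu) = b0 + b1*x (no ridge stabilization)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start at the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient s = X'(y - mu)
        s0 = sum(yi - mi for yi, mi in zip(y, mu))
        s1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Negative Hessian -H = X'WX with W = diag(mu)
        a00 = sum(mu)
        a01 = sum(mi * xi for xi, mi in zip(x, mu))
        a11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = a00 * a11 - a01 * a01
        # beta_{r+1} = beta_r - H^{-1} s = beta_r + (X'WX)^{-1} s
        d0 = (a11 * s0 - a01 * s1) / det
        d1 = (a00 * s1 - a01 * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1

# Invented data lying exactly on mu = 2**x, so the fit should return (0, log 2)
b0, b1 = fit_poisson_log([0, 1, 2, 3, 4], [1, 2, 4, 8, 16])
print(round(b0, 6), round(b1, 6))
```

Because the log link is canonical for the Poisson distribution, the observed and expected weights coincide here, so this loop is also Fisher scoring.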
127. e same ward have one log odds ratio parameter, and patients from different wards have the other parameter.

specifies the full z-matrix. You must also specify a SAS data set containing the z-matrix with the ZDATA=data-set-name option. Each observation in the data set corresponds to one row of the z-matrix. You must specify the ZDATA data set as if all clusters are complete, that is, as if all clusters are the same size and there are no missing observations. The ZDATA data set has $K[n_{\max}(n_{\max} - 1)/2]$ observations, where $K$ is the number of clusters and $n_{\max}$ is the maximum cluster size. If the members of cluster $i$ are ordered as $1, 2, \ldots, n$, then the rows of the z-matrix must be specified for pairs in the order $(1,2), (1,3), \ldots, (1,n), (2,3), \ldots, (2,n), \ldots, (n-1,n)$.

The variables specified in the REPEATED statement for the SUBJECT effect must also be present in the ZDATA= data set to identify clusters. You must specify variables in the data set that define the columns of the z-matrix by the ZROW=variable-list option. If there are $q$ columns ($q$ variables in variable-list), then there are $q$ log odds ratio parameters. You can optionally specify variables indicating the cluster pairs corresponding to each row of the z-matrix with the YPAIR=(variable1 variable2) option. If you specify this option, the data from the ZDATA data set are sorted within each cluster by variable1 and variable2. See Example 37.6 for an example of specifying a full z-matr
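The pair ordering and row count required for the ZDATA data set can be illustrated with a few lines of Python. This is only a sketch of the bookkeeping; the function names are hypothetical, and the cluster count and size in the example are invented.

```python
from itertools import combinations

def zdata_pairs(n_max):
    """Within-cluster pairs in the order (1,2), (1,3), ..., (n-1,n)."""
    return list(combinations(range(1, n_max + 1), 2))

def zdata_rows(num_clusters, n_max):
    """Number of ZDATA observations when every cluster is treated as complete."""
    return num_clusters * n_max * (n_max - 1) // 2

print(zdata_pairs(4))      # pair order for a cluster of size 4
print(zdata_rows(10, 4))   # 10 hypothetical clusters of maximum size 4
```

`itertools.combinations` emits pairs in lexicographic order, which matches the ordering the text describes.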
128. e seed shown was derived from the time of day Output 37 10 3 MCMC Initial Values and Seeds Initial Values of the Chain Chain Seed Intercept X1 X2 X3 x4 1 1 2 450813 0 00435 0 01347 0 00291 0 27149 Initial Values of the Chain x5 X6 0 321507 0 207713 Summary statistics for the posterior sample are displayed in the Fit Statistics Descriptive Statistics for the Posterior Sample Interval Statistics for the Posterior Sample and Posterior Correlation Matrix tables in Output 37 10 4 Output 37 10 5 Output 37 10 6 and Output 37 10 7 respectively Since noninformative prior distributions for the regression coefficients were used the mean and standard deviations of the posterior distributions for the model parameters are close to the maximum likelihood estimates and standard errors Output 37 10 4 Fit Statistics Fit Statistics DIC smaller is better 829 729 pD effective number of parameters 6 966 Output 37 10 5 Descriptive Statistics The GENMOD Procedure Bayesian Analysis Posterior Summaries Standard Percentiles Parameter N Mean Deviation 25 50 75 Intercept 10000 2 4520 0 2268 2 2997 2 4521 2 6053 X1 10000 0 00473 0 00801 0 0100 0 00465 0 000759 X2 10000 0 0134 0 00236 0 0150 0 0134 0 0118 x3 10000 0 00309 0 00220 0 00455 0 00305 0 00158 x4 10000 0 2705 0 0792 0 3241 0 2697 0 2172 x5 10000 0 3193 0 0834 0 2629 0 3180 0 3762 x6 10000 0 2095 0
129. e the cumulative category probabilities (note that $P_{ik} = 1$). The ordinal model is

$$g(P_{ir}) = \mu_r + x_i'\beta \quad \text{for } r = 1, 2, \ldots, k - 1$$

where $\mu_1, \mu_2, \ldots, \mu_{k-1}$ are intercept terms that depend only on the categories and $x_i$ is a vector of covariates that does not include an intercept term. The logit, probit, and complementary log-log link functions $g$ are available. These are obtained by specifying the MODEL statement options DIST=MULTINOMIAL and LINK=CUMLOGIT (cumulative logit), LINK=CUMPROBIT (cumulative probit), or LINK=CUMCLL (cumulative complementary log-log). Alternatively,

$$P_{ir} = F(\mu_r + x_i'\beta) \quad \text{for } r = 1, 2, \ldots, k - 1$$

where $F = g^{-1}$ is a cumulative distribution function for the logistic, normal, or extreme-value distribution.

PROC GENMOD estimates the intercept parameters $\mu_1, \mu_2, \ldots, \mu_{k-1}$ and regression parameters $\beta$ by maximum likelihood.

The subpopulations $i$ are defined by constant values of the AGGREGATE= variable. This has no effect on the parameter estimates, but it does affect the deviance and Pearson chi-square statistics; it also affects parameter estimate standard errors if you specify the SCALE=DEVIANCE or SCALE=PEARSON option.

Zero-Inflated Models

Count data that have an incidence of zeros greater than expected for the underlying probability distribution of counts can be modeled with a zero-inflated distribution. In GENMOD, the underlying distribution can be either Poisson or negative binomial. See Lambert (1992), Long (1997), and Cameron and
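For the cumulative logit case, the step from cumulative probabilities to individual category probabilities can be sketched as follows. The Python function below is an illustration only: the intercepts and the covariate contribution `eta` are hypothetical values, and the logistic function stands in for the inverse link F.

```python
import math

def category_probs(intercepts, eta):
    """Category probabilities from P_r = logistic(mu_r + eta), with P_k = 1.

    intercepts: mu_1 <= ... <= mu_{k-1}; eta: the covariate part x'beta.
    """
    cumulative = [1.0 / (1.0 + math.exp(-(m + eta))) for m in intercepts] + [1.0]
    probs = [cumulative[0]]
    for prev, cur in zip(cumulative, cumulative[1:]):
        probs.append(cur - prev)   # successive differences of cumulative probabilities
    return probs

# Four hypothetical intercepts give five response categories
probs = category_probs([-1.5, -0.5, 0.5, 1.5], eta=0.0)
shifted = category_probs([-1.5, -0.5, 0.5, 1.5], eta=1.0)
print([round(p, 4) for p in probs])
```

A positive `eta` raises every cumulative probability, which moves probability mass toward the lower-ordered categories; that is the direction in which covariate effects should be read in this parameterization.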
130. e the cumulative residual plot in Output 37.9.1 and compute a p-value for the model. These graphical displays are requested by specifying the ODS GRAPHICS statement and the ASSESS statement. For general information about ODS Graphics, see Chapter 21, "Statistical Graphics Using ODS." For specific information about the graphics available in the GENMOD procedure, see the section "ODS Graphics" on page 2572.

Here, the SAS data set variables Time, Time2, TrtTime, and TrtTime2 correspond to $T_{ik}$, $T_{ik}^2$, $R_i T_{ik}$, and $R_i T_{ik}^2$, respectively. The variable Id identifies individual patients.

   ods graphics on;
   proc genmod data=cd4;
      class Id;
      model Y = Time Time2 TrtTime TrtTime2;
      repeated sub=Id;
      assess var=(Time) / resample seed=603708000;
   run;
   ods graphics off;

Output 37.9.1 Cumulative Residual Plot for Quadratic Time Fit
[Figure: "Checking Functional Form for Time," observed path and first 20 simulated paths; cumulative residuals plotted against Time; Pr > MaxAbsVal: 0.1830.]

The cumulative residual plot in Output 37.9.1 displays cumulative residuals versus time for the model and 20 simulated realizations. The associated p-value, also shown in Output 37.9.1, is 0.18. These results indicate that a more satisfactory model might be possible. The observed cumulative residual pattern most resembles plot c in Out
131. e time of day.

STATISTICS < (global-options) > = ALL | NONE | keyword | (keyword-list)
STATS < (global-options) > = ALL | NONE | keyword | (keyword-list)
controls the number of posterior statistics produced. Specifying STATISTICS=ALL is equivalent to specifying STATISTICS=(SUMMARY INTERVAL COV CORR). If you do not want any posterior statistics, you specify STATISTICS=NONE. The default is STATISTICS=(SUMMARY INTERVAL). See the section "Summary Statistics" on page 169 for details. The global options include the following:

ALPHA=numeric-list
controls the probabilities of the credible intervals. The ALPHA= values must be between 0 and 1. Each ALPHA= value produces a pair of 100(1 - ALPHA)% equal-tail and HPD intervals for each parameter. The default is the value of the ALPHA= option in the MODEL statement, or 0.05 if that option is not specified, yielding the 95% credible intervals for each parameter.

PERCENT=numeric-list
requests the percentile points of the posterior samples. The PERCENT= values must be between 0 and 100. The default is PERCENT=25, 50, 75, which yield the 25th, 50th, and 75th percentile points, respectively, for each parameter.

The list of keywords includes the following:

CORR
produces the posterior correlation matrix.

COV
produces the posterior covariance matrix.

SUMMARY
produces the means, standard deviations, and percentile points for the posterior samples. The default is to produce the 25th, 50th
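An equal-tail credible interval of the kind controlled by ALPHA= is just a pair of percentile points of the posterior sample. The Python sketch below shows one way to compute it; the linear-interpolation rule for percentiles is an assumption made for this illustration and is not claimed to be the rule the procedure uses, and the "posterior sample" is a toy list of integers.

```python
def percentile(sorted_values, pct):
    """Percentile by linear interpolation between order statistics (assumed convention)."""
    pos = (len(sorted_values) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def equal_tail_interval(samples, alpha=0.05):
    """The 100(1 - alpha)% equal-tail interval: alpha/2 cut from each tail."""
    s = sorted(samples)
    return (percentile(s, 100.0 * alpha / 2.0),
            percentile(s, 100.0 * (1.0 - alpha / 2.0)))

draws = list(range(101))            # toy sample: 0, 1, ..., 100
lower, upper = equal_tail_interval(draws, alpha=0.05)
quartiles = [percentile(sorted(draws), q) for q in (25, 50, 75)]
print(lower, upper, quartiles)
```

An HPD interval, by contrast, is the shortest interval with the requested coverage and generally needs a search over candidate intervals rather than two fixed percentiles.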
132. eans PLOTS Requests ODS statistical graphics of means and mean comparisons SEED Specifies the seed for computations that depend on random numbers Generalized Linear Modeling EXP Exponentiates and displays estimates of LS means or LS means differences ILINK Computes and displays estimates and standard errors of LS means but not differences on the inverse linked scale ODDSRATIO Reports simple differences of least squares means in terms of odds ratios if permitted by the link function For details about the syntax of the LSMEANS statement see the section LSMEANS Statement on page 479 of Chapter 19 Shared Concepts and Topics LSMESTIMATE Statement LSMESTIMATE model effect lt label gt values lt divisor n gt lt lt label gt values lt divisor n gt gt lt options gt 2490 Chapter 37 The GENMOD Procedure The LSMESTIMATE statement provides a mechanism for obtaining custom hypothesis tests among least squares means Table 37 4 summarizes important options in the LSMESTIMATE statement Table 37 4 Important LSMESTIMATE Statement Options Option Description Construction and Computation of LS Means AT Modifies covariate values in computing LS means BYLEVEL Computes separate margins DIVISOR Specifies a list of values to divide the coefficients OM Specifies the weighting scheme for LS means computation as de termined by a data set SINGULAR Tunes estimability check
133. ection "Predicted Values of the Mean" on page 2527, the section "Residuals" on page 2528, and the section "Case Deletion Diagnostic Statistics" on page 2544. Residuals and fit diagnostics are not computed for multinomial models.

For each observation, the following items are displayed:
• the value of the response variable (variables if the data are binomial), frequency, and weight variables
• the values of the regression variables
• predicted mean, $\hat{\mu} = g^{-1}(\eta)$, where $\eta = x_i'\hat{\beta}$ is the linear predictor and $g$ is the link function. If there is an offset, it is included in $x_i'\hat{\beta}$.
• estimate of the linear predictor $x_i'\hat{\beta}$. If there is an offset, it is included in $x_i'\hat{\beta}$.
• standard error of the linear predictor $x_i'\hat{\beta}$
• the value of the Hessian weight at the final iteration
• lower confidence limit of the predicted value of the mean. The confidence coefficient is specified with the ALPHA= option. See the section "Confidence Intervals on Predicted Values" on page 2528 for the computational method.
• upper confidence limit of the predicted value of the mean
• raw residual, defined as $Y - \mu$
• Pearson, or chi, residual, defined as the square root of the contribution for the observation to the Pearson chi-square, that is, $(Y - \mu)\sqrt{w / V(\mu)}$, where $Y$ is the response, $\mu$ is the predicted mean, $w$ is the value of the prior weight variable specified in a WEIGHT statement, and $V(\mu)$ is the variance function ev
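The raw and Pearson residuals in the list above differ only by a scaling that depends on the variance function. A minimal Python sketch, using the standard variance functions for four distributions (the dictionary keys are labels chosen for this example, not GENMOD keywords):

```python
import math

# Standard variance functions V(mu); the key names are illustrative only.
VARIANCE = {
    "normal": lambda mu: 1.0,
    "poisson": lambda mu: mu,
    "gamma": lambda mu: mu * mu,
    "igaussian": lambda mu: mu ** 3,
}

def raw_residual(y, mu):
    """Response minus predicted mean."""
    return y - mu

def pearson_residual(y, mu, dist="poisson", weight=1.0):
    """Raw residual scaled by sqrt(weight / V(mu))."""
    return (y - mu) * math.sqrt(weight / VARIANCE[dist](mu))

print(raw_residual(7.0, 4.0))                   # 3.0
print(pearson_residual(7.0, 4.0))               # Poisson scaling
print(pearson_residual(6.0, 4.0, dist="gamma"))
```

Summing the squared Pearson residuals over all observations gives the Pearson chi-square statistic, which is why each residual is described as the square root of an observation's contribution to it.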
134. SAS/STAT User's Guide: The GENMOD Procedure (Book Excerpt)

This document is an individual chapter from SAS/STAT 9.22 User's Guide.

The correct bibliographic citation for the complete manual is as follows: SAS Institute Inc. 2010. SAS/STAT 9.22 User's Guide. Cary, NC: SAS Institute Inc.

Copyright 2010, SAS Institute Inc., Cary, NC, USA. All rights reserved. Produced in the United States of America.

For a Web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.

U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987).

SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513.

1st electronic book, May 2010

SAS Publishing provides a complete selection of books and electronic products to help customers use SAS software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hard-copy books, visit the SAS Publishing Web site at support.sas.com/publishing or call 1-800-727-3228.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indi
135. el Information Correlation Structure Exchangeable Subject Effect id 58 levels Number of Clusters 58 Correlation Matrix Dimension 5 Maximum Cluster Size 5 Minimum Cluster Size 5 Output 37 7 4 GEE Parameter Estimates Analysis Of GEE Parameter Estimates Empirical Standard Error Estimates Standard 95 Confidence Parameter Estimate Error Limits Z Pr gt Z Intercept 1 3476 0 1574 1 0392 1 6560 8 56 lt 0001 x1 0 1108 0 1161 0 1168 0 3383 0 95 0 3399 trt 0 1080 0 1937 0 4876 0 2716 0 56 0 5770 xl trt 0 3016 0 1712 0 6371 0 0339 1 76 0 0781 2596 Chapter 37 The GENMOD Procedure Table 37 14 displays the regression coefficients standard errors and normalized coefficients that result from fitting the model with independent and exchangeable working correlation matrices Table 37 14 Results of Model Fitting Variable Correlation Structure Coef Std Error Coef S E Intercept Exchangeable 1 35 0 16 8 56 Independent 1 35 0 03 39 52 Visit x1 Exchangeable 0 11 0 12 0 95 Independent 0 11 0 05 2 36 Treat x2 Exchangeable 0 11 0 19 0 56 Independent 0 11 0 05 2 22 X1 X2 Exchangeable 0 30 0 17 1 76 Independent 0 30 0 07 4 32 The fitted exchangeable correlation matrix is specified with the CORRW option and is displayed in Output 37 7 5 Output 37 7 5 Working Correlation Matrix Working Correlation Matrix Coll Col2 Col3 Col4 Col5 Rowl 1 0000 0 5941 0 5941 0 5941 0 5941
136. elihood 23 7343 AIC smaller is better 59 4686 AICC smaller is better 67 1050 BIC smaller is better 64 8109 2576 Chapter 37 The GENMOD Procedure In the Analysis Of Parameter Estimates table displayed in Output 37 1 4 chi square values for the explanatory variables indicate that the parameter values other than the intercept term are all significant The scale parameter is set to 1 for the binomial distribution When you perform an overdispersion analysis the value of the overdispersion parameter is indicated here See the section Overdispersion on page 2521 for a discussion of overdispersion Output 37 1 4 Parameter Estimates Parameter Intercept x drug drug drug drug drug Scale AOA wD Pp Analysis Of Maximum Likelihood Parameter Estimates DF COPRPRRHRHE Estimate 2792 9794 8955 0162 7952 8548 0000 0000 Standard Error 4196 7660 6092 4052 6655 4838 0000 0000 oooo0o00c 0 Likelihood Ratio 95 Confidence Limits 0 5336 1 0 5038 3 4 2280 1 2 8375 1 5 3111 2 1 8072 0 0 0000 0 1 0000 1 NOTE The scale parameter was held fixed 1190 5206 7909 2435 6261 1028 0000 0000 Wald Chi Square 0 6 22 24 32 3 44 68 59 76 53 12 Pr gt ChiSq OAAAOGO 5057 0098 0001 0001 0001 0773 The preceding table contains the profile likelihood confidence intervals for the explanatory variable p
137. ement enables you to perform zero inflated Poisson regression or zero inflated negative binomial regression when those respective distributions are specified by the DIST option in the MODEL statement The effects in the ZEROMODEL statement consist of explanatory variables or combinations of variables for the zero inflation probability regression model in a zero inflated model The same effects can be used in both the ZEROMODEL statement and the MODEL statement or effects can be used in one statement or the other separately Explanatory variables can be continuous or classification variables Classification variables can be character or numeric Explanatory variables representing nominal or classification data must be declared in a CLASS statement Interactions between variables can also be included as effects Columns of the design matrix are automatically generated for classification variables and interactions The syntax for specification of effects is the same as for the GLM procedure See the section Specification of Effects on page 2522 for more information Also refer to Chapter 39 The GLM Procedure You can specify the following option in the ZEROMODEL statement after a slash LINK keyword specifies the link function to use in the model The keywords and their associated link functions are as follows LINK Link Function CLOGLOG CLL Complementary log log LOGIT Logit PROBIT Probit If no LINK option is supplied
138. ement to specify the response level ordering Responses for the Poisson distribution must be all nonnegative but they can be noninteger values The effects in the MODEL statement consist of an explanatory variable or combination of variables Explanatory variables can be continuous or classification variables Classification variables can be character or numeric Explanatory variables representing nominal or classification data must be declared in a CLASS statement Interactions between variables can also be included as effects Columns of the design matrix are automatically generated for classification variables and interactions The syntax for specification of effects is the same as for the GLM procedure See the section Specification of Effects on page 2522 for more information Also refer to Chapter 39 The GLM Procedure You can specify the following options in the MODEL statement after a slash AGGREGATE variable list AGGREGATE variable AGGREGATE specifies the subpopulations on which the Pearson chi square and the deviance are calculated This option applies only to the multinomial distribution or the binomial distribution with binary single trial syntax response It is ignored if specified for other cases Observations with common values in the given list of variables are regarded as coming from the same subpopulation This affects the computation of the deviance and Pearson chi square statistics Variables in the li
139. ent GENMOD 2495 OFFSET option MODEL statement GENMOD 2497 ONESIDED option EXACT statement GENMOD 2483 ORDER option CLASS statement GENMOD 2474 PROC GENMOD statement 2457 OUT option OUTPUT statement GENMOD 2499 OUTDIST option EXACT statement GENMOD 2483 OUTPUT statement GENMOD procedure 2499 PARAME option CLASS statement GENMOD 2475 PLOTS option PROC GENMOD statement 2458 PRED option MODEL statement GENMOD 2497 PREDICTED option MODEL statement GENMOD 2497 PROC GENMOD statement see GENMOD procedure PSCALE MODEL statement GENMOD 2497 REF option CLASS statement GENMOD 2476 REPEATED statement GENMOD procedure 2453 2503 RESIDUALS option MODEL statement GENMOD 2497 RORDER option PROC GENMOD statement 2461 RUPDATE option REPEATED statement GENMOD 2505 SCALE option MODEL statement GENMOD 2497 SCORING option MODEL statement GENMOD 2498 SCWGT statement GENMOD procedure 2509 SINGULAR option CONTRAST statement GENMOD 2479 ESTIMATE statement GENMOD 2481 MODEL statement GENMOD 2498 SLICE statement GENMOD procedure 2507 SORTED option REPEATED statement GENMOD 2505 STATISTICS option BAYES statement GENMOD 2472 STORE statement GENMOD procedure 2507 STRATA statement GENMOD procedure 2507 SUBCLUSTER option REPEATED statement GENMOD 2505 SUBJECT option REPEATED statement GENMOD 2503 THINNING option BAYES
140. [Figure residue: diagnostic plots for Intercept (trace plot against Iteration, autocorrelation against Lag, and posterior density).]

Example 37.10: Bayesian Analysis of a Poisson Regression Model

Output 37.10.12 Diagnostic Plots for X1
[Figure: trace plot against Iteration, autocorrelation against Lag, and posterior density for X1.]

Output 37.10.13 Diagnostic Plots for X2
[Figure: trace plot against Iteration, autocorrelation against Lag, and posterior density for X2.]

Output 37.10.14 Diagnostic Plots for X3
[Figure: trace plot against Iteration, autocorrelation against Lag, and posterior density for X3.]

Output 37.10.15 Diagnostic Plots for X4
[Figure: trace plot against Iteration, autocorrelation against Lag, and posterior density for X4.]

Output 37.10.1
141. ession" on page 2553 for information about the use of the offset in the exact Poisson model.

PREDICTED | PRED | P
requests that predicted values, the linear predictor, its standard error, and the Hessian weight be displayed (see the OBSTATS option).

RESIDUALS | R
requests that residuals and standardized residuals be displayed. Residuals and other diagnostic statistics are not available for the multinomial distribution (see the OBSTATS option).

SCALE=number | SCALE=PEARSON | SCALE=P | PSCALE | SCALE=DEVIANCE | SCALE=D | DSCALE
sets the value used for the scale parameter where the NOSCALE option is used. For the binomial and Poisson distributions, which have no free scale parameter, this can be used to specify an overdispersed model. In this case, the parameter covariance matrix and the likelihood function are adjusted by the scale parameter. See the section "Dispersion Parameter" on page 2520 and the section "Overdispersion" on page 2521 for more information. If the NOSCALE option is not specified, then number is used as an initial estimate of the scale parameter.

Specifying SCALE=PEARSON or SCALE=P is the same as specifying the PSCALE option. This fixes the scale parameter at the value 1 in the estimation procedure. After the parameter estimates are determined, the exponential family dispersion parameter is assumed to be given by Pearson's chi-square statistic divided by the degrees of freedom, a
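The Pearson-based dispersion estimate described for SCALE=PEARSON amounts to a short calculation once the fitted means are available. The Python sketch below assumes a Poisson variance function by default and uses invented responses, fitted means, and standard errors; it illustrates the arithmetic only and does not reproduce any output of the procedure.

```python
import math

def pearson_dispersion(y, mu, n_params, variance=lambda m: m):
    """Pearson chi-square divided by its degrees of freedom (n - p)."""
    chi_square = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu))
    return chi_square / (len(y) - n_params)

def adjust_standard_errors(std_errors, dispersion):
    """Scale model-based standard errors by the square root of the dispersion."""
    return [se * math.sqrt(dispersion) for se in std_errors]

# Invented responses and fitted means from a hypothetical two-parameter Poisson fit
phi = pearson_dispersion([2, 5, 9, 12], [4.0, 4.0, 9.0, 9.0], n_params=2)
adjusted = adjust_standard_errors([0.20, 0.05], phi)
print(phi, [round(se, 4) for se in adjusted])
```

A dispersion estimate well above 1 suggests overdispersion relative to the Poisson or binomial variance, and the adjusted standard errors widen accordingly.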
142. f is the formatted length of the CLASS variable DESCENDING DESC reverses the sorting order of the classification variable If both the DESCENDING and ORDER2 options are specified PROC GENMOD orders the categories according to the ORDER option and then reverses that order LPREFIX n specifies that at most the first n characters of a CLASS variable label be used in creating labels for the corresponding design variables The default is 256 min 256 max 2 f where f is the formatted length of the CLASS variable MISSING treats missing values A Z for numeric variables and blanks for character variables as valid values for the CLASS variable ORDER DATA FORMATTED FREQ INTERNAL specifies the sorting order for the levels of classification variables This ordering determines which parameters in the model correspond to each level in the data so the ORDER option can be useful when you use the CONTRAST statement By default ORDER FORMATTED For ORDER FORMATTED and ORDER INTERNAL the sort order is machine dependent When ORDER FORMATTED is in effect for numeric variables for which you have supplied no explicit format the levels are ordered by their internal values The following table shows how PROC GENMOD interprets values of the ORDER option CLASS Statement 2475 Value of ORDER Levels Sorted By DATA Order of appearance in the input data set FORMATTED External formatted values except
143. f variables in any effect that results from bar evaluation by specifying the maximum number, preceded by an @ sign. For example, A | B | C@2 results in effects that involve two or fewer variables: A, B, C, A*B, A*C, B*C.

Parameterization Used in PROC GENMOD

Design Matrix

The linear predictor part of a generalized linear model is

$$\eta = X\beta$$

where $\beta$ is an unknown parameter vector and $X$ is a known design matrix. By default, all models automatically contain an intercept term; that is, the first column of $X$ contains all 1s. Additional columns of $X$ are generated for classification variables, regression variables, and any interaction terms included in the model. It is important to understand the ordering of classification variable parameters when you use the ESTIMATE or CONTRAST statement. The ordering of these parameters is displayed in the "CLASS Level Information" table and in tables displaying the parameter estimates of the fitted model.

When you specify an overparameterized model with the PARAM=GLM option in the CLASS statement, some columns of $X$ can be linearly dependent on other columns. For example, when you specify a model consisting of an intercept term and a classification variable, the column corresponding to any one of the levels of the classification variable is linearly dependent on the other columns of $X$. The columns of $X'X$ are checked in the order in which the model is specified for dependence
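The effect of limiting bar-operator expansion to a maximum number of variables can be mimicked in a few lines of Python. This sketch only enumerates which effects survive the limit; the function name is hypothetical, and the order in which it lists effects (by number of variables) is a convenience of the example rather than a statement about how the procedure orders them.

```python
from itertools import combinations

def expand_bar(variables, max_order=None):
    """Main effects and interactions from a bar expansion, up to max_order variables."""
    limit = len(variables) if max_order is None else max_order
    effects = []
    for size in range(1, limit + 1):
        for combo in combinations(variables, size):
            effects.append("*".join(combo))
    return effects

print(expand_bar(["A", "B", "C"], max_order=2))  # effects with two or fewer variables
print(expand_bar(["A", "B", "C"]))               # full expansion, including A*B*C
```

With three variables and a limit of 2, the three-way interaction is the only effect dropped.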
144. fluence of deleting the cluster on the individual parameter estimates, normalized by their standard errors. ParameterName is the name of the regression model parameter formed from the input variable names concatenated with the appropriate levels, if classification variables are involved.

MCLS | CLUSTERDFIT
represents the studentized Cook distance-type statistic to measure the influence of deleting an entire cluster on the overall model fit.

Programming Statements

Although the most commonly used link and probability distributions are available as built-in functions, the GENMOD procedure enables you to define your own link functions and response probability distributions by using the FWDLINK, INVLINK, VARIANCE, and DEVIANCE statements. The variables assigned in these statements can have values computed in programming statements. These programming statements can occur anywhere between the PROC GENMOD statement and the RUN statement. Variable names used in programming statements must be unique. Variables from the input data set can be referenced in programming statements. The mean, linear predictor, and response are represented by the automatic variables _MEAN_, _XBETA_, and _RESP_, respectively, which can be referenced in your programming statements. Programming statements are used to define the functional dependencies of the link function, the inverse link function, the variance function, and the deviance function on the mean, linear predictor, and res
145. generalized linear models, columns are displayed that contain the contrast label, the likelihood ratio statistic for testing the significance of the contrast, the F statistic for testing the significance of the contrast, the numerator degrees of freedom, the denominator degrees of freedom, the p-value based on the F distribution, and the p-value computed from the chi-square distribution with numerator degrees of freedom.

LSMEANS Coefficients

If you specify the LSMEANS statement and you specify the E option, the "Coefficients for effect Least Squares Means" table is displayed, where effect is the effect specified in the LSMEANS statement. The table contains the effect names and the rows of least squares means coefficients.

Least Squares Means

If you specify the LSMEANS statement, the "Least Squares Means" table is displayed. The table contains, for each effect, the effect name and, for each level of each effect, the following:
• the least squares mean estimate
• standard error
• chi-square value
• p-value computed from the chi-square distribution

If you specify the DIFF option, a table titled "Differences of Least Squares Means" is displayed containing corresponding statistics for the differences between the least squares means for the levels of each effect.

GEE Model Information

If you specify the REPEATED statement, the "GEE Model Information" table displays the correlat
146. h generalized estimating equations (GEEs), observations with missing values within a cluster are not used, and all available pairs are used in estimating the working correlation matrix. Clusters with fewer observations than the full cluster size are treated as having missing observations occurring at the end of the cluster. You can specify the order of missing observations with the WITHINSUBJECT= option. See the section "Missing Data" on page 2534 for more information about missing values in GEEs.

Displayed Output for Classical Analysis

The following output is produced by the GENMOD procedure. Note that some of the tables are optional and appear only in conjunction with the REPEATED statement and its options or with options in the MODEL statement. For details, see the section "ODS Table Names" on page 2568.

Model Information

The "Model Information" table displays the two-level data set name, the response distribution, the link function, the response variable name, the offset variable name, the frequency variable name, the scale weight variable name, the number of observations used, the number of events if events/trials format is used for response, the number of trials if events/trials format is used for response, the sum of frequency weights, the number of missing values in data set, and the number of invalid observations (for example, negative or 0 response values with gamma distribution or number of observations with events greater th
147. h to sort the levels of the classification variables (which are specified in the CLASS statement). The ORDER= option can be useful when you use the CONTRAST or ESTIMATE statement, because it determines which parameters in the model correspond to each level in the data. This option applies to the levels for all classification variables, except when you use the (default) ORDER=FORMATTED option with numeric classification variables that have no explicit format. With this option, the levels of such variables are ordered by their internal value. The ORDER= option can take the following values:

Value of ORDER=   Levels Sorted By
DATA              Order of appearance in the input data set
FORMATTED         External formatted value, except for numeric variables with no explicit format, which are sorted by their unformatted (internal) value
FREQ              Descending frequency count; levels with the most observations come first in the order
INTERNAL          Unformatted value

By default, ORDER=FORMATTED. For FORMATTED and INTERNAL, the sort order is machine-dependent. For more information about sorting order, see the chapter on the SORT procedure in the Base SAS Procedures Guide and the discussion of BY-group processing in SAS Language Reference: Concepts.

PLOTS <global-plot-option> plot-request <options>
PLOTS <global-plot-options> <plot-request <
148. he deviance for the model including the effect and all previous effects, the numerator degrees of freedom, the denominator degrees of freedom, the chi-square statistic for testing the significance of the effect, the p-value computed from the chi-square distribution with numerator degrees of freedom, the F statistic for testing the significance of the effect, and the p-value based on the F distribution.

Iteration History for Type 3 Contrasts

If you specify the model options ITPRINT and TYPE3, an iteration history table is displayed for fitting the model with Type 3 contrast constraints for each effect. The table contains the effect name, the iteration number, the ridge value, the log likelihood, and values of all parameters.

LR Statistics for Type 3 Analysis

If you specify the TYPE3 model option, a table is displayed that contains, for each effect in the model, the name of the effect, the likelihood ratio statistic for testing the significance of the effect, the degrees of freedom for the effect, and the p-value computed from the chi-square distribution.

If you specify either the SCALE=DEVIANCE or SCALE=PEARSON option in the MODEL statement, columns are displayed that contain the name of the effect, the likelihood ratio statistic for testing the significance of the effect, the F statistic for testing the significance of the effect, the numerator degrees of freedom, the denominator degrees of freedom, the p-value based on the F distribution, and the p value
149. he keyword RESLIK in the OUTPUT statement.

Multinomial Models

This type of model applies to cases where an observation can fall into one of $k$ categories. Binary data occur in the special case where $k = 2$. If there are $m_i$ observations in a subpopulation $i$, then the probability distribution of the number falling into the $k$ categories $y_i = (y_{i1}, y_{i2}, \ldots, y_{ik})$ can be modeled by the multinomial distribution, defined in the section "Response Probability Distributions" on page 2510, with $\sum_j y_{ij} = m_i$. The multinomial model is an ordinal model if the categories have a natural order.

Residuals are not available in the OBSTATS table or the output data set for multinomial models.

By default, and consistently with binomial models, the GENMOD procedure orders the response categories for ordinal multinomial models from lowest to highest and models the probabilities of the lower response levels. You can change the way PROC GENMOD orders the response levels with the RORDER= option in the PROC GENMOD statement. The order that PROC GENMOD uses is shown in the "Response Profiles" output table described in the section "Response Profile" on page 2556.

The GENMOD procedure supports only the ordinal multinomial model. If $(p_{i1}, p_{i2}, \ldots, p_{ik})$ are the category probabilities, the cumulative category probabilities are modeled with the same link functions used for binomial data. Let $P_{ir} = \sum_{j=1}^{r} p_{ij}$, $r = 1, 2, \ldots, k-1$, b
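A cumulative logit model of the kind described above could be requested along the following lines. This is a sketch under assumptions: the data set name icecream and the variables count (cell frequency), brand (classification variable), and taste (ordered response) are placeholders modeled on the ice cream taste example, not names confirmed by this excerpt.

```sas
/* Sketch only: ordinal multinomial model with a cumulative logit link. */
/* ICECREAM, COUNT, BRAND, and TASTE are assumed names.                 */
proc genmod data=icecream rorder=data;   /* response levels in data order */
   freq count;                           /* each row is a cell count      */
   class brand;
   model taste = brand / dist=multinomial
                         link=cumlogit;
run;
```

With a response that has $k$ ordered levels, this fits $k-1$ intercepts (one per cumulative logit) and a common set of covariate effects.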
150. hecking. Typically, a message about a singular information matrix is displayed if you have dependent variables. Dependent variables can be identified after the analysis by noting any missing parameter estimates.

COVARIATES checks dependence between covariates and an added intercept. Dependent covariates are removed from the analysis. However, covariates that are linear functions of the strata variable might not be removed, which results in a singular information matrix message being displayed in the SAS log. This is the default.

ALL checks dependence between all the strata and covariates. This option can adversely affect performance if you have a large number of strata.

NOSUMMARY suppresses the display of the "Strata Summary" table.

INFO displays the "Strata Information" table, which includes the stratum number, levels of the STRATA variables that define the stratum, and the total frequency for each stratum. Since the number of strata can be very large, this table is displayed only by request.

VARIANCE Statement

VARIANCE variable = expression ;

You can specify a probability distribution other than the built-in distributions by using the VARIANCE and DEVIANCE statements. The variable name variable identifies the variance function to the procedure. The expression is used to define the functional dependence on the mean, and it can be any arithmetic expression supported by the DATA step language. You use
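As an illustration of the VARIANCE statement together with its DEVIANCE companion, the sketch below spells out the Poisson variance and deviance functions by hand. It assumes the insure data set (count c, class variables car and age, offset ln) from the insurance claims illustration; the names a, y, d, var, and dev are arbitrary working variables chosen for this sketch.

```sas
/* Sketch only: user-defined variance and deviance functions that   */
/* reproduce the Poisson distribution. INSURE and its variables are */
/* assumed names.                                                    */
proc genmod data=insure;
   class car age;
   a = _MEAN_;                      /* mean                          */
   y = _RESP_;                      /* response                      */
   /* Poisson unit deviance; the y = 0 case avoids log(0) */
   if y > 0 then d = 2 * (y * log(y / a) - (y - a));
   else d = 2 * a;
   variance var = a;                /* variance function V(mu) = mu  */
   deviance dev = d;
   model c = car age / link=log offset=ln;
run;
```

No DIST= option is given because the distribution is defined through the variance and deviance functions instead.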
151. i-likelihood information criteria (QIC) for goodness of fit of models fit with GEEs. The $w_{ij}$ are prior weights, if any, specified with the WEIGHT or FREQ statements. Note that the definition of the quasi-likelihood for the negative binomial differs from that given in McCullagh and Nelder (1989). The definition used here allows the negative binomial quasi-likelihood to approach the Poisson as $k \to 0$.

• Normal:
$$Q_{ij} = -\frac{1}{2} w_{ij} (y_{ij} - \mu_{ij})^2$$
• Inverse Gaussian:
$$Q_{ij} = \frac{w_{ij} (\mu_{ij} - .5\, y_{ij})}{\mu_{ij}^2}$$
• Gamma:
$$Q_{ij} = -w_{ij} \left[ \frac{y_{ij}}{\mu_{ij}} + \log(\mu_{ij}) \right]$$
• Negative binomial:
$$Q_{ij} = w_{ij} \left[ \log\Gamma\left(y_{ij} + \frac{1}{k}\right) - \log\Gamma\left(\frac{1}{k}\right) + y_{ij} \log\left(\frac{k\mu_{ij}}{1 + k\mu_{ij}}\right) + \frac{1}{k} \log\left(\frac{1}{1 + k\mu_{ij}}\right) \right]$$
• Poisson:
$$Q_{ij} = w_{ij} \left( y_{ij} \log(\mu_{ij}) - \mu_{ij} \right)$$
• Binomial:
$$Q_{ij} = w_{ij} \left[ r_{ij} \log(p_{ij}) + (n_{ij} - r_{ij}) \log(1 - p_{ij}) \right]$$
• Multinomial ($s$ categories):
$$Q_{ij} = w_{ij} \sum_{k=1}^{s} y_{ijk} \log(\mu_{ijk})$$

Generalized Score Statistics

Boos (1992) and Rotnitzky and Jewell (1990) describe score tests applicable to testing $L'\beta = 0$ in GEEs, where $L'$ is a user-specified $r \times p$ contrast matrix or a contrast for a Type 3 test of hypothesis.

Let $\tilde\beta$ be the regression parameters resulting from solving the GEE under the restricted model $L'\beta = 0$, and let $S(\tilde\beta)$ be the generalized estimating equation values at $\tilde\beta$. The generalized score statistic is
$$T = S(\tilde\beta)' \Sigma_m L \, (L' \Sigma_e L)^{-1} L' \Sigma_m S(\tilde\beta)$$
where $\Sigma_m$ is the model-based covariance estimate and $\Sigma_e$ is the empirical covariance esti
152. ibes these options and their suboptions.

COEFFPRIOR=JEFFREYS<(option)> | NORMAL<(options)> | UNIFORM
COEFF=JEFFREYS<(options)> | NORMAL<(options)> | UNIFORM
CPRIOR=JEFFREYS<(options)> | NORMAL<(options)> | UNIFORM

specifies the prior distribution for the regression coefficients. The default is COEFFPRIOR=UNIFORM, which specifies the noninformative and improper prior of a constant.

Jeffreys' prior is specified by COEFFPRIOR=JEFFREYS, which can be followed by the following option in parentheses. Jeffreys' prior is proportional to $|I(\beta)|^{1/2}$, where $I(\beta)$ is the Fisher information matrix. See the section "Jeffreys' Prior" on page 2551 and Ibrahim and Laud (1991) for more details.

CONDITIONAL specifies that the Jeffreys' prior, conditional on the current Markov chain value of the generalized linear model precision parameter $\tau$, is proportional to $|\tau I(\beta)|^{1/2}$.

The normal prior is specified by COEFFPRIOR=NORMAL, which can be followed by one of the following options enclosed in parentheses. However, if you do not specify an option, the normal prior $N(0, 10^6 I)$, where $I$ is the identity matrix, is used. See the section "Normal Prior" on page 2551 for more details.

CONDITIONAL specifies that the normal prior, conditional on the current Markov chain value of the generalized linear model precision parameter $\tau$, is $N(\mu, \tau^{-1}\Sigma)$, where $\mu$ and $\Sigma$ are the mean and covariance of the normal prio
153. ics, 41, 337-348.

Hardin, J. W. and Hilbe, J. M. (2003), Generalized Estimating Equations, Boca Raton, FL: Chapman & Hall/CRC.

Hilbe, J. (1994), "Log Negative Binomial Regression Using the GENMOD Procedure," in Proceedings of the Nineteenth Annual SAS Users Group International Conference, Cary, NC: SAS Institute Inc.

Hilbe, J. M. (2007), Negative Binomial Regression, New York: Cambridge University Press.

Hilbe, J. M. (2009), Logistic Regression Models, London: Chapman & Hall/CRC.

Hirji, K. F., Mehta, C. R., and Patel, N. R. (1987), "Computing Distributions for Exact Logistic Regression," Journal of the American Statistical Association, 82, 1110-1117.

Ibrahim, J. G., Chen, M. H., and Lipsitz, S. R. (1999), "Monte Carlo EM for Missing Covariates in Parametric Regression Models," Biometrics, 55, 591-596.

Ibrahim, J. G., Chen, M. H., and Sinha, D. (2001), Bayesian Survival Analysis, New York: Springer-Verlag.

Ibrahim, J. G. and Laud, P. W. (1991), "On Bayesian Analysis of Generalized Linear Models Using Jeffreys' Prior," Journal of the American Statistical Association, 86, 981-986.

Lambert, D. (1992), "Zero-Inflated Poisson Regression Models with an Application to Defects in Manufacturing," Technometrics, 34, 1-14.

Lawless, J. E. (1987), "Negative Binomial and Mixed Poisson Regression," The Canadian Journal of Statistics, 15, 209-225.

Lawless, J. F. (2003), Statistical Models and Methods for Lifetime Da
154. idrange of the values of $\beta_1$) and taking $\mu \pm 2\sigma$ to be the extremes of the range, an $N(0.1385, 0.0005)$ is the resulting prior distribution. The second analysis uses this informative normal prior distribution for the coefficient of X1 and uses independent noninformative normal priors with zero means and variances equal to $10^6$ for the remaining model regression parameters.

In the following BAYES statement, COEFFPRIOR=NORMAL(INPUT=NormalPrior) specifies the normal prior distribution for the regression coefficients with means and variances contained in the data set NormalPrior. An analysis is performed using PROC GENMOD to obtain Bayesian estimates of the regression coefficients by using the following SAS statements:

   data NormalPrior;
      input _type_ $ Intercept X1-X6;
      datalines;
   Var   1e6  0.0005  1e6  1e6  1e6  1e6  1e6
   Mean  0.0  0.1385  0.0  0.0  0.0  0.0  0.0
   ;
   run;

   proc genmod data=Liver;
      model Y = X1-X6 / dist=Poisson link=log;
      bayes seed=1 plots=none coeffprior=normal(input=NormalPrior);
   run;
   ods graphics off;

The prior distributions for the regression parameters are shown in Output 37.10.18.

Output 37.10.18 Regression Coefficient Priors

The GENMOD Procedure
Bayesian Analysis

Independent Normal Prior for Regression Coefficients

Parameter    Mean      Precision
Intercept    0         1E-6
X1           0.1385    2000
X2           0         1E-6
X3           0         1E-6
X4           0         1E-6
X5           0         1E-6
X6           0         1E-6

Initial values fo
155. ied with the SAMPLING= option in the BAYES statement are available in GENMOD for drawing samples from the posterior distribution.

ARMS Algorithm for Gibbs Sampling

This section provides details for Bayesian analysis by Gibbs sampling in generalized linear models. See the section "Gibbs Sampler" on page 151 for a general discussion of Gibbs sampling. See Gilks, Richardson, and Spiegelhalter (1996) for a discussion of applications of Gibbs sampling to a number of different models, including generalized linear models.

Let $\theta = (\theta_1, \ldots, \theta_k)'$ be the parameter vector. For generalized linear models, the $\theta_i$'s are the regression coefficients $\beta_i$'s and the dispersion parameter $\phi$. Let $L(D \mid \theta)$ be the likelihood function, where $D$ is the observed data. Let $p(\theta)$ be the prior distribution. The full conditional distribution of $[\theta_i \mid \theta_j, i \ne j]$ is proportional to the joint distribution; that is,
$$\pi(\theta_i \mid \theta_j, i \ne j, D) \propto L(D \mid \theta)\, p(\theta)$$
For instance, the one-dimensional conditional distribution of $\theta_1$ given $\theta_j = \theta_j^*$, $2 \le j \le k$, is computed as
$$\pi(\theta_1 \mid \theta_j = \theta_j^*, 2 \le j \le k, D) \propto L(D \mid \theta = (\theta_1, \theta_2^*, \ldots, \theta_k^*)')\, p(\theta = (\theta_1, \theta_2^*, \ldots, \theta_k^*)')$$

Suppose you have a set of arbitrary starting values $\{\theta_1^{(0)}, \ldots, \theta_k^{(0)}\}$. Using the ARMS (adaptive rejection Metropolis sampling) algorithm of Gilks and Wild (1992) and Gilks, Best, and Tan (1995), you can do the following:

draw $\theta_1^{(1)}$ from $[\theta_1 \mid \theta_2^{(0)}, \ldots, \theta_k^{(0)}]$
draw $\theta_2^{(1)}$ from $[\theta_2 \mid \theta_1^{(1)}, \theta_3^{(0)}, \ldots, \theta_k^{(0)}]$
...
draw $\theta_k^{(1)}$ from $[\theta_k \mid \theta_1^{(1)}, \ldots, \theta_{k-1}^{(1)}]$

This completes one i
156. ihood function or asymptotic normality
• syntax similar to that of PROC GLM for the specification of the response and model effects, including interaction terms and automatic coding of classification variables
• ability to fit GEE models for clustered response data
• ability to perform Bayesian analysis by Gibbs sampling

Getting Started: GENMOD Procedure

Poisson Regression

You can use the GENMOD procedure to fit a variety of statistical models. A typical use of PROC GENMOD is to perform Poisson regression.

You can use the Poisson distribution to model the distribution of cell counts in a multiway contingency table. Aitkin et al. (1989) have used this method to model insurance claims data. Suppose the following hypothetical insurance claims data are classified by two factors: age group (with two levels) and car type (with three levels).

   data insure;
      input n c car$ age;
      ln = log(n);
      datalines;
   500   42  small   1
   1200  37  medium  1
   100    1  large   1
   400  101  small   2
   500   73  medium  2
   300   14  large   2
   ;
   run;

In the preceding data set, the variable n represents the number of insurance policyholders and the variable c represents the number of insurance claims. The variable car is the type of car involved (classified into three groups), and the variable age is the age group of a policyholder (classified into two groups).

You can use PROC GENMOD to perform a Poisson regression analysis of these data with a log
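One way to request such an analysis is sketched below. It assumes the insure data set created by the preceding DATA step, with the log of the number of policyholders (ln) used as an offset so that the model describes claim rates rather than raw counts; the choice of an offset is an assumption of this sketch.

```sas
/* Sketch: Poisson regression of claim counts on car type and age group. */
/* Uses the INSURE data set created above; LN = log(n) is the offset.    */
proc genmod data=insure;
   class car age;
   model c = car age / dist   = poisson
                       link   = log
                       offset = ln;
run;
```

The CLASS statement makes car and age classification variables, so each contributes one design column per level under the default parameterization.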
157. ily in alphabetical or increasing numeric order.
• Create an index on the BY variables by using the DATASETS procedure (in Base SAS software).

For more information about BY-group processing, see the discussion in SAS Language Reference: Concepts. For more information about the DATASETS procedure, see the discussion in the Base SAS Procedures Guide.

CLASS Statement

CLASS variable <(options)> <variable <(options)>> </ options> ;

The CLASS statement names the classification variables to be used in the analysis. The CLASS statement must precede the MODEL statement. Most options can be specified either as individual variable options or as global options. You can specify options for each variable by enclosing the options in parentheses after the variable name. You can also specify global options for the CLASS statement by placing the options after a slash (/). Global options are applied to all the variables specified in the CLASS statement. If you specify more than one CLASS statement, the global options specified in any one CLASS statement apply to all CLASS statements. However, individual CLASS variable options override the global options. The following options are available:

CPREFIX=n
specifies that, at most, the first n characters of a CLASS variable name be used in creating names for the corresponding design variables. The default is $32 - \min(32, \max(2, f))$, where
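The interplay between individual variable options and global options can be sketched as follows. The data set insure and its variables are assumed from the insurance claims illustration, and the particular option values are chosen only for demonstration.

```sas
/* Sketch only: per-variable options in parentheses, global options */
/* after the slash. INSURE and its variables are assumed names.     */
proc genmod data=insure;
   class car(ref='small') age / param=ref ref=first;
   model c = car age / dist=poisson link=log offset=ln;
run;
```

Here PARAM=REF and REF=FIRST apply to both variables as global options, except that the individual option REF='small' overrides the global reference level for car.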
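158. in the matrix $H_i = Q_i W_{ei}$ and is summarized by the trace of $H_i$:
$$ch_i = \mathrm{tr}(H_i)$$
The leverage $h_{it}$ of the $t$th observation in the $i$th cluster is the $t$th diagonal element of $H_i$.

DFBETAC

The effect of deleting cluster $i$ on the estimated parameter vector is given by the following one-step approximation for $\hat\beta - \hat\beta_{[i]}$:
$$\mathrm{DBETAC}_i = (X'WX)^{-1} X_i' (W_{ei}^{-1} - Q_i)^{-1} E_i$$

DFBETACS

The cluster deletion statistic DFBETAC can be standardized using the variances of $\hat\beta$ based on the complete data. The standardized one-step approximation for the change in $\hat\beta_j$ due to deletion of cluster $i$ is
$$\mathrm{DBETACS}_{ij} = \frac{\mathrm{DBETAC}_{ij}}{[(X'WX)^{-1}_{jj}]^{1/2}}$$

DFBETAO

Partition the matrices $W_{ei}$ and $V_i$ as
$$W_{ei} = \begin{pmatrix} W_{eit} & W_{eit[t]} \\ W_{ei[t]t} & W_{ei[t]} \end{pmatrix}, \qquad V_i = \begin{pmatrix} V_{it} & V_{it[t]} \\ V_{i[t]t} & V_{i[t]} \end{pmatrix}$$
and let $E_{it} = B_{it}^{-1}(Y_{it} - \hat\mu_{it})$ and $E_{i[t]} = B_{i[t]}^{-1}(Y_{i[t]} - \hat\mu_{i[t]})$. The effect of deleting the $t$th observation from the $i$th cluster is given by the following one-step approximation to $\hat\beta - \hat\beta_{[it]}$:
$$\mathrm{DBETAO}_{it} = (X'WX)^{-1} \tilde{X}_{it}' (W_{eit}^{-1} - \tilde{Q}_{it})^{-1} \tilde{E}_{it}$$
where $\tilde{X}_{it} = X_{it} - V_{it[t]} V_{i[t]}^{-1} X_{i[t]}$, $\tilde{Q}_{it} = \tilde{X}_{it} (X'WX)^{-1} \tilde{X}_{it}'$, and $\tilde{E}_{it} = E_{it} - V_{it[t]} V_{i[t]}^{-1} E_{i[t]}$. Note that $W_{eit}$, $\tilde{Q}_{it}$, and $\tilde{E}_{it}$ are scalars.

DFBETAOS

The observation deletion statistic DFBETAO can be standardized using the variances of $\hat\beta$ based on the complete data. The standardized one-step approximation for the change in $\hat\beta_j$ due to deletion of observation $t$ in cluster $i$ is
$$\mathrm{DBETAOS}_{itj} = \frac{\mathrm{DBETAO}_{itj}}{[(X'WX)^{-1}_{jj}]^{1/2}}$$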
159. including the scale parameter, if there is one. The interval endpoints are displayed in a table as well as the values of the remaining parameters at the solution.

Wald Confidence Intervals

You can request that PROC GENMOD produce Wald confidence intervals for the parameters. The $(1-\alpha)100\%$ Wald confidence interval for a parameter $\beta$ is defined as
$$\hat\beta \pm z_{1-\alpha/2}\, \hat\sigma$$
where $z_p$ is the $100p$th percentile of the standard normal distribution, $\hat\beta$ is the parameter estimate, and $\hat\sigma$ is the estimate of its standard error.

F Statistics

Suppose that $D_0$ is the deviance resulting from fitting a generalized linear model and that $D_1$ is the deviance from fitting a submodel. Then, under appropriate regularity conditions, the asymptotic distribution of $(D_1 - D_0)/\phi$ is chi-square with $r$ degrees of freedom, where $r$ is the difference in the number of parameters between the two models and $\phi$ is the dispersion parameter. If $\phi$ is unknown, and $\hat\phi$ is an estimate of $\phi$ based on the deviance or Pearson's chi-square divided by degrees of freedom, then, under regularity conditions, $(n-p)\hat\phi/\phi$ has an asymptotic chi-square distribution with $n-p$ degrees of freedom. Here, $n$ is the number of observations and $p$ is the number of parameters in the model that is used to estimate $\phi$. Thus, the asymptotic distribution of
$$F = \frac{D_1 - D_0}{r\hat\phi}$$
is the F distribution with $r$ and $n-p$ degrees of freedom, assuming that $(D_1 - D_0)/\phi$ and n
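The Wald interval formula above can be checked against one row of the parameter estimates shown in Output 37.4.2. The sketch below assumes that the flattened values for the first brand parameter read as an estimate of 0.3847 with standard error 0.1370, and it uses the usual rounded value $z_{0.975} \approx 1.96$ for a 95% interval:

```latex
\hat\beta \pm z_{0.975}\,\hat\sigma
  = 0.3847 \pm 1.96 \times 0.1370
  = 0.3847 \pm 0.2685
  \quad\Longrightarrow\quad (0.1162,\; 0.6532)
```

These endpoints match the confidence limits printed for that parameter, and the squared ratio $(0.3847 / 0.1370)^2 \approx 7.89$ matches its Wald chi-square.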
160. inear models.

Examples of Generalized Linear Models

You construct a generalized linear model by deciding on response and explanatory variables for your data and choosing an appropriate link function and response probability distribution. Some examples of generalized linear models follow. Explanatory variables can be any combination of continuous variables, classification variables, and interactions.

Traditional Linear Model
• response variable: a continuous variable
• distribution: normal
• link function: identity, $g(\mu) = \mu$

Logistic Regression
• response variable: a proportion
• distribution: binomial
• link function: logit, $g(\mu) = \log\left(\frac{\mu}{1-\mu}\right)$

Poisson Regression in Log-Linear Model
• response variable: a count
• distribution: Poisson
• link function: log, $g(\mu) = \log(\mu)$

Gamma Model with Log Link
• response variable: a positive, continuous variable
• distribution: gamma
• link function: log, $g(\mu) = \log(\mu)$

The GENMOD Procedure

The GENMOD procedure fits a generalized linear model to the data by maximum likelihood estimation of the parameter vector $\beta$. There is, in general, no closed form solution for the maximum likelihood estimates of the parameters. The GENMOD procedure estimates the parameters of the model numerically through an iterative fitting process. The dispersion parameter is also estimated by maximum likelihood or, optionally, by the residual deviance or by Pearson's
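The four example models listed above correspond to particular DIST= and LINK= choices in the MODEL statement. The following sketch assumes a hypothetical data set work.mydata with explanatory variables x1 and x2 and with responses y (continuous), r and n (events and trials), cnt (count), and t (positive continuous); none of these names come from the guide.

```sas
/* Sketch only: WORK.MYDATA and all variable names are hypothetical. */
proc genmod data=work.mydata;     /* traditional linear model */
   model y = x1 x2 / dist=normal link=identity;
run;

proc genmod data=work.mydata;     /* logistic regression, events/trials form */
   model r/n = x1 x2 / dist=binomial link=logit;
run;

proc genmod data=work.mydata;     /* Poisson regression in a log-linear model */
   model cnt = x1 x2 / dist=poisson link=log;
run;

proc genmod data=work.mydata;     /* gamma model with log link */
   model t = x1 x2 / dist=gamma link=log;
run;
```

Only the response specification and the DIST= and LINK= options change from one model to the next; the effects are specified in the same way in all four.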
161. ing Degrees of Freedom and p-values
ADJUST=       Determines the method for multiple comparison adjustment of LS-means differences
ALPHA=α       Determines the confidence level (1 - α)
LOWER         Performs one-sided, lower-tailed inference
STEPDOWN      Adjusts multiple comparison p-values further in a step-down fashion
TESTVALUE=    Specifies values under the null hypothesis for tests
UPPER         Performs one-sided, upper-tailed inference

Statistical Output
CL            Constructs confidence limits for means and mean differences
CORR          Displays the correlation matrix of LS-means
COV           Displays the covariance matrix of LS-means
E             Prints the L matrix
ELSM          Prints the K matrix
JOINT         Produces a joint F or chi-square test for the LS-means and LS-means differences
PLOTS=        Requests ODS statistical graphics of means and mean comparisons
SEED=         Specifies the seed for computations that depend on random numbers

Generalized Linear Modeling
CATEGORY=     Specifies how to construct estimable functions with multinomial data
EXP           Exponentiates and displays LS-means estimates
ILINK         Computes and displays estimates and standard errors of LS-means (but not differences) on the inverse linked scale

For details about the syntax of the LSMESTIMATE statement, see the section "LSMESTIMATE Statement" on page 496 of Chapter 19, "Shared Concepts and Topics."

MODEL Statement

MODEL response = <effects> </ options> ;
MODEL events/trials
162. ingle quotes.

contrast-specification
identifies the effects and their coefficients from which the L matrix is formed. The contrast-specification can be specified in two different ways. The first method applies to all models except the zero-inflated (ZI) distributions (zero-inflated Poisson and zero-inflated negative binomial), and the syntax is:

   effect values <, ... effect values>

The second method of specifying a contrast applies only to ZI models, and the syntax is:

   effect values <, ... effect values> @ZERO effect values <, ... effect values>

Specification of sets of effect values before the @ZERO separator results in a row of the L matrix with coefficients for effects in the regression part of the model set to values and with the coefficients for the zero-inflation part of the model set to zero. Specification of sets of effect values after the @ZERO separator results in a row of the L matrix with the coefficients for the regression part of the model set to zero and with the coefficients of effects in the zero-inflation part of the model set to values.

For example, the statements

   CLASS A;
   MODEL y=A;
   CONTRAST 'Label1' A 1 -1;

specify an L matrix with one row with coefficients 1 for the first level of A and -1 for the second level of A. The statements

   CLASS A B;
   MODEL y=A / DIST=ZIP;
   ZEROMODEL B;
   CONTRAST 'Label2' A 1 -1 @ZERO B 1 -1;

specify an L matrix wi
163. ion structure of the working correlation matrix or the log odds ratio structure, the within-subject effect, the subject effect, the number of clusters, the correlation matrix dimension, and the minimum and maximum cluster size.

Log Odds Ratio Parameter Information

If you specify the REPEATED statement and specify a log odds ratio model for binary data with the LOGOR= option, then the "Log Odds Ratio Parameter Information" table is displayed showing the correspondence between data pairs and log odds ratio model parameters.

Iteration History for GEE Parameter Estimates

If you specify the REPEATED statement and the MODEL statement option ITPRINT, the "Iteration History For GEE Parameter Estimates" table is displayed. The table contains the parameter identification number, the iteration number, and values of all parameters.

Last Evaluation of the Generalized Gradient and Hessian

If you specify the REPEATED statement and select ITPRINT as a model option, PROC GENMOD displays the "Last Evaluation Of The Generalized Gradient And Hessian" table.

GEE Parameter Estimate Covariance Matrices

If you specify the REPEATED statement and the COVB option, PROC GENMOD displays the "Covariance Matrix (Model-Based)" and "Covariance Matrix (Empirical)" tables.

GEE Parameter Estimate Correlation Matrices

If you specify the REPEATED statement and the CORRB option, PROC GENMOD displays the
164.
Description                                                   | Statement       | Option
                                                              | ESTIMATE        | E
                                                              | REPEATED        | Default
                                                              | REPEATED        | Default
                                                              | REPEATED        | LOGOR
                                                              | REPEATED        | Default
                                                              | REPEATED        | MODELSE
ion matrix                                                    | REPEATED        | MCORRB
GEE model-based covariance matrix                             | REPEATED        | MCOVB
GEE empirical correlation matrix                              | REPEATED        | ECORRB
GEE empirical covariance matrix                               | REPEATED        | ECOVB
GEE working correlation matrix                                | REPEATED        | CORRW
Iteration history for contrasts                               | MODEL CONTRAST  | ITPRINT
Iteration history for likelihood ratio confidence intervals   | MODEL           | LRCI ITPRINT
Iteration history for parameter estimates                     | MODEL           | ITPRINT
Iteration history for GEE parameter estimates                 | MODEL REPEATED  | ITPRINT
Iteration history for Type 3 statistics                       | MODEL           | TYPE3 ITPRINT
Likelihood ratio confidence intervals                         | MODEL           | LRCI ITPRINT
Coefficients for least squares means                          | LSMEANS         | E
Least squares means differences                               | LSMEANS         | DIFF
Least squares means                                           | LSMEANS         | Default
Lagrange statistics                                           | MODEL           | NOINT NOSCALE
Last evaluation of the generalized gradient and Hessian       | MODEL REPEATED  | ITPRINT
Last evaluation of the gradient and Hessian                   | MODEL           | ITPRINT

Table 37.8 continued

ODS Table Name | Description                            | Statement | Option
LinDep         | Linearly dependent rows of contrasts   | CONTRAST  | Default
ModelInfo      | Model information                      | MODEL     | Default
Modelfit       | Goodness-of-fit statistics             | MODEL     | Default
withou
165. ion parameters are updated.

SORTED
specifies that the input data are grouped by subject and sorted within subject. If this option is not specified, then the procedure internally sorts by subject-effect and within-subject-effect, if a within-subject-effect is specified.

SUBCLUSTER=variable
SUBCLUST=variable
specifies a variable defining subclusters for the 1-nested or k-nested log odds ratio association modeling structures. This variable must be listed in the CLASS statement.

TYPE=correlation-structure keyword
CORR=correlation-structure keyword
specifies the structure of the working correlation matrix used to model the correlation of the responses from subjects. Table 37.6 displays the correlation structure keywords and the corresponding correlation structures. The default working correlation type is the independent (CORR=IND). See the section "Details: GENMOD Procedure" on page 2510 for definitions of the correlation matrix types. You should specify LOGOR= or TYPE= but not both.

Table 37.6 Correlation Structure Types

Keyword                  Correlation Matrix Type
AR | AR(1)               Autoregressive(1)
EXCH | CS                Exchangeable
IND                      Independent
MDEP(number)             m-dependent with m=number
UNSTR | UN               Unstructured
USER | FIXED (matrix)    Fixed, user-specified correlation matrix

For example, you can specify a fixed 4 x 4 correlation matrix with an option of the form TYPE=USER(matrix), listing the entries of the matrix inside the parentheses.
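A REPEATED statement with a fixed, user-specified working correlation matrix might look like the following sketch. The data set work.visits, the variables id, outcome, and x, and the numeric matrix entries are all illustrative assumptions; they are not the values printed in the guide.

```sas
/* Sketch only: WORK.VISITS, its variables, and the matrix entries */
/* are hypothetical and chosen purely for illustration.            */
proc genmod data=work.visits;
   class id;
   model outcome = x / dist=binomial link=logit;
   repeated subject=id / type=user( 1.0   0.5   0.25  0.125
                                    0.5   1.0   0.5   0.25
                                    0.25  0.5   1.0   0.5
                                    0.125 0.25  0.5   1.0 );
run;
```

The matrix is symmetric with ones on the diagonal, and its dimension (4 here) is assumed to equal the maximum cluster size.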
166. ion parameters are used as the starting values for the simulation when noninformative prior distributions are used. These are listed in the "Initial Values and Seeds" table in Figure 37.12.

Figure 37.12 MCMC Initial Values and Seeds

Initial Values of the Chain

Chain   Seed   Intercept   Logx1      x2         x3         x4        Dispersion
1       1      -730.559    171.8758   4.301896   4.030878   18.1377   3223.694

Summary statistics for the posterior sample are displayed in the "Fit Statistics," "Descriptive Statistics for the Posterior Sample," "Interval Statistics for the Posterior Sample," and "Posterior Correlation Matrix" tables in Figure 37.13, Figure 37.14, Figure 37.15, and Figure 37.16, respectively.

Figure 37.13 Fit Statistics

Fit Statistics
DIC (smaller is better)                608.411
pD (effective number of parameters)    6.571

Figure 37.14 Descriptive Statistics

The GENMOD Procedure
Bayesian Analysis

Posterior Summaries
                                  Standard           Percentiles
Parameter    N       Mean         Deviation    25%        50%        75%
Intercept    10000   -730.1       91.0133      -789.6     -729.6     -670.5
Logx1        10000   171.7        40.3792      144.3      171.8      198.6
x2           10000   4.3000       0.5989       3.8990     4.2932     4.6951
x3           10000   4.0310       0.5354       3.6645     4.0265     4.3910
x4           10000   18.0888      12.8949      9.4919     18.0430    26.7881
Dispersion   10000   3795.9       770.4        3247.6     3694.7     4238.2

Figure 37.15 Interval Statistics

Posterior Intervals
Parameter Alph
167. irs method, in which all nonmissing pairs of data are used in the moment estimators of the working correlation parameters defined previously. The resulting covariances and standard errors are valid under the missing completely at random (MCAR) assumption.

For example, for the unstructured working correlation model,
$$\hat\alpha_{jk} = \frac{1}{(K' - p)\phi} \sum e_{ij} e_{ik}$$
where the sum is over the units that have nonmissing measurements at times $j$ and $k$, and $K'$ is the number of units with nonmissing measurements at $j$ and $k$. Estimates of the parameters for other working correlation types are computed in a similar manner, using available nonmissing pairs in the appropriate moment estimators.

The contribution of the $i$th unit to the parameter update equation is computed by omitting the elements of $(Y_i - \mu_i)$, the columns of $D_i' = \frac{\partial \mu_i'}{\partial \beta}$, and the rows and columns of $V_i$ corresponding to missing measurements.

Parameter Estimate Covariances

The model-based estimator of $\mathrm{Cov}(\hat\beta)$ is given by
$$\Sigma_m(\hat\beta) = I_0^{-1}$$
where
$$I_0 = \sum_{i=1}^{K} \frac{\partial \mu_i'}{\partial \beta} V_i^{-1} \frac{\partial \mu_i}{\partial \beta}$$
This is the GEE equivalent of the inverse of the Fisher information matrix that is often used in generalized linear models as an estimator of the covariance estimate of the maximum likelihood estimator of $\beta$. It is a consistent estimator of the covariance matrix of $\hat\beta$ if the mean model and the working correlation matrix are correctly specified.

The estimator
$$\Sigma_e = I_0^{-1} I_1 I_0^{-1}$$
is called the empirical, or robust,
168. isit $j$, $j = 1, \ldots, 4$, and $\mu_{ij} = E(y_{ij})$ represents the mean of the respiratory status. Since the response data are binary, you can use the variance function for the binomial distribution, $v(\mu_{ij}) = \mu_{ij}(1 - \mu_{ij})$, and the logit link function, $g(\mu_{ij}) = \log(\mu_{ij}/(1 - \mu_{ij}))$. The model for the mean is $g(\mu_{ij}) = x_{ij}'\beta$, where $\beta$ is a vector of regression parameters to be estimated.

Output 37.5.1 Respiratory Disorder Data

Obs  center  id  treatment  sex  age  baseline  visit1  visit2  visit3  visit4  visit  outcome
  1    1      1      P       M    46     0        0       0       0       0       1       0
  2    1      1      P       M    46     0        0       0       0       0       2       0
  3    1      1      P       M    46     0        0       0       0       0       3       0
  4    1      1      P       M    46     0        0       0       0       0       4       0
  5    1      2      P       M    28     0        0       0       0       0       1       0
  6    1      2      P       M    28     0        0       0       0       0       2       0
  7    1      2      P       M    28     0        0       0       0       0       3       0
  8    1      2      P       M    28     0        0       0       0       0       4       0
  9    1      3      A       M    23     1        1       1       1       1       1       1
 10    1      3      A       M    23     1        1       1       1       1       2       1
 11    1      3      A       M    23     1        1       1       1       1       3       1
 12    1      3      A       M    23     1        1       1       1       1       4       1
 13    1      4      P       M    44     1        1       1       1       0       1       1
 14    1      4      P       M    44     1        1       1       1       0       2       1
 15    1      4      P       M    44     1        1       1       1       0       3       1
 16    1      4      P       M    44     1        1       1       1       0       4       0
 17    1      5      P       F    13     1        1       1       1       1       1       1
 18    1      5      P       F    13     1        1       1       1       1       2       1
 19    1      5      P       F    13     1        1       1       1       1       3       1
 20    1      5      P       F    13     1        1       1       1       1       4       1

The GEE solution is requested with the REPEATED statement in the GENMOD procedure. The option SUBJECT=ID(CENTER) specifies that the observations in a single cluster be uniquely identified by center and id within center. The option TYPE=UNSTR specifies the unstructured working correlati
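A GEE fit of the kind described above could be requested roughly as follows. This is a sketch under assumptions: the data set name resp, the set of covariates, and the reference levels are guesses based on the columns of Output 37.5.1 and are not confirmed by this excerpt.

```sas
/* Sketch only: RESP and the chosen covariates and reference levels */
/* are assumptions based on the columns shown in Output 37.5.1.     */
proc genmod data=resp descend;            /* model the higher outcome level */
   class id treatment(ref='P') center(ref='1') sex(ref='M')
         baseline(ref='0') / param=ref;
   model outcome = treatment center sex age baseline / dist=bin;
   repeated subject=id(center) / type=unstr corrw;
run;
```

SUBJECT=ID(CENTER) nests id within center so that clusters are uniquely identified, and CORRW prints the estimated working correlation matrix.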
169. ith only an intercept term and then to include one additional explanatory variable in each successive model. You can measure the importance of the additional explanatory variable by the difference in deviances or fitted log likelihoods between successive models. Asymptotic tests computed by the GENMOD procedure enable you to assess the statistical significance of the additional term. The GENMOD procedure enables you to fit a sequence of models, up through a maximum number of terms specified in a MODEL statement. A table summarizes twice the difference in log likelihoods between each successive pair of models. This is called a Type 1 analysis in the GENMOD procedure, because it is analogous to Type I (sequential) sums of squares in the GLM procedure. As with the PROC GLM Type I sums of squares, the results from this process depend on the order in which the model terms are fit. The GENMOD procedure also generates a Type 3 analysis, analogous to Type III sums of squares in the GLM procedure. A Type 3 analysis does not depend on the order in which the terms for the model are specified. A GENMOD procedure Type 3 analysis consists of specifying a model and computing likelihood ratio statistics for Type III contrasts for each term in the model. The contrasts are defined in the same way as they are in the GLM procedure. The GENMOD procedure optionally computes Wald statistics for Type III contrasts. This is computationall
170. ith the plus sign for odd $r$ and minus sign for even $r$. Dispersion, Scale, or Precision Parameter. Let $\eta$ be the generalized linear model parameter you choose to sample, either the dispersion, scale, or precision parameter. Note that the Poisson and binomial distributions do not have this additional parameter. For the first chain, on which the summary statistics and regression diagnostics are based, the default initial values are estimates of the mode of the posterior distribution. If the INITIALMLE option is specified, the initial values are the maximum likelihood estimates of $\eta$. The initial values of the $r$th chain ($r \ge 2$) are given by [equation illegible in source], with the plus sign for odd $r$ and minus sign for even $r$. OUTPOST= Output Data Set. The OUTPOST= data set contains the generated posterior samples. There are $3 + n$ variables, where $n$ is the number of model parameters. The variable Iteration represents the iteration number, the variable LogLike contains the log of the likelihood, and the variable LogPost contains the log of the posterior. The other $n$ variables represent the draws of the Markov chain for the model parameters. Exact Logistic and Poisson Regression. The theory of exact logistic regression, also called exact conditional logistic regression, is described in the section Exact Conditional Logistic Regression on page 3974 of Chapter 51, The LOGISTI
171. ix specifies a replicated z matrix. You specify z matrix data exactly as you do for the ZFULL case, except that you specify only one complete cluster. The z matrix for the one cluster is replicated for each cluster. The number of observations in the ZDATA= data set is $n_{max}(n_{max} - 1)/2$, where $n_{max}$ is the size of a complete cluster (a cluster with no missing observations). The other form specifies direct input of the replicated z matrix. You specify the z matrix for one cluster with the syntax LOGOR=ZREP((y1 y2) z1 z2 ... zq, ...), where y1 and y2 are numbers representing a pair of observations and the values z1, z2, ..., zq make up the corresponding row of the z matrix. The number of rows specified is $n_{max}(n_{max} - 1)/2$, where $n_{max}$ is the size of a complete cluster (a cluster with no missing observations). For example,
   LOGOR=ZREP((1 2) 1 0,
              (1 3) 1 0,
              (1 4) 1 0,
              (2 3) 1 1,
              (2 4) 1 1,
              (3 4) 1 1)
specifies the $\frac{4 \cdot 3}{2} = 6$ rows of the z matrix for a cluster of size 4 with $q = 2$ log odds ratio parameters. The log odds ratio for the pairs (1 2), (1 3), (1 4) is $\alpha_1$, and the log odds ratio for the pairs (2 3), (2 4), (3 4) is $\alpha_1 + \alpha_2$. Quasi-likelihood Information Criterion. The quasi-likelihood information criterion (QIC) was developed by Pan (2001) as a modification of the Akaike information criterion (AIC) to apply to models fit by GEEs. Define the quasi-likelihood under the independence working correlation assumption, eval
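The row count $n_{max}(n_{max} - 1)/2$ and the way each z-matrix row maps to a log odds ratio can be checked with a short sketch. It assumes the log odds ratio for a pair is the inner product of its z row with the parameter vector, and that pairs involving observation 1 carry the row (1, 0) while all other pairs carry (1, 1), which is what the stated log odds ratios $\alpha_1$ and $\alpha_1 + \alpha_2$ imply. The function names and the numeric values of `alpha` are hypothetical.

```python
from itertools import combinations

def zrep_rows(n_max, z_for_pair):
    """List ((j, k), z_row) for every pair of positions in a complete cluster."""
    return [((j, k), z_for_pair(j, k))
            for j, k in combinations(range(1, n_max + 1), 2)]

def z_for_pair(j, k):
    # Assumed structure: pairs that include observation 1 use (1, 0),
    # all remaining pairs use (1, 1).
    return (1, 0) if j == 1 else (1, 1)

rows = zrep_rows(4, z_for_pair)

# Hypothetical parameter values, for illustration only.
alpha = (0.8, -0.3)
log_odds = {pair: sum(z * a for z, a in zip(z_row, alpha))
            for pair, z_row in rows}
```

With a cluster of size 4 this yields 6 rows, and the two groups of pairs get log odds ratios `alpha[0]` and `alpha[0] + alpha[1]` respectively.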
172. ject=case / type=exch covb corrw; run; The CLASS statement and the MODEL statement specify the model for the mean of the wheeze variable response as a logistic regression with city, age, and smoke as independent variables, just as for an ordinary logistic regression. The REPEATED statement invokes the GEE method, specifies the correlation structure, and controls the displayed output from the GEE model. The option SUBJECT=CASE specifies that individual subjects be identified in the input data set by the variable case. The SUBJECT= variable case must be listed in the CLASS statement. Measurements on individual subjects at ages 9, 10, 11, and 12 are in the proper order in the data set, so the WITHINSUBJECT= option is not required. The TYPE=EXCH option specifies an exchangeable working correlation structure, the COVB option specifies that the parameter estimate covariance matrix be displayed, and the CORRW option specifies that the final working correlation be displayed. Initial parameter estimates for iterative fitting of the GEE model are computed as in an ordinary generalized linear model, as described previously. Results of the initial model fit displayed as part of the generated output are not shown here. Statistics for the initial model fit, such as parameter estimates, standard errors, deviances, and Pearson chi-squares, do not apply to the GEE model and are valid only for the initial model fit. The following figures display information that applies to
173. kelihoods used in likelihood ratio tests are divided by $\phi$. The profile likelihood function used in computing confidence intervals is also divided by $\phi$. If you specify a WEIGHT statement, $\phi$ is divided by the value of the WEIGHT variable for each observation. This has the effect of multiplying the contributions of the log-likelihood function, the gradient, and the Hessian by the value of the WEIGHT variable for each observation. The SCALE= option in the MODEL statement enables you to specify a value of $\sigma = \sqrt{\phi}$ for the binomial and Poisson distributions. If you specify the SCALE=DEVIANCE option in the MODEL statement, the procedure uses the deviance divided by degrees of freedom as an estimate of $\phi$, and all statistics are adjusted appropriately. You can use Pearson's chi-square instead of the deviance by specifying the SCALE=PEARSON option. The function obtained by dividing a log-likelihood function for the binomial or Poisson distribution by a dispersion parameter is not a legitimate log-likelihood function. It is an example of a quasi-likelihood function. Most of the asymptotic theory for log likelihoods also applies to quasi-likelihoods, which justifies computing standard errors and likelihood ratio statistics by using quasi-likelihoods instead of proper log likelihoods. See McCullagh and Nelder (1989, Chapter 9), McCullagh (1983), and Hardin and Hilbe (2003) for details on quasi-likelihood functions. Although the estimate of the dispersion parame
174. l, Richman, and Hansen (1990) and reanalyzed by Lin, Wei, and Ying (2002). The study randomly assigned 360 HIV patients to the drug AZT and 351 patients to placebo. CD4 counts were measured repeatedly over the course of the study. The data used here are the 4328 measurements taken in the first 40 weeks of the study. The analysis focuses on the time trend of the response. The first model considered is $E(y_{ik}) = \beta_0 + \beta_1 T_{ik} + \beta_2 T_{ik}^2 + \beta_3 R_i T_{ik} + \beta_4 R_i T_{ik}^2$, where $T_{ik}$ is the time (in weeks) of the $k$th measurement on the $i$th patient, $y_{ik}$ is the CD4 count at $T_{ik}$ for the $i$th patient, and $R_i$ is the indicator of AZT for the $i$th patient. Normal errors and an independent working correlation are assumed. The following statements create the SAS data set cd4:
   data cd4;
      input Id Y Time Time2 TrtTime TrtTime2;
      Time3 = Time2 * Time;
      TrtTime3 = TrtTime2 * Time;
      datalines;
   1 264.00024 0.28571 0.08163 0.28571 0.08163
   1 175.00070 4.14286 17.16327 4.14286 17.16327
   1 306.00150 8.14286 66.30612 8.14286 66.30612
   1 331.99835 12.14286 147.44898 12.14286 147.44898
   1 309.99929 16.14286 260.59184 16.14286 260.59184
   1 185.00077 28.71429 824.51020 28.71429 824.51020
   1 175.00070 40.14286 1611.44898 40.14286 1611.44898
   ... more lines ...
   711 488.00224 12.14286 147.44898 12.14286 147.44898
   711 240.00026 18.14286 329.16327 18.14286 329.16327
   ;
   run;
The following SAS statements fit the preceding model, creat
175. l distribution, 2511; observed information matrix, 2517; offset, 2497, 2560; offset variable, 2436; ordering of effects, 2457; ordinal data, 2582; output data sets, 2553, 2554; output ODS Graphics table names, 2572; output table names, 2568; overdispersion, 2521; Pearson residuals, 2529; Pearson's chi-square, 2491, 2517, 2519; Poisson distribution, 2512; Poisson regression, 2435; polynomial effects, 2522; profile likelihood confidence intervals, 2495, 2525; programming statements, 2502; QIC, 2539; quasi-likelihood, 2521; quasi-likelihood functions, 2539; quasi-likelihood information criterion, 2539; raw residuals, 2528; regression parameters estimation, 2432; regressor effects, 2522; repeated measures, 2429, 2532; residuals, 2497, 2528, 2529; _RESP_ automatic variable, 2502; scale parameter, 2514; scaled deviance, 2517, 2518; score statistics, 2527; singular contrast matrix, 2479; stratified exact logistic regression, 2507; stratified exact Poisson regression, 2507; subpopulation, 2491; suppressing output, 2462; Type 1 analysis, 2434, 2523; Type 3 analysis, 2434, 2524; user-defined link function, 2487; variance function, 2433; Wald confidence intervals, 2498, 2526; working correlation matrix, 2504, 2506, 2532; _XBETA_ automatic variable, 2502; zero-inflated models, 2530; zero-inflated negative binomial distribution, 2513; zero-inflated Poisson distribution, 2513. geometric distribution: GENMOD procedure, 2512. goodness of fit: GENMOD procedure
176. le lists the parameters and their observed sufficient statistics. Monte Carlo Conditional Exact Tests. This table tests the hypotheses that the parameters of interest are insignificant. See the section Exact Logistic and Poisson Regression on page 2553 for details. Monte Carlo Exact Parameter Estimates. Displayed if you specify the ESTIMATE option in the EXACT statement. This table gives individual parameter estimates for each variable (conditional on the values of all the other parameters in the model), confidence limits, and a two-sided p-value (twice the one-sided p-value) for testing that the parameter is zero. See the section Exact Logistic and Poisson Regression on page 2553 for details. Monte Carlo Exact Odds Ratios. Displayed if you specify the ESTIMATE=ODDS or ESTIMATE=BOTH option in the EXACT statement. See the section Exact Logistic and Poisson Regression on page 2553 for details. Strata Summary. Displayed if a STRATA statement is also specified. Shows the pattern of the number of events and the number of nonevents, or of the number of observations, in a stratum. See the section STRATA Statement on page 2507 for more information. Strata Information. Displayed if a STRATA statement is specified with the INFO option. ODS Table Names. PROC GENMOD assigns a name to each table that it creates. You can use these names to reference the table when using the Outp
177. le of the chi-square distribution with one degree of freedom. The endpoints of the confidence interval can be found by solving numerically for values of $\beta_j$ that satisfy equality in the preceding relation. PROC GENMOD solves this by starting at the maximum likelihood estimate of $\beta$. The log-likelihood function is approximated with a quadratic surface, for which an exact solution is possible. The process is iterated until convergence to an endpoint is attained. The process is repeated for the other endpoint. Convergence is controlled by the CICONV= option in the MODEL statement. Suppose $\epsilon$ is the number specified in the CICONV= option. The default value of $\epsilon$ is $10^{-4}$. Let the parameter of interest be $\beta_j$, and define $r = u_j$, the unit vector with a 1 in position $j$ and 0s elsewhere. Convergence is declared on the current iteration if the following two conditions are satisfied: $|l^*(\beta) - l_0| \le \epsilon$ and $(s + \lambda r)' H^{-1} (s + \lambda r) \le \epsilon$, where $l^*(\beta)$, $s$, and $H$ are the log likelihood, the gradient, and the Hessian evaluated at the current parameter vector, and $\lambda$ is a constant computed by the procedure. The first condition for convergence means that the log-likelihood function must be within $\epsilon$ of the correct value, and the second condition means that the gradient vector must be proportional to the restriction vector $r$. When you specify the LRCI option in the MODEL statement, PROC GENMOD computes profile likelihood confidence intervals for all parameters in the model
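To make the endpoint search concrete, here is a minimal sketch for a one-parameter Poisson model with a log link, where the profile likelihood is simply the likelihood. It is not the procedure's algorithm: it uses plain bisection instead of the iterated quadratic approximation described above, and the data, function names, and bracketing width are assumptions chosen for illustration. The target value is the maximized log likelihood minus half the 0.95 quantile of the chi-square distribution with one degree of freedom.

```python
import math

CHI2_95_1DF = 3.841458820694124  # 0.95 quantile of chi-square with 1 df

def loglik(beta, y):
    """Poisson log likelihood with log mean beta, dropping the -log(y!) constant."""
    return sum(yi * beta - math.exp(beta) for yi in y)

def bisect(f, lo, hi, tol=1e-10):
    """Find a root of f in [lo, hi], assuming f changes sign on the interval."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y = [2, 3, 6, 7, 8, 9, 10, 12, 15]            # hypothetical counts
beta_hat = math.log(sum(y) / len(y))          # MLE of the log mean
l0 = loglik(beta_hat, y) - 0.5 * CHI2_95_1DF  # target log-likelihood value

def gap(beta):
    return loglik(beta, y) - l0

# One endpoint lies on each side of the maximum likelihood estimate.
lower = bisect(gap, beta_hat - 5.0, beta_hat)
upper = bisect(gap, beta_hat, beta_hat + 5.0)
```

Bisection is slower than a Newton-type scheme but needs only a sign change on each side of the estimate, which keeps the sketch short.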
178. ODS Graph Name | Description | Statement | Option
(name truncated in source) | leverage by cluster number | PROC | PLOTS
CooksDPlot | Cook's distance | PROC | PLOTS
CumResidPanel | Panel of aggregates of residuals | ASSESS | CRPANEL
CumulativeResiduals | Model assessment based on aggregates of residuals | ASSESS | Default
DevianceResidByXBeta | Deviance residuals by linear predictor | PROC | PLOTS
DevianceResidualPlot | Deviance values | PROC | PLOTS
DFBETAByCluster | Cluster DFBeta by cluster number | PROC | PLOTS
DFBETAPlot | DFBeta | PROC | PLOTS
Table 37.11 continued
ODS Table Name | Description
DiagnosticPlot | Panel of residuals, influence, and diagnostic statistics
LeveragePlot | Leverage
LikeResidByXBeta | Likelihood residuals by linear predictor
LikeResidualPlot | Likelihood residuals
PearsonResidByXBeta | Pearson residuals by linear predictor
PearsonResidualPlot | Pearson residuals
PredictedByObservation | Predicted values
RawResidByXBeta | Raw residuals by linear predictor
RawResidualPlot | Raw residuals
StdDevianceResidByXBeta | Standardized deviance residuals by linear predictor
StdDevianceResidualPlot | Standardized deviance residuals
StdDFBETAByCluster | Standardized cluster DFBeta by cluster number
StdDFBETAPlot | Standardized DFBeta
StdPearsonResidByXBeta | Standardized Pearson residuals by linear predictor
StdPearsonResidualPlot | Standardized Pearson residuals
TAPanel | Trace and autocorrelation function panel
TADPanel | Trace autocorrel
TDPanel |
TracePanel |
TracePlot |
ZeroInflationProbPlot |
179. link function. This type of model is sometimes called a log-linear model. Assume that the number of claims $c$ has a Poisson probability distribution and that its mean, $\mu_i$, is related to the factors car and age for observation $i$ by $\log(\mu_i) = \log(n_i) + x_i'\beta = \log(n_i) + \beta_0 + \mathrm{car}_i(1)\beta_1 + \mathrm{car}_i(2)\beta_2 + \mathrm{car}_i(3)\beta_3 + \mathrm{age}_i(1)\beta_4 + \mathrm{age}_i(2)\beta_5$. The indicator variables $\mathrm{car}_i(j)$ and $\mathrm{age}_i(j)$ are associated with the $j$th level of the variables car and age for observation $i$: $\mathrm{car}_i(j) = 1$ if car $= j$, and $\mathrm{car}_i(j) = 0$ if car $\ne j$. The $\beta$s are unknown parameters to be estimated by the procedure. The logarithm of the variable $n$ is used as an offset, that is, a regression variable with a constant coefficient of 1 for each observation. A log-linear relationship between the mean and the factors car and age is specified by the log link function. The log link function ensures that the mean number of insurance claims for each car and age group predicted from the fitted model is positive. The following statements invoke the GENMOD procedure to perform this analysis:
   proc genmod data=insure;
      class car age;
      model c = car age / dist=poisson link=log offset=ln;
   run;
The variables car and age are specified as CLASS variables so that PROC GENMOD automatically generates the indicator variables associated with car and age. The MODEL statement specifies c as the response variable and car and age as explanatory variables. An i
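The role of the offset can be seen by evaluating the mean directly: because $\log(n_i)$ enters the linear predictor with a fixed coefficient of 1, the predicted mean is $n_i$ times a rate that depends only on car and age. The sketch below assumes car has three levels and age two, with the last level of each held at zero; the coefficient values are hypothetical and are not estimates from the insure data set.

```python
import math

# Hypothetical coefficients, for illustration only.
BETA = {
    "intercept": -1.3,
    "car": {1: -1.8, 2: -0.7, 3: 0.0},  # last level fixed at 0 as the reference
    "age": {1: -1.3, 2: 0.0},
}

def expected_claims(n, car, age, beta=BETA):
    """Mean claims from log(mu) = log(n) + b0 + car effect + age effect."""
    eta = math.log(n) + beta["intercept"] + beta["car"][car] + beta["age"][age]
    return math.exp(eta)

mu_500 = expected_claims(500, car=1, age=1)
mu_1000 = expected_claims(1000, car=1, age=1)
```

Doubling the number of policyholders doubles the expected number of claims while leaving the claim rate unchanged, and the exponential guarantees a positive mean for any coefficient values.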
180. link function specified in the MODEL statement for both the Poisson and the negative binomial. The covariates $z_i$ for observation $i$ are determined by the model specified in the ZEROMODEL statement, and the covariates $x_i$ are determined by the model specified in the MODEL statement. The regression parameters $\gamma$ and $\beta$ are estimated by maximum likelihood. The mean and variance of $Y$ for the zero-inflated Poisson are given by $E(Y) = \mu = (1 - \omega)\lambda$ and $\mathrm{Var}(Y) = \mu + \frac{\omega}{1 - \omega}\mu^2$, and for the zero-inflated negative binomial by $E(Y) = \mu = (1 - \omega)\lambda$ and $\mathrm{Var}(Y) = \mu + \left(\frac{\omega}{1 - \omega} + \frac{k}{1 - \omega}\right)\mu^2$. You can request that the mean of $Y$ be displayed for each observation in an output data set with the PRED keyword. Generalized Estimating Equations. Let $y_{ij}$, $j = 1, \ldots, n_i$, $i = 1, \ldots, K$, represent the $j$th measurement on the $i$th subject. There are $n_i$ measurements on subject $i$ and $\sum_{i=1}^{K} n_i$ total measurements. Correlated data are modeled using the same link function and linear predictor setup (systematic component) as the independence case. The random component is described by the same variance functions as in the independence case, but the covariance structure of the correlated measurements must also be modeled. Let the vector of measurements on the $i$th subject be $Y_i = [y_{i1}, \ldots, y_{in_i}]'$ with corresponding vector of means $\mu_i = [\mu_{i1}, \ldots, \mu_{in_i}]'$, and let $V_i$ be the covariance matrix of $Y_i$. Let the vector of independent o
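The zero-inflated Poisson moments can be checked numerically by summing over the probability mass function. This sketch assumes the usual zero-inflated Poisson form, in which an extra point mass $\omega$ is placed at zero and the remaining $1 - \omega$ follows a Poisson distribution with mean $\lambda$; the symbol names and the values of `lam` and `omega` are chosen here for illustration.

```python
import math

def zip_pmf(y, lam, omega):
    """Zero-inflated Poisson probability of the count y."""
    # Work on the log scale so large y does not overflow.
    poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    if y == 0:
        return omega + (1.0 - omega) * poisson
    return (1.0 - omega) * poisson

def moments(pmf, upper=200):
    """Mean and variance by direct summation over 0..upper-1."""
    mean = sum(y * pmf(y) for y in range(upper))
    var = sum((y - mean) ** 2 * pmf(y) for y in range(upper))
    return mean, var

lam, omega = 3.0, 0.25
mean, var = moments(lambda y: zip_pmf(y, lam, omega))

mu = (1.0 - omega) * lam                         # closed-form mean
closed_var = mu + omega / (1.0 - omega) * mu**2  # closed-form variance
```

The summed and closed-form values agree, and the variance exceeds the mean whenever $\omega > 0$, which is the overdispersion that zero inflation induces.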
181. lly specify one of the following keywords:
PARM specifies that the parameters be estimated. This is the default.
ODDS specifies that the odds ratios be estimated. If you have classification variables, then you must also specify the PARAM=REF option in the CLASS statement.
BOTH specifies that both the parameters and odds ratios be estimated.
JOINT performs the joint test that all of the parameters are simultaneously equal to zero, performs individual hypothesis tests for the parameter of each continuous variable, and performs joint tests for the parameters of each classification variable. The joint test is indicated in the Conditional Exact Tests table by the label Joint.
JOINTONLY performs only the joint test of the parameters. The test is indicated in the Conditional Exact Tests table by the label Joint. When this option is specified, individual tests for the parameters of each continuous variable and joint tests for the parameters of the classification variables are not performed.
MIDPFACTOR=$\delta_1$ | ($\delta_1$, $\delta_2$) sets the tie factors used to produce the mid-p hypothesis statistics and the mid-p confidence intervals. $\delta_1$ modifies both the hypothesis tests and confidence intervals, while $\delta_2$ affects only the hypothesis tests. By default, $\delta_1 = 0.5$ and $\delta_2 = 1.0$. See the section Exact Logistic and Poisson Regression on page 2553 for details.
ONESIDED requests one-sided confidence intervals and p
182. lman and Rubin Diagnostics on page 160 for details.
GEWEKE <(geweke-options)> computes the Geweke spectral density diagnostics, which are essentially a two-sample t test between the first $f_1$ portion and the last $f_2$ portion of the chain. The default is $f_1 = 0.1$ and $f_2 = 0.5$, but you can choose other fractions by using the following geweke-options:
   FRAC1=value specifies the fraction $f_1$ for the first window.
   FRAC2=value specifies the fraction $f_2$ for the second window.
See the section Geweke Diagnostics on page 162 for details.
HEIDELBERGER <(heidel-options)> computes the Heidelberger and Welch diagnostic for each variable, which consists of a stationarity test of the null hypothesis that the sample values form a stationary process. If the stationarity test is not rejected, a halfwidth test is then carried out. Optionally, you can specify one or more of the following heidel-options:
   SALPHA=value specifies the $\alpha$ level ($0 < \alpha < 1$) for the stationarity test.
   HALPHA=value specifies the $\alpha$ level ($0 < \alpha < 1$) for the halfwidth test.
   EPS=value specifies a positive number $\epsilon$ such that if the halfwidth is less than $\epsilon$ times the sample mean of the retained iterates, the halfwidth test is passed.
See the section Heidelberger and Welch Diagnostics on page 164 for details.
MCSE | MCERROR computes the Monte Carlo standard error for each parameter. The Monte Carlo standard error, which measures the simul
183. logistic model, you can analyze 1:1, 1:n, m:n, and general $m_i$:$n_i$ matched sets, where the number of cases and controls varies across strata. For a stratified Poisson model, you can have any number of observations in each stratum. At least one variable must be specified to invoke the stratified analysis, and the usual unconditional asymptotic analysis is not performed. The stratified logistic model has the form $\mathrm{logit}(\pi_{hi}) = \alpha_h + x_{hi}'\beta$, where $\pi_{hi}$ is the event probability for the $i$th observation in stratum $h$ with covariates $x_{hi}$, and where the stratum-specific intercepts $\alpha_h$ are the nuisance parameters that are to be conditioned out. STRATA variables can also be specified in the MODEL statement as classification or continuous covariates; however, the effects are nondegenerate only when crossed with a nonstratification variable. Specifying several STRATA statements is the same as specifying one STRATA statement that contains all the strata variables. The STRATA variables can be either character or numeric, and the formatted values of the STRATA variables determine the levels. Thus, you can also use formats to group values into levels; see the discussion of the FORMAT procedure in the Base SAS Procedures Guide. The Strata Summary table is displayed by default. For an exact logistic regression, it displays the number of strata that have a specific number of events and nonevents. For example, if you are analyzing
184. luence of single observations on the fitted model. For the generalized linear model, the variance of the $i$th individual observation is given by $v_i = \frac{\phi V(\mu_i)}{w_i}$, where $\phi$ is the dispersion parameter, $w_i$ is a user-specified prior weight (if not specified, $w_i = 1$), $\mu_i$ is the mean, and $V(\mu_i)$ is the variance function. Let $w_{ei} = v_i^{-1} (g'(\mu_i))^{-2}$ for the $i$th observation, where $g'(\mu_i)$ is the derivative of the link function, evaluated at $\mu_i$. Let $W_e$ be the diagonal matrix with $w_{ei}$ denoting the $i$th diagonal element. The weight matrix $W_e$ is used in computing the expected information matrix. Define $h_i$ as the $i$th diagonal element of the matrix $W_e^{1/2} X (X' W_e X)^{-1} X' W_e^{1/2}$. The Pearson residuals, standardized to have unit asymptotic variance, are given by $r_{Pi} = \frac{y_i - \mu_i}{\sqrt{v_i (1 - h_i)}}$. You can request standardized Pearson residuals in an output data set with the keyword STDRESCHI in the OUTPUT statement. The deviance residuals, standardized to have unit asymptotic variance, are given by $r_{Di} = \frac{\mathrm{sign}(y_i - \mu_i)\sqrt{d_i}}{\sqrt{\phi (1 - h_i)}}$, where $d_i$ is the contribution to the total deviance from observation $i$, and $\mathrm{sign}(y_i - \mu_i)$ is 1 if $y_i - \mu_i$ is positive and $-1$ if $y_i - \mu_i$ is negative. You can request standardized deviance residuals in an output data set with the keyword STDRESDEV in the OUTPUT statement. The likelihood residuals are defined by $r_{Gi} = \mathrm{sign}(y_i - \mu_i)\sqrt{(1 - h_i) r_{Di}^2 + h_i r_{Pi}^2}$. You can request likelihood residuals in an output data set with t
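The leverage and standardized Pearson residual calculations can be sketched for a small Poisson model with a log link and a single covariate. Everything specific here is an assumption for illustration: the data, the coefficient values `b0` and `b1` (they are not fitted estimates), unit prior weights, and $\phi = 1$. For the Poisson distribution $V(\mu) = \mu$, and for the log link $g'(\mu) = 1/\mu$. The helper inverts the 2-by-2 matrix $X' W_e X$ by hand so that only the standard library is needed.

```python
import math

def leverages(X, w):
    """Diagonal of W^(1/2) X (X'WX)^(-1) X' W^(1/2) for a two-column X."""
    a = sum(wi * r[0] * r[0] for r, wi in zip(X, w))
    b = sum(wi * r[0] * r[1] for r, wi in zip(X, w))
    d = sum(wi * r[1] * r[1] for r, wi in zip(X, w))
    det = a * d - b * b
    inv = ((d / det, -b / det), (-b / det, a / det))  # inverse of X'WX
    h = []
    for r, wi in zip(X, w):
        quad = (r[0] * (inv[0][0] * r[0] + inv[0][1] * r[1])
                + r[1] * (inv[1][0] * r[0] + inv[1][1] * r[1]))
        h.append(wi * quad)
    return h

phi = 1.0
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2, 2, 3, 5, 5]
b0, b1 = 0.5, 0.3                       # hypothetical coefficients

X = [(1.0, xi) for xi in x]             # intercept column plus covariate
mu = [math.exp(b0 + b1 * xi) for xi in x]
v = [phi * m for m in mu]               # v_i = phi * V(mu_i) / w_i with w_i = 1
w_e = [1.0 / (vi * (1.0 / m) ** 2) for vi, m in zip(v, mu)]
h = leverages(X, w_e)

raw_pearson = [(yi - m) / math.sqrt(vi) for yi, m, vi in zip(y, mu, v)]
std_pearson = [(yi - m) / math.sqrt(vi * (1.0 - hi))
               for yi, m, vi, hi in zip(y, mu, v, h)]
```

The leverages sum to the number of columns of the design matrix, and dividing by $\sqrt{1 - h_i}$ makes each standardized residual at least as large in magnitude as its raw counterpart.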
185. m is convenient for using complex statements such as IF-THEN/ELSE clauses. The DEVIANCE statement is ignored unless the VARIANCE statement is also specified. EFFECTPLOT Statement. EFFECTPLOT <plot-type <(plot-definition-options)>> </ options>; The EFFECTPLOT statement produces a display of the fitted model and provides options for changing and enhancing the displays. Table 37.2 describes the available plot types and their plot-definition options.
Table 37.2 Plot Types and Plot Definition Options
Plot Type and Description | Plot-Definition Options
BOX: Displays a box plot of continuous response data at each level of a CLASS effect, with predicted values superimposed and connected by a line. This is an alternative to the INTERACTION plot type. | PLOTBY= variable or CLASS effect; X= CLASS variable or effect
CONTOUR: Displays a contour plot of predicted values against two continuous covariates. | PLOTBY= variable or CLASS effect; X= continuous variable; Y= continuous variable
FIT: Displays a curve of predicted values versus a continuous variable. | PLOTBY= variable or CLASS effect; X= continuous variable
INTERACTION: Displays a plot of predicted values (possibly with error bars) versus the levels of a CLASS effect. The predicted values are connected with lines and can be grouped by the levels of another CLASS effect. | PLOTBY= variable or CLASS effect; SLICEBY= variable or CLASS
186. mate. The p-values for T are computed based on the chi-square distribution with r degrees of freedom. Assessment of Models Based on Aggregates of Residuals. Lin, Wei, and Ying (2002) present graphical and numerical methods for model assessment based on the cumulative sums of residuals over certain coordinates (such as covariates or linear predictors) or some related aggregates of residuals. The distributions of these stochastic processes under the assumed model can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be generated by simulation. Each observed residual pattern can then be compared, both graphically and numerically, with a number of realizations from the null distribution. Such comparisons enable you to assess objectively whether the observed residual pattern reflects anything beyond random fluctuation. These procedures are useful in determining appropriate functional forms of covariates and link function. You use the ASSESS|ASSESSMENT statement to perform this kind of model checking with cumulative sums of residuals, moving sums of residuals, or LOESS smoothed residuals. See Example 37.8 and Example 37.9 for examples of model assessment. Let the model for the mean be $g(\mu_i) = x_i'\beta$, where $\mu_i$ is the mean of the response $y_i$ and $x_i$ is the vector of covariates for the $i$th observation. Denote the raw residual resulting from fitting the
187. me with the CLASS levels. However, for the POLYNOMIAL and orthogonal parameterizations, parameter names are formed by concatenating the CLASS variable name and keywords that reflect the parameterization. See the section Other Parameterizations on page 414 in Chapter 19, Shared Concepts and Topics, for examples and further details. Class Variable Parameterization with Unbalanced Designs. PROC GENMOD initially parameterizes the CLASS variables by looking at the levels of the variables across the complete data set. If you have an unbalanced replication of levels across variables or BY groups, then the design matrix and the parameter interpretation might be different from what you expect. For instance, suppose you have a model with one CLASS variable A with three levels (1, 2, and 3) and another CLASS variable B with two levels (1 and 2). If the third level of A occurs only with the first level of B, if you use the EFFECT parameterization, and if your model contains the effect A(B) and an intercept, then the design for A within the second level of B is not a differential effect. In particular, the design looks like the following:
Design Matrix
         A(B=1)     A(B=2)
B   A    A1   A2    A1   A2
1   1     1    0     0    0
1   2     0    1     0    0
1   3    -1   -1     0    0
2   1     0    0     1    0
2   2     0    0     0    1
PROC GENMOD detects linear dependency among the last two design variables and sets the parameter for A2(B=2) to zero, resulting in an interpretation of these parameters
188. ment EXACT Statement. EXACT <'label'> <INTERCEPT> <effects> </ options>; The EXACT statement performs exact tests of the parameters for the specified effects and optionally estimates the parameters and outputs the exact conditional distributions. You can specify the keyword INTERCEPT and any effects in the MODEL statement. Inference on the parameters of the specified effects is performed by conditioning on the sufficient statistics of all the other model parameters (possibly including the intercept). You can specify several EXACT statements, but they must follow the MODEL statement. Each statement can optionally include an identifying label. If several EXACT statements are specified, any statement without a label is assigned a label of the form Exactn, where n indicates the nth EXACT statement. The label is included in the headers of the displayed exact analysis tables. If a STRATA statement is also specified, then a stratified exact logistic regression or a stratified exact Poisson regression is performed. The model contains a different intercept for each stratum, and these intercepts are conditioned out of the model along with any other nuisance parameters (parameters for effects specified in the MODEL statement that are not in the EXACT statement). The ASSESSMENT, BAYES, CONTRAST, EFFECTPLOT, ESTIMATE, LSMEANS, LSMESTIMATE, OUTPUT, SLICE, and STORE statements a
189. meter covariance matrix. Both model-based and empirical covariances are displayed.
ECORRB displays the estimated regression parameter empirical correlation matrix.
ECOVB displays the estimated regression parameter empirical covariance matrix.
INTERCEPT=number specifies either an initial or a fixed value of the intercept regression parameter in the GEE model. If you specify the NOINT option in the MODEL statement, then the intercept is fixed at the value of number.
INITIAL=numbers specifies initial values of the regression parameters, other than the intercept parameter, for GEE estimation. If this option is not specified, the estimated regression parameters assuming independence for all responses are used for the initial values.
LOGOR=log-odds-ratio-structure-keyword specifies the regression structure of the log odds ratio used to model the association of the responses from subjects for binary data. The response syntax must be of the single-variable type, the distribution must be binomial, and the data must be binary. Table 37.5 displays the log odds ratio structure keywords and the corresponding log odds ratio regression structures. See the section Alternating Logistic Regressions on page 2536 for definitions of the log odds ratio types and examples of specifying log odds ratio models. You should specify either the LOGOR= or the TYPE= option, but not both.
Table 37.5 Log Odds Ratio Regression
190. model a response that can take values from a number of categories. The binomial is a special case of the multinomial with two categories. See the section Multinomial Models on page 2529 and McCullagh and Nelder (1989, Chapter 5) for a description of the multinomial distribution. The zero-inflated Poisson and zero-inflated negative binomial are included in PROC GENMOD even though they are not generalized linear models. They are useful extensions of generalized linear models. See the section Zero-Inflated Models on page 2530 for information about the zero-inflated distributions. In addition, you can easily define your own link functions or distributions through DATA step programming statements used within the procedure. An important aspect of generalized linear modeling is the selection of explanatory variables in the model. Changes in goodness-of-fit statistics are often used to evaluate the contribution of subsets of explanatory variables to a particular model. The deviance, defined to be twice the difference between the maximum attainable log likelihood and the log likelihood of the model under consideration, is often used as a measure of goodness of fit. The maximum attainable log likelihood is achieved with a model that has a parameter for every observation. See the section Goodness of Fit on page 2517 for formulas for the deviance. One strategy for variable selection is to fit a sequence of models, beginning with a simple model w
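The definition of the deviance as twice the gap between the maximum attainable log likelihood and the model log likelihood can be illustrated for Poisson counts. The sketch below is a minimal example with hypothetical data: the saturated model sets each mean equal to its observation, and the model under consideration is intercept-only, whose maximum likelihood fit sets every mean to the sample average. It also evaluates the familiar closed form for the Poisson deviance as a cross-check.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log likelihood for counts y with means mu."""
    return sum(yi * math.log(m) - m - math.lgamma(yi + 1)
               for yi, m in zip(y, mu))

y = [2, 3, 6, 7, 8]                        # hypothetical counts, all positive
mu_saturated = [float(yi) for yi in y]     # one parameter per observation
mu_model = [sum(y) / len(y)] * len(y)      # intercept-only fit: sample mean

# Twice the difference in log likelihoods.
deviance = 2.0 * (poisson_loglik(y, mu_saturated) - poisson_loglik(y, mu_model))

# Closed form for the Poisson deviance.
closed_form = 2.0 * sum(yi * math.log(yi / m) - (yi - m)
                        for yi, m in zip(y, mu_model))
```

Comparing the deviances of two nested models in the same way gives the likelihood ratio statistic used to judge whether an added explanatory variable improves the fit.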
191. n and binomial, this option is ignored. Note that you can specify Gibbs sampling on either the dispersion parameter $\phi$, the scale parameter $\sigma = \phi^{1/2}$, or the precision parameter $\tau$, with the DPRIOR=, SPRIOR=, and PPRIOR= options, respectively. These three parameters are transformations of one another, and you should specify Gibbs sampling for only one of them. A gamma prior $G(a, b)$ with density $f(t) = \frac{b (bt)^{a-1} e^{-bt}}{\Gamma(a)}$ is specified by PRECISIONPRIOR=GAMMA, which can be followed by one of the following gamma-options enclosed in parentheses. The hyperparameters $a$ and $b$ are the shape and inverse-scale parameters of the gamma distribution, respectively. See the section Gamma Prior on page 2550 for details. The default is $G(10^{-4}, 10^{-4})$.
RELSHAPE<=c> specifies independent $G(c\hat{\phi}, c)$ distribution, where $\hat{\phi}$ is the MLE of the dispersion parameter. With this choice of hyperparameters, the mean of the prior distribution is $\hat{\phi}$ and the variance is $\hat{\phi}/c$. By default, $c = 10^{-4}$.
SHAPE=a ISCALE=b when both specified, results in a $G(a, b)$ prior.
SHAPE=c when specified alone, results in a $G(c, c)$ prior.
ISCALE=c when specified alone, results in a $G(c, c)$ prior.
An improper prior with density $f(t)$ proportional to $t^{-1}$ is specified with PRECISIONPRIOR=IMPROPER.
PLOTS<(global-plot-options)>=plot-request
PLOTS<(global-plot-options)>=(plot-request <... plot-request>)
controls the display of diagnostic plots. Three types of plots can be req
192. n can be useful in case of convergence difficulty. The intercept parameter is initialized with the INTERCEPT= option and is not included here. The values are assigned to the variables in the MODEL statement in the same order in which they appear in the MODEL statement. The order of levels for CLASS variables is determined by the ORDER= option. Note that some levels of classification variables can be aliased; that is, they correspond to linearly dependent parameters that are not estimated by the procedure. Initial values must be assigned to all levels of classification variables, regardless of whether they are aliased or not. The procedure ignores initial values corresponding to parameters not being estimated. If you specify a BY statement, all classification variables must take on the same number of levels in each BY group. Otherwise, classification variables in some of the BY groups are assigned incorrect initial values. Types of INITIAL= specifications are illustrated in the following table:

   Type of List                  Specification
   List separated by blanks      INITIAL = 3 4 5
   List separated by commas      INITIAL = 3, 4, 5
   x to y                        INITIAL = 3 to 5
   x to y by z                   INITIAL = 3 to 5 by 1
   Combination of list types     INITIAL = 1, 3 to 5, 9

INTERCEPT=number
INTERCEPT=number-list
initializes the intercept term to number for parameter estimation. If you specify both the INTERCEPT= and the NOINT options, the intercept term is not estimated bu
193. n in the MODEL statement.

Iteration History for LR Confidence Intervals
If you specify the ITPRINT and LRCI model options, PROC GENMOD displays an iteration history table for profile likelihood-based confidence intervals. For each parameter in the model, PROC GENMOD displays the parameter identification number, the iteration number, the log-likelihood value, and the parameter values.

Likelihood Ratio-Based Confidence Intervals for Parameters
If you specify the LRCI and the ITPRINT options in the MODEL statement, a table is displayed that summarizes profile likelihood-based confidence intervals for all parameters. For each parameter in the model, the table displays the confidence coefficient, the parameter identification number, lower and upper endpoints of confidence intervals for the parameter, and values of all other parameters at the solution.

LR Statistics for Type 1 Analysis
If you specify the TYPE1 model option, a table is displayed that contains the name of the effect, the deviance for the model including the effect and all previous effects, the degrees of freedom for the effect, the likelihood ratio statistic for testing the significance of the effect, and the p-value computed from the chi-square distribution with the effect's degrees of freedom. If you specify either the SCALE=DEVIANCE or SCALE=PEARSON option in the MODEL statement, columns are displayed that contain the name of the effect, t
194. n your model, then the rows of this table labeled as "Heat" show the joint significance of all the Heat effect parameters in that reduced model. In this case, a model that contains only the Heat parameters still explains a significant amount of the variability; however, you can see that a model that contains only the Soak parameters would not be significant.

The "Exact Parameter Estimates" table in Output 37.11.4 displays parameter estimates and tests of significance for the levels of the CLASS variables. Again, the Heat=7 parameter has some difficulties; however, in the exact analysis, a median unbiased estimate is computed for the parameter instead of a maximum likelihood estimate. The confidence limits show that the Heat variable contains some explanatory power, while the categorical Soak variable is not significant and can be dropped from the model.

Output 37.11.4 Exact Parameter Estimates

   Parameter     Estimate    Standard Error    95% Confidence Limits    p-Value
   Heat 7        -2.7552*    .                 -Infinity    -0.4267     0.0199
   Heat 14       -3.0255     1.0128            -5.7450      -0.6194     0.0113
   Heat 27       -1.7846     0.8065            -3.6779       0.2260     0.0844
   Soak 1         0.3231     1.1717            -2.8673       3.6754     1.0000
   Soak 1.7       0.5375     1.1284            -1.8056       4.4588     1.0000
   Soak 2.2       0.4035     1.2347            -2.5785       4.5054     1.0000
   Soak 2.8       0.1661     1.4214            -4.5490       4.2168     1.0000

   NOTE: * indicates a median unbiased estimate.

NOTE: If you want to make predictions from the exa
195. nd all statistics, such as standard errors and likelihood ratio statistics, are adjusted appropriately. Specifying SCALE=DEVIANCE or SCALE=D is the same as specifying the DSCALE option. This fixes the scale parameter at a value of 1 in the estimation procedure. After the parameter estimates are determined, the exponential family dispersion parameter is assumed to be given by the deviance divided by the degrees of freedom. All statistics, such as standard errors and likelihood ratio statistics, are adjusted appropriately.

SCORING=number
requests that on iterations up to number, the Hessian matrix be computed using the Fisher scoring method. For further iterations, the full Hessian matrix is computed. The default value is 1. A value of 0 causes all iterations to use the full Hessian matrix, and a value greater than or equal to the value of the MAXITER option causes all iterations to use Fisher scoring. The value of the SCORING option must be 0 or a positive integer.

SINGULAR=number
sets the tolerance for testing singularity of the information matrix and the crossproducts matrix. Roughly, the test requires that a pivot be at least this number times the original diagonal value. By default, number is $10^{7}$ times the machine epsilon. The default number is approximately $10^{-9}$ on most machines. This value also controls the check on estimability for ESTIMATE and CONTRAST statements.

TYPE1
requests that a Type 1, or sequential, analysis
196. network, this method does not reject any of the samples, at the cost of using a large amount of memory to create the network. METHOD=NETWORKMC is most useful for producing parameter estimates for problems that are too large for the DIRECT and NETWORK methods to handle and for which asymptotic methods are invalid; for example, for sparse data on a large grid.

N=n
specifies the number of Monte Carlo samples to take when the METHOD=NETWORKMC option is specified. By default, n=10,000. If the procedure cannot obtain n samples due to a lack of memory, then a note is printed in the SAS log (the number of valid samples is also reported in the listing) and the analysis continues.

The number of samples used to produce any particular statistic might be smaller than n. For example, let X1 and X2 be continuous variables, denote their joint distribution by $f(X1, X2)$, and let $f(X1 \mid X2 = x2)$ denote the marginal distribution of X1 conditioned on the observed value of X2. If you request the JOINT test of X1 and X2, then n samples are used to generate the estimate $\hat{f}(X1, X2)$ of $f(X1, X2)$, from which the test is computed. However, the parameter estimate for X1 is computed from the subset of $\hat{f}(X1, X2)$ that has X2 = x2, and this subset need not contain n samples. Similarly, the distribution for each level of a classification variable is created by extracting the appropriate subset from the joint distribution for the CLASS variabl
197. nformation about the graphics available in the GENMOD procedure, see the section "ODS Graphics" on page 2572.

Output 37.8.3 Cumulative Residual Plot for Linear X1 Fit
[Plot: Checking Functional Form for X1; observed path and first 20 simulated paths; cumulative residuals against X1; Pr > MaxAbsVal: 0.1084 (10000 simulations)]

Output 37.8.4 Cumulative Residual Panel Plot for Linear X1 Fit
[Panel plot: Checking Functional Form for X1; observed path and first 8 simulated paths; cumulative residuals against X1]

Output 37.8.5 Summary of Model Assessment

   Assessment Summary
   Assessment    Maximum                                      Pr >
   Variable      Absolute Value    Replications    Seed         MaxAbsVal
   X1            0.0380            10000           603708000    0.1084

The p-value of 0.1084 reported on Output 37.8.3 and Output 37.8.5 suggests that a more adequate model might be possible. The observed cumulative residuals in Output 37.8.3 and Output 37.8.4, represented by the heavy lines, seem atypical of the simulated curves, represented by the light lines, reinforcing the conclusion that a more appropriate functional form for X1 is possible. The cumulative residual plots in Output 37.8.6 provide guidance in determining a more appropriate functional form. The four
198. ns)> | IMPROPER
DPRIOR=GAMMA<(options)> | IMPROPER
specifies that Gibbs sampling be performed on the generalized linear model dispersion parameter and the prior distribution for the dispersion parameter, if there is a dispersion parameter in the model. For models that do not have a dispersion parameter (the Poisson and binomial), this option is ignored. Note that you can specify Gibbs sampling on either the dispersion parameter $\phi$, the scale parameter $\sigma = \phi^{1/2}$, or the precision parameter $\tau = \phi^{-1}$, with the DPRIOR=, SPRIOR=, and PPRIOR= options, respectively. These three parameters are transformations of one another, and you should specify Gibbs sampling for only one of them.

A gamma prior $G(a, b)$ with density $f(t) = \frac{b (bt)^{a-1} e^{-bt}}{\Gamma(a)}$ is specified by DISPERSIONPRIOR=GAMMA, which can be followed by one of the following gamma-options enclosed in parentheses. The hyperparameters a and b are the shape and inverse-scale parameters of the gamma distribution, respectively. See the section "Gamma Prior" on page 2550 for details. The default is $G(10^{-4}, 10^{-4})$.

RELSHAPE<=c> specifies independent $G(c\hat{\phi}, c)$ distribution, where $\hat{\phi}$ is the MLE of the dispersion parameter. With this choice of hyperparameters, the mean of the prior distribution is $\hat{\phi}$ and the variance is $\hat{\phi}/c$. By default, $c = 10^{-4}$.

SHAPE=a ISCALE=b when both specified, results in a $G(a, b)$ prior.

SHAPE=c when specified alone, results in
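The effect of the RELSHAPE hyperparameters can be checked with a few lines of arithmetic. The following Python sketch is not SAS code and is not part of the procedure; it assumes the shape and inverse-scale parameterization described above, reads the RELSHAPE prior as $G(c\hat{\phi}, c)$, and uses a made-up value for the MLE (`phi_hat` is hypothetical).

```python
def gamma_moments(shape, inverse_scale):
    """Mean and variance of a gamma G(a, b) with shape a and inverse scale b."""
    mean = shape / inverse_scale
    variance = shape / inverse_scale ** 2
    return mean, variance


def relshape_hyperparameters(phi_hat, c=1e-4):
    """Hyperparameters (a, b) of the assumed RELSHAPE prior G(c * phi_hat, c)."""
    return c * phi_hat, c


phi_hat = 2.5  # hypothetical MLE of the dispersion parameter
a, b = relshape_hyperparameters(phi_hat, c=1e-4)
mean, variance = gamma_moments(a, b)

# The prior is centered at the MLE, with variance phi_hat / c (very diffuse for small c).
print(mean, variance)
```

Under these assumptions a small c keeps the prior mean at the MLE while making the prior variance large, which is why the default prior is only weakly informative.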
199. nstitute Inc All rights reserved 518177_1US 0109
200. ntercept term is included by default. Thus, the model matrix X (the matrix that has as its ith row the transpose of the covariate vector for the ith observation) consists of a column of 1s representing the intercept term and columns of 0s and 1s derived from indicator variables representing the levels of the car and age variables. That is, the model matrix is

$$X = \begin{bmatrix} 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 & 0 & 1 \end{bmatrix}$$

where the first column corresponds to the intercept, the next three columns correspond to the variable car, and the last two columns correspond to the variable age.

The response distribution is specified as Poisson, and the link function is chosen to be log. That is, the Poisson mean parameter $\mu_i$ is related to the linear predictor by $\log(\mu_i) = x_i'\beta$. The logarithm of n is specified as an offset variable, as is common in this type of analysis. In this case, the offset variable serves to normalize the fitted cell means to a per-policyholder basis, since the total number of claims, not individual policyholder claims, is observed. PROC GENMOD produces the following default output from the preceding statements.

Figure 37.1 Model Information

   The GENMOD Procedure
   Model Information
   Data Set              WORK.INSURE
   Distribution          Poisson
   Link Function         Log
   Dependent Variable    c
   Offset Variable       ln

The "Model Information" table displayed in Figure 37.1 provides information ab
201. number ($2^n$) of possible y vectors to be considered. Since a Poisson-distributed response variable can take an infinite number of values, there is an infinite number of y vectors to be scanned. The offset variable reduces this number to $\prod_i N_i$ response vectors. On a practical level, as $N_i$ gets large, the probability of the Poisson random variable achieving this value drops to zero, so $N_i$ can be thought of as the point at which you believe the value does not matter. If you are modeling rates, then $N_i$ is the maximum possible value for each observation in the experiment; for example, if you are counting the number of rats in a cage that acquire a disease, then $N_i$ is the number of rats in cage i. Finally, if you are conditioning out the intercept and denoting the observed response as $y_0$, every $N_i$ has an effective maximum of $\sum_{i=1}^{n} y_{0i}$, which is the sufficient statistic for the intercept term.

OUTDIST= Output Data Set
The OUTDIST= data set contains every exact conditional distribution necessary to process the corresponding EXACT statement. For example, the following statements create one distribution for the x1 parameter and another for the x2 parameters, and produce the data set dist shown in Table 37.7:

   data test;
      input y x1 x2 count;
      datalines;
   [data lines not recoverable from the source]
   ;
   proc genmod data=test exactonly;
      class x2 / param=ref;
      model y=x1 x2 / d=b;
      exact x1 x2 / outdist=dist;
   proc print data=dist;
   run;

Exact L
202. V6CORR
specifies that the SAS Version 6 method of computing the normalized Pearson chi-square be used for working correlation estimation and for model-based covariance matrix scale factor.

WITHINSUBJECT | WITHIN=within-subject-effect
defines an effect specifying the order of measurements within subjects. Each distinct level of the within-subject-effect defines a different response from the same subject. If the data are in proper order within each subject, you do not need to specify this option. If some measurements do not appear in the data for some subjects, this option properly orders the existing measurements and treats the omitted measurements as missing values. If the WITHINSUBJECT= option is not used in this situation, measurements might be improperly ordered and missing values assumed for the last measurements in a cluster. Variables used in defining the within-subject-effect must be listed in the CLASS statement.

YPAIR=variable-list
specifies the variables in the ZDATA= data set corresponding to pairs of responses for log odds ratio association modeling.

ZDATA=SAS-data-set
specifies a SAS data set containing either the full z matrix for log odds ratio association modeling or the z matrix for a single complete cluster, to be replicated for all clusters.

ZROW=variable-list
specifies the variables in the ZDATA= data set corresponding to rows of the z matrix for log odds ratio associati
203. oblem by using IF-THEN/ELSE clauses or other conditional statements to check for possible error conditions and appropriately define the functions for these cases.

Data set variables can be referenced in user definitions of the link function and response distributions by using programming statements and the FWDLINK, INVLINK, DEVIANCE, and VARIANCE statements. See the DEVIANCE, VARIANCE, FWDLINK, and INVLINK statements for more information.

REPEATED Statement

   REPEATED SUBJECT=subject-effect < / options > ;

The REPEATED statement specifies the covariance structure of multivariate responses for GEE model fitting in the GENMOD procedure. In addition, the REPEATED statement controls the iterative fitting algorithm used in GEEs and specifies optional output. Other GENMOD procedure statements, such as the MODEL and CLASS statements, are used in the same way as they are for ordinary generalized linear models to specify the regression model for the mean of the responses.

SUBJECT=subject-effect
identifies subjects in the input data set. The subject-effect can be a single variable, an interaction effect, a nested effect, or a combination. Each distinct value, or level, of the effect identifies a different subject, or cluster. Responses from different subjects are assumed to be statistically independent, and responses within subjects are assumed to be correlated. A subject-effect must be specified, and variables used in defining the subject-effect m
204. of each additional term in the Type 1 analysis. See the section "F Statistics" on page 2526 for a definition of F statistics.

This Type 1 analysis has the general property that the results depend on the order in which the terms of the model are fitted. The terms are fitted in the order in which they are specified in the MODEL statement.

Type 3 Analysis
A Type 3 analysis is similar to the Type III sums of squares used in PROC GLM, except that likelihood ratios are used instead of sums of squares. First, a Type III estimable function is defined for an effect of interest in exactly the same way as in PROC GLM. Then maximum likelihood estimates are computed under the constraint that the Type III function of the parameters is equal to 0, by using constrained optimization. Let the resulting constrained parameter estimates be $\tilde{\beta}$ and the log likelihood be $l(\tilde{\beta})$. Then the likelihood ratio statistic

$$S = 2\left( l(\hat{\beta}) - l(\tilde{\beta}) \right)$$

where $\hat{\beta}$ is the unconstrained estimate, has an asymptotic chi-square distribution under the hypothesis that the Type III contrast is equal to 0, with degrees of freedom equal to the number of parameters associated with the effect.

When a Type 3 analysis is requested, PROC GENMOD produces a table that contains the likelihood ratio statistics, degrees of freedom, and p-values based on the limiting chi-square distributions for each effect in the model. If you specify either the DSCALE or PSCALE option in the MODEL statement, F statistics ar
205. Table 37.7 OUTDIST= Data Set

   Obs    x1    x20    x21    Count    Score      Prob
     1     .     0      0       3      5.81151    0.03333
     2     .     0      1      15      1.66031    0.16667
     3     .     0      2       9      3.12728    0.10000
     4     .     1      0      15      1.46523    0.16667
     5     .     1      1      18      0.21675    0.20000
     6     .     1      2       6      4.58644    0.06667
     7     .     2      0      19      1.61869    0.21111
     8     .     2      1       2      3.27293    0.02222
     9     .     3      0       3      6.27189    0.03333
    10     2     .      .       6      3.03030    0.12000
    11     3     .      .      12      0.75758    0.24000
    12     4     .      .      11      0.00000    0.22000
    13     5     .      .      18      0.75758    0.36000
    14     6     .      .       3      3.03030    0.06000

The first nine observations in the dist data set contain an exact distribution for the parameters of the x2 effect (hence the values for the x1 parameter are missing), and the remaining five observations are for the x1 parameter. If a joint distribution was created, there would be observations with values for both the x1 and x2 parameters. For CLASS variables, the corresponding parameters in the dist data set are identified by concatenating the variable name with the appropriate classification level.

The data set contains the possible sufficient statistics of the parameters for the effects specified in the EXACT statement, and the Count variable contains the number of different responses that yield these statistics. In particular, there are six possible response vectors y for which the dot product y'x1 was equal to 2, and for which y'x20, y'x21, and y'1 were equal to their actual observed values (displayed in the "Sufficient Statistics" table).

NOTE: If you are performing an exact Poisson analysi
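In the table values listed above, the Prob column appears to equal Count divided by the total Count within each distribution (90 for the x2 distribution, 50 for the x1 distribution). The following Python sketch, which is an arithmetic check outside SAS and not a description of how the procedure computes the column, reproduces the printed probabilities from the counts.

```python
# Count columns copied from Table 37.7
x2_counts = [3, 15, 9, 15, 18, 6, 19, 2, 3]   # observations 1-9 (x2 effect)
x1_counts = [6, 12, 11, 18, 3]                # observations 10-14 (x1 parameter)


def exact_probabilities(counts):
    """Normalize counts of response vectors to probabilities within one distribution."""
    total = sum(counts)
    return [round(count / total, 5) for count in counts]


print(exact_probabilities(x2_counts))
print(exact_probabilities(x1_counts))
```

Each list sums to 1 up to rounding, as expected for a conditional distribution of the sufficient statistics.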
206. ohn Wiley & Sons.

Simonoff, J. S. (2003), Analyzing Categorical Data, New York: Springer-Verlag.

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van der Linde, A. (2002), "Bayesian Measures of Model Complexity and Fit," Journal of the Royal Statistical Society, Series B, 64(4), 583-616, with discussion.

Stokes, M. E., Davis, C. S., and Koch, G. G. (2000), Categorical Data Analysis Using the SAS System, Second Edition, Cary, NC: SAS Institute Inc.

Thall, P. F. and Vail, S. C. (1990), "Some Covariance Models for Longitudinal Count Data with Overdispersion," Biometrics, 46, 657-671.

Ware, J. H., Dockery, D. W., Spiro, A., III, Speizer, F. E., and Ferris, B. G., Jr. (1984), "Passive Smoking, Gas Cooking, and Respiratory Health of Children Living in Six Cities," American Review of Respiratory Diseases, 129, 366-374.

White, H. (1982), "Maximum Likelihood Estimation of Misspecified Models," Econometrica, 50, 1-25.

Williams, D. A. (1987), "Generalized Linear Model Diagnostics Using the Deviance and Single Case Deletions," Applied Statistics, 36, 181-191.

Zeger, S. L., Liang, K. Y., and Albert, P. S. (1988), "Models for Longitudinal Data: A Generalized Estimating Equation Approach," Biometrics, 44, 1049-1060.

Subject Index
adjusted residuals, GENMOD procedure, 2529
aggregates of residuals, 2597, 2604
Akaike's information criterion, GENMOD, 2519
aliasing, GENMOD procedure, 2438
ALR algorithm, GENMOD
207. oint scale from very good (vg) to very bad (vb). An analysis is performed to assess the differences in the ratings of the three brands. The variable taste contains the ratings, and the variable brand contains the brands tested. The variable count contains the number of testers rating each brand in each category.

The following statements create the Icecream data set:

   data Icecream;
      input count brand$ taste$;
      datalines;
    70  ice1  vg
    71  ice1  g
   151  ice1  m
    30  ice1  b
    46  ice1  vb
    20  ice2  vg
    36  ice2  g
   130  ice2  m
    74  ice2  b
    70  ice2  vb
    50  ice3  vg
    55  ice3  g
   140  ice3  m
    52  ice3  b
    50  ice3  vb
   ;
   run;

The following statements fit a cumulative logit model to the ordinal data with the variable taste as the response and the variable brand as a covariate. The variable count is used as a FREQ variable.

   proc genmod data=Icecream rorder=data;
      freq count;
      class brand;
      model taste = brand / dist=multinomial
                            link=cumlogit
                            aggregate=brand
                            type1;
      estimate 'LogOR12' brand 1 -1 / exp;
      estimate 'LogOR13' brand 1  0 -1 / exp;
      estimate 'LogOR23' brand 0  1 -1 / exp;
   run;

The AGGREGATE=BRAND option in the MODEL statement specifies the variable brand as defining multinomial populations for computing deviances and Pearson chi-squares. The RORDER=DATA option specifies that the taste variable levels be ordered by their order of appearance in the input data set, that is, from very good (vg) to ver
208. olumn vector of covariates, or explanatory variables, for observation i that is known from the experimental setting and is considered to be fixed, or nonrandom. The vector of unknown coefficients $\beta$ is estimated by a least squares fit to the data y. The $\varepsilon_i$ are assumed to be independent, normal random variables with zero mean and constant variance. The expected value of $y_i$, denoted by $\mu_i$, is

$$\mu_i = x_i'\beta$$

While traditional linear models are used extensively in statistical data analysis, there are types of problems such as the following for which they are not appropriate.

• It might not be reasonable to assume that data are normally distributed. For example, the normal distribution (which is continuous) might not be adequate for modeling counts or measured proportions that are considered to be discrete.

• If the mean of the data is naturally restricted to a range of values, the traditional linear model might not be appropriate, since the linear predictor $x_i'\beta$ can take on any value. For example, the mean of a measured proportion is between 0 and 1, but the linear predictor of the mean in a traditional linear model is not restricted to this range.

• It might not be realistic to assume that the variance of the data is constant for all observations. For example, it is not unusual to observe data where the variance increases with the mean of the data.

A generalized linear model extends the traditional linear mo
209. on preceding columns. If a dependency is found, the parameter corresponding to the dependent column is set to 0 along with its standard error to indicate that it is not estimated. The order in which the levels of a classification variable are checked for dependencies can be set by the ORDER= option in the PROC GENMOD statement or by the ORDER= option in the CLASS statement. For full-rank parameterizations, the columns of the X matrix are designed to be linearly independent.

You can exclude the intercept term from the model by specifying the NOINT option in the MODEL statement.

Missing Level Combinations
All levels of interaction terms involving classification variables might not be represented in the data. In that case, PROC GENMOD does not include parameters in the model for the missing levels.

Type 1 Analysis
A Type 1 analysis consists of fitting a sequence of models, beginning with a simple model with only an intercept term, and continuing through a model of specified complexity, fitting one additional effect on each step. Likelihood ratio statistics, that is, twice the difference of the log likelihoods, are computed between successive models. This type of analysis is sometimes called an analysis of deviance since, if the dispersion parameter is held fixed for all models, it is equivalent to computing differences of scaled deviances. The asymptotic distribution of the likelihood ratio statistics, under the hypothesis that the additional pa
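The analysis-of-deviance reading of a Type 1 analysis can be sketched in a few lines. The Python code below is an illustration outside SAS: it assumes a fixed dispersion parameter and takes a list of deviances from nested models fitted in MODEL-statement order; the numeric deviances are made-up illustrative values, not output from the procedure.

```python
def type1_lr_statistics(deviances, dispersion=1.0):
    """Likelihood ratio statistics as successive differences of scaled deviances.

    deviances[0] belongs to the intercept-only model; each later entry belongs
    to the model with one more effect added. The dispersion is held fixed.
    """
    return [
        (previous - current) / dispersion
        for previous, current in zip(deviances, deviances[1:])
    ]


# Hypothetical deviances: intercept only, then two effects added one at a time
deviances = [175.15, 107.46, 2.82]
print(type1_lr_statistics(deviances))
```

Because each statistic compares a model with the one fitted just before it, reordering the effects changes the individual statistics even though the final deviance is the same.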
210. on modeling.

SLICE Statement

   SLICE model-effect < / options > ;

The SLICE statement provides a general mechanism for performing a partitioned analysis of the LS-means for an interaction. This analysis is also known as an analysis of simple effects.

The SLICE statement uses the same options as the LSMEANS statement, which are summarized in Table 19.19. For details about the syntax of the SLICE statement, see the section "SLICE Statement" on page 526 of Chapter 19, "Shared Concepts and Topics."

STORE Statement

   STORE < OUT= > item-store-name < / LABEL='label' > ;

The STORE statement requests that the procedure save the context and results of the statistical analysis. The resulting item store is a binary file format that cannot be modified. The contents of the item store can be processed with the PLM procedure. For details about the syntax of the STORE statement, see the section "STORE Statement" on page 529 of Chapter 19, "Shared Concepts and Topics."

STRATA Statement

   STRATA variable < (option) > < variable < (option) > > ... < / options > ;

The STRATA statement names the variables that define strata or matched sets to use in stratified exact logistic regression of binary response data, or a stratified exact Poisson regression of count data. An EXACT statement must also be specified.

Observations that have the same variable values are in the same matched set. For a stratified
211. on structure. The MODEL statement specifies the regression model for the mean with the binomial distribution variance function. The following SAS statements perform the GEE model fit:

   proc genmod data=resp descend;
      class id treatment(ref="P") center(ref="1") sex(ref="M")
            baseline(ref="0") / param=ref;
      model outcome=treatment center sex age baseline / dist=bin;
      repeated subject=id(center) / corr=unstr corrw;
   run;

These statements first fit the generalized linear (GLM) model specified in the MODEL statement. The parameter estimates from the generalized linear model fit are not shown in the output, but they are used as initial values for the GEE solution. The DESCEND option in the PROC GENMOD statement specifies that the probability that outcome = 1 be modeled. If the DESCEND option had not been specified, the probability that outcome = 0 would be modeled by default.

Information about the GEE model is displayed in Output 37.5.2. The results of GEE model fitting are displayed in Output 37.5.3. Model goodness-of-fit criteria are displayed in Output 37.5.4. If you specify no other options, the standard errors, confidence intervals, Z scores, and p-values are based on empirical standard error estimates. You can specify the MODELSE option in the REPEATED statement to create a table based on model-based standard error estimates.

Output 37.5.2 Model Fitting Information

   The GENMOD Procedure
   GEE Model Information
   Correlation Structure    Unstructured
212. onvergence, the ALR algorithm provides estimates of the regression parameters for the mean $\beta$, the regression parameters for the log odds ratios $\alpha$, their standard errors, and their covariances.

Specifying Log Odds Ratio Models
Specifying a regression model for the log odds ratio requires you to specify rows of the z matrix $z_{ijk}$ for each cluster i and each unique within-cluster pair (j, k). The GENMOD procedure provides several methods of specifying $z_{ijk}$. These are controlled by the LOGOR=keyword and associated options in the REPEATED statement. The supported keywords and the resulting log odds ratio models are described as follows.

EXCH
specifies exchangeable log odds ratios. In this model, the log odds ratio is a constant for all clusters i and pairs (j, k). The parameter $\alpha$ is the common log odds ratio.

   $z_{ijk} = 1$ for all i, j, k

FULLCLUST
specifies fully parameterized clusters. Each cluster is parameterized in the same way, and there is a parameter for each unique pair within clusters. If a complete cluster is of size n, then there are $\frac{n(n-1)}{2}$ parameters in the vector $\alpha$. For example, if a full cluster is of size 4, then there are $\frac{4 \times 3}{2} = 6$ parameters, and the z matrix is of the form

$$Z = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$

The elements of $\alpha$ correspond to log odds ratios for cluster pairs in the following order:

   Pair      Parameter
   (1,2)     Alpha1
   (1,3)     Alpha2
   (1
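The FULLCLUST bookkeeping can be sketched as follows. This Python code is an illustration outside SAS; the function names are hypothetical, and the lexicographic pair order beyond the entries visible in the table above is an assumption.

```python
from itertools import combinations


def fullclust_pairs(cluster_size):
    """Unique within-cluster pairs (j, k) with j < k, in lexicographic order (assumed)."""
    return list(combinations(range(1, cluster_size + 1), 2))


def fullclust_z_matrix(cluster_size):
    """Identity z matrix: one log odds ratio parameter per unique pair."""
    n_params = cluster_size * (cluster_size - 1) // 2
    return [[1 if row == col else 0 for col in range(n_params)]
            for row in range(n_params)]


pairs = fullclust_pairs(4)
z = fullclust_z_matrix(4)
for index, (pair, row) in enumerate(zip(pairs, z), start=1):
    print(pair, f"Alpha{index}", row)
```

For a cluster of size 4 this lists six pairs and a 6 x 6 identity matrix, matching the count n(n-1)/2 given in the text.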
213. otherwise, it is an absolute change. By default, CONVERGE=1E-4. This convergence criterion is used in parameter estimation for a single model fit, Type 1 statistics, and likelihood ratio statistics for Type 3 analyses and CONTRAST statements.

CONVH=number
sets the relative Hessian convergence criterion. The value of number must be between 0 and 1. After convergence is determined with the change in parameter criterion specified with the CONVERGE= option, the quantity $tc = \frac{g' H^{-1} g}{|f|}$ is computed and compared to number, where g is the gradient vector, H is the Hessian matrix for the model parameters, and f is the log-likelihood function. If tc is greater than number, a warning that the relative Hessian convergence criterion has been exceeded is printed. This criterion detects the occasional case where the change in parameter convergence criterion is satisfied, but a maximum in the log-likelihood function has not been attained. By default, CONVH=1E-4.

CORRB
requests that the parameter estimate correlation matrix be displayed.

COVB
requests that the parameter estimate covariance matrix be displayed.

DIAGNOSTICS | INFLUENCE
requests that case deletion diagnostic statistics be displayed (see the OBSTATS option).

DIST=keyword
D=keyword
ERROR=keyword
ERR=keyword
specifies the built-in probability distribution to use in the model. If you specify the DIST= option and you omit a user-defined link function, a default link f
214. out the specified model and the input data set.

Figure 37.2 Class Level Information

   Class Level Information
   Class    Levels    Values
   car      3         large medium small
   age      2         1 2

Figure 37.2 displays the "Class Level Information" table, which identifies the levels of the classification variables that are used in the model. Note that car is a character variable, and the values are sorted in alphabetical order. This is the default sort order, but you can select different sort orders with the ORDER= option in the PROC GENMOD statement.

Figure 37.3 Goodness of Fit

   Criteria For Assessing Goodness Of Fit
   Criterion                    DF    Value       Value/DF
   Deviance                     2     2.8207      1.4103
   Scaled Deviance              2     2.8207      1.4103
   Pearson Chi-Square           2     2.8416      1.4208
   Scaled Pearson X2            2     2.8416      1.4208
   Log Likelihood                     837.4533
   Full Log Likelihood                -16.4638
   AIC (smaller is better)            40.9276
   AICC (smaller is better)           80.9276
   BIC (smaller is better)            40.0946

The "Criteria For Assessing Goodness Of Fit" table displayed in Figure 37.3 contains statistics that summarize the fit of the specified model. These statistics are helpful in judging the adequacy of a model and in comparing it with other models under consideration. If you compare the deviance of 2.8207 with its asymptotic chi-square with 2 degrees of freedom distribution, you find that the p-value is 0.24. This indicates that the specified model fits the data reasonably well.

Fig
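The p-value quoted for the deviance can be reproduced by hand, because the survival function of a chi-square distribution with 2 degrees of freedom has the closed form exp(-x/2). The following Python sketch is a check outside SAS; the closed form applies only to 2 degrees of freedom, and the function name is illustrative.

```python
import math


def chi_square_sf_2df(x):
    """Upper-tail probability of a chi-square variate with exactly 2 degrees of freedom."""
    return math.exp(-x / 2.0)


deviance = 2.8207          # from the goodness-of-fit table
degrees_of_freedom = 2

value_per_df = deviance / degrees_of_freedom
p_value = chi_square_sf_2df(deviance)

print(f"Value/DF = {value_per_df:.4f}, p-value = {p_value:.2f}")
```

A ratio Value/DF near 1 and a p-value well above conventional significance levels are both consistent with the statement that the model fits the data reasonably well.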
215. parameters, equating D to its mean and solving for $\phi$ yields $\hat{\phi} = D/(n - p)$. Similarly, an estimate of $\phi$ based on Pearson's chi-square $X^2$ is $\hat{\phi} = X^2/(n - p)$. Alternatively, a maximum likelihood estimate of $\phi$ can be computed by the procedure, if desired. See the discussion in the section "Type 1 Analysis" on page 2523 for more about the estimation of the dispersion parameter.

Other Fit Statistics
The Akaike information criterion (AIC) is a measure of goodness of model fit that balances model fit against model simplicity. AIC has the form

$$\mathrm{AIC} = -2\,\mathrm{LL} + 2p$$

where p is the number of parameters estimated in the model, and LL is the log likelihood evaluated at the value of the estimated parameters. An alternative form is the corrected AIC given by

$$\mathrm{AICC} = -2\,\mathrm{LL} + 2p\,\frac{n}{n - p - 1}$$

where n is the total number of observations used. The Bayesian information criterion (BIC) is a similar measure. BIC is defined by

$$\mathrm{BIC} = -2\,\mathrm{LL} + p \log(n)$$

See Akaike (1981, 1979) for details of AIC and BIC. See Simonoff (2003) for a discussion of using AIC, AICC, and BIC with generalized linear models. These criteria are useful in selecting among regression models, with smaller values representing better model fit. PROC GENMOD uses the full log likelihoods defined in the section "Log-Likelihood Functions" on page 2514, with all terms included, for computing all of the criteria.

Dispersion Parameter
There are several op
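The three criteria can be checked against the values reported for the insurance example (Figure 37.3 in excerpt 214: AIC 40.9276, AICC 80.9276, BIC 40.0946). The Python sketch below is a check outside SAS. It assumes a full log likelihood of -16.4638, p = 4 estimated parameters (intercept, two car levels, one age level) and n = 6 observations; p and n are inferred from the example rather than stated in this excerpt.

```python
import math


def information_criteria(full_log_likelihood, n_parameters, n_observations):
    """AIC, AICC and BIC computed from the full log likelihood."""
    minus_two_ll = -2.0 * full_log_likelihood
    aic = minus_two_ll + 2.0 * n_parameters
    aicc = minus_two_ll + 2.0 * n_parameters * n_observations / (
        n_observations - n_parameters - 1
    )
    bic = minus_two_ll + n_parameters * math.log(n_observations)
    return aic, aicc, bic


# Assumed inputs: LL from the goodness-of-fit table; p and n inferred from the example
aic, aicc, bic = information_criteria(-16.4638, n_parameters=4, n_observations=6)
print(f"AIC = {aic:.4f}, AICC = {aicc:.4f}, BIC = {bic:.4f}")
```

With only six observations the small-sample correction is large, which explains why AICC is roughly twice AIC in this example.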
216. parameters replaced by their estimated values K nj nee BY 2 1Oinj lt 0 ME i 1k 1 op K Io DV D i l and $Z_i$, $i = 1, \ldots, K$, are independent N(0, 1) random variables. You replace x with the linear predictor $x'\hat{\beta}$ in the preceding formulas to check the link function.

Case Deletion Diagnostic Statistics

For ordinary generalized linear models, regression diagnostic statistics developed by Williams (1987) can be requested in an output data set or in the OBSTATS table by specifying the DIAGNOSTICS | INFLUENCE option in the MODEL statement. These diagnostics measure the influence of an individual observation on model fit, and generalize the one-step diagnostics developed by Pregibon (1981) for the logistic regression model for binary data.

Preisser and Qaqish (1996) further generalized regression diagnostics to apply to models for correlated data fit by generalized estimating equations (GEEs), where the influence of entire clusters of correlated observations or the influence of individual observations within a cluster is measured. These diagnostic statistics can be requested in an output data set or in the OBSTATS table if a model for correlated data is specified with a REPEATED statement.

The next two sections use the following notation: $\hat{\beta}$ is the maximum likelihood estimate of the regression parameters, or, in the case of correlated data, the solution of the GEEs. $\hat{\beta}_{[i]}$ is the corresponding estimate evaluated
217. ponse variable.

The following statements illustrate the use of programming statements. Even though you usually request the Poisson distribution by specifying DIST=POISSON as a MODEL statement option, you can define the variance and deviance functions for the Poisson distribution by using the VARIANCE and DEVIANCE statements. For example, the following statements perform the same analysis as the Poisson regression example in the section "Getting Started: GENMOD Procedure" on page 2435. The statements must be in logical order for computation, just as in a DATA step.

   proc genmod;
      class car age;
      a = _MEAN_;
      y = _RESP_;
      d = 2 * ( y * log( y / a ) - ( y - a ) );
      variance var = a;
      deviance dev = d;
      model c = car age / link = log offset = ln;
   run;

The variables var and dev are dummy variables used internally by the procedure to identify the variance and deviance functions. Any valid SAS variable names can be used.

Similarly, the log link function and its inverse could be defined with the FWDLINK and INVLINK statements, as follows:

   fwdlink link = log(_MEAN_);
   invlink ilink = exp(_XBETA_);

These statements are for illustration, and they work well for most Poisson regression problems. If, however, in the iterative fitting process, the mean parameter becomes too close to 0, or a 0 response value occurs, an error condition occurs when the procedure attempts to evaluate the log function. You can circumvent this kind of pr
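The zero-response problem described above can be seen in the deviance expression 2(y log(y/a) - (y - a)) itself: the log term is undefined at y = 0. The following Python sketch shows one common convention, treating y log(y/a) as 0 when y = 0; this is an assumption for illustration and is not necessarily the workaround that the guide goes on to describe. The function name is hypothetical.

```python
import math

def poisson_deviance_contribution(y, mean):
    """Poisson deviance contribution 2 * (y*log(y/mean) - (y - mean)).

    The term y*log(y/mean) is taken to be 0 when y == 0 (its limiting
    value), which avoids evaluating log(0).
    """
    if mean <= 0:
        raise ValueError("mean must be positive")
    log_term = y * math.log(y / mean) if y > 0 else 0.0
    return 2.0 * (log_term - (y - mean))

print(poisson_deviance_contribution(0, 2.0))  # zero response is handled
print(poisson_deviance_contribution(3, 3.0))  # perfect fit contributes 0
```

The guard on the mean mirrors the other failure mode the text mentions, a fitted mean that drifts too close to 0 during iteration.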
218. predicted values and residuals. Note that raw, Pearson, and deviance residuals are equal in this example. This is a characteristic of the normal distribution and is not true in general for other distributions.

Example 37.3: Gamma Distribution Applied to Life Data

Life data are sometimes modeled with the gamma distribution. Although PROC GENMOD does not analyze censored data or provide other useful lifetime distributions such as the Weibull or lognormal, it can be used for modeling complete (uncensored) data with the gamma distribution, and it can provide a statistical test for the exponential distribution against other gamma distribution alternatives. See Lawless (2003) or Nelson (1982) for applications of the gamma distribution to life data.

The following data represent failure times of machine parts, some of which are manufactured by manufacturer A and some by manufacturer B.

   data A;
      input lifetime @@;
      mfg = 'A';
      datalines;
   620 470 260 89 388 242 103 100 39 460 284 1285 218 393 106 158
   152 477 403 103 69 158 818 947 399 1274 32 12 134 660 548 381
   203 871 193 531 317 85 1410 250 41 1101 32 421 32 343 376 1512
   1792 47 95 76 515 72 1585 253 6 860 89 1055 537 101 385 176
   11 565 164 16 1267 352 160 195 1279 356 751 500 803 560 151 24
   689 1119 1733 2194 763 555 14 45 776 1
   ;
   run;

   data B;
      input lifetime @@;
      mfg = 'B';
      datalines;
   1747 945 12 1453 14 150 20 41 35 69 195 89 1090 1868 294 96
   618 44 142 892 1307 310 230 30 403 8
219. procedure 2589 alternating logistic regressions ALR GENMOD procedure 2589 bar I operator GENMOD procedure 2522 Bayesian analysis linear regression GENMOD procedure 2440 Bayesian information criterion GENMOD 2519 binomial distribution GENMOD procedure 2513 case deletion diagnostics GENMOD procedure 2544 CATMOD procedure log linear models 2435 classification variables GENMOD procedure 2522 sort order of levels GENMOD 2457 confidence intervals confidence coefficient GENMOD 2492 fitted values of the mean GENMOD 2496 2528 profile likelihood GENMOD 2495 2525 Wald GENMOD 2498 2526 continuous variables GENMOD procedure 2522 contrasts GENMOD procedure 2481 convergence criterion GENMOD procedure 2492 2503 correlated data GEE GENMOD 2429 2532 correlation matrix GENMOD 2492 2517 covariance matrix GENMOD procedure 2492 2517 crossed effects GENMOD procedure 2522 cumulative residuals 2597 2604 design matrix GENMOD procedure 2523 deviance definition GENMOD 2433 GENMOD procedure 2491 scaled GENMOD 2517 2518 deviance information criterion 2551 deviance residuals GENMOD procedure 2528 2529 diagnostics GENMOD procedure 2493 2544 DIC 2551 dispersion parameter estimation GENMOD 2432 2519 2524 2525 GENMOD procedure 2520 weights GENMOD 2509 effect specification GENMOD 2522 effective number of parameters 2551 estimability checking G
220. ption 2479 SINGULAR2 option 2479 WALD option 2479 GENMOD procedure DEVIANCE statement 2479 2503 GENMOD procedure EFFECTPLOT statement 2480 GENMOD procedure ESTIMATE statement ALPHA option 2481 E option 2481 EXP option 2481 SINGULAR2 option 2481 GENMOD procedure EXACT statement 2482 ALPHA option 2482 CLTYPE option 2482 ESTIMATE option 2482 JOINT option 2483 JOINTONLY option 2483 MIDPFACTOR option 2483 ONESIDED option 2483 OUTDIST option 2483 GENMOD procedure EXACTOPTIONS statement 2484 GENMOD procedure FREQ statement 2481 2487 GENMOD procedure FWDLINK statement 2487 2503 GENMOD procedure INVLINK statement 2488 2503 GENMOD procedure LSMESTIMATE statement 2489 GENMOD procedure MODEL statement 2491 ABSFCONV option 2484 AGGREGATE option 2491 ALPHA option 2492 CICONV option 2492 CL option 2492 CODING option 2492 CONVERGE option 2492 CONVHE option 2492 CORRB option 2492 COVB option 2492 DIAGNOSTICS option 2493 DIST option 2493 ERR option 2493 EXPECTED option 2493 FCONVE option 2485 INFLUENCE option 2493 INITIAL option 2493 INTERCEPT option 2494 ITPRINT option 2494 LINK option 2494 LRCI option 2495 MAXIT option 2495 NOINT option 2495 NOLOGSCALE option 2486 NOSCALE option 2495 OBSTATS option 2495 OFFSET option 2497 PRED option 2497 PREDICTED option 2497 RESIDUALS option 2497 SCALE option 2497 SCORING option 2498 SINGULAR option
221. put 37.8.6, suggesting cubic time trends. The following SAS statements fit the model, create the plot in Output 37.9.2, and compute a p-value for a model with the additional terms $T^3$ and $R \times T^3$.

   ods graphics on;
   proc genmod data=cd4;
      class Id;
      model Y = Time Time2 Time3 TrtTime TrtTime2 TrtTime3;
      repeated sub=Id;
      assess var=(Time) / resample seed=603708000;
   run;
   ods graphics off;

Output 37.9.2 Cumulative Residual Plot for Cubic Time Fit
[Plot: Checking Functional Form for Time; Observed Path and First 20 Simulated Paths; Cumulative Residuals against Time; Pr > MaxAbsVal: 0.4470 (1000 Simulations)]

The observed cumulative residual pattern appears more typical of the simulated realizations, and the p-value is 0.45, indicating that the model with cubic time trends is more appropriate.

Example 37.10: Bayesian Analysis of a Poisson Regression Model

This example illustrates a Bayesian analysis of a log-linear Poisson regression model. Consider the following data on patients from clinical trials. The data set is a subset of the data described in Ibrahim, Chen, and Lipsitz (1999).

   data Liver;
      input X1-X6 Y;
      datalines;
   19.1358 50.0110 51.000 0 0 1 3
   23.5970 18.4959 3.429 0 0 1 9
   20.0474 56.7699 3.429 1 1 0 6
   28.0277 59.7836 4.000 0 0 1 6
   28.6851 74.1589 5.714 1 0 1 1
   18.8092 31.0630 2.2
222. r on page 153 for more details.

Posterior Samples Output Data Set

You can output posterior samples into a SAS data set through ODS. The following SAS statement outputs the posterior samples into the SAS data set Post:

   OUTPOST=Post

The data set also includes the variables LogPost and LogLike, which represent the log of the posterior likelihood and the log of the likelihood, respectively.

Priors for Model Parameters

The model parameters are the regression coefficients and the dispersion parameter (or the precision or scale), if the model has one. The priors for the dispersion parameter and the priors for the regression coefficients are assumed to be independent, while you can have a joint multivariate normal prior for the regression coefficients.

Dispersion, Precision, or Scale Parameter

Gamma Prior. The gamma distribution G(a, b) has a probability density function

$$f(u) = \frac{b\,(bu)^{a-1} e^{-bu}}{\Gamma(a)}, \quad u > 0$$

where a is the shape parameter and b is the inverse-scale parameter. The mean is $a/b$ and the variance is $a/b^2$.

Improper Prior. The joint prior density is given by

$$p(u) \propto u^{-1}, \quad u > 0$$

Inverse Gamma Prior. The inverse gamma distribution IG(a, b) has a probability density function

$$f(u) = \frac{b^a}{\Gamma(a)}\, u^{-(a+1)} e^{-b/u}, \quad u > 0$$

where a is the shape parameter and b is the scale parameter. The mean is $\frac{b}{a-1}$ if $a > 1$, and the variance is $\frac{b^2}{(a-1)^2 (a-2)}$ if $a > 2$.

Regression Coefficients

Let $\beta$ be the
223. r $W_e$ can be used in the update equation. The GENMOD procedure uses Fisher scoring for iterations up to the number specified by the SCORING option in the MODEL statement, and it uses the observed information matrix on additional iterations.

Covariance and Correlation Matrix

The estimated covariance matrix of the parameter estimator is given by

$$\Sigma = -H^{-1}$$

where H is the Hessian matrix evaluated using the parameter estimates on the last iteration. Note that the dispersion parameter, whether estimated or specified, is incorporated into H. Rows and columns corresponding to aliased parameters are not included in $\Sigma$.

The correlation matrix is the normalized covariance matrix. That is, if $\sigma_{ij}$ is an element of $\Sigma$, then the corresponding element of the correlation matrix is $\sigma_{ij} / (\sigma_i \sigma_j)$, where $\sigma_i = \sqrt{\sigma_{ii}}$.

Goodness of Fit

Two statistics that are helpful in assessing the goodness of fit of a given generalized linear model are the scaled deviance and Pearson's chi-square statistic. For a fixed value of the dispersion parameter $\phi$, the scaled deviance is defined to be twice the difference between the maximum achievable log likelihood and the log likelihood at the maximum likelihood estimates of the regression parameters.

Note that these statistics are not valid for GEE models.

If $l(y, \mu)$ is the log-likelihood function expressed as a function of the predicted mean values $\mu$ and the vector y of response values
224. r explanatory variables for the jth measurement on the ith subject be $X_{ij} = [x_{ij1}, \ldots, x_{ijp}]'$.

The generalized estimating equation of Liang and Zeger (1986) for estimating the $p \times 1$ vector of regression parameters $\beta$ is an extension of the independence estimating equation to correlated data and is given by

$$S(\beta) = \sum_{i=1}^{K} D_i' V_i^{-1} \left( Y_i - \mu_i(\beta) \right) = 0$$

where

$$D_i = \frac{\partial \mu_i}{\partial \beta}$$

Since $g(\mu_{ij}) = x_{ij}'\beta$, where g is the link function, the $p \times n_i$ matrix of partial derivatives of the mean with respect to the regression parameters for the ith subject is given by

$$D_i' = \frac{\partial \mu_i'}{\partial \beta} =
\begin{bmatrix}
\dfrac{x_{i11}}{g'(\mu_{i1})} & \cdots & \dfrac{x_{i n_i 1}}{g'(\mu_{i n_i})} \\
\vdots & & \vdots \\
\dfrac{x_{i1p}}{g'(\mu_{i1})} & \cdots & \dfrac{x_{i n_i p}}{g'(\mu_{i n_i})}
\end{bmatrix}$$

Working Correlation Matrix

Let $R_i(\alpha)$ be an $n_i \times n_i$ "working" correlation matrix that is fully specified by the vector of parameters $\alpha$. The covariance matrix of $Y_i$ is modeled as

$$V_i = \phi\, A_i^{1/2}\, W_i^{-1/2}\, R(\alpha)\, W_i^{-1/2}\, A_i^{1/2}$$

where $A_i$ is an $n_i \times n_i$ diagonal matrix with $v(\mu_{ij})$ as the jth diagonal element and $W_i$ is an $n_i \times n_i$ diagonal matrix with $w_{ij}$ as the jth diagonal, where $w_{ij}$ is a weight specified with the WEIGHT statement. If there is no WEIGHT statement, $w_{ij} = 1$ for all i and j. If $R_i(\alpha)$ is the true correlation matrix of $Y_i$, then $V_i$ is the true covariance matrix of $Y_i$.

The working correlation matrix is usually unknown and must be estimated. It is estimated in the iterative fitting process by using the current value of the parameter vector $\beta$ to compute appropriat
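Because the matrices that scale the working correlation matrix are diagonal, each element of the working covariance matrix reduces to the dispersion times the square root of the product of the two variance-function values, times the correlation, divided by the square root of the product of the two weights. The Python sketch below builds that matrix for one subject under an exchangeable working correlation; the exchangeable structure, the input values, and all function names are assumptions for illustration.

```python
import math

def exchangeable_corr(n, alpha):
    """n x n working correlation: 1 on the diagonal, alpha elsewhere."""
    return [[1.0 if j == k else alpha for k in range(n)] for j in range(n)]

def working_covariance(phi, variances, weights, corr):
    """Dispersion times the variance- and weight-scaled working correlation.

    variances holds the variance function evaluated at each mean, and
    weights holds the observation weights. The scaling matrices are
    diagonal, so the matrix product is computed element by element.
    """
    n = len(variances)
    return [
        [
            phi * math.sqrt(variances[j] * variances[k]) * corr[j][k]
            / math.sqrt(weights[j] * weights[k])
            for k in range(n)
        ]
        for j in range(n)
    ]

# Hypothetical subject with two measurements and unit weights.
V = working_covariance(
    phi=1.0,
    variances=[4.0, 9.0],
    weights=[1.0, 1.0],
    corr=exchangeable_corr(2, 0.5),
)
print(V)
```

With unit weights and unit dispersion the diagonal of the result is just the variance function at each mean, and the off-diagonal entries are the implied covariances.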
225. r specified by other normal options.

INPUT=SAS-data-set
specifies a SAS data set containing the mean and covariance information of the normal prior. The data set must have a _TYPE_ variable to represent the type of each observation and a variable for each regression coefficient. If the data set also contains a _NAME_ variable, the values of this variable are used to identify the covariances for the _TYPE_='COV' observations; otherwise, the _TYPE_='COV' observations are assumed to be in the same order as the explanatory variables in the MODEL statement. PROC GENMOD reads the mean vector from the observation with _TYPE_='MEAN' and reads the covariance matrix from observations with _TYPE_='COV'. For an independent normal prior, the variances can be specified with _TYPE_='VAR'; alternatively, the precisions (inverse of the variances) can be specified with _TYPE_='PRECISION'.

RELVAR<=c>
specifies the normal prior N(0, cJ), where J is a diagonal matrix with diagonal elements equal to the variances of the corresponding ML estimator. By default, $c = 10^6$.

VAR<=c>
specifies the normal prior N(0, cI), where I is the identity matrix.

DIAGNOSTICS=ALL | NONE | (keyword-list)
DIAG=ALL | NONE | (keyword-list)
controls the number of diagnostics produced. You can request all the following diagnostics by specifying DIAGNOSTICS=ALL. If you do not want any of these diagnostics, specify DIAGNOSTICS
226. r the MCMC are shown in Output 37.10.19. The initial values of the covariates are joint estimates of their posterior modes. The prior distribution for X1 is informative, so the initial value of X1 is further from the MLE than the rest of the covariates. Initial values for the rest of the covariates are close to their MLEs, since noninformative prior distributions were specified for them.

Output 37.10.19 MCMC Initial Values and Seeds

Initial Values of the Chain
Chain    Seed    Intercept    X1          X2          X3          X4          X5          X6
1        1       2.14282      0.010595    -0.01434    -0.00301    -0.28062    0.334983    0.231213

Goodness-of-fit, summary, and interval statistics are shown in Output 37.10.20. Except for X1, the statistics shown in Output 37.10.20 are very similar to the previous statistics for noninformative priors shown in Output 37.10.4 through Output 37.10.7. The point estimate for X1 is now positive. This is expected because the prior distribution on $\beta_1$ is quite informative. The distribution reflects the belief that the coefficient is positive. The N(0.1385, 0.0005) distribution places the majority of its probability density on positive values. As a result, the posterior density of $\beta_1$ places more likelihood on positive values than in the noninformative case.

Output 37.10.20 Fit Statistics

Fit Statistics
DIC (smaller is better)                 833.134
pD (effective number of parameters)     6.861

The GENMOD Procedure
Bayesian Analysis

Posterior Summa
227. rameters included in the model are equal to 0 is a chi-square with degrees of freedom equal to the difference in the number of parameters estimated in the successive models. Thus, these statistics can be used in a test of hypothesis of the significance of each additional term fit.

This type of analysis is not available for GEE models, since the deviance is not computed for this type of model.

If the dispersion parameter $\phi$ is known, it can be included in the models; if it is unknown, there are two strategies allowed by PROC GENMOD. The dispersion parameter can be estimated from a maximal model by the deviance or Pearson's chi-square divided by degrees of freedom, as discussed in the section "Goodness of Fit" on page 2517, and this value can be used in all models. An alternative is to consider the dispersion to be an additional unknown parameter for each model and estimate it by maximum likelihood on each step. By default, PROC GENMOD estimates scale by maximum likelihood at each step.

A table of likelihood ratio statistics is produced, along with associated p-values based on the asymptotic chi-square distributions.

If you specify either the SCALE=DEVIANCE or the SCALE=PEARSON option in the MODEL statement, the dispersion parameter is estimated using the deviance or Pearson's chi-square statistic, and F statistics are computed in addition to the chi-square statistics for assessing the significance
228. re not available with an exact analysis. Exact analyses are not performed when you specify a WEIGHT statement, or a model other than LINK=LOGIT with DIST=BIN or LINK=LOG with DIST=POISSON. Exact estimation is not available for ordinal response models.

For classification variables, use of the reference parameterization is recommended.

The following options can be specified in each EXACT statement after a slash (/):

ALPHA=number
specifies the level of significance $\alpha$ for $100(1 - \alpha)\%$ confidence limits for the parameters or odds ratios. The value of number must be between 0 and 1. By default, number is equal to the value of the ALPHA= option in the MODEL statement, or 0.05 if that option is not specified.

CLTYPE=EXACT | MIDP
requests either the exact or mid-p confidence intervals for the parameter estimates. By default, the exact intervals are produced. The confidence coefficient can be specified with the ALPHA= option. The mid-p interval can be modified with the MIDPFACTOR= option. See the section "Exact Logistic and Poisson Regression" on page 2553 for details.

ESTIMATE <=keyword>
estimates the individual parameters (conditioned on all other parameters) for the effects specified in the EXACT statement. For each parameter, a point estimate, a standard error, a confidence interval, and a p-value for a two-sided test that the parameter is zero are displayed. Note that the two-sided p-value is twice the one-sided p-value. You can optiona
229. recision parameter $\tau = \phi^{-1}$ with the DPRIOR=, SPRIOR=, and PPRIOR= options, respectively. These three parameters are transformations of one another, and you should specify Gibbs sampling for only one of them.

A gamma prior G(a, b) with density $f(t) = \frac{b\,(bt)^{a-1} e^{-bt}}{\Gamma(a)}$ is specified by SCALEPRIOR=GAMMA, which can be followed by one of the following gamma-options enclosed in parentheses. The hyperparameters a and b are the shape and inverse-scale parameters of the gamma distribution, respectively. See the section "Gamma Prior" on page 2550 for details. The default is $G(10^{-4}, 10^{-4})$.

RELSHAPE<=c>
specifies independent $G(c\hat{\sigma}, c)$ distribution, where $\hat{\sigma}$ is the MLE of the dispersion parameter. With this choice of hyperparameters, the mean of the prior distribution is $\hat{\sigma}$ and the variance is $\hat{\sigma}/c$. By default, $c = 10^{-4}$.

SHAPE=a ISCALE=b
when both specified, results in a G(a, b) prior.

SHAPE=c
when specified alone, results in a G(c, c) prior.

ISCALE=c
when specified alone, results in a G(c, c) prior.

An improper prior with density f(t) proportional to $t^{-1}$ is specified with SCALEPRIOR=IMPROPER.

SEED=number
specifies an integer seed in the range 1 to $2^{31} - 1$ for the random number generator in the simulation. Specifying a seed enables you to reproduce identical Markov chains for the same specification. If the SEED= option is not specified, or if you specify a nonpositive seed, a random seed is derived from th
230. regression coefficients.

Jeffreys' Prior. The joint prior density is given by

$$p(\beta) \propto |I(\beta)|^{1/2}$$

where $I(\beta)$ is the Fisher information matrix for the model. If the underlying model has a scale parameter (for example, a normal linear regression model), then the Fisher information matrix is computed with the scale parameter set to a fixed value of one.

If you specify the CONDITIONAL option, then Jeffreys' prior, conditional on the current Markov chain value of the generalized linear model precision parameter $\tau$, is given by

$$|\tau I(\beta)|^{1/2}$$

where $\tau$ is the model precision parameter. See Ibrahim and Laud (1991) for a full discussion, with examples, of Jeffreys' prior for generalized linear models.

Normal Prior. Assume $\beta$ has a multivariate normal prior with mean vector $\beta_0$ and covariance matrix $\Sigma_0$. The joint prior density is given by

$$p(\beta) \propto e^{-\frac{1}{2} (\beta - \beta_0)' \Sigma_0^{-1} (\beta - \beta_0)}$$

If you specify the CONDITIONAL option, then, conditional on the current Markov chain value of the generalized linear model precision parameter $\tau$, the joint prior density is given by

$$p(\beta) \propto e^{-\frac{1}{2} (\beta - \beta_0)' \tau \Sigma_0^{-1} (\beta - \beta_0)}$$

Uniform Prior. The joint prior density is given by

$$p(\beta) \propto 1$$

Deviance Information Criterion

Let $\theta^i$ be the model parameters at iteration i of the Gibbs sampler, and let $LL(\theta^i)$ be the corresponding model log likelihood. PROC GENMOD computes the following fit statistics defined by Spiegelhalter et al. (2002):

Effective number of parameters: $p_D = 2\left( LL(\bar{\theta}) - \overline{LL(\theta)} \right)$
231. riable is less than or equal to the value of _LEVEL_ if the multinomial model for ordinal data is used (in other words, Pr(Y ≤ _LEVEL_), where Y is the response variable).

PZERO represents the zero-inflation probability for zero-inflated models.

RESCHI represents the Pearson (chi) residual for identifying observations that are poorly accounted for by the model.

RESDEV represents the deviance residual for identifying poorly fitted observations.

RESLIK represents the likelihood residual for identifying poorly fitted observations.

RESRAW represents the raw residual for identifying poorly fitted observations.

STDRESCHI represents the standardized Pearson (chi) residual for identifying observations that are poorly accounted for by the model.

STDRESDEV represents the standardized deviance residual for identifying poorly fitted observations.

STDXBETA represents the standard error estimate of XBETA (see the XBETA keyword).

UPPER | U represents the upper confidence limit for the predicted value of the mean, or the upper confidence limit for the probability that the response is less than or equal to the value of _LEVEL_ or Value. The confidence coefficient is determined by the ALPHA=number option in the MODEL statement as (1 - number) × 100%. The default confidence coefficient is 95%.

XBETA represents the estimate of the linear predictor $x_i'\beta$ for observation i, or $\alpha_j + x_i'\beta$, where j is the corresponding ordered v
232. rials format is used for response, the number of trials if events/trials format is used for response, the sum of frequency weights, the number of missing values in data set, and the number of invalid observations (for example, negative or 0 response values with gamma distribution, or number of observations with events greater than trials with binomial distribution).

Class Level Information
The "Class Level Information" table displays the levels of classification variables if you specify a CLASS statement.

Maximum Likelihood Estimates
The "Analysis of Maximum Likelihood Parameter Estimates" table displays the maximum likelihood estimate of each parameter, the estimated standard error of the parameter estimator, and confidence limits for each parameter.

Coefficient Prior
The "Coefficient Prior" table displays the prior distribution of the regression coefficients.

Independent Prior Distributions for Model Parameters
The "Independent Prior Distributions for Model Parameters" table displays the prior distributions of additional model parameters (scale, exponential scale, Weibull scale, Weibull shape, gamma shape).

Initial Values and Seeds
The "Initial Values and Seeds" table displays the initial values and random number generator seeds for the Gibbs chains.

Fit Statistics
The "Fit Statistics" table displays the deviance information criterion (DIC) and the effective number of parameters.
233. ribution samples are produced by default. However, these statistics might not be sufficient for carrying out your Bayesian inference, and further processing of the posterior samples might be necessary. The following SAS statements request the Bayesian analysis, and the OUTPOST= option saves the samples in the SAS data set PostSurg for further processing:

   ods graphics on;
   proc genmod data=Surg;
      model y = Logx1 X2 X3 X4 / dist=normal;
      bayes seed=1 OutPost=PostSurg;
   run;
   ods graphics off;

The results of this analysis are shown in the following figures.

The "Model Information" table in Figure 37.8 summarizes information about the model you fit and the size of the simulation.

Figure 37.8 Model Information

The GENMOD Procedure
Bayesian Analysis

Model Information
Data Set              WORK.SURG
Burn-In Size          2000
MC Sample Size        10000
Thinning              1
Sampling Algorithm    Conjugate
Distribution          Normal
Link Function         Identity
Dependent Variable    y (Survival Time)

The "Analysis of Maximum Likelihood Parameter Estimates" table in Figure 37.9 summarizes maximum likelihood estimates of the model parameters.

Figure 37.9 Maximum Likelihood Parameter Estimates

Analysis Of Maximum Likelihood Parameter Estimates
Parameter    DF    Estimate    Standard Error    Wald 95% Confidence Limits
Intercept    1     -730.559    85.4333           -898.005    -563.112
Logx1        1     171.8758    38.2250           96.9561     246.7954
X2           1     4.3019      0
234. ries

Parameter    N        Mean        Standard Deviation    25%         50%         75%
Intercept    10000    2.1393      0.2160                1.9929      2.1417      2.2845
X1           10000    0.0104      0.00685               0.00583     0.0106      0.0151
X2           10000    -0.0143     0.00236               -0.0159     -0.0143     -0.0127
X3           10000    -0.00318    0.00218               -0.00463    -0.00313    -0.00170
X4           10000    -0.2801     0.0798                -0.3342     -0.2807     -0.2258
X5           10000    0.3336      0.0834                0.2772      0.3337      0.3902
X6           10000    0.2333      0.0822                0.1791      0.2327      0.2892

Output 37.10.20 continued

Posterior Intervals
Parameter    Alpha    Equal-Tail Interval       HPD Interval
Intercept    0.050    1.7161      2.5599        1.7075      2.5507
X1           0.050    -0.00323    0.0236        -0.00264    0.0241
X2           0.050    -0.0189     -0.00960      -0.0189     -0.00972
X3           0.050    -0.00754    0.00101       -0.00754    0.000963
X4           0.050    -0.4348     -0.1223       -0.4311     -0.1196
X5           0.050    0.1705      0.4970        0.1661      0.4915
X6           0.050    0.0696      0.3968        0.0655      0.3904

Example 37.11: Exact Poisson Regression

The following data, taken from Cox and Snell (1989, pp. 10-11), consists of the number, Notready, of ingots that are not ready for rolling, out of Total tested, for several combinations of heating time and soaking time:

   data ingots;
      input Heat Soak Notready Total @@;
      lnTotal = log(Total);
      datalines;
   7 1.0 0 10   14 1.0 0 31   27 1.0 1 56   51 1.0 3 13
   7 1.7 0 17   14 1.7 0 43   27 1.7 4 44   51 1.7 0 1
   7 2.2 0 7    14 2.2 2 33   27 2.2 0 21   51 2.2 0 1
   7 2.8 0 12   14 2.8 0 31   27 2.8 1 22   51 4.0 0 1
   7 4.0 0 9    14 4.0 0 19   27 4.0 1 16
   ;

The following invocation of PROC GENMOD fits an asymp
235. s, then the Count variable is replaced by a variable named Weight. When hypothesis tests are performed on the parameters, the Prob variable contains the probability of obtaining that statistic (which is just the count divided by the total count), and the Score variable contains the score for that statistic.

The OUTDIST= data set can contain a different exact conditional distribution for each specified EXACT statement. For example, consider the following EXACT statements:

   exact 'O1'   x1    /           outdist=o1;
   exact 'OJ12' x1 x2 / jointonly outdist=oj12;
   exact 'OA12' x1 x2 / joint     outdist=oa12;
   exact 'OE12' x1 x2 / estimate  outdist=oe12;

The O1 statement outputs a single exact conditional distribution. The OJ12 statement outputs only the joint distribution for x1 and x2. The OA12 statement outputs three conditional distributions: one for x1, one for x2, and one jointly for x1 and x2. The OE12 statement outputs two conditional distributions: one for x1 and the other for x2. Data set oe12 contains both the x1 and x2 variables; the distribution for x1 has missing values in the x2 column, while the distribution for x2 has missing values in the x1 column.

Missing Values

For generalized linear models, PROC GENMOD ignores any observation with a missing value for any variable involved in the model. You can score an observation in an output data set by setting only the response value to missing.

For models fit wit
236. s conditional on the observed sufficient statistics for the intercept, x1, and x3.

   proc genmod;
      model y = x1 x2 x3 / d=b;
      exact x1 x2;
   run;

PROC GENMOD determines, from all the specified EXACT statements, the distinct conditional distributions that need to be evaluated. For example, there is only one exact conditional distribution for the following two EXACT statements:

   exact 'One' x1 / estimate=parm;
   exact 'Two' x1 / estimate=parm onesided;

For each EXACT statement, individual tests for the parameters of the specified effects are computed unless the JOINTONLY option is specified. Consider the following EXACT statements:

   exact 'E12' x1 x2 / estimate;
   exact 'E1'  x1    / estimate;
   exact 'E2'  x2    / estimate;
   exact 'J12' x1 x2 / joint;

In the E12 statement, the parameters for x1 and x2 are estimated and tested separately. Specifying the E12 statement is equivalent to specifying both the E1 and E2 statements. In the J12 statement, the joint test for the parameters of x1 and x2 is computed in addition to the individual tests for x1 and x2.

EXACTOPTIONS Statement

EXACTOPTIONS options ;

The EXACTOPTIONS statement specifies options that apply to every EXACT statement in the program. The following options are available:

ABSFCONV=value
specifies the absolute function convergence criterion. Convergence requires a small change in the log-likelihood function in subsequent iterations, $|l_i - l_{i-1}|$
237. s outside the valid range of arguments for the inverse link function, the corresponding confidence interval endpoint is set to missing.

Residuals

The GENMOD procedure computes three kinds of residuals. Residuals are available for all generalized linear models except multinomial models for ordinal response data, for which residuals are not available. Raw residuals and Pearson residuals are available for models fit with generalized estimating equations (GEEs).

The raw residual is defined as

$$r_i = y_i - \mu_i$$

where $y_i$ is the ith response and $\mu_i$ is the corresponding predicted mean. You can request raw residuals in an output data set with the keyword RESRAW in the OUTPUT statement.

The Pearson residual is the square root of the ith contribution to the Pearson's chi-square:

$$r_{Pi} = (y_i - \mu_i) \sqrt{\frac{w_i}{V(\mu_i)}}$$

You can request Pearson residuals in an output data set with the keyword RESCHI in the OUTPUT statement.

Finally, the deviance residual is defined as the square root of the contribution of the ith observation to the deviance, with the sign of the raw residual:

$$r_{Di} = \sqrt{d_i}\,\bigl(\operatorname{sign}(y_i - \mu_i)\bigr)$$

You can request deviance residuals in an output data set with the keyword RESDEV in the OUTPUT statement.

The adjusted Pearson, deviance, and likelihood residuals are defined by Agresti (2002), Williams (1987), and Davison and Snell (1991). These residuals are useful for outlier detection and for assessing the inf
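The three residual definitions above can be traced for a single observation. The Python sketch below assumes a Poisson model (variance function equal to the mean, and deviance contribution 2w(y log(y/mu) - (y - mu))); the Poisson choice, the input values, and the function name are assumptions for illustration, not output from PROC GENMOD.

```python
import math

def poisson_residuals(y, mu, w=1.0):
    """Return (raw, Pearson, deviance) residuals for one Poisson observation.

    The Poisson variance function equals the mean, and the deviance
    contribution is 2*w*(y*log(y/mu) - (y - mu)), with the log term
    taken as 0 when y == 0.
    """
    raw = y - mu
    pearson = raw * math.sqrt(w / mu)
    log_term = y * math.log(y / mu) if y > 0 else 0.0
    d = 2.0 * w * (log_term - raw)
    # max() guards against a tiny negative value from rounding error.
    deviance = math.copysign(math.sqrt(max(d, 0.0)), raw)
    return raw, pearson, deviance

# Hypothetical observation: response 9 with fitted mean 4.
raw, pearson, deviance = poisson_residuals(9, 4.0)
print(raw, pearson, deviance)
```

Unlike the normal case, the Pearson and deviance residuals differ here, and the deviance residual is the smaller of the two for an observation above its fitted mean.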
238. s parameter estimates, standard errors, confidence intervals, Z scores, and p-values for the parameter estimates. Empirical standard error estimates are used in this table. A table that displays model-based standard errors can be created by using the REPEATED statement option MODELSE.

Figure 37.30 GEE Parameter Estimates Table

Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter          Estimate    Standard Error    95% Confidence Limits    Z        Pr > |Z|
Intercept          -1.2751     3.0561            -7.2650    4.7148        -0.42    0.6765
city kingston      -0.1223     0.6882            -1.4713    1.2266        -0.18    0.8589
city portage       0.0000      0.0000            0.0000     0.0000        .        .
age                0.2036      0.2789            -0.3431    0.7502        0.73     0.4655
smoke              0.0935      0.3613            -0.6145    0.8016        0.26     0.7957

Syntax: GENMOD Procedure

You can specify the following statements in the GENMOD procedure. Items within the < > are optional.

   PROC GENMOD < options > ;
   ASSESS | ASSESSMENT VAR=(effect) | LINK < / options > ;
   BAYES < options > ;
   BY variables ;
   CLASS variable <(options)> ... < variable <(options)> > < / options > ;
   CONTRAST 'label' contrast-specification < / options > ;
   DEVIANCE variable = expression ;
   EFFECTPLOT < plot-type <(plot-definition-options)> > < / options > ;
   ESTIMATE 'label' effect values < , ... effect values > < / options > ;
   EXACT < 'label' > < INTERCEPT > < effects > <
239. s part of the output by PROC GENMOD.

The example that follows shows how to use PROC GENMOD to carry out a Bayesian analysis of the linear model with a normal error term. The SEED= option is specified to maintain reproducibility; no other options are specified in the BAYES statement. By default, a uniform prior distribution is assumed on the regression coefficients. The uniform prior is a flat prior on the real line with a distribution that reflects ignorance of the location of the parameter, placing equal likelihood on all possible values the regression coefficient can take. Using the uniform prior in the following example, you would expect the Bayesian estimates to resemble the classical results of maximizing the likelihood. If you can elicit an informative prior distribution for the regression coefficients, you should use the COEFFPRIOR= option to specify it. A default noninformative gamma prior is used for the scale parameter $\sigma$.

You should make sure that the posterior distribution samples have achieved convergence before using them for Bayesian inference. PROC GENMOD produces three convergence diagnostics by default. If ODS Graphics is enabled as specified in the following SAS statements, diagnostic plots are also displayed. See the section "Assessing Markov Chain Convergence" on page 155 for more information about convergence diagnostics and their interpretation.

Summary statistics of the posterior dist
240. s the procedure. All statements other than the MODEL statement are optional. The CLASS statement, if present, must precede the MODEL statement, and the CONTRAST and EXACT statements must come after the MODEL statement.

PROC GENMOD Statement

PROC GENMOD < options > ;

The PROC GENMOD statement invokes the procedure. You can specify the following options.

DATA=SAS-data-set
   specifies the SAS data set containing the data to be analyzed. If you omit the DATA= option, the procedure uses the most recently created SAS data set.

DESCENDING | DESCEND | DESC
   specifies that the levels of the response variable for the ordinal multinomial model and the binomial model with single-variable response syntax be sorted in the reverse of the default order. For example, if RORDER=FORMATTED (the default), the DESCENDING option causes the levels to be sorted from highest to lowest instead of from lowest to highest. If RORDER=FREQ, the DESCENDING option causes the levels to be sorted from lowest frequency count to highest instead of from highest to lowest.

EXACTONLY
   requests only the exact analyses. The asymptotic analysis that PROC GENMOD usually performs is suppressed.

NAMELEN=n
   specifies the length of effect names in tables and output data sets to be n characters long, where n is a value between 20 and 200 characters. The default length is 20 characters.

ORDER=DATA | FORMATTED | FREQ | INTERNAL
   specifies the order in whic
241. [Figure: diagnostic plots for Dispersion (trace plot against Iteration, Autocorrelation against Lag, and Posterior Density)]

Suppose, for illustration, a question of scientific interest is whether blood clotting score has a positive effect on survival time. Since the model parameters are regarded as random quantities in a Bayesian analysis, you can answer this question by estimating the conditional probability of β being positive, given the data, Pr(β > 0 | Y), from the posterior distribution samples. The following SAS statements compute the estimate of the probability of β being positive:

   data Prob;
      set PostSurg;
      Indicator = (logX1 > 0);
      label Indicator = 'log(Blood Clotting Score) > 0';
   run;

   proc Means data = Prob(keep=Indicator) n mean;
   run;

As shown in Figure 37.26, there is a 1.00 probability of a positive relationship between the logarithm of a blood clotting score and survival time, adjusted for the other covariates.

Figure 37.26 Probability That β > 0

The MEANS Procedure
Analysis Variable: Indicator log(Blood Clotting Score) > 0

Generalized Estimating Equations

This section illustrates the use of the REPEATED statement to fit a GEE model, using repeated measures data from the Six Cities study of the health effects of air pollution (Ware et al. 1984). The data
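The mean of the indicator variable is a Monte Carlo estimate of the posterior probability. As a sketch, writing $\beta^{(t)}$ for the $t$th retained posterior draw and $T$ for the number of draws (both symbols are notation introduced here, not the guide's):

```latex
\widehat{\Pr}(\beta > 0 \mid Y)
  \;=\; \frac{1}{T} \sum_{t=1}^{T} I\!\left(\beta^{(t)} > 0\right),
\qquad
I(\cdot) =
\begin{cases}
  1 & \text{if the condition holds} \\
  0 & \text{otherwise}
\end{cases}
```

An estimate of 1.00 therefore means that every retained draw of the coefficient was positive.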
242. sson:

$$f(y) = \begin{cases} \omega + (1-\omega)\,e^{-\lambda} & \text{for } y = 0 \\ (1-\omega)\,\dfrac{\lambda^{y} e^{-\lambda}}{y!} & \text{for } y = 1, 2, \ldots \end{cases}$$

$$\phi = 1, \qquad \mu = E(Y) = (1-\omega)\lambda, \qquad \mathrm{Var}(Y) = (1-\omega)\lambda(1+\omega\lambda) = \mu + \frac{\omega}{1-\omega}\,\mu^{2}$$

Zero-inflated negative binomial:

$$f(y) = \begin{cases} \omega + (1-\omega)\,(1+k\lambda)^{-1/k} & \text{for } y = 0 \\ (1-\omega)\,\dfrac{\Gamma(y+1/k)}{\Gamma(y+1)\,\Gamma(1/k)}\,\dfrac{(k\lambda)^{y}}{(1+k\lambda)^{y+1/k}} & \text{for } y = 1, 2, \ldots \end{cases}$$

$$\text{dispersion} = k, \qquad \mu = E(Y) = (1-\omega)\lambda, \qquad \mathrm{Var}(Y) = (1-\omega)\lambda(1+\omega\lambda+k\lambda) = \mu + \left(\frac{\omega}{1-\omega} + \frac{k}{1-\omega}\right)\mu^{2}$$

The negative binomial and the zero-inflated negative binomial distributions contain a parameter k, called the negative binomial dispersion parameter. This is not the same as the generalized linear model dispersion φ, but it is an additional distribution parameter that must be estimated or set to a fixed value.

For the binomial distribution, the response is the binomial proportion Y = events/trials. The variance function is $V(\mu) = \mu(1-\mu)$, and the binomial trials parameter n is regarded as a weight w.

If a weight variable is present, φ is replaced with φ/w, where w is the weight variable.

PROC GENMOD works with a scale parameter that is related to the exponential family dispersion parameter φ instead of working with φ itself. The scale parameters are related to the dispersion parameter as shown previously with the probability distribution definitions. Thus, the scale parameter output in the Analysis of Parameter Estimates table is related to the exponential family dispersion parameter. If you specify a constant scale parameter with the SCALE= option in the MODEL
243. st can be any variables in the input data set. Specifying the AGGREGATE option is equivalent to specifying the AGGREGATE= option with a variable list that includes all explanatory variables in the MODEL statement. Pearson chi-square and deviance statistics are not computed for multinomial models unless this option is specified.

ALPHA=number | ALPH=number | A=number
   sets the confidence coefficient for parameter confidence intervals to 1 - number. The value of number must be between 0 and 1. The default value of number is 0.05.

CICONV=number
   sets the convergence criterion for profile-likelihood confidence intervals. See the section Confidence Intervals for Parameters on page 2525 for the definition of convergence. The value of number must be between 0 and 1. By default, CICONV=1E-4.

CL
   requests that confidence limits for predicted values be displayed (see the OBSTATS option).

CODING=EFFECT | FULLRANK
   specifies that effect coding be used for all classification variables in the model. This is the same as specifying PARAM=EFFECT as a CLASS statement option.

CONVERGE=number
   sets the convergence criterion. The value of number must be between 0 and 1. The iterations are considered to have converged when the maximum change in the parameter estimates between iteration steps is less than the value specified. The change is a relative change if the parameter is greater than 0.01 in absolute value
244. ster; that is, let $X = (X_1', \ldots, X_K')'$ and $Y = (Y_1', \ldots, Y_K')'$ corresponding to the K clusters.

Let $n_i$ be the number of responses for cluster i, and denote by $N = \sum_{i=1}^{K} n_i$ the total number of observations. Denote by $A_i$ the $n_i \times n_i$ diagonal matrix with $V(\mu_{ij})$ as the jth diagonal element. If there is a WEIGHT statement, the diagonal element of $A_i$ is $V(\mu_{ij})/w_{ij}$, where $w_{ij}$ is the specified weight of the jth observation in the ith cluster. Let B be the $N \times N$ diagonal matrix with $g'(\mu_{ij})$ as diagonal elements, $i = 1, \ldots, K$, $j = 1, \ldots, n_i$. Let $B_i$ be the $n_i \times n_i$ diagonal matrix corresponding to cluster i with $g'(\mu_{ij})$ as the jth diagonal element.

Let W be the $N \times N$ block diagonal weight matrix whose ith block, corresponding to the ith cluster, is the $n_i \times n_i$ matrix

$$W_i = B_i^{-1} A_i^{-1/2} R_i^{-1} A_i^{-1/2} B_i^{-1}$$

where $R_i$ is the working correlation matrix for cluster i.

Let $Q_i = X_i (X'WX)^{-1} X_i'$, where $X_i$ is the $n_i \times p$ design matrix corresponding to cluster i. Define the adjusted residual vector as $E = B(Y - \mu)$ and $E_i = B_i(Y_i - \mu_i)$, the estimated residual for the ith cluster.

Let the subscript [i] denote estimates evaluated without the ith cluster, [it] estimates evaluated using all the data except the tth observation of the ith cluster, and let [i]t denote matrices corresponding to the ith cluster without the tth observation.

The following statistics are available for generalized estimating equation models.

CH | CLUSTERH | CLEVERAGE
   The leverage of cluster i is contained
245. t ChiSq

Intercept     175.1536
car           107.4620     2      67.69    <.0001
age             2.8207     1     104.64    <.0001

In the table for Type 1 analysis displayed in Figure 37.5, each entry in the deviance column represents the deviance for the model containing the effect for that row and all effects preceding it in the table. For example, the deviance corresponding to car in the table is the deviance of the model containing an intercept and car. As more terms are included in the model, the deviance decreases.

Entries in the chi-square column are likelihood ratio statistics for testing the significance of the effect added to the model containing all the preceding effects. The chi-square value of 67.69 for car represents twice the difference in log likelihoods between fitting a model with only an intercept term and a model with an intercept and car. Since the scale parameter is set to 1 in this analysis, this is equal to the difference in deviances. Since two additional parameters are involved, this statistic can be compared with a chi-square distribution with two degrees of freedom. The resulting p-value (labeled Pr > ChiSq) of less than 0.0001 indicates that this variable is highly significant. Similarly, the chi-square value of 104.64 for age represents twice the difference in log likelihoods between the model with the intercept and car and the model with the intercept, car, and age. This effect is also highly significant, as indicated by the small p-value.
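Because the scale parameter is fixed at 1, each chi-square entry can be checked by differencing adjacent deviances in the table. A short worked check, using only the values shown above (the subscripted symbols are notation introduced here):

```latex
\chi^2_{\text{car}} = D_{\text{Intercept}} - D_{\text{car}}
  = 175.1536 - 107.4620 = 67.6916 \approx 67.69
\\[4pt]
\chi^2_{\text{age}} = D_{\text{car}} - D_{\text{age}}
  = 107.4620 - 2.8207 = 104.6413 \approx 104.64
```

Both differences agree with the tabulated statistics to two decimal places.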
246. t options > ;
   EXACTOPTIONS options ;
   FREQ | FREQUENCY variable ;
   FWDLINK variable = expression ;
   INVLINK variable = expression ;
   LSMEANS <model-effects> < / options > ;
   LSMESTIMATE model-effect <'label'> values <divisor=n> <, <'label'> values <divisor=n>> < / options > ;
   MODEL response = <effects> < / options > ;
   OUTPUT <OUT=SAS-data-set> <keyword=name ... keyword=name> ;
   Programming statements
   REPEATED SUBJECT=subject-effect < / options > ;
   SLICE model-effect < / options > ;
   STORE <OUT=>item-store-name < / LABEL='label' > ;
   STRATA variable <(option)> ... <variable <(option)>> < / options > ;
   WEIGHT | SCWGT variable ;
   VARIANCE variable = expression ;
   ZEROMODEL <effects> < / options > ;

The ASSESS, BAYES, BY, CLASS, CONTRAST, DEVIANCE, ESTIMATE, FREQUENCY, FWDLINK, INVLINK, MODEL, OUTPUT, programming statements, REPEATED, VARIANCE, WEIGHT, and ZEROMODEL statements are described in full after the PROC GENMOD statement in alphabetical order. The EFFECTPLOT, LSMEANS, LSMESTIMATE, SLICE, and STORE statements are common to many procedures. Summary descriptions of functionality and syntax for these statements are also given after the PROC GENMOD statement in alphabetical order, and full documentation about them is available in Chapter 19, Shared Concepts and Topics.

The PROC GENMOD statement invoke
247. t value, where $l_i$ is the value of the log-likelihood function at iteration i. By default, ABSFCONV=1E-12. You can also specify the FCONV= and XCONV= criteria; optimizations are terminated as soon as one criterion is satisfied.

ADDTOBS
   adds the observed sufficient statistic to the sampled exact distribution if the statistic was not sampled. This option has no effect unless the METHOD=NETWORKMC option is specified and the ESTIMATE option is specified in the EXACT statement. If the observed statistic has not been sampled, then the parameter estimate does not exist; by specifying this option, you can produce biased estimates.

BUILDSUBSETS
   builds every distribution for sampling. By default, some exact distributions are created by taking a subset of a previously generated exact distribution. When the METHOD=NETWORKMC option is invoked, this subsetting behavior has the effect of using fewer than the desired n samples; see the N= option for more details. Use the BUILDSUBSETS option to suppress this subsetting.

EPSILON=value
   controls how the partial sums $\sum_{i=1}^{j} y_i x_i$ are compared. value must be between 0 and 1; by default, value=1E-8.

FCONV=value
   specifies the relative function convergence criterion. Convergence requires a small relative change in the log-likelihood function in subsequent iterations,

   $$\frac{|l_i - l_{i-1}|}{|l_{i-1}| + \text{1E-6}} < value$$

   where $l_i$ is the value of the log likelihood at iteration i. B
248. t REPEATED

ODS Table Name           Description                                          Statement    Option
NObs                     Number of observations summary                                    Default
NonEst                   Nonestimable rows of contrasts                       CONTRAST     Default
ObStats                  Observation-wise statistics                          MODEL        OBSTATS | CL | PREDICTED | RESIDUALS | XVARS
ParameterEstimates       Parameter estimates                                  MODEL        Default without REPEATED, PRINTMLE with REPEATED
ParmInfo                 Parameter indices                                    MODEL        Default
ResponseProfiles         Frequency counts for multinomial and binary models   MODEL        DIST=MULTINOMIAL | DIST=BINOMIAL
Type1                    Type 1 tests                                         MODEL        TYPE1
Type3                    Type 3 tests                                         MODEL        TYPE3
ZeroParameterEstimates   Parameter estimates for zero-inflated model          ZEROMODEL    Default

Table 37.9 ODS Tables Produced in PROC GENMOD for a Bayesian Analysis

ODS Table Name      Description                                            Statement   Option
AutoCorr            Autocorrelations of the posterior samples              BAYES       Default
ClassLevels         Classification variable levels                         CLASS       Default
CoeffPrior          Prior distribution of the regression coefficients      BAYES       Default
ConvergenceStatus   Convergence status of maximum likelihood estimation    MODEL       Default
Corr                Correlation matrix of the posterior samples            BAYES       SUMMARY=CORR
ESS                 Effective sample size                                  BAYES       Default
FitStatistics       Fit statistics                                         BAYES       Default
Gelman              Gelman and Rubin convergence diagnostics               BAYES       DIAG=GELMAN
Geweke              Geweke convergence diagnostics                         BAYES       Default
Heidelberger        Heidelberger and Welch convergence diagnostics         BAYES       DIAG=HEIDELBERGER
InitialValues       Initial values of the Markov chains                    BAYES       Default
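The ODS table names listed above are what you reference when you want a table as a data set. A minimal sketch, assuming a hypothetical input data set `work.claims` with hypothetical variables `count`, `trt`, and `x`:

```sas
/* Hypothetical data set and variables; only the ODS table names
   ParameterEstimates and Type3 come from the tables above.        */
ods output ParameterEstimates=work.pe   /* parameter estimates table */
           Type3=work.t3;               /* Type 3 tests table        */

proc genmod data=work.claims;
   class trt;
   /* TYPE3 must be requested, or the Type3 table is not produced. */
   model count = trt x / dist=poisson link=log type3;
run;
```

An ODS OUTPUT request for a table that the procedure never produces only raises a warning, so it helps to match each name against the Option column before relying on the output data set.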
249. t an intercept term of number is included in the model. If you specify a multinomial model for ordinal data, you can specify a number-list for the multiple intercepts in the model.

ITPRINT
   displays the iteration history for all iterative processes: parameter estimation, fitting constrained models for contrasts and Type 3 analyses, and profile-likelihood confidence intervals. The last evaluation of the gradient and the negative of the Hessian (second derivative) matrix are also displayed for parameter estimation. If you perform a Bayesian analysis by specifying the BAYES statement, the iteration history for computing the mode of the posterior distribution is also displayed. This option might result in a large amount of displayed output, especially if some of the optional iterative processes are selected.

LINK=keyword
   specifies the link function to use in the model. The keywords and their associated built-in link functions are as follows.

   LINK=                         Link Function
   CUMCLL | CCLL                 Cumulative complementary log-log
   CUMLOGIT | CLOGIT             Cumulative logit
   CUMPROBIT | CPROBIT           Cumulative probit
   CLOGLOG | CLL                 Complementary log-log
   IDENTITY | ID                 Identity
   LOG                           Log
   LOGIT                         Logit
   PROBIT                        Probit
   POWER(number) | POW(number)   Power with λ = number

   If no LINK= option is supplied and there is a user-defined link function, the user-defined link function is used. If you specify neither the LINK= option nor a user-defined link function, then
250. ta, Second Edition, New York: John Wiley & Sons.

Liang, K. Y. and Zeger, S. L. (1986), "Longitudinal Data Analysis Using Generalized Linear Models," Biometrika, 73, 13-22.

Lin, D. Y., Wei, L. J., and Ying, Z. (2002), "Model-Checking Techniques Based on Cumulative Residuals," Biometrics, 58, 1-12.

Lipsitz, S. H., Fitzmaurice, G. M., Orav, E. J., and Laird, N. M. (1994), "Performance of Generalized Estimating Equations in Practical Situations," Biometrics, 50, 270-278.

Lipsitz, S. H., Kim, K., and Zhao, L. (1994), "Analysis of Repeated Categorical Data Using Generalized Estimating Equations," Statistics in Medicine, 13, 1149-1163.

Littell, R. C., Freund, R. J., and Spector, P. C. (1991), SAS System for Linear Models, Third Edition, Cary, NC: SAS Institute Inc.

Long, J. S. (1997), Regression Models for Categorical and Limited Dependent Variables, Thousand Oaks, CA: Sage Publications.

McCullagh, P. (1983), "Quasi-likelihood Functions," Annals of Statistics, 11, 59-67.

McCullagh, P. and Nelder, J. A. (1989), Generalized Linear Models, Second Edition, London: Chapman & Hall.

Mehta, C. R., Patel, N., and Senchaudhuri, P. (1992), "Exact Stratified Linear Rank Tests for Ordered Categorical and Binary Data," Journal of Computational and Graphical Statistics, 1, 21-40.

Miller, M. E., Davis, C. S., and Landis, J. R. (1993), "The Analysis of Longitudinal Polytomous
251. tement creates a new SAS data set that contains all the variables in the input data set and, optionally, the estimated linear predictors (XBETA) and their standard error estimates, the weights for the Hessian matrix, predicted values of the mean, confidence limits for predicted values, residuals, and case deletion diagnostics. Residuals and diagnostic statistics are not computed for multinomial models.

You can also request these statistics with the OBSTATS, PREDICTED, RESIDUALS, DIAGNOSTICS | INFLUENCE, CL, or XVARS option in the MODEL statement. You can then create a SAS data set containing them with ODS OUTPUT commands. You might prefer to specify the OUTPUT statement for requesting these statistics since the following are true:

- The OUTPUT statement produces no tabular output.
- The OUTPUT statement creates a SAS data set more efficiently than ODS. This can be an advantage for large data sets.
- You can specify the individual statistics to be included in the SAS data set.

If you use the multinomial distribution with one of the cumulative link functions for ordinal data, the data set also contains variables named _ORDER_ and _LEVEL_ that indicate the levels of the ordinal response variable and the values of the variable in the input data set corresponding to the sorted levels. These variables indicate that the predicted value for a given observation is the probability that the response variable is as large as the value of the _LEVEL_ variable
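A minimal sketch of an OUTPUT statement that keeps only selected statistics. The data set `work.claims`, its variables, and the output variable names are hypothetical; `PRED=`, `XBETA=`, `STDXBETA=`, `LOWER=`, `UPPER=`, and `RESDEV=` are the OUTPUT statement keywords assumed here.

```sas
/* Hypothetical data set and variable names, for illustration only. */
proc genmod data=work.claims;
   model count = x1 x2 / dist=poisson link=log;
   output out=work.pred        /* new data set: input variables plus the statistics below */
          pred=mu_hat          /* predicted value of the mean              */
          xbeta=eta            /* estimated linear predictor               */
          stdxbeta=se_eta      /* standard error of the linear predictor   */
          lower=mu_lcl         /* lower confidence limit for the mean      */
          upper=mu_ucl         /* upper confidence limit for the mean      */
          resdev=r_dev;        /* deviance residual                        */
run;
```

Only the listed statistics are added to `work.pred`, which is the selectivity that the third bullet above refers to.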
252. ter is often used to indicate overdispersion or underdispersion, this estimate might also indicate other problems, such as an incorrectly specified model or outliers in the data. You should carefully assess whether this type of model is appropriate for your data.

Specification of Effects

Each term in a model is called an effect. Effects are specified in the MODEL statement. You specify effects with a special notation that uses variable names and operators. There are two types of variables: classification (or CLASS) variables and continuous variables. There are two primary types of operators: crossing and nesting. A third type, the bar operator, is used to simplify effect specification. Crossing is the type of operator most commonly used in generalized linear models.

Variables that identify classification levels are called CLASS variables in SAS and are identified in a CLASS statement. These might also be called categorical, qualitative, discrete, or nominal variables. CLASS variables can be either character or numeric. The values of CLASS variables are called levels. For example, the CLASS variable Sex could have the levels "male" and "female."

In a model, an explanatory variable that is not declared in a CLASS statement is assumed to be continuous. Continuous variables must be numeric. For example, the heights and weights of subjects in an experiment are continuous variables.

The types of effec
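The operators named above can be sketched as follows. The data set `work.trial` and the variables `Response`, `Clinic`, and `Height` are hypothetical; `Sex` is the CLASS variable used as an example in the text.

```sas
/* Hypothetical data set and variables, for illustration only. */
proc genmod data=work.trial;
   class Sex Clinic;                  /* classification variables              */
   /* Height is not in the CLASS statement, so it is treated as continuous.    */
   /* Sex*Clinic is a crossed effect (interaction).                            */
   model Response = Sex Clinic Sex*Clinic Height / dist=normal;
run;

proc genmod data=work.trial;
   class Sex Clinic;
   /* Bar operator: Sex|Clinic expands to Sex Clinic Sex*Clinic. */
   model Response = Sex|Clinic Height / dist=normal;
run;

proc genmod data=work.trial;
   class Sex Clinic;
   /* Nesting: Clinic(Sex) fits Clinic levels within each level of Sex. */
   model Response = Sex Clinic(Sex) / dist=normal;
run;
```

The first two steps specify the same model; the bar form is only a shorthand.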
253. teration of the Gibbs sampler. After one iteration, you have $(\theta_1^{(1)}, \ldots, \theta_k^{(1)})$. After n iterations, you have $(\theta_1^{(n)}, \ldots, \theta_k^{(n)})$.

PROC GENMOD implements the ARMS algorithm provided by Gilks (2003) to draw a sample from a full conditional distribution. See the section Adaptive Rejection Sampling Algorithm on page 152 for more information about the ARMS algorithm.

The ARMS algorithm is the default method used to sample from the posterior distribution, except in the case of a normal distribution with a conjugate prior, in which case a closed form is available for the posterior distribution. See any of the introductory references in Chapter 7, Introduction to Bayesian Analysis Procedures, for a discussion of conjugate prior distributions for a linear model with the normal distribution.

Gamerman Algorithm

The Gamerman algorithm, unlike a Gibbs sampling algorithm, samples parameters from their multivariate posterior conditional distribution. The algorithm uses the structure of generalized linear models to efficiently sample from the posterior distribution of the model parameters. For a detailed description and explanation of the algorithm, see Gamerman (1997) and the section Gamerman Algorithm on page 153.

Independence Metropolis Algorithm

The independence Metropolis algorithm is another sampling algorithm that draws multivariate samples from the posterior distribution. See the section Independence Sample
254. tes of Residuals
Example 37.9 Assessment of a Marginal Model for Dependent Data
Example 37.10 Bayesian Analysis of a Poisson Regression Model
Example 37.11 Exact Poisson Regression
References

Overview: GENMOD Procedure

The GENMOD procedure fits generalized linear models, as defined by Nelder and Wedderburn (1972). The class of generalized linear models is an extension of traditional linear models that allows the mean of a population to depend on a linear predictor through a nonlinear link function and allows the response probability distribution to be any member of an exponential family of distributions. Many widely used statistical models are generalized linear models. These include classical linear models with normal errors, logistic and probit models for binary data, and log-linear models for multinomial data. Many other useful statistical models can be formulated as generalized linear models by the selection of an appropriate link function and response probability distribution.

See McCullagh and Nelder (1989) for a discussion of statistical modeling using generalized linear models. The books by Aitkin et al. (1989) and Dobson (1990) are also excellent references with many examples of applications of generalized linear models. Firth (1991) pro
255. th its asymptotic chi-square distribution with 11 degrees of freedom, you find that the p-value is approximately 0.45. This indicates that the specified model fits the data reasonably well.

Output 37.11.1 Unconditional Goodness of Fit Criteria

The GENMOD Procedure
Criteria For Assessing Goodness Of Fit

Criterion                    DF       Value    Value/DF
Deviance                     11     10.9363      0.9942
Scaled Deviance              11     10.9363      0.9942
Pearson Chi-Square           11      9.3722      0.8520
Scaled Pearson X2            11      9.3722      0.8520
Log Likelihood                      -7.2408
Full Log Likelihood                -12.9038
AIC (smaller is better)             41.8076
AICC (smaller is better)            56.2076
BIC (smaller is better)             49.3631

From the Analysis Of Parameter Estimates table in Output 37.11.2, you can see that only two of the Heat parameters are deemed significant. Looking at the standard errors, you can see that the unconditional analysis had convergence difficulties with the Heat=7 parameter (Standard Error=264324.6), which means you cannot fit this unconditional Poisson regression model to this data.

Output 37.11.2 Unconditional Maximum Likelihood Parameter Estimates

Analysis Of Maximum Likelihood Parameter Estimates

Parameter          DF    Estimate    Standard Error
Intercept           1      1.5700            1.1657
Heat       7        1     27.6129          264324.6
Heat       14       1      3.0107            1.0025
Heat       27       1      1.7180
Soak       1        1      0.2454
Soak       1.7      1      0.5572
Soak       2.2      1      0.4079
Soak       2.8      1      0.1301
Scale               0      1.0000

7691 1455
256. th two rows: the first row has coefficients 1 for the first level of A, -1 for the second level of A, and zeros for all levels of B; the second row has coefficients 0 for all levels of A, 1 for the first level of B, and -1 for the second level of B.

effect
   identifies an effect that appears in the MODEL statement. The value INTERCEPT or intercept can be used as an effect when an intercept is included in the model. You do not need to include all effects that are included in the MODEL statement.

values
   are constants that are elements of the L vector associated with the effect. The rows of L are specified in order and are separated by commas.

If you use the default less-than-full-rank PROC GLM CLASS variable parameterization, each row of the L matrix is checked for estimability. If PROC GENMOD finds a contrast to be nonestimable, it displays missing values in corresponding rows in the results. See Searle (1971) for a discussion of estimable functions.

If the elements of L are not specified for an effect that contains a specified effect, then the elements of the specified effect are distributed over the levels of the higher-order effect just as the GLM procedure does for its CONTRAST and ESTIMATE statements. For example, suppose that the model contains effects A and B and their interaction A*B. If you specify a CONTRAST statement involving A alone, the L matrix contains nonzero terms for both A and A*B, since A*B contains A.

When you use
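A minimal sketch of a two-row contrast of the kind described above, assuming that A and B each have exactly two levels. The data set `work.mydata`, the response `y`, and the labels are hypothetical.

```sas
/* Hypothetical data set and response; A and B are assumed to be
   two-level CLASS variables.                                     */
proc genmod data=work.mydata;
   class A B;                           /* CLASS must precede MODEL      */
   model y = A B / dist=poisson link=log;
   /* Two rows of L, separated by a comma:
      row 1:  1 -1 on A, zeros on B
      row 2:  zeros on A,  1 -1 on B                               */
   contrast 'A and B jointly' A 1 -1, B 1 -1;
   /* The same rows tested one at a time. */
   contrast 'A only' A 1 -1;
   contrast 'B only' B 1 -1;
run;
```

Each row sums to zero within its effect, which is what keeps the contrast estimable under the default less-than-full-rank parameterization.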
257. the LOGIT link is used. User-defined link functions are not allowed.

Details: GENMOD Procedure

Generalized Linear Models Theory

This is a brief introduction to the theory of generalized linear models.

Response Probability Distributions

In generalized linear models, the response is assumed to possess a probability distribution of the exponential form. That is, the probability density of the response Y for continuous response variables, or the probability function for discrete responses, can be expressed as

$$f(y) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}$$

for some functions a, b, and c that determine the specific distribution. For fixed φ, this is a one-parameter exponential family of distributions. The functions a and c are such that $a(\phi) = \phi/w$ and $c = c(y, \phi/w)$, where w is a known weight for each observation. A variable representing w in the input data set can be specified in the WEIGHT statement. If no WEIGHT statement is specified, $w_i = 1$ for all observations.

Standard theory for this type of distribution gives expressions for the mean and variance of Y:

$$E(Y) = b'(\theta), \qquad \mathrm{Var}(Y) = \frac{b''(\theta)\,\phi}{w}$$

where the primes denote derivatives with respect to θ. If μ represents the mean of Y, then the variance expressed as a function of the mean is

$$\mathrm{Var}(Y) = \frac{V(\mu)\,\phi}{w}$$

where V is the variance function.

Probability distributions of the response Y in generalized linear models are usually parameterized in terms of th
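As a worked instance of the exponential form, the Poisson distribution with mean $\mu$ and unit weight can be rewritten as follows; the identification of $\theta$, $b$, $a$, and $c$ below is the standard one and is offered as a sketch rather than as the guide's own derivation.

```latex
f(y) = \frac{\mu^{y} e^{-\mu}}{y!}
     = \exp\{\, y \log\mu - \mu - \log y! \,\}
\\[6pt]
\theta = \log\mu, \qquad b(\theta) = e^{\theta}, \qquad a(\phi) = 1, \qquad c(y,\phi) = -\log y!
\\[6pt]
E(Y) = b'(\theta) = e^{\theta} = \mu, \qquad
\mathrm{Var}(Y) = b''(\theta)\, a(\phi) = e^{\theta} = \mu
\quad\Longrightarrow\quad V(\mu) = \mu
```

The variance function $V(\mu) = \mu$ recovered here is the one that a multiplicative overdispersion factor later scales.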
258. the automatic variable _MEAN_ to represent the mean in the expression. Alternatively, you can define the variance function with programming statements, as detailed in the section Programming Statements on page 2502. This form is convenient for using complex statements such as IF-THEN/ELSE clauses. Derivatives of the variance function for use during optimization are computed automatically. The DEVIANCE statement must also appear when the VARIANCE statement is used to define the variance function.

WEIGHT Statement

WEIGHT | SCWGT variable ;

The WEIGHT statement identifies a variable in the input data set to be used as the exponential family dispersion parameter weight for each observation. The exponential family dispersion parameter is divided by the WEIGHT variable value for each observation. This is true regardless of whether the parameter is estimated by the procedure or specified in the MODEL statement with the SCALE= option. It is also true for distributions such as the Poisson and binomial that are not usually defined to have a dispersion parameter. For these distributions, a WEIGHT variable weights the overdispersion parameter, which has the default value of 1.

The WEIGHT variable does not have to be an integer; if it is less than or equal to 0 or if it is missing, the corresponding observation is not used.

ZEROMODEL Statement

ZEROMODEL < effects > < / options > ;

The ZEROMODEL stat
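A minimal sketch of a user-defined variance function paired with its deviance, here reproducing the Poisson case. The data set `work.claims` and the variables `count`, `x`, `mu`, `resp`, and `d` are hypothetical; `_MEAN_` is the automatic variable named in the text, and `_RESP_` is assumed to be its counterpart for the response value.

```sas
/* Hypothetical data set and variable names, for illustration only. */
proc genmod data=work.claims;
   /* Programming statements */
   mu   = _MEAN_;                      /* current fitted mean           */
   resp = _RESP_;                      /* observed response (assumed)   */
   /* Poisson deviance contribution; the resp = 0 branch avoids log(0). */
   if resp > 0 then d = 2 * (resp * log(resp / mu) - (resp - mu));
   else d = 2 * mu;
   variance var = mu;                  /* V(mu) = mu                    */
   deviance dev = d;                   /* required alongside VARIANCE   */
   model count = x / link=log;
run;
```

The IF-THEN/ELSE branch is the kind of construct that the text says the programming-statement form is convenient for.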
259. the default canonical link function is used if you specify the DIST= option. Otherwise, if you omit the DIST= option, the identity link function is used. The cumulative link functions are appropriate only for the multinomial distribution.

LRCI
   requests that two-sided confidence intervals for all model parameters be computed based on the profile-likelihood function. This is sometimes called the partially maximized likelihood function. See the section Confidence Intervals for Parameters on page 2525 for more information about the profile-likelihood function. This computation is iterative and can consume a relatively large amount of CPU time. The confidence coefficient can be selected with the ALPHA=number option. The resulting confidence coefficient is 1 - number. The default confidence coefficient is 0.95.

MAXITER=number | MAXIT=number
   sets the maximum allowable number of iterations for all iterative computation processes in PROC GENMOD. By default, MAXITER=50.

NOINT
   requests that no intercept term be included in the model. An intercept is included unless this option is specified.

NOSCALE
   holds the scale parameter fixed. Otherwise, for the normal, inverse Gaussian, and gamma distributions, the scale parameter is estimated by maximum likelihood. If you omit the SCALE= option, the scale parameter is fixed at the value 1.

OBSTATS
   specifies that an additional table of statistics be displayed. Formulas for the statistics are given in the s
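260. then the scaled deviance is defined by

$$D^{*}(y, \mu) = 2\,\bigl(l(y, y) - l(\mu, y)\bigr)$$

For specific distributions, this can be expressed as

$$D^{*}(y, \mu) = \frac{D(y, \mu)}{\phi}$$

where D is the deviance. The following table displays the deviance for each of the probability distributions available in PROC GENMOD. The deviance cannot be directly calculated for zero-inflated models. Twice the negative of the log likelihood is reported instead of the proper deviance for the zero-inflated Poisson and zero-inflated negative binomial.

Distribution and deviance $D(y, \mu)$:

Normal: $\sum_i w_i (y_i - \mu_i)^2$

Poisson: $2 \sum_i w_i \left[ y_i \log\left(\frac{y_i}{\mu_i}\right) - (y_i - \mu_i) \right]$

Binomial: $2 \sum_i w_i m_i \left[ y_i \log\left(\frac{y_i}{\mu_i}\right) + (1 - y_i) \log\left(\frac{1 - y_i}{1 - \mu_i}\right) \right]$

Gamma: $2 \sum_i w_i \left[ -\log\left(\frac{y_i}{\mu_i}\right) + \frac{y_i - \mu_i}{\mu_i} \right]$

Inverse Gaussian: $\sum_i \frac{w_i (y_i - \mu_i)^2}{\mu_i^2 y_i}$

Multinomial: $2 \sum_i \sum_j w_i\, y_{ij} \log\left(\frac{y_{ij}}{p_{ij} m_i}\right)$

Negative binomial: $2 \sum_i \left[ y_i \log\left(\frac{y_i}{\mu_i}\right) - \left(y_i + \frac{w_i}{k}\right) \log\left(\frac{y_i + w_i/k}{\mu_i + w_i/k}\right) \right]$

Zero-inflated Poisson: $-2 \sum_i w_i \log\left[\omega_i + (1 - \omega_i)\exp(-\lambda_i)\right]$ for $y_i = 0$, and $-2 \sum_i w_i \left[ \log(1 - \omega_i) + y_i \log(\lambda_i) - \lambda_i - \log(y_i!) \right]$ for $y_i > 0$

Zero-inflated negative binomial: twice the negative of the log likelihood of the zero-inflated negative binomial distribution with dispersion parameter k, summed separately over the observations with $y_i = 0$ and $y_i > 0$

In the binomial case, $y_i = r_i/m_i$, where $r_i$ is a binomial count and $m_i$ is the binomial number of trials parameter.

In the multinomial case, $y_{ij}$ refers to the observed number of occurrences of the jth category for the ith subpopulation defined by the AGGREGATE= variable, $m_i$ is the total number in the ith subpopulation, and $p_{ij}$ is the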
261. tics are not available for the multinomial distribution.

The RESIDUALS, DIAGNOSTICS | INFLUENCE, PREDICTED, XVARS, and CL options cause only subgroups of the observation statistics to be displayed. You can specify more than one of these options to include different subgroups of statistics.

The ID=variable option causes the values of variable in the input data set to be displayed in the table. If an explicit format for variable has been defined, the formatted values are displayed.

If a REPEATED statement is present, a table is displayed for the GEE model specified in the REPEATED statement. Regression variables, response values, predicted values, confidence limits for the predicted values, linear predictor, raw residuals, and Pearson residuals for each observation in the input data set are available. Case deletion diagnostic statistics are available for each observation and for each cluster.

OFFSET=variable
   specifies a variable in the input data set to be used as an offset variable. This variable cannot be a CLASS variable, and it cannot be the response variable or one of the explanatory variables. An OFFSET= variable is required when you perform an exact Poisson regression. Let $o_i$ be the offset for the ith observation. Then $\exp(o_i)$ should be a nonnegative integer that is greater than or equal to the response value. If $\exp(o_i)$ is not an integer, then the integer part is used. See the section Exact Logistic and Poisson Regr
262. tions available in PROC GENMOD for handling the exponential family dispersion parameter. The NOSCALE and SCALE= options in the MODEL statement affect the way in which the dispersion parameter is treated. If you specify the SCALE=DEVIANCE option, the dispersion parameter is estimated by the deviance divided by its degrees of freedom. If you specify the SCALE=PEARSON option, the dispersion parameter is estimated by Pearson's chi-square statistic divided by its degrees of freedom.

Otherwise, values of the SCALE= and NOSCALE options and the resultant actions are displayed in the following table.

NOSCALE                       SCALE=value    Action
Present                       Present        Scale fixed at value
Present                       Not present    Scale fixed at 1
Not present                   Not present    Scale estimated by ML
Not present                   Present        Scale estimated by ML, starting point at value
Present (negative binomial)   Not present    k fixed at 0

The meaning of the scale parameter displayed in the Analysis Of Parameter Estimates table is different for the gamma distribution than for the other distributions. The relation of the scale parameter as used by PROC GENMOD to the exponential family dispersion parameter φ is displayed in the following table. For the binomial and Poisson distributions, φ is the overdispersion parameter, as defined in the Overdispersion section, which follows.

Distribution        Scale
Normal              $\sqrt{\phi}$
Inverse Gaussian    $\sqrt{\phi}$
Gamma               $1/\phi$
Binomial            $\sqrt{\phi}$
Poisson             $\sqrt{\phi}$

In the case of the negative binomial dis
263. tor 1991).

Generalized score tests for Type III contrasts are computed for GEE models if you specify the TYPE3 option in the MODEL statement when a REPEATED statement is also used. See the section Generalized Score Statistics on page 2540 for more information about generalized score statistics. Wald tests are also available with the WALD option in the CONTRAST statement. In this case, the robust covariance matrix estimate is used for Σ in the Wald statistic.

Confidence Intervals for Parameters

Likelihood Ratio-Based Confidence Intervals

PROC GENMOD produces likelihood ratio-based confidence intervals, also known as profile-likelihood confidence intervals, for parameter estimates for generalized linear models. These are not computed for GEE models, since there is no likelihood for this type of model. Suppose that the parameter vector is $\beta = [\beta_0, \beta_1, \ldots, \beta_p]'$ and that you want a confidence interval for $\beta_j$. The profile-likelihood function for $\beta_j$ is defined as

$$l^{*}(\beta_j) = \max_{\tilde{\beta}}\, l(\beta)$$

where $\tilde{\beta}$ is the vector β with the jth element fixed at $\beta_j$ and l is the log-likelihood function. If $l = l(\hat{\beta})$ is the log likelihood evaluated at the maximum likelihood estimate $\hat{\beta}$, then $2\,(l - l^{*}(\beta_j))$ has a limiting chi-square distribution with one degree of freedom if $\beta_j$ is the true parameter value. A $(1-\alpha)100\%$ confidence interval for $\beta_j$ is

$$\left\{ \beta_j : l^{*}(\beta_j) \ge l_0 = l - 0.5\,\chi^2_{1-\alpha,1} \right\}$$

where $\chi^2_{1-\alpha,1}$ is the $100(1-\alpha)$th percenti
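For the default 95% level, the cutoff that the profile log likelihood is compared against can be written out numerically. A short worked step, using the standard chi-square quantile for one degree of freedom (the rounding is introduced here):

```latex
\alpha = 0.05: \qquad \chi^2_{0.95,\,1} \approx 3.8415
\\[4pt]
l_0 = l(\hat{\beta}) - 0.5 \times 3.8415 \approx l(\hat{\beta}) - 1.9207
```

The interval endpoints are the two values of $\beta_j$ at which the profile log likelihood has dropped about 1.92 units below its maximum, which is why the computation is iterative.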
264. totic unconditional Poisson regression model to the data. The variable Notready is specified as the response variable, and the continuous predictors Heat and Soak are defined in the CLASS statement as categorical predictors that use reference coding. Specifying the offset variable as lnTotal enables you to model the ratio Notready/Total.

   proc genmod data=ingots;
      class Heat Soak / param=ref;
      model Notready=Heat Soak / offset=lnTotal dist=Poisson link=log;
      exact Heat Soak / joint estimate;
      exactoptions statustime=10;
   run;

The EXACT statement is specified to additionally fit an exact conditional Poisson regression model. Specifying the lnTotal offset variable models the ratio Notready/Total; in this case, the Total variable contains the largest possible response value for each observation. The JOINT option produces a joint test for the significance of the covariates, along with the usual marginal tests. The ESTIMATE option produces exact parameter estimates for the covariates. The STATUSTIME=10 option is specified in the EXACTOPTIONS statement for monitoring the progress of the results; this example can take several minutes to complete due to the JOINT option. If you run out of memory, see the SAS Companion for your system for information about how to increase the available memory.

The "Criteria For Assessing Goodness Of Fit" table is displayed in Output 37.11.1. Comparing the deviance of 10.9363 wi
265. tribution, PROC GENMOD reports the "dispersion" parameter estimated by maximum likelihood. This is the negative binomial parameter k defined in the section "Response Probability Distributions" on page 2510.

Overdispersion

Overdispersion is a phenomenon that sometimes occurs in data that are modeled with the binomial or Poisson distributions. If the estimate of dispersion after fitting, as measured by the deviance or Pearson's chi-square divided by the degrees of freedom, is not near 1, then the data might be overdispersed if the dispersion estimate is greater than 1 or underdispersed if the dispersion estimate is less than 1. A simple way to model this situation is to allow the variance functions of these distributions to have a multiplicative overdispersion factor $\phi$:

• Binomial: $V(\mu) = \phi\mu(1-\mu)$
• Poisson: $V(\mu) = \phi\mu$

An alternative method to allow for overdispersion in the Poisson distribution is to fit a negative binomial distribution, where $V(\mu) = \mu + k\mu^2$, instead of the Poisson. The parameter k can be estimated by maximum likelihood, thus allowing for overdispersion of a specific form. This is different from the multiplicative overdispersion factor $\phi$, which can accommodate many forms of overdispersion.

The models are fit in the usual way, and the parameter estimates are not affected by the value of $\phi$. The covariance matrix, however, is multiplied by $\phi$, and the scaled deviance and log li
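A small worked example may help show what the multiplicative factor does. The numbers are hypothetical and chosen only for the arithmetic: a Poisson fit with Pearson chi-square $X^2 = 30$ on 20 residual degrees of freedom, and an unadjusted standard error of 0.1000 for some coefficient.

```latex
\hat{\phi} = \frac{X^2}{\mathrm{DF}} = \frac{30}{20} = 1.5,
\qquad
\widehat{\mathrm{Cov}}_{\text{adj}}(\hat{\beta}) = 1.5 \times \widehat{\mathrm{Cov}}(\hat{\beta}),
\qquad
\mathrm{SE}_{\text{adj}} = \sqrt{1.5} \times 0.1000 \approx 0.1225
```

The point estimates are unchanged. Only the standard errors, and the test statistics and confidence limits that depend on them, are inflated.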
266. ts, 2522; design matrix, 2523; deviance, 2491; deviance definition, 2433; deviance residuals, 2529; diagnostics, 2493, 2544; dispersion parameter, 2520; dispersion parameter estimation, 2432, 2524, 2525; dispersion parameter weights, 2509; effect specification, 2522; estimability checking, 2479; events/trials format for response, 2491, 2513; exact logistic regression, 2482, 2553; exact Poisson regression, 2482, 2507, 2553, 2622; expected information matrix, 2517; exponential distribution, 2582; F statistics, 2526, 2527; Fisher's scoring method, 2498, 2517; gamma distribution, 2512; GEE, 2429, 2453, 2503, 2532, 2586, 2589, 2592; generalized estimating equations (GEE), 2429; generalized linear model, 2431; geometric distribution, 2512; goodness of fit, 2517; gradient, 2516; Hessian matrix, 2516; information matrix, 2498; initial values, 2493, 2504; intercept, 2433, 2436, 2495; inverse Gaussian distribution, 2512; Lagrange multiplier statistics, 2527; life data, 2579; likelihood residuals, 2529; linear model, 2430; linear predictor, 2429, 2430, 2436, 2523, 2560; link function, 2429, 2431, 2514; log-likelihood functions, 2515; log-linear models, 2435; logistic regression, 2574; main effects, 2522; maximum likelihood estimation, 2516; _MEAN_ automatic variable, 2502; model checking, 2597, 2604; multinomial distribution, 2513; multinomial models, 2529; negative binomial distribution, 2512; nested effects, 2522; Newton-Raphson algorithm, 2516; norma
267. ts most useful in generalized linear models are shown in the following list. Assume that A, B, and C are classification variables and that X1 and X2 are continuous variables.

• Regressor effects are specified by writing continuous variables by themselves: X1, X2.
• Polynomial effects are specified by joining two or more continuous variables with asterisks: X1*X2.
• Main effects are specified by writing classification variables by themselves: A, B, C.
• Crossed effects (interactions) are specified by joining two or more classification variables with asterisks: A*B, B*C, A*B*C.
• Nested effects are specified by following a main effect or crossed effect with a classification variable or list of classification variables enclosed in parentheses: B(A), C(B A), A*B(C). In the preceding example, B(A) is "B nested within A."
• Combinations of continuous and classification variables can be specified in the same way by using the crossing and nesting operators.

The bar operator consists of two effects joined with a vertical bar (|). It is shorthand notation for including the left-hand side, the right-hand side, and the cross between them as effects in the model. For example, A | B is equivalent to A B A*B. The effects in the bar operator can be classification variables, continuous variables, or combinations of effects defined using operators. Multiple bars are permitted. For example, A | B | C means A B C A*B A*C B*C A*B*C. You can specify the maximum number o
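The equivalence between the bar operator and the spelled-out effect list can be checked with a sketch like this one. The data set work.design and its values are hypothetical toy counts. The two PROC GENMOD steps should produce the same fit, because the second MODEL statement simply writes out what A|B|C stands for.

```sas
/* Hypothetical 2x2x2 layout with two replicates per cell */
data work.design;
   input A $ B $ C $ y @@;
   datalines;
a1 b1 c1 12  a1 b1 c2 15  a1 b2 c1 9   a1 b2 c2 11
a2 b1 c1 20  a2 b1 c2 18  a2 b2 c1 14  a2 b2 c2 16
a1 b1 c1 10  a1 b1 c2 13  a1 b2 c1 8   a1 b2 c2 12
a2 b1 c1 22  a2 b1 c2 17  a2 b2 c1 13  a2 b2 c2 19
;
run;

/* Bar-operator shorthand */
proc genmod data=work.design;
   class A B C;
   model y = A|B|C / dist=poisson link=log;
run;

/* Expanded form: main effects, two-way and three-way interactions */
proc genmod data=work.design;
   class A B C;
   model y = A B C A*B A*C B*C A*B*C / dist=poisson link=log;
run;
```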
268. uated with the parameter estimates under the working correlation of interest as

$$Q(\hat{\beta}(R), \phi) = \sum_{i=1}^{K} \sum_{j=1}^{n_i} Q(\hat{\beta}(R), \phi; (Y_{ij}, X_{ij}))$$

where the quasi-likelihood contribution of the jth observation in the ith cluster is defined in the section "Quasi-likelihood Functions" on page 2539 and $\hat{\beta}(R)$ are the parameter estimates obtained from GEEs with the working correlation of interest R. QIC is defined as

$$\mathrm{QIC}(R) = -2Q(\hat{\beta}(R), \phi) + 2\,\mathrm{trace}(\hat{\Omega}_I \hat{V}_R)$$

where $\hat{V}_R$ is the robust covariance estimate and $\hat{\Omega}_I$ is the inverse of the model-based covariance estimate under the independent working correlation assumption, evaluated at $\hat{\beta}(R)$, the parameter estimates obtained from GEEs with the working correlation of interest R. PROC GENMOD also computes an approximation to $\mathrm{QIC}(R)$ defined by Pan (2001) as

$$\mathrm{QIC}_u(R) = -2Q(\hat{\beta}(R), \phi) + 2p$$

where p is the number of regression parameters. Pan (2001) notes that QIC is appropriate for selecting regression models and working correlations, whereas $\mathrm{QIC}_u$ is appropriate only for selecting regression models.

Quasi-likelihood Functions

See McCullagh and Nelder (1989) and Hardin and Hilbe (2003) for discussions of quasi-likelihood functions. The contribution of observation j in cluster i to the quasi-likelihood function evaluated at the regression parameters $\beta$ is given by $Q(\beta, \phi; (Y_{ij}, X_{ij})) = Q_{ij}/\phi$, where $Q_{ij}$ is defined in the following list. These are used in the computation of the quas
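To make the relationship between the two criteria concrete, here is the arithmetic for a set of hypothetical values, not taken from any real fit: a quasi-likelihood of $-120.5$, $p = 3$ regression parameters, and a trace term of 3.8.

```latex
\mathrm{QIC}_u = -2(-120.5) + 2(3) = 241 + 6 = 247
\qquad
\mathrm{QIC} = -2(-120.5) + 2(3.8) = 241 + 7.6 = 248.6
```

The two criteria differ only in the penalty term, and they coincide when the trace equals p.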
269. uested: trace plots, autocorrelation function plots, and kernel density plots. By default, the plots are displayed in panels unless the global plot option UNPACK is specified. Also, when you are specifying more than one type of plot, the plots are displayed by parameters unless the global plot option GROUPBY is specified. When you specify only one plot request, you can omit the parentheses around the plot request. For example:

   plots=none
   plots(unpack)=trace
   plots=(trace autocorr)

You must enable ODS Graphics before requesting plots. For example, the following SAS statements enable ODS Graphics:

   ods graphics on;
   proc genmod;
      model y=x;
      bayes plots=trace;
   run;
   ods graphics off;

The global plot options are as follows:

FRINGE
   creates a fringe plot on the X axis of the density plot.

GROUPBY=PARAMETER | GROUPBY=TYPE
   specifies how the plots are grouped when there is more than one type of plot. GROUPBY=TYPE specifies that the plots be grouped by type. GROUPBY=PARAMETER specifies that the plots be grouped by parameter. GROUPBY=PARAMETER is the default.

LAGS=n
   specifies that autocorrelations be plotted up to lag n. If this option is not specified, autocorrelations are plotted up to lag 50.

SMOOTH
   displays a fitted penalized B-spline curve for each trace plot.

UNPACKPANEL | UNPACK
   specifies that all paneled plots be unpacked, meaning that each plot in a panel is displayed sep
270. unction is chosen as displayed in the following table. If you specify no distribution and no link function, then the GENMOD procedure defaults to the normal distribution with the identity link function.

DIST=                   Distribution                       Default Link Function
BINOMIAL | BIN | B      Binomial                           Logit
GAMMA | GAM | G         Gamma                              Inverse (power(−1))
GEOMETRIC | GEOM        Geometric                          Log
IGAUSSIAN | IG          Inverse Gaussian                   Inverse squared (power(−2))
MULTINOMIAL | MULT      Multinomial                        Cumulative logit
NEGBIN | NB             Negative binomial                  Log
NORMAL | NOR | N        Normal                             Identity
POISSON | POI | P       Poisson                            Log
ZIP                     Zero-inflated Poisson              Log/logit
ZINB                    Zero-inflated negative binomial    Log/logit

EXPECTED
   requests that the expected Fisher information matrix be used to compute parameter estimate covariances and the associated statistics. The default action is to use the observed Fisher information matrix. This option does not affect the model fitting, only the way in which the covariance matrix is computed (see the SCORING= option).

ID=variable
   causes the values of variable in the input data set to be displayed in the OBSTATS table. If an explicit format for variable has been defined, the formatted values are displayed. If the OBSTATS option is not specified, this option has no effect.

INITIAL=numbers
   sets initial values for parameter estimates in the model. The default initial parameter values are weighted least squares estimates based on using the response data as the initial mean estimate. This optio
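A brief sketch of how the default link in the table interacts with an explicit LINK= choice, using the gamma distribution. The data set work.times and its values are hypothetical and exist only to make the example self-contained.

```sas
/* Hypothetical positive, right-skewed responses */
data work.times;
   input x y @@;
   datalines;
1 6.5  2 5.2  3 3.9  4 3.4  5 2.8  6 2.6  7 2.3  8 2.0
;
run;

/* No LINK= option: the gamma default (inverse link) from the table applies */
proc genmod data=work.times;
   model y = x / dist=gamma;
run;

/* Explicit LINK= overrides the default link for the same distribution */
proc genmod data=work.times;
   model y = x / dist=gamma link=log;
run;
```

The "Model Information" table in each run reports the link function that was actually used, which is a quick way to confirm the default.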
271. ure 37.4 Analysis of Parameter Estimates

Analysis Of Maximum Likelihood Parameter Estimates

                                  Standard    Wald 95% Confidence      Wald
Parameter         DF   Estimate   Error       Limits                   Chi-Square
Intercept          1   -1.3168    0.0903      -1.4937    -1.1398       212.73
car     large      1   -1.7643    0.2724      -2.2981    -1.2304        41.96
car     medium     1   -0.6928    0.1282      -0.9441    -0.4414        29.18
car     small      0    0.0000    0.0000       0.0000     0.0000          .
age     1          1   -1.3199    0.1359      -1.5863    -1.0536        94.34
age     2          0    0.0000    0.0000       0.0000     0.0000          .
Scale              0    1.0000    0.0000       1.0000     1.0000

Analysis Of Maximum Likelihood Parameter Estimates

Parameter         Pr > ChiSq
Intercept         <.0001
car     large     <.0001
car     medium    <.0001
car     small     .
age     1         <.0001
age     2         .
Scale

NOTE: The scale parameter was held fixed.

Figure 37.4 displays the "Analysis Of Parameter Estimates" table, which summarizes the results of the iterative parameter estimation process. For each parameter in the model, PROC GENMOD displays columns with the parameter name, the degrees of freedom associated with the parameter, the estimated parameter value, the standard error of the parameter estimate, the confidence intervals, and the Wald chi-square statistic and associated p-value for testing the significance of the parameter to the model. If a column of the model matrix corresponding to a parameter is found to be linearly dependent, or aliased, with columns corresponding to parameters preceding it in the model, PROC GENMOD assigns it zero degrees of freedom and displays a v
272. ust be listed in the CLASS statement. The input data set does not need to be sorted by subject; see the SORTED option.

The options control how the model is fit and what output is produced. You can specify the following options after a slash (/).

ALPHAINIT=numbers
   specifies initial values for log odds ratio regression parameters if the LOGOR= option is specified for binary data. If this option is not specified, an initial value of 0.01 is used for all the parameters.

CONVERGE=number
   specifies the convergence criterion for GEE parameter estimation. If the maximum absolute difference between regression parameter estimates is less than the value of number on two successive iterations, convergence is declared. If the absolute value of a regression parameter estimate is greater than 0.08, then the absolute difference normalized by the regression parameter value is used instead of the absolute difference. The default value of number is 0.0001.

CORRW
   displays the estimated working correlation matrix. If you specify an exchangeable working correlation structure with the CORR=EXCH option, the CORRW option is not needed to view the estimated correlation, since a table is printed by default that contains the single estimated correlation.

CORRB
   displays the estimated regression parameter correlation matrix. Both model-based and empirical correlations are displayed.

COVB
   displays the estimated regression para
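The options described above can be combined in one REPEATED statement, as in the following sketch. The data set work.visits, its variables, and the tightened CONVERGE= value are hypothetical choices for illustration. An AR(1) working correlation is used here instead of exchangeable so that the CORRW table has more than a single estimated value to show.

```sas
/* Hypothetical repeated counts: 6 subjects, 3 visits each */
data work.visits;
   input id time y @@;
   datalines;
1 1 3  1 2 4  1 3 6
2 1 1  2 2 2  2 3 2
3 1 5  3 2 7  3 3 8
4 1 2  4 2 2  4 3 5
5 1 4  5 2 6  5 3 9
6 1 0  6 2 1  6 3 3
;
run;

proc genmod data=work.visits;
   class id;                          /* subject effect must be a CLASS variable */
   model y = time / dist=poisson link=log;
   repeated subject=id / corr=ar(1)   /* working correlation structure           */
                         corrw        /* print working correlation matrix        */
                         covb         /* print parameter covariance matrices     */
                         converge=1e-5;
run;
```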
273. ut Delivery System (ODS) to select tables and create output data sets. These names are listed separately in Table 37.8 for a maximum likelihood analysis, in Table 37.9 for a Bayesian analysis, and in Table 37.10 for an Exact analysis. For more information about ODS, see Chapter 20, "Using the Output Delivery System."

Table 37.8 ODS Tables Produced in PROC GENMOD for a Classical Analysis

ODS Table Name       Description                              Statement    Option
AssessmentSummary    Model assessment summary                 ASSESS       Default
ClassLevels          Classification variable levels           CLASS        Default
Contrasts            Tests of contrasts                       CONTRAST     Default
ContrastCoef         Contrast coefficients                    CONTRAST     E
ConvergenceStatus    Convergence status                       MODEL        Default
CorrB                Parameter estimate correlation matrix    MODEL        CORRB
CovB                 Parameter estimate covariance matrix     MODEL        COVB
Estimates            Estimates of contrasts                   ESTIMATE     Default

Table 37.8 continued

ODS Table Name       Description
EstimateCoef         Contrast coefficients
GEEEmpPEst           GEE parameter estimates with empirical standard errors
GEEFitCriteria       GEE QIC fit criteria
GEELogORInfo         GEE log odds ratio model information
GEEModInfo           GEE model information
GEEModPEst           GEE parameter estimates with model-based standard errors
GEENCorr             GEE model-based correlat
GEENCov
GEERCorr
GEERCov
GEEWCorr
IterContrasts
IterLRCI
IterParms
IterParmsGEE
IterType3
LRCI
LSMeanCoef
LSMeanDiffs
LSMeans
LagrangeStatistics
LastGEEGrad
LastGradHess
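As a sketch of how one of these table names is used, the following captures the CovB table in a data set. The input data set work.counts and the output name work.covb_est are hypothetical. The COVB option is specified because, per the table, CovB is produced only when that MODEL option is requested.

```sas
/* Hypothetical toy count data */
data work.counts;
   input x y @@;
   datalines;
1 2  2 3  3 6  4 7  5 8  6 9  7 10  8 12  9 15  10 20
;
run;

/* Route the CovB table to a data set; the request applies to the next step */
ods output CovB=work.covb_est;

proc genmod data=work.counts;
   model y = x / dist=poisson link=log covb;
run;

proc print data=work.covb_est;
run;
```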
274. values for the individual parameter estimates and odds ratios. The one-sided p-value is the smaller of the left- and right-tail probabilities for the observed sufficient statistic of the parameter under the null hypothesis that the parameter is zero. The two-sided p-values (default) are twice the one-sided p-values. See the section "Exact Logistic and Poisson Regression" on page 2553 for more details.

OUTDIST=SAS-data-set
   names the SAS data set that contains the exact conditional distributions. This data set contains all of the exact conditional distributions that are required to process the corresponding EXACT statement. This data set contains the possible sufficient statistics for the parameters of the effects specified in the EXACT statement, the counts, and, when hypothesis tests are performed on the parameters, the probability of occurrence and the score value for each sufficient statistic. When you request an OUTDIST= data set, the observed sufficient statistics are displayed in the "Sufficient Statistics" table. See the section "OUTDIST= Output Data Set" on page 2554 for more information.

EXACT Statement Examples

In the following example, two exact tests are computed: one for x1 and the other for x2. The test for x1 is based on the exact conditional distribution of the sufficient statistic for the x1 parameter given the observed values of the sufficient statistics for the intercept, x2, and x3 parameters; likewise, the test for x2 i
275. variable is numeric, then the ORDER= option in the CLASS statement is ignored, and the internal, unformatted values are used. See the section "Other Parameterizations" on page 414 of Chapter 19, "Shared Concepts and Topics," for further details.

REF='level' | keyword
   specifies the reference level for PARAM=EFFECT, PARAM=REFERENCE, and their orthogonalizations. For an individual (but not a global) variable REF= option, you can specify the level of the variable to use as the reference level. Specify the formatted value of the variable if a format is assigned. For a global or individual variable REF= option, you can use one of the following keywords. The default is REF=LAST.

   FIRST   designates the first ordered level as reference.
   LAST    designates the last ordered level as reference.

TRUNCATE<=n>
   specifies the length n of CLASS variable values to use in determining CLASS variable levels. The default is to use the full formatted length of the CLASS variable. If you specify TRUNCATE without the length n, the first 16 characters of the formatted values are used. When formatted values are longer than 16 characters, you can use this option to revert to the levels as determined in releases before SAS 9. The TRUNCATE option is available only as a global option.

Class Variable Naming Convention

Parameter names for a CLASS predictor variable are constructed by concatenating the CLASS variable na
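The difference between an individual and a global REF= option can be sketched as follows. The data set work.taste, its variable names, and its values are hypothetical. Both steps make ice1 the reference level, the first by naming it and the second because ice1 is the first ordered level.

```sas
/* Hypothetical scores for three brands */
data work.taste;
   input brand $ score @@;
   datalines;
ice1 5  ice1 6  ice1 7  ice2 3  ice2 4  ice2 2  ice3 6  ice3 8  ice3 7
;
run;

/* Individual REF= option: name the reference level for one variable */
proc genmod data=work.taste;
   class brand(ref='ice1') / param=ref;
   model score = brand / dist=normal;
run;

/* Global REF= option: a keyword that applies to every CLASS variable */
proc genmod data=work.taste;
   class brand / param=ref ref=first;
   model score = brand / dist=normal;
run;
```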
276. vides an overview of generalized linear models. Myers, Montgomery, and Vining (2002) provide applications of generalized linear models in the engineering and physical sciences. Collett (2003) and Hilbe (2009) provide comprehensive accounts of generalized linear models when the responses are binary.

The analysis of correlated data arising from repeated measurements when the measurements are assumed to be multivariate normal has been studied extensively. However, the normality assumption might not always be reasonable; for example, different methodology must be used in the data analysis when the responses are discrete and correlated. Generalized estimating equations (GEEs) provide a practical method with reasonable statistical efficiency to analyze such data.

Liang and Zeger (1986) introduced GEEs as a method of dealing with correlated data when, except for the correlation among responses, the data can be modeled as a generalized linear model. For example, correlated binary and count data in many cases can be modeled in this way.

The GENMOD procedure can fit models to correlated responses by the GEE method. You can use PROC GENMOD to fit models with most of the correlation structures from Liang and Zeger (1986) by using GEEs. See Hardin and Hilbe (2003), Diggle, Liang, and Zeger (1994), and Lipsitz et al. (1994) for more details on GEEs.

Bayesian analysis of generalized linear models can be requested by using the BAYES statement in the GENMOD procedure
277. w typical the observed $W_j(x)$ is of the null distribution samples.

You can supplement the graphical inspection method with a Kolmogorov-type supremum test. Let $s_j$ be the observed value of $S_j = \sup_x |W_j(x)|$. The p-value $\Pr[S_j \ge s_j]$ is approximated by $\Pr[\hat{S}_j \ge s_j]$, where $\hat{S}_j = \sup_x |\hat{W}_j(x)|$. $\Pr[\hat{S}_j \ge s_j]$ is estimated by generating realizations of $\hat{W}_j(\cdot)$ (1,000 is the default number of realizations).

You can check the link function instead of the jth covariate by using values of the linear predictor $x_i'\hat{\beta}$ in place of values of the jth covariate $x_{ij}$. The graphical and numerical methods described previously are then sensitive to inadequacies in the link function.

An alternative aggregate of residuals is the moving sum statistic

$$W_j(x, b) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} I(x - b \le x_{ij} \le x)\, e_i$$

If you specify the keyword WINDOW(b), then the moving sum statistic with window size b is used instead of the cumulative sum of residuals, with $I(x - b \le x_{ij} \le x)$ replacing $I(x_{ij} \le x)$ in the earlier equation.

If you specify the keyword LOESS(f), loess-smoothed residuals are used in the preceding formulas, where f is the fraction of the data to be used at a given point. If f is not specified, $f = 1/3$ is used. For data $(Y_i, X_i),\ i = 1, \ldots, n$, define r as the nearest integer to nf and h as the rth smallest among $|X_i - x|,\ i = 1, \ldots, n$. Let

$$K_i(x) = K\!\left(\frac{X_i - x}{h}\right)$$

where

$$K(t) = \frac{70}{81}\left(1 - |t|^3\right)^3 I(-1 \le t \le 1)$$

Define

$$w_i(x) = K_i(x)[S_2(x) - (X_i
278. with the REPEATED statement, the test is based on a score statistic. The GEE model is fit under the constraint that the linear function of the parameters defined by the contrast is equal to 0. The score chi-square statistic is computed based on the generalized score function. See the section "Generalized Score Statistics" on page 2540 for more information.

The degrees of freedom is the number of linearly independent constraints implied by the CONTRAST statement, that is, the rank of L.

You can specify the following options after a slash (/).

E
   requests that the L matrix be displayed.

SINGULAR=number | EPSILON=number
   tunes the estimability checking. If v is a vector, define ABS(v) to be the absolute value of the element of v with the largest absolute value. Let K' be any row in the contrast matrix L. Define C to be equal to ABS(K') if ABS(K') is greater than 0; otherwise, C equals 1. If ABS(K' − K'T) is greater than C×number, then K is declared nonestimable. T is the Hermite form matrix $(X'X)^{-}(X'X)$, and $(X'X)^{-}$ represents a generalized inverse of the matrix $X'X$. The value for number must be between 0 and 1; the default value is 1E-4. The SINGULAR= option in the MODEL statement affects the computation of the generalized inverse of the matrix $X'X$. It might also be necessary to adjust this value for some data.

WALD
   requests that a Wald chi-square statistic be computed for the contrast rather than the default likelihood ratio
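A minimal sketch of a CONTRAST statement that uses two of the options above. The data set work.taste, the contrast label, and the coefficients are hypothetical. The coefficients 1 -1 0 compare the first two of three brand levels.

```sas
/* Hypothetical scores for three brands */
data work.taste;
   input brand $ score @@;
   datalines;
ice1 5  ice1 6  ice1 7  ice2 3  ice2 4  ice2 2  ice3 6  ice3 8  ice3 7
;
run;

proc genmod data=work.taste;
   class brand;
   model score = brand / dist=normal;
   /* E prints the L matrix; WALD replaces the default test with a Wald test */
   contrast 'ice1 vs ice2' brand 1 -1 0 / e wald;
run;
```

Because the contrast has a single row, the reported degrees of freedom (the rank of L) is 1.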
279. without parentheses to include deletion diagnostics for all of the parameters in the model.

Although you can use the OUTPUT statement without any keyword=name specifications, the output data set then contains only the original variables and possibly the variables Level and Value (if you use the multinomial model with ordinal data). Note that the residuals and deletion diagnostics are not available for the multinomial model with ordinal data. Some of the case deletion diagnostic statistics apply only to models for correlated data specified with a REPEATED statement. If you request these statistics for ordinary generalized linear models, the values of the corresponding variables are set to missing in the output data set. Formulas for the statistics are given in the section "Predicted Values of the Mean" on page 2527, the section "Residuals" on page 2528, and the section "Case Deletion Diagnostic Statistics" on page 2544.

The keywords allowed and the statistics they represent are as follows:

DFBETA | DBETA
   represents the effect of deleting an observation on parameter estimates. If you specify the keyword _all_ after the equal sign, variables named DFBETA_ParameterName will be included in the output data set to contain the values of the diagnostic statistic to measure the influence of deleting a single observation on the individual parameter estimates. ParameterName is the name of the regression model parameter formed from the input variable n
280. $L(\mathbf{y}, \boldsymbol{\mu}, \phi) = \sum_i \log f(y_i, \mu_i, \phi)$, where the sum is over the observations. The forms of the individual contributions $l_i = \log f(y_i, \mu_i, \phi)$ are shown in the following list; the parameterizations are expressed in terms of the mean and dispersion parameters.

For the discrete distributions (binomial, multinomial, negative binomial, and Poisson), the functions computed as the sum of the $l_i$ terms are not proper log-likelihood functions, since terms involving binomial coefficients or factorials of the observed counts are dropped from the computation of the log likelihood, and a dispersion parameter $\phi$ is included in the computation. Deletion of factorial terms and inclusion of a dispersion parameter do not affect parameter estimates or their estimated covariances for these distributions, and this is the function used in maximum likelihood estimation. The value of $\phi$ used in computing the reported log-likelihood function is either the final estimated value or the fixed value if the dispersion parameter is fixed. Even though it is not a proper log-likelihood function in all cases, the function computed as the sum of the $l_i$ terms is reported in the output as the log likelihood. The proper log-likelihood function is also computed as the sum of the $ll_i$ terms in the following list, and it is reported as the full log likelihood in the output.

• Normal:

$$ll_i = l_i = -\frac{1}{2}\left[\frac{w_i (y_i - \mu_i)^2}{\phi} + \log\left(\frac{\phi}{w_i}\right) + \log(2\pi)\right]$$
281. y bad vb

By default, the response is sorted in increasing ASCII order. Always check the "Response Profiles" table to verify that response levels are appropriately ordered. The TYPE1 option requests a Type 1 test for the significance of the covariate brand.

If $\gamma_j(x) = \Pr(\text{taste} \le j)$ is the cumulative probability of the jth or lower taste category, then the odds ratio comparing $x_1$ to $x_2$ is as follows:

$$\frac{\gamma_j(x_1)/(1 - \gamma_j(x_1))}{\gamma_j(x_2)/(1 - \gamma_j(x_2))} = \exp[(x_1 - x_2)'\beta]$$

See McCullagh and Nelder (1989, Chapter 5) for details on the cumulative logit model. The ESTIMATE statements compute log odds ratios comparing each of the brands. The EXP option in the ESTIMATE statements exponentiates the log odds ratios to form odds ratio estimates. Standard errors and confidence intervals are also computed. Output 37.4.1 displays general information about the model and data, the levels of the CLASS variable brand, and the total number of occurrences of the ordered levels of the response variable taste.

Output 37.4.1 Ordinal Model Information

The GENMOD Procedure

Model Information
Data Set                     WORK.ICECREAM
Distribution                 Multinomial
Link Function                Cumulative Logit
Dependent Variable           taste
Frequency Weight Variable    count

Class Level Information
Class    Levels    Values
brand    3         ice1 ice2 ice3

Response Profile
Ordered Value    taste    Total Frequency
1                vg       140
2                g        162
3                m        421
4                b        156
5                vb       166
282. y default, FCONV=1E-8. You can also specify the ABSFCONV= and XCONV= criteria; if more than one criterion is specified, then optimizations are terminated as soon as one criterion is satisfied.

MAXTIME=seconds
   specifies the maximum clock time (in seconds) that PROC GENMOD can use to calculate the exact distributions. If the limit is exceeded, the procedure halts all computations and prints a note to the SAS log. The default maximum clock time is seven days.

METHOD=keyword
   specifies which exact conditional algorithm to use for every EXACT statement specified. You can specify one of the following keywords:

   DIRECT
      invokes the multivariate shift algorithm of Hirji, Mehta, and Patel (1987). This method directly builds the exact distribution, but it can require an excessive amount of memory in its intermediate stages. METHOD=DIRECT is invoked by default when you are conditioning out at most the intercept.

   NETWORK
      invokes an algorithm described in Mehta, Patel, and Senchaudhuri (1992). This method builds a network for each parameter that you are conditioning out, combines the networks, then uses the multivariate shift algorithm to create the exact distribution. The NETWORK method can be faster and require less memory than the DIRECT method. The NETWORK method is invoked by default for most analyses.

   NETWORKMC
      invokes the hybrid network and Monte Carlo algorithm of Mehta, Patel, and Senchaudhuri (1992). This method creates a network, then samples from that
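The following sketch shows where these settings go in a program. The data set work.events, its variables, and the 60-second limit are hypothetical values chosen for illustration. EXACTOPTIONS is a separate statement, and its options are not preceded by a slash.

```sas
/* Hypothetical event counts y out of exposure n for a binary covariate x */
data work.events;
   input x n y;
   lnN = log(n);          /* offset variable for modeling the rate y/n */
   datalines;
0 50 2
0 60 3
1 40 6
1 55 9
;
run;

proc genmod data=work.events;
   model y = x / dist=poisson link=log offset=lnN;
   exact x / estimate;                        /* exact conditional analysis for x   */
   exactoptions method=network maxtime=60;    /* algorithm choice; stop after 60 s  */
run;
```

A short MAXTIME= limit like this can be a useful guard while experimenting, since the default limit is seven days.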
283. y less expensive than likelihood ratio statistics, but it is thought to be less accurate because the specified significance level of hypothesis tests based on the Wald statistic might not be as close to the actual significance level as it is for likelihood ratio tests.

A Type 3 analysis generalizes the use of Type III estimable functions in linear models. Briefly, a Type III estimable function (contrast) for an effect is a linear function of the model parameters that involves the parameters of the effect and any interactions with that effect. A test of the hypothesis that the Type III contrast for a main effect is equal to 0 is intended to test the significance of the main effect in the presence of interactions. See Chapter 39, "The GLM Procedure," and Chapter 15, "The Four Types of Estimable Functions," for more information about Type III estimable functions. Also refer to Littell, Freund, and Spector (1991).

Additional features of the GENMOD procedure include the following:

• likelihood ratio statistics for user-defined contrasts (that is, linear functions of the parameters) and p-values based on their asymptotic chi-square distributions
• estimated values, standard errors, and confidence limits for user-defined contrasts and least squares means
• ability to create a SAS data set corresponding to most tables displayed by the procedure (see Table 37.8 and Table 37.9)
• confidence intervals for model parameters based on either the profile likel
