
Advanced Regression Analyses Tutorial

1. [Figure 4-4: The Completed Options Dialog Box]
   5. Click OK to redisplay the Analysis Summary shown in Figure 4-5. The values shown in the second ANOVA table match the results in the Milliken and Johnson (1984) study.
   References
   Milliken, G. A. and Johnson, D. E. (1984). Analysis of Messy Data, Volume 1: Designed Experiments. New York: Van Nostrand Reinhold.
   [Figure 4-5: Redisplay of the Analysis Summary with Results that Match Milliken and Johnson (1984) (Analysis of Variance for yield; columns: Source, Sum of Squares, Df, Mean Square, F-Ratio, P-Value)]
   TUTORIAL 5: Creating and Using Repeated Measures Designs
   Repeated measures designs, like split-plot designs, contain structures that involve more than one size of experimental unit. For example, you might measure
2. [Figure 2-3: The Table of Results for the Multiple Range Tests (Multiple Comparisons for Sand by Location; method: 95.0 percent LSD; listed means 7.65167, 14.6417, 29.8167, 37.6417; * denotes a statistically significant difference)]
   First, look at the results for Sand by Location. The table illustrates that the mean for each of the four locations is significantly different from each of the other means. Therefore, each location forms a homogeneous group by itself. Also notice that in the second half of the table the program lists the pairwise differences, which are known as contrasts. If they are significantly different from zero, they are marked with asterisks.
   To this point, all the steps you have taken and the results you have generated could have been accomplished using the Multifactor ANOVA Analysis in the base program. The General Linear Models Analysis lets you formulate your own contrasts to test a more complicated hypothesis. For example: Is the estimated mean for Sand, when averaged over the first and third locations, significantly different from the mean for the fourth location? To test this hypothesis, use Pane Options.
   Click the right mouse button on the text pane, then the left on Pane Options to display the Multiple Comparisons Options dialog box. Click the User Specified button, accept the defaults for the remaining options on the dialog box, and click OK to display the Hy
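The user-specified contrast described above (the average of the first and third locations compared with the fourth) can be sketched outside the program as a weighted sum of group means. This is only an illustration: the function name is hypothetical, and the four means from Figure 2-3 are assumed here to be listed in location order, which the table may not guarantee. It produces the contrast estimate only, not the significance test the program reports.

```python
def contrast_estimate(means, coefficients):
    """Return the estimated value of a linear contrast of group means."""
    if len(means) != len(coefficients):
        raise ValueError("need one coefficient per group mean")
    if abs(sum(coefficients)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    return sum(c * m for c, m in zip(coefficients, means))


# Assumption: the four means are in location order 1..4.
# The results table may actually sort them by size instead.
location_means = [7.65167, 14.6417, 29.8167, 37.6417]

# Average of locations 1 and 3, minus location 4.
coefficients = [0.5, 0.0, 0.5, -1.0]

estimate = contrast_estimate(location_means, coefficients)
print(f"contrast estimate: {estimate:.4f}")
```

A contrast whose estimate is far from zero relative to its standard error would be flagged as significant; the standard error depends on the model's error mean square and is not computed in this sketch.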
3. [Figure 11-5: Completed Y-Axis Tab Page]
   0 at about 21. The price reduction that gives a probability of redemption greater than 0.5 is at least 21. Now you will create the confidence intervals for the estimated coefficients.
   Creating Confidence Intervals
   1. Click the Tabular Options button to display the dialog box, then click the Confidence Intervals check box and OK to display the table. Maximize the table (see Figure 11-7). The confidence intervals for the coefficient estimates show that neither interval includes 0, so each parameter is significant at the 95 percent confidence level. The confidence intervals for the odds ratios indicate that, for the long term, a range of about 7 to 13 percent will cover 95 percent of the estimated odds ratios.
   [Figure 11-6: The Logit Plot with New Y-Axis Scaling (Plot of Fitted Model; x-axis: reduction)]
   [Figure 11-7: The Confidence Intervals Table]
   At this point it would be helpful to look at the predictions.
   Creating and Viewing Predictions
   1. Click the Tabular Options button to display the dialog box
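The link between the coefficient interval and the "about 7 to 13 percent" odds-ratio range can be sketched as follows: in a logistic model, exponentiating a slope gives the odds ratio for a one-unit increase in the predictor. The interval bounds below are rounded values assumed from the partly legible Figure 11-7, and the function name is hypothetical.

```python
import math


def odds_ratio_interval(lower, upper):
    """Convert a confidence interval for a logistic slope to the odds-ratio scale."""
    return math.exp(lower), math.exp(upper)


# Assumed, rounded bounds for the slope of the price-reduction variable.
slope_lower, slope_upper = 0.0695, 0.1240

or_lower, or_upper = odds_ratio_interval(slope_lower, slope_upper)

# Percent change in the odds of redemption per one-unit increase in reduction.
pct_lower = (or_lower - 1.0) * 100.0
pct_upper = (or_upper - 1.0) * 100.0
print(f"odds ratio: {or_lower:.3f} to {or_upper:.3f}")
print(f"percent increase in odds: {pct_lower:.1f} to {pct_upper:.1f}")
```

Because the exponential function is increasing, the endpoints of the slope interval map directly to the endpoints of the odds-ratio interval.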
4. [Figure 10-12: Redisplayed Analysis Summary After Using New Ridge Parameter (dependent variable: body fat; R-Squared = 77.2602 percent; R-Squared adjusted for d.f. = 72.9965 percent)]
   References
   Draper, N. and Smith, H. (1981). Applied Regression Analysis, second edition. New York: John Wiley & Sons.
   Myers, R. H. (1990). Classical and Modern Regression with Applications. Belmont, California: Duxbury Press.
   Neter, J., Kutner, M. H., Nachtsheim, C. J., and Wasserman, W. (1996). Applied Linear Statistical Models, fourth edition. Chicago: Richard D. Irwin, Inc.
   Vogt, W. P. (1993). Dictionary of Statistics and Methodology. New York: Sage Publications.
   TUTORIAL 11: Analyzing Coupon Redemption Rate with Logistic Regression
   This tutorial is a study of the effectiveness of price-reduction coupons on a given product using logistic regression; it is adapted from Neter et al. (1996). The analysts selected 1,000 homes and mailed product advertising material and coupons to each home. Two hundred selected homes were randomly assigned to each of the price reduction categories. The coupons offered five price reductions: 5, 10, 15, 20, and 30. The explanatory variable for this study was the X variab
5. Click Enter to enter the nested factor on the next line.
   Click A in the Factors list box, then click the arrow button to move the factor to the Effects list box.
   Click the left parenthesis (Nest) to move the left parenthesis to the right of the A factor in the Effects list box.
   Click B in the Factors list box, then click the arrow button to move the factor to the Effects list box.
   Click the right parenthesis (Nest) to move the right parenthesis to the right of the B factor in the Effects list box (see Figure 3-3).
   [Figure 3-3: The Completed GLM Model Specification Dialog Box (Factors: A = Chamber, B = Temperatur, C = Gender)]
   If you make an error, highlight the text you want to remove, then click the Delete button.
   Click OK to display the Analysis Summary and the Scatterplot in the Analysis window.
   Maximize the Analysis Summary (see Figure 3-4). The values for the mean squares agree with the results shown in Milliken and Johnson (1984).
   To see the results of the Temperatur*Gender interaction, you will create an Interaction Plot.
   Click the Graphical Options button to display the dialog box, then the Interaction Plot check box and OK to display the Interaction Plot in a graphics pane.
   Maximize the plot (see Figure 3-5). The interaction effect is shown by the crossed lines on the plot. Th
6. Finding Good Values for Parameters
   1. Choose SPECIAL ADVANCED REGRESSION RIDGE REGRESSION from the Menu bar to display the Analysis dialog box.
   Enter Bodyfat into the Dependent Variable text box.
   Enter Triceps, Thigh, and Midarm into the Independent Variables text box (see Figure 10-1).
   Click OK to display the Analysis Summary and the Ridge Trace in the Analysis window.
   Maximize the Analysis Summary (see Figure 10-2).
   [Figure 10-1: Completed Analysis Dialog Box]
   [Figure 10-2: The Analysis Summary (Ridge Regression; dependent variable: body fat; R-Squared = 80.1359 percent)]
   The Analysis Summary displays the natural (Unstandardized) regression coefficient estimates, which correspond to the ridge parameter theta = 0. The large variance inflation factors are due to high correlation among the three independent variables: Triceps, Thigh, and Midarm. The c
7. [Figure 6-2: The Analysis Summary Results (Calibration Models: Absorbence versus Concent; linear model Y = a + b*X)]
   Source          Sum of Squares   Df   Mean Square    F-Ratio   P-Value
   Model           0.114583          1   0.114583       155.08    0.0011
   Residual        0.00221667        3   0.000738889
   Total (Corr.)   0.1168            4
   2. Click the Include Constant check box to turn the option off, as shown in Figure 6-3.
   3. Click OK to recalculate and redisplay the data with the constant removed from the model (see Figure 6-4). The results are significant, so you will create a plot of the model and look at the calibration line.
   Plotting the Model
   1. Minimize the Analysis Summary and maximize the Plot of Fitted Model (see Figure 6-5). Notice that when you remove the constant from the model, the prediction limits are not parallel; instead the lower values for the concentration are slightly
   [Figure 6-3: The Calibration Model Options Dialog Box]
   [Calibration Models output: Absorbence versus Concent]
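Turning off the Include Constant option forces the calibration line through the origin. A minimal sketch of that fit, assuming ordinary least squares and using made-up data (the tutorial's Absorbence and Concent values are not legible in this excerpt), is:

```python
def slope_through_origin(x, y):
    """Least-squares slope b for the no-constant model y = b * x."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    sum_xx = sum(xi * xi for xi in x)
    if sum_xx == 0:
        raise ValueError("x must contain at least one nonzero value")
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    return sum_xy / sum_xx


# Hypothetical calibration data, for illustration only.
concent = [1.0, 2.0, 3.0, 4.0, 5.0]
absorbence = [2.1, 3.9, 6.2, 7.8, 10.1]

b = slope_through_origin(concent, absorbence)
print(f"fitted model: Absorbence = {b:.4f} * Concent")
```

With no constant term, the slope is the ratio of the sum of cross-products to the sum of squared X values, rather than the usual centered formula.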
8. SEPTEMBER 1999
   Introduction
   The manual of tutorials for the Advanced Regression analyses in STATGRAPHICS Plus is broken into two parts. Part I consists of five individual tutorials that all pertain to the General Linear Models Analysis. Part II consists of six individual tutorials, one each for the remaining analyses. For information about advanced regression in general, see the section Overview of the Model-Building Process in Chapter 1 of the online Advanced Regression User Manual, as well as the online help system.
   Tutorials in Part I
   The tutorials for the General Linear Models Analysis are:
   • Using Two Covariates in a Two-Way Analysis of Variance
   • Using MANOVA and Entering User-Specified Contrasts
   • Using Nested and Crossed Factors in a Model
   • Creating and Using a Split-Plot Design
   • Creating and Using Repeated Measures Designs
   TUTORIAL 1: Using Two Covariates in a Two-Way Analysis of Variance
   This tutorial uses sample data to illustrate using two covariates in a two-way analysis of variance. The purpose of the tutorial is to:
   • illustrate how to use the Select text box on the General Linear Models Analysis dialog box to select or remove a random sample
   • create an Interaction Plot
   • create a Table of Least Squares Means and a Means Plot
   • test for differences among group means
   To begin the tutorial, open STATGRAPHICS Plus and the Cardata data file.
   Completing the Analysis Dialog Box
   1. Choose SPECIAL ADVANC
9. [Figure 8-10: Mallows' Cp Plot with the X-Axis Rescaled (x-axis: Number of Coefficients)]
   Optional Exercise
   As an optional exercise, generate the Adjusted R-Squared, R-Squared, and MSE plots. All three plots illustrate that adding the fourth variable provides little or no improvement.
   References
   Draper, N. and Smith, H. (1981). Applied Regression Analysis, second edition. New York: John Wiley & Sons.
   Mallows, C. L. (1973). Some Comments on Cp. Technometrics 15, 661-675.
   Mallows, C. L. (1995). More Comments on Cp. Technometrics 37, 362-372.
   Montgomery, D. C. and Peck, E. A. (1992). Introduction to Linear Regression Analysis, second edition. New York: John Wiley & Sons.
   Myers, R. H. (1990). Classical and Modern Regression with Applications, second edition. Belmont, California: Duxbury Press.
   Neter, J., Kutner, M. H., Nachtsheim, C. J., and Wasserman, W. (1996). Applied Linear Statistical Models, fourth edition. Chicago: Richard D. Irwin, Inc.
   TUTORIAL 9: West Virginia Mining Excavation Study
   This tutorial is adapted from Myers (1990), who used data collected by the Mining Engineering Department and analyzed by the Statistical Consulting Center at Virginia Polytechnic Institute and State University, Blacksburg, Virginia (1982). In the study, Myers noted that a major problem connected with mining projects was ground sinking above the excavation, or subsi
10. [Figure 9-7: Analysis Summary Results Recalculated Using New Iterations]
   Interpreting Data on Plots
   Now you will create a Plot of Fitted Model to see if there is a relationship between the Drawangl and Width variables.
   Minimize the Analysis Summary, then maximize the Plot of Fitted Model (see Figure 9-8).
   [Figure 9-8: Plot of Fitted Model (x-axis: width; depth held at 750.0)]
   The plot shows that the relationship between the Drawangl and Width variables is almost linear over the range of Width when Depth is held at 750, which is the middle of the Depth range. Now you will use the options to plot the function versus the other variable.
   Click the right mouse button, then the left on Pane Options to display the Plot of Fitted Model Options dialog box.
   Click the Depth and Width check boxes to select and deselect them, respectively (see Figure 9-9). Notice that when you click the check boxes, the Low, High, and Hold text boxes switch between active and inactive.
   Click OK to display the Plot of Fitted Model (see Figure 9-10). The plot is nonlinear but shows a monotonic decrease in the variable Drawangl as Depth increases over the range of data. Now you will create a Response Surface Plot to see both of these effects at the same time.
   [Plot of Fitted Model Options dialog box (columns: Low, High, Hold; variables: width, depth)]
11. Influential Data and Sources of Collinearity. New York: John Wiley and Sons.
   Chatterjee, S. and Price, B. (1991). Regression Analysis by Example, second edition. New York: John Wiley & Sons.
   Draper, N. R. and Smith, H. (1981). Applied Regression Analysis, second edition. New York: John Wiley & Sons.
   Durbin, J. and Watson, G. S. (1951). Testing for Serial Correlation in Least Squares Regression. Biometrika 38.
   Montgomery, D. C. (1991). Design and Analysis of Experiments, third edition. New York: John Wiley & Sons.
   Myers, R. H. (1990). Classical and Modern Regression with Applications, second edition. Belmont, California: Duxbury Press.
   Neter, J., Kutner, M. H., Nachtsheim, C. J., and Wasserman, W. (1996). Applied Linear Statistical Models, fourth edition. Chicago: Richard D. Irwin, Inc.
   Vogt, W. P. (1993). Dictionary of Statistics and Methodology. Newbury Park, California: Sage Publications.
   TUTORIAL 8: Illustrating Model-Building Techniques
   This tutorial illustrates model-building techniques by working through the surgical unit example in Neter et al. (1996). The simple example is based on an exploratory observational study that contained four potential explanatory variables. Limiting the number of potential explanatory variables helps to illustrate the process. The focus of the example was predicting the survival rate for patients who were undergoing a particular type of liver surgery. The hospital surgical unit randomly sele
12. [Figure 9-9: Completed Plot of Fitted Model Options Dialog Box]
   6. Click the Graphical Options button to display the dialog box, then the Response Surface Plot check box and OK to display the plot in the second graphics pane.
   7. Maximize the plot (see Figure 9-11). The plot shows the fitted surface. Now create a Square Plot to see the results in yet another way.
   8. Click the right mouse button on the graphics pane, then the left on Pane Options to display the Response Plot Options dialog box.
   9. Click the Square Plot check box; leave the remaining options on the dialog box as they are (see Figure 9-12).
   10. Click OK to replace the Surface Plot with the Square Plot (see Figure 9-13).
   [Figure 9-10: Plot of Fitted Model (x-axis: depth)]
   [Figure 9-11: The Response Surface Plot]
   [Figure 9-12: Completed Response Plot Options Dialog Box]
   [Square Plot for drawangl] Fi
13. in a two-way treatment structure based on each person's comfort level. The participants were each randomly assigned to three of nine available environmental chambers, numbered 1 to 3 for each of the three temperatures. In the tutorial, you will analyze the environmental chambers nested within temperatures, as well as the effects of the Temperatur*Gender interaction, which is a between-person comparison.
   Before you begin, open STATGRAPHICS Plus and the Comfort data file.
   Completing the General Linear Models Analysis Dialog Box
   1. Choose SPECIAL ADVANCED REGRESSION GENERAL LINEAR MODELS from the Menu bar to display the General Linear Models Analysis dialog box.
   Enter Comfort into the Dependent Variables text box.
   Enter Chamber, Temperatur, and Gender into the Categorical Factors text box (see Figure 3-1).
   Click OK to display the GLM Model Specification dialog box (see Figure 3-2).
   [Figure 3-1: The Completed General Linear Models Analysis Dialog Box]
   [Figure 3-2: GLM Model Specification Dialog Box (Factors: A = Chamber, B = Temperatur, C = Gender)]
   Creating and Analyzing Nested Effects
   You can u
14. (see Figure 10-7). Figure 10-7 makes it easier to see the meaning of the stabilized coefficients: the lines for the three variables Triceps, Thigh, and Midarm that become horizontal and parallel. As an optional exercise, you can use the Ridge Trace Options dialog box to change the coefficients to Unstandardized and to compare the graphical results with the results in the Regression Coefficients Table. Now create the Variance Inflation Factors Plot.
   [Variance Inflation Factors table output]
   [Figure 10-7: Ridge Trace (x-axis: Ridge parameter; y-axis: Coefficient; variables: triceps, midarm, thigh)]
   Click the Graphical Options button to display the dialog box, then click the Variance Inflation Factors option and OK to display the plot.
   Maximize the plot (see Figure 10-8).
15. then click the Predictions check box and OK to display the table.
   Maximize the table. The program calculated the predictions using the default values, but you decide to change them to shorten the table and to eliminate the display of prediction performance results for the tails of the logistic curve.
   Click the right mouse button, then the left on Pane Options to display the Predictions Options dialog box.
   Enter 0.2 into the From text box and 0.8 into the To text box, accept the default in the By text box, then click the All Values option to change the values that will display in the Predictions Table. Accept the default in the Confidence Level text box (see Figure 11-8).
   [Figure 11-8: Completed Predictions Options Dialog Box]
   Click OK to recalculate and redisplay the Predictions Table (see Figure 11-9). The Predictions Performance Table shows how well the function performs at various cutoff values when you use it to predict True or False (Success/Failure) for the estimation data. If a prediction of success is made whenever the fitted value is greater than 0.5, 68.2 percent of the sample data are correctly classified. To graphically view the model's capability of correctly predicting success or failure, you will examine two more plots: the Prediction Capability Plot and Pred
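The idea behind the Predictions Performance Table, classifying a case as a success when its fitted probability exceeds a cutoff and then counting how many cases are classified correctly, can be sketched as follows. The fitted values and outcomes here are hypothetical, not the coupon data, and the function name is an assumption.

```python
def percent_correct(fitted, outcomes, cutoff=0.5):
    """Percent of cases classified correctly when success is predicted for fitted > cutoff."""
    if len(fitted) != len(outcomes) or not fitted:
        raise ValueError("fitted and outcomes must be non-empty and of equal length")
    hits = sum((p > cutoff) == bool(y) for p, y in zip(fitted, outcomes))
    return 100.0 * hits / len(fitted)


# Hypothetical fitted probabilities and observed outcomes (1 = success, 0 = failure).
fitted = [0.10, 0.25, 0.40, 0.45, 0.55, 0.60, 0.75, 0.90]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]

for cutoff in (0.2, 0.5, 0.8):
    print(f"cutoff {cutoff:.1f}: {percent_correct(fitted, outcomes, cutoff):.1f} percent correct")
```

Scanning several cutoffs in this way mirrors how the table lets you compare classification performance between the From and To values.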
16. value of Type. For example, when Type = Mutual, the model reduces to Time = 33.8384 - 0.101531*Size, because the other two terms become zero. Similarly, when Type = Stock, the model reduces to Time = 41.9696 - 0.101948*Size. The R-Squared statistic indicates that the model as it was fitted explains 90.5061 percent of the variability in the Time variable. The Adjusted R-Squared statistic, which is more suitable for comparing models that have different numbers of independent variables, is 87.5385 percent. For an explanation of the other statistics, read the explanation offered by the StatAdvisor.
   Now compare the data from the two analyses. Notice that the values in the second Analysis Summary for the R-Squared, Adjusted R-Squared, Standard Error of Estimate, Mean Absolute Error, and Durbin-Watson statistics have all improved over the values in the first analysis (refer to Figure 7-2). Also notice that the value for the parameter Size*Type=Stock (-0.000417141) is very close to zero and has a p-value of 0.9821, which means that it is an unnecessary term; the slopes are virtually equal.
   To test for statistically significant differences between the two intercepts and the two slopes, you will create a Conditional Sums of Squares Table.
   6. Click the Tabular Options button, then the Conditional Sums of Squares check box and OK to display the table in the text pane.
   7. Maximize the table (see Figure 7-8).
   [Conditional Sums of Squares table output]
17. [Figure 7-8: The Conditional Sums of Squares Table Showing the Statistically Significant Differences among the Intercepts]
   Refining the Model
   The p-value for the intercepts is less than 0.01, which indicates that there is a statistically significant difference among the intercepts at the 99 percent confidence level. However, the p-value for the slopes is greater than 0.10, which indicates that there is not a statistically significant difference between the slopes for the two values of the Type variable at the 90 percent or higher confidence level. Because the slopes are not significantly different, it makes sense to simplify the model by forcing equal slopes.
   1. Click the right mouse button on the Conditional Sums of Squares pane, then the left on Analysis Options to display the Comparison of Regression Lines Options dialog box.
   2. Click the Assume Equal Slopes check box, then OK to recalculate and redisplay the table with slopes equal (see Figure 7-9).
   [Figure 7-9: The Conditional Sums of Squares for the Assume Equal Slopes Option (Comparison of Regression Lines: time versus size by type; Further ANOVA for Variables in the Order Fitted)]
   The conclusion for this section of the tutorial is that only the intercepts differ significantly; therefore, you can conclude that there is an additive relationship due t
18. [Figure 9-16: The Recalculated Reports Table]
   The table presents the results from the nonlinear regression equation. The conclusions show the confidence limits to be somewhat wide, so you will save these results for use after you complete further research.
   Saving the Results
   What is noteworthy here is that you can save the Coefficients (the parameter estimates). In addition, you can save the function as a character variable, so you will be able to edit and use it in future analyses.
   Click the Save Results button (the fourth button from the left) to display the Save Results Options dialog box.
   Click the check boxes for the following options: Predicted Values, Lower Limits for Predictions, Upper Limits for Predictions, Coefficients, and Function. You want to save the function, so you will change the name in the Target Variables
19. DataSheet, minimize it.
   [Figure 5-1: The DataSheet for the Heart Data File]
   Completing the General Linear Models Analysis Dialog Box
   1. Choose SPECIAL ADVANCED REGRESSION GENERAL LINEAR MODELS from the Menu bar to display the Analysis dialog box.
   2. Enter Response into the Dependent Variables text box.
   3. Enter Drug, Person, and Time into the Categorical Factors text box (see Figure 5-2).
   4. Click OK to display the GLM Model Specification dialog box.
   Entering the Model Effects
   Now you will specify the model effect B(A), which is Person within Drug. However, first you will add the Time*Drug interaction, C*A, because you suspect that it might also be important.
   1. Position and click the mouse pointer under C in the Effects list box.
   [Figure 5-2: The Completed General Linear Models Analysis Dialog Box]
   Click C in the Factors list box, then the arrow button to move the factor to the Effects list box.
   Click the asterisk (Cross) button to place the asterisk to th
20. Plot of Fitted Model Options dialog box.
   Click the Confidence Limits check box to turn it off, then click the X predictions option, enter 0.352 into the At text box, and accept the defaults for the remaining options (see Figure 6-8).
   Click OK to plot the prediction limits and the concentration values and redisplay the plot (see Figure 6-9). The results shown in the plot are fine, but the chemist decides it would be helpful to have a table that lists several predictions at the same time. To do this, create a Predictions Table.
   [Figure 6-8: Completed Dialog Box for Predicting X]
   [Figure 6-9: The Plot of Fitted Model with Confidence Limits and Prediction Lines Added (y-axis: Absorbence; x-axis: Concent)]
   Creating a Predictions Table
   1. Click the Tabular Options button to display the dialog box, then click the Predictions check box and OK to display the Predictions Table in the second text pane.
   Maximize the table (see Figure 6-10).
   [Predicted Values table output: Predicted Values for Y, Prediction Limits, Confidence Limits]
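Predicting X at a measured Y of 0.352 is an inverse prediction: the fitted calibration line is solved for the concentration. A minimal sketch of the point estimate follows; the slope value is hypothetical (the fitted coefficient is not legible in this excerpt), the function name is an assumption, and the prediction limits the program draws are not reproduced.

```python
def predict_x(y_observed, slope, intercept=0.0):
    """Solve y = intercept + slope * x for x (point estimate only)."""
    if slope == 0:
        raise ValueError("slope must be nonzero to invert the calibration line")
    return (y_observed - intercept) / slope


# Hypothetical slope for a no-constant calibration model Absorbence = b * Concent.
hypothetical_slope = 44.0

x_hat = predict_x(0.352, hypothetical_slope)
print(f"estimated concentration at Absorbence = 0.352: {x_hat:.4f}")
```

Passing a nonzero `intercept` covers the case where the constant is left in the model.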
21. R. H. (1990). Classical and Modern Regression with Applications, second edition. Belmont, California: Duxbury Press.
   Neter, J., Kutner, M. H., Nachtsheim, C. J., and Wasserman, W. (1996). Applied Linear Statistical Models, fourth edition. Chicago: Richard D. Irwin, Inc.
22. [Figure 8-2: Completed Dialog Box]
   Click OK to remove the single-variable models from the analysis and to redisplay the summary (see Figure 8-3).
   [Figure 8-3: The Analysis Summary Redisplayed (Regression Model Selection for logsurvive; independent variables: A = clotting, B = prognost, C = enzyme, D = liver)]
   The Analysis Summary lists the results of fitting various multiple regression models to describe the relationship between the dependent variable and the different possible subsets of explanatory variables. The models that were fit contain all possible combinations of two to four variables; the four single-variable models were removed. To determine which of these models is best, you will use the other tabular options.
   Click the Tabular Options button to display the dialog box, click the All button to choose the remaining options, then OK to display the three tabular options in the text panes. The remaining options are Best Adjusted R-Squared and Best Cp.
   Maximize the Best Adjusted R-Squared Table, the second text pane (see Figure 8-4).
   [Regression Model Selection output for logsurvive: Models with Largest Adjusted R-Squared]
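The candidate models described above (every combination of two to four of the explanatory variables) can be enumerated with a short sketch. The variable names follow the Analysis Summary; the helper function is hypothetical, and no models are actually fitted here.

```python
from itertools import combinations

variables = ["clotting", "prognost", "enzyme", "liver"]


def candidate_subsets(names, min_size, max_size):
    """List every subset of names whose size is between min_size and max_size."""
    return [subset
            for size in range(min_size, max_size + 1)
            for subset in combinations(names, size)]


# All subsets, including the constant-only model (size 0): 2**4 of them.
all_models = candidate_subsets(variables, 0, 4)

# Subsets that remain when only two to four variables are allowed.
kept_models = candidate_subsets(variables, 2, 4)

print(len(all_models), "possible subsets in total")
print(len(kept_models), "subsets with two to four variables")
for subset in kept_models:
    print(", ".join(subset))
```

Each subset would then be scored by a criterion such as Adjusted R-Squared or Mallows' Cp to choose among them.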
23. are small. Do this by creating a Variance Inflation Factors Table.
   Creating a Variance Inflation Factors Table
   1. Click the Tabular Options button, then the Variance Inflation Factors check box and OK to display the Variance Inflation Factors Table in the fourth text pane. To get a closer look at the values near 0.02, change the maximum value for the ridge parameter.
   Maximize the table.
   Click the right mouse button on the text pane, then the left on Analysis Options to display the Ridge Regression Options dialog box.
   Enter 0.04 into the Maximum text box (see Figure 10-5).
   [Figure 10-5: Completed Ridge Regression Options Dialog Box]
   Click OK to recalculate the parameter and redisplay the table (see Figure 10-6).
   Two items in Figure 10-6 are worth noting. At a value of 0.02 for the ridge parameter, the R-Squared value of 77.26 indicates that there is little decrease in the R-Squared statistic compared with the gain in stabilization of the coefficient estimates. Most analysts would be willing to accept this tradeoff. Also note that now all the values for the variance inflation factors are near 1, which is desirable. To see these results graphically, look first at the Ridge Trace, then the Variance Inflation Factors Plot.
   Minimize the Variance Inflation Factors Table, then maximize the Ridge Trace
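How variance inflation factors shrink as the ridge parameter grows can be sketched with a commonly used definition: for a predictor correlation matrix R and ridge parameter k, the VIFs are the diagonal of (R + kI)^-1 R (R + kI)^-1. Whether the program computes them exactly this way is an assumption, and the correlation matrix below is hypothetical rather than the body fat data.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def ridge_vifs(corr, k):
    """Diagonal of (R + kI)^-1 R (R + kI)^-1 for correlation matrix R."""
    n = len(corr)
    shifted = [[corr[i][j] + (k if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    inv = invert(shifted)
    product = matmul(matmul(inv, corr), inv)
    return [product[i][i] for i in range(n)]


# Hypothetical correlation matrix for three highly correlated predictors.
corr = [[1.0, 0.9, 0.9],
        [0.9, 1.0, 0.9],
        [0.9, 0.9, 1.0]]

for k in (0.0, 0.02, 0.04):
    print(k, [round(v, 3) for v in ridge_vifs(corr, k)])
```

At k = 0 the values reduce to the ordinary least-squares VIFs (the diagonal of the inverse correlation matrix), and they fall as k increases.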
24. difficult if unsorted data had been used.
   Determining How Groups Affect Regression
   In this portion of the tutorial, you will determine exactly how regression lines differ between the groups.
   1. Minimize the Autocorrelation Function Plot.
   2. Maximize the Analysis Summary.
   3. Click the right mouse button, then the left on Analysis Options to display the Comparison of Regression Lines Options dialog box.
   4. Click the Assume Equal Intercepts and Assume Equal Slopes check boxes to turn off both options, allowing a separate regression line to be fit for each group.
   5. Click OK to redisplay the Analysis Summary (see Figure 7-7). The results describe the relationship among the variables Time, Size, and Type. The equation for the fitted model is
      Time = 33.8384 - 0.101531*Size + 8.13125*(Type=Stock) - 0.000417141*Size*(Type=Stock)
   [Figure 7-7: The Analysis Summary Results for Fitting a Linear Regression Model (Comparison of Regression Lines: time versus size by type)]
   where the term Type=Stock is the indicator variable that takes the value 1 if true and 0 if false. This corresponds to two separate lines, one for each
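The way the indicator variable splits the fitted model into one line per group can be sketched as follows. The coefficient magnitudes come from the tutorial; the minus signs on the two Size terms are an assumption, since the operators are not legible in the extracted text. They were chosen so that the Stock line has intercept 41.9696 and slope magnitude 0.101948. The function name is hypothetical.

```python
def predicted_time(size, is_stock):
    """Evaluate the fitted model with a 0/1 indicator for Type = Stock."""
    indicator = 1.0 if is_stock else 0.0
    return (33.8384
            - 0.101531 * size               # sign assumed
            + 8.13125 * indicator
            - 0.000417141 * size * indicator)  # sign assumed


# Intercept and slope of each group's line, recovered from two evaluations.
for label, is_stock in (("Mutual", False), ("Stock", True)):
    intercept = predicted_time(0.0, is_stock)
    slope = predicted_time(1.0, is_stock) - intercept
    print(f"{label}: Time = {intercept:.4f} + ({slope:.6f})*Size")
```

When the indicator is 0, the last two terms vanish and only the Mutual line remains; when it is 1, they shift both the intercept and the slope.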
25. one subject over a timeframe where time is one of the factors in the treatment structure. In repeated measures designs, you cannot randomly assign levels of factors such as time, which means that the errors that correspond to the experimental units may be correlated.
   This tutorial, adapted from Milliken and Johnson (1984), illustrates how you create and enter data for a repeated measures design, and how you enter model effects, change error terms, and create the subsequent report and plot.
   The Milliken and Johnson investigation examines the effects of three drugs, each of which was administered to eight subjects. The researchers measured the heart rate for each subject every five minutes for four time intervals. The larger experimental unit is the subject; the smaller is the time interval.
   Before you begin, open STATGRAPHICS Plus and the Heart data file. When the DataSheet appears, notice how the entries are coded by scrolling through the file. Notice that the Drug column contains three different drugs: ax23, bww9, and the control (see Figure 5-1).
   In the analysis, the Person variable is nested within the Drug factor. This is because Subject 1 for the ax23 drug is not the same as Subject 1 for the bww9 drug or the same as Subject 1 for the control drug. Although you could code each subject with a different number (1 to 24), coding them 1 through 8 within each drug, then specifying them as a nested factor in the model, is faster and easier. After you view the
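The two coding schemes mentioned above (subjects numbered 1 through 8 within each drug versus 1 to 24 overall) carry the same information, which is why a nested factor is needed with the first scheme. A small sketch, with a hypothetical helper and the drug order taken as listed:

```python
drugs = ["ax23", "bww9", "control"]
subjects_per_drug = 8


def unique_subject_id(drug, person):
    """Map a (drug, person-within-drug) pair to a single id from 1 to 24."""
    if not 1 <= person <= subjects_per_drug:
        raise ValueError("person must be between 1 and 8")
    return drugs.index(drug) * subjects_per_drug + person


# Person 1 under each drug is a different individual.
for drug in drugs:
    print(drug, "person 1 -> subject", unique_subject_id(drug, 1))

all_ids = {unique_subject_id(d, p)
           for d in drugs
           for p in range(1, subjects_per_drug + 1)}
print(len(all_ids), "distinct subjects")
```

The Person value alone does not identify a subject; only the combination of Drug and Person does.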
26. structures. The model is built by incorporating models for each size of experimental unit. Milliken and Johnson's example concerns yield, in pounds, for two varieties of wheat grown according to four fertility regimes. The researchers divided the fields into two blocks, each of which contained four whole plots. Within each block they randomly assigned the four fertility regimes to the four whole plots, one regime per whole plot. Then they divided each whole plot into two parts (subplots) and randomly assigned each variety of wheat to one subplot within each whole plot. Before you begin, open STATGRAPHICS Plus and the Wheat data file. Completing the General Linear Models Analysis Dialog Box. 1. Choose SPECIAL... ADVANCED REGRESSION... GENERAL LINEAR MODELS... from the Menu bar to display the Analysis dialog box. Enter Yield into the Dependent Variables text box. Enter Block, Regime, and Variety into the Categorical Factors text box (see Figure 4-1). [Figure 4-1. The Completed General Linear Models Analysis Dialog Box] Creating a Split Plot Model. 1. Click OK to display the GLM Model Specification dialog box. You will enter two interaction effects, A*B and B*C. Position the mouse
27. to select it. Click the Scheffe option and accept the defaults for the remainder of the options (see Figure 1-10). [Figure 1-10. Completed Multiple Comparisons Options Dialog Box] Click OK to redisplay the table (see Figure 1-11). [Figure 1-11. Multiple Comparisons Tests for Mpg by Origin; method: 95.0 percent Scheffe] The table shows that, when you use the Scheffe method for the comparison, the Mpg means for the levels of the Origin factor form two homogeneous groups. Optional Exercise. For an optional exercise, continue the tutorial above by creating an Unusual Residuals Table, which will confirm that the model, although good for illustrative purposes, is not necessarily the best one to use. References. Milliken, G. A. and Johnson, D. E. (1984). Analysis of Messy Data, Volume 1: Designed Experiments. New York: Van Nostrand Reinhold. TUTORIAL 2: Using MANOVA and Entering User-Specified Contrasts. The purpose of this tutorial is to introduce you to MANOVA in the General Linear M
28. [Figure 10-10. Redisplay of the Variance Inflation Factors Plot; horizontal axis: Ridge parameter] Estimating Regression Coefficients Using New Values. 1. Minimize the Variance Inflation Factors Plot, then maximize the Analysis Summary. 2. Click the right mouse button, then the left on Analysis Options to display the Ridge Regression Options dialog box. 3. Enter 0.02 in the Current text box, but make no other changes (see Figure 10-11). 4. Click OK to recalculate and redisplay the Analysis Summary (see Figure 10-12). [Figure 10-11. Completed Ridge Regression Options Dialog Box; Ridge Parameter, Current: 0.02] Compare the values for the R-Squared, Adjusted R-Squared, Standard Error of Estimate, Mean Absolute Error, and Durbin-Watson statistics in Figure 10-12 with those in Figure 10-2. Using the value of 0.02 for the ridge parameter gives error statistics that are slightly inferior, as is expected for biased estimates; however, because the estimates for the coefficients have been stabilized, the slight changes are acceptable. Conclusions. The tables and plots you have created all help you find a reliable value for the ridge parameter. The results indicate that, in spite of the ill-conditioned data, you could use the model with skin fold measurements to estimate body fat. [Figure: Ridge Regression - body fat]
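How a small ridge parameter stabilizes coefficients under collinearity can be seen in a minimal sketch for two standardized predictors. This is an assumption-laden illustration, not the program's algorithm: it solves (Rxx + k*I) b = rxy on the correlation scale, and the three correlations used below are hypothetical values, not taken from the body fat data.

```python
def ridge_two_predictors(r12, r1y, r2y, k):
    """Standardized ridge coefficients for two predictors.

    Solves (Rxx + k*I) b = rxy, where Rxx = [[1, r12], [r12, 1]]
    and rxy = [r1y, r2y]. Setting k = 0 gives ordinary least squares.
    """
    a = 1.0 + k                       # diagonal entry of Rxx + k*I
    det = a * a - r12 * r12           # determinant of the 2x2 system
    b1 = (a * r1y - r12 * r2y) / det
    b2 = (a * r2y - r12 * r1y) / det
    return b1, b2


# Hypothetical correlations for two nearly collinear predictors.
R12, R1Y, R2Y = 0.99, 0.80, 0.75

ols = ridge_two_predictors(R12, R1Y, R2Y, k=0.0)     # large, opposite-signed
ridge = ridge_two_predictors(R12, R1Y, R2Y, k=0.02)  # ridge parameter from the text

print("k = 0.00:", ols)
print("k = 0.02:", ridge)
```

With predictors this strongly correlated, the k = 0 coefficients are inflated and of opposite sign, while the small ridge parameter pulls both toward zero at the cost of some bias.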
29. 1 into the Maximum Number per Subset text box to include one model for each subset size, then click OK to redisplay the table (see Figure 8-6). The results show the models that give the largest Adjusted R-Squared values. Values are included for the best model of each size. The best model contains three variables: A (Clotting), B (Prognost), and C (Enzyme). Minimize the table, then maximize the Best Cp Table (see Figure 8-7). The results show the best model for each size; again, the ABC model rates best. Note that changing the minimum subset size for the Best Adjusted R-Squared Table automatically changed it in the Best Cp Table. [Figure 8-6. Recalculated Best Adjusted R-Squared Results; Regression Model Selection - logsurvive, Models with Largest Adjusted R-Squared] [Figure 8-7. The Best Cp Table Shown with the Recalculated Results; Regression Model Selection - logsurvive, Models with Smallest Cp] Sometimes using the plots will help to confirm which model is best. Create a Mallows' Cp Plot to test this. Click the Graphical Options button to display the dialog box, then click the Mallows' Cp Plot check box and OK to display the plot in the second graphics pane. Maximize the p
30. [Figure 1-5. The Analysis Summary with the Random Sample Removed] Using the full set of data, the Displace variable is significant at the 90 percent confidence level but not at 95 percent. Creating an Interaction Plot. 1. Click the Graphical Options button to display the dialog box, then click the Interaction Plot check box and OK to display the plot in a graphics pane. Maximize the plot (see Figure 1-6). [Figure 1-6. The Interaction Plot; response: Mpg] The plot contains one line for each level of Origin, which illustrates the change in estimated miles per gallon ratings over the five years. The three lines rise and fall together, which confirms that the interaction effect is not strong. Creating a Table of Least Squares Means and a Means Plot. The data are unbalanced: the combinations of factor levels contain different numbers of observations. The best estimates of marginal means in unbalanced data are known as least squares means. You will create a Table of Means and a Means Plot to see the values for the least squares means. 1. Click the Tabular Options button, then the Table of Means check box and OK to display the table in the second text pane. 2. Click the Graphical Options button, then the Means Plot check box and OK to display the plot in a graphics pane. 3. Maximize the Table of Means (see Figure 1-7).
31. [Figure 6-12. Predictions Table for X] Click the Save Results button on the Analysis toolbar (the fourth button from the left) to display the Save Results Options dialog box. Click the Model Statistics check box under the Save options and type MYMODEL in the first Target Variables text box (see Figure 6-13). [Figure 6-13. Completed Save Results Options Dialog Box. Save options and their target variables: Model Statistics (MYMODEL), Predicted Values (PREDICTED), Lower Limits for Predictions (LOWERPLIMS), Upper Limits for Predictions (UPPERPLIMS), Lower Limits for Forecast Means (LOWERCLIMS), Upper Limits for Forecast Means (UPPERCLIMS), Residuals (RESIDUALS), Studentized Residuals (SRESIDUALS), Leverages (LEVERAGES), Coefficients (COEFFS)] Click OK to save the column of numbers that define the model. Click FILE... SAVE... SAVE DATA FILE... from the Application toolbar to save the file. Using Previously Saved Data. Two weeks after completing the first analysis, the chemist repeated the analysis on a new sample, which resulted in an absorbency measurement of 44. He now wants to use the calibration line that was saved in the above analysis to determine the concentration of cuprammonium ion in the new sample. Restore the Calib data file. Notic
32. [Figure 6-10. Predictions Table for Y] Notice that the table lists predictions for Y for the Upper and Lower values of X. A regression equation is used to predict either X or Y, given a value for the other variable. You will predict X values for six different values of Y. Click the right mouse button on the text pane, then the left on Pane Options to display the Predictions Options dialog box. Click X in the Prediction portion of the dialog box to change the prediction limits. Enter 0, 1, 2, 3, 4, and 5 in the first six Predict At text boxes. Accept the defaults in the Confidence Level and Mean Size or Weight text boxes (see Figure 6-11). The program calculates and displays the values for the Predictions Table (see Figure 6-12). The table shows predictions for the concentrations of six absorbency readings, as well as for the prediction limits. Saving Statistics for a Calibration Line. The chemist is now satisfied with the results and wants to save the calibration line for later use. [Figure 6-11. Completed Dialog Box for Predictions Options] [Figure: Calibration Models - Absorbence versus Concent, Prediction Limits]
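Using one fitted line in both directions, to predict Y from X or X from Y, can be illustrated with a short sketch. It covers only the point predictions, not the prediction limits the program reports, and the five standards below are made-up numbers rather than the contents of the Calib file.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_y(intercept, slope, x0):
    """Forward prediction: the response expected at a known x0."""
    return intercept + slope * x0


def predict_x(intercept, slope, y0):
    """Inverse prediction: the x0 at which the fitted line equals y0."""
    return (y0 - intercept) / slope


# Hypothetical standards: known concentrations and measured responses.
concentration = [2.0, 4.0, 6.0, 8.0, 10.0]
absorbance = [0.25, 0.45, 0.65, 0.85, 1.05]

b0, b1 = fit_line(concentration, absorbance)
estimated_concentration = predict_x(b0, b1, 0.55)
```

The inverse prediction simply solves the fitted equation for X, which is why the variables must be entered in the correct roles when the line is fitted.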
33. [Figure 6-4. The Analysis Summary Recalculated After Removing the Constant] [Figure 6-5. The Plot of Fitted Model; axes: Absorbence versus Concent (X 0.001)] closer together than they were for the higher values. This is particularly evident for the confidence limits, which you will now add to the plot. Click the right mouse button on the graphics pane, then the left on Pane Options to display the Plot of Fitted Model Options dialog box. Click the Confidence Limits check box and accept the defaults for the other options on the dialog box (see Figure 6-6). [Figure 6-6. Completed Plot of Fitted Model Options Dialog Box] Click OK to display the plot with the confidence limits added (see Figure 6-7). [Figure 6-7. Plot of Fitted Model with the Confidence Limits Added; axes: Absorbence versus Concent (X 0.001)] Instead of using a laborious hand calculation to calculate the prediction for Y, you will again use the Plot of Fitted Model Options dialog box. Click the right mouse button on the graphics pane, then the left on Pane Options to display the
34. ANOVA Statistics. As the conclusion for this tutorial, you will add the two remaining response variables, then create MANOVA statistics. The effect of each factor on the dependent variables is simultaneously quantified by the MANOVA. [Figure 2-6. The Means Plot; Means and 95.0 Percent LSD Intervals, Sand by Location] Click the Return to Analysis Dialog Box button on the Analysis toolbar to redisplay the Analysis dialog box. Enter Silt, Clay, and Sand into the Dependent Variables text box. Enter Location and Depth into the Categorical Factors text box (see Figure 2-7). Click OK to display the GLM Model Specification dialog box. Accept the defaults and click OK to redisplay the Analysis Summary and the Means Plot in the Analysis window. Maximize the Analysis Summary. Click the right mouse button on the Analysis Summary, then the left on Analysis Options to display the General Linear Models Options dialog box. Click the Include MANOVA check box and accept the defaults for the remaining options on the dialog box. Click OK to add the MANOVA statistics and to redisplay the Analysis Summary. [Figure 2-7. Completed General Linear Models Dialog Box; variables listed: Clay, Depth, Location, Sand, Silt] As y
35. ED REGRESSION... GENERAL LINEAR MODELS... from the Menu bar to display the Analysis dialog box. As you complete the dialog box, you will randomly choose a subset of observations that the program will use to estimate the model; it will use the remaining complete observations to validate the model. You will use the optional Select text box to hold out a random sample of about one third of the observations. The two categorical factors you will use are Year and Origin. Year contains five different values, while Origin contains three. The variables that represent the discrete groups of data are categorical. If a categorical variable has n levels, the program will create n - 1 indicator variables. Enter Mpg into the Dependent Variables text box. Enter Year and Origin into the Categorical Factors text box. Enter Displace and Weight into the Quantitative Factors text box. These two quantitative factors are the covariates. Click the mouse pointer in the Select text box, then type Random(100) (see Figure 1-1). [Figure 1-1. Completed General Linear Model Analysis Dialog Box; variables listed: Accel, Carmakers, Cylinders, Displace, Horsepower, Make, Model, Mpg, Origin, Weight, Year] Click OK to display the GLM Model Specifi
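The rule that a factor with n levels needs n - 1 indicator variables can be sketched as follows. This uses simple 0/1 coding with the first level as the baseline, which is an assumption made for illustration; the program's internal coding scheme may differ, and the function name is hypothetical.

```python
def indicator_columns(values):
    """Build n - 1 indicator (0/1) columns for a factor with n levels.

    The first level in sorted order is treated as the baseline and gets
    no column of its own: a row of all zeros identifies it.
    """
    levels = sorted(set(values))
    kept = levels[1:]                 # n - 1 levels receive a column
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


# A factor with three levels yields two indicator columns.
origin = [1, 2, 3, 1]
kept_levels, design_rows = indicator_columns(origin)
```

One level is dropped because its indicator would be a linear combination of the constant and the other indicators.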
36. alog Box for the Comparison of Regression Lines Analysis. Click OK to display the Analysis Summary and the Plot of Fitted Model in the Analysis window. Maximize the Analysis Summary. Click the right mouse button on the Analysis Summary pane, then the left on Analysis Options to display the Comparison of Regression Lines Options dialog box. Click the Assume Equal Intercepts and Assume Equal Slopes check boxes to turn on both of the options, which results in a single regression line (see Figure 7-2). [Figure 7-2. Comparison of Regression Lines Options Dialog Box; options: Assume Equal Intercepts, Assume Equal Slopes] Click OK to redisplay the Analysis Summary (see Figure 7-3). [Figure 7-3. The Analysis Summary Showing Various Statistics; Comparison of Regression Lines - time versus size by type. Legible statistics: R-Squared = 70.6906 percent; R-Squared (adjusted for d.f.) = 69.0623 percent; Standard Error of Est. = 5.23149; Mean absolute error = 4.29764; Durbin-Watson statistic = 0.74607] The values from the regression analysis and the analysis of variance seem to be within normal ranges; however, note that the value for the Durbin-Watson statistic is 0.74607, which indicates possible serial correlation (the value is less than 1.4). To confirm th
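For readers who want to see what the Durbin-Watson statistic measures, here is a minimal sketch using its standard definition: the sum of squared successive differences of the residuals divided by the sum of squared residuals. The residual sequences are invented for illustration, and the 1.4 threshold is simply the rule of thumb quoted in the text.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in row (time) order.

    Values near 2 suggest no first-order serial correlation; values well
    below 2 suggest positive serial correlation.
    """
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Residuals that drift slowly give a small statistic ...
drifting = [1.0, 2.0, 3.0, 4.0]
# ... while residuals that alternate in sign give a large one.
alternating = [1.0, -1.0, 1.0, -1.0]

dw_drifting = durbin_watson(drifting)
dw_alternating = durbin_watson(alternating)
```

Because the statistic depends on the order of the rows, it is only meaningful when the data are sorted in a sensible sequence.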
37. cation dialog box shown in Figure 1-2. Notice that the names of the factors have been given letter designations, for example, A: year. Additionally, the effects are shown in the Effects list box. You are interested in seeing the main effects and the interaction between the Year and Origin factors, so you will enter A*B into the Effects list box. Click in the Effects text box on the line immediately under the D effect, then type A*B (see Figure 1-3). Click OK to display the Analysis Summary and Scatterplot in the Analysis window, then maximize the Analysis Summary (see Figure 1-4). [Figure 1-2. The GLM Model Specification Dialog Box] [Figure 1-3. The GLM Model Specification Dialog Box with an Interaction Added (A*B)] Note: Because the program is randomly selecting observations, your data and their interpretation will differ from those shown. The interpretation provided here coincides with the results shown in this tutorial. [Figure 1-4. The Analysis Summary for a Random Sample] In Figure 1-4, the first ANOVA Table shows that the p-value for the Mpg variable is less than 0.01, so there is a statistically significant re
38. cted 54 patients; from each patient record they extracted the following preoperative information: A. Blood clotting test score; B. Prognostic index, which included the patient's age; C. Enzyme function test score; D. Liver function test score. This information makes up a pool of potential explanatory variables for a predictive regression model. The response variable is Logsurv (log survival time), which was determined in a follow-up study. Because the pool of explanatory variables is small at this stage, you can fully explore the relationships and possible strong interaction effects. The researchers first prepared a Stem-and-Leaf display (optional exercise) for each of the explanatory variables. This highlighted several cases as outliers with respect to the dependent variable and reminded the researchers that they would later need to examine these cases. They examined the full model and decided to use a log transformation of the survival variable to allow for a first-order model. Next they produced a Scatterplot Matrix and a Correlation Matrix to check for multicollinearity and bias (optional exercises). To begin the analysis, open STATGRAPHICS Plus and the Surgery data file, then continue with the analysis. Building the Model. 1. Choose SPECIAL... ADVANCED REGRESSION... REGRESSION MODEL SELECTION... from the Menu bar to display the Regression Model Selection Analysis dialog box. 2. Enter Logsurv into the Dependent Variable text b
39. cy of a sample at a wavelength of 600 nm. The chemist was quite sure there was a relationship between the absorbency reading of the spectrophotometer and the concentration of cuprammonium ion in the sample. The relationship might have been influenced by other compounds present in the sample; however, he chose not to investigate those effects now. Instead, he will calibrate the test method by quantifying the relationship between absorbency (Y) and concentration (X), using a range of concentration that is likely to be found when deliveries of the solution are monitored. Five samples of known concentration were prepared; the absorbency of each was recorded and stored in a data file. Before you begin, open STATGRAPHICS Plus and the Calib (calibration) data file. Determining the Relationship between Two Variables. 1. Choose SPECIAL... ADVANCED REGRESSION... CALIBRATION MODELS... from the Menu bar to display the Analysis dialog box. The concentrations are known; therefore, you will use them for the X variable. The chief chemist analyzed each sample with the spectrophotometer; the Absorbence variable contains these measurements. You will use them for the Y variable. It is very important that you enter the variables in the correct text boxes; do not reverse them. Removing Intercepts from a Model. 1. Click the right mouse button on the text pane, then the left on Analysis Options to display the Calibration Model Options dialog box.
40. [Figure 2-1. Completed General Linear Models Analysis Dialog Box] [Figure 2-2. The Analysis Summary; number of dependent variables: 1, number of categorical factors: 2, number of quantitative factors: 0; Analysis of Variance for Sand with Type III Sums of Squares] The second ANOVA Table shows the results of testing the statistical significance of each factor as it was entered into the model. The highest p-value is 0.2399, which corresponds to the Depth variable. Because the value is greater than or equal to 0.10, the term is not statistically significant at the 90 percent or higher confidence level. You can read the interpretation for the remaining statistics in the StatAdvisor. Testing Hypotheses Using the GLM Analysis. What you really want to do is to compare the means for the Sand variable at different levels of the Location variable, so you will create Multiple Range Tests. 1. Click the Tabular Options button to display the dialog box, then click the Multiple Range Tests check box and OK to display the table in the second text pane. 2. Maximize the text pane (see Figure 2-3).
41. dence. To make sure that existing structures did not collapse during an excavation, a mining engineer was responsible for controlling the amount and distribution of the subsidence. The amount and nature of the subsidence are affected by several factors, including the depth of the mine and the width of the excavation. An important variable known as the angle of draw, y, was identified as an aid in characterizing the condition. Myers defines it as the angle between the perpendicular at the edge of the excavation and the line that connects the same edge of excavation with the point on the surface for which there is zero subsidence. In the study, the engineers felt that the angle of draw should relate to the ratio of the width, w, of the excavation to the depth, d, of the mine. They also knew that any relationship would be nonlinear. You will repeat the analysis using the data collected at Blacksburg, Virginia. Begin the analysis by opening STATGRAPHICS Plus and the Mining data file. Preparing for the Analysis. 1. Choose SPECIAL... ADVANCED REGRESSION... NONLINEAR REGRESSION... from the Menu bar to display the Nonlinear Regression dialog box. Enter Drawangl into the Dependent Variable text box. Move the mouse pointer inside the Function text box, then type a*(1-exp(-b*width/depth)) (see Figure 9-1). Click OK to display the Initial Parameter Estimates dialog box. [Figure 9-1. The Nonlinear Regression dialog box]
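To get a feel for the function before choosing initial parameter estimates, it can help to evaluate it directly. The sketch below assumes the form a*(1 - exp(-b*width/depth)), which is a reconstruction because the operators are not legible in the extracted text; the parameter values are hypothetical and are not estimates from the Mining data.

```python
import math


def draw_angle(width, depth, a, b):
    """Evaluate a * (1 - exp(-b * width / depth)).

    ASSUMPTION: this functional form is reconstructed from the text.
    The curve starts at 0 when width/depth is 0, rises with initial
    slope a * b, and levels off at the asymptote a.
    """
    return a * (1.0 - math.exp(-b * width / depth))


# Hypothetical starting values: a is the plateau, b sets how fast it is reached.
A_START, B_START = 35.0, 1.0

at_zero = draw_angle(0.0, 500.0, A_START, B_START)
at_unit_ratio = draw_angle(500.0, 500.0, A_START, B_START)
at_large_ratio = draw_angle(50000.0, 500.0, A_START, B_START)
```

Reading the plateau and the early slope off a scatterplot of the data is one common way to pick starting values for a and b.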
42. [Figure 8-4. The Best Adjusted R-Squared Table; columns include Adjusted R-Squared and Included Variables] The results show the models sorted by Adjusted R-Squared values, with the four single-variable models removed from the analysis. This table indicates that the best model contains three variables: A (Clotting), B (Prognost), and C (Enzyme). Minimize the Best Adjusted R-Squared Table. Maximize the Best Cp Table, the third text pane (see Figure 8-5). [Figure 8-5. The Best Cp Table; Regression Model Selection - logsurvive, Models with Smallest Cp] The results show the models sorted by the smallest, or best, values for the Mallows' Cp statistic. Look for models, other than the full four-variable model, with Cp values that are close to p. Now you will look at one model for each subset size. Minimize the Best Cp Table, then maximize the Best Adjusted R-Squared pane. Click the right mouse button on the table pane, then the left on Pane Options to display the Best Adjusted R-Squared Options dialog box. Enter
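The two selection criteria used in these tables can be written out with their usual textbook definitions. This is a sketch under stated assumptions: p counts the coefficients in the subset model including the constant, the function names are hypothetical, and the numbers in the example are invented rather than taken from the Surgery data.

```python
def adjusted_r_squared(sse, sst, n, p):
    """Adjusted R-squared for a model with p coefficients (constant included).

    sse: residual sum of squares of the model; sst: corrected total sum of squares.
    """
    return 1.0 - (sse / (n - p)) / (sst / (n - 1))


def mallows_cp(sse_subset, mse_full, n, p):
    """Mallows' Cp for a subset model with p coefficients (constant included).

    mse_full is the residual mean square of the model containing every
    candidate variable. A subset with little bias has Cp close to p.
    """
    return sse_subset / mse_full - (n - 2 * p)


# Invented example: 54 observations, full model with 5 coefficients.
N, P_FULL, SSE_FULL, SST = 54, 5, 4.9, 20.0
MSE_FULL = SSE_FULL / (N - P_FULL)

cp_full = mallows_cp(SSE_FULL, MSE_FULL, N, P_FULL)   # equals P_FULL by construction
cp_subset = mallows_cp(5.1, MSE_FULL, N, 4)           # a three-variable subset
adj_subset = adjusted_r_squared(5.1, SST, N, 4)
```

The full model always has Cp equal to its own p, which is why the text says to look at models other than the full four-variable model.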
43. e Comfort variable indicates that comfort level is not an additive function of the interaction effect, Gender and Temperatur. [Figure 3-4. The Analysis Summary; Analysis of Variance for Comfort, with effects Temperatur, Gender, and Temperatur*Gender] [Figure 3-5. The Interaction Plot; factors: Gender and Temperatur] Because the purpose of this tutorial was to introduce the concept of using nested and crossed effects, no further interpretation of the results is included here. If you are interested in additional results, read the comments provided by the StatAdvisor. References. Milliken, G. A. and Johnson, D. E. (1984). Analysis of Messy Data, Volume 1: Designed Experiments. New York: Van Nostrand Reinhold. TUTORIAL 4: Creating and Using a Split Plot Design. The purpose of this tutorial is to demonstrate how to specify a model and create proper tests and plots. You will create and use a Split Plot design. The tutorial is adapted from Milliken and Johnson (1984). When you are constructing a model, they provide two important reminders about design and concept: recognize that there are different sizes of experimental units, and identify the corresponding design and treatment
44. e predictions and redisplay the Forecast Table shown in Figure 7-13. The table displays the predicted values for the Time variable for the two new firms. It shows predicted values for a mutual fund firm and a stock firm of each size. Also shown are the 95 percent prediction intervals for new observations and the 95 percent confidence intervals for the mean of many observations. [Figure 7-13. The Results of the Recalculated Forecasts; columns: Predicted, Prediction Limits, Confidence Limits] Now that you have examined all of the tabular data, you will create a Plot of Fitted Model to view the results graphically. Minimize the tabular options, then maximize the Plot of Fitted Model (see Figure 7-14). [Figure 7-14. The Plot of Fitted Model; legend: type (Mutual, Stock)] The plot shows the two parallel regression lines, one for each value of the Type variable. If you look at the Residual plots again, you will find that the earlier symptoms of bias have been resolved. References. Belsley, D. A., Kuh, E., and Welsch, R. E. (1980). Regression Diagnostics: Identifying
45. e right of the C in the Effects text box. Click A in the Factors list box, then the arrow button to move the factor to the Effects list box. Position and click the mouse pointer next to the B factor in the Effects list box. Click the left parenthesis (Nest) button to place the parenthesis to the right of the B factor. Click A in the Factors list box, then the arrow button to move the factor to the Effects list box. Click the right parenthesis (Nest) button to place the parenthesis to the right of the A factor in the Effects list box. Your computer screen should now look like the one shown in Figure 5-3. [Figure 5-3. The Completed GLM Model Specification Dialog Box; factors: A (drug), B (person), C (time); the effects include the nested term B(A)] Changing Error Terms. 1. Click OK to display the Analysis Summary and Scatterplot in the Analysis window. Maximize the Analysis Summary. Click the right mouse button on the text pane, then the left on Analysis Options to display the General Linear Models Options dialog box. Click A in the Factor list box, then the B(A) term in the Error Term list box to display the pair A and B(A) in the Selections list box. Click the B(A) term in the Factor list box, then click None in the Error Term list box to display the pair B(A) and None in the Selections list box. The two changes appear as A B(A) and B(A) No
46. e that the results you saved now appear in a new column titled MYMODEL. Choose SPECIAL... ADVANCED REGRESSION... CALIBRATION MODELS... from the Menu bar to display the Calibration Models Analysis dialog box. Enter 44 into the Y (Measured) text box. Enter MYMODEL into the Fitted Model Statistics text box. Click the Predict X from Y button under the Action portion of the dialog box to turn it on (see Figure 6-14). [Figure 6-14. Completed Dialog Box for Predicting X from Y; Action options: Fit New Model, Predict X from Y] Click OK to display the Analysis Summary and the Plot of Fitted Model in the Analysis window. Click the Tabular Options button to display the dialog box, then the Predictions check box and OK to display the Predictions Table. Maximize the Predictions Table (see Figure 6-15). [Figure 6-15. The Predictions Table for X] Optional Exercises. In real-life situations, a single measurement for each standard and each new sample would probably not provide adequately tight intervals. 1. Continue the above tutorial using the Plot of Fitted Model Options dialog box and the Predictions Table option to determine the effect of increasing the number of t
47. emainder of the options on the dialog box, and click OK to redisplay the plot. [Figure 1-8. The Means Plot for the Year Variable; Means and 95.0 Percent LSD Intervals for Mpg] Testing for Differences Among Group Means. You can also test for significant differences among group means. This is done for the various levels of each categorical factor. 1. Click the Tabular Options button to display the dialog box, then click the Multiple Range Tests check box and OK to display the table in the third text pane. 2. Maximize the pane (see Figure 1-9). To calculate the results, a multiple comparison analysis is applied to the data to determine which means are significantly different. The top portion of the table identifies the homogeneous groups by using columns of Xs. Within each column, the levels containing Xs form a group of means within which there are no statistically significant differences. The bottom portion of the table shows the estimated differences between each pair of means. An asterisk identifies each statistically significant pair. Now you will use Pane Options to compare the means among the three origins using Scheffe intervals. [Figure 1-9. Multiple Comparisons for Mpg by Year] Click the right mouse button on the text pane, then the left on Pane Options to display the Multiple Comparisons Options dialog box. Click the Origin factor
48. [Figure 1-7. The Table of Least Squares Means for Mpg with 95.0 Percent Confidence Intervals; columns: Level, Count, Mean, Stnd. Error, Lower Limit, Upper Limit; rows: GRAND MEAN, Year, Origin, Year by Origin] The least squares means in the table will differ from the simple means for each group. Milliken and Johnson (1984), in their book Analysis of Messy Data, provide detailed explanations of why least squares means are preferred for analyzing unbalanced designs. Minimize the Table of Means and maximize the Means Plot (see Figure 1-8). The plot substantiates the results shown in the Table of Least Squares Means for the Year factor. To see the Means Plot for the Origin factor, use the Means Plot Options dialog box. Click the right mouse button on the graphics pane, then the left on Pane Options to display the Means Plot Options dialog box. Click the Origin factor to highlight it, accept the defaults for the r
49. he Advanced Regression analyses in STATGRAPHICS Plus contains Part II tutorials for these analyses: Calibration Models, Comparison of Regression Lines, Regression Model Selection, Nonlinear Regression, Ridge Regression, and Logistic Regression. To use the tutorials for the General Linear Models Analysis, see Part I of this manual. Tutorials in this Manual. The tutorials for the remaining analyses are: Fitting a Calibration Line (Calibration Models Analysis); Analyzing an Insurance Innovation Study (Comparison of Regression Lines Analysis); Illustrating Model-Building Techniques (Regression Model Selection Analysis); West Virginia Mining Excavation Study (Nonlinear Regression Analysis); Studying the Relationship of Body Fat to Explanatory Variables (Ridge Regression Analysis); Analyzing Coupon Redemption Rate with Logistic Regression (Logistic Regression Analysis). TUTORIAL 6: Fitting a Calibration Line. This tutorial is adapted from a study reported in Caulcutt and Boddy (1995), in which four analysts participated. Their employer, Indichem Ltd., uses large quantities of ammonia solution that they purchase from several vendors. Although a new and less expensive supply source looks promising, the chief chemist suspects that this ammonia solution might be contaminated with cuprammonium. He devised a spectrophotometric method for determining the concentration of cuprammonium ion in the solution, which involves measuring the absorben
50. hows the fitted logistic regression model and the proportions of coupons that are predicted for redemption at each of the X levels. Minimize the plot, then maximize the Logit Plot (see Figure 11.4). [Figure 11.4: The Logit Plot, showing the fitted logit against reduction.] The plot shows a straight line, the logit portion of the response function. The line is commonly used to find the median effective dose, which for this example would be the coupon value that has a fifty-fifty chance of being redeemed. Notice that the scaling on the plot is arranged according to the default settings. You will change the scaling so the plot will have a grid line at zero on the Y-Axis. Place the mouse pointer on one of the points on the Y-Axis. Click the left mouse button to place markers at each end of the axis, then click the right button and choose Graphics Options to display the Y-Axis tab page. Enter -2 into the From text box, J into the To text box, and accept the default in the By text box as well as the remainder of the options (see Figure 11.5). Click OK to rescale the axis and redisplay the plot (see Figure 11.6). The line is -2.04435 + 0.0968336 Reduction. You can easily see the median effective value by finding the price reduction that corresponds with logit(p) = 0. [Graphics Options dialog box screenshot.]
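The median effective value described above can also be computed directly instead of being read off the plot. The sketch below assumes the fitted logit is -2.04435 + 0.0968336 Reduction; the negative sign on the intercept is an assumption, because the minus signs did not survive in the extracted text. The names `B0`, `B1`, and `median_effective_reduction` are hypothetical.

```python
# Coefficients as quoted in the tutorial. The minus sign on the
# intercept is assumed (it is missing from the extracted text).
B0 = -2.04435
B1 = 0.0968336


def logit(reduction):
    """Fitted logit for a given price reduction."""
    return B0 + B1 * reduction


def median_effective_reduction():
    """Price reduction at which logit(p) = 0, that is, p = 0.5."""
    return -B0 / B1


x50 = median_effective_reduction()
print(round(x50, 2), logit(x50))
```

Setting the logit to zero and solving for the reduction is all that the grid line at zero on the rescaled Y-Axis is meant to make visible.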
51. ictions Histograms. Click the Graphical Options button to display the dialog box, then the Prediction Capability Plot and Prediction Histograms check boxes, and OK to display the two plots in the third and fourth graphics panes. [Figure 11.9: The Recalculated Predictions Performance Table, with columns for the fitted value, its standard error, and the lower and upper 95.0% confidence limits for the prediction.] Maximize the Prediction Capability Plot, the third graphics pane (see Figure 11.10). [Figure 11.10: The Prediction Capability Plot for p, showing percent correct.] The plot shows a summary of the prediction capability of the fitted logistic model. The model first predicts the response using the information in each row of the file. If the predicted value is larger than the cutoff, the response is predicted to be True. If the predicted value is less than or equal to the cutoff, the response is predicted to be False. The plot shows the percent of observed data that were correctly predicted at each cutoff value. For example, using a cutoff equal to 0.36, 60.4592 percent of all the True responses were correctly predicted, while 73.1908 percent of all the False responses
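The cutoff rule behind the Prediction Capability Plot can be sketched as follows. This is a minimal illustration with hypothetical predicted probabilities and outcomes, not the program's own computation; the function name `percent_correct` is invented for the example.

```python
def percent_correct(predicted, observed, cutoff):
    """Return (percent of True correctly predicted, percent of False correctly predicted)."""
    # A row is predicted True only when its fitted value exceeds the cutoff.
    calls = [p > cutoff for p in predicted]
    true_hits = [c for c, o in zip(calls, observed) if o]
    false_hits = [not c for c, o in zip(calls, observed) if not o]
    pct_true = 100.0 * sum(true_hits) / len(true_hits)
    pct_false = 100.0 * sum(false_hits) / len(false_hits)
    return pct_true, pct_false


# Hypothetical fitted probabilities and observed outcomes.
predicted = [0.20, 0.40, 0.50, 0.80]
observed = [False, True, False, True]
print(percent_correct(predicted, observed, cutoff=0.36))
```

Raising the cutoff trades correctly predicted True responses for correctly predicted False responses, which is why the plot shows one curve for each.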
52. [Figure 9.4: Completed Nonlinear Regression Options Dialog Box.] Click OK to recalculate and redisplay the Analysis Summary using the new method (see Figure 9.5). Notice that the Analysis Summary contains the name of the estimation method that was used, the reason for the estimation stopping, the number of iterations completed, and the number of function calls. [Figure 9.5: The Analysis Summary with Recalculated Results. Dependent variable: drawangl. Independent variables: width, depth. Function to be estimated: a*(1-exp(-b*width/depth)). Initial parameter estimates: a = 35.0, b = 1.0. Estimation method: Steepest descent. Estimation stopped after maximum iterations reached. Number of iterations: 31. Number of function calls: 32. The screenshot also lists each parameter estimate with its asymptotic standard error and asymptotic 95.0% confidence interval.] An important factor in the report is the primary reason for the estimation stopping. In this case, the estimation stopped after the program reached the maximum number of iterations; therefore, the method did not converge. Now return to the Nonlinear Regression Options dialog box, where you will increase the maximum number of iterations. Click the right mouse button on the text pane, then the left on Analysis Options, to display
53. [Figure 11.2: Analysis Summary Results. Estimated regression model: CONSTANT = -2.04435 (standard error 0.160976); Reduction = 0.0968336 (standard error 0.00854912, estimated odds ratio 1.10168). Percentage of deviance explained by model: 98.5503; adjusted percentage: 95.2574.] confidence level, a very good fit. Additionally, the p-value for the residuals is greater than or equal to 0.10, which indicates that the model is not significantly worse than the best possible model at the 90 percent or higher confidence level. The estimated odds ratio indicates that the odds of a household redeeming a coupon increase by about 10 percent with each $1.00 decrease in price allowed by the coupon (see Neter et al., 1996). The logistic response function is $p = \dfrac{e^{-2.04435 + 0.0968336\,\mathrm{Reduction}}}{1 + e^{-2.04435 + 0.0968336\,\mathrm{Reduction}}}$, where -2.04435 + 0.0968336 Reduction is called the logit. You can see the logistic curve and the logit on the Plot of Fitted Model and the Logit Plot. Click the Graphical Options button to display the dialog box, then click the Logit Plot check box (the Plot of Fitted Model automatically displays) and OK to display the plots in the first and second graphics panes. Maximize the Plot of Fitted Model (see Figure 11.3). [Figure 11.3: The Plot of Fitted Model, showing p against reduction.] The plot shows that at a price reduction of 25, the predicted redemption rate is 60 percent. It also s
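The logistic response function and the odds ratio quoted in this fragment can be checked numerically. The sketch assumes an intercept of -2.04435 and a slope of 0.0968336; the minus sign on the intercept is an assumption, because signs were lost in extraction. The function name `redemption_probability` is hypothetical.

```python
import math

B0 = -2.04435    # intercept; the minus sign is assumed
B1 = 0.0968336   # coefficient of Reduction


def redemption_probability(reduction):
    """Logistic response: exp(logit) / (1 + exp(logit))."""
    eta = B0 + B1 * reduction
    return math.exp(eta) / (1.0 + math.exp(eta))


# The odds ratio for a one-unit change in Reduction is exp(slope).
odds_ratio = math.exp(B1)
print(round(odds_ratio, 5), round(redemption_probability(25), 3))
```

Under these assumptions, the exponentiated slope agrees with the estimated odds ratio shown in the Analysis Summary, and the probability at a reduction of 25 is close to the 60 percent read from the Plot of Fitted Model.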
54. is you will create a Residual versus Row Number Plot and an Autocorrelation Function Plot. Click the Graphical Options button to display the dialog box, then the Residual Plots check box, and OK to display the Residual Plot in the second graphics pane. Maximize the plot. Click the right mouse button, then the left on Pane Options, to display the Residual Plots Options dialog box. Accept the default options, Studentized Residuals and Scatterplot; move to the Plot versus list box and choose Row Number. Your screen should look like that shown in Figure 7.4. [Figure 7.4: The Completed Residual Plots Options Dialog Box.] Click OK to display the Residual versus Row Number Plot (see Figure 7.5). The Residual versus Row Number Plot reveals a pattern: most of the first half of the data is below the zero line, while most of the second half is above the zero line. This indicates a biased model and confirms that using a single regression for the two groups is probably not adequate. Now create an Autocorrelation Function P
55. it Plot design. [Figure 4.3: The Analysis Summary. The screenshot reports 1 dependent variable, 3 categorical factors, and 0 quantitative factors, followed by the Analysis of Variance for yield and the Type III Sums of Squares for block, regime, variety, and block*regime.] Modifying the Error Terms. 1. Click the right mouse button on the text pane, then the left on Analysis Options, to display the General Linear Models Options dialog box. Click A in the Factors list box, then None in the Error Term list box, to display "A None" in the Selections list box (you are not conducting a test on Block). Click B in the Factor list box, then A*B in the Error Term list box, to display "B A*B" in the Selections list box. The F-test compares the mean squares for Regime (B) with the whole plot mean squares (A*B). Click A*B in the Factor list box, then None in the Error Term list box, to display "A*B None" in the Selections list box. Notice that the first factor, A, is the Block factor; B versus A*B is the whole plot design; and B*C is the subplot design (see Figure 4.4).
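The effect of changing the error term can be illustrated with a small calculation. The mean squares below are assumptions read from the damaged Figure 4.3: the residual mean square is taken as 2.1075 and the block*regime mean square as 2.30917. The Regime mean square is taken as 13.3967 rather than the "13 3567" that appears in the extracted figure, because 13.3967 is the value consistent with the F-ratios 6.36 and 5.80 shown in Figures 4.3 and 4.5. Treat all three numbers as illustrative.

```python
# Mean squares as best read from the extracted figures (assumptions).
MS_REGIME = 13.3967
MS_BLOCK_X_REGIME = 2.30917   # whole plot error, A*B
MS_RESIDUAL = 2.1075          # default error term


def f_ratio(ms_effect, ms_error):
    """F statistic: effect mean square divided by the chosen error mean square."""
    return ms_effect / ms_error


f_default = f_ratio(MS_REGIME, MS_RESIDUAL)
f_whole_plot = f_ratio(MS_REGIME, MS_BLOCK_X_REGIME)
print(round(f_default, 2), round(f_whole_plot, 2))
```

Only the denominator changes when you pick a different error term in the dialog box; the sums of squares for the factor itself stay the same.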
56. [Figure 10.8: Variance Inflation Factors Plot for body fat, showing the variance inflation factors for triceps, midarm, and thigh against the ridge parameter.] Notice that the Y-axis scaling ranges from 0 to 800. Because you are interested only in small VIF values, you need to change the scaling. Place the mouse pointer on one of the numbers in the Y-axis and click the left mouse button to place markers at the corners of the scale. Click the right button, then Graphics Options, to display the Graphics Options dialog box, opened to the Y-Axis tab page. Accept the default, 0.0, in the From text box; enter 20.0 in the To text box and 2 in the By text box. Accept the defaults for the remaining check boxes (see Figure 10.9). Click OK to recalculate the Y-Axis scale and redisplay the plot (see Figure 10.10). Looking at Figure 10.10, it is evident that the variance inflation factors appear to be stabilized at theta = 0.02, confirming that this is the value that should be used to estimate the regression coefficients. [Figure 10.9: The Completed Y-Axis Tab Page.] [Ridge Regression plot window for body fat; variables triceps, midarm, and thigh.]
57. lationship between that variable and the explanatory variables at the 99 percent confidence level. The second ANOVA in the figure shows the results from testing the statistical significance of each factor in the order the factor was entered into the model. Notice also that the highest p-value in this example is 0.0678 for the Displace variable. Because that value is greater than or equal to 0.10, that term is not statistically significant at the 90 percent or higher confidence level, which indicates you should remove it from the model. The validation column on the Residual Analysis Table shows that the results are reasonably consistent with the data you withheld. Now you will return to the General Linear Models Analysis dialog box and remove the expression from the Select text box. Removing a Random Sample. 1. Click the Return to Analysis Dialog Box button (the left-most button on the Analysis toolbar) to redisplay the General Linear Models Analysis dialog box. 2. Click the mouse pointer in the left-most corner of the Select text box, over the letter r, then hold down the left mouse button and drag the pointer over Random(100) to highlight it. 3. Click the Delete button, then OK, to display the GLM Model Specification dialog box. 4. Click OK to display the Analysis Summary and Scatterplot in the Analysis window, then maximize the Analysis Summary to see it without the random sample (see Figure 1.5).
58. le the amount of the price reduction. The response variable, Y, was the proportion of coupons redeemed within six months for each group. Your goal is to quantify the relationship between the level of price reduction and the probability that a coupon would be redeemed. The data are stored in a file where p is the dependent variable, n is the sample size, and X is the quantitative factor, reduction. You will create the analysis by first finding the fitted response function. To begin, open STATGRAPHICS Plus and the Coupons data file. Fitting the Regression Model. 1. Choose SPECIAL, ADVANCED REGRESSION, LOGISTIC REGRESSION from the Menu bar to display the Analysis dialog box. Enter p into the Dependent Variable text box. Enter n into the Sample Sizes text box. Enter Reduction into the Quantitative Factors text box (see Figure 11.1). Click OK to display the Analysis Summary and Plot of Fitted Model in the Analysis window. Maximize the Analysis Summary (see Figure 11.2). In Figure 11.2, the p-value for the model is less than 0.01, which indicates a statistically significant relationship between the variables at the 99 percent [Figure 11.1: Completed Analysis Dialog Box.]
59. lot. Click the right mouse button on the graphics pane, then the left on Pane Options, to display the Residual Plots Options dialog box. [Figure 7.5: The Residual versus Row Number Plot, showing the Studentized residual against row number for time versus size by type (Mutual, Stock).] Accept Studentized Residuals as the default for the type of data that will appear in the plot. Click the Autocorrelation Function Plot check box, accept the defaults in the Number of Lags and Confidence Level text boxes, then click OK to display the Autocorrelation Function Plot shown in Figure 7.6. [Figure 7.6: The Autocorrelation Function Plot.] Remember that the Durbin-Watson statistic was less than 1.4, which raised the suspicion that there was serial correlation. The Autocorrelation Function Plot confirms that suspicion: the first bar extends above the upper probability limit. The results from the analysis support the notion that analysts often group data in an effort to improve the prediction capability of a model. You can safely conclude that the model is inadequate based on these results: the data were presorted by type of firm; and the Durbin-Watson statistic was less than 1.4, indicating the possibility of serial correlation (confirmed by the Autocorrelation Function Plot). Detecting bias would be more
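The two diagnostics mentioned here, the Durbin-Watson statistic and the first bar of the autocorrelation function, can be sketched with their standard textbook definitions. The residuals below are hypothetical and merely mimic data presorted by group (mostly negative first, mostly positive later); the function names are invented for the example.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den


def lag1_autocorrelation(residuals):
    """Lag-1 sample autocorrelation of the residuals."""
    m = sum(residuals) / len(residuals)
    d = [e - m for e in residuals]
    return sum(a * b for a, b in zip(d, d[1:])) / sum(x * x for x in d)


# Hypothetical residuals from data sorted by group.
sorted_by_group = [-2.0, -1.0, -1.0, 1.0, 1.0, 2.0]
dw = durbin_watson(sorted_by_group)
r1 = lag1_autocorrelation(sorted_by_group)
print(dw, r1)
```

Values of the Durbin-Watson statistic well below 2 go together with positive lag-1 autocorrelation, which is the pattern the tutorial describes for the ungrouped model.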
60. lot (see Figure 8.8). [Figure 8.8: Mallows Cp Plot for logsurvive, plotted against Number of Coefficients.] To see how close the Cp values for the fourth and fifth coefficients are to the line, scale the Cp axis from 0 to 10 by 1. Place the mouse pointer on one number of the X-axis scale. Click the left button to place markers around the scale, then click the right to display the pop-up menu. Click Graphics Options to display the Graphics Options dialog box, opened to the X-Axis tab page. Enter 0 in the From text box, 10 in the To text box, 1 in the By text box, and accept the defaults for the other text boxes. The dialog box should look like the one shown in Figure 8.9. Click OK to rescale the axis and redisplay the plot (see Figure 8.10). The plot shows the Cp values for all possible regression models. The three-variable subset ABC has the smallest Cp value without an indication of severe bias compared with the full four-variable model. The fact that the Cp measure for this model is below the line Cp = p is the result of random variation. Neter et al. (1996) noted that although Cp is on the line for p = 5, it is due to the definition of Cp, not because the model that contains all four variables is considered best. [Graphics Options dialog box screenshot, with the axis title Number of Coefficients.]
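The remark that the full model lies on the line by definition follows from the usual formula for Mallows Cp, sketched below. The sample size and sums of squares are hypothetical, chosen only to make the arithmetic easy to follow; they are not the values from the tutorial's data file.

```python
def mallows_cp(sse_subset, mse_full, n, p):
    """Mallows Cp for a subset model with p coefficients (intercept included)."""
    return sse_subset / mse_full - (n - 2 * p)


# Hypothetical numbers: 30 observations, full model with 5 coefficients.
n = 30
p_full = 5
sse_full = 50.0
mse_full = sse_full / (n - p_full)

# For the full model, SSE / MSE equals n - p, so Cp reduces to p.
cp_full = mallows_cp(sse_full, mse_full, n, p_full)

# A subset model with 4 coefficients and a slightly larger SSE.
cp_subset = mallows_cp(51.0, mse_full, n, 4)
print(cp_full, cp_subset)
```

In this made-up case the subset model falls below the line Cp = p, while the full model sits exactly on it regardless of how well it fits.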
61. ne (see Figure 5.4). [Figure 5.4: The Completed General Linear Models Options Dialog Box.] Creating a Report and a Plot. 1. Click OK to recalculate and redisplay the Analysis Summary (see Figure 5.5). The second ANOVA Table shows the Type III Sums of Squares. Notice that the highest p-value is 0.0088 for Factor A. Because that value is less than 0.01, the term is statistically significant at the 99 percent confidence level. This means that you probably will not want to remove any variables from the model. The new error term definitions are shown in the footnote on the table. It would be interesting to see the Interaction Plot. Click the Graphical Options button to display the dialog box, then the Interaction Plot check box, and OK to display the plot in the graphics pane. Maximize the plot (see Figure 5.6). [General Linear Models Analysis Summary screenshot.]
62. [Figure 9.1: Completed Analysis Dialog Box, with the Function text box containing a*(1-exp(-b*width/depth)).] You need to enter a starting value for each of the parameters. Notice that the two active text boxes, a and b, are the unknown parameters in the function expression. If you spell the name of a variable incorrectly, the program will interpret the misspelled word as a parameter and enter it into one of the text boxes. For example, you might accidentally spell depth as detph. Remember that if an unexpected parameter name appears in this dialog box, the name is in the function expression but not in the file. See Myers (1990) for the rationale for starting values. 5. Type 35 in the a text box and 1 in the b text box (see Figure 9.2). Interpreting the Results. 1. Click OK to display the Analysis Summary and the Plot of Fitted Model in the Analysis window. 2. Maximize the Analysis Summary (see Figure 9.3). [Figure 9.2: Completed Initial Parameter Estimates Dialog Box.] [Analysis Summary screenshot. Independent variables: width, depth. Function to be estimated: a*(1-exp(-b*width/depth)). Initial parameter estimates: a = 35.0, b = 1.0. Estimation method: Marquardt. Estimation stopped due to convergence of parameter estimates. Number of iterations: 4.]
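To make the role of the starting values concrete, the function expression can be written out and evaluated at a = 35 and b = 1. The form a*(1-exp(-b*width/depth)) is a reconstruction of the garbled Function text box and should be treated as an assumption; the data rows and the names `draw_angle` and `sse` are hypothetical.

```python
import math


def draw_angle(width, depth, a, b):
    """Assumed model form: a * (1 - exp(-b * width / depth))."""
    return a * (1.0 - math.exp(-b * width / depth))


def sse(rows, a, b):
    """Sum of squared errors for rows of (width, depth, observed angle)."""
    return sum((y - draw_angle(w, d, a, b)) ** 2 for w, d, y in rows)


# Hypothetical observations: (width, depth, drawangl).
rows = [(600.0, 550.0, 24.0), (450.0, 600.0, 18.0), (900.0, 500.0, 29.0)]

# Error at the tutorial's starting values, a = 35 and b = 1.
start_error = sse(rows, a=35.0, b=1.0)
print(start_error)
```

An iterative method such as Marquardt or steepest descent starts from this error and adjusts a and b to reduce it, which is why a reasonable starting point matters.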
63. nia: Duxbury Press. Neter, J., Kutner, M. H., Nachtsheim, C. J., and Wasserman, W. 1996. Applied Linear Statistical Models, fourth edition. Chicago: Richard D. Irwin, Inc. TUTORIAL 10: Studying the Relationship of Body Fat to Explanatory Variables. This tutorial is adapted from Neter et al. (1996). You will use a portion of the data that were collected to study the relationship of the amount of body fat (Y) to several possible explanatory variables, based on a sample of 20 healthy females 25-34 years old. The variables are Triceps skin fold thickness (X1), Thigh circumference (X2), and Midarm circumference (X3). The measurements for each of the 20 persons were obtained by immersing each person in water, a cumbersome and expensive procedure. In the Neter et al. example, the researchers thought it would be more helpful to use skin fold and tape measurements, which are easy to obtain, to provide reliable estimates. It was also noted that there were informal indications of severe multicollinearity in the data. When the model was fit using all three explanatory variables, the estimated regression coefficient for the Thigh variable was negative, although it was expected that the amount of body fat was positively related to Thigh circumference. You will use the Ridge Regression Analysis to try to overcome the multicollinearity and to evaluate the data in the body fat example. To begin, open STATGRAPHICS Plus and the Bodyfat data file.
64. o make predictions using this model, notice that when Type = Mutual, the model reduces to Time = 33.8741 - 0.101742 Size. When Type = Stock, the model reduces to Time = 41.9295 - 0.101742 Size. You now decide to estimate the time it takes for firms with revenues between 100 and 200 million to accept the innovation. To calculate the predictions, you will generate a Forecasts Table, then use the Forecasts Options dialog box to add the two new observations to the Forecasts Table. Click the Tabular Options button to display the dialog box, then click the Forecasts check box and OK to display the Forecasts Table. Maximize the table (see Figure 7.11). Forecasts are shown for the minimum and maximum values of the Size variable. Click the right mouse button, then the left on Pane Options, to display the Forecasts Options dialog box. Type 100 in the third text box and 200 in the fourth text box. The dialog box should look like the one shown in Figure 7.12. [Figure 7.11: The Forecasts Table, listing the predicted value with its prediction limits and confidence limits.] [Figure 7.12: Completed Forecasts Options Dialog Box.] Click OK to calculate th
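The two reduced equations can be turned into a small prediction function for the sizes 100 and 200 that the tutorial adds to the Forecasts Table. The sketch covers point predictions only, not the prediction or confidence limits. The negative slope is an assumption (the minus sign is missing from the extracted text), and the function name `predicted_months` is hypothetical.

```python
INTERCEPT = 33.8741
SLOPE = -0.101742       # sign assumed; lost in the extracted text
STOCK_SHIFT = 8.05547   # added when Type = Stock


def predicted_months(size, firm_type):
    """Point prediction of Time from the equal-slopes model."""
    shift = STOCK_SHIFT if firm_type == "Stock" else 0.0
    return INTERCEPT + SLOPE * size + shift


for size in (100, 200):
    for firm_type in ("Mutual", "Stock"):
        print(size, firm_type, round(predicted_months(size, firm_type), 4))
```

Because the slopes are forced to be equal, the gap between the Stock and Mutual predictions is the same at every size.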
65. o the type of firm, the two regression lines are parallel. Using the Model. You forced equal slopes by eliminating the unnecessary interaction term from the model for Time. Look at the Analysis Summary again. 1. Minimize the Conditional Sums of Squares Table, then maximize the Analysis Summary pane (see Figure 7.10). The R-Squared and Adjusted R-Squared statistics reveal that the reduction in the R-Squared statistic is minor, which is worth the tradeoff for a simpler model. The value for the Standard Error of the Estimate has improved, indicating less bias in the model. The value for the Durbin-Watson statistic is still reliable. The Residual plots no longer show any problems; to verify this, you can optionally generate the Residual plots. The equation for the final model is Time = 33.8741 - 0.101742 Size + 8.05547 (Type = Stock). The conclusion is that a stock company will take about eight months longer to accept an innovation than will a mutual fund company of the same size. [Figure 7.10: The Results of Forced Equal Slopes. R-Squared = 89.5058 percent; R-Squared (adjusted for d.f.) = 88.2712 percent; Standard Error of Est. = 3.22113; Durbin-Watson statistic = 1.97068.] T
66. odels Analysis and to show you how to enter user-specified contrasts. You will use data collected and adapted from soil samples taken from four different locations in California. Each location was sampled at 12 different depths, and the percentage of sand, silt, and clay was determined for each sample. There are three response variables: Sand, Silt, and Clay. For the first portion of the tutorial, you will focus only on Sand. Later, you will apply the MANOVA capabilities in STATGRAPHICS Plus to all three variables. Before you begin, open STATGRAPHICS Plus and the Soil data file. Completing the Analysis Dialog Box. 1. Choose SPECIAL, ADVANCED REGRESSION, GENERAL LINEAR MODELS from the Menu bar to display the dialog box. Enter Sand into the Dependent Variables text box. Enter Location and Depth into the Categorical Factors text box (see Figure 2.1). Click OK to display the GLM Model Specification dialog box. Accept the defaults and click OK to display the Analysis Summary and Scatterplot in the Analysis window, then maximize the Analysis Summary (see Figure 2.2). The results show a summary of fitting a general linear statistical model that relates the Sand variable to two predictive factors. The first ANOVA Table shows that the p-value is less than 0.01, which indicates that there is a statistically significant relationship between Sand and the predictor variables at the 99 percent confidence level.
67. oefficient for Thigh (-2.85685) is negative. The researchers felt that this was incorrect and probably due to ill-conditioned data. Now you will find a value for the ridge parameter that stabilizes the coefficient estimates by creating a table of regression coefficients for several values of the ridge parameter. Creating Regression Coefficients. 1. Click the Tabular Options button to display the dialog box, then click the Regression Coefficients check box and OK to display the table in the second text pane of the Analysis window. Maximize the table (see Figure 10.3). [Figure 10.3: Regression Coefficients Table, listing the coefficients for triceps, midarm, and thigh at each value of the ridge parameter.] The table shows the natural coefficient (Unstandardized) estimates for increments of theta from 0 to 0.1. As suspected, the coefficient f
68. or 1.78841 [Figure 5.5: Redisplay of the Analysis Summary.] [Figure 5.6: The Interaction Plot, with one line for each level of drug.] The plot shows the interaction between Time and Drug. The three lines on the plot represent each of the three drugs. The lines connect the least squares means for the four levels of Time. The lines will be parallel if an interaction does not occur. You can see from the plot that the Time trend is very different among the three drugs. References. Graybill, F. A. 1976. Theory and Application of the Linear Model. Belmont, California: Wadsworth. McCullagh, P. and Nelder, J. A. 1989. Generalized Linear Models, second edition. London: Chapman & Hall. Milliken, G. A. and Johnson, D. E. 1984. Analysis of Messy Data, Volume 1: Designed Experiments. New York: Van Nostrand Reinhold. Morrison, D. F. 1983. Applied Linear Statistical Methods. Englewood Cliffs, New Jersey: Prentice-Hall, Inc. Nelder, J. A. and Wedderburn, R. W. M. 1972. Generalized Linear Models. Journal of the Royal Statistical Society, A135: 370-384. Neter, J., Kutner, M. H., Nachtsheim, C. J., and Wasserman, W. 1996. Applied Linear Statistical Models, fourth edition. Chicago: Richard D. Irwin, Inc. Scheffe, H. 1959. The Analysis of Variance. New York: John Wiley & Sons. Introduction. This portion of the online manual of tutorials for t
69. or Thigh becomes positive even for very small values of theta (0.005). To look for stabilization, you will create and examine the Standardized Regression Coefficients. Click the Tabular Options button to display the dialog box, then click the Standardized Regression Coefficients check box and OK to display the table in the third text pane. Maximize the table (see Figure 10.4). [Figure 10.4: Standardized Regression Coefficients Table, listing the standardized coefficients for triceps, midarm, and thigh at each value of the ridge parameter.] As you review the ridge parameters, look for the smallest value beyond which the estimates change only slowly, that is, the point where the standardized coefficients have begun to level off. Neter et al. recommend using theta = 0.02. Before deciding that 0.02 is the best choice, check the values of that parameter to see if the variance inflation factors
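How standardized ridge coefficients and variance inflation factors respond to the ridge parameter can be sketched for a simplified case. This is not the program's algorithm. It assumes only two standardized predictors (the tutorial has three) so the matrix inverse can be written by hand, and every correlation value is hypothetical. It uses the textbook forms: coefficients (R + cI)^-1 r, and VIFs from the diagonal of (R + cI)^-1 R (R + cI)^-1.

```python
def matmul2(x, y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def ridge_two_predictors(r12, r1y, r2y, c):
    """Standardized ridge coefficients and VIFs for two correlated predictors."""
    corr = [[1.0, r12], [r12, 1.0]]
    # Closed-form inverse of (R + cI) for the 2x2 case.
    det = (1.0 + c) ** 2 - r12 ** 2
    inv = [[(1.0 + c) / det, -r12 / det], [-r12 / det, (1.0 + c) / det]]
    coefs = [inv[0][0] * r1y + inv[0][1] * r2y,
             inv[1][0] * r1y + inv[1][1] * r2y]
    m = matmul2(matmul2(inv, corr), inv)
    return coefs, [m[0][0], m[1][1]]


# Hypothetical correlations: predictors at 0.9, with the response at 0.8 and 0.75.
for c in (0.0, 0.01, 0.02, 0.04):
    coefs, vifs = ridge_two_predictors(0.9, 0.8, 0.75, c)
    print(c, [round(b, 4) for b in coefs], [round(v, 3) for v in vifs])
```

Scanning the printed rows for the point where the coefficients stop moving much and the VIFs have dropped is the same judgment the tutorial asks you to make from the tables and the plot.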
70. ou scroll through the Analysis Summary, notice that there are separate analysis of variance results for each of the three dependent variables. The MANOVA statistics appear at the end of the summary, including one set of statistics for each factor (see Figure 2.8). Because p-values below 0.10 indicate that an effect is statistically significant at the 90 percent or higher confidence level, the most significant factor is B (Depth). The test was conducted using Wilks' lambda. To read the interpretations for other values in the table, see the StatAdvisor. [Figure 2.8: MANOVA Statistics in the Analysis Summary, listing Wilks' lambda, Pillai trace, Hotelling-Lawley trace, and Roy's greatest root with their F and P-values, followed by the Hypothesis Matrix H.] TUTORIAL 3: Using Nested and Crossed Factors in a Model. This tutorial illustrates how you use nested and crossed factors in a model. The lesson is adapted from an example in Milliken and Johnson (1984) titled Simple Comfort Experiment. The comfort experiment studied the effects of three temperature levels and the gender of a person: male, female
71. ox. The name of the variable is the logarithmic transformation $Y' = \log_{10} Y$, which the researchers used to make the distribution of the error terms more nearly normal and to reduce the BC interaction effect. Enter Clotting, Prognost, Enzyme, and Liver into the Independent Variables text box. Note: The variables are labeled alphabetically in the text and graphs. For example, Clotting becomes variable A, Prognost becomes B, and so on (see Figure 8.1). [Figure 8.1: Completed Dialog Box for the Regression Model Selection Analysis, with logsurvive as the Dependent Variable.] Click OK to display the Analysis Summary and the Adjusted R-Squared Plot in the Analysis window. Maximize the Analysis Summary. The Analysis Summary includes values for the single-variable models, labeled as A, B, C, and D. Now you will eliminate these models from the analysis. Click the right mouse button on the Analysis Summary pane, then the left on Analysis Options, to display the Regression Model Selection Options dialog box. Enter 2 into the Minimum text box to change the minimum number of variables that will be included in the study; accept the default in the Maximum text box. The dialog box on your screen should look like the one shown in Figure 8.2. Number of s
72. pointer in the Effects text box on the line immediately under the C effect. Click A in the Factors list box, then the arrow button, to move the factor to the Effects list box. Click the Cross button to place the asterisk to the right of the A factor in the Effects text box. Click B in the Factors list box, then the arrow button, to move the factor to the Effects list box. Position and click the mouse pointer in the Effects text box on the line immediately under the C effect. Follow Steps 3, 4, and 5 above using the B and C factors. Figure 4.2 illustrates how the GLM Model Specification dialog box will look when you have completed these steps. [Figure 4.2: Completed GLM Model Specification Dialog Box.] Click OK to display the Analysis Summary and Scatterplot in the Analysis window. Maximize the Analysis Summary (see Figure 4.3). The table summarizes the results of fitting a general linear model that relates the Yield variable to three predictive factors: Block, Regime, and Variety, where Regime is the whole plot factor and Variety is the subplot factor. The table also summarizes how well the model performed when it fit the data and predicted the values that were withheld from the fitting process. Each of these statistics is based on the residuals. To continue with the analysis, you will change the error terms to account for the Spl
73. pothesis Matrix dialog box. Using the Tab key to move from left to right in the matrix, type the following in the first through fourth cells: 0.5, 0, 0.5, and -1 (see Figure 2.4). [Figure 2.4: The Completed Hypothesis Matrix Dialog Box.] Click OK to redisplay the Multiple Range Tests Table showing the results obtained using the contrasts you entered (see Figure 2.5). [Figure 2.5: The Redisplayed Table of Results for the User-Defined Contrasts: Multiple Comparisons for Sand by Location, method 95.0 percent LSD; an asterisk denotes a statistically significant estimate.] The table shows the results of testing the contrasts. The asterisk indicates that the contrast is statistically different from 0.0 at the 95 percent confidence level. The program used Fisher's Least Significant Difference (LSD) method to discriminate among the means. Using this method, there is a 5 percent risk of calling each pair of means significantly different when the actual difference is zero. You can use a Means Plot to verify the results. Click the Graphical Options button to display the dialog box, then the Means Plot check box, and OK to display the Means Plot in the graphics pane. Maximize the plot (see Figure 2.6). Notice that the averages of Locations 1 and 3 are higher than that of Location 4. Creating M
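The contrast entered in the Hypothesis Matrix compares the average of Locations 1 and 3 with Location 4. A minimal sketch of the estimate follows. The four location means are hypothetical placeholders, not the tutorial's least squares means; the weights 0.5, 0, 0.5, and -1 assume that the last entry is negative, since its sign is missing from the extracted text.

```python
# Contrast weights for Locations 1 through 4 (sign of the last entry assumed).
CONTRAST = [0.5, 0.0, 0.5, -1.0]


def contrast_estimate(means, weights):
    """Weighted sum of the means; the weights of a contrast sum to zero."""
    if abs(sum(weights)) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    return sum(w * m for w, m in zip(weights, means))


# Hypothetical means for Sand at Locations 1 through 4.
location_means = [30.0, 15.0, 38.0, 8.0]
estimate = contrast_estimate(location_means, CONTRAST)
print(estimate)
```

A positive estimate says the average of Locations 1 and 3 exceeds Location 4; whether it differs significantly from zero is what the asterisk in the redisplayed table reports.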
74. rials. Increase the mean size or weight to 5 and compare the interval widths with those for a single measurement. 2. Refit the data using the next best model from the Calibration Model Options dialog box. 3. Create other tabular and graphical options, especially Hypothesis Tests and Unusual Residuals.
References
Caulcutt, R. and Boddy, R. (1983). Statistics for Analytical Chemists. London: Chapman & Hall.
DataMyte Corporation (1987). DataMyte Handbook, third edition. Minnetonka, Minnesota: DataMyte Corporation.
Draper, N. and Smith, H. (1981). Applied Regression Analysis, second edition. New York: John Wiley & Sons.
TUTORIAL 7: Analyzing An Insurance Innovation Study
This tutorial was adapted from Neter et al. (1996), where an economist decided to compare the speed at which a particular insurance innovation was accepted (Y) with the size of the insurance firm (X1) and the type of firm. The economist measured the response variable by the number of months that elapsed before the firm accepted the innovation. The study included three variables:
• Size of the firm, which is quantitative and measured by the amount of the firm's total assets in millions of dollars.
• Type of firm, which is qualitative and identifies two classes: stock companies and mutual fund companies.
• Time, which represents the speed with which a firm initiated a particular innovation.
The economist wanted to compare regression model estimate
75. s across groups: 10 mutual fund firms and 10 stock firms. The data for these firms are in the Insurance file. Time is the dependent variable (Y), Size is the independent variable (X), and Type is the level code, a character variable that represents the type of firm, either a stock or a mutual fund brokerage. The purpose of the tutorial is to determine if a regression analysis performed on the Time versus Size variables can be improved by taking into account the effect of the type of firm. You will complete a regression analysis on each group to see if the slopes and/or intercepts differ significantly between the groups. To begin the analysis, open the Insurance data file.
Estimating the Model without Groups
It is usually a good idea to look for bias in a model when the data come from, or are suspected to come from, distinct groups. The first step is to run a simple regression analysis on all the data. Choose SPECIAL... ADVANCED REGRESSION... COMPARISON OF REGRESSION LINES... from the Menu bar to display the Comparison of Regression Lines Analysis dialog box. Enter Time into the Dependent Variable text box. Enter Size into the Independent Variable text box. Enter Type into the Level Codes text box (see Figure 7-1).
Figure 7-1. Completed Di
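The idea of fitting one line to all the data and then a separate line to each group can be sketched outside the program. The Python example below is a minimal illustration under stated assumptions: the six observations are hypothetical (they are not the contents of the Insurance file), the helper names `fit_line` and `fit_by_group` are invented here, and ordinary least squares is used only to show how the pooled and per-group intercepts and slopes are obtained. It does not perform the significance tests that the Comparison of Regression Lines analysis reports.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_by_group(xs, ys, groups):
    """Fit a separate line for each level code; returns {level: (intercept, slope)}."""
    split = {}
    for x, y, g in zip(xs, ys, groups):
        gx, gy = split.setdefault(g, ([], []))
        gx.append(x)
        gy.append(y)
    return {g: fit_line(gx, gy) for g, (gx, gy) in split.items()}


# Hypothetical data: Size (total assets), Time (months), Type (level code).
size = [100, 200, 300, 100, 200, 300]
time = [30, 20, 10, 38, 28, 18]
firm_type = ["Mutual", "Mutual", "Mutual", "Stock", "Stock", "Stock"]

pooled = fit_line(size, time)                 # model without groups
by_type = fit_by_group(size, time, firm_type)  # one line per type of firm

print("pooled: intercept=%.2f slope=%.3f" % pooled)
for level, (intercept, slope) in sorted(by_type.items()):
    print("%s: intercept=%.2f slope=%.3f" % (level, intercept, slope))
```

In this made-up data set the two groups share a slope but have different intercepts, which is the kind of difference the pooled fit hides.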
76. se either the mouse or the keyboard to add terms to the Effects text box. The steps below and throughout the remainder of these tutorials use the mouse and the keyboard. If you vary from any of the steps, your results may not match the example. Because you will use Factor A only in the nested factor, you will first delete it from the Effects list box. Then you will create the model to include the Temperatur*Gender effect (B*C) and the nested factor A(B). The nesting occurs in the design structure, with Chamber nested within Temperatur (Milliken and Johnson, 1984). The A(B) nested effect nests Chamber within Temperatur; that is, Chambers 1, 2, and 3 for the lowest temperature (65) are not the same as Chambers 1, 2, and 3 for the highest temperature (75). For clarification, you may want to look at how the data are entered into the DataSheet. Click the mouse pointer in the left-most corner of the letter A in the Effects list box, hold down the left button, and drag the pointer over A to highlight it. Click Delete to delete the factor. Click the mouse pointer in the Effects text box on the line immediately under the C effect. Click B in the Factors list box, then click the arrow button to move the factor to the Effects list box. Click the asterisk (Cross) button to place the asterisk to the right of the B factor in the Effects list box. Click C in the Factors list box, then click the arrow button to move the factor to the Effects list box.
77. gure 9-13. Square Plot
Estimating Predictions
1. Click the Tabular Options button to display the dialog box, then the Reports check box and OK to display the report in the second text pane. Maximize the table (see Figure 9-14).
[Figure 9-14. The Reports Table]
Notice that the table in Figure 9-14 is blank. To correct this, you need to make a change on the Reports Options dialog box. Click the right mouse button on the text pane, then the left on Pane Options, to display the Reports Options dialog box. Four of the options are currently chosen. You will add one more. Click the Observed Y check box (see Figure 9-15).
[Figure 9-15. The Completed Reports Options Dialog Box. Options listed: Observed Y, Fitted Y, Residuals, Studentized Residuals, Standard Errors for Forecasts, Confidence Limits for Individual Forecasts, Confidence Limits for Forecast Means.]
Click OK to redisplay the Reports Table (see Figure 9-16).
Regression Results for drawangl
Row  Observed  Fitted   Stnd. Error   Lower 95.0% CL  Upper 95.0% CL  Lower 95.0% CL
     Value     Value    for Forecast  for Forecast    for Forecast    for Mean
1    33.6      26.3759  3.9619        17.8814         34.8763         24.1504
2    22.3      24.1155  3.97039       15.5995         32.6311         21.8185
3    22.0      23.6671  3.97457       15.1425         32.1917         21.3372
4    13.7      18.9497
78. text box. Click the mouse pointer in the last text box, FUNCTION, and type MY FUNCTION (see Figure 9-17). Click OK to save the values and the function.
[Figure 9-17. The Save Results Options Dialog Box Showing All the Results that Will Be Saved. Save options and their target variables: Predicted Values (PREDICTED), Standard Errors of Predictions (PSTDERRORS), Lower Limits for Predictions (LOWERPLIMS), Upper Limits for Predictions (UPPERPLIMS), Standard Errors of Means (CSTDERRORS), Lower Limits for Forecast Means (LOWERCLIMS), Upper Limits for Forecast Means (UPPERCLIMS), Residuals (RESIDUALS), Studentized Residuals (SRESIDUALS), Leverages (LEVERAGES), DFITS Statistics (DFITS), Mahalanobis Distances (MDISTS), Coefficients (COEFFS), Function (MY FUNCTION).]
References
Cox, D. R. (1970). Analysis of Binary Data. London: Chapman & Hall.
Draper, N. R. and Smith, H. (1981). Applied Regression Analysis, second edition. New York: John Wiley & Sons.
Hartley, H. O. (1961). The Modified Gauss-Newton Method for the Fitting of Non-Linear Regression Functions by Least Squares. Technometrics 3: 269-280.
Marquardt, D. W. (1963). An Algorithm for Least-Squares Estimation of Nonlinear Parameters. Journal of the Society for Industrial and Applied Mathematics 11: 431-441.
Myers, R. H. (1990). Classical and Modern Regression with Applications, second edition. Belmont, Califor
79. the Nonlinear Regression Options dialog box. Type 50 in the Maximum Iterations text box; leave the remaining options as they are currently set (see Figure 9-6). Click OK to recalculate and redisplay the Analysis Summary using the new value for Maximum Iterations (see Figure 9-7). This time the estimation stopped due to convergence of the residual sum of squares. The summary shows that 40 iterations and 122 function calls were performed.
[Figure 9-6. Completed Nonlinear Regression Options Dialog Box. Fields shown: Estimation Method (Marquardt, Gauss-Newton, Steepest Descent), Stopping Criterion 1, Stopping Criterion 2, Maximum Iterations, Maximum Function Calls, Marquardt Parameter (Initial Value, Scaling Factor, Maximum Value), Confidence Level.]
Nonlinear Regression - drawangl
Dependent variable: drawangl
Independent variables: width, depth
Function to be estimated: a*(1-exp(-b*width/depth))
Initial parameter estimates: a = 35.0, b = 1.0
Estimation method: Steepest descent
Estimation stopped due to convergence of residual sum of squares.
Number of iterations: 40
Number of function calls: 122
Estimation Results
Parameter  Estimate  Asymptotic Standard Error  Asymptotic 95.0% Confidence Interval
a          32.4945   2.64617                    26.8193 to 38.1703
b          1.505     0.297072                   0.868745 to 2.14306
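The two ways an estimation run can end, by reaching the Maximum Iterations limit or by convergence of the residual sum of squares, can be illustrated with a small iterative fit. The Python sketch below rests on several assumptions and should not be read as the program's algorithm: the model form a*(1-exp(-b*width/depth)) is assumed, the data are hypothetical, the names `model`, `residual_ss`, and `fit` are invented for this example, and the search is a simple damped Gauss-Newton (Marquardt-style) update rather than any of the program's own estimation methods.

```python
from math import exp


def model(a, b, ratio):
    """Assumed model form: a * (1 - exp(-b * ratio)), with ratio = width / depth."""
    return a * (1.0 - exp(-b * ratio))


def residual_ss(a, b, ratios, ys):
    """Residual sum of squares for the current parameter values."""
    return sum((y - model(a, b, r)) ** 2 for r, y in zip(ratios, ys))


def fit(ratios, ys, a, b, max_iterations=30, tol=1e-8, damping=0.01):
    """Damped Gauss-Newton search that stops on convergence or an iteration cap."""
    current = residual_ss(a, b, ratios, ys)
    for iteration in range(1, max_iterations + 1):
        # Accumulate J'J and J'r for the two parameters.
        s_aa = s_ab = s_bb = g_a = g_b = 0.0
        for r, y in zip(ratios, ys):
            e = exp(-b * r)
            d_a = 1.0 - e      # partial derivative of the model with respect to a
            d_b = a * r * e    # partial derivative of the model with respect to b
            res = y - a * d_a
            s_aa += d_a * d_a
            s_ab += d_a * d_b
            s_bb += d_b * d_b
            g_a += d_a * res
            g_b += d_b * res
        # Inflate the diagonal, then solve the 2x2 system by Cramer's rule.
        m_aa = s_aa * (1.0 + damping)
        m_bb = s_bb * (1.0 + damping)
        det = m_aa * m_bb - s_ab * s_ab
        if det <= 0.0:
            return a, b, current, iteration, "singular system"
        step_a = (m_bb * g_a - s_ab * g_b) / det
        step_b = (m_aa * g_b - s_ab * g_a) / det
        trial = residual_ss(a + step_a, b + step_b, ratios, ys)
        if trial < current:
            improvement = current - trial
            a, b, current = a + step_a, b + step_b, trial
            damping /= 10.0
            if improvement <= tol * current:
                return a, b, current, iteration, "converged"
        else:
            damping *= 10.0  # reject the step and take a shorter one next time
    return a, b, current, max_iterations, "maximum iterations reached"


# Hypothetical width/depth ratios and responses (not the drawangl data).
ratios = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [16.0, 23.0, 27.5, 28.0, 29.5, 29.0]

start_rss = residual_ss(35.0, 1.0, ratios, ys)
a_hat, b_hat, final_rss, iterations, status = fit(ratios, ys, 35.0, 1.0, max_iterations=50)
print(status, iterations, round(a_hat, 4), round(b_hat, 4), round(final_rss, 4))
```

Lowering `max_iterations` or tightening `tol` in this sketch shows how the reported stopping reason depends on which limit is reached first.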
80. unction calls
Estimation Results
Asymptotic Standard Error  Asymptotic 95.0% Confidence Interval
2.64058                    26.8025 to 38.1295
0.298051                   0.871546 to 2.15016
[Figure 9-3. The Analysis Summary]
The Analysis Summary shows that the estimation process was terminated when it successfully completed four iterations. At this point, the residual sum of squares appeared to approach a minimum. The R-Squared statistic shows that the model as fitted explains 67.2795 percent of the variability in the dependent variable. The value of the Adjusted R-Squared statistic is 64.9423 percent. The Durbin-Watson statistic is greater than 1.4, which indicates that there is no serious autocorrelation in the residuals. Now you will use the Nonlinear Regression Options dialog box to change the estimation method from the default, Marquardt, to Steepest Descent. You will do this to increase the number of iterations it takes to reach convergence. Click the right mouse button on the text pane, then the left on Analysis Options, to display the Nonlinear Regression Options dialog box. Click Steepest Descent in the Method portion of the dialog box; accept the defaults for the remaining options (see Figure 9-4).
[Nonlinear Regression Options dialog box. Fields shown: Estimation Method (Marquardt, Gauss-Newton, Steepest Descent), Stopping Criterion 1, Stopping Criterion 2, Maximum Iterations (30), Maximum Function Calls.]
81. were correctly predicted, for a total of 68.2 percent. This cutoff value may be a good value to use to make additional predictions. Now look at the Prediction Histograms. Maximize the Prediction Histograms (see Figure 11-11).
[Figure 11-11. The Prediction Histograms Plot. Plot title: Model Predictions for p. Horizontal axis: predicted probability. Vertical axis: frequency, with True cases above the line and False cases below it.]
The plot shows the ability of the fitted logistic model to distinguish between cases when the outcome is True or False, and shows the frequency distribution of the True and False cases versus the probability predicted by the fitted model. Ideally, the model predicts a small probability for the False cases and a large probability for the True cases. Notice that the large frequencies above the line are plotted on the far right and the large frequencies below the line are plotted on the left, which indicates that the model works reasonably well.
Additional Exercise
1. As an additional exercise, access the Logistic Regression Options dialog box, change the method to Weighted Least Squares, and compare the Analysis Summary to the one you created using Maximum Likelihood.
References
Cox, D. R. (1970). The Analysis of Binary Data. London: Methuen and Co. Ltd.
Chatterjee, S. and Price, B. (1991). Regression Analysis by Example, second edition. New York: John Wiley & Sons, Inc.
Collett, D. (1991). Modelling Binary Data. London: Chapman & Hall.
Myers
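The percentage of cases correctly predicted at a given cutoff, as discussed in the logistic regression tutorial, can be computed directly from predicted probabilities. The Python sketch below is an illustration only: the probabilities and outcomes are hypothetical, the function name `classification_rates` is invented for this example, and the rule "predict True when the probability is at least the cutoff" is an assumption, since the program's exact tie rule is not stated here.

```python
def classification_rates(probabilities, outcomes, cutoff):
    """Percent of True cases, False cases, and all cases classified correctly."""
    true_total = sum(1 for o in outcomes if o)
    false_total = len(outcomes) - true_total
    if true_total == 0 or false_total == 0:
        raise ValueError("need at least one True case and one False case")
    # Assumed rule: predict True when the fitted probability reaches the cutoff.
    true_correct = sum(
        1 for p, o in zip(probabilities, outcomes) if o and p >= cutoff
    )
    false_correct = sum(
        1 for p, o in zip(probabilities, outcomes) if not o and p < cutoff
    )
    return {
        "true": 100.0 * true_correct / true_total,
        "false": 100.0 * false_correct / false_total,
        "total": 100.0 * (true_correct + false_correct) / len(outcomes),
    }


# Hypothetical fitted probabilities and observed outcomes.
probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
observed = [True, True, True, False, False, False]

for cutoff in (0.25, 0.5, 0.75):
    rates = classification_rates(probs, observed, cutoff)
    print(cutoff, {name: round(value, 1) for name, value in rates.items()})
```

Scanning several cutoffs in this way mirrors the process of choosing the value that gives the highest total percentage of correct predictions.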
