Home

MODEL II REGRESSION USER'S GUIDE, R EDITION

1. Model II regression should be used when the two variables in the regression equation are random i e not controlled by the researcher Model I regression using least squares underestimates the slope of the linear relationship between the variables when they both contain error see example in chapter 5 4 p m Detailed recommendations follow Date Id mod2user Rnw 593 2008 11 24 19 14 59Z jarioksa 1m Sokal and Rohlf Biometry 2nd edition 1981 551 the numerical result for MA regression for the example data set is wrong The mistake has been corrected in the 1995 edition 1 2 PIERRE LEGENDRE TABLE 1 Application of the model II regression methods The numbers in the left hand column refer to the corresponding paragraphs in the text Par Method Conditions of application Test possible 1 OLS Error on y gt error on x Yes MA Distribution is bivariate normal Yes Variables are in the same physical units or dimensionless Variance of error about the same for x and y 4 Distribution is bivariate normal Error variance on each axis proportional to variance of corresponding variable 4a RMA Check scatter diagram no outlier Yes 4b SMA Correlation r is significant No 5 OLS Distribution is not bivariate normal Yes Relationship between x and y is linear 6 OLS To compute forecasted fitted or predicted y values Yes Regression equation and confidence intervals are irrelevant 7 MA To compare observations to model predictions Yes 1 REC
2. 721176 NA 4 RMA 0 26977945 2 53296649 68 456190 0 16 Confidence intervals Method 2 5 Intercept 97 5 Intercept 2 5 Slope 97 5 Slope 1 OLS 0 1220525 0 26354020 0 2711429 0 1109234 2 MA NA NA NA NA 3 SMA 0 1278759 0 15887844 1 1662547 0 7841884 4 RMA 0 0965427 0 06826575 1 6330043 0 3980472 Eigenvalues 1 071318 0 8857145 H statistic used for computing C I of MA 1 106884 Neither the correlation nor any of the regression coefficients are significant this is as expected from the way the data were generated Note that the slope estimates differ widely among methods The MA slope is bma 0 60005 but its confidence interval noted NA for Not available or Missing value covers all 360 of the plane as stated in the comment above the regression table The RMA slope estimate 14 PIERRE LEGENDRE is brma 2 53297 OLS which should only be used to predict the values from x point 6 in Table 1 tends to produce slopes near zero for random data bors 0 08011 Since the correlation is not significant SMA should not have been computed This method tends to produce slopes near 1 with the present example the slope is indeed near 1 bsma 0 95633 since the standard deviations of the two variables are nearly equal sy 1 01103 s 0 96688 This example shows that RMA does not necessarily produce results that are similar to SMA The confidence intervals of the slope and intercept of RMA provide an exampl
3. degrees P perm 1 tailed 2 631527 3 465907 3 059635 2 963292 Confidence intervals Method 2 5 Intercept 97 5 Intercept 2 5 Slope 97 5 Slope 1 OLS 12 490993 2 MA 1 347422 3 SMA 9 195287 4 RMA 8 962997 Eigenvalues 269 8212 6 418234 69 19283 0 01 73 90584 0 01 71 90073 NA 71 35239 0 01 27 56251 1 858578 3 404476 19 76310 2 663101 4 868572 22 10353 2 382810 3 928708 23 84493 2 174260 3 956527 H statistic used for computing C I of MA 0 006120651 1 In this table of the output file the rows correspond respectively to the MA OLS and RMA slopes and to the coefficient of correlation r Corr Stat is the value of the statistic being tested for significance As explained in the Output file section the statistic actually used by the program for the test of the MA slope in this example is the inverse of the bma slope estimate 1 3 46591 0 28852 because the reference value of the statistic in this permutation test must not exceed 1 One tailed probabilities One tailed p are computed in the direction of the sign of the coefficient For a one tailed test in the upper tail i e for a coefficient with a positive sign p EQ GT Number of permutations 1 For a test in the lower tail i e for a coefficient with a negative sign p LT EQ Number of permutations 1 where e LT is the number of values under permutation that are smaller than the reference value e EQ is the number of values under permu
4. methods The method name is followed by the parametric 95 confidence intervals 2 5 and 97 5 percentiles for the intercept and slope estimates 5 The eigenvalues of the bivariate dispersion computed during major axis regression and the H statistic used for computing the confidence interval of the major axis slope notation following Sokal and Rohlf 1995 For the slopes of MA OLS and RMA the permutation tests are carried out using the slope estimates b themselves as the reference statistics In OLS simple linear regression a permutation test of significance based on the r statistic is equivalent to a permutation test based on the pivotal t statistic associated with bors 21 On the other hand across the permutations the slope estimate bors differs from r by a constant sy s since bots TzySy 8z so that bors and r are equivalent statistics for permutation testing As a consequence a permutation test of borg is equivalent to a permutation test carried out using the pivotal t statistic associated with bozs This is not the case in multiple linear regression however as shown by 1999 If the objective is simply to assess the relationship between the two variables under study one can simply compute the correlation coefficient r and test its sig nificance A parametric test can be used when the assumption of binormality can safely be assumed to hold or a permutation test when it cannot For the intercept of OLS the confidence int
5. 01728 66 51716 0 01 3 SMA 12 193785 2 119366 64 74023 NA MODEL II REGRESSION USER S GUIDE R EDITION 11 FIGURE 4 Scatter diagram of the Example 3 data number of eggs produced as a function of the mass of unspawned females with the ranged major axis RMA stan dard major axis SMA and or dinary least squares OLS regres sion lines RMA and SMA are the appropriate regression lines in this example The cross indicates the centroid of the bivariate distribu No of eggs 15 20 25 30 35 40 tion The three regression lines Fish mass x100 g pass through this centroid 4 RMA 13 179672 2 086897 64 39718 0 01 Confidence intervals Method 2 5 Intercept 97 5 Intercept 2 5 Slope 97 5 Slope 1 OLS 4 098376 43 63201 1 117797 2 622113 2 MA 36 540814 27 83295 1 604304 3 724398 3 SMA 14 576929 31 09957 1 496721 3 001037 4 RMA 18 475744 35 25317 1 359925 3 129441 Eigenvalues 494 634 17 49327 H statistic used for computing C I of MA 0 02161051 5 4 Highly correlated random variables 5 4 1 Input data 1996 generated a variable X containing 100 values drawn at random from a uniform distribution in the interval 0 10 They then generated two other variables N and No containing values drawn at random from a normal distribution M 0 1 These variables were combined to create two new variables x X N and y X N2 The relationship constructed in this way between x and y is perfect a
6. 53X3 0 139015 X4 MODEL II REGRESSION USER S GUIDE R EDITION 7 This equation was applied to the second data set also 54 patients to produce forecasted survival times In the present example these values are compared to the observed survival times Fig 2 shows the scatter diagram with log observed survival time in abscissa and forecasted values in ordinate The MA regression line is shown with its 95 confidence region The 45 line which would correspond to perfect forecasting is also shown for comparison 5 1 2 Output file MA SMA and OLS equations 95 C I and tests of signifi cance were obtained with the following R commands The RMA method which is optional was not computed since MA is the only appropriate method in this example gt data mod2ex1 gt Exi res lt Imodel2 Predicted_by_model Survival data mod2ex1 nperm 99 gt Exi res Model II regression Call lmodel2 formula Predicted_by_model Survival data mod2ex1 nperm 99 n 54 r 0 8387315 r square 0 7034705 Parametric P values 2 tailed 2 447169e 15 1 tailed 1 223585e 15 Angle between the two OLS regression lines 9 741174 degrees Permutation tests of OLS MA RMA slopes 1 tailed tail corresponding to sign A permutation test of r is equivalent to a permutation test of the OLS slope P perm for SMA NA because the SMA slope cannot be tested Regression results Method Intercept Slope Angle degrees P perm 1 taile
7. MODEL II REGRESSION USER S GUIDE R EDITION PIERRE LEGENDRE CONTENTS 1 Recommendations on the use of model II regression methods 2 2 Ranged major axis regression 4 5 5 6 6 2 Eagle rays and Macomona bivalves 7 3 Cabezon spawning 10 1 12 References 14 Function lmodel2 computes model II simple linear regression using the follow ing methods major axis MA standard major axis SMA ordinary least squares OLS and ranged major axis RMA Information about these methods is avail able for instance in section 10 3 2 of and in sections 14 13 and 15 7 of 1995 Parametric 95 confidence intervals are computed for the slope and intercept parameters A permutation test is available to determine the significance of the slopes of MA OLS and RMA and also for the cor relation coefficient This function represents an evolution of a FORTRAN program written in 2000 and 2001 Bartlett s three group model II regression method described by the above men tioned authors is not computed by the program because it suffers several draw backs Its main handicap is that the regression lines are not the same depending on whether the grouping into three groups is made based on x or y The regression line is not guaranteed to pass through the centroid of the scatter of points and the slope estimator is not symmetric i e the slope of the regression y f x is not the reciprocal of the slope of the regression x f y
8. OMMENDATIONS ON THE USE OF MODEL II REGRESSION METHODS Considering the results of simulation studies Legendre and Legendre 1998 offer the following recommendations to ecologists who have to estimate the parameters of the functional linear relationships between variables that are random and measured with error Table I 1 2 4 If the magnitude of the random variation i e the error varianc on the response variable y is much larger i e more than three times than that on the explanatory variable x use OLS Otherwise proceed as follows Check whether the data are approximately bivariate normal either by look ing at a scatter diagram or by performing a formal test of significance If not attempt transformations to render them bivariate normal For data that are or can be made to be reasonably bivariate normal consider rec ommendations 3 and 4 If not see recommendation 5 For bivariate normal data use major axis MA regression if both vari ables are expressed in the same physical units untransformed variables that were originally measured in the same units or are dimensionless e g log transformed variables if it can reasonably be assumed that the error variances of the variables are approximately equal When no information is available on the ratio of the error variances and there is no reason to believe that it would differ from 1 MA may be used provided that the results are interpreted with caution MA pr
9. ariate distribution The MA re gression line passes through this centroid log observed survival time 5 2 1 Input data The following table presents observations at 20 sites from a study on predator prey relationships 1997 y is the number of bivalves Ma comona liliana larger than 15 mm in size found in 0 25 m quadrats of sediment x is the number of sediment disturbance pits of a predator the eagle ray Myliobatis tenuicaudatus found within circles of a 15 m radius around the bivalve quadrats The variables x and y are expressed in the same physical units and are estimated with sampling error and their distribution is approximately bivariate normal The error variance is not the same for x and y but since the data are animal counts it seems reasonable to assume that the error variance along each axis is proportional to the variance of the corresponding variable The correlation is significant r 0 86 p lt 0 001 RMA and SMA are thus appropriate for this data set MA and OLS are not Fig 3 shows the scatter diagram The various regression lines are presented to allow their comparison 5 2 2 Output file MA SMA OLS and RMA regression equations confidence in tervals and tests of significance were obtained with the following R commands That the 95 confidence intervals of the SMA and RMA intercepts do not include 0 may be due to different reasons 1 the relationship may not be perfectly linear 2 the C I of
10. ch W H Freeman 3rd edition 1995 D PARTEMENT DE SCIENCES BIOLOGIQUES UNIVERSIT DE MONTR AL C P 6128 SUCCURSALE CENTRE VILLE MONTREAL QUEBEC H3C 3J7 CANADA E mail address Pierre Legendre umontreal ca
11. d 1 OLS 0 6852956 0 6576961 33 33276 0 01 2 MA 0 4871990 0 7492103 36 84093 0 01 3 SMA 0 4115541 0 7841557 38 10197 NA Confidence intervals Method 2 5 Intercept 97 5 Intercept 2 5 Slope 97 5 Slope 1 OLS 0 4256885 0 9449028 0 5388717 0 7765204 2 MA 0 1725753 0 7633080 0 6216569 0 8945561 3 SMA 0 1349629 0 6493905 0 6742831 0 9119318 Eigenvalues 0 1332385 0 01090251 H statistic used for computing C I of MA 0 007515993 The interesting aspect of the MA regression equation in this example is that the regression line is not parallel to the 45 line drawn in Fig The 45 line is not included in the 95 confidence interval of the MA slope which goes from tan 0 62166 31 87 to tan 0 89456 41 81 The Figure shows that the forecasting equation overestimated survival below the mean and underestimated it above the mean The OLS regression line which is often erroneously used by researchers for comparisons of this type would show an even greater discrepancy 33 3 angle from the 45 line compared to the MA regression line 36 8 angle 5 2 Eagle rays and Macomona bivalves 8 PIERRE LEGENDRE MA regression MA regression Ry 9 P Confidence limits ai 45 degree line Forecasted FIGURE 2 Scatter diagram of the 2 5 Example 1 data showing the major axis MA regression line and its 95 confidence region The 45 line is drawn for reference The cross indicates the centroid of the biv
12. e of the phenomenon of inversion of the confidence limits described in Fig REFERENCES M J Anderson and P Legendre An empirical comparison of permutation methods for tests of partial regression coefficients in a linear model Journal of Statistical Computation and Simulation 62 271 303 1999 A H Hines R B Whitlatch S F Thrush J E Hewitt V J Cummings P K Dayton and P Legendre Nonlinear foraging response of a large marine predator to benthic prey eagle ray pits and bivalves in a New Zealand sandflat Journal of Experimental Marine Biology and Ecology 216 191 210 1997 P Jolicoeur Bivariate allometry interval estimation of the slopes of the ordinary and standardized normal major axes and structural relationship Journal of Theoretical Biology 144 275 285 1990 P Legendre and L Legendre Numerical ecology Number 20 in Developments in Environmental Modelling Elsevier Amsterdam 2nd edition 1998 B McArdle The structural relationship regression in biology Canadian Journal of Zoology 66 2329 2339 1988 F Mespl M Troussellier C Casellas and P Legendre Evaluation of simple statistical criteria to qualify a simulation Ecological Modelling 88 9 18 1996 J Neter M H Kutner C J Nachtsheim and W Wasserman Applied linear statistical models Richad D Irwin Inc 4th edition 1996 R R Sokal and F J Rohlf Biometry The principles and practice of statistics in biological resear
13. er diagram of the objects RMA should not be used in the presence of outliers be cause they cause important changes to the estimates of the ranges of the variables Outliers that are not aligned fairly well with the dispersion ellipse of the objects MODEL II REGRESSION USER S GUIDE R EDITION 5 may have an undesirable influence on the slope estimate The identification and treatment of outliers is discussed in Section 13 4 Outliers may in some cases be eliminated from the data set or they may be subjected to a winsorizing procedure described by these authors 3 INPUT FILE A data frame with objects in rows and variables in columns 4 OUTPUT FILE The output file obtained by print 1model12 contains the following results 1 The call to the function 2 General regression statistics number of objects n correlation coefficient r coefficient of determination r of the OLS regression parametric P values 2 tailed one tailed for the test of the correlation coefficient and the OLS slope angle between the two OLS regression lines 1m y x and Im x y 3 A table with rows corresponding to the four regression methods Column 1 gives the method name followed by the intercept and slope estimates the angle between the regression line and the abscissa and the permutational probability one tailed for the tail corresponding to the sign of the slope estimate 4 A table with rows corresponding to the four regression
14. erval is computed using the standard formulas found in textbooks of statistics results are identical to those of standard statistical software No such formula providing correct a coverage is known for the other three methods In the program the confidence intervals for the intercepts of MA SMA and RMA are computed by projecting the bounds of the confidence intervals of the slopes onto the ordinate this results in an underestimation of these confidence intervals In MA or RMA regression the bounds of the confidence interval C I of the slope may on occasions lie outside quadrants I and IV of the plane centred on 6 PIERRE LEGENDRE bring 2 75 b FIGURE 1 a Ifa MA regression line has the lower bound of its confidence interval C I in quadrant III this bound has a positive slope 2 75 in example b Likewise if a MA regression line has the upper bound of its confidence interval in quadrant II this bound has a negative slope 5 67 in example the centroid of the bivariate distribution When the lower bound of the confidence interval corresponds to a line in quadrant III Fig 1h it has a positive slope the RMA regression line of example in chapter p provides an example of this phenomenon Likewise when the upper bound of the confidence interval corresponds to a line in quadrant II Fig 1 it has a negative slope In other instances the confidence interval of the slope may occupy all 360 of the plane w
15. he parameters of the regression line The significance of the slope should be tested by permutation however because the distributional assumptions of parametric testing are not satisfied 2 If a straight line is not an appro priate model polynomial or nonlinear regression should be considered When the purpose of the study is not to estimate the parameters of a functional relationship but simply to forecast or predict values of y for given x s use OLS in all cases OLS is the only method that minimizes the squared residuals in y The OLS regression line itself is meaningless Do not use the standard error and confidence bands however unless x is known to be free of error p 545 Table 14 3 this warning applies in particular to the 95 confidence intervals computed for OLS by this program Observations may be compared to the predictions of a statistical or deter ministic model e g simulation model in order to assess the quality of the model If the model contains random variables measured with error use MA for the comparison since observations and model predictions should be in the same units If the model fits the data well the slope is expected to be 1 and the intercept 0 A slope that significantly differs from 1 indicates a difference between observed and simulated values which is proportional to the ob served values For relative scale variables an intercept which significantly differs from 0 suggests the existence of a syste
16. hich results in it having no bounds The bounds are then noted 0 00000 see chapter 5 5 p 12 In SMA or OLS confidence interval bounds cannot lie outside quadrants I and IV In SMA the regression line always lies at a 45 or 45 angle in the space of the standardized variables the SMA slope is a back transformation of 45 to the units of the original variables In OLS the slope is always at an angle closer to zero than the major axis of the dispersion ellipse of the points i e it always underestimates the MA slope in absolute value 5 EXAMPLES 5 1 Surgical unit data 5 1 1 Input data This example compares observations to the values forecasted by a model A hospital surgical unit wanted to forecast survival of patients undergoing a particular type of liver surgery Four explanatory variables were measured on patients The response variable Y was survival time which was log transformed The data are described in detail in Section 8 2 of Neter et al 1996 who also provide the original data sets The data were divided in two groups of 54 patients The first group was used to construct forecasting models whereas the second group was reserved for model validation Several regression models were studied One of them which uses variables X3 enzyme function test score and X4 liver function test score is used as the basis for the present example The multiple regression equation is the following 1 388778 0 0056
17. istic used for computing C I of MA 0 002438452 The noticeable aspect is that with OLS regression the confidence interval of the slope does not include the value 1 and the confidence interval of the intercept does not include the value 0 The OLS slope underestimates the real slope of the bivariate functional relationship which is 1 by construct in this example This illustrates the fact that OLS considered as model I regression method is inadequate to estimate the slope of the functional relationship between these variables As a model II regression method OLS would only be appropriate to predict the values from x point 6 in Table Ip With all the other model II regression methods the confidence intervals of the slopes include the value 1 and the confidence intervals of the intercepts include the value 0 as expected for this data set because of the way the data were generated 5 5 Uncorrelated random variables 5 5 1 Input data Two vectors of 100 random data drawn from a normal distribu tion N 0 1 were generated One expects to find a null correlation with this type of data which were submitted to the model II regression program 5 5 2 Output file MA SMA OLS and RMA equations 95 C I and tests of significance were obtained with the following R commands Fig shows the scatter diagram The various regression lines are presented to allow their comparison gt data mod2ex5 gt Ex5 res lt Imodel2 random_y rand
18. matic difference between ob servations and simulations Mespl et al 1996 4 PIERRE LEGENDRE 8 With all methods the confidence intervals are large when n is small they become smaller as n goes up to about 60 after which they change much more slowly Model II regression should ideally be applied to data sets containing 60 observations or more Some of the examples presented below have fewer observations they are only presented for illustration 2 RANGED MAJOR AXIS REGRESSION Ranged major axis regression RMA is described in Legendre and Legendre 1998 511 512 It is computed as follows 1 Transform the y and x variables into y and 2 respectively whose range is 1 Two formulas are available for ranging depending on the nature of the variables e For variables whose variation is expressed relative to an arbitrary zero interval scale variables e g temperature in C the formula for rang ing is y Yi Ymin or x Ti Tmin 1 Ymax Ymin Lmax Tmin e For variables whose variation is expressed relative to a true zero value ratio scale or relative scale variables e g species abundances or tem perature expressed in K the recommended formula for ranging as sumes a minimum value of 0 eq reduces to u o y 2 Ymax Tmax 2 Compute MA regression between the ranged variables y and 2 Test the significance of the slope estimate by permutation if needed 3 Back transfo
19. nd should have a slope of 1 despite the fact that there is normal error added independently to x and y 5 4 2 Output file The various model II regression methods were applied to this data set with the following results were obtained with the following R commands gt data mod2ex4 gt Ex4 res Imodel2 y x data mod2ex4 interval interval 99 gt Ex4 res Model II regression Call lmodel2 formula y x data mod2ex4 range y interval range x interval nperm 99 n 100 r 0 896898 r square 0 804426 Parametric P values 2 tailed 1 681417e 36 1 tailed 8 407083e 37 12 PIERRE LEGENDRE Angle between the two OLS regression lines 6 218194 degrees Permutation tests of OLS MA RMA slopes 1 tailed tail corresponding to sign A permutation test of r is equivalent to a permutation test of the OLS slope P perm for SMA NA because the SMA slope cannot be tested Regression results Method Intercept Slope Angle degrees P perm 1 tailed 1 OLS 0 7713474 0 8648893 40 85618 0 01 2 MA 0 2905866 00 9602938 43 83962 0 01 3 SMA 0 2703395 0 9643118 43 95915 NA 4 RMA 0 3142417 0 9555996 43 69937 0 01 Confidence intervals Method 2 5 Intercept 97 5 Intercept 2 5 Slope 97 5 Slope 1 OLS 0 2663618 1 2763329 0 7794015 0 950377 2 MA 0 2121756 0 7477724 0 8695676 1 060064 3 SMA 0 1795065 0 6820701 0 8826059 1 053581 4 RMA 0 1848274 0 7702125 0 8651145 1 054637 Eigenvalues 17 56697 0 9534124 H stat
20. oduces unbi ased slope estimates and accurate confidence intervals 1990 MA may also be used with dimensionally heterogeneous variables when the purpose of the analysis is 1 to compare the slopes of the relationships between the same two variables measured under different conditions e g at two or more sampling sites or 2 to test the hypothesis that the major axis does not significantly differ from a value given by hypothesis e g the relationship EF b m where according to the famous equation of Einstein b c being the speed of light in vacuum For bivariate normal data if MA cannot be used because the variables are not expressed in the same physical units or the error variances on the two 2Contrary to the sample variance the error variance on x or y cannot be estimated from the data An estimate can only be made from knowledge of the way the variables were measured MODEL II REGRESSION USER S GUIDE R EDITION 3 axes differ two alternative methods are available to estimate the param eters of the functional linear relationship if it can reasonably be assumed that the error variance on each axis is proportional to the variance of the corresponding variable i e the error variance of y the sample variance of y the error variance of x the sample variance of x This condition is often met with counts e g number of plants or animals or log transformed data 1958 a Ranged major axis regression RMA can be
21. om_x data mod2ex5 interval interval 99 gt Exd5 res Model II regression Call lmodel2 formula random_y random_x data range y interval range x interval nperm mod2ex5 99 MODEL II REGRESSION USER S GUIDE R EDITION 13 FIGURE 5 Scatter diagram of the Example 5 data random numbers showing the major axis MA standard major axis SMA or dinary least squares OLS and ranged major axis RMA OLS re gression lines The correlation co efficient is not significantly differ ent from zero The cross indicates the centroid of the bivariate distri bution The four regression lines pass through this centroid n 100 r 0 0837681 r square 0 007017094 Parametric P values 2 tailed 0 4073269 1 tailed 0 2036634 Angle between the two OLS regression lines 80 41387 degrees Permutation tests of OLS MA RMA slopes 1 tailed tail corresponding to sign A permutation test of r is equivalent to a permutation test of the OLS slope P perm for SMA NA because the SMA slope cannot be tested Confidence interval NA when the limits of the confidence interval cannot be computed This happens when the correlation is 0 or the C I includes all 360 deg of the plane H gt 1 Regression results Method Intercept Slope Angle degrees P perm 1 tailed 1 OLS 0 07074386 0 08010978 4 580171 0 16 2 MA 0 11293376 0 60004584 30 965688 0 16 3 SMA 0 14184407 0 95632810 43
22. plies to the data a female with a mass of 0 is expected to produce no egg Another interesting property of RMA and SMA is that their estimates of slope and intercept change proportionally to changes in the units of measurement One can easily verify that by changing the decimal places in the Example 2 data file and recomputing the regression equations RMA and SMA share this property with OLS MA regression does not have this property this is why it should only be used with variables that are in the same physical units as those of Example 1 5 3 2 Output file MA SMA OLS and RMA equations 95 C I and tests of significance were obtained with the following R commands gt data mod2ex3 gt Ex3 res Imodel2 No_eggs Mass data mod2ex3 relative relative 99 gt Ex3 res Model II regression Call lmodel2 formula No_eggs Mass data mod2ex3 range y relative range x relative nperm 99 n 11 r 0 882318 r square 0 7784851 Parametric P values 2 tailed 0 0003241737 1 tailed 0 0001620869 Angle between the two OLS regression lines 5 534075 degrees Permutation tests of OLS MA RMA slopes 1 tailed tail corresponding to sign A permutation test of r is equivalent to a permutation test of the OLS slope P perm for SMA NA because the SMA slope cannot be tested Regression results Method Intercept Slope Angle degrees P perm 1 tailed 1 OLS 19 766816 1 869955 61 86337 0 01 2 MA 6 656633 2 3
23. rm the estimated slope as well as its confidence interval limits to the original units by multiplying them by the ratio of the ranges Ymin 3 Tmax Tmin bi b Ymax 4 Recompute the intercept bo and its confidence interval limits using the original centroid z y of the scatter of points and the estimates of the slope b and its confidence limits bo y b 7 4 The RMA slope estimator has several desirable properties when the variables x and y are not expressed in the same units or when the error variances on the two axes differ 1 The slope estimator scales proportionally to the units of the two variables the position of the regression line in the scatter of points remains the same irrespective of any linear change of scale of the variables 2 The estimator is sensitive to the covariance of the variables this is not the case for SMA 3 Finally and contrary to SMA it is possible to test the hypothesis that an RMA slope estimate is equal to a stated value in particular 0 or 1 As in MA the test may be done either by permutation or by comparing the confidence interval of the slope to the hypothetical value of interest Thus whenever MA regression cannot be used because of incommensurable units or because the error variances on the two axes differ RMA regression can be used There is no reason however to use RMA when MA is justified Prior to RMA one should check for the presence of outliers using a scatt
24. tation that are equal to the refer ence value of the statistic plus 1 for the reference value itself 10 PIERRE LEGENDRE e GT is the number of values under permutation that are greater than the reference value 5 3 Cabezon spawning 5 3 1 Input data The following table presents data used by Sokal and Rohlf 1995 Box 14 12 to illustrate model II regression analysis They concern the mass x of unspawned females of a California fish the cabezon Scorpaenichthys marmoratus and the number of eggs they subsequently produced y One may be interested to estimate the functional equation relating the number of eggs to the mass of females before spawning The physical units of the variables are as in the table published by Sokal and Rohlf 1905 546 Since the variables are in different physical units and are estimated with er ror and their distribution is approximately bivariate normal RMA and SMA are appropriate for this example MA is inappropriate The OLS regression line is meaningless in model II regression OLS should only be used for forecasting or prediction It is plotted in Fig 4 only to allow comparison The RMA and SMA regression lines are nearly indistinguishable in this example The slope of RMA can be tested for significance H0 bea 0 however whereas the SMA slope cannot The 95 confidence intervals of the intercepts of RMA and SMA although underestimated include the value 0 as expected if a linear model ap
25. the intercepts are underestimated 3 the predators eagle rays may not be attracted to sampling locations containing few prey bivalves gt data mod2ex2 gt Ex2 res Imodel2 Prey Predators data mod2ex2 relative relative 99 gt Ex2 res Model II regression Call lmodel2 formula Prey Predators data mod2ex2 range y relative range x relative nperm 99 n 20 r 0 8600787 r square 0 7397354 Parametric P values 2 tailed 1 161748e 06 1 tailed 5 808741e 07 Angle between the two OLS regression lines 5 106227 degrees Permutation tests of OLS MA RMA slopes i tailed tail corresponding to sign A permutation test of r is equivalent to a permutation test of the OLS slope MODEL II REGRESSION USER S GUIDE R EDITION 9 Bivalves prey Eagle rays predators FIGURE 3 Scatter diagram of the Example 2 data number of bi valves as a function of the num ber of eagle rays showing the major axis MA standard ma jor axis SMA ordinary 50 least squares OLS and ranged major axis RMA regression lines SMA and RMA are the appropriate re gression lines in this example The cross indicates the centroid of the bivariate distribution The four regression lines pass through this centroid P perm for SMA NA because the SMA slope cannot be tested Regression results Method Intercept OLS 20 02675 MA 13 05968 SMA 16 45205 RMA 17 25651 PWN PE Slope Angle
26. used The method is described below Prior to RMA one should check for the presence of outliers using a scatter diagram of the objects b Standard major axis regression SMA can be used One should first test the significance of the correlation coefficient r to determine if the hypothesis of a relationship is supported No SMA regression equation should be computed when this condition is not met This remains a less than ideal solution since SMA slope estimates can not be tested for significance Confidence intervals should also be used with caution simulations have shown that as the slope departs from 1 the SMA slope estimate is increasingly biased and the confidence interval includes the true value less and less often Even when the slope is near 1 e g example 45 5 the confidence interval is too narrow if n is very small or if the correlation is weak If the distribution is not bivariate normal and the data cannot be trans formed to satisfy that condition e g if the distribution possesses two or several modes one should wonder whether the slope of a regression line is really an adequate model to describe the functional relationship between the two variables Since the distribution is not bivariate normal there seems little reason to apply models such as MA SMA or RMA which primarily describe the first principal component of a bivariate normal distribution So 1 if the relationship is linear OLS is recommended to estimate t

MODEL II REGRESSION USER'S GUIDE, R EDITION

Contents

Download Pdf Manuals

Related Search

Related Contents