garcia-berthou

download garcia-berthou

of 4

Transcript of garcia-berthou

  • 8/7/2019 garcia-berthou

    1/4

    Journal of Animal

    Ecology 200170, 708 711

    2001 BritishEcological Society

    Blackwell Science, LtdOxford, UKJAEJournal of Animal Ecology0021-8790British Ecological Society, 20017042001524Misuse of regression residualsE. Garca-Berthou

    FORUM000000Graphicraft Limited, Hong Kong

    On the misuse of residuals in ecology: testing regression

    residuals vs. the analysis of covariance

    EMILI GARCA-BERTHOUDepartament de Cincies Ambientals & Institut dEcologia Aqutica, Universitat de Girona, E-17071 Girona,

    Catalonia, Spain

    Summary

    1. An analysis of variance () or other linear models of the residuals of a simplelinear regression is being increasingly used in ecology to compare two or more groups.Such a procedure (hereafter, residual index) was used in 8% and 2% of the paperspublished during 1999 in the Journal of Animal Ecology and in Ecology, respectively, andhas been recently recommended for studying condition.2. Although the residual index is similar to an analysis of covariance (), it is notidentical and is incorrect for at least four reasons:(i) the regression coefficient used by the residual index differs from the one used in

    and is not the least-squares estimator of the model.(ii) in contrast to the , the error d.f. in the residual index are overestimatedbecause of the estimation of the regression coefficient.(iii) the residual index also assumes the homogeneity of regression coefficients (par-allelism assumption), which should be tested with a special design.(iv) even if the assumptions of the linear model hold for the original variables, they willnot hold for the residuals.3. More importantly, the residual index is an ad hoc sequential procedure with nostatistical justification, unlike the well-known . For these reasons, I suggest thata t-test or an of the residuals should never be used in place of an to studycondition or any other variable.

    Key-words: analysis of variance,, condition factor, regression analysis, statisticalanalysis.

    Journal of Animal Ecology (2001) 70, 708 711

    Introduction

    In ecology and many other disciplines, we often aim tocompare a response variable among several treatmentsor groups after removing the effect of a third, concom-itant variable. An example is the comparison of bodycondition among several animal populations or groups.

    Body condition is often measured as the weight (mass)relative to the length of the animal. Weight and lengthare strongly correlated and the concept of body condi-tion emerges as the plumpness, fatness, or wellbeing ofthe animal after partiall ing out that correlation, i.e. thesize effect. Similar questions in ecology are the com-parisons of richnessarea relationships among regions,

    of root: shoot ratios of plants among several treat-ments, and of C : N ratios among sites. These types ofquestion were dealt with in about 25% (27 of 99 and 44 of227) of ecological papers (Table 1). A classical difficultyencountered in these and many other contexts is how toremove the size effect or concomitant variable. Threemain statistical procedures have been applied to this end.

    The most primitive solution to remove the size effectis to compute a ratio by dividing the response variable(Y) by the concomitant variable (X) or some power ofit (see comprehensive review in Albrecht, Gelvin &Hartman 1993). Among the simplest forms are a simpleratio (Y/X) (Albrecht et al. 1993) or Fultons conditionfactor (Y/X3) in the fish literature (Garca-Berthou &Moreno-Amich 1993). These ratios are then analysedwith general linear models, e.g. comparing differentgroups with a t-test or . Although this proced-ure has been criticized unanimously (see e.g. Atchley,Gaskins & Anderson 1976; Packard & Boardman 1988;

    Correspondence: Dr E. Garca-Berthou, Departament de CinciesAmbientals & Institut dEcologia Aqutica, Universitat deGirona, E-17071 Girona, Catalonia, Spain. Tel: + 34 972418 467.Fax: + 34 972418 150. E-mail: [email protected]

  • 8/7/2019 garcia-berthou

    2/4

    709

    Misuse of

    regression residuals

    2001 BritishEcological Society,Journal of Animal

    Ecology, 70,708711

    Jackson, Harvey & Somers 1990; Albrechtet al. 1993),it is still sometimes used in ecology (Table 1).

    The analysis of covariance () has been pointedto repeatedly as a superior alternative to conditionfactors (see, e.g. Le Cren 1951; Garca-Berthou &

    Moreno-Amich 1993) and similar ratios (Packard &Boardman 1988; Jackson et al. 1990). is usedwidely in ecology, appearing in 22% and 12% of thepapers published during 1999 in the Journal of AnimalEcology and in Ecology, respectively (Table 1).

    A third procedure has appeared recently in the eco-logical literature. This procedure (hereafter residualindex) was recommended recently by Jakob, Marshall& Uetz (1996) for studying condition. The residualindex is computed for each individual (experimentalor sampling unit) as the residual from the simple linearregression of volume or mass (appropriately transformed)

    on the length variable (see below for the formula andFig. 1 for an illustration). These residuals are thencompared among groups with a t-test or an analysis ofvariance (). This procedure was used in 8% and22% of the 1999 papers of the Journal of AnimalEcology and Ecology, respectively, particularly for thestudy of condition (Table 1). The objectives of this paperare: (1) to point out several pitfalls of the residual index,and (2) to emphasize the use of as the correctalternative.

    The residual index vs. the ANCOVA

    The procedure of the residual index is conceptuallysimilar to the standard design of the one-factor. The model of this design is:

    Yij=+i+within (Xij Xi) + ij,

    where Yij are the values of the response (dependent)variable for the jth subject of the ith group, Xij is thecovariate (concomitant variable), is the grand mean(of the variable), i is the Y-intercept (or treatmenteffect) for the ith group, within is the slope, Xi is thecovariate mean for the ith group, and ij is the random

    deviation (see, e.g. Sokal & Rohlf 1995: 503). As sug-gested by Porter & Raudenbush (1987) and Winer, Brown& Michels (1991), this model can be rewritten as:

    Yijwithin (XijXi) =i+ij.

    The term on the right of this expression is similar tothat of a one-factor model (Yij= +i+ij),whereas the term on the left is similar to the residualindex (see, e.g. Jakob et al. 1996):

    Table 1. Number of articles (and first page of the articles, between parentheses; = not given) using certain statistical tests tostudy body condition or other variables in volume 68 of the Journal of Animal Ecology (1999) and volume 80 of Ecology (1999).Total number of papers was 99 and 227, respectively (excluding 4 Forum and 3 Comments papers, respectively, in which nostatistical analyses were given)

    Journal of Animal Ecology Ecology

    Statistical method used Condition Other variables Condition Other variables

    Statistical test of a ratio 0 1 (698) 1 (2793) 12 () 2 (571, 753) 20 () 2 (989, 1289) 25 ()

    Statistical test of regression residuals 4 (73, 571, 595, 753) 4 (205, 726, 815, 1010) 2 (989, 1267) 3 (735, 806, 1522)Any of the above 4 24 4 40

    Papers studying the body condition (i.e. mass relative to length) of animals. Of the seven papers studying condition and not using, five used and two used simple linear regression. Including multiple linear regression with dummy variables (seetext). It is not the sum of papers because more than one method was used in a few papers. Combining papers studying conditionand other variables, the figures for any of the above are 27 for the Journal of Animal Ecology and 44 for Ecology.

    Fig. 1. Hypothetical example of a study of condition withheterogeneity of slopes. Top: weightlength data. The lengthvalues are 2, 3, 4, and 5 and the weight values are 2, 3, 4, and5 for the control and 1, 3, 5 and 8 for the treatment group. Thevariables are supposed to have been log-transformed to fit theconventional weightlength relationship (Y= aXb). Simplelinear regressions for each group are shown. Bottom: theresidual index plotted with length. The residual index iscomputed as the deviation of the weight datum from theoverall regression line (pooling the two groups). +, Treatment;

    , control.

  • 8/7/2019 garcia-berthou

    3/4

    710

    E. Garca-Berthou

    2001 BritishEcological Society,Journal of Animal

    Ecology, 70,708711

    residual index =YijabXij=YijYb(XijX),

    since a = YbX, where a and b are the intercept andregression coefficient, respectively, of the simple linearregression. Although the is thus quite similarto an of the residuals or to an of theadjusted values (Winer et al. 1991: 747), it is not iden-tical and yields a different statistical result for a numberof reasons (see Maxwell, Delaney & Manheimer 1985for a comprehensive discussion). First, standard uses a pooled regression coefficient within groups (bwithin) for adjusting the data (Sokal & Rohlf 1995:507). This b within is a weighted average of the regres-sion coefficients for each group (Keppel 1991: 313) andgenerally differs from the conventional regression coef-ficient found on pooling the data, i.e. ignoring thegroups, termed overall b by Winer et al. (1991: 757).This overall b is the coefficient used by the residualindex. The b within is the least-squares estimator inthemodel, whereas the overall b is the estimatorin the linear regression model (ignoring the groups) butshould not be used if the objective is to compare the

    groups (Maxwell et al. 1985; Winer et al. 1991).Another important difference is that in the standard

    the error degrees of freedom (d.f.) are N-k-1,where Nis the total sample size and kis the number ofgroups (Sokal & Rohlf 1995: 509). In contrast, in a t-testor of the residual index the error d.f. are N-kalthough they should be reduced by one, as was per-formed in a similar procedure by Packard & Boardman(1988), due to the fact that the slope is estimated fromthe same data (Maxwell et al. 1985; Keppel 1991: 312;Winer et al. 1991: 749). The different d.f. affect the errormean square and hence the Fand P-values. Without

    considering the different regression coefficients, an over-estimated error d.f. implies an underestimated errormean square and hence a higher Type I error rate.

    Some numerical examples will illustrate these differ-ences. For instance, Sokal & Rohlf (1995: 512) describethoroughly an example of the standard design ofin which the result for the factor (among groups) isF3,16= 19598, P < 00005. An of the residualindex for the same data set (without correcting thedegrees of freedom) would result in F3,17= 396, P