
The Analysis of Variance

Tokelo Khalema, University of the Free State

Bloemfontein

November 02, 2012


CHAPTER 1

ANALYSIS OF VARIANCE

1.1 Introduction

The analysis of variance (commonly abbreviated as ANOVA or AOV) is a method of investigating the variability of means between subsamples resulting from some experiment. In its most basic form it is a multi-sample generalization of the t-test, but more complex ANOVA models depart greatly from the two-sample t-test. Analysis of variance was first introduced in the context of agricultural investigations by Sir Ronald A. Fisher (1890–1962), but is now commonly used in almost all areas of scientific research.

1.2 One-way Classification

1.2.1 Normal Theory

Suppose we carry out an experiment on $N$ homogeneous experimental units and observe the measurements $y_1, y_2, \ldots, y_N$. Suppose also that of the $N$ observations, $J$ were randomly selected to be taken under the same experimental conditions, and that overall we had $I$ different experimental conditions. We shall refer to these experimental conditions as treatments. The treatments could be any categorical or quantitative variable: species, racial group, level of caloric intake, dietary regime, blood group, genotype, etc. We therefore see that we could subdivide the $N$ variates into $I$ groups (or treatments), under each of which there are $J$ observations. An experimental design in which the number of observations per treatment is the same is termed a balanced design. A design need not be balanced. Denote the $j$th observation under the $i$th treatment by $y_{ij}$, where $i = 1, \ldots, I$ and $j = 1, \ldots, J$. Further, assume that $Y_{ij} \sim \text{iid } N(\mu_i, \sigma^2)$ for all $i$ and $j$.

It might be helpful to visualize the experimental points as forming an array whose $i$th column represents the $i$th treatment and whose $j$th row represents the $j$th observation under each treatment. Ordinarily, measurements taken on several homogeneous experimental units under the same experimental conditions will differ slightly due to unexplained measurement errors. We assume these measurement errors to be independent and normally distributed with mean zero and constant but unknown variance $\sigma^2 < \infty$; that is, $\varepsilon_{ij} \sim \text{iid } N(0, \sigma^2)$ for all $i$ and $j$. The assumption of zero mean is natural rather than arbitrary because, on average, deviations from the mean in any population average out to zero. In the analysis of variance we are interested in the overall variability of the $\mu_i$ about the grand population mean $\mu$. This implies a fixed differential effect $\alpha_i = \mu_i - \mu$ (a deviation from the grand mean) due to treatment $i$. The above arguments and assumptions lead us to the following linear model,

\[ Y_{ij} = \mu + \alpha_i + \varepsilon_{ij} = \mu_i + \varepsilon_{ij} \qquad (1.1) \]

for $i = 1, \ldots, I$ and $j = 1, \ldots, J$, which describes the underlying data-generating process. It is easy to show that if $\alpha_i$, as in equation 1.1, is to be interpreted as the differential effect of the $i$th treatment, then we have the following constraint,

\[ \sum_{i=1}^{I} \alpha_i = 0. \qquad (1.2) \]

The constraint in equation 1.2 above is termed a model identification condition. Without it the model we just formulated is said to be unidentifiable.¹ Different interpretations of the $\alpha_i$ lead to different constraints and different model parametrizations. In the sequel we stick to the parametrization above. Equation 1.1 is usually referred to as the one-way fixed effects model, or Model I: one-way because the data are classified according to one factor, viz. treatment, and the term "fixed" arises from the fact that we have assumed the $\alpha_i$ to be fixed instead of random, in which case we would have had a random effects model, or Model II. Later we introduce Model II and demonstrate how it can be used in practice.
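To make the data-generating process in equation 1.1 concrete, the short R sketch below simulates a balanced Model I layout; the grand mean, treatment effects, and error standard deviation are hypothetical values chosen purely for illustration.

> # Simulated balanced one-way layout: I = 3 treatments, J = 25 replicates each
> set.seed(1)                          # reproducibility
> I <- 3; J <- 25
> mu <- 30                             # hypothetical grand mean
> alpha <- c(-5, -2, 7)                # hypothetical treatment effects (sum to zero)
> line <- factor(rep(1:I, each = J))   # treatment labels
> y <- mu + rep(alpha, each = J) + rnorm(I * J, mean = 0, sd = 5)
> tapply(y, line, mean)                # sample means scatter about mu + alpha_i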

The null hypothesis in the analysis of variance model given in equation 1.1 is that the treatment means are all equal; the alternative is that at least one pair of means differs. That is,

\[ H_0: \mu_i = \mu_j \ \ \forall\, i \neq j \qquad (1.3) \]
\[ H_A: \mu_k \neq \mu_l \ \text{for at least one pair } k \neq l. \]

But since $\mu_i = \mu + \alpha_i$, we see that $\mu_i = \mu_j$ for all $i \neq j$ implies that $\alpha_i = 0$ for all $i$. This gives an equivalent form of the hypotheses above, namely

\[ H_0: \alpha_i = 0 \ \ \forall\, i \in \{1, \ldots, I\} \qquad (1.4) \]
\[ H_A: \alpha_i \neq 0 \ \text{for at least one } i \in \{1, \ldots, I\}. \]

¹Identifiability is a desirable property of models. A model is called identifiable if all its parameters can be uniquely estimated and inferences can be drawn from it.


This formulation is more commonly met with and is arguably more intuitive: in words, the null hypothesis says that there are no differential effects due to treatments, or simply that there are no treatment effects, so that any apparent differences in sample means are attributable not to the treatments but to random selection. The alternative hypothesis says that there is at least one treatment with a differential effect, the negation of the null hypothesis.

Before we present the mathematical derivations of the analysis of variance, let us consider a practical example of an experiment which should be recognized as a numerical instance of the general design outlined above. The example is taken from the classic reference by Sokal and Rohlf (1968) [1]. Sokal tested 25 females from each of three lines of Drosophila for significant differences in fecundity among the three lines. The first of these lines was selected for resistance against DDT, the second for susceptibility to DDT, and the third was a nonselected control strain. This is a balanced design with $I = 3$ treatments and $J = 25$ observations per treatment, and it should also be recognized as Model I since the exact nature of the treatments was determined by the experimenter. The data are summarized in table 1.1, in which the response is the number of eggs laid per female per day for the first 14 days of life.

We might want to compute the treatment sample means as a preliminary check on the heterogeneity among group means. Dataset drosophila contains the data presented in table 1.1. In R we issue the following commands:

> library(khalema)

> data(drosophila)

> attach(drosophila)

> tapply(fecundity,line,mean)

1 2 3

25.256 23.628 33.372

The first three commands should be old news by now. The first loads package khalema, the second accesses dataset drosophila, and the third makes the variables in drosophila available on the search path. The final command computes the sample mean under each of the 3 treatments. Note that the mean under the nonselected treatment is appreciably higher than those under the other treatments. Of interest in the analysis of variance is whether this difference is statistically significant or just a result of noise in the data.

In deriving a test to investigate the significance of group sample mean differences we will need some statistics and their corresponding sampling distributions. Among these are the overall average and the average under the $i$th treatment, denoted

\[ \bar{Y}_{..} = \sum_{i=1}^{I} \sum_{j=1}^{J} Y_{ij}/N \]


Resistant   Susceptible   Nonselected
12.8        38.4          35.4
21.6        32.9          27.4
14.8        48.5          19.3
23.1        20.9          41.8
34.6        11.6          20.3
19.7        22.3          37.6
22.6        30.2          36.9
29.6        33.4          37.3
16.4        26.7          28.2
20.3        39.0          23.4
29.3        12.8          33.7
14.9        14.6          29.2
27.3        12.2          41.7
22.4        23.1          22.6
27.5        29.4          40.4
20.3        16.0          34.4
38.7        20.1          30.4
26.4        23.3          14.9
23.7        22.9          51.8
26.1        22.5          33.8
29.5        15.1          37.9
38.6        31.0          29.5
44.4        16.9          42.4
23.2        16.1          36.6
23.6        10.8          47.4

Table 1.1: Number of eggs laid per female per day for the first 14 days of life.

and

\[ \bar{Y}_{i.} = \sum_{j=1}^{J} Y_{ij}/J, \]

respectively. Recall that $N = IJ$ is the total number of observations. We define the following statistic, which should be interpreted as summarizing the total variability in the sample,

\[ SST = \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{..})^2. \]

This is called the total sum of squares. But the total variability in a sample can be partitioned into variability within treatments and variability between treatments. In fact, it can easily be shown that

\[ SST = SSB + SSW \qquad (1.5) \]


where

\[ SSB = J \sum_{i=1}^{I} (\bar{Y}_{i.} - \bar{Y}_{..})^2 \]

and

\[ SSW = \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2 \]

denote the sum of squares between treatments and the sum of squares within treatments, respectively. The statistic $SSB$ summarizes variation in the sample attributable to treatment; $SSW$ summarizes variation attributable to error and is sometimes written $SSE$. Note that under the assumption of homoscedastic variance, each of the $I$ terms

\[ \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2/(J-1) \]

furnishes an estimate of the error variance $\sigma^2$. It is thus reasonable to estimate $\sigma^2$ by pooling these terms together to obtain the pooled estimate of the common variance,

\[ s_p^2 = \frac{1}{I(J-1)} \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2 = \frac{SSW}{I(J-1)}. \]
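A quick numerical check of the decomposition in equation 1.5 and of the pooled estimate $s_p^2$ can be done in R on the Drosophila data; the sketch below assumes the drosophila data frame from the book's khalema package has been loaded and attached as above.

> grand <- mean(fecundity)                    # overall average
> group <- tapply(fecundity, line, mean)      # treatment averages
> SST <- sum((fecundity - grand)^2)
> SSB <- 25 * sum((group - grand)^2)          # J = 25 replicates per line
> SSW <- sum((fecundity - group[line])^2)     # deviations from own group mean
> all.equal(SST, SSB + SSW)                   # the identity holds
> SSW / (3 * (25 - 1))                        # pooled variance estimate s_p^2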

The reader will recall that if $Y_i \sim \text{iid } N(\mu, \sigma^2)$ for $i = 1, \ldots, n$, then

\[ (n-1)S^2/\sigma^2 \sim \chi^2_{n-1}, \qquad (1.6) \]

where

\[ S^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2/(n-1) \]

denotes the sample variance and

\[ \bar{Y} = \sum_{i=1}^{n} Y_i/n \]

the sample mean. This now familiar result will be an important template in proving the following theorem.


Theorem 1. Under the assumption that the random errors $\varepsilon_{ij} \sim \text{iid } N(0, \sigma^2)$, for $i = 1, \ldots, I$ and $j = 1, \ldots, J$, we have the following results:

1. $SST/\sigma^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{..})^2/\sigma^2 \sim \chi^2_{N-1}$, if $H_0: \alpha_i = 0 \ \forall i$ is true,

2. $SSW/\sigma^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2/\sigma^2 \sim \chi^2_{I(J-1)}$, whether or not $H_0$ is true,

3. $SSB/\sigma^2 = J \sum_{i=1}^{I} (\bar{Y}_{i.} - \bar{Y}_{..})^2/\sigma^2 \sim \chi^2_{I-1}$, if $H_0: \alpha_i = 0 \ \forall i$ is true, and

4. $SSW/\sigma^2$ and $SSB/\sigma^2$ are independently distributed.

Proof. To prove the first part of the theorem we note that if $H_0$ is true, then we have a common mean $\mu$ under each treatment and thus $Y_{ij} \sim \text{iid } N(\mu, \sigma^2)$ for $i = 1, \ldots, I$ and $j = 1, \ldots, J$. Accordingly,

\[ \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{..})^2/(N-1) \]

denotes the sample variance of a sample of size $N = IJ$ from a $N(\mu, \sigma^2)$ population, hence using the result given in equation 1.6 concludes the proof.

For the second part we note that

\[ \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2/(J-1) \]

denotes the sample variance of the $i$th treatment, hence, whether or not $H_0$ is true,

\[ \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2/\sigma^2 \sim \chi^2_{J-1} \quad \text{independently for all } i = 1, \ldots, I. \]

Summing all $I$ of these terms and using the additivity of independent chi-square random variables yields the stated result.

Further, if $H_0$ is true, the third part results from the subtraction property of the chi-square distribution. Lastly, to prove the independence of...

In addition to the statistics we have defined thus far, it is customary to define the mean square due to treatment and the mean square due to error as

\[ MSB = SSB/(I-1) \quad \text{and} \quad MSW = SSW/\{I(J-1)\}, \]


respectively. We are now in a position to derive a test for the hypotheses

\[ H_0: \alpha_i = 0 \ \ \forall\, i \in \{1, \ldots, I\} \]

versus

\[ H_A: \alpha_i \neq 0 \ \text{for at least one } i \in \{1, \ldots, I\}. \]

In the following theorem we use the statistics defined above and their sampling distributions to derive the generalized likelihood ratio test of $H_0$ against $H_A$.

Theorem 2. The generalized likelihood ratio test statistic for testing the null hypothesis of no treatment effects, as in equation 1.4, is given by

\[ F = \frac{MSB}{MSW}, \]

and $H_0$ is rejected at significance level $\alpha$ if $F > F^{1-\alpha}_{I-1,\,I(J-1)}$.

Proof. Recall from our earlier discussion that in addition to some distributional assumptions we assumed the following:

\[ Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \]

where the restriction

\[ \sum_{i=1}^{I} \alpha_i = 0 \]

is imposed on the $\alpha_i$. It follows then that, for $i = 1, \ldots, I$ and $j = 1, \ldots, J$,

\[ f(y_{ij}) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{y_{ij} - \mu - \alpha_i}{\sigma} \right)^2 \right\}. \]

From independence of the $y_{ij}$ we have the likelihood

\[ L(\mu, \alpha_i, \sigma^2 \mid y) = (2\pi\sigma^2)^{-IJ/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \mu - \alpha_i)^2 \right\} \qquad (1.7) \]

and log-likelihood

\[ l = \log L = -\frac{IJ}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \mu - \alpha_i)^2. \]

Under the alternative hypothesis we have the parameter space

\[ \Omega = \{ (\mu, \alpha_i, \sigma^2) \mid -\infty < \mu, \alpha_i < \infty, \ \sigma^2 > 0 \}. \]


Differentiating the log-likelihood with respect to $\mu$ and equating the derivative to zero gives

\[ \frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \mu - \alpha_i) = 0, \]

which implies that

\[ \hat{\mu}_{\Omega} = \bar{Y}_{..}. \]

Once again we differentiate, this time with respect to $\alpha_i$, to obtain

\[ \frac{\partial l}{\partial \alpha_i} = \frac{1}{\sigma^2} \sum_{j=1}^{J} (Y_{ij} - \mu - \alpha_i) = 0. \]

This yields

\[ \hat{\alpha}_{i\Omega} = \bar{Y}_{i.} - \bar{Y}_{..}. \]

Finally we differentiate with respect to $\sigma^2$ and proceed just as we did above. We have

\[ \frac{\partial l}{\partial \sigma^2} = -\frac{IJ}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \mu - \alpha_i)^2 = 0, \]

which gives the following MLE,

\[ \hat{\sigma}^2_{\Omega} = N^{-1} \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2. \]

Substituting these estimates into equation 1.7 we have the following likelihood supremum under $H_A$,

\[ \sup_{\Omega} L(\mu, \alpha_i, \sigma^2 \mid y) = \exp\left\{-\frac{IJ}{2}\right\} \left\{ \frac{2\pi}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2 \right\}^{-IJ/2}. \]

Under the null hypothesis there are fewer parameters since the $\alpha_i$ are hypothesised to be zero. The parameter space is

\[ \omega = \{ (\mu, \sigma^2) \mid -\infty < \mu < \infty, \ \sigma^2 > 0 \}. \]

In this case we maximize the log-likelihood

\[ l = \log L = -\frac{IJ}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \mu)^2. \]

It is left to the reader to show that the parameter estimates in this case are

\[ \hat{\mu}_{\omega} = \bar{Y}_{..} \]


and

\[ \hat{\sigma}^2_{\omega} = N^{-1} \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{..})^2. \]

The likelihood supremum is then given by

\[ \sup_{\omega} L(\mu, \sigma^2 \mid y) = \exp\left\{-\frac{IJ}{2}\right\} \left\{ \frac{2\pi}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{..})^2 \right\}^{-IJ/2}. \]

After some cancellation and use of the identity we established earlier, the generalized likelihood ratio test statistic takes the form

\[ \Lambda = \frac{\sup_{\omega} L}{\sup_{\Omega} L} = \left\{ \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{..})^2}{\sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2} \right\}^{-N/2} = \left\{ \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2 + J \sum_{i=1}^{I} (\bar{Y}_{i.} - \bar{Y}_{..})^2}{\sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij} - \bar{Y}_{i.})^2} \right\}^{-N/2}. \]

The generalized likelihood ratio test rejects $H_0$ for small values of $\Lambda$, and we see that small values of $\Lambda$ correspond to large values of $SSB/SSW$. That is, we reject $H_0$ if

\[ \frac{SSB}{SSW} > k, \]

or if

\[ F = \frac{SSB/(I-1)}{SSW/\{I(J-1)\}} = \frac{MSB}{MSW} > k\,\frac{I(J-1)}{I-1} = c, \]

where $c$ is chosen such that $\Pr(F > c \mid H_0) = \alpha$, the desired type I error probability. But we have already derived the null distribution of $F$, from which we have

\[ c = F^{1-\alpha}_{I-1,\,I(J-1)}, \]

the $100(1-\alpha)$th percentile of the $F$-distribution with $I-1$ and $I(J-1)$ degrees of freedom. This completes the proof.

The reader who closely followed the foregoing proof will have noticed that the likelihood ratio test statistic could not have been arrived at had the identification condition not been taken into account. We see then that inferences


cannot be drawn from an unidentifiable model. In fact, this is what unidentifiable means in the statistical literature. Casella and Berger (1992) [2] touch lightly on model identification. For obvious reasons, the test just derived is called the F-test. We now proceed to demonstrate how it can be applied in practice.

Example 1. Consider the data presented earlier in table 1.1. It is vital to test for any significant violations of model assumptions before we draw inferences. First let us check the validity of the constant variance assumption. Figure 1.1 affords a visual check on the group variances. There is not much reason to believe that the constant variance assumption is unduly flawed. The distributions also look reasonably symmetrical, hence normal theory can be applied safely.

Figure 1.1: Side-by-side boxplots for the Drosophila fecundity data.

We proceed with the analysis and calculate the sums of squares, mean squares, and the F-statistic. In R the command to fit the linear model is:

> lm(fecundity~line,drosophila)

And the command,

> anova(lm(fecundity~line,drosophila))


Source of variation   df     SS       MS       F        p-value
Between                2    1362.2    681.11   8.6657   0.0004
Within                72    5659.0     78.60
Total                 74    7021.2

Table 1.2: Anova table for the Drosophila fecundity data.

gives the anova table. An anova table compactly summarizes the results of an F-test.

From the table above, the F-statistic is significant at the 5% level. Suppose the p-value were not reported, as would be the case if one were not using a computer. Then we would refer to the F table in the appendix, approximate $F_{2,72}(.95)$ by $F_{2,62}(.95)$, and report

\[ \text{p-value} = \Pr(F \geq 8.6657) < \Pr(F \geq 3.15) = 5\%. \]
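With a computer neither the table lookup nor the approximation is necessary; the exact critical value and p-value come straight from the F distribution, as in this short sketch:

> qf(0.95, df1 = 2, df2 = 72)                        # 5% critical value, about 3.12
> pf(8.6657, df1 = 2, df2 = 72, lower.tail = FALSE)  # exact p-value, about 0.0004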

But before we jump to conclusions we test the validity of the distributional assumption on the random errors. To estimate these, we plug the MLEs of $\mu$ and $\alpha_i$ into equation 1.1 to obtain

\[ \hat{\varepsilon}_{ij} = Y_{ij} - \bar{Y}_{..} - (\bar{Y}_{i.} - \bar{Y}_{..}) = Y_{ij} - \bar{Y}_{i.} \]

for $i = 1, \ldots, I$ and $j = 1, \ldots, J$. These are termed model residuals. By virtue of the invariance property of maximum likelihood estimates, $\hat{\varepsilon}_{ij}$ furnishes a maximum likelihood estimate of $\varepsilon_{ij}$. We are interested in testing whether these residuals can be considered Gaussian white noise. But recall that maximum likelihood estimates are asymptotically normal. To obtain the residuals in R we issue the command below:

> Residuals <- lm(fecundity~line,drosophila)$residuals

but this is only one of several ways to obtain model residuals in R. A look at figure 1.2 shows that the residuals are not far from normal. In particular, the histogram shows a sense of symmetry about zero. Hence we can safely read the anova table and conclude that the F-test conclusively rejects the null hypothesis of no treatment effects. In ordinary parlance this means that of the $I = 3$ lines, at least one was much more or much less fecund than the rest. Figure 1.1 reveals that the nonselected line was considerably more fecund than the resistant and susceptible lines.

At this point we find it worthwhile to interpolate some comments on the assumptions underlying the analysis of variance, which should be borne in mind each time an analysis of variance is carried out. We assume that in the model given in equation 1.1 we have

1. normally distributed random errors $\varepsilon_{ij}$,


Figure 1.2: A histogram and a normal quantile-quantile plot of the model residuals.

2. constant (or homoscedastic) error variance $\sigma^2$, and

3. independent random errors.

The assumption of normality is not a particularly stringent one. The F-test has been shown to be robust against mild to moderate departures from normality, especially if the distribution is not saliently skewed. Several good tests of normality exist in the literature. The Shapiro-Wilk test is one of those most commonly used in practice. Its R directive is shapiro.test() and its null hypothesis is that the sample comes from a normal parent distribution. Applying this test to the residuals from our previous example we obtain a p-value of 0.45, so the Shapiro-Wilk test does not reject the hypothesis of normally distributed random errors. You will recall from example 1 that we were quite content with the validity of the normality assumption from the qq-plot and the histogram created therein. In the examples to follow we shall stick to the same diagnostic procedure, in the hope that any undue departures from normality will be noticed by the naked eye, and not bother with carrying out a formal normality test.
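Applied to the residuals extracted in example 1 (assuming the Residuals vector from the earlier command is still in the workspace), the test is a one-liner:

> shapiro.test(Residuals)   # p-value about 0.45; normality is not rejected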

The problem of heteroscedasticity (or nonconstant variance) has slightly different implications depending on whether or not a design is balanced. In the former case, slightly lower p-values than the actual ones will be reported; in the latter, higher or lower p-values than the actual ones will be reported according as large $\sigma^2_i$ are associated with large $n_i$, or large $\sigma^2_i$ are associated with small $n_i$ (see Miller (1997) [3], pp. 89-91).

While there will usually be remedies for non-normality and heteroscedastic variance, dependence of the errors will usually not be amenable to any alternative method available to the investigator, at least if it is in the form of serial correlation. Dependence due to blocking, on the other hand, can easily be handled by adding an extra parameter to the model to represent the presence of blocking. We will see later how blocking can purposely be introduced to optimize an experimental plan. It has been shown (see...) that if there is serial correlation within (rather than across) samples, then the significance level of the F-test will be smaller or larger than desired according as the correlation is negative or positive. The presence of serial correlation at lag 1 can be detected by visually inspecting plots of the variate pairs $(y_{ij}, y_{i,j+1})$, as in the sketch below. The hope is not to spot any apparent linear relationship between the lagged pairs if the F-test is to be employed.

Outliers can also be a nuisance in applying the F-test. Since the sample mean and variance are not robust against outliers, such outlying observations can greatly inflate the within-group mean square, which in turn renders the F-test conservative². Usually no transformation will remedy outlying observations. One option for dealing with outliers is to use a trimmed mean in the calculation of the sums of squares. Another is the use of nonparametric methods. We discuss nonparametric methods in section 1.2.3.

Usually, for a design to yield observations that have all three of the characteristics enumerated above, the experimenter should ensure random allocation of treatments. That is, experimental units must be allocated at random to the treatments. Randomization is critical in all of experimental design. It also makes possible the calculation of unbiased estimates of the treatment effects.

One important concept that has thus far received only brief mention is that of unbalanced designs. If instead of the same number $J$ of replicates under each treatment we suppose that we have $n_i$ observations under treatment $i$, where the $n_i$ need not be equal, then it can easily be shown that the identity in equation 1.5 becomes

\[ \sum_{i=1}^{I} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{..})^2 = \sum_{i=1}^{I} n_i (\bar{Y}_{i.} - \bar{Y}_{..})^2 + \sum_{i=1}^{I} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i.})^2. \]

Otherwise the analysis remains the same as in the balanced design, and an analogous F-test can be derived. The next example, adapted from Snedecor & Cochran (1980) [4], illustrates points made in the last few paragraphs, including the possibility of an unbalanced design.

Example 2. For five regions of the United States in 1977, public school expenditures per pupil per state were recorded. The data are shown in table 1.3.

²A conservative test is "reluctant" to reject, i.e. it has a smaller type I error probability than desired.


Northeast   Southeast   South Central   North Central   Mountain Pacific
1.33        1.66        1.16            1.74            1.76
1.26        1.37        1.07            1.78            1.75
2.33        1.21        1.25            1.39            1.60
2.10        1.21        1.11            1.28            1.69
1.44        1.19        1.15            1.88            1.42
1.55        1.48        1.15            1.27            1.60
1.89        1.19        1.16            1.67            1.56
1.88                    1.26            1.40            1.24
1.86                    1.30            1.51            1.45
1.99                                    1.74            1.35
                                        1.53            1.16

Table 1.3: Public school expenditures per pupil per state (in $1 000).

For R users the relevant data frame is named pupil. The question of interest is the same old one, namely: are the region-to-region expenditure differences statistically significant, or are they due to chance alone?

Figure 1.3 shows that the distributions cannot be judged to be very symmetrical, nor can we be overly optimistic about constant variance. Since, overall, there is not too much skewness, it is the latter we should be most worried about. No outliers are visible, so there really is not much that calls normal theory into question. The R command for creating the plot in figure 1.3 is plot(expenditure~region,pupil). We now seek an appropriate variance stabilizing transformation. Since all the values are nonnegative, we could try the log transformation, or even the square-root transformation. A plot of the log-transformed data is shown in figure 1.4.

The log-transformed distributions do not look appreciably more symmetrical. After a few trials, we finally take the reciprocal of the square of the observations, which yields the plot depicted in figure 1.5.

This time the variance looks reasonably constant across treatments. A little question mark over symmetry remains, but there is not strong enough skewness to warrant too much concern. To investigate this further, we create a normal qq-plot and a histogram of the residuals. These are shown in figure 1.6, from which we see a slight deviation from normality.

But earlier we pointed out that the F-test is not too sensitive to moderate departures from normality. The anova table for the transformed response is obtained by issuing the command anova(lm(expenditure^-2~region,pupil)) in R, and is shown in table 1.4.

From table 1.4 we see a highly significant F-statistic. That is, there is strong evidence that expenditures vary from region to region.


Figure 1.3: Side-by-side boxplots for the public school expenditures data.

Source of variation   df     SS        MS         F       p-value
Between                4    0.78114   0.195285    11.62   0.0000
Within                43    0.72263   0.016805
Total                 47    1.50377

Table 1.4: Anova table for the expenditures per pupil per state data.

1.2.2 Multiple Comparisons

Despite all its merits, the omnibus F-test is not without deficiencies of its own. From the previous example we concluded that expenditures varied from region to region. For all we know, such a conclusion could have been reached because only one of the regions had a sample mean much greater or smaller than the rest. Usually we are interested in knowing which pairs of groups differ significantly. The current section addresses this problem by introducing commonly used methods of multiple comparisons that can be used in lieu of the omnibus F-test, or after the F-test has rejected the null hypothesis. It was shown earlier that two treatment means, $\mu_i$ and $\mu_{i'}$, can be concluded to be different at level $\alpha$ if the


Figure 1.4: Side-by-side boxplots for the log-transformed data.

$100(1-\alpha)\%$ confidence interval for their difference,

\[ \bar{Y}_{i.} - \bar{Y}_{i'.} \pm t_{\nu,1-\alpha/2}\, s_p \sqrt{\frac{1}{n_i} + \frac{1}{n_{i'}}}, \qquad (1.8) \]

does not contain zero, or equivalently, if

\[ |\bar{Y}_{i.} - \bar{Y}_{i'.}| > t_{\nu,1-\alpha/2}\, s_p \sqrt{\frac{1}{n_i} + \frac{1}{n_{i'}}}. \]

If all $k = \binom{I}{2}$ intervals are to be considered as a family, the statement given by equation 1.8 above does not hold with probability $1-\alpha$; the simultaneous coverage probability will be lower, or equivalently, the family-wise error rate (FWER) will exceed $\alpha$. For the special case of $n_i = n_{i'} = J$, one commonly used remedial measure was developed by John Tukey. He showed that the variate

\[ \max_{i,i'} \frac{|(\bar{Y}_{i.} - \mu_i) - (\bar{Y}_{i'.} - \mu_{i'})|}{s_p/\sqrt{J}} \]

follows the so-called Tukey studentized range distribution with parameters $I$ and $I(J-1)$, where the pooled sample variance $s_p^2$ equals the mean square for error. If we denote the $100(1-\alpha)$th percentile of this distribution by $q_{I,I(J-1)}(\alpha)$, then we have the following probability statement,


Figure 1.5: Side-by-side boxplots for the reciprocal-of-square-transformed data.

\[ \Pr\left[ \max_{i,i'} |(\bar{Y}_{i.} - \mu_i) - (\bar{Y}_{i'.} - \mu_{i'})| \leq q_{I,I(J-1)}(\alpha)\, s_p/\sqrt{J} \right] = 1 - \alpha, \qquad (1.9) \]

from which we obtain the following family of confidence intervals for the differences $\mu_i - \mu_{i'}$,

\[ \bar{Y}_{i.} - \bar{Y}_{i'.} \pm q_{I,I(J-1)}(\alpha)\, s_p/\sqrt{J}, \]

with family-wise error rate exactly equal to $\alpha$. Accordingly, any pair of treatment sample means will be significantly different at level $\alpha$ if

\[ |\bar{Y}_{i.} - \bar{Y}_{i'.}| > q_{I,I(J-1)}(\alpha)\, s_p/\sqrt{J}. \]
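In R the percentile $q_{I,I(J-1)}(\alpha)$ is available through qtukey(), and the whole family of intervals is produced by TukeyHSD(); a sketch for the balanced Drosophila design, assuming the drosophila data frame is available as before:

> q  <- qtukey(0.95, nmeans = 3, df = 72)     # studentized range percentile
> sp <- sqrt(78.60)                           # pooled s_p from table 1.2
> q * sp / sqrt(25)                           # honest significant difference
> TukeyHSD(aov(fecundity ~ line, data = drosophila))   # all intervals at once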

Methods to deal with unbalanced designs have also been devised. One method that gives very good results despite its crudity is due to Bonferroni. From the Bonferroni inequality, it can be shown that to ensure a family-wise error rate of at most $\alpha$, each of the $k$ tests of $\mu_i = \mu_{i'}$ should be carried out at significance level $\alpha/k$. Where $N = \sum_{i=1}^{I} n_i$ denotes the total number of observations, we then have the following family of confidence intervals,

\[ \bar{Y}_{i.} - \bar{Y}_{i'.} \pm t^{\alpha/2k}_{N-I}\, s_p \sqrt{\frac{1}{n_i} + \frac{1}{n_{i'}}}, \quad \text{where } k = \binom{I}{2}, \]


Figure 1.6: A histogram and a normal quantile-quantile plot of the model residuals.

which should have coverage probability of at least $1-\alpha$. We call these the Bonferroni confidence intervals. Let us consider an example.

Example 3. Since the previous example dealt with unequal sample sizes, we employ the Bonferroni method to carry out multiple comparisons. We calculated the pooled sample variance to be $s_p^2 = MSE = .017$. We also have $k = 10$ comparisons. According to Bonferroni's method, a pair of sample means (of sizes $n_i$ and $n_{i'}$) that differ by an absolute amount greater than

\[ .1296 \times t_{43}(.9975) \times \sqrt{\frac{1}{n_i} + \frac{1}{n_{i'}}} \]

will be considered significantly different at level $\alpha = .05$. Following are R commands to compute and compactly display the absolute differences of all possible pairs of sample means in a 5 by 5 array.

> data(pupil)

> attach(pupil)

> X <- Y <- tapply(expenditure^-2,region,mean)

> diff <- abs(outer(X,Y,"-"));diff

Consider, for instance, the Mountain Pacific and North Central regions. The absolute value of their sample mean difference is 0.033, which is far less than the critical value of .164. In fact, the 99.75% confidence interval for the difference of means can be shown to be (−0.130, 0.197) or (−0.197, 0.130), depending on how the difference is taken. This interval obviously contains zero, so the two regions' levels of expenditure cannot be considered statistically different. Next, consider the Northeast and South Central regions. Their sample means differ by an absolute amount of 0.366, which exceeds the critical value of .176. The corresponding confidence interval is (.190, .543) or (−.543, −.190). The remaining 8 comparisons can be made similarly. The reader will find that, overall, 4 pairs are significantly different, viz. Northeast and South Central, Northeast and Southeast, South Central and Mountain Pacific, and South Central and North Central.
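The hand comparisons above can be cross-checked against R's built-in Bonferroni adjustment of pairwise t-tests, which adjusts p-values rather than widening intervals; a sketch, assuming pupil is available from the book's package:

> pairwise.t.test(pupil$expenditure^-2, pupil$region,
+                 p.adjust.method = "bonferroni", pool.sd = TRUE)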

Other commonly used multiple comparison methods for unbalanced designs include one due to Scheffé and a variant of Tukey's method, discussed earlier, called the Tukey-Kramer method. Both give conservative results, as does the Bonferroni method. Because of this conservatism, one should prefer Tukey's method whenever a balanced design is dealt with, as it gives shorter confidence intervals. Scheffé's confidence intervals for the difference $\mu_i - \mu_{i'}$ are given by

\[ \bar{Y}_{i.} - \bar{Y}_{i'.} \pm s_p \sqrt{(I-1) F^{\alpha}_{I-1,\,N-I}} \sqrt{\frac{1}{n_i} + \frac{1}{n_{i'}}}, \]

where $F^{\alpha}_{I-1,\,N-I}$ denotes the $100(1-\alpha)$th percentile of the F-distribution with $I-1$ and $N-I$ degrees of freedom. On applying Scheffé's method to the expenditure data, it is striking that we reach the same conclusions as those reached in example 3 under Bonferroni's method, though Scheffé's intervals are noticeably broader. It is still not clear whether the Tukey-Kramer method gives intervals with coverage probability of at least $1-\alpha$ or only approximately $1-\alpha$, but it too gives results good enough to merit mention. Confidence intervals under this method are given by

\[ \bar{Y}_{i.} - \bar{Y}_{i'.} \pm q_{I,N-I}(\alpha)\, s_p \sqrt{\frac{1}{2}\left(\frac{1}{n_i} + \frac{1}{n_{i'}}\right)}. \]

An abundance of other multiple comparison procedures has been proposed, but not all are good enough to enter the fray.

1.2.3 Nonparametric Methods

If the assumptions underlying the analysis of variance do not hold and no transformation is available to make the F-test applicable, nonparametric methods are often used instead. The Kruskal-Wallis test is by far the most commonly used nonparametric analog of the one-way analysis of variance. Unlike the F-test, it makes no distributional assumptions about the observations; for it to be applicable, the observations need only be independent.


In this method we denote by $R_{ij}$ the rank of $y_{ij}$ in the combined sample of all $N = \sum_{i=1}^{I} n_i$ observations. Then define

\[ \bar{R}_{i.} = \sum_{j=1}^{n_i} R_{ij}/n_i \]

and

\[ \bar{R}_{..} = \sum_{i=1}^{I} \sum_{j=1}^{n_i} R_{ij}/N = (N+1)/2 \]

as the average rank score of the $i$th sample and the grand rank score, respectively. Finally we compute the statistic

\[ K = \frac{12}{N(N+1)} \sum_{i=1}^{I} n_i (\bar{R}_{i.} - \bar{R}_{..})^2 = \frac{12}{N(N+1)} \left( \sum_{i=1}^{I} n_i \bar{R}_{i.}^2 \right) - 3(N+1), \]

which has been shown to have a limiting $\chi^2$ distribution with $I-1$ degrees of freedom under the null hypothesis of equal location parameters for the $I$ groups. The null hypothesis is rejected for large values of $K$. Just as in the two-sample case, tied observations are assigned average ranks. The $K$-statistic defined above should perform reasonably well if there are not too many ties; otherwise a correction factor has to be applied.
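Anticipating example 4, the statistic $K$ can be assembled directly from ranks in R; because rank() averages tied ranks in the standard way, the value will differ slightly from the hand computation based on table 1.5. A sketch, assuming pupil is available from the book's package:

> R    <- rank(pupil$expenditure)            # ranks in the combined sample
> N    <- length(R)
> Rbar <- tapply(R, pupil$region, mean)      # average rank per region
> n    <- table(pupil$region)                # group sizes
> 12 / (N * (N + 1)) * sum(n * (Rbar - (N + 1) / 2)^2)   # the K statistic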

Example 4. Table 1.5 presents the ranks of the expenditure data from example 2. From these data we calculate a highly significant value of $K = 21.83$. The R command to compute the p-value is 1-pchisq(21.83,4).

It is worth recognizing the sum of squares occurring in the expression for the $K$-statistic as the between-groups sum of squares of the ranks. The value of $K$ can therefore be calculated by performing the usual analysis of variance on the ranks and multiplying the resulting between-groups sum of squares by $12/\{N(N+1)\}$. The Kruskal-Wallis test has an implementation in R. However, it will usually give a slightly different value of $K$ than the expression above, because R applies a correction for ties when computing the statistic. Here are the R commands and output for the previous example.

> kruskal.test(expenditure,region,data=pupil)

Kruskal-Wallis rank sum test

data: expenditure and region

Kruskal-Wallis chi-squared = 24.0387, df = 4, p-value = 7.846e-05


Northeast   Southeast   South Central   North Central   Mountain Pacific
18          33          6               36.5            39
13.5        20          1               40              38
47          9.5         12              21              31.5
46          9.5         2               16              35
24          8           3.5             42.5            23
29          26          3.5             15              31.5
44          8           6               34              30
42.5                    13.5            22              11
41                      17              27              25
45                                      36.5            19
                                        28              6

Table 1.5: Ranks of the public school expenditures data.

Since the Kruskal-Wallis test works with ranks rather than the actual numerical values of the observations, it greatly reduces the effect of outliers. In practice, one will usually resort to this test if there are too many outliers in the data, if normal theory is not applicable, or if the data are already in the form of ranks.

1.3 Two-way Classification

1.3.1 Introduction

Up to this point we have assumed, at least tacitly, that the experiments we deal with yield observations that can only be grouped according to one factor. This need not be the case; several factors can be considered simultaneously. For example, consider an experiment in which the amount of milk produced by a hundred cows is studied. It is natural to consider breed and age group as possible factors in such a study. There could also be a third, and even a fourth factor, etc., all considered simultaneously. We introduce herein methods for analyzing such experimental designs. We will only treat the case of two factors, in which case the design is called a two-way analysis of variance, but the reader should be aware that the order of classification is arbitrary. In the general case we speak of N-way analysis of variance.

For ease of reference we shall call the factors with which we deal factor A and factor B. It is also common in the literature to call these the row and column factors. It is natural then to speak of a treatment/column or row effect according as the effect due to factor A or that due to factor B is referred to. Treatment and row effects are also referred to as main effects, to distinguish them from the so-called interaction effect. We explain what interaction means shortly.


1.3.2 Normal Theory

The analysis in the two-way classification departs slightly from that in the one-way classification as more variables come into play. In particular, it occasions the need to extend our notation from the previous sections. If we assume that factor A has $I$ levels and factor B has $J$ levels, and that in the cell determined by level $i$ of factor A and level $j$ of factor B there are $K$ observations (or replications), then we use $y_{ijk}$ to symbolize the $k$th observation in such a cell. If each of factors A and B contributes to the response variable an amount independent of that contributed by the other, the model is termed an additive model and is formulated as

\[ Y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}, \qquad (1.10) \]

with identification conditions

\[ \sum_{i=1}^{I} \alpha_i = 0 \quad \text{and} \quad \sum_{j=1}^{J} \beta_j = 0, \]

where $i = 1, \ldots, I$ and $j = 1, \ldots, J$. Just as before, the random errors $\varepsilon_{ijk}$ are assumed to be independently and identically normally distributed with mean zero and constant variance $\sigma^2$.

If the contribution to the response variable by factor A depends on the level of factor B, or conversely, then the simple additive model is not fully representative of the design and a phenomenon called interaction is said to exist. We introduce another parameter, $\varrho_{ij}$, to represent this interaction effect. Hence, for example, $\varrho_{23}$ will be negative or positive according as factors A and B have opposing or synergistic effects at level 2 of factor A and level 3 of factor B. The full model, which takes interaction into account, is given by

\[ Y_{ijk} = \mu + \alpha_i + \beta_j + \varrho_{ij} + \varepsilon_{ijk}, \qquad (1.11) \]

with identification conditions

\[ \sum_{i=1}^{I} \alpha_i = 0, \quad \sum_{j=1}^{J} \beta_j = 0, \quad \text{and} \quad \sum_{i=1}^{I} \varrho_{ij} = \sum_{j=1}^{J} \varrho_{ij} = 0, \]


where $i = 1, \ldots, I$ and $j = 1, \ldots, J$.

In addition to testing the significance of the main effects in a two-way analysis of variance (or any factorial anova, for that matter), there is a need to also test for interaction effects. We thus have a total of three null hypotheses to test. In dealing with many null hypotheses we have reason to vary our usual notation. Specifically, we superscript each null hypothesis with a naught, to avoid confusing $H_A$ with an alternative hypothesis, for instance. That is, the no-main-effects null hypotheses are denoted

\[ H^0_A: \alpha_i = 0 \ \ \forall\, i \in \{1, \ldots, I\} \]

and

\[ H^0_B: \beta_j = 0 \ \ \forall\, j \in \{1, \ldots, J\}, \]

and the no-interaction-effect null hypothesis is written

\[ H^0_I: \varrho_{ij} = 0 \ \text{for all combinations of } i \text{ and } j. \]

In anticipation of their need ahead, we give expressions for the sums of squares, which are a little more involved than those in the one-way layout. Some identities and statistics other than the sums of squares, which will provide tests of the hypotheses stated above, can be derived just as in the one-way layout.
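All three hypotheses are tested at once in R by fitting the full model with an interaction term; the sketch below uses a small simulated layout with hypothetical factors A and B.

> # Hypothetical balanced two-way layout: I = 2, J = 3, K = 4 replicates per cell
> set.seed(3)
> A <- factor(rep(1:2, each = 12))
> B <- factor(rep(rep(1:3, each = 4), times = 2))
> y <- 10 + rnorm(24)
> anova(lm(y ~ A * B))   # rows: A, B, A:B (interaction), Residuals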

The next theorem constructs a generalized likelihood ratio test for $H^0_A$, $H^0_B$, and $H^0_I$.

Theorem 3. The generalized likelihood ratio test statistics for testing the null hypotheses of no main and no interaction effects are given by:

1. \( F_A = \dfrac{MSA}{MSE} \), where $H^0_A$ is rejected at significance level $\alpha$ if $F_A > F^{1-\alpha}_{I-1,\,IJ(K-1)}$,

2. \( F_B = \dfrac{MSB}{MSE} \), where $H^0_B$ is rejected at significance level $\alpha$ if $F_B > F^{1-\alpha}_{J-1,\,IJ(K-1)}$, and

3. \( F_I = \dfrac{MSI}{MSE} \), where $H^0_I$ is rejected at significance level $\alpha$ if $F_I > F^{1-\alpha}_{(I-1)(J-1),\,IJ(K-1)}$.

Proof. Since a complete proof of each part of the theorem can easily span two and a half pages, we will prove the first part and leave the last two to the reader. We have, for $i = 1, \ldots, I$, $j = 1, \ldots, J$, and $k = 1, \ldots, K$,

\[ f(y_{ijk}) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{y_{ijk} - \mu - \alpha_i - \beta_j - \varrho_{ij}}{\sigma} \right)^2 \right\}. \]


Thus the likelihood is given by

\[ L(\mu, \alpha_i, \beta_j, \varrho_{ij}, \sigma^2 \mid y) = (2\pi\sigma^2)^{-IJK/2} \exp\left\{ -\frac{1}{2} \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} \left( \frac{Y_{ijk} - \mu - \alpha_i - \beta_j - \varrho_{ij}}{\sigma} \right)^2 \right\}, \]

from the assumption of independence. For ease of maximization we use the log-likelihood

\[ l = \log L = -\frac{IJK}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} (Y_{ijk} - \mu - \alpha_i - \beta_j - \varrho_{ij})^2. \]

The parameter space under the general alternative hypothesis, which states that all effects are nonzero, is denoted

\[ \Omega = \{ (\mu, \alpha_i, \beta_j, \varrho_{ij}, \sigma^2) \mid -\infty < \mu, \alpha_i, \beta_j, \varrho_{ij} < \infty, \ \sigma^2 > 0 \}. \]

Proceeding to find the ML estimates under $\Omega$ we have

\[ \frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} (Y_{ijk} - \mu - \alpha_i - \beta_j - \varrho_{ij}) = 0, \]

which implies that $\hat{\mu}_{\Omega} = \bar{Y}_{...}$. Similarly, it is easily verified that

\[ \frac{\partial l}{\partial \alpha_i} = \frac{1}{\sigma^2} \sum_{j=1}^{J} \sum_{k=1}^{K} (Y_{ijk} - \mu - \alpha_i - \beta_j - \varrho_{ij}) = 0 \]

implies $\hat{\alpha}_{i\Omega} = \bar{Y}_{i..} - \bar{Y}_{...}$, and that

\[ \frac{\partial l}{\partial \beta_j} = \frac{1}{\sigma^2} \sum_{i=1}^{I} \sum_{k=1}^{K} (Y_{ijk} - \mu - \alpha_i - \beta_j - \varrho_{ij}) = 0 \]

yields $\hat{\beta}_{j\Omega} = \bar{Y}_{.j.} - \bar{Y}_{...}$. Likewise,

\[ \frac{\partial l}{\partial \varrho_{ij}} = \frac{1}{\sigma^2} \sum_{k=1}^{K} (Y_{ijk} - \mu - \alpha_i - \beta_j - \varrho_{ij}) = 0 \]

implies $\hat{\varrho}_{ij\Omega} = \bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...}$. Finally,

\[ \frac{\partial l}{\partial \sigma^2} = -\frac{IJK}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} (Y_{ijk} - \mu - \alpha_i - \beta_j - \varrho_{ij})^2 = 0 \]

yields

\[ \hat{\sigma}^2_{\Omega} = N^{-1} \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} (Y_{ijk} - \bar{Y}_{ij.})^2, \]

where $N = IJK$.


These give an expression for the supremum of the likelihood under $\Omega$, namely

\[ \sup_{\Omega} L(\mu, \alpha_i, \beta_j, \varrho_{ij}, \sigma^2) = \exp\left\{-\frac{IJK}{2}\right\} \left\{ \frac{2\pi}{IJK} \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} (Y_{ijk} - \bar{Y}_{ij.})^2 \right\}^{-IJK/2}. \]

Under $H^0_A$ the parameter space is given by

\[ \omega_A = \{ (\mu, \beta_j, \sigma^2) \mid -\infty < \mu, \beta_j < \infty, \ \sigma^2 > 0 \}. \]

Similar arguments give the following expression for the supremum of the likelihood,

\[ \sup_{\omega_A} L(\mu, \beta_j, \sigma^2 \mid y) = \exp\left\{-\frac{IJK}{2}\right\} \left\{ \frac{2\pi}{IJK} \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} (Y_{ijk} - \bar{Y}_{.j.})^2 \right\}^{-IJK/2}. \]

Hence the generalized likelihood ratio is given by

\[ \Lambda_A = \frac{\sup_{\omega_A} L}{\sup_{\Omega} L} = \left\{ \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} (Y_{ijk} - \bar{Y}_{.j.})^2}{\sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} (Y_{ijk} - \bar{Y}_{ij.})^2} \right\}^{-IJK/2} = \left\{ \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} (Y_{ijk} - \bar{Y}_{ij.})^2 + JK \sum_{i=1}^{I} (\bar{Y}_{i..} - \bar{Y}_{...})^2}{\sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{k=1}^{K} (Y_{ijk} - \bar{Y}_{ij.})^2} \right\}^{-IJK/2}. \]

The generalized likelihood ratio test then rejects $H^0_A$ for small values of $\Lambda_A$, that is, for large values of $SSA/SSW$. We therefore reject $H^0_A$ if

\[ \frac{SSA}{SSW} > k, \]

or equivalently, if

\[ F_A = \frac{SSA/(I-1)}{SSW/\{IJ(K-1)\}} = \frac{MSA}{MSW} > k\,\frac{IJ(K-1)}{I-1} = c, \]

where $c$ is chosen such that $\Pr(F_A > c \mid H^0_A) = \alpha$. From the distribution of this $F$-statistic, which we derived earlier, it is immediately evident that $c = F^{1-\alpha}_{I-1,\,IJ(K-1)}$. This completes the proof of the first part of the theorem. By noting that similar restrictions have been imposed on the $\beta_j$ as on the $\alpha_i$, one sees that it is not necessary to construct the proof of part 2 ab initio.


One need only permute some subscripts and use the appropriate degrees of freedom to complete the proof. The reader who is not satisfied with this argument should work through all the steps. The proof of the last part can be completed along the lines of the proof just presented and is left as an exercise.

Example 5. In an experiment to test 3 types of adhesive, 45 glass-to-glass specimens were set up in 3 different types of assembly and tested for tensile strength. The types of adhesive were 047, 00T, and 001, and the types of assembly were cross-lap, square-center, and round-center. Each of the 45 entries of table 1.6 represents the recorded tensile strength of a glass-to-glass assembly [data from Johnson and Leone [5]]. These data can be found in dataset glass in this book's package.

Glass-Glass Assembly

Adhesive   Cross-Lap   Square-Center   Round-Center
047        16          17              13
           14          23              19
           19          20              14
           18          16              17
           19          14              21
00T        23          24              24
           18          20              21
           21          12              25
           20          21              29
           21          17              24
001        27          14              17
           28          26              18
           14          14              13
           26          28              16
           17          27              18

Table 1.6: Bond strength of glass-glass assemblies.

Figure 1.7 shows reasonable symmetry, no outliers, and not enough violation of the constant variance assumption to warrant suspicion. The same cannot quite be said about figure 1.8, which calls the constant variance assumption into question. By now we know the risk entailed by blatantly ignoring such a clear indication of heteroscedasticity. R commands to view both figures at the same time follow.

> data(glass)

> par(mfrow=c(1,2))

> plot(strength~adhesive+assembly,data=glass)


Figure 1.7: Boxplots for the glass data plotted according to assembly type.


Figure 1.8: Boxplots for the glass data plotted according to adhesive type.

Fitting a model to the raw (i.e. untransformed) data gives significant results for adhesives and for the interaction. But we might want to think twice before concluding that these factors are indeed significant. To this end, we seek a transformation to stabilize the variance. The square-root transformation seems to work reasonably well for us, but it is seen to greatly upset normality. Boxplots for the transformed data are not shown, for reasons of space. A histogram and qq-plot of the residuals are shown in figure 1.9. The histogram shows a slightly ragged character, a long and fat left tail, and a lack of symmetry. The qq-plot also shows gross departure from linearity. The following set of commands will create figure 1.9.

> m2 <- lm(strength^.5~adhesive*assembly,glass)

> r <- m2$resid

> par(mfrow=c(1,2))

> qqnorm(r,ylab="Ordered Residuals",main="");qqline(r,col=2)

> hist(r,xlab="Residuals",main="")

The Shapiro-Wilk test of normality shows that we have not lost much; it gives a p-value of 0.084, while the untransformed variable has a (slightly higher) p-value of 0.158. We also know that the F-test is robust against departures from normality. We therefore accept the square-root transformation as a good compromise. Table 1.7 summarizes the results of fitting a linear model to the


Figure 1.9: A histogram and a normal quantile-quantile plot of the model residuals.

transformed variable. As you should have expected, the p-values are slightly lower but the adhesives are still significant. The interaction, on the other hand, is slightly short of the 5% significance level. We conclude that the type of adhesive influences bond strength while the type of assembly does not.

Source of variation   df     SS        MS        F        p-value
Adhesive               2     1.5682    0.78409   3.4548   0.04240
Assembly               2     0.0749    0.03745   0.1650   0.84854
Interaction            4     2.3003    0.57509   2.5339   0.05699
Within                36     8.1704    0.22696
Total                 44    12.1138

Table 1.7: Anova table for the glass-to-glass assembly data.


1.3.3 Multiple Comparisons

1.3.4 Randomized Complete Blocks

Randomized blocks are a form of unreplicated two-way analysis of variance in which the two factors forming the design are the treatment and another factor known to have an effect on the response under investigation. This second factor is called a block. Each block is assigned all treatments at random in such a way that within each block, each treatment appears once and only once. A block effect is rarely tested in practice; of primary interest is the treatment effect, since the blocks are, by assumption, already expected to have an effect. Randomized blocks were first developed for agricultural experiments, and much of the terminology has remained unchanged. The term "block" was traditionally understood to refer to a block of land, but with the wide appreciation and popularity of randomized complete blocks over the years, it is now used to refer to any factor that plays an analogous role in more recent adaptations of such experiments.

In a study to compare the effects of $I$ fertilizers (or treatments, in the more general case) on yield, $J$ blocks of land are each subdivided into $I$ homogeneous plots and the fertilizers are allocated at random to these plots. This is the classical problem for which the method of randomized complete blocks was developed. Other uses of this design can be found in several other fields. The statistical model for the randomized complete block design is

\[ Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}, \]

where

\[ \sum_{i=1}^{I} \alpha_i = \sum_{j=1}^{J} \beta_j = 0. \]

The sums of squares are the same as those under the two-way additive model but with $K = 1$. The null hypotheses of no treatment and no block effects are

\[ H^0_A: \alpha_i = 0 \ \ \forall\, i \in \{1, \ldots, I\} \]

and

\[ H^0_B: \beta_j = 0 \ \ \forall\, j \in \{1, \ldots, J\}, \]

respectively. But remember that only the former is of interest. In the fertilizer experiment presented above, the experimenter will hardly be as interested in whether block A was the most productive as in whether fertilizer II yielded the most crop.


Theorem 4. The generalized likelihood ratio test statistics for testing the null hypotheses of no treatment and no block effects are given by:

1. \( F_A = \dfrac{MSA}{MSI} \), where $H^0_A$ is rejected at significance level $\alpha$ if $F_A > F^{1-\alpha}_{I-1,\,(I-1)(J-1)}$, and

2. \( F_B = \dfrac{MSB}{MSI} \), where $H^0_B$ is rejected at significance level $\alpha$ if $F_B > F^{1-\alpha}_{J-1,\,(I-1)(J-1)}$.

Proof. The details of this proof are left to the reader.
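In R a randomized complete block analysis is just an additive two-way fit with one observation per cell; a sketch with hypothetical treatment and block factors, where the treatment row of the anova table is the one of interest:

> # Hypothetical layout: I = 4 treatments in each of J = 5 blocks
> set.seed(4)
> treatment <- factor(rep(1:4, times = 5))
> block     <- factor(rep(1:5, each = 4))
> yield     <- 20 + rnorm(20)
> anova(lm(yield ~ treatment + block))   # F for treatment on 3 and 12 df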

1.4 Latin Squares

Latin squares arise as natural extensions of randomized complete blocks: they are a form of three-way analysis of variance without replication. If heterogeneity is known to be two-dimensional in some investigation, then two blocking factors can be incorporated in an unreplicated design, effectively forming a square with $N$ row blocks and $N$ column blocks. We then speak of a row effect, a column effect, and a treatment effect, but as in randomized blocks, it is only the last that will be of concern to the investigator. These designs have found wide application in industry because of their optimality and impressive performance.

A prototype of a Latin square design is an experiment in which a fertilizer (i.e. the treatment) is to be tested at $N$ levels on a field that is known to vary in intrinsic fertility in, say, a north-south direction and in soil depth in, say, an east-west direction. The field is then subdivided to form an $N \times N$ array of subplots and the fertilizers are randomly allocated to the subplots in such a manner that all $N$ levels of the treatment occur once and only once in each row and in each column. Let $\tau_i$ denote the differential effect of the $i$th row block, $\beta_j$ the differential effect of the $j$th column block, and $\gamma_k$ the differential effect of the $k$th treatment. Then the statistical model is

\[ Y_{ijk} = \mu + \tau_i + \beta_j + \gamma_k + \varepsilon_{ijk}, \qquad (1.12) \]

where

\[ \sum_{i=1}^{N} \tau_i = \sum_{j=1}^{N} \beta_j = \sum_{k=1}^{N} \gamma_k = 0. \]
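The corresponding fit in R again treats row, column, and treatment as factors; the sketch below builds a hypothetical cyclic 4 x 4 Latin square purely for illustration.

> # Hypothetical 4 x 4 Latin square: each treatment once per row and per column
> set.seed(5)
> N   <- 4
> Row <- factor(rep(1:N, each = N))
> Col <- factor(rep(1:N, times = N))
> Trt <- factor((as.integer(Row) + as.integer(Col) - 2) %% N + 1)   # cyclic square
> y   <- 15 + rnorm(N^2)
> anova(lm(y ~ Row + Col + Trt))   # treatment F on (N-1) and (N-1)(N-2) df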


Theorem 5. If we assume that the random errors $\varepsilon_{ijk} \sim \text{iid } N(0, \sigma^2)$, for $i = 1, \ldots, N$, $j = 1, \ldots, N$, and $k = 1, \ldots, N$, then we have the following results:

1. $SST/\sigma^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} (Y_{ijk} - \bar{Y}_{...})^2/\sigma^2 \sim \chi^2_{N^2-1}$,

2. $SSA/\sigma^2 = N^2 \sum_{i=1}^{N} (\bar{Y}_{i..} - \bar{Y}_{...})^2/\sigma^2 \sim \chi^2_{N-1}$,

3. $SSB/\sigma^2 = N^2 \sum_{j=1}^{N} (\bar{Y}_{.j.} - \bar{Y}_{...})^2/\sigma^2 \sim \chi^2_{N-1}$,

4. $SSC/\sigma^2 = N^2 \sum_{k=1}^{N} (\bar{Y}_{..k} - \bar{Y}_{...})^2/\sigma^2 \sim \chi^2_{N-1}$,

5. $SSE/\sigma^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} (Y_{ijk} - \bar{Y}_{i..} - \bar{Y}_{.j.} - \bar{Y}_{..k} + 2\bar{Y}_{...})^2/\sigma^2 \sim \chi^2_{(N-1)(N-2)}$, and

6. the above variates are mutually independent.

Proof. We present the proof shortly....

Theorem 6. The generalized likelihood ratio test statistics for testing the null hypotheses of no row, no column, and no treatment effects are given by:

1. \( F_A = \dfrac{MSA}{MSE} \), where $H^0_A$ is rejected at significance level $\alpha$ if $F_A > F^{1-\alpha}_{N-1,\,(N-1)(N-2)}$,

2. \( F_B = \dfrac{MSB}{MSE} \), where $H^0_B$ is rejected at significance level $\alpha$ if $F_B > F^{1-\alpha}_{N-1,\,(N-1)(N-2)}$, and

3. \( F_C = \dfrac{MSC}{MSE} \), where $H^0_C$ is rejected at significance level $\alpha$ if $F_C > F^{1-\alpha}_{N-1,\,(N-1)(N-2)}$.

Proof. From the statistical model given in equation 1.12 we have

\[ f(y_{ijk}) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{y_{ijk} - \mu - \tau_i - \beta_j - \gamma_k}{\sigma} \right)^2 \right\}. \]


The likelihood function takes the form

\[ L(\mu, \tau_i, \beta_j, \gamma_k, \sigma^2 \mid y) = (2\pi\sigma^2)^{-N^3/2} \exp\left\{ -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} \left( \frac{Y_{ijk} - \mu - \tau_i - \beta_j - \gamma_k}{\sigma} \right)^2 \right\}, \]

so that

\[ l = \log L = -\frac{N^3}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} (Y_{ijk} - \mu - \tau_i - \beta_j - \gamma_k)^2. \]

Under the general alternative hypothesis we have the parameter space

\[ \Omega = \{ (\mu, \tau_i, \beta_j, \gamma_k, \sigma^2) \mid -\infty < \mu, \tau_i, \beta_j, \gamma_k < \infty, \ \sigma^2 > 0 \}. \]

The maximum likelihood estimates are obtained in the usual way. From

\[ \frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} (Y_{ijk} - \mu - \tau_i - \beta_j - \gamma_k) = 0 \]

we have $\hat{\mu}_{\Omega} = \bar{Y}_{...}$. The remaining score equations,

\[ \frac{\partial l}{\partial \tau_i} = \frac{1}{\sigma^2} \sum_{j=1}^{N} \sum_{k=1}^{N} (Y_{ijk} - \mu - \tau_i - \beta_j - \gamma_k) = 0, \]

\[ \frac{\partial l}{\partial \beta_j} = \frac{1}{\sigma^2} \sum_{i=1}^{N} \sum_{k=1}^{N} (Y_{ijk} - \mu - \tau_i - \beta_j - \gamma_k) = 0, \]

\[ \frac{\partial l}{\partial \gamma_k} = \frac{1}{\sigma^2} \sum_{i=1}^{N} \sum_{j=1}^{N} (Y_{ijk} - \mu - \tau_i - \beta_j - \gamma_k) = 0, \]

yield $\hat{\tau}_{i\Omega} = \bar{Y}_{i..} - \bar{Y}_{...}$, $\hat{\beta}_{j\Omega} = \bar{Y}_{.j.} - \bar{Y}_{...}$, and $\hat{\gamma}_{k\Omega} = \bar{Y}_{..k} - \bar{Y}_{...}$. Finally,

\[ \frac{\partial l}{\partial \sigma^2} = -\frac{N^3}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} (Y_{ijk} - \mu - \tau_i - \beta_j - \gamma_k)^2 = 0 \]

gives

\[ \hat{\sigma}^2_{\Omega} = N^{-3} \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} (Y_{ijk} - \bar{Y}_{i..} - \bar{Y}_{.j.} - \bar{Y}_{..k} + 2\bar{Y}_{...})^2. \]

Putting everything together we obtain

\[ \sup_{\Omega} L = \exp\left\{-\frac{N^3}{2}\right\} \left\{ \frac{2\pi}{N^3} \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} (Y_{ijk} - \bar{Y}_{i..} - \bar{Y}_{.j.} - \bar{Y}_{..k} + 2\bar{Y}_{...})^2 \right\}^{-N^3/2}. \]


The parameter space under the first null hypothesis $H^0_A$ is

\[ \omega_A = \{ (\mu, \beta_j, \gamma_k, \sigma^2) \mid -\infty < \mu, \beta_j, \gamma_k < \infty, \ \sigma^2 > 0 \}. \]

Similar arguments to those above give the following supremum under $\omega_A$,

\[ \sup_{\omega_A} L = \exp\left\{-\frac{N^3}{2}\right\} \left\{ \frac{2\pi}{N^3} \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} (Y_{ijk} - \bar{Y}_{.j.} - \bar{Y}_{..k} + \bar{Y}_{...})^2 \right\}^{-N^3/2}, \]

so that

\[ \Lambda_A = \frac{\sup_{\omega_A} L}{\sup_{\Omega} L} = \left\{ \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} (Y_{ijk} - \bar{Y}_{.j.} - \bar{Y}_{..k} + \bar{Y}_{...})^2}{\sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} (Y_{ijk} - \bar{Y}_{i..} - \bar{Y}_{.j.} - \bar{Y}_{..k} + 2\bar{Y}_{...})^2} \right\}^{-N^3/2}. \]

The numerator sum of squares can be decomposed to give

\[ \Lambda_A = \left\{ \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} (\bar{Y}_{i..} - \bar{Y}_{...})^2 + \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} (Y_{ijk} - \bar{Y}_{i..} - \bar{Y}_{.j.} - \bar{Y}_{..k} + 2\bar{Y}_{...})^2}{\sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} (Y_{ijk} - \bar{Y}_{i..} - \bar{Y}_{.j.} - \bar{Y}_{..k} + 2\bar{Y}_{...})^2} \right\}^{-N^3/2}. \]

The term in braces simplifies to $1 + SSA/SSE$, hence the generalized likelihood ratio test rejects $H^0_A$ for large values of $SSA/SSE$, or equivalently, if

\[ F_A = \frac{SSA/(N-1)}{SSE/\{(N-1)(N-2)\}} = \frac{MSA}{MSE} > c, \]

where it is easily verified that

\[ c = F^{1-\alpha}_{N-1,\,(N-1)(N-2)}. \]

Example 6.

1.5 Summary and Addenda


Source of variation   df     SS        MS        F         p-value
Carbon Grade           4    1787.4     446.8     2.3894    0.10888
pH                     4   14165.4    3541.3    18.9370    0.00004
Quantity               4    3194.6     798.6     4.2706    0.02233
Residuals             12    2244.1     187.0
Total                 24   21391.5

Table 1.8: Anova table for the purification process data.

1.6 Exercises

1. Show that...

2. Given that

\[ Y_{ijk} = \mu + \alpha_i + \beta_j + \varrho_{ij} + \varepsilon_{ijk}, \]

show that...


BIBLIOGRAPHY

[1] Sokal, R. R. and Rohlf, F. J. (1968). Biometry: The Principles and Practice of Statistics in Biological Research. Freeman.

[2] Casella, G. and Berger, R. L. (1992). Statistical Inference. Duxbury.

[3] Miller, R. G., Jr. (1997). Beyond ANOVA: Basics of Applied Statistics. Chapman & Hall.

[4] Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods. Iowa State University Press.

[5] Johnson, N. L. and Leone, F. C. (1964). Statistics and Experimental Design in Engineering and the Physical Sciences, Volume II. Wiley.

[6] R Development Core Team (2012). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.
