-
Stat 502 Design and Analysis of Experiments
One-Factor ANOVA
Fritz Scholz
Department of Statistics, University of Washington
January 7, 2014
1
-
One-Factor ANOVA
ANOVA is an acronym for Analysis of Variance.
The primary focus is the difference in means of several populations or the difference in mean response under several treatments.
Variance in ANOVA alludes to the analysis technique.
The overall data variation is decomposed into several variation components.
How much of that variation is due to changing the sampled population or changing the treatment?
How much variation cannot be attributed to such systematic changes?
2
-
ANOVA Illustrated
[Figure: dot plots of the measurements (vertical scale 25–35): the full data set; the grand mean; the sample means, with the residual variation broken down into individual samples; and the decomposed variation.]
3
-
The Notion of Factor in One-Factor ANOVA
It is difficult to explain the notion of 2-dimensional space to someone who has lived only in 1-dimensional space, or 3-dimensional space to someone who lives in flatland, or 4-dimensional space to us in the "real" 3-dimensional world.
The term Factor similarly alludes to different possible directions/dimensions in which changes can take place in populations or in treatments.
Example: In soldering circuit boards we could have several types of flux (say 3) and also several methods of cleaning the boards (say 4).
Combining each with each, we thus could have 3 × 4 = 12 distinct treatments.
However, it is more enlightening to view the effects of flux and cleaning method separately. Each would be called a factor: the flux factor and the cleaning factor.
We can then ask which factor is responsible for changes in the mean response.
4
-
More Than 2 Treatments or Populations
Again we deal with circuit boards.
Now we investigate 3 types of fluxes: X, Y, Z.
With 18 circuit boards, randomly assign each flux to 6 boards.
In principle, this gives us the randomization reference distribution and thus a logical basis for a test of the hypothesis H0: no flux differences.
Randomize the order of soldering/cleaning, coating, and humidity chamber slots. These randomizations avoid unintended biases from hidden factors (dimensions).
(18 choose 6) × (12 choose 6) × (6 choose 6) = 18,564 · 924 · 1 = 17,153,136 flux allocations.
Note the growth in the number of splits, dividing 18 into 3 groups of 6.
The full randomization reference distribution may be pushing the computing limits ⇒ simulated reference distribution.
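The allocation count is easy to check directly. The course code is in R; the following is a minimal Python sketch of the same count (`math.comb` is the binomial coefficient):

```python
from math import comb

# Number of ways to split 18 boards into labeled flux groups of 6, 6, 6:
# choose 6 boards for flux X, then 6 of the remaining 12 for flux Y,
# the last 6 go to flux Z.
allocations = comb(18, 6) * comb(12, 6) * comb(6, 6)
print(allocations)  # 18564 * 924 * 1 = 17153136
```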
5
-
The Flux3 Data
SIR Responses, units log10(Ohm):

 Flux:   X     Y     Z
        9.9  10.7  10.9
        9.6  10.4  11.0
        9.6   9.5   9.5
        9.7   9.6  10.0
        9.5   9.8  11.7
       10.0   9.9  10.2

[Figure: dot plot of SIR (log10(Ohm), scale 9.5–11.5) by Flux X, Y, Z.]
6
-
Differences in the Fluxes?
To examine whether the fluxes are in some way different in their effects we could again focus on differences between the means of the SIR responses.
Denote these means by µ1 = µX, µ2 = µY, and µ3 = µZ.
Mathematically, X ≡ Y and Y ≡ Z ⇒ X ≡ Z. It would seem that testing H0,XY: X ≡ Y and H0,YZ: Y ≡ Z might suffice. Statistically, however, X ≈ Y and Y ≈ Z allows for the possibility that X and Z are sufficiently different.
To guard against this we could perform all 3 possible 2-sample tests for the respective hypothesis testing problems:
H0,XY: X ≡ Y vs. H1,XY: µX ≠ µY
H0,YZ: Y ≡ Z vs. H1,YZ: µY ≠ µZ
H0,XZ: X ≡ Z vs. H1,XZ: µX ≠ µZ
7
-
Probability of Overall Type I Error?
If we do each such test at level α, what is our chance of getting a rejection by at least one of these tests when in fact all 3 fluxes are equivalent? (Compare 2 versus 4 engines on aircraft, a controversy between Boeing and Airbus.)
If these 3 tests were independent of each other we would have
P0(Overall Type I Error)
= P0(reject at least one of the hypotheses)
= 1 − P0(accept all of the hypotheses)
= 1 − P0(accept H0,XY ∩ accept H0,XZ ∩ accept H0,YZ)
= 1 − (1 − α)³ = 0.142625 for α = .05.
P0 indicates that all 3 fluxes are the same and that we are dealing with the null or randomization reference distribution.
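A quick check of the arithmetic (a Python sketch; the course itself works in R):

```python
alpha = 0.05
# chance of at least one false rejection among 3 independent level-alpha tests
overall = 1 - (1 - alpha) ** 3
print(round(overall, 6))  # 0.142625
```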
8
-
Engine Failure
If pF = probability of in-flight shutdown = 1/10000, the chance of at least one shutdown on a flight with k engines is
P(at least one shutdown) = 1 − (1 − pF)^k ≈ k × pF.

k   1 − (1 − pF)^k   k × pF
2   .00019999        .0002
4   .00039994        .0004

(1 − pF)^k assumes that engine shutdowns are independent. This independence is the goal of ETOPS (Extended-range Twin-engine Operational Performance Standards), http://en.wikipedia.org/wiki/ETOPS
E.g., different engines are serviced by different mechanics, or at separate maintenance events.
9
-
The Multiple Comparison Issue
If you expose yourself to multiple rare opportunities of making a wrong decision, the chance of making a wrong decision at least once (the overall type I error) is much higher than planned for in the individual tests.
This problem is referred to as the multiple comparison issue.
How much higher is it?
A calculation based on independence is not correct:
any 2 comparisons involve a common sample ⇒ dependence. Boole's inequality bounds the overall type I error probability:
π0 = P0(Overall Type I Error)
= P0(reject H0,XY ∪ reject H0,XZ ∪ reject H0,YZ)
≤ P0(reject H0,XY) + P0(reject H0,XZ) + P0(reject H0,YZ)
= 3α = .15 when α = .05
= expected number of false rejections
How much smaller than this upper bound is the true π0?
10
-
Overall Type I Error Probability
Evaluate it for the randomization reference distribution.
Get the randomization reference distribution of X̄ − Ȳ for splits of the 18 SIR values into 3 groups of 6.
Take the difference of averages for the first two groups.
Do this by simulation: Nsim0 = 10000 times.
For α = .05 get the .95-quantile tcrit of this simulated |X̄ − Ȳ| reference distribution. It serves equally well for tests based on |X̄ − Z̄| or |Ȳ − Z̄|. Why?
Then simulate another Nsim1 = 10000 such splits, computing |X̄ − Ȳ|, |X̄ − Z̄|, and |Ȳ − Z̄| each time, and tally the proportions of each individually exceeding tcrit and the proportion of at least one of them exceeding tcrit.
The resulting proportions are: 0.0451, 0.0460, 0.0491 for the individual tests (≈ the targeted α = .05) and 0.1186 for the overall type I error rate.
The code typeIerror.rateRand is posted on the class web page.
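The posted code is in R; the following is a minimal Python sketch of the same two-stage simulation (function and variable names are mine; the exact proportions vary with the random seed):

```python
import numpy as np

rng = np.random.default_rng(1)
sir = np.array([9.9, 9.6, 9.6, 9.7, 9.5, 10.0,      # flux X
                10.7, 10.4, 9.5, 9.6, 9.8, 9.9,     # flux Y
                10.9, 11.0, 9.5, 10.0, 11.7, 10.2])  # flux Z

def abs_mean_diffs(x):
    # |Xbar-Ybar|, |Xbar-Zbar|, |Ybar-Zbar| for one split into 3 groups of 6
    m = x.reshape(3, 6).mean(axis=1)
    return np.abs([m[0] - m[1], m[0] - m[2], m[1] - m[2]])

# Stage 1: null reference distribution of |Xbar - Ybar| -> critical value
nsim0 = 10000
ref = np.array([abs_mean_diffs(rng.permutation(sir))[0] for _ in range(nsim0)])
tcrit = np.quantile(ref, 0.95)

# Stage 2: tally individual and overall rejection rates under the null
nsim1 = 10000
rejections = np.array([abs_mean_diffs(rng.permutation(sir)) > tcrit
                       for _ in range(nsim1)])
individual = rejections.mean(axis=0)     # each near alpha = .05
overall = rejections.any(axis=1).mean()  # noticeably above .05
print(individual, overall)
```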
11
-
A Global Testing View
Rather than using all 3 pairwise test statistics |X̄ − Ȳ|, |X̄ − Z̄|, and |Ȳ − Z̄| separately, we will address this in a global way, using a single discrepancy statistic.
For now we will focus on the population view.
In the context of a 3-population model we will test the hypothesis H0: µ1 = µ2 = µ3 (common value unspecified ⇒ composite hypothesis) against the alternative H1: µi ≠ µj for some i ≠ j.
More generally we may have t treatments and ni observations Yi,1, ..., Yi,ni for the i-th treatment, i = 1, ..., t.
Test H0: µ1 = ... = µt against H1: µi ≠ µj for some i ≠ j.
For the Flux3 data: t = 3, n1 = n2 = n3 = 6, a balanced design.
When the ni are not all the same ⇒ unbalanced design.
12
-
Useful Models for Treatment Variation
We have measurements Yij, the j-th response under the i-th treatment, j = 1, ..., ni and i = 1, ..., t.
A total of N = n1 + ... + nt measurements.
Treatment Means Model:
Yij = µi + εij with E(εij) = 0 and var(εij) = σ²
View εij (i.i.d.) as response variation/error/noise.
Treatment Effects Model:
Yij = µ + τi + εij with E(εij) = 0 and var(εij) = σ²
µ = µ̄ = ∑ij µi/N = ∑i ni µi/N = grand mean, or ni/N-weighted average of the µi
τi = µi − µ = µi − µ̄ is the i-th treatment effect
εij (i.i.d.) = within-treatment variation, E(εij) = 0, var(εij) = σ².
Note: the τi satisfy the constraint ∑ij τi = ∑i ni τi = 0.
13
-
The Reduced Model
In contrast to the full model with varying treatment means, as discussed on the previous slide, in the reduced model we assume a single mean for all observations:
Yij = µ + εij with E(εij) = 0 and var(εij) = σ²,
i.e., there is no variation or change due to treatments.
The reduced model corresponds to our stated hypothesis
H0: µ1 = ... = µt, or equivalently H0: τ1 = ... = τt = 0.
This is a special case of our previous full population model.
We test this hypothesis by fitting the full model and the reduced model to the data and comparing the quality of the fits relative to each other via some discrepancy metric.
14
-
Full Model Fitting by Least Squares (Gauss/Legendre)
Minimize the Sum of Squares criterion
SS(µ1, ..., µt) = ∑i ∑j (Yij − µi)² over µ = (µ1, ..., µt).
Using Ȳi. = ∑j Yij/ni and the fact ∑j (Yij − Ȳi.) = 0, together with (a + b)² = a² + b² + 2ab:
SS(µ1, ..., µt) = ∑i ∑j (Yij − Ȳi. + Ȳi. − µi)²
= ∑i ∑j (Yij − Ȳi.)² + ∑i ∑j (Ȳi. − µi)² + 2 ∑i ∑j (Yij − Ȳi.)(Ȳi. − µi)
= ∑i ∑j (Yij − Ȳi.)² + ∑i ∑j (Ȳi. − µi)² ≥ ∑i ∑j (Yij − Ȳi.)²
The least squares estimates (LSE) µ̂i = Ȳi. minimize SS(µ1, ..., µt) ⇒ SS(µ̂1, ..., µ̂t) = ∑i ∑j (Yij − Ȳi.)².
15
-
The Dot Notation
If a1, ..., an are n numbers then
a. = ∑i ai and ā. = ∑i ai / n.
For an array of numbers aij, i = 1, ..., m, j = 1, ..., n, write
a.j = ∑i aij,  ā.j = ∑i aij/m,  ai. = ∑j aij,  āi. = ∑j aij/n
a.. = ∑i ∑j aij and ā.. = ∑i ∑j aij/(mn)
Similarly for higher-dimensional arrays aijk, i = 1, ..., m, j = 1, ..., n, k = 1, ..., ℓ:
aij. = ∑k aijk and āij. = ∑k aijk/ℓ, and so on.
16
-
Reduced Model Fitting by Least Squares
Minimize the SS-criterion SS(µ) = ∑i ∑j (Yij − µ)².
Ȳ.. = ∑ij Yij / ∑i ni = ∑i (ni/N) Ȳi.  ⇒  ∑i ∑j (Yij − Ȳ..) = 0
⇒ SS(µ) = ∑ij (Yij − µ)² = ∑ij (Yij − Ȳ.. + Ȳ.. − µ)²
= ∑ij (Yij − Ȳ..)² + ∑ij (Ȳ.. − µ)² + 2 ∑ij (Yij − Ȳ..)(Ȳ.. − µ)
= ∑ij (Yij − Ȳ..)² + ∑ij (Ȳ.. − µ)² ≥ ∑ij (Yij − Ȳ..)²
The least squares estimate (LSE) µ̂ = Ȳ.. minimizes SS(µ) ⇒ SS(µ̂) = ∑ij (Yij − Ȳ..)².
17
-
Means and Variances of Least Squares Estimates
E(Ȳi.) = E( (1/ni) ∑j Yij ) = (1/ni) ∑j E(Yij) = (1/ni) ∑j µi = µi
var(Ȳi.) = var( (1/ni) ∑j Yij ) = (1/ni²) ∑j var(Yij) = (1/ni²) ∑j σ² = σ²/ni
E(Ȳ..) = E( (1/N) ∑i ∑j Yij ) = E( ∑i (ni/N) Ȳi. ) = ∑i (ni/N) µi = µ̄
var(Ȳ..) = var( ∑i (ni/N) Ȳi. ) = ∑i (ni/N)² var(Ȳi.) = ∑i (ni/N²) σ² = σ²/N
18
-
Sum of Squares (SS) Decomposition
Using ∑j (Yij − Ȳi.) = 0 ⇒ sum of squares decomposition
SST = ∑i ∑j (Yij − Ȳ..)² = ∑i ∑j (Yij − Ȳi. + Ȳi. − Ȳ..)²
= ∑i ∑j (Yij − Ȳi.)² + ∑i ∑j (Ȳi. − Ȳ..)² + 2 ∑i ∑j (Yij − Ȳi.)(Ȳi. − Ȳ..)
= ∑i ∑j (Yij − Ȳi.)² + ∑i ∑j (Ȳi. − Ȳ..)²
= SSE + SSTreat
ANOVA decomposition of total SS variation SST: SST = SSE + SSTreat = SSW + SSB.
SST = error variation + treatment variation
SST = variation within samples + variation between samples.
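The decomposition can be verified numerically on the Flux3 data. A Python sketch (the course uses R; variable names are mine, group sizes ni = 6):

```python
import numpy as np

# Flux3 SIR data, one row per flux (X, Y, Z), units log10(Ohm)
y = np.array([[9.9, 9.6, 9.6, 9.7, 9.5, 10.0],
              [10.7, 10.4, 9.5, 9.6, 9.8, 9.9],
              [10.9, 11.0, 9.5, 10.0, 11.7, 10.2]])

grand = y.mean()                       # Ybar..
group = y.mean(axis=1, keepdims=True)  # Ybar_i.

sst = ((y - grand) ** 2).sum()          # total variation
sse = ((y - group) ** 2).sum()          # within-sample (error) variation
sstreat = (6 * (group.ravel() - grand) ** 2).sum()  # between-sample variation

print(round(sstreat, 4), round(sse, 4), round(sst, 4))
# SSTreat = 2.1733 and SSE = 4.4717, agreeing with the anova() table for Flux3
```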
19
-
How to Compare the Model Fits?
How should we compare the two model fits
SSE = ∑i ∑j (Yij − Ȳi.)² and SST = ∑i ∑j (Yij − Ȳ..)² ?
Under H0 (reduced model) both fits should be somewhat comparable, except that the full model fit gave us more freedom in minimizing the sum of squares. The previous slide showed
SSTreat + SSE = SST = ∑i ∑j (Yij − Ȳ..)² ≥ ∑i ∑j (Yij − Ȳi.)² = SSE
with SST − SSE = SSTreat = ∑i ∑j (Ȳi. − Ȳ..)².
For a fair comparison, make allowances for this extra freedom.
Understand E(SSTreat) and E(SSE) when H0 is true or false.
20
-
Unbiasedness of s²: E(s²) = σ²
Assume X1, ..., Xn are i.i.d. with mean µ and variance σ².
E( (1/(n − 1)) ∑i (Xi − X̄)² ) = E(s²) = σ² ⇒ s² is unbiased.
Using E(Y²) = var(Y) + [E(Y)]²:
E((n − 1)s²) = E( ∑i (Xi − X̄)² ) = E( ∑i (Xi² − 2 Xi X̄ + X̄²) ) = E( ∑i Xi² − n X̄² )
= n(σ² + µ²) − n(var(X̄) + [E(X̄)]²) = n(σ² + µ²) − n(σ²/n + µ²) = (n − 1)σ²
⇒ E(s²) = σ²
21
-
E (MSE ) = σ2
si² = ∑j (Yij − Ȳi.)²/(ni − 1) ⇒ ∑i (ni − 1) si² = SSE
and the result from the previous slide shows
E( ∑i ∑j (Yij − Ȳi.)² ) = E( ∑i (ni − 1) si² ) = ∑i (ni − 1)σ² = (N − t)σ²
or: the Mean Square for Error
MSE = SSE/(N − t) = ∑i ∑j (Yij − Ȳi.)² / (N − t)
is an unbiased estimate of σ².
This is true whether H0: µ1 = ... = µt holds or not (and without normality).
22
-
E (MSTreat) = σ2+?
SSTreat = ∑i ni (Ȳi. − Ȳ..)² = ∑i ni (Ȳi.² − 2 Ȳi. Ȳ.. + Ȳ..²) = ∑i ni Ȳi.² − N Ȳ..²
⇒ E(SSTreat) = ∑i ni E(Ȳi.²) − N E(Ȳ..²)  (with/without normality)
= ∑i ni (var(Ȳi.) + [E(Ȳi.)]²) − N (var(Ȳ..) + [E(Ȳ..)]²)
= ∑i ni (σ²/ni + µi²) − N(σ²/N + µ̄²)
= (t − 1)σ² + ∑i ni (µi − µ̄)²
E(MSTreat) = E( SSTreat/(t − 1) ) = σ² + ∑i ni (µi − µ̄)²/(t − 1) = σ² + ∑i ni τi²/(t − 1).
23
-
A Test Statistic for H0
Under H0, both MSTreat and MSE are unbiased estimates of σ².
H0 false ⇒ ∑i ni (µi − µ̄)²/(t − 1) > 0 ⇒ E(MSTreat) > E(MSE).
MSTreat will then generally be somewhat larger than MSE,
more so when the µi are more dispersed.
The ni act as magnifiers!
This suggests F = MSTreat/MSE as a plausible test statistic.
Looking at the ratio makes more sense than looking at the difference:
any such difference should be viewed relative to MSE.
We use this test statistic also in our randomization test.
24
-
Equivalent Form for the F -Statistic under Randomization
In SST = SSTreat + SSE the sum SST stays constant over all partitions of the full data set into t groups of sizes n1, ..., nt.
SSTreat = ∑i ni Ȳi.² − N Ȳ..² = Fequiv − N Ȳ..²
Fequiv = ∑i ni Ȳi.² varies with the partition splits.
Ȳ.. also stays constant over all such partition splits.
F = [(N − t)/(t − 1)] SSTreat/SSE = [(N − t)/(t − 1)] SSTreat/(SST − SSTreat)
 = [(N − t)/(t − 1)] (Fequiv − N Ȳ..²)/(SST − (Fequiv − N Ȳ..²)),   increasing (↗) in Fequiv.
Thus the randomization distribution of F is in 1-1 correspondence with the randomization distribution of Fequiv.
We can then take Fequiv as an alternate and more easily calculated test statistic for computing p-values under H0.
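A minimal Python sketch of this randomization test for the Flux3 data, using Fequiv = ∑i ni Ȳi.² (function names are mine; the p-value estimate varies with the seed and number of splits):

```python
import numpy as np

rng = np.random.default_rng(7)
sir = np.array([9.9, 9.6, 9.6, 9.7, 9.5, 10.0,
                10.7, 10.4, 9.5, 9.6, 9.8, 9.9,
                10.9, 11.0, 9.5, 10.0, 11.7, 10.2])

def f_equiv(x):
    # F-equivalent statistic: sum of n_i * (group mean)^2 over 3 groups of 6
    return (6 * x.reshape(3, 6).mean(axis=1) ** 2).sum()

observed = f_equiv(sir)               # about 1832.3 for the actual allocation
nsim = 20000
sims = np.array([f_equiv(rng.permutation(sir)) for _ in range(nsim)])
p_value = (sims >= observed).mean()   # near 0.043
print(round(float(observed), 1), p_value)
```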
25
-
Randomization Distribution for Flux3
Simulated Randomization Distribution
[Figure: histogram (frequency scale 0–5000) of the simulated F-equivalent test statistic over the range 1830–1835, based on 1e+05 simulations; observed F-equivalent test statistic 1832.3; p-value = 0.04296.]
26
-
R Code for Randomization Distribution
Ftest.rand
-
F -Distribution as Approximation to the Randomization Distribution
As in the case of the 2-sample problem one finds that the F(t−1, N−t) distribution often provides a good approximation to the randomization distribution of F.
The randomization distribution of F is obtained from that of Fequiv via
F = [(N − t)/(t − 1)] (Fequiv − N Ȳ..²)/(SST − (Fequiv − N Ȳ..²))
The next slide shows the quality of this approximation for the Flux3 data set.
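The F-approximation p-value can also be computed directly; a scipy sketch (R's pf corresponds to f.cdf, and the upper tail to f.sf):

```python
from scipy.stats import f

# Flux3: t = 3 treatments, N = 18 observations, observed F = 3.6452
t_treat, N = 3, 18
F_obs = 3.6452
p_value = f.sf(F_obs, t_treat - 1, N - t_treat)  # upper tail of F(2, 15)
print(round(p_value, 5))  # about 0.05126, vs 0.04296 from the randomization distribution
```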
28
-
Randomization Distribution for Flux3
Simulated Randomization Distribution
[Figure: density histogram (0–1.0) of the simulated F-statistic over the range 0–10, based on 1e+05 simulations, with the F-density superimposed; observed F-statistic 3.6452; p-value = 0.04296 from simulation and p-value = 0.05126 from the F-distribution.]
29
-
Assuming Normality
We now assume that the Yij are independent, normal r.v.'s with the previously indicated model parameters.
Whether H0: µ1 = ... = µt is true or not ⇒ (ni − 1) si² ∼ σ² χ²(ni−1).
Further, s1², ..., st² are independent and thus
SSE = ∑i (ni − 1) si² ∼ σ² χ²(n1−1) + ... + σ² χ²(nt−1) ∼ σ² χ²(N−t)
SSE is independent of Ȳ1., ..., Ȳt., since si² and Ȳi. are independent for all i and all pairs (si², Ȳi.) are independent. ⇒ SSE and SSTreat are independent.
Is SSTreat = ∑i ni Ȳi.² − N Ȳ..² ∼ σ² χ²_f? What degrees of freedom f?
Under H0 we expect f = t − 1 since E(MSTreat) = E(SSTreat/(t − 1)) = σ².
30
-
The Distribution of F
The previous slide and Appendix A, slide 148, establish the following: SSE and SSTreat are independent and
SSE/σ² ∼ χ²(N−t) and SSTreat/σ² ∼ χ²(t−1, λ) with λ = ∑i ni (µi − µ̄)²/σ²
⇒ F = [SSTreat/(t − 1)] / [SSE/(N − t)] ∼ F(t−1, N−t, λ)
Under H0: µ1 = ... = µt this becomes the (central) F(t−1, N−t) distribution.
We reject H0 whenever F ≥ F(t−1, N−t)(1 − α) = Fcrit = qf(1 − α, t − 1, N − t) = (1 − α)-quantile of the F(t−1, N−t) distribution.
Power function: β(λ) = Pλ(F ≥ F(t−1, N−t)(1 − α)) = 1 − pf(Fcrit, t − 1, N − t, λ)
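The rejection rule and power function translate directly into scipy (a sketch; R's qf(1 − α, ...) becomes f.ppf and the noncentral pf(..., λ) becomes ncf.cdf):

```python
from scipy.stats import f, ncf

def anova_power(lam, t, N, alpha=0.05):
    # power of the level-alpha ANOVA F-test at noncentrality lambda
    f_crit = f.ppf(1 - alpha, t - 1, N - t)        # qf(1-alpha, t-1, N-t)
    return 1 - ncf.cdf(f_crit, t - 1, N - t, lam)  # 1 - pf(Fcrit, t-1, N-t, lambda)

# At lambda = 0 (H0 true) the power reduces to the significance level alpha,
# and it increases monotonically in lambda.
for lam in (0.0, 5.0, 10.0):
    print(lam, round(anova_power(lam, t=3, N=18), 3))
```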
31
-
R's anova and lm Applied to Flux3
> SIR <- c(9.9, 9.6, 9.6, 9.7, 9.5, 10.0, 10.7, 10.4, 9.5, 9.6,
+          9.8, 9.9, 10.9, 11.0, 9.5, 10.0, 11.7, 10.2)
> SIR
 [1]  9.9  9.6  9.6  9.7  9.5 10.0 10.7 10.4  9.5  9.6
[11]  9.8  9.9 10.9 11.0  9.5 10.0 11.7 10.2
> FLUX <- rep(c("X", "Y", "Z"), each = 6)
> FLUX
 [1] "X" "X" "X" "X" "X" "X" "Y" "Y" "Y" "Y" "Y" "Y"
[13] "Z" "Z" "Z" "Z" "Z" "Z"
> anova(lm(SIR ~ as.factor(FLUX)))  # see ?anova & ?lm
Analysis of Variance Table

Response: SIR
                Df Sum Sq Mean Sq F value  Pr(>F)
as.factor(FLUX)  2 2.1733  1.0867  3.6452 0.05126 .
Residuals       15 4.4717  0.2981
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
32
-
Discussion of Noncentrality Parameter λ
The power of the ANOVA F-test is a monotone function of λ = ∑i ni (µi − µ̄)²/σ² (see Appendix B, slide 151).
Let us consider the drivers in λ.
λ ↗ as σ ↘, provided the µi are not all the same. The more difference between the µi, the higher λ.
Increasing the sample sizes will magnify ni (µi − µ̄)². For fixed σ and µi not all equal, the power increases strictly:
∂(λσ²)/∂ni = (µi − µ̄)² − ∑j 2 nj (µj − µ̄)(µi − µ̄)/N = (µi − µ̄)² > 0
since ∂µ̄/∂ni = ∂( ∑j nj µj / ∑j nj )/∂ni = (µi − µ̄)/N.
The sample sizes we can plan for.
Later: blocking units into homogeneous groups ⇒ smaller σ.
33
-
Optimal Allocation of Sample Sizes?
We have N experimental units available for testing the effects of t treatments, and suppose that N is a multiple of t, say N = n × t (n and t integer). It would seem best to use samples of equal size n for each of the t treatments, i.e., we would opt for a balanced design.
Then we would not emphasize one treatment over any other.
Is there an optimality criterion that could be used as justification?
Plan for a balanced design upfront. How large should n be?
Then something goes wrong with a few observations and they have to be discarded from the analysis.
Thus we need to be prepared for unbalanced designs.
34
-
A Sample Size Allocation Rationale
We may be concerned with alternatives where all means but one are the same.
We want to achieve a given power β against such a mean, which deviates by ∆ from the other means (which coincide).
Since we won't know upfront which mean sticks out, we would want to maximize the minimum power against all such contingencies. A max-min strategy!
If µ1 = µ + ∆ and µ2 = ... = µt = µ, then µ̄ = µ + n1∆/N.
With a bit of algebra we get
λ1 = ∑i ni (µi − µ̄)²/σ² = (N∆²/σ²) (n1/N)(1 − n1/N) = (N∆²/σ²) R1
and similarly λi = (N∆²/σ²) (ni/N)(1 − ni/N) = (N∆²/σ²) Ri for the other cases.
35
-
The Max-Min Solution
For fixed σ > 0 and ∆ ≠ 0 the max-min power
max over n1, ..., nt of min over 1 ≤ i ≤ t of [λi] = max over n1, ..., nt of min over 1 ≤ i ≤ t of [(N∆²/σ²) Ri]
is achieved when n1 = ... = nt. Here Ri = (ni/N)(1 − ni/N).
Reason: Ri = (ni/N)(1 − ni/N) increases for ni/N ≤ 1/2. Since n1 + ... + nt = N = nt is fixed, we can increase the smallest Ri only at the expense of lowering some higher Rj.
This increase is only possible as long as something is left to lower.
⇒ max min [λi] = (N∆²/σ²)(n/N)(1 − n/N) for n = n1 = ... = nt
 = n (∆²/σ²)(1 − n/(nt)) = n (∆²/σ²)(t − 1)/t = n · λ0.
Interpret λ0 = (∆²/σ²)(t − 1)/t more generally as ∑(µi − µ̄)²/σ².
36
-
An Alternate Rationale (Dean and Voss, p. 52)
Let C∆ = {µ = (µ1, . . . , µt) : max(µ)−min(µ) ≥ ∆} for ∆ > 0.For �xed σ > 0,∆ > 0 �nd sample sizes n1, . . . , nt (with∑
ni = nt = N �xed), to maximize the power, i.e.,
minµ∈C∆
λ(µ) = N minµ∈C∆
∑ti=1 pi (µi − µ̄)2
σ2with pi =
niN
It can again be shown that equal sample size allocation, i.e.,n1 = . . . = nt = n, is the optimal (max-min) strategy.Suppose µ1 ≤ µ2, . . . , µt−1 ≤ µt = µ1 + ∆, then λ(µ) isminimized over the restricted µ2, . . . , µt−1 when these are = µ̄
µ̄ = (p2 + . . .+ pt−1)µ̄+ p1µ1 + pt(µ1 + ∆)
=⇒ µ̄ = p1p1 + pt
µ1 +pt
p1 + pt(µ1 + ∆) = p̃1µ1 + p̃t(µ1 + ∆)
=⇒ µt − µ̄ = ∆p̃1 and µ1 − µ̄ = −p̃t∆
minµ∈C∆
λ(µ) =N(p1 + pt)∆
2
σ2(p̃1p̃
2
t + p̃2
1p̃t) =
N∆2
σ2p1pt
p1 + pt
For �xed p1 + pt maximized by p1 = pt , i.e., n1 = nt q.e.d.37
-
Some Discussion of maxmin Results
The previous two max-min situations are different.
In the first we stipulated one mean different from the others, the latter assumed without treatment effect, i.e., the same.
In the second we assumed two means to differ by ∆ > 0, while the others were somewhere in between;
i.e., we focus on the maximum treatment difference.
In either case nothing was known about the indices of the differing treatment effects.
My hunch is that more general results like this can be formulated and solved with the same resolution.
Research problem for formulation and resolution!?
38
-
sample.sizeANOVA (see web page)
Just as in the case of planning appropriate sample sizes for the two-sample situation, the F-test encounters the same difficulties in terms of the varying impacts of the common sample size n per treatment.
n affects the critical point of the level α F-test through tcrit = qf(1 − alpha, t − 1, N − t) = qf(1 − alpha, t − 1, n*t − t).
n also enters the power function 1 − pf(tcrit, t − 1, n*t − t, lambda) through the degrees of freedom, and n enters the power function through λ. Here λ = n(∆/σ)²(t − 1)/t or λ = n(∆/σ)²/2.
In either case we should have a reasonable upper bound σu for σ, or express ∆ not in absolute terms but in relation to the unknown σ by specifying ∆/σ.
To facilitate the choice of an appropriate n per treatment, the function sample.sizeANOVA is given on the class web page.
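sample.sizeANOVA itself is an R function; the minimal-n search it supports can be sketched in Python (assumptions here: ∆/σ = 0.5, t = 3, α = .05, target power 0.9, with λ = n λ0 or λ = n λ1 as above; function names are mine):

```python
from scipy.stats import f, ncf

def power(n, lam_per_n, t=3, alpha=0.05):
    # power of the level-alpha F-test with n observations per treatment
    dfd = n * t - t
    f_crit = f.ppf(1 - alpha, t - 1, dfd)
    return 1 - ncf.cdf(f_crit, t - 1, dfd, n * lam_per_n)

def min_n(lam_per_n, target=0.9, t=3):
    # smallest per-treatment sample size n achieving the target power
    n = 2
    while power(n, lam_per_n, t) < target:
        n += 1
    return n

d = 0.5                     # delta / sigma
t = 3
lam0 = d**2 * (t - 1) / t   # one mean off by delta, the rest equal
lam1 = d**2 / 2             # max(mu) - min(mu) = delta, others in between
print(min_n(lam0), min_n(lam1))  # the slides report n = 77 and n = 103
```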
39
-
Usage of sample.sizeANOVA
function (delta.per.sigma = .5, t.treat = 3, nrange = 2:30,
          alpha = .05, power0 = NULL)
{
# delta.per.sigma is the ratio of delta over sigma for which
# one wants to detect a delta shift in one mean while all other
# means stay the same, or delta is the maximum difference
# between any two means to be detected. t.treat is the number of
# treatments. alpha is the desired significance level. nrange is a
# range of sample sizes over which the power will be calculated
# for that delta.per.sigma. power0 is an optional value for the
# target power that will be highlighted on the plot.
...
}
40
-
Example Usage of sample.sizeANOVA
The following three function calls invoke the default t.treat=3 to produce the plots on the next three slides.
> sample.sizeANOVA()
> sample.sizeANOVA(nrange = 30:100)
> sample.sizeANOVA(nrange = 70:100, power0 = .9)
n = 77 is the minimal sample size under the first rationale.
n = 103 is the minimal sample size under the alternate rationale.
41
-
Sample Size Determination
[Figure: power curves β(λ) = β(n × λ0) and β(λ) = β(n × λ1) versus n = 5, ..., 30 per each of t = 3 treatments, for ∆/σ = 0.5, α = 0.05, λ0 = (∆/σ)²(t − 1)/t, λ1 = (1/2)(∆/σ)²; the two alternatives are: all µi the same except one differing by ±∆, and max(µ1, ..., µt) − min(µ1, ..., µt) = ∆.]
42
-
Sample Size Determination (increased n)
[Figure: the same power curves for n = 30, ..., 100 per each of t = 3 treatments.]
43
-
Sample Size Determination (magni�ed)
[Figure: the same power curves for n = 70, ..., 110 per each of t = 3 treatments, with the target power β = 0.9 highlighted.]
44
-
The Effect of t
Even though the number of treatments t does not affect λ1, it affects the power function through the degrees of freedom in tcrit = qf(1 − alpha, t − 1, n*t − t) and in pf(tcrit, t − 1, n*t − t, lambda), as the next slide illustrates for t = 6.
-
Sample Size Determination (magni�ed)
[Figure: the same power curves for n = 70, ..., 140 per each of t = 6 treatments, with the target power β = 0.9 highlighted.]
46
-
Degrees of Freedom and Geometry – Single Sample
(X1, X2, ..., Xn)′ = (X̄, X̄, ..., X̄)′ + (X1 − X̄, X2 − X̄, ..., Xn − X̄)′, an orthogonal (⊥) sum,
⊥ because (X̄, ..., X̄) · (X1 − X̄, ..., Xn − X̄)′ = X̄ ∑i (Xi − X̄) = 0.
(X̄, ..., X̄) varies in one dimension, along 1′ = (1, ..., 1).
The residual vector (X1 − X̄, ..., Xn − X̄) varies in its (n − 1)-dimensional orthogonal complement.
The n residuals thus have n − 1 degrees of freedom.
47
-
Orthogonal Decomposition of Sample Vector
[Figure: orthogonal decomposition of the sample vector x = (x1, x2) into x̄ = (x̄, x̄) and x − x̄ = (x1 − x̄, x2 − x̄).]
Pythagoras: |x|² = |x̄|² + |x − x̄|² = ∑i x̄² + ∑i (xi − x̄)² = n x̄² + ∑i (xi − x̄)², our previous SS decomposition.
48
-
Degrees of Freedom and Geometry in t Samples
Decomposition of the total dimension N = ∑ ni into subspace dimensions:
N = 1 + ∑ (ni − 1) + (t − 1)
(Y11, ..., Y1n1, ..., Yt1, ..., Ytnt)′
= (Ȳ.., ..., Ȳ.., ..., Ȳ.., ..., Ȳ..)′
⊥+ (Y11 − Ȳ1., ..., Y1n1 − Ȳ1., ..., Yt1 − Ȳt., ..., Ytnt − Ȳt.)′
⊥+ (Ȳ1. − Ȳ.., ..., Ȳ1. − Ȳ.., ..., Ȳt. − Ȳ.., ..., Ȳt. − Ȳ..)′
∑i ∑j Yij² = ∑i ∑j Ȳ..² + ∑i ∑j (Yij − Ȳ..)²
= ∑i ∑j Ȳ..² + ∑i ∑j (Yij − Ȳi.)² + ∑i ∑j (Ȳi. − Ȳ..)²
49
-
Orthogonalities
∑i ∑j Ȳ..(Ȳi. − Ȳ..) = Ȳ.. ∑i ni (Ȳi. − Ȳ..) = Ȳ..(∑i ∑j Yij − N Ȳ..) = 0
∑i ∑j Ȳ..(Yij − Ȳi.) = Ȳ.. ∑i (ni Ȳi. − ni Ȳi.) = 0
∑i ∑j (Ȳi. − Ȳ..)(Yij − Ȳi.) = ∑i (Ȳi. − Ȳ..) ∑j (Yij − Ȳi.) = 0
50
-
Dimensions of Subspaces or Degrees of Freedom
Let 1n′ = (1, 1, ..., 1) denote an n-vector of 1's, and let Ei be the N-vector that carries 1ni in the positions of sample i and 0 elsewhere. The vectors
D = (Ȳ1. − Ȳ.., ..., Ȳ1. − Ȳ.., ..., Ȳt. − Ȳ.., ..., Ȳt. − Ȳ..)′ = (Ȳ1. − Ȳ..)E1 + ... + (Ȳt. − Ȳ..)Et
span a (t − 1)-dimensional subspace M ⊂ R^N, because E1, ..., Et span a t-dimensional subspace of R^N and D is always orthogonal to 1N, since 1N′ D = ∑i ni (Ȳi. − Ȳ..) = 0.
Note that ∑i ai Ei ⊥ 1N = (E1 + ... + Et) ⇐⇒ ∑i ni ai = 0.
51
-
More on Dimensions and Degrees of Freedom
Using the standard orthonormal basis vectors eij (with 1 in vector position (i, j) and 0 in all other positions) we have that
R = (Y11 − Ȳ1., ..., Y1n1 − Ȳ1., ..., Yt1 − Ȳt., ..., Ytnt − Ȳt.)′
= (Y11 − Ȳ1.)e11 + ... + (Y1n1 − Ȳ1.)e1n1 + ... + (Yt1 − Ȳt.)et1 + ... + (Ytnt − Ȳt.)etnt ⊥ Ei ∀i
because ∑j (Yij − Ȳi.) = 0 for all i. Thus R lives in the (N − t)-dimensional orthogonal complement M(N−t) of E1, ..., Et.
Any vector v ∈ M(N−t) is of the form v = a11 e11 + ... + atnt etnt with ∑j aij = 0 for i = 1, ..., t. Thus the R vectors span M(N−t).
52
-
Orthogonal Decomposition of Sample Space
[Figure: orthogonal decomposition of the sample space into the 1-dimensional span of (Ȳ.., ..., Ȳ..), the (t − 1)-dimensional span of (Ȳ1. − Ȳ.., ..., Ȳt. − Ȳ..), and the (N − t)-dimensional span of (Y11 − Ȳ1., ..., Ytnt − Ȳt.).]
|(Y11, ..., Ytnt)|² = |(Ȳ.., ..., Ȳ..)|² + |(Y11 − Ȳ1., ..., Ytnt − Ȳt.)|² + |(Ȳ1. − Ȳ.., ..., Ȳ1. − Ȳ.., ..., Ȳt. − Ȳ.., ..., Ȳt. − Ȳ..)|²
∑i ∑j Yij² = ∑i ∑j Ȳ..² + ∑i ∑j (Yij − Ȳi.)² + ∑i ∑j (Ȳi. − Ȳ..)²
53
-
Coagulation Example
Import the data into the data frame coag (variables ctime and diet):
> coag$ctime
 [1] 59 60 62 63 63 64 65 66 67 71 66 67 68 68 68
[16] 71 56 59 60 61 62 63 63 64
> coag$diet
 [1] A A A A B B B B B B C C C C C C D D D D D D D
[24] D
Levels: A B C D
54
-
Plot for Coagulation Example
[Figure: jittered dot plot of coagulation time (sec, scale 50–80) by diet, with horizontal mean lines; nA = 4, nB = 6, nC = 6, nD = 8.]
55
-
ANOVA for Coagulation Example
The plot used jitter(coag$ctime) to plot ctime in the vertical direction and to plot its horizontal mean lines.
This perturbs observations by a small random amount to make tied observations more visible.
The means for diets A and D would have coincided otherwise.

> anova(lm(ctime ~ diet, data = coag))
# assumes data frame coag with variables ctime and diet
# is in the work space
Analysis of Variance Table

Response: ctime
          Df Sum Sq Mean Sq F value    Pr(>F)
diet       3    228    76.0  13.571 4.658e-05 ***
Residuals 20    112     5.6
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
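The same table can be reproduced from scratch; a Python sketch with the coagulation data transcribed from the slides:

```python
import numpy as np
from scipy.stats import f

# Coagulation times by diet, transcribed from the coag data frame
groups = [np.array([59, 60, 62, 63]),                  # diet A, n = 4
          np.array([63, 64, 65, 66, 67, 71]),          # diet B, n = 6
          np.array([66, 67, 68, 68, 68, 71]),          # diet C, n = 6
          np.array([56, 59, 60, 61, 62, 63, 63, 64])]  # diet D, n = 8

N = sum(len(g) for g in groups)
t = len(groups)
grand = np.concatenate(groups).mean()

ss_treat = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)  # 228
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)       # 112
F = (ss_treat / (t - 1)) / (ss_error / (N - t))                   # 13.571
p = f.sf(F, t - 1, N - t)   # upper tail of F(3, 20), about 4.658e-05
print(round(float(F), 3), p)
```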
56
-
lm for Coagulation Example
> out <- lm(ctime ~ diet, data = coag)
> names(out)
 [1] "coefficients"  "residuals"     "effects"
 [4] "rank"          "fitted.values" "assign"
 [7] "qr"            "df.residual"   "contrasts"
[10] "xlevels"       "call"          "terms"
[13] "model"
> out$coefficients  # or out$coef
  (Intercept)         dietB         dietC         dietD
 6.100000e+01  5.000000e+00  7.000000e+00 -2.515253e-15

Note that these are the estimates µ̂A = 61 (Intercept), µ̂B − µ̂A = 5, µ̂C − µ̂A = 7, µ̂D − µ̂A = 0.
57
-
Residuals from lm for Coagulation Example
> out$residuals
            1             2             3             4
-2.000000e+00 -1.000000e+00  1.000000e+00  2.000000e+00
            5             6             7             8
-3.000000e+00 -2.000000e+00 -1.000000e+00  1.111849e-16
            9            10            11            12
 1.000000e+00  5.000000e+00 -2.000000e+00 -1.000000e+00
           13            14            15            16
-5.534852e-17 -5.534852e-17 -5.534852e-17  3.000000e+00
           17            18            19            20
-5.000000e+00 -2.000000e+00 -1.000000e+00 -1.663708e-16
           21            22            23            24
 1.000000e+00  2.000000e+00  2.000000e+00  3.000000e+00

Numbers such as -5.534852e-17 should be treated as 0 (floating point computing quirks).
58
-
Rounded Residuals from lm for Coagulation Example
> round(out$resid, 4)
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
-2 -1  1  2 -3 -2 -1  0  1  5 -2 -1  0  0  0  3 -5 -2 -1  0
21 22 23 24
 1  2  2  3
59
-
Fitted Values from lm for Coagulation Example
> out$fitted.values
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
61 61 61 61 66 66 66 66 66 66 68 68 68 68 68 68 61 61 61
20 21 22 23 24
61 61 61 61 61
60
-
Randomization Test for Coagulation Example
Simulated Randomization Distribution
[Figure: histogram of the simulated F-equivalent test statistic over the range 98300–98600, based on 1e+05 simulations; observed F-equivalent test statistic 98532; p-value = 7e−05.]
-
F -Approximation to Coagulation Randomization Test
Simulated Randomization Distribution
[Figure: density histogram of the simulated F-statistic over the range 0–25, based on 1e+05 simulations; observed F-statistic 13.571; p-value = 7e−05 from simulation and p-value = 4.658e−05 from the F-distribution.]
62
-
Comparing Treatment Means Ȳi.
When H0: µ1 = ... = µt is not rejected at level α, there is little purpose in looking closer at differences between the Ȳi. for the various treatments.
Any such perceived differences could easily have come about by simple random variation, even when the hypothesis is true.
Why then read something into randomness?
It would be like reading tea leaves!
However, when the hypothesis is rejected it is quite natural to ask in which way the hypothesis was contradicted.
63
-
Confidence Intervals for µi
A first step in understanding differences in the µi is to look at their estimates µ̂i = Ȳi. and their confidence intervals.
In any such confidence interval we can now use the pooled variance s² from all t samples and not just the variance si² from the i-th sample, i.e., we get
µ̂i ± t(N−t, 1−α/2) × s/√ni
as our 100(1 − α)% confidence interval for µi.
This follows as before from the independence of µ̂i and s, the fact that (µ̂i − µi)/(σ/√ni) ∼ N(0, 1), and s²/σ² ∼ χ²(N−t)/(N − t), combining these to
(µ̂i − µi)/(s/√ni) = [(µ̂i − µi)/(σ/√ni)] / (s/σ) ∼ t(N−t)
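A Python sketch of these pooled-s intervals for the Flux3 data (variable names are mine; t(N−t, .975) via scipy):

```python
import numpy as np
from scipy.stats import t as t_dist

y = np.array([[9.9, 9.6, 9.6, 9.7, 9.5, 10.0],       # flux X
              [10.7, 10.4, 9.5, 9.6, 9.8, 9.9],      # flux Y
              [10.9, 11.0, 9.5, 10.0, 11.7, 10.2]])  # flux Z
t_treat, n_i = y.shape
N = y.size

means = y.mean(axis=1)
s2_pooled = ((y - means[:, None]) ** 2).sum() / (N - t_treat)  # = MSE
half = t_dist.ppf(0.975, N - t_treat) * np.sqrt(s2_pooled / n_i)

for name, m in zip("XYZ", means):
    print(f"{name}: [{m - half:.3f}, {m + half:.3f}]")
# e.g. flux X: [9.242, 10.192], agreeing with the Flux3 interval table
```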
64
-
Validity of Pooling?
Using s² instead of si² improves (narrows) the confidence intervals for µi.
This narrowing comes about because t(N−t, 1−α/2) then uses much higher degrees of freedom (N − t ≫ ni − 1) and thus shrinks, up to a point (see later plot).
The validity of this improvement depends strongly on the assumption that the population variances σ² behind all t samples are the same, or at least approximately so.
Recall the earlier discussion of this issue for the 2-sample t-test.
65
-
Standard Errors SE (θ̂)
Suppose θ̂ is an estimator for a parameter θ of interest. We denote by σ²θ̂ = var(θ̂) = g(θ, ψ) its sampling variance and by σθ̂ = √g(θ, ψ) its sampling standard deviation.
The estimated sampling standard deviation of θ̂, i.e., σ̂θ̂ = √g(θ̂, ψ̂) = SE(θ̂), is the standard error of θ̂.
Example 1: µ̂ = X̄ as estimate of µ has variance var(µ̂) = σ²/n ⇒ SE(µ̂) = s/√n.
Example 2: s² ∼ σ²χ²(n−1)/(n − 1) as estimate of σ² has sampling variance
var(s²) = σ⁴ · 2(n − 1)/(n − 1)² = 2σ⁴/(n − 1) ⇒ SE(s²) = s² √(2/(n − 1))
Note the different roles of (θ, ψ) in these two examples.
Example 1: θ = µ and ψ = σ², and we only use ψ̂ in SE(θ̂).
Example 2: θ = σ² and there is no ψ. We only use θ̂ in SE(θ̂).
66
-
95% Rule of Thumb Using SEs
If θ̂ ≈ N(θ, σ²θ̂) ≈ N(θ, SE²(θ̂)), as is often the case, then θ̂ ± 2 × SE(θ̂) is an approximate 95% confidence interval for θ.
This results from z.975 = qnorm(.975) = 1.959964 ≈ 2.
This works especially well for Student-t based intervals
µ̂i ± t(f, .975) × s/√ni = Ȳi. ± t(N−t, .975) × s/√ni
because t(f, .975) ≈ z.975 for large f; see next slide.
67
-
t(f, .975) → z.975 = 1.96 ≈ 2
[Figure: t(f, .975) quantiles versus degrees of freedom f = 20, ..., 150 (vertical scale 1.6–2.4), decreasing toward z.975 = 1.96.]
-
Why Rule of Thumb Works for s2
Why should the rule of thumb work for s² as estimator of σ²?

Recall: s² ∼ σ²χ²_{n−1}/(n − 1). The CLT ⇒ approximate normality for s² since

(n − 1)s²/σ² = χ²_{n−1} = Σ_{i=1}^{n−1} Z²_i ≈ N(n − 1, 2(n − 1))

⇒ s² ≈ N(σ², 2σ⁴/(n − 1))

⇒ s² ± 2 × SE(s²) = s² ± 2 × s²√(2/(n − 1))

since SE(s²) = s²√(2/(n − 1)) is the estimate of σ²√(2/(n − 1)), the sampling standard deviation of s².
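A quick simulation check of this rule of thumb (a Python sketch; the sample size n = 100, σ² = 4, and repetition count are arbitrary choices):

```python
import numpy as np

# coverage of s^2 +/- 2*SE(s^2) over repeated normal samples
rng = np.random.default_rng(0)
n, sigma2, nsim = 100, 4.0, 20000
y = rng.normal(0.0, np.sqrt(sigma2), size=(nsim, n))
s2 = y.var(axis=1, ddof=1)
se = s2 * np.sqrt(2.0 / (n - 1))          # SE(s^2) = s^2 * sqrt(2/(n-1))
cover = np.mean((s2 - 2 * se <= sigma2) & (sigma2 <= s2 + 2 * se))
print(cover)   # close to .95, a bit below since s^2 is right-skewed
```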
69
-
Table of Con�dence Intervals for Flux3 Data
Although for testing H0 : µ1 = µ2 = µ3 in the case of the Flux3 data the p-value of .05126 was not significant at level α = .05, we illustrate the concepts of the different types of confidence intervals for the means.

Flux   µ̂i      si     s      95% intervals using si    95% intervals using s
X      9.717   0.194  0.546  [9.513,  9.920]           [ 9.242, 10.192]
Y      9.983   0.471  0.546  [9.489, 10.477]           [ 9.508, 10.458]
Z     10.550   0.797  0.546  [9.714, 11.386]           [10.075, 11.025]
70
-
Plots of Confidence Intervals for Flux3 Data

[Figure: 95% confidence intervals for the mean SIR of Flux X, Y, Z, using the pooled s² = Σ_{i=1}^t s²_i(n_i − 1)/(N − t) and using the individual s²_i.]
71
-
Tables of Con�dence Intervals for the Coagulation Data
For testing H0 : µ1 = µ2 = µ3 = µ4 in the case of the coagulation data the p-value of 4.7 · 10⁻⁵ is highly significant. We again illustrate the concepts of the different types of confidence intervals for the means.

Diet   µ̂i   si   s    95% intervals using si   95% intervals using s
A      61   1.9  2.2  [57.9, 64.1]             [58.7, 63.3]
B      66   2.1  2.2  [63.8, 68.2]             [64.1, 67.9]
C      68   1.5  2.2  [66.4, 69.6]             [66.1, 69.9]
D      61   2.6  2.2  [58.8, 63.2]             [59.4, 62.6]
72
-
Plots of Confidence Intervals for Coagulation Data

[Figure: 95% confidence intervals for mean coagulation time (sec) under diets A, B, C, D, using the pooled s² = Σ_{i=1}^t s²_i(n_i − 1)/(N − t) and using the individual s²_i.]
73
-
Simultaneous Con�dence Intervals
When constructing intervals of the type

µ̂_i ± t_{N−t,1−α/2} s/√n_i   or   µ̂_i ± t_{n_i−1,1−α/2} s_i/√n_i   for i = 1, . . . , t

we should be aware that these intervals don't simultaneously cover their respective targets µ_i with probability 1 − α. They do so individually. For example, with the independent individual estimates s_i,

P(µ_i ∈ µ̂_i ± t_{n_i−1,1−α/2} s_i/√n_i, i = 1, . . . , t) = ∏_{i=1}^t P(µ_i ∈ µ̂_i ± t_{n_i−1,1−α/2} s_i/√n_i) = (1 − α)^t < 1 − α.

To get simultaneous 1 − α coverage probability we should choose 1 − α* for the individual interval coverage probability with

(1 − α*)^t = 1 − α   or   α* = 1 − (1 − α)^{1/t} ≈ α/t = α̃_t.

A problem in using a pooled estimate s: No independence!
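The exact adjustment and its α/t approximation can be tabulated directly (a small Python sketch):

```python
alpha = 0.05
for t in (3, 5, 6, 10):
    a_star = 1 - (1 - alpha) ** (1 / t)    # exact individual level
    print(f"t={t:2d}  alpha* = {a_star:.5f}  alpha/t = {alpha / t:.5f}")
```

Note that α/t is always slightly smaller than α*, so using α/t errs (very mildly) on the conservative side.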
74
-
α* = 1 − (1 − α)^{1/t} ≈ α/t

[Figure: α*_t = 1 − (1 − α)^{1/t} and α̃_t = α/t plotted against the desired overall α (0 to 0.10) for t = 3, 5, 6, 10; the two curves are nearly indistinguishable.]
75
-
Dealing with Dependence from Using Pooled s
Using a pooled estimate s for the standard deviation σ, the previous confidence intervals are no longer independent.

However, it can be shown that

P(µ_i ∈ µ̂_i ± t_{N−t,1−α*/2} s/√n_i, i = 1, . . . , t) ≥ ∏_{i=1}^t P(µ_i ∈ µ̂_i ± t_{N−t,1−α*/2} s/√n_i) = (1 − α*)^t = 1 − α

This comes from the positive dependence between the confidence intervals through s.

If one interval is more (less) likely to cover its µ_i due to s, so are the other intervals more (less) likely to cover their µ_j.

Using the same compensation as in the independence case would let us err on the conservative side, i.e., give us higher confidence than the targeted 1 − α.
76
-
Boole's and Bonferroni's Inequality
For any m events E_1, . . . , E_m Boole's inequality states

P(E_1 ∪ . . . ∪ E_m) ≤ P(E_1) + . . . + P(E_m)

For any m events E_1, . . . , E_m Bonferroni's inequality states

P(E_1 ∩ . . . ∩ E_m) ≥ 1 − Σ_{i=1}^m (1 − P(E_i)) = Σ_{i=1}^m P(E_i) − (m − 1)

The statements are equivalent by taking complements.

If E_i = {µ_i ∈ µ̂_i ± t_{N−t,1−α̃/2} s/√n_i} with P(E_i) = 1 − α̃, then the simultaneous coverage probability is bounded from below:

P(∩_{i=1}^t E_i) ≥ 1 − Σ_{i=1}^t (1 − P(E_i)) = 1 − tα̃ = 1 − α   if α̃ = α̃_t = α/t.

We can achieve at least 1 − α coverage probability by choosing the individual coverage appropriately, namely 1 − α̃ = 1 − α/t. Almost the same adjustment as before.
77
-
Decomposing the Mean Vector µ
The µ_i variation is best understood via the familiar decomposition:

µ = (µ_1, . . . , µ_1, . . . , µ_t, . . . , µ_t)′ = µ̄ · 1_N + (µ_1 − µ̄, . . . , µ_1 − µ̄, . . . , µ_t − µ̄, . . . , µ_t − µ̄)′

The two vectors on the right are orthogonal to each other.

The first vector represents the projection of µ onto 1_N.

The second represents the projection of µ onto V_{t−1}, a (t − 1)-dimensional subspace of the (N − 1)-dimensional orthogonal complement V_{N−1} of 1_N. See slide 51.

The second vector captures all aspects of variation in µ.

78
-
Motivating Contrasts
Any linear function of (µ_1 − µ̄, . . . , µ_t − µ̄) has to be of the form C = Σ_{i=1}^t c_iµ_i with Σ_{i=1}^t c_i = 0:

Σ_{i=1}^t a_i(µ_i − µ̄) = Σ_{i=1}^t a_iµ_i − (Σ_{i=1}^t a_i) Σ_{j=1}^t (n_j/N)µ_j
                        = Σ_{i=1}^t a_iµ_i − Σ_{i=1}^t (n_i/N)µ_i Σ_{j=1}^t a_j = Σ_{i=1}^t c_iµ_i

with c_i = a_i − (n_i/N) Σ_{j=1}^t a_j, where

Σ_{i=1}^t c_i = Σ_{i=1}^t a_i − Σ_{i=1}^t (n_i/N) Σ_{j=1}^t a_j = Σ_{i=1}^t a_i − Σ_{j=1}^t a_j = 0.

Such a function C = Σ_{i=1}^t c_iµ_i of the µ_i, with Σ_{i=1}^t c_i = 0, is called a contrast.
79
-
Examples of Contrasts
Say we have 4 treatments with respective means µ_1, . . . , µ_4.

We may be interested in contrasts of the following form: C_12 = µ_1 − µ_2 with c′ = (c_1, . . . , c_4) = (1, −1, 0, 0). Similarly for the other differences C_ij = µ_i − µ_j. There are (4 choose 2) = 6 such contrasts.

Sometimes one of the treatments, say the first, is singled out as the control. We may then be interested in just the 3 contrasts C_12, C_13 and C_14, or we may be interested in C_1.234 = µ_1 − (µ_2 + µ_3 + µ_4)/3 with c′ = (1, −1/3, −1/3, −1/3).

Sometimes the first 2 treatments share something in common and so do the last 2. One might then try C_12.34 = (µ_1 + µ_2)/2 − (µ_3 + µ_4)/2 with c′ = (1/2, 1/2, −1/2, −1/2), the difference of average treatment effect between the 2 camps.
80
-
Estimates and Con�dence Intervals for Contrasts
A natural estimate of C = Σ_{i=1}^t c_iµ_i is Ĉ = Σ_{i=1}^t c_iµ̂_i = Σ_{i=1}^t c_iȲ_i·.

We have E(Ĉ) = E(Σ_{i=1}^t c_iȲ_i·) = Σ_{i=1}^t c_iE(Ȳ_i·) = Σ_{i=1}^t c_iµ_i = C

and var(Ĉ) = var(Σ_{i=1}^t c_iȲ_i·) = Σ_{i=1}^t c²_i var(Ȳ_i·) = Σ_{i=1}^t c²_i σ²/n_i.

Under the normality assumption for the Y_ij we have

(Ĉ − C) / (s √(Σ_{i=1}^t c²_i/n_i)) ∼ t_{N−t}   where   s² = Σ_{i=1}^t (n_i − 1)s²_i/(N − t) = Σ_{ij} (Y_ij − Ȳ_i·)²/(N − t) = MSE.

⇒ Ĉ ± t_{N−t,1−α/2} · s · √(Σ_{i=1}^t c²_i/n_i)

is a 100(1 − α)% confidence interval for C.
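A minimal Python sketch of this interval from group summaries (the means, variances, and sample sizes below are hypothetical, chosen in the spirit of the coagulation example; the contrast compares the first treatment with the average of the other three):

```python
import numpy as np
from scipy import stats

def contrast_ci(means, s2_by_group, n, c, alpha=0.05):
    """CI for C = sum c_i mu_i in a one-way layout, from group summaries."""
    means, n, c = map(np.asarray, (means, n, c))
    assert abs(c.sum()) < 1e-12                 # contrast coefficients sum to zero
    N, t = n.sum(), len(n)
    s2 = np.sum((n - 1) * np.asarray(s2_by_group)) / (N - t)   # pooled MSE
    c_hat = np.sum(c * means)
    se = np.sqrt(s2) * np.sqrt(np.sum(c ** 2 / n))             # SE(C-hat)
    half = stats.t.ppf(1 - alpha / 2, N - t) * se
    return c_hat - half, c_hat + half

lo, hi = contrast_ci(means=[61, 66, 68, 61], s2_by_group=[3.61, 4.41, 2.25, 6.76],
                     n=[4, 6, 6, 8], c=[1, -1/3, -1/3, -1/3])
print(round(lo, 2), round(hi, 2))
```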
81
-
Testing H0 : C = 0
Based on the duality of testing and confidence intervals we can test the hypothesis H0 : C = 0 by rejecting it whenever the previous confidence interval does not contain C = 0.

Similarly, reject H0 : C = C0 whenever the previous confidence interval does not contain C = C0.

Another notation for this interval is

Ĉ ± t_{N−t,1−α/2} · SE(Ĉ)   where   SE(Ĉ) = s · √(Σ_{i=1}^t c²_i/n_i).

SE(Ĉ) is the standard error of Ĉ.
82
-
Simultaneous Con�dence Intervals for Contrasts
As with simultaneous confidence intervals for means we need to face the issue of simultaneous coverage probability in relation to the individual coverage probability of each interval.

We will introduce/compare several such procedures, although there are still others.

Multiple comparisons is a very active research area:

Simultaneous Statistical Inference by Miller (1966)
Multiple Comparison Procedures by Hochberg and Tamhane (1987)
Multiple Comparisons: Theory and Methods by Hsu (1996)
Multiple Comparisons and Multiple Tests by Westfall (2000)
Multiple Comparisons Using R by Bretz, Hothorn, Westfall (2011)
83
-
Paired Comparisons: Fisher's Protected LSD Method
After rejecting H0 : µ1 = . . . = µt one is often interested in looking at all (t choose 2) pairwise contrasts C_ij = µ_i − µ_j.

The following procedure is referred to as Fisher's Protected Least Significant Difference (LSD) Method.

It consists of possibly two stages:

1) Perform the α-level F-test for testing H0. If H0 is not rejected, stop.
2) If H0 is rejected, form all (t choose 2) (1 − α)-level confidence intervals for C_ij = µ_i − µ_j:

Î_ij = µ̂_i − µ̂_j ± t_{N−t,1−α/2} · s · √(1/n_i + 1/n_j)

and declare µ_i − µ_j ≠ 0 for all pairs for which Î_ij does not contain zero.

Here LSD = t_{N−t,1−α/2} · s · √(1/n_i + 1/n_j) is the Least Significant Difference.
84
-
Comments on Fisher's Protected LSD Method
If H0 is true, the chance of making any statements contradicting H0 is at most α.

This is the protected aspect of this procedure.

However, when H0 is not true there are many possible contingencies, some of which can give us a higher than desired chance of pronouncing a significant difference when in fact there is none.

E.g., if all but one mean (say µ1) are equal and µ1 is far away from µ2 = . . . = µt, our chance of rejecting H0 is almost 1.

However, among the intervals for µ_i − µ_j, 2 ≤ i < j, we may find a significantly higher than α proportion of cases with wrongly declared differences.

This is due to the multiple comparison issue.
85
-
Pairwise Comparisons: Tukey-Kramer Method
The Tukey-Kramer method is based on the distribution of the studentized range

Q_{t,f} = max_{1≤i<j≤t} |(Ȳ_i· − µ_i) − (Ȳ_j· − µ_j)| / (s/√n)

for equal sample sizes n_i = n, with f = N − t degrees of freedom for s. With its (1 − α)-quantile q_{t,f,1−α} this gives the simultaneous confidence intervals

µ̂_i − µ̂_j ± q_{t,f,1−α} s/√n   for all i < j.
-
Tukey-Kramer Method: Unequal Sample Sizes
The simultaneous intervals for all pairwise mean differences were due to Tukey.

They were limited by the requirement of equal sample sizes.

This was addressed by Kramer in the following way:

In the above confidence intervals replace n in 1/√n by n*_ij, where n*_ij is the harmonic mean of n_i and n_j, i.e., 1/n*_ij = (1/n_i + 1/n_j)/2 or n*_ij = 2n_in_j/(n_i + n_j).

A different adjustment for each pair (i, j)!

It was possible to show

P(µ_i − µ_j ∈ µ̂_i − µ̂_j ± q_{t,f,1−α} s/√(n*_ij) ∀ i < j) ≥ 1 − α,

i.e., simultaneous confidence intervals with coverage ≥ 1 − α.
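A Python sketch of the Kramer adjustment, using scipy's studentized-range distribution and the coagulation summary statistics (means 61, 66, 68, 61 with n = 4, 6, 6, 8; the pooled MSE ≈ 5.6 is an assumption read off from the intervals reported later in the deck):

```python
import numpy as np
from itertools import combinations
from scipy.stats import studentized_range

means = np.array([61.0, 66.0, 68.0, 61.0])
n = np.array([4, 6, 6, 8])
mse = 5.6                                   # assumed pooled MSE for these data
t_groups, f = len(n), n.sum() - len(n)
q = studentized_range.ppf(0.95, t_groups, f)   # q_{t,f,0.95}

intervals = {}
for i, j in combinations(range(t_groups), 2):
    n_star = 2 * n[i] * n[j] / (n[i] + n[j])   # harmonic mean of n_i, n_j
    half = q * np.sqrt(mse / n_star)
    d = means[i] - means[j]
    intervals[(i + 1, j + 1)] = (d - half, d + half)
    print(f"mu{i+1}-mu{j+1}: [{d - half:6.2f}, {d + half:6.2f}]")
```

The first interval comes out close to the slide's [-9.28, -0.72] for µ1 − µ2.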
87
-
Tukey-Kramer Method for Coagulation Data
coag.tukey
-
Tukey-Kramer Method for Coagulation Data (continued)
-
Tukey-Kramer Results for Coagulation Data
> coag.tukey()
           [,1]       [,2]
[1,]  -9.275446 -0.7245544
[2,] -11.275446 -2.7245544
[3,]  -4.056044  4.0560438
[4,]  -5.824075  1.8240748
[5,]   1.422906  8.5770944
[6,]   3.422906 10.5770944

Declare significant differences in µ1 − µ2, µ1 − µ3, µ2 − µ4, and µ3 − µ4.

Under H0 the risk of declaring any significant differences is ≤ α:

P_{H0}(0 ∉ µ̂_i − µ̂_j ± q_{t,f,1−α} s/√(n*_ij) for some i < j) ≤ α
90
-
Bonferroni Con�dence Intervals for Pairwise Contrasts
Applying Bonferroni's method for a simultaneous confidence statement, use α̃ = α/(t choose 2) for the individual confidence statements

µ_i − µ_j ∈ µ̂_i − µ̂_j ± t_{N−t,1−α̃/2} · s · √(1/n_i + 1/n_j)

The individual coverage probability is 1 − α̃.

The joint coverage probability for all pairwise contrasts is

P(µ_i − µ_j ∈ µ̂_i − µ̂_j ± t_{N−t,1−α̃/2} · s · √(1/n_i + 1/n_j) ∀ i < j)
≥ 1 − (t choose 2)(1 − (1 − α̃)) = 1 − (t choose 2)α̃ = 1 − α
91
-
Sche�é's Con�dence Intervals for All Contrasts
Scheffé took the F-test for testing H0 : µ1 = . . . = µt and converted it into a simultaneous coverage statement about confidence intervals for all contrasts c′µ = Σ_{i=1}^t c_iµ_i:

P(c′µ ∈ c′µ̂ ± √((t − 1) · F_{t−1,N−t,1−α}) · s · (Σ_{i=1}^t c²_i/n_i)^{1/2} ∀ c) = 1 − α

This is a coverage statement for an infinite number of contrasts.

It can be applied conservatively to all pairwise contrasts.

The resulting intervals tend to be quite conservative.

But it compares well with Bonferroni type intervals if applied to many contrasts.
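For the coagulation layout (t = 4, N − t = 20, all six pairwise contrasts) the half-width multipliers of the four procedures can be compared directly; each scales s√(1/n_i + 1/n_j) (a Python sketch, using the fact that q/√2 is the Tukey multiplier on that scale):

```python
import numpy as np
from scipy import stats

t_groups, df, alpha, m = 4, 20, 0.05, 6      # m = choose(4, 2) pairwise contrasts

lsd = stats.t.ppf(1 - alpha / 2, df)                              # Fisher's protected LSD
tukey = stats.studentized_range.ppf(1 - alpha, t_groups, df) / np.sqrt(2)
bonf = stats.t.ppf(1 - alpha / (2 * m), df)                       # Bonferroni
scheffe = np.sqrt((t_groups - 1) * stats.f.ppf(1 - alpha, t_groups - 1, df))

print(f"LSD {lsd:.3f} < Tukey {tukey:.3f} < Bonferroni {bonf:.3f} < Scheffe {scheffe:.3f}")
```

This reproduces the ordering visible in the interval table on the next slide: LSD narrowest, Scheffé widest.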
92
-
Pairwise Comparison Intervals for Coagulation Data
                              (simultaneous) 95%-Intervals
mean          Tukey-Kramer     Fisher's         Bonferroni       Scheffé's all
difference                     protected LSD    inequality       contrasts method
µ1 − µ2       -9.28   -0.72    -8.19   -1.81    -9.47   -0.53    -9.66   -0.34
µ1 − µ3      -11.28   -2.72   -10.19   -3.81   -11.47   -2.53   -11.66   -2.34
µ1 − µ4       -4.06    4.06    -3.02    3.02    -4.24    4.24    -4.42    4.42
µ2 − µ3       -5.82    1.82    -4.85    0.85    -6.00    2.00    -6.17    2.17
µ2 − µ4        1.42    8.58     2.33    7.67     1.26    8.74     1.10    8.90
µ3 − µ4        3.42   10.58     4.33    9.67     3.26   10.74     3.10   10.90

Using any of the four methods, declare significant differences in µ1 − µ2, µ1 − µ3, µ2 − µ4, and µ3 − µ4.
93
-
Simultaneous Paired Comparisons (95%)
[Figure: Pairwise Comparisons of Means (Coagulation Data), 1 − α = 0.95. For each of µ1 − µ2, µ1 − µ3, µ1 − µ4, µ2 − µ3, µ2 − µ4, µ3 − µ4 the Tukey-Kramer, Fisher's protected LSD, Bonferroni, and Scheffé intervals are shown side by side.]
94
-
Simultaneous Paired Comparisons (99%)
[Figure: Pairwise Comparisons of Means (Coagulation Data), 1 − α = 0.99. Same four interval types as on the previous slide, at the higher confidence level.]
95
-
Model Diagnostics
Model: Y_ij = µ_i + ε_ij, j = 1, . . . , n_i, i = 1, . . . , t.

We made the following assumptions:

A1: {ε_ij} are independent;
A2: var(ε_ij) = var(Y_ij) = σ² for all i, j (homogeneity of variances or homoscedasticity);
A3: {ε_ij} are normally distributed.

These assumptions allow us to
i) perform the F-test for homogeneity of means,
ii) do power calculations,
iii) plan sample sizes to achieve a desired power,
iv) obtain simultaneous confidence intervals for means/contrasts.

We will examine A2 and A3.

We won't deal with A1. Use judgment, examine serial correlation?
96
-
Checking Normality
Here we would like to check normality of ε_ij = Y_ij − µ_i, j = 1, . . . , n_i, i = 1, . . . , t.

Not knowing µ_i we estimate the error term ε_ij via ε̂_ij = Y_ij − µ̂_i = Y_ij − Ȳ_i·.

If normality holds then a normal QQ-plot of all these N = n_1 + . . . + n_t estimated error terms (also called residuals) should look roughly linear, with intercept near zero and slope ≈ σ.

qqnorm(residual.vector) ⇒ normal QQ-plot. We have done this before in the single-sample situation and won't show repeats.

It is also possible to perform the formal EDF-based tests of fit (KS, CvM, and AD), but they would require minor modifications in the package nortest, not available now.
97
-
Checking Normality by Simulation
Can adapt the KS, CvM, and AD EDF test-of-fit criteria and simulate their null distributions.

Limiting results by Pierce (1978) support this.

Use them to judge significant non-normality in residuals.

D_KS = max{ max_i [i/N − U_(i)], max_i [U_(i) − (i − 1)/N] }

D_CvM = Σ_{i=1}^N [U_(i) − (2i − 1)/(2N)]² + 1/(12N)

D_AD = −N − (1/N) Σ_{i=1}^N (2i − 1)[log(U_(i)) + log(1 − U_(i))]

where U_ij = Φ((Y_ij − Ȳ_i·)/s) and U_(1) ≤ . . . ≤ U_(N) are the U_ij in increasing order.
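The three criteria can be computed in a few lines; a Python sketch (the simulated one-way data at the bottom are made up to demonstrate the residual-based U_ij):

```python
import numpy as np
from scipy.stats import norm

def edf_stats(u):
    """D_KS, D_CvM, D_AD computed from N probability-transformed values u."""
    u = np.sort(np.asarray(u))
    n = len(u)
    i = np.arange(1, n + 1)
    d_ks = max((i / n - u).max(), (u - (i - 1) / n).max())
    d_cvm = np.sum((u - (2 * i - 1) / (2 * n)) ** 2) + 1 / (12 * n)
    d_ad = -n - np.sum((2 * i - 1) * (np.log(u) + np.log1p(-u))) / n
    return d_ks, d_cvm, d_ad

# demo: U_ij = Phi((Y_ij - Ybar_i)/s) from simulated one-way data
rng = np.random.default_rng(2)
groups = [rng.normal(m, 1.0, 25) for m in (0.0, 0.5, 1.0)]
res = np.concatenate([g - g.mean() for g in groups])
N, t = res.size, len(groups)
s = np.sqrt(np.sum(res ** 2) / (N - t))
print(edf_stats(norm.cdf(res / s)))
```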
98
-
The Simulation
The distribution of

U_ij = Φ((Y_ij − Ȳ_i·)/s) = Φ( [(Y_ij − µ_i)/σ − (Ȳ_i· − µ_i)/σ] / (s/σ) )

does not depend on any unknown parameters.

Thus we may as well simulate the Y_ij i.i.d. ∼ N(0, 1), compute Ȳ_i·, i = 1, . . . , t, and s, then the U_ij; sort these values and compute the respective EDF criteria.

Repeat this over and over, say Nsim = 10000 times, and compare the EDF criteria for the actual data set against these simulated null distributions to obtain estimated p-values.

View this as potential homework.

It may be advantageous to modify the above EDF criteria if sample sizes are quite different (uncharted territory).
99
-
Hermit Crab Count Data
Hermit Crab counts were obtained at 6 different coastline sites.

At each site counts were obtained for 25 randomly selected transects.

Download the data file crab.csv from the web into your work directory and read it via crab <- read.csv("crab.csv").
-
Plot of Hermit Crab Counts
[Figure: strip plot of the hermit crab counts (y-axis: count, 0 to 400) at sites 1 to 6.]
101
-
ANOVA for Hermit Crab Count Data
> out.lm <- lm(crab$count ~ as.factor(crab$site))
> anova(out.lm)
Analysis of Variance Table

Response: crab$count
                      Df Sum Sq Mean Sq F value  Pr(>F)
as.factor(crab$site)   5  76695   15339  2.9669 0.01401 *
Residuals            144 744493    5170
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> qqnorm(out.lm$residuals)
> qqline(out.lm$residuals)

produced the (not so) normal QQ-plot for the ANOVA residuals on the next slide.
102
-
Normal QQ-Plot of Hermit Crab Count ANOVA Residuals
[Figure: normal QQ-plot of the ANOVA residuals for the crab counts (sample quantiles 0 to 400 versus theoretical normal quantiles), showing a marked departure from normality.]
103
-
Checking for Homoscedasticity
The appropriate indicators for checking a constant variance over all t treatment groups would seem to be s²_1, . . . , s²_t.

There are various rules of thumb involving

F_min = min(s²_1, . . . , s²_t)/max(s²_1, . . . , s²_t).

For example, if F_min > 1/3 the constant variance assumption should be OK, while for F_min < 1/7 we should deal with it.

Where the 1/3 or 1/7 come from, and what to do in between, is not clear.

With R it is simple to simulate the distribution of F_min.
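The deck's Fmin.test function (on the class web site) does this in R; a minimal Python sketch of the same simulation, for k = 3 samples of size n = 8 as on the following slides:

```python
import numpy as np

rng = np.random.default_rng(7)

def fmin_null(k=3, n=8, nsim=10000):
    """Simulate Fmin = min(s_i^2)/max(s_i^2) for k normal samples of size n."""
    s2 = rng.standard_normal((nsim, k, n)).var(axis=2, ddof=1)
    return s2.min(axis=1) / s2.max(axis=1)

fmin = fmin_null()
print(np.mean(fmin <= 1 / 7))   # proportion below the 1/7 rule of thumb
```

With equal true variances, this proportion comes out around 0.05, in line with the value shown on the following slides.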
104
-
Fmin.test
The R function Fmin.test → on the class web site. It simulates the F_min distribution, assuming normal samples with equal variances.

The sample sizes may vary.

The documentation for Fmin.test is inside the function body.

Use it to explore any desired rule of thumb, calculating the proportion of F_min values ≤ the rule-of-thumb value.

If F_min,observed is provided, it calculates the estimated p-value from this simulated distribution.

See the next two slides for examples.

The validity of this test depends strongly on data normality.
105
-
Fmin.test(k=3,n=8,a.recip=7)
k = 3, n = 8, Nsim = 10000, a = 1/7

[Histogram of the simulated distribution of F_min = min(s²_1, . . . , s²_k)/max(s²_1, . . . , s²_k); the proportion of simulated values ≤ a = 1/7 is 0.046.]
106
-
Fmin.test(k=3, n=c(3,3,4), a.recip=7, Fmin.observed=.1)

k = 3, n = (3, 3, 4), Nsim = 10000, a = 1/7, Fmin.observed = 0.1

[Histogram of the simulated F_min distribution, annotated with the proportions 0.413 and 0.307; the latter is the estimated p-value for F_min,observed = 0.1.]
107
-
Diagnostic Plots for Checking Homoscedasticity
A first diagnostic is to plot the residuals Y_ij − Ȳ_i· versus the corresponding fitted values Ȳ_i· for j = 1, . . . , n_i, i = 1, . . . , t.

Compare the difference in information in the next two plots.

The second display shows that variability increases with the fitted value. Often there is a relationship between variability and the mean.

There are ways to deal with this by using variance-stabilizing transforms of the Y_ij.
108
-
plot(crab$site, out.lm$residuals, col="blue",
     xlab="site", ylab="residuals", cex.lab=1.3)

[Figure: scatterplot of the ANOVA residuals versus site.]
109
-
plot(out.lm$fitted.values, out.lm$residuals,
     col="blue", xlab="fitted values",
     ylab="residuals", cex.lab=1.3)

[Figure: scatterplot of the ANOVA residuals versus fitted values (10 to 70); the spread of the residuals increases with the fitted value.]
110
-
Levene's Test for Homoscedasticity
The modified Levene test looks at X_ij = |Y_ij − Ỹ_i|, where Ỹ_i denotes the median of the i-th treatment sample.

Originally it used Ȳ_i· in place of Ỹ_i, whence "modified."

The idea is as follows: if the standard deviations in the t samples Y_i1, . . . , Y_in_i, i = 1, . . . , t are the same, then one would expect roughly equal means for the X_ij.

Check this by performing an ANOVA F-test on the X_ij values.

The ANOVA F-test for means is not as sensitive to the normality assumption as the F-test or Fmin.test for comparing variances.
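A Python sketch of the modified Levene idea on made-up samples; scipy's built-in levene with center='median' (the Brown-Forsythe version) is exactly the one-way ANOVA F-test on the absolute deviations from the medians:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# three samples with equal spread but different means (illustration only)
samples = [rng.normal(m, 2.0, 25) for m in (10.0, 12.0, 15.0)]

# modified Levene by hand: one-way ANOVA F-test on X_ij = |Y_ij - median_i|
x = [np.abs(y - np.median(y)) for y in samples]
f_by_hand, p_by_hand = stats.f_oneway(*x)

# the same test is built into scipy as levene(..., center='median')
w, p = stats.levene(*samples, center='median')
print(round(f_by_hand, 4), round(w, 4))   # identical statistics
```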
111
-
Levene's Test for Crab Count Data
crab.levene
-
A Multiplicative Error Model
Variability in the crab count data seemed proportional to the count averages, and the variability did not show much normality.

Some random phenomena are not so much driven by additive accumulation of random contributions but more so by multiplicative accumulation.

A crab colony could have started with a starting group size X_0. This group produced a random number X_0 · X_1 of new crabs, where X_1 represents the reproduction rate per crab. This rate is variable or random. The next generation would have X_0 · X_1 · X_2 crabs, and so on.

This motivates the following variation model: Y = µ × ε = µ · (X_1 · X_2 · . . .), where the random term ε has mean µ_ε and standard deviation σ_ε.

⇒ var(Y) = µ² · var(ε) and µ_Y = E(Y) = µ · E(ε), so σ_Y is proportional to µ_Y since both are proportional to µ.
113
-
Variance Stabilization and Normality under log-Transform
Multiplicative error model ⇒ σ ∝ µ. Using log(Y) = log(µ) + log(ε) breaks the link:

E(log(Y)) = log(µ) + E(log(ε))   and   var(log(Y)) = var(log(ε))

µ affects the mean E(log(Y)) but no longer var(log(Y)).

An example of variance stabilization!

There is a further benefit in viewing the multiplicative error term ε as a product of several random contributors. By taking the transform log(Y):

V = log(Y) = log(µ) + log(ε) = log(µ) + log(X_1) + log(X_2) + . . .

we can appeal to the CLT for the sum of the log(X_i) terms.

This justifies a normal additive error model for V, i.e., V = µ̃ + ε̃ with ε̃ ∼ N(0, σ²).

Applying this to the count data gives the following familiar model:

V_ij = log(Y_ij) = µ̃_i + ε̃_ij   with   ε̃_ij i.i.d. ∼ N(0, σ²).
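The stabilization is easy to see in a simulation (a Python sketch with a lognormal error term standing in for ε; the means and σ = 0.5 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
for mu in (5.0, 20.0, 80.0):
    # multiplicative model: Y = mu * eps with a lognormal error term eps
    y = mu * rng.lognormal(mean=0.0, sigma=0.5, size=20000)
    print(f"mu={mu:5.1f}  sd(Y)={y.std():7.2f}  sd(log Y)={np.log(y).std():.3f}")
```

The standard deviation of Y grows in proportion to µ, while the standard deviation of log(Y) stays essentially constant at σ = 0.5.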
114
-
The Problem of Zero Counts
Some observed counts are zero ⇒ the problem of log(0). We look at two ways of dealing with it.

1. Add a small fraction, say 1/6, to all counts.
   The choice 1/6 > 0 is somewhat arbitrary.
   This is a technical solution, keeping all the data.

2. Eliminate all zero counts.
   This may be justified if a zero count just means that there were no crabs in that transect to begin with; it is not a matter of not seeing them, since the population size is small.
   This reduces the count data to 150 − 33 = 117 counts.
115
-
Box Plots for count and log(count+1/6)
[Figure: box plots by site of count (linear scale) and of count + 1/6 (log scale); n1 = . . . = n6 = 25.]
116
-
Normal QQ-Plots of 150 Residuals
[Figure: normal QQ-plot of the 150 residuals of log(count + 1/6); the flat lower stretch of points corresponds to the zero counts. Alongside it, several simulated normal QQ-plots of the same sample size for comparison.]
117
-
ANOVA for log(count+1/6)
Analysis of Variance Table

Response: log(count + 1/6)
                 Df Sum Sq Mean Sq F value  Pr(>F)
as.factor(site)   5  54.73   10.95  2.3226 0.04604 *
Residuals       144 678.60    4.71
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
118
-
Box Plots for count and log(count[count>0])
[Figure: box plots by site of count (linear scale, n1 = . . . = n6 = 25) and of the positive counts on the log scale (n1 = 20, n2 = 21, n3 = 20, n4 = 19, n5 = 21, n6 = 16).]
119
-
Normal QQ-Plots of 117 Residuals
[Figure: normal QQ-plot of the 117 residuals of log(count) after removing the zero counts, together with simulated normal QQ-plots of the same sample size for comparison.]