Xuhua Xia
Head of the Statistics Division at the Rothamsted Experimental Station in Hertfordshire. One of the three founders of theoretical population genetics. Developer of statistical methods, especially likelihood methods. Published The Genetical Theory of Natural Selection in 1930, in which he proposed the fundamental theorem of natural selection.
“To call in a statistician after the experiment is done may be no more than asking him to perform a postmortem examination; he may be able to say what the experiment died of.”
Ronald A. Fisher (1890-1962)
Analysis of Variance (ANOVA)
• ANOVA was mainly developed by Ronald A. Fisher.
• The F statistic was named after him.
• The essence of ANOVA is to partition the total variation into its components.
• Assumptions
– Normality
– Equal variance among treatment groups
• Alternative methods
One-way ANOVA Model
$x_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ vs. $x_{ij} = \mu + \varepsilon_{ij}$
Is the effect $\alpha_i$ zero? This is the same model as for the t-test, except that the subscript i takes the values 1 and 2 in the t-test, but 1, 2, ..., n in one-way ANOVA.
t-test and ANOVA

Male   Female
193    175
188    173
185    168
183    165
180    163
178
170

             Male       Female
n            7          5
Mean         182.4286   168.8
SS           329.7143   104.8
Pooled Var   43.45143
Pooled SE    3.859745
t            3.530951
df           10
P            0.0054
Equal Var.?  P = 0.4939

Groups   Count  Sum   Average   Variance
Male     7      1277  182.4286  54.95238
Female   5      844   168.8     26.2

ANOVA
Source          SS        df  MS        F         P-value
Between Groups  541.7357  1   541.7357  12.46762  0.005438
Within Groups   434.5143  10  43.45143
Total           976.25    11
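The link between the two analyses can be checked numerically: with two groups, the squared pooled-variance t statistic equals the ANOVA F statistic. The following is a minimal sketch using the male and female values from the slide; the helper names `mean` and `sum_sq` are illustrative and not part of the original material.

```python
import math

male = [193, 188, 185, 183, 180, 178, 170]
female = [175, 173, 168, 165, 163]

def mean(xs):
    return sum(xs) / len(xs)

def sum_sq(xs):
    """Sum of squared deviations from the sample mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

# Two-sample t-test with pooled variance
df = len(male) + len(female) - 2
pooled_var = (sum_sq(male) + sum_sq(female)) / df
pooled_se = math.sqrt(pooled_var * (1 / len(male) + 1 / len(female)))
t = (mean(male) - mean(female)) / pooled_se

# One-way ANOVA on the same two groups
grand = mean(male + female)
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in (male, female))
ss_within = sum_sq(male) + sum_sq(female)
F = (ss_between / 1) / (ss_within / df)

print(f"t = {t:.6f}, t^2 = {t * t:.5f}, F = {F:.5f}")
```

Both `t * t` and `F` come out at about 12.468, matching the F value in the ANOVA table.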
Variance and Sum of Squares
$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$$
Numerator: Sum of Squared Deviations
Denominator: Degrees of Freedom
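As a small illustration of the variance as a sum of squared deviations divided by its degrees of freedom, the sketch below computes both pieces separately. The sample values are made up for illustration and do not come from the slides; the result is compared against `statistics.variance` from the Python standard library.

```python
import statistics

def sum_of_squares(xs):
    """Sum of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def sample_variance(xs):
    """Sum of squared deviations divided by the degrees of freedom (N - 1)."""
    return sum_of_squares(xs) / (len(xs) - 1)

sample = [2, 4, 6, 8, 10]  # hypothetical values, for illustration only
ss = sum_of_squares(sample)
var = sample_variance(sample)
print(ss, var, statistics.variance(sample))
```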
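Partition of Variance
[Figure: observations in three groups with group means $\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$ and the grand mean $\bar{X}$. The deviation of an observation from the grand mean is split into a within-group deviation (observation minus group mean) and a between-group deviation (group mean minus grand mean).]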
Numerical Illustration of One-Way ANOVA

Treatment    Low-fat food       Medium-fat food   High-fat food
Weight Gain  0                  4                 8
             2                  6                 10
Mean         1                  5                 9
SSB          2(1-5)^2 = 32      2(5-5)^2 = 0      2(9-5)^2 = 32
SSW          (0-1)^2+(2-1)^2 = 2   2              2

Grand Mean = (0 + 2 + 4 + ... + 10) / 6 = 5
SST = (0-5)^2 + (2-5)^2 + ... + (10-5)^2 = 70, with df = 5
SSB = 32 + 0 + 32 = 64, with df = 2
SSW = 2 + 2 + 2 = 6, with df = 3
MSB = 64/2 = 32
MSW = 6/3 = 2
F = MSB/MSW = 16, DFnum = 2, DFdenom = 3, p = 0.0251

Additional observations (shown in red on the slide):
             1                  5                 9
             1                  5                 9

Now repeat the ANOVA computation with the addition of the numbers in red. Email me SSB, SSW, DFnum, and DFdenom.
ANOVA Table
Dependent variable: Weight Gain

Source  DF  SS    MS    F     p
Model   2   64.0  32.0  16.0  0.0251
Error   3   6.0   2.0
Total   5   70.0
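The entries of this ANOVA table can be reproduced from the six weight-gain values (0, 2; 4, 6; 8, 10). The sketch below is one way to do it in plain Python; the function name `one_way_anova` is illustrative. The p-value line uses a closed form for the upper tail of the F distribution that holds only when the numerator df is 2, which happens to be the case here; it is not a general-purpose routine.

```python
def one_way_anova(samples):
    """Return SS, df, and F for a one-way ANOVA on a list of groups."""
    values = [x for s in samples for x in s]
    grand = sum(values) / len(values)
    ss_total = sum((x - grand) ** 2 for x in values)
    ss_between = sum(len(s) * (sum(s) / len(s) - grand) ** 2 for s in samples)
    ss_within = ss_total - ss_between
    df_between = len(samples) - 1
    df_within = len(values) - len(samples)
    f = (ss_between / df_between) / (ss_within / df_within)
    return {"SST": ss_total, "SSB": ss_between, "SSW": ss_within,
            "df_between": df_between, "df_within": df_within, "F": f}

groups = [[0, 2], [4, 6], [8, 10]]  # low-, medium-, high-fat food
res = one_way_anova(groups)

# Upper-tail probability of F; this closed form is valid only for numerator df = 2
p = (1 + 2 * res["F"] / res["df_within"]) ** (-res["df_within"] / 2)

print(res, round(p, 4))
```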
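F-distribution
For each pair of samples, record Mean1, $s_1^2$, Mean2, $s_2^2$, and the ratio $s_1^2/s_2^2$:

s1^2   s2^2   s1^2/s2^2
3      3      1
3      2      1.5

Further ratios: 1.4, 1.6, 0.6, ..., 2.4, 3.0, 2.6, 2.9

[Figure: Empirical F distribution. Relative frequency f (0 to 0.8) plotted against F (0 to 3.5).]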
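An empirical F distribution of this kind can be built by simulation: draw two samples from the same population many times and record the variance ratio each time. The sketch below assumes normal populations, a sample size of 5 per group, and a fixed random seed; none of these settings are given on the slide, and the function name `empirical_f` is illustrative.

```python
import random
import statistics

def empirical_f(n_reps, n1, n2, seed=1):
    """Variance ratios s1^2/s2^2 for repeated pairs of samples from one normal population."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_reps):
        s1 = [rng.gauss(0, 1) for _ in range(n1)]
        s2 = [rng.gauss(0, 1) for _ in range(n2)]
        ratios.append(statistics.variance(s1) / statistics.variance(s2))
    return ratios

ratios = empirical_f(5000, 5, 5)  # sample sizes are an assumption

# Relative frequency of ratios in bins of width 0.5, from 0 up to 3.5
bin_width = 0.5
freq = [0] * 7
for r in ratios:
    if r < 3.5:
        freq[int(r // bin_width)] += 1
rel_freq = [count / len(ratios) for count in freq]

for i, f in enumerate(rel_freq):
    print(f"{i * bin_width:.1f} to {(i + 1) * bin_width:.1f}: {f:.3f}")
```

With equal degrees of freedom in numerator and denominator, about half of the simulated ratios fall below 1, so the median of `ratios` should be close to 1.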
One-way experimental design

             Low-fat food  Medium-fat food  High-fat food
Weight gain  0             4                8
             2             6                10

The null hypothesis H0: $\mu_1 = \mu_2 = \mu_3$ is rejected. The three kinds of food differ significantly in their effect on weight gain of rabbits. In particular, Medium-fat and High-fat foods are significantly better than Low-fat food. However, Medium-fat and High-fat foods do not differ in their effect on rabbit weight gain.
Assumptions

Original data:

            Group 1   Group 2
            75        82
            76        80
            80        85
            77        85
            80        78
            77        87
            73        82
            77        82

            Group 1   Group 2   Subtotal
n           8         8
Mean        76.875    82.625
Var         5.554     8.554
GrandMean   79.750    79.750
SST         231.000
SSB         66.125    66.125    132.250
SSW         38.875    59.875    98.750
dfT         15
dfB         1
dfW         14
MSB         132.250
MSW         7.054
F = 18.749, P = 0.0007

Same data with one extreme value (200) added to the second group:

            Group 1   Group 2   Subtotal
n           8         9
Mean        76.875    95.667
Var         5.554     1538.250
GrandMean   86.824    86.824
SST         13840.471
SSB         791.786   703.810   1495.596
SSW         38.875    12306.000 12344.875
dfT         16
dfB         1
dfW         15
MSB         1495.596
MSW         822.992
F = 1.817, P = 0.1976
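The effect of the single extreme value on the F statistic can be reproduced directly from the two columns of data. This is a minimal sketch; the function name `f_statistic` and the labels `g1` and `g2` are illustrative, since the slide does not name the two groups.

```python
def f_statistic(*groups):
    """One-way ANOVA F statistic for any number of groups."""
    values = [x for g in groups for x in g]
    grand = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

g1 = [75, 76, 80, 77, 80, 77, 73, 77]
g2 = [82, 80, 85, 85, 78, 87, 82, 82]

f_clean = f_statistic(g1, g2)            # original data
f_outlier = f_statistic(g1, g2 + [200])  # one extreme value added to group 2

print(f"F without outlier: {f_clean:.3f}")
print(f"F with outlier:    {f_outlier:.3f}")
```

The extreme value raises the difference between the group means, yet F drops from about 18.7 to about 1.8 because the within-group mean square is inflated far more.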
Paired-sample t-test
How should we allocate the two crop varieties to the plots? What comparison would be fair?
Using blocks to reduce confounding environmental factors (everything else being equal except for the treatment effect) in evaluating the protein content of two wheat varieties.
[Figure: field layout with four blocks (Block 1 to Block 4); each plot is labelled with the variety (1 or 2) assigned to it.]
Randomized Complete Blocks: Plots
The three crop varieties are randomly allocated to the plots within each block.
Using blocks to reduce confounding environmental factors (everything else being equal except for the treatment effect).
[Figure: field layout with four blocks (Block 1 to Block 4); each plot is labelled with the number of the variety assigned to it.]
Randomized complete blocks
Which of the six strains of clover has the highest protein content? The experimenter divided his field into 5 relatively homogeneous blocks, each with 6 plots, and randomly assigned his 6 strains to the 6 plots within each block. After harvesting, he determined the nitrogen content for each strain in each plot.
[Figure: field layout with five blocks (Block 1 to Block 5); within each block the six plots are labelled with the strains 3dok1, 3dok4, 3dok5, 3dok7, 3dok13, and compos in random order.]
If only two strains:
[Figure: the same five blocks, each containing only the two strains 3dok13 and 3dok4.]
Bartlett’s Test

Feed 1   Feed 2   Feed 3   Feed 4
60.8     68.7     102.6    87.9
57       67.7     102.1    84.2
65       74       100.2    83.1
58.6     66.3     96.5     85.7
61.7     69.8              90.3

k = 4 (number of groups)

             Feed 1    Feed 2    Feed 3    Feed 4    SUM
n            5         5         4         5         19
SS           37.568    34.26     22.97     33.552    128.35
v            4         4         3         4         15
Inverse v    0.25      0.25      0.333333  0.25      1.083333
Var          9.392     8.565     7.656667  8.388
lnVar        2.239858  2.147684  2.035577  2.126802
v*lnVar      8.959433  8.590737  6.10673   8.507208  32.16411

PooledVar    8.556667
lnPooledVar  2.146711
B            0.036552 (more accurate than that in Zar (1996))
C            1.112963
Bc           0.032842
P            0.998433
The null hypothesis for the F-test (or variance ratio test):
H0: v1 = v2.
The null hypothesis for Bartlett’s or Levene test:
H0: v1 = v2 = ... = vn.
The formulae in this sheet use defined variables in EXCEL:
Insert|name|define
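The quantities in the Bartlett's test worksheet (B, C, and the corrected statistic Bc) can be recomputed from the four feed groups as follows. This is a sketch in plain Python rather than the Excel formulae the sheet uses; the function names are illustrative. The p-value helper `chi2_sf_3df` uses a closed form for the chi-square upper tail that is valid only for 3 degrees of freedom (k - 1 = 3 here) and is not a general routine.

```python
import math

feeds = [
    [60.8, 57, 65, 58.6, 61.7],     # Feed 1
    [68.7, 67.7, 74, 66.3, 69.8],   # Feed 2
    [102.6, 102.1, 100.2, 96.5],    # Feed 3
    [87.9, 84.2, 83.1, 85.7, 90.3], # Feed 4
]

def sum_sq(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def bartlett(groups):
    """Return (B, C, Bc) for Bartlett's test of equal variances."""
    k = len(groups)
    v = [len(g) - 1 for g in groups]             # degrees of freedom per group
    var = [sum_sq(g) / vi for g, vi in zip(groups, v)]
    pooled = sum(sum_sq(g) for g in groups) / sum(v)
    B = math.log(pooled) * sum(v) - sum(vi * math.log(s) for vi, s in zip(v, var))
    C = 1 + (sum(1 / vi for vi in v) - 1 / sum(v)) / (3 * (k - 1))
    return B, C, B / C

def chi2_sf_3df(x):
    """Upper-tail chi-square probability; closed form valid only for 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

B, C, Bc = bartlett(feeds)
P = chi2_sf_3df(Bc)
print(f"B = {B:.6f}, C = {C:.6f}, Bc = {Bc:.6f}, P = {P:.6f}")
```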
Do Six Strains of Clover Differ?

Analysis of Variance Procedure

Class    Levels  Values
STRAIN   6       3dok1 3dok13 3dok4 3dok5 3dok7 compos

Number of observations in data set = 30

Dependent Variable: NITROGEN

Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model            5   847.046667      169.409333   14.37    0.0001
Error            24  282.928000      11.788667
Corrected Total  29  1129.974667

R-Square  C.V.      Root MSE  NITROGEN Mean
0.749616  17.26515  3.43346   19.8867
Multiple Comparison

Duncan's Multiple Range Test for variable: NITROGEN
NOTE: This test controls the type I comparisonwise error rate, not the experimentwise error rate

Alpha = 0.05  df = 24  MSE = 11.78867

Number of Means  2      3      4      5      6
Critical Range   4.482  4.707  4.852  4.954  5.031

Means with the same letter are not significantly different. Means are arranged in descending order.

Duncan Grouping  Mean    N  STRAIN
A                28.820  5  3dok1
B                23.980  5  3dok5
C B              19.920  5  3dok7
C D              18.700  5  compos
E D              14.640  5  3dok4
E                13.260  5  3dok13
Comparisonwise & Experimentwise Errors
• Type I comparisonwise error rate is the probability of a Type I error for an individual test of hypothesis, symbolized by $\alpha_c$.
• Type I experimentwise error rate is the probability of making at least one Type I error for a set of hypothesis tests, symbolized by $\alpha_e$.
• If $\alpha_c = 0.05$ and N hypotheses are tested, then $\alpha_e \approx 1 - (1 - \alpha_c)^N$.
• For 5 treatments, there are a total of 10 pairwise comparisons between means. Thus, $\alpha_c = 0.05$ would imply $\alpha_e \approx 0.40$. That is, if all means are in fact equal, there is roughly a probability of 0.4 that at least one hypothesis will be incorrectly rejected.
• If we are to control the experimentwise error rate at 0.05, we can set $\alpha_e = 0.05$:
• $\alpha_e \approx 1 - (1 - \alpha_c)^N = 1 - (1 - \alpha_c)^{10} = 0.05$
• and solve the equation, which yields $\alpha_c \approx 0.005$. This of course makes it more difficult to reject a null hypothesis, even if the null hypothesis is false.
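The arithmetic behind these error rates can be sketched in a few lines. The function names below are illustrative, and the formula treats the N tests as if they were independent, which is the approximation used on the slide.

```python
from math import comb

def experimentwise_rate(alpha_c, n_tests):
    """Approximate probability of at least one Type I error in n_tests tests."""
    return 1 - (1 - alpha_c) ** n_tests

def comparisonwise_rate(alpha_e, n_tests):
    """Per-test rate needed to hold the experimentwise rate at alpha_e."""
    return 1 - (1 - alpha_e) ** (1 / n_tests)

n_pairs = comb(5, 2)                          # pairwise comparisons among 5 treatments
alpha_e = experimentwise_rate(0.05, n_pairs)  # about 0.40
alpha_c = comparisonwise_rate(0.05, n_pairs)  # about 0.005

print(n_pairs, round(alpha_e, 4), round(alpha_c, 6))
```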
SAS output: I

Dependent Variable: nitrogen

Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model            9   1045.201333     116.133481   27.40    <.0001
Error            20  84.773333       4.238667
Corrected Total  29  1129.974667

R-Square  Coeff Var  Root MSE  nitrogen Mean
0.924978  10.35268   2.058802  19.88667

Source  DF  Anova SS     Mean Square  F Value  Pr > F
strain  5   847.0466667  169.4093333  39.97    <.0001
Block   4   198.1546667  49.5386667   11.69    <.0001
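The gain from blocking can be seen by comparing this table with the earlier one-way analysis of the same clover data: the block sum of squares is removed from the error term, so the error mean square shrinks and the F value for strain grows. The sketch below only redoes that arithmetic from the sums of squares printed in the two SAS outputs; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom taken from the SAS outputs
ss_total, df_total = 1129.974667, 29
ss_strain, df_strain = 847.046667, 5
ss_block, df_block = 198.154667, 4

ms_strain = ss_strain / df_strain

# Analysis ignoring blocks: everything that is not strain is error
ss_error_oneway = ss_total - ss_strain
df_error_oneway = df_total - df_strain
f_oneway = ms_strain / (ss_error_oneway / df_error_oneway)

# Randomized complete blocks: block variation is removed from the error
ss_error_rcb = ss_error_oneway - ss_block
df_error_rcb = df_error_oneway - df_block
mse_rcb = ss_error_rcb / df_error_rcb
f_rcb = ms_strain / mse_rcb
f_block = (ss_block / df_block) / mse_rcb

print(f"F(strain), no blocks:   {f_oneway:.2f}")
print(f"F(strain), with blocks: {f_rcb:.2f}")
print(f"F(block):               {f_block:.2f}")
```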
Multiple Comparison

Duncan's Multiple Range Test for nitrogen
NOTE: This test controls the Type I comparisonwise error rate, not the experimentwise error rate.

Alpha = 0.05, DFE = 20, MSE = 4.238667

Number of Means  2      3      4      5      6
Critical Range   2.716  2.851  2.937  2.997  3.041

Means with the same letter are not significantly different.

Duncan Grouping  Mean    N  strain
A                28.820  5  3dok1
B                23.980  5  3dok5
C                19.920  5  3dok7
C                18.700  5  compos
D                14.640  5  3dok4
D                13.260  5  3dok13
Ex. ANOVA with repeated measures

Subjects  Drug 1  Drug 2  Drug 3
1         164     152     178
2         202     181     222
3         143     136     132
4         210     194     216
5         228     219     245
6         173     159     182
7         161     157     165

What is the treatment effect? What is the block?
Analyze the data with SAS. Write a concise 1-page report. Submit at the beginning of the next class in hardcopy.
Two-way experimental design
Testing the effect of food and sex on rabbit food consumption

Mean food consumed:
         Fresh food  Rancid food
Male     695.67      535.33
Female   642.67      517.33

Food consumed (individual values):
         Fresh food     Rancid food
Male     709, 679, 699  592, 538, 476
Female   657, 594, 677  508, 505, 539
Dependent Variable: CONSUMED

Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model            3   65903.5833      21967.8611   15.06    0.0012
Error            8   11666.6667      1458.3333
Corrected Total  11  77570.2500

R-Square  C.V.      Root MSE  CONSUMED Mean
0.849599  6.388646  38.1881   597.750

Source    DF  Anova SS    Mean Square  F Value  Pr > F
FOOD      1   61204.0833  61204.0833   41.97    0.0002
SEX       1   3780.7500   3780.7500    2.59     0.1460
FOOD*SEX  1   918.7500    918.7500     0.63     0.4503

What is the interaction effect?
What is Interaction?
When the effect of FOOD is independent of SEX, e.g., when fresh food is preferred by both males and females to the same extent, then there is no interaction effect. When the effect of FOOD depends on SEX, e.g., when males eat more fresh food than rancid food but females eat more rancid food than fresh food, then there is an interaction effect.
[Figure: two plots of Consumption against Sex (Male, Female), each with one line for Fresh food and one line for Rancid food.]
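For a balanced two-way design, the FOOD, SEX, and FOOD*SEX sums of squares can be computed by hand from the cell, row, column, and grand means. The sketch below does this for the first fresh versus rancid data set (three rabbits per cell) and should agree with the Anova SS values in the SAS output for that data set. The dictionary layout and helper names are illustrative, and the shortcut of obtaining the interaction by subtraction relies on the design being balanced.

```python
cells = {
    ("fresh", "male"): [709, 679, 699],
    ("fresh", "female"): [657, 594, 677],
    ("rancid", "male"): [592, 538, 476],
    ("rancid", "female"): [508, 505, 539],
}

def mean(xs):
    return sum(xs) / len(xs)

all_values = [x for vals in cells.values() for x in vals]
grand = mean(all_values)

def main_effect_ss(index):
    """SS for one factor: pool cells by that factor's level (index 0 = food, 1 = sex)."""
    levels = {}
    for key, vals in cells.items():
        levels.setdefault(key[index], []).extend(vals)
    return sum(len(v) * (mean(v) - grand) ** 2 for v in levels.values())

ss_food = main_effect_ss(0)
ss_sex = main_effect_ss(1)
ss_cells = sum(len(v) * (mean(v) - grand) ** 2 for v in cells.values())
ss_interaction = ss_cells - ss_food - ss_sex  # valid for balanced designs
ss_error = sum((x - mean(v)) ** 2 for v in cells.values() for x in v)

ms_error = ss_error / (len(all_values) - len(cells))
f_food = ss_food / ms_error          # each effect has 1 df here
f_sex = ss_sex / ms_error
f_interaction = ss_interaction / ms_error

print(f"FOOD     SS = {ss_food:.4f}  F = {f_food:.2f}")
print(f"SEX      SS = {ss_sex:.4f}  F = {f_sex:.2f}")
print(f"FOOD*SEX SS = {ss_interaction:.4f}  F = {f_interaction:.2f}")
```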
Interaction Effect: Example

Mean food consumed:
         Fresh food  Rancid food
Male     568.67      695.67
Female   642.67      517.33

Food consumed (individual values):
         Fresh food     Rancid food
Male     592, 538, 576  709, 679, 699
Female   657, 594, 677  508, 505, 539
Significant Interaction

Dependent Variable: CONSUMED

Source  DF  Sum of Squares  Mean Square  F Value  Pr > F
Model   3   55920.2500      18640.0833   23.06    0.0003
Error   8   6466.6667       808.3333
Total   11  62386.9167

R-Square  C.V.      Root MSE  CONSUMED Mean
0.896346  4.690973  28.4312   606.083

Source    DF  Anova SS    Mean Square  F Value  Pr > F
FOOD      1   47754.0833  47754.0833   59.08    0.0001
SEX       1   2.0833      2.0833       0.00     0.9608
FOOD*SEX  1   8164.0833   8164.0833    10.10    0.0130

Can we conclude that SEX has no effect on food consumption?
SAS Program for two-way ANOVA

proc format;
  value sexLevel 1='male' 2='female';
  value foodLevel 1='fresh' 2='rancid';
data assign63;
  do food=1 to 2;
    do sex=1 to 2;
      do n=1 to 3;
        input Consumed @@;
        output;
      end;
    end;
  end;
  format sex sexLevel. food foodLevel.;
cards;
709 679 699 657 594 677
592 538 476 508 505 539
;
proc anova;
  class food sex;
  model Consumed=food|sex;
  means food / duncan;
run;

Ex.
1. Rewrite the “data” block of the SAS program by using:

data assign63;
  input food sex consumed;
cards;
......
;

2. Run the resulting program to check if the rewriting is correct.
Three-way ANOVA

Mean food consumed:
Race       Sex     Fresh  Rancid
Short-ear  Male    647.5  515.5
           Female  611    500.5
Long-ear   Male    706    594.5
           Female  652.5  548

Food consumed (individual values):
Race       Sex     Fresh     Rancid
Short-ear  Male    650, 645  511, 520
           Female  610, 612  500, 501
Long-ear   Male    700, 712  601, 588
           Female  650, 655  550, 546
SAS Program

/* proc format and the format statement are optional, but will increase clarity in the output */
proc format;
  value sex 1='male' 2='female';
  value food 1='fresh' 2='rancid';
  value race 1='short-ear' 2='long-ear';
data assign71;
  input race sex food Consumed;
  format sex sex. food food. race race.;
cards;
1 1 1 650
1 1 1 645
1 1 2 511
1 1 2 520
1 2 1 610
1 2 1 612
1 2 2 500
1 2 2 501
2 1 1 700
2 1 1 712
2 1 2 601
2 1 2 588
2 2 1 650
2 2 1 655
2 2 2 550
2 2 2 546
;
proc anova;
  class food sex race;
  model Consumed=food|sex|race;
run;

The semicolon that ends the data lines needs to be on a new line, i.e., not
2 2 2 546;
ANOVA Table

Dependent Variable: CONSUMED

Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model            7   72138.4375      10305.4911   354.60   0.0001
Error            8   232.5000        29.0625
Corrected Total  15  72370.9375

R-Square  C.V.      Root MSE  CONSUMED Mean
0.996787  0.903104  5.39096   596.938

Source         DF  Anova SS    Mean Square  F Value  Pr > F
FOOD           1   52555.5625  52555.5625   1808.36  0.0001
SEX            1   5738.0625   5738.0625    197.44   0.0001
FOOD*SEX       1   203.0625    203.0625     6.99     0.0296
RACE           1   12825.5625  12825.5625   441.31   0.0001
FOOD*RACE      1   175.5625    175.5625     6.04     0.0395
SEX*RACE       1   588.0625    588.0625     20.23    0.0020
FOOD*SEX*RACE  1   52.5625     52.5625      1.81     0.2156
SAS program listing

Data step using nested loops:

data assign71;
  do race=1 to 2;
    do sex=1 to 2;
      do food=1 to 2;
        do n=1 to 2;
          input Consumed @@;
          output;
        end;
      end;
    end;
  end;
cards;
650 645 511 520 610 612 500 501
700 712 601 588 650 655 550 546
;
proc anova;
  class food sex race;
  model Consumed=food|sex|race;
run;

Equivalent data step listing every factor level explicitly:

data assign71;
  input race sex food Consumed;
cards;
1 1 1 650
1 1 1 645
1 1 2 511
1 1 2 520
1 2 1 610
1 2 1 612
1 2 2 500
1 2 2 501
2 1 1 700
2 1 1 712
2 1 2 601
2 1 2 588
2 2 1 650
2 2 1 655
2 2 2 550
2 2 2 546
;
The Efficacy of Prayer
Francis Galton (1822-1911)

Class                           N     Mean
Members of Royal family         97    64.04
Clergy                          945   69.49
Lawyers                         294   68.14
Medical Profession              244   67.31
English aristocracy             1179  67.31
Gentry                          1632  70.22
Trade and commerce              513   68.74
Officers in the Royal Navy      366   68.40
English literature and science  395   67.55
Officers of the Army            569   67.07
Fine arts                       239   65.96

Other data collected by Galton:
1. Rate of successful delivery between church-going parents and others
2. Life span of believers and non-believers from insurance companies

Galton’s data could be analyzed by a one-way ANOVA. One criterion for a good ANOVA design is that everything else is equal except for the treatment effect. Does the data set above satisfy this criterion?
Model I and Model II ANOVA

Metabolic rate in rabbit liver cells, taken for two samples of liver tissue:

Sample  Replicate
1       2  1  3
2       6  7  5

1. Model I ANOVA tests the differential effects of the fixed treatment.
$x_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
where $\alpha_i$ stands for fixed treatment effects (e.g., between male and female).
2. Model II ANOVA tests the differential effects of a random variable and estimates its contribution to total variance relative to that from measurement errors (for facilitating experimental design).
$x_{ij} = \mu + A_i + \varepsilon_{ij}$
where $A_i$ stands for random treatment effects (e.g., between randomly sampled rabbits).

How can we optimize the experiment? More rabbits or more replicates?
Determining Calcium Content in Leaves

Plant  Leaf 1  Leaf 2  Leaf 3
1      3.28    3.52    2.88
       3.09    3.48    2.80
2      2.46    1.87    2.19
       2.44    1.92    2.19
3      2.77    3.74    2.55
       2.66    3.44    2.55
4      3.78    4.07    3.31
       3.87    4.12    3.31
SAS Program

data turnip;
  input plant leaf calcium @@;
cards;
1 1 3.28 1 1 3.09 1 2 3.52 1 2 3.48
1 3 2.88 1 3 2.80 2 1 2.46 2 1 2.44
2 2 1.87 2 2 1.92 2 3 2.19 2 3 2.19
3 1 2.77 3 1 2.66 3 2 3.74 3 2 3.44
3 3 2.55 3 3 2.55 4 1 3.78 4 1 3.87
4 2 4.07 4 2 4.12 4 3 3.31 4 3 3.31
;
proc nested;
  class plant leaf;
  var calcium;
run;
proc glm;
  class plant leaf;
  model calcium=plant leaf(plant);
run;
SAS Output: NESTED

Nested Random Effects Analysis of Variance for Variable CALCIUM

Variance Source  DF  Sum of Squares  F Value  Pr > F  Error Term
TOTAL            23  10.270396
PLANT            3   7.560346        7.665    0.0097  LEAF
LEAF             8   2.630200        49.409   0.0000  ERROR
ERROR            12  0.079850

Variance Source  Mean Square  Variance Component  Percent of Total
TOTAL            0.446539     0.532938            100.0000
PLANT            2.520115     0.365223            68.5302
LEAF             0.328775     0.161060            30.2212
ERROR            0.006654     0.006654            1.2486

Mean                    3.01208333
Standard error of mean  0.32404445
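The variance components reported by the nested analysis can be reproduced from the turnip data as listed in the SAS program (4 plants, 3 leaves per plant, 2 determinations per leaf). The sketch below assumes the usual expected mean squares for a balanced two-level nested random-effects design; the data structure and variable names are illustrative.

```python
# calcium[plant][leaf] = two determinations, as in the SAS data step
calcium = {
    1: {1: [3.28, 3.09], 2: [3.52, 3.48], 3: [2.88, 2.80]},
    2: {1: [2.46, 2.44], 2: [1.87, 1.92], 3: [2.19, 2.19]},
    3: {1: [2.77, 2.66], 2: [3.74, 3.44], 3: [2.55, 2.55]},
    4: {1: [3.78, 3.87], 2: [4.07, 4.12], 3: [3.31, 3.31]},
}

def mean(xs):
    return sum(xs) / len(xs)

a = len(calcium)   # plants
b = 3              # leaves per plant
n = 2              # determinations per leaf

all_values = [x for leaves in calcium.values() for reps in leaves.values() for x in reps]
grand = mean(all_values)

ss_plant = ss_leaf = ss_error = 0.0
for leaves in calcium.values():
    plant_mean = mean([x for reps in leaves.values() for x in reps])
    ss_plant += b * n * (plant_mean - grand) ** 2
    for reps in leaves.values():
        leaf_mean = mean(reps)
        ss_leaf += n * (leaf_mean - plant_mean) ** 2
        ss_error += sum((x - leaf_mean) ** 2 for x in reps)

ms_plant = ss_plant / (a - 1)
ms_leaf = ss_leaf / (a * (b - 1))
ms_error = ss_error / (a * b * (n - 1))

# Variance components from the expected mean squares of a balanced nested design
var_error = ms_error
var_leaf = (ms_leaf - ms_error) / n
var_plant = (ms_plant - ms_leaf) / (b * n)
total = var_plant + var_leaf + var_error

for name, v in (("PLANT", var_plant), ("LEAF", var_leaf), ("ERROR", var_error)):
    print(f"{name:6s} {v:.6f}  {100 * v / total:.4f}%")
```

Note that the F value for PLANT in this output (7.665) is the plant mean square divided by the leaf mean square, not by the error mean square.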
SAS Output: GLM

Dependent Variable: calcium

Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model            11  10.19054583     0.92641326   139.22   <.0001
Error            12  0.07985000      0.00665417
Corrected Total  23  10.27039583

R-Square  Coeff Var  Root MSE  calcium Mean
0.992225  2.708195   0.081573  3.012083

Source       DF  Type I SS   Mean Square  F Value  Pr > F
plant        3   7.56034583  2.52011528   378.73   <.0001
leaf(plant)  8   2.63020000  0.32877500   49.41    <.0001

Source       DF  Type III SS  Mean Square  F Value  Pr > F
plant        3   7.56034583   2.52011528   378.73   <.0001
leaf(plant)  8   2.63020000   0.32877500   49.41    <.0001