15. ANOVA p1
Assignment #7
Chapter 12: 18, 24 Chapter 13: 28 Due next Friday Nov. 20th by 2pm in your TA’s homework box
Lab Report
• Posted on web-site
• Dates
– Rough draft due to TA's homework box on Monday Nov. 16th
– Rough draft returned in your lab section the week of Nov. 23rd
– Final draft due at the start of your registered lab section the week of Nov. 30th
• 10% of course grade
– Rough draft: 5%
– Final draft: 5%
– If you're happy with your rough-draft mark, you can tell your TA to use it for the final draft
• Read the “Writing a Lab Report” section of your lab notebook for guidance!!
Reading
For Today: Chapter 15
For Thursday: Chapter 15
In-class Exercise
Do people use more paper when they know it will be recycled?
• People were given paper and told to test scissors.
• A recycling bin was either present or not.
No recycling bin: 4, 4, 4, 4, 4, 4, 4, 5, 8, 9, 9, 9, 9, 12, 12, 13, 14, 14, 14, 14, 15, 23
Recycling bin: 4, 5, 8, 8, 8, 9, 9, 9, 12, 14, 14, 15, 16, 19, 23, 28, 40, 43, 129, 130
1. Make histograms and identify options for a test
2. Choose a test that you can do in class and conduct it
[Histograms of the two groups (frequency vs. amount of paper used, 0–120): "No Bin Present" and "Bin Present". The bin-present distribution is strongly right-skewed, with outliers at 129 and 130.]
Mann-Whitney U-Test
H0: The two groups do not differ in paper use.
HA: The two groups differ in paper use.

No bin present:
Value: 4 4 4 4 4 4 4 5 8 9 9 9 9 12 12 13 14 14 14 14 15 23
Rank: 4.5 4.5 4.5 4.5 4.5 4.5 4.5 9.5 12.5 18 18 18 18 23 23 25 28.5 28.5 28.5 28.5 32.5 36.5

Bin present:
Value: 4 5 8 8 8 9 9 9 12 14 14 15 16 19 23 28 40 43 129 130
Rank: 4.5 9.5 12.5 12.5 12.5 18 18 18 23 28.5 28.5 32.5 34 35 36.5 38 39 40 41 42

No bin present: n1 = 22, R1 = 379.5
Bin present: n2 = 20, R2 = 523.5
Calculating the test statistic, U:

U1 = n1·n2 + n1(n1 + 1)/2 − R1
U2 = n1·n2 − U1

U1 = 22(20) + 22(23)/2 − 379.5 = 313.5
U2 = 22(20) − 313.5 = 126.5

U = 313.5 (the larger of U1 and U2)
Mann-Whitney: large-sample approximation
For n1 and n2 both greater than 10, use

Z = (2U − n1·n2) / √( n1·n2·(n1 + n2 + 1)/3 )

Z = (2(313.5) − 22(20)) / √( 22(20)(22 + 20 + 1)/3 ) = 2.34

P = 2(0.00964) = 0.0193
P < 0.05, so we reject the null hypothesis: people use more paper when a recycling bin is present.
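The hand calculation above can be checked with `scipy.stats.mannwhitneyu` (a sketch, assuming SciPy is available; note that SciPy reports the U statistic for the first sample, which here corresponds to U2 = 126.5 rather than the larger U1):

```python
from scipy import stats

no_bin = [4, 4, 4, 4, 4, 4, 4, 5, 8, 9, 9, 9, 9, 12, 12, 13,
          14, 14, 14, 14, 15, 23]
bin_present = [4, 5, 8, 8, 8, 9, 9, 9, 12, 14, 14, 15, 16, 19,
               23, 28, 40, 43, 129, 130]

# Two-sided test; with ties present, SciPy falls back to the
# normal (large-sample) approximation used in the slides.
res = stats.mannwhitneyu(no_bin, bin_present, alternative="two-sided")

# U1 + U2 = n1*n2, so the larger U is recovered as:
u_larger = max(res.statistic, len(no_bin) * len(bin_present) - res.statistic)
print(u_larger, res.pvalue)  # U = 313.5, P ≈ 0.02
```
The P-value differs slightly from the hand calculation because SciPy applies a tie correction and a continuity correction.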
Chapter 14 Review
Goals of experiments
• Eliminate bias
• Reduce sampling error (increase precision and power)
[2×2 diagram of target plots: precise vs. imprecise crossed with biased vs. unbiased.]
Design features that reduce bias
• Controls
• Random assignment to treatments
• Blinding
Controls
A group which is identical to the experimental treatment in all respects
aside from the treatment itself.
Randomization
The random assignment of treatments to
units in an experimental study
Breaks the association between possible confounding variables and the
explanatory variable.
Randomization
[Diagram: Supplemental oxygen (explanatory variable) → Survive Mt. Everest (response variable), with Preparedness as a possible confounding variable linked to both.]
Blinding
• Preventing the patient and/or the experimenter from knowing which treatment is given to whom
– Single-blind: the patient is blinded
– Double-blind: both patient and experimenter are blinded
• Unblinded studies usually find much larger effects (sometimes threefold higher), showing the bias that results from lack of blinding
Reducing sampling error
Increasing the signal-to-noise ratio:

t = (Ȳ1 − Ȳ2) / √( sp²(1/n1 + 1/n2) )

The numerator, Ȳ1 − Ȳ2, is the "signal"; the denominator is the "noise".

If the "noise" is smaller, it is easier to detect a given "signal". A smaller denominator, √( sp²(1/n1 + 1/n2) ), can be achieved with a smaller s or a larger n.
Design features that reduce the effects of sampling error
• Replication
• Balance
• Blocking
• Extreme treatments
Replication
The application of every treatment to
multiple, independent experimental units
Balance
In a balanced experimental design, all
treatments have equal sample size.
Balance increases precision
SE(Ȳ1 − Ȳ2) = √( sp²(1/n1 + 1/n2) )

For a given total sample size (n1 + n2), the standard error is smallest when n1 = n2.
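A quick numerical check (a sketch using illustrative values, sp² = 1 and a total sample size of 20) confirms that the balanced split minimizes the standard error:

```python
import math

def se_diff(sp2, n1, n2):
    # standard error of Ybar1 - Ybar2 with pooled variance sp2
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))

total = 20
# standard error for every possible split of 20 units into two groups
ses = {(n1, total - n1): se_diff(1.0, n1, total - n1)
       for n1 in range(1, total)}
best_split = min(ses, key=ses.get)
print(best_split)  # (10, 10)
```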
Blocking
The grouping of experimental units that have similar properties.
Within each block, treatments are
randomly assigned to experimental units.
Extreme Treatments
• Treatment effects are easiest to detect when they are large.
• Stronger treatments can increase the signal-to-noise ratio.
• Caution: effects may not scale linearly
Experiments with more than one factor
A factor is a single treatment variable whose effects are of interest to the researcher.
Multiple factors are used to:
• Make more efficient use of money and resources
• Estimate effects of interaction between factors
Interaction between explanatory variables
The effect of one variable depends on the
state of a second variable
Example of factorial design and interaction
What if we can’t do experimental studies?
Observational studies are still useful to detect patterns and generate hypotheses
Best observational studies
Minimize bias:
• Controls
• Randomization
• Blinding
Minimize sampling error:
• Replication
• Balance
• Blocking
• Extreme treatments
Matching
Every individual in the treatment group is paired with a control individual having the same or very similar values for the suspected confounding variables.
Matching does not account for all confounding variables (as randomization does), but only those used to match participants.
Chapter 15: ANOVA
Comparing the means of more than two groups
Analysis of variance (ANOVA)
• Like a t-test, but can compare more than two groups
• Asks whether any of two or more means is different from any other.
• In other words, is the variance among groups greater than 0?
Null hypothesis for simple ANOVA
• H0 : Variance among groups = 0
OR
• H0 : µ1 = µ2 = µ3 = µ4 = ... µk
[Frequency distributions for three groups X1, X2, X3 under two scenarios: µ1 = µ2 = µ3 (null hypothesis true) and not all µ's equal.]
H0: all populations have equal means
HA: at least one population mean is different.
ANOVAs vs. t-tests
An ANOVA with 2 groups is mathematically equivalent to a two-tailed 2-sample t-test.
Sampling distribution of Ȳ

If we draw multiple samples from the same population, we are also drawing sample means from an expected distribution.

Population: µ = 67.4, σ = 3.9

µ_Ȳ = µ = 67.4
σ_Ȳ = σ/√n = 3.9/√5 = 1.7
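A small simulation (a sketch using only the standard library, with the population values from the slide and samples of size 5) illustrates this: the standard deviation of many sample means approaches σ/√n = 3.9/√5 ≈ 1.7.

```python
import math
import random
import statistics

random.seed(1)
mu, sigma, n, reps = 67.4, 3.9, 5, 20000

# draw many samples of size n and record each sample mean
means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(reps)]

# the spread of the sample means estimates the standard error
print(round(statistics.stdev(means), 2))  # close to 3.9 / sqrt(5) = 1.74
```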
Under the null hypothesis, the sample mean of each group should vary because of sampling error. The standard deviation of sample means, when the true mean is constant, is the standard error:

σ_X̄ = σ_X / √n

In ANOVA, we work with variances rather than standard deviations. Squaring the standard error, the variance among groups due to sampling error should be:

σ_X̄² = σ_X² / n

If the null hypothesis is not true, the variance among groups should be equal to the variance due to sampling error plus the real variance among population means:

σ_X̄² = σ_X² / n + Variance[µi]

With ANOVA, we test whether the variance among true group means is greater than zero. We do this by asking whether the observed variance among groups is greater than expected by chance (assuming the null hypothesis is true):
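The two cases can be illustrated by simulation (a sketch with made-up population means; σ² = 1 and n = 10 per group, so the sampling-error contribution σ²/n is 0.1):

```python
import random
import statistics

random.seed(2)
sigma, n, reps = 1.0, 10, 5000

def mean_variance_among_groups(mus):
    # average observed variance among the k sample means over many replicates
    results = []
    for _ in range(reps):
        means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
                 for mu in mus]
        results.append(statistics.variance(means))
    return statistics.mean(results)

null_var = mean_variance_among_groups([5, 5, 5])  # H0 true: ~ sigma^2/n = 0.1
alt_var = mean_variance_among_groups([4, 5, 6])   # H0 false: ~ 0.1 + Variance[mu_i]
print(round(null_var, 2), round(alt_var, 2))
```
When the population means differ, the variance among sample means exceeds what sampling error alone would produce, exactly as the formula above predicts.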
σ_X̄² > σ_X² / n ?

Multiplying both sides by n:

n·σ_X̄² > σ_X² ?

Population parameters and their estimates from samples:
n·σ_X̄² is estimated by the "mean squares group", MSgroup.
σ_X² is the variance within groups, estimated by the "mean squares error", MSerror.
Example
It is known that circadian rhythm is affected by the schedule of light to the eyes. Can circadian rhythm also be affected by the schedule of light to the back of the knees?

Treatment  Mean     Standard deviation  n
Control    -0.3088  0.6176              8
Knees      -0.3357  0.7908              7
Eyes       -1.5514  0.7063              7
ANOVA Table
Source of variation  Sum of squares  df  Mean squares  F-ratio  P
Groups (treatment)   7.224           2   3.6122        7.29     0.004
Error                9.415           19  0.4955
Total                16.639          21
An ANOVA table is a convenient way to keep track of the important calculations.
Scientific papers often report ANOVA results with
ANOVA tables.
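The mean squares and F-ratio in the table follow directly from the sums of squares and degrees of freedom (a quick check in Python):

```python
# values from the circadian-rhythm ANOVA table above
ss_group, df_group = 7.224, 2
ss_error, df_error = 9.415, 19

ms_group = ss_group / df_group  # mean squares group
ms_error = ss_error / df_error  # mean squares error
f_ratio = ms_group / ms_error   # test statistic

print(round(ms_group, 4), round(ms_error, 4), round(f_ratio, 2))
```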
Partitioning the sum of squares
For each data point:

Yij − Ȳ = (Ȳi − Ȳ) + (Yij − Ȳi)

For the total variation in the dataset:

SStotal = Σi Σj (Yij − Ȳ)² = Σi ni(Ȳi − Ȳ)² + Σi Σj (Yij − Ȳi)²
Mean squares group
Abbreviation: MSgroup
Estimates this parameter: n(σ_X²/n + Variance[µi])
Formulas:

SSgroup = Σ ni(X̄i − X̄)²

dfgroup = k − 1

MSgroup = SSgroup / dfgroup
Mean squares error
Abbreviation: MSerror
Estimates this parameter: σ_X², the variance within groups
Formula: MSerror = SSerror / dferror

Error sum of squares:

SSerror = Σi Σj (Yij − Ȳi)² = Σ dfi·si² = Σ si²(ni − 1)

Error degrees of freedom:

dferror = Σ dfi = Σ (ni − 1) = N − k

MSerror is like the pooled variance in a 2-sample t-test:

MSerror = SSerror / dferror = Σ si²(ni − 1) / (N − k)

sp² = (df1·s1² + df2·s2²) / (df1 + df2)
Test statistic: F
If H0 is true, then n·σ_X̄² = σ_X². In other words:

F = n·σ_X̄² / σ_X² = 1

But the above are population parameters. We must estimate F from samples with MSgroup / MSerror.

F if the null hypothesis is false:
We test whether the F-ratio is greater than one, as it would be if H0 were false:

F = n(σ_X²/n + Variance[µi]) / σ_X² > 1

But we must take sampling error into account. Often, F calculated from data will be greater than one even when the null hypothesis is true. Hence we must compare F to a null distribution.
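That null distribution is an F distribution with (dfgroup, dferror) degrees of freedom. For the circadian-rhythm example (df = 2 and 19), `scipy.stats.f` gives both the critical value and the P-value (a sketch assuming SciPy):

```python
from scipy import stats

df_group, df_error = 2, 19

# critical value at alpha = 0.05 and P-value for the observed F = 7.29
f_crit = stats.f.ppf(0.95, dfn=df_group, dfd=df_error)
p_value = stats.f.sf(7.29, dfn=df_group, dfd=df_error)

print(round(f_crit, 2), round(p_value, 3))  # 3.52, 0.004
```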
Example: Body temperature of squirrels in low, medium, and hot environments
Wooden & Walsberg (2004) Body temperature and locomotor capacity in a heterothermic rodent. Journal of Experimental Biology 207:41-46.
Squirrel body temperature data (°C)
Cold: 30.4, 31.0, 31.2, 31.0, 31.5, 30.4, 30.6, 31.1, 31.3, 31.9, 31.4, 31.6, 31.5, 31.4, 30.3, 30.5, 30.3, 30.0, 30.8, 31.0
Warm: 36.3, 37.5, 36.9, 37.2, 37.5, 37.7, 37.5, 37.7, 38.0, 38.0, 37.6, 37.4, 37.9, 37.2, 36.3, 36.2, 36.4, 36.7, 36.8, 37.0, 37.7
Hot: 40.7, 40.6, 40.9, 41.1, 41.5, 40.8, 40.5, 41.0, 41.3, 41.5, 41.3, 41.2, 40.7, 40.3, 40.2, 41.3, 40.7, 41.6, 41.5, 40.5
Hypotheses
H0: Mean body temperature is the same for all three groups of squirrels. HA: At least one of the three is different from the others.
Summary data
Group  X̄     s      n
Cold   31.0  0.551  20
Warm   37.2  0.582  21
Hot    41.0  0.430  20

Total sample size: N = Σn = 20 + 21 + 20 = 61
Error mean square for squirrels:

SSerror = Σ dfi·si² = 19(0.551)² + 20(0.582)² + 19(0.430)² = 16.1

dferror = 19 + 20 + 19 = 58

MSerror = 16.1/58 = 0.277
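The same arithmetic in code, computed from the summary standard deviations:

```python
# (standard deviation, n) for the cold, warm, and hot groups
groups = [(0.551, 20), (0.582, 21), (0.430, 20)]

ss_error = sum((n - 1) * s ** 2 for s, n in groups)  # error sum of squares
df_error = sum(n - 1 for s, n in groups)             # N - k
ms_error = ss_error / df_error

print(round(ss_error, 1), df_error, round(ms_error, 3))  # 16.1 58 0.277
```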
Squirrel mean squares group:

X̄ = [20(31.0) + 21(37.2) + 20(41.0)] / (20 + 21 + 20) = 36.4

SSgroup = Σ ni(X̄i − X̄)²
= 20(31.0 − 36.4)² + 21(37.2 − 36.4)² + 20(41.0 − 36.4)²
= 1015.7

dfgroup = k − 1 = 3 − 1 = 2

MSgroup = SSgroup/dfgroup = 1015.7/2 = 507.9
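The grand mean and SSgroup can be recomputed from the summary table (a quick check; carrying the unrounded grand mean gives roughly 1019.8 rather than the slide's 1015.7, a small discrepancy that comes from rounding in the summary statistics):

```python
# (mean, n) for the cold, warm, and hot groups
groups = [(31.0, 20), (37.2, 21), (41.0, 20)]
k = len(groups)

# weighted grand mean over all N = 61 squirrels
grand_mean = sum(m * n for m, n in groups) / sum(n for m, n in groups)
ss_group = sum(n * (m - grand_mean) ** 2 for m, n in groups)
ms_group = ss_group / (k - 1)

print(round(grand_mean, 1), round(ss_group, 1), round(ms_group, 1))
```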
The test statistic for ANOVA is F:

F = MSgroup/MSerror = 507.9/0.277 = 1834.7

MSgroup is always in the numerator; MSerror is always in the denominator.
Compare to Fα(1),df_group,df_error
F0.05(1),2,58 = 3.15. Since 1835 > 3.15, we know P < 0.05 and we can reject the null hypothesis. The variance in sample group means is bigger than expected given the variance within sample groups. Therefore, at least one of the groups has a population mean different from another group.
ANOVA table – squirrel data
Source  SS      df  MS     F       P
Group   1015.7  2   507.9  1834.7  <0.0001
Error   16.1    58  0.277
Total   1031.8  60
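The whole squirrel ANOVA can also be run from the raw data with `scipy.stats.f_oneway` (a sketch assuming SciPy; F lands close to the hand-calculated 1834.7, with small differences from rounding the summary statistics):

```python
from scipy import stats

cold = [30.4, 31.0, 31.2, 31.0, 31.5, 30.4, 30.6, 31.1, 31.3, 31.9,
        31.4, 31.6, 31.5, 31.4, 30.3, 30.5, 30.3, 30.0, 30.8, 31.0]
warm = [36.3, 37.5, 36.9, 37.2, 37.5, 37.7, 37.5, 37.7, 38.0, 38.0, 37.6,
        37.4, 37.9, 37.2, 36.3, 36.2, 36.4, 36.7, 36.8, 37.0, 37.7]
hot = [40.7, 40.6, 40.9, 41.1, 41.5, 40.8, 40.5, 41.0, 41.3, 41.5,
       41.3, 41.2, 40.7, 40.3, 40.2, 41.3, 40.7, 41.6, 41.5, 40.5]

# one-way (single-factor) ANOVA across the three temperature groups
f_stat, p_value = stats.f_oneway(cold, warm, hot)
print(round(f_stat, 1), p_value)
```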
Variation Explained: R2
R² = SSgroup / SStotal
The fraction of variation in Y that is “explained” by differences among groups
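For the squirrel data, R² follows directly from the ANOVA table (a quick check):

```python
# sums of squares from the squirrel ANOVA table
ss_group, ss_total = 1015.7, 1031.8

r_squared = ss_group / ss_total  # fraction of variation explained by groups
print(round(r_squared, 3))  # 0.984
```
Nearly all of the variation in body temperature is explained by the temperature treatment.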
Assumptions of ANOVA
(1) Random samples
(2) Normal distributions for each population
(3) Equal variances for all populations.
Kruskal-Wallis test
• A non-parametric test similar to a single factor ANOVA
• Uses the ranks of the data points
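As a sketch (assuming SciPy), the Kruskal-Wallis test applied to the paper-recycling data from earlier in the lecture gives a result consistent with the Mann-Whitney U-test; with only two groups, the two tests are equivalent in their large-sample form:

```python
from scipy import stats

no_bin = [4, 4, 4, 4, 4, 4, 4, 5, 8, 9, 9, 9, 9, 12, 12, 13,
          14, 14, 14, 14, 15, 23]
bin_present = [4, 5, 8, 8, 8, 9, 9, 9, 12, 14, 14, 15, 16, 19,
               23, 28, 40, 43, 129, 130]

# rank-based test of whether the group distributions differ
h_stat, p_value = stats.kruskal(no_bin, bin_present)
print(round(h_stat, 2), round(p_value, 3))
```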