Post on 26-Sep-2020
Unit 4: Inference for numerical variables
Lecture 4: ANOVA
Statistics 104
Mine Cetinkaya-Rundel
October 24, 2013
Announcements
PA opens at 5pm today, due Sat evening (based on feedback on midterm evals)
If I still have your midterm or project proposal, pick it up at the end of class
New unit next week...
Statistics 104 (Mine Cetinkaya-Rundel) U4 - L4: ANOVA October 24, 2013 2 / 1
The t distribution for the difference of two means
Diamonds
        0.99 carat   1 carat
        (pt99)       (pt100)
x̄       44.50        53.43
s       13.32        12.22
n       23           30
[Figure: point price by carat group (carat = 0.99 and carat = 1), axis from 20 to 80]
These data are a random sample from the diamonds data set in the ggplot2 R package.
The t distribution for the difference of two means Hypothesis testing for the difference of two means
From last time..
We are interested in finding out if the average point price of a 1 carat diamond is higher than the average point price of a 0.99 carat diamond.
H0 :µpt99 = µpt100
HA :µpt99 < µpt100
SE = 3.56
T = −2.508
df = 22
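The three numbers above can be reproduced from the summary table. This is a minimal Python sketch (the course itself uses R); the variable names are illustrative, and df uses the conservative min(n1 − 1, n2 − 1) rule that appears later in these slides. With an unrounded SE the statistic comes out at about −2.507, against −2.508 when SE is first rounded to 3.56.

```python
import math

# Summary statistics from the diamonds table above.
mean_99, sd_99, n_99 = 44.50, 13.32, 23      # 0.99 carat (pt99)
mean_100, sd_100, n_100 = 53.43, 12.22, 30   # 1 carat (pt100)

# Standard error of the difference between two independent sample means.
se = math.sqrt(sd_99**2 / n_99 + sd_100**2 / n_100)

# Test statistic under H0 (true difference of 0) and the conservative df.
t_stat = (mean_99 - mean_100) / se
df = min(n_99 - 1, n_100 - 1)

print(f"SE = {se:.2f}, T = {t_stat:.3f}, df = {df}")
```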
p-value
Clicker question
Which of the following is the correct p-value for this hypothesis test?
T = −2.508
(a) between 0.005 and 0.01
(b) between 0.01 and 0.025
(c) between 0.02 and 0.05
(d) between 0.01 and 0.02
one tail    0.100  0.050  0.025  0.010  0.005
two tails   0.200  0.100  0.050  0.020  0.010
df   21     1.32   1.72   2.08   2.52   2.83
     22     1.32   1.72   2.07   2.51   2.82
     23     1.32   1.71   2.07   2.50   2.81
     24     1.32   1.71   2.06   2.49   2.80
     25     1.32   1.71   2.06   2.49   2.79
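Reading the table by hand amounts to finding the two critical values in the df = 22 row that surround |T|. A small Python sketch of that lookup follows; the function name bracket_one_tail_p is hypothetical, and the lists are copied from the table.

```python
# One-tail probabilities (column headers) and critical values for df = 22,
# copied from the t table above.
one_tail = [0.100, 0.050, 0.025, 0.010, 0.005]
critical_df22 = [1.32, 1.72, 2.07, 2.51, 2.82]

def bracket_one_tail_p(t_abs, tails, criticals):
    """Return (lower, upper) bounds on the one-tail p-value from a table row."""
    lower, upper = 0.0, 1.0
    for p, crit in zip(tails, criticals):
        if t_abs >= crit:
            upper = p   # statistic is beyond this cutoff, so p-value <= p
        else:
            lower = p   # statistic falls short of this cutoff, so p-value > p
            break
    return lower, upper

bounds = bracket_one_tail_p(abs(-2.508), one_tail, critical_df22)
print(bounds)  # (0.01, 0.025)
```

Since 2.07 < 2.508 < 2.51, the one-tail p-value falls between 0.01 and 0.025.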
The t distribution for the difference of two means Confidence intervals for the difference of two means
Application exercise: t interval for comparing means
The equivalent confidence level for a one-sided HT with α = 0.05 is 90%. Calculate a 90% confidence interval for the average difference between the point prices of 0.99 and 1 carat diamonds, and choose the closest answer below. Then, interpret this interval in context of the data.
(a) (-15.05, -2.81)
(b) (-15.91, -1.95)
(c) (-16.30, -1.56)
(d) (-15.05, 2.81)
(e) (-16.30, 1.56)
Solution
one tail    0.100  0.050  0.025  0.010  0.005
two tails   0.200  0.100  0.050  0.020  0.010
df   21     1.32   1.72   2.08   2.52   2.83
     22     1.32   1.72   2.07   2.51   2.82
     23     1.32   1.71   2.07   2.50   2.81
     24     1.32   1.71   2.06   2.49   2.80
     25     1.32   1.71   2.06   2.49   2.79
$$(\bar{x}_{pt99} - \bar{x}_{pt100}) \pm t^\star_{df} \times SE = (44.50 - 53.43) \pm 1.72 \times 3.56 = -8.93 \pm 6.12 = (-15.05, -2.81)$$
We are 90% confident that the average point price of a 0.99 carat diamond is $2.81 to $15.05 lower than the average point price of a 1 carat diamond.
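The interval arithmetic can be checked in a few lines. A minimal Python sketch, using the rounded SE = 3.56 and the critical value t* = 1.72 (df = 22, two tails 0.100) read from the table; the variable names are illustrative.

```python
# Point estimate, critical value, and standard error from the slides.
point_estimate = 44.50 - 53.43   # x̄_pt99 − x̄_pt100
t_star = 1.72                    # df = 22, 90% confidence
se = 3.56

margin = t_star * se
ci = (point_estimate - margin, point_estimate + margin)

print(f"({ci[0]:.2f}, {ci[1]:.2f})")  # (-15.05, -2.81)
```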
Synthesis
How (if at all) would this conclusion change your behaviour if you went diamond shopping?
Maybe buy a 0.99 carat diamond? It looks like a 1 carat, but is significantly cheaper.
http://rstudio-pubs-static.s3.amazonaws.com/2176_75884214fc524dc0bc2a140573da38bb.html
ANOVA Classy vocabulary
The GSS gives the following 10-question vocabulary test:
A SPACE (school, noon, captain, room, board, don’t know)
B BROADEN (efface, make level, elapse, embroider, widen, don’t know)
C EMANATE (populate, free, prominent, rival, come, don’t know)
D EDIBLE (auspicious, eligible, fit to eat, sagacious, able to speak, don’t know)
E ANIMOSITY (hatred, animation, disobedience, diversity, friendship, don’t know)
F PACT (puissance, remonstrance, agreement, skillet, pressure, don’t know)
G CLOISTERED (miniature, bunched, arched, malady, secluded, don’t know)
H CAPRICE (value, a star, grimace, whim, inducement, don’t know)
I ACCUSTOM (disappoint, customary, encounter, get used to, business, don’t know)
J ALLUSION (reference, dream, eulogy, illusion, aria, don’t know)
[Figure: distribution of vocabulary scores, 0 to 10]
The GSS also asks the following question: “If you were asked to use one of four names for your social class, which would you say you belong in: the lower class, the working class, the middle class, or the upper class?”
[Figure: distribution of (self reported) class (LOWER CLASS, WORKING CLASS, MIDDLE CLASS, UPPER CLASS), proportions from 0.0 to 0.5]
Data
      wordsum   class
1     6         MIDDLE CLASS
2     9         WORKING CLASS
3     6         WORKING CLASS
4     5         WORKING CLASS
5     6         WORKING CLASS
6     6         WORKING CLASS
7     8         MIDDLE CLASS
8     10        WORKING CLASS
9     8         WORKING CLASS
10    9         UPPER CLASS
· · ·
795   9         MIDDLE CLASS
Exploratory analysis
[Figure: side-by-side box plots of vocabulary score (0 to 10) by class: LOWER CLASS, WORKING CLASS, MIDDLE CLASS, UPPER CLASS]

                 n     mean   sd
lower class      41    5.07   2.24
working class    407   5.75   1.87
middle class     331   6.76   1.89
upper class      16    6.19   2.34
overall          795   6.14   1.98
ANOVA ANOVA and the F test
Clicker question
Which of the following plots shows groups with means that are most and least likely to be significantly different from each other?
[Figure: three plots of grouped data, labeled I, II, and III]
(a) most: I, least: II
(b) most: II, least: III
(c) most: I, least: III
(d) most: III, least: II
(e) most: II, least: I
Research question
Is there a difference between the average vocabulary scores of Americans from different (self reported) classes?
To compare means of 2 groups we use a Z or a T statistic.
To compare means of 3+ groups we use a new test called ANOVA and a new statistic called F.
ANOVA - hypotheses
H0 : The mean outcome is the same across all categories,
µ1 = µ2 = · · · = µk ,
where µi represents the mean of the outcome for observations in category i.
HA : At least one pair of means differ from each other.
z/t test vs. ANOVA - Purpose
z/t test
Compare means from two groups to see whether they are so far apart that the observed difference cannot reasonably be attributed to sampling variability.
H0 : µ1 = µ2
ANOVA
Compare the means from two or more groups to see whether they are so far apart that the observed differences cannot all reasonably be attributed to sampling variability.
H0 : µ1 = µ2 = · · · = µk
z/t test vs. ANOVA - Method
z/t test
Compute a test statistic (a ratio).
$$z/t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{SE(\bar{x}_1 - \bar{x}_2)}$$
ANOVA
Compute a test statistic (a ratio).
$$F = \frac{\text{variability bet. groups}}{\text{variability w/in groups}}$$
Large test statistics lead to small p-values.
If the p-value is small enough, H0 is rejected, and we conclude that the population means are not equal.
F distribution and p-value
$$F = \frac{\text{variability bet. groups}}{\text{variability w/in groups}}$$
In order to be able to reject H0, we need a small p-value, which requires a large F statistic.
In order to obtain a large F statistic, the variability between sample means needs to be greater than the variability within the samples (groups).
Goal: Determine measures of variability between and within groups, so that we can make a decision on the hypotheses based on how they compare to each other.
ANOVA ANOVA output, deconstructed
                      Df    Sum Sq    Mean Sq   F value   Pr(>F)
(Group) class         3     236.56    78.855    21.735    <0.0001
(Error) Residuals     791   2869.80   3.628
        Total         794   3106.36
Sum of squares total, SST
Measures the total variability in the data
$$SST = \sum_{i=1}^{n} (x_i - \bar{x})^2$$
where x_i represents the value of the response variable for each observation in the dataset and x̄ is the overall (grand) mean.
[Very similar to the calculation of variance, except not scaled by the sample size.]
$$SST = (6 - 6.14)^2 + (9 - 6.14)^2 + \cdots + (9 - 6.14)^2 = 3106.36$$
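The link between SST and the variance can be sketched in Python. The full 795-row data set is not reproduced in these slides, so the example below uses only the ten wordsum values printed in the Data table, purely as an illustration; the helper name sst is hypothetical. It also shows that (n − 1) × s² from the overall summary row lands close to the table's 3106.36 (the gap comes from rounding s to 1.98).

```python
import statistics

def sst(values):
    """Total sum of squares: squared deviations from the overall mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

# First ten wordsum values shown in the Data table (illustration only,
# not the full sample).
first_rows = [6, 9, 6, 5, 6, 6, 8, 10, 8, 9]

sst_rows = sst(first_rows)
var_rows = statistics.variance(first_rows)   # sample variance, divides by n - 1

# SST is the sample variance without the division by (n - 1).
print(sst_rows, (len(first_rows) - 1) * var_rows)

# Same identity with the overall summary statistics (n = 795, sd = 1.98).
approx_sst_full = (795 - 1) * 1.98 ** 2
print(round(approx_sst_full, 1))
```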
Sum of squares between groups, SSG
Measures the variability between groups, i.e. how the group means compare to the grand mean
$$SSG = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2$$
n_j: each group size, x̄_j: average for each group, x̄: overall (grand) mean
[Explained variability: deviation of group mean from overall mean, weighted by sample size.]

                 n     mean   sd
lower class      41    5.07   2.24
working class    407   5.75   1.87
middle class     331   6.76   1.89
upper class      16    6.19   2.34
overall          795   6.14   1.98

$$SSG = \big(41 \times (5.07 - 6.14)^2\big) + \big(407 \times (5.75 - 6.14)^2\big) + \big(331 \times (6.76 - 6.14)^2\big) + \big(16 \times (6.19 - 6.14)^2\big) \approx 236.56$$
(236.56 is the value in the ANOVA table; plugging in the rounded means shown here gives about 236.12.)
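The SSG sum can be evaluated directly from the summary table. A short Python sketch, with an illustrative dictionary layout; because the group means are rounded to two decimals, the result is about 236.12 rather than the 236.56 reported in the ANOVA output.

```python
# (group size, group mean) from the summary table.
groups = {
    "lower class": (41, 5.07),
    "working class": (407, 5.75),
    "middle class": (331, 6.76),
    "upper class": (16, 6.19),
}
grand_mean = 6.14

# Squared deviation of each group mean from the grand mean,
# weighted by the group size.
ssg = sum(n * (mean - grand_mean) ** 2 for n, mean in groups.values())

print(round(ssg, 2))  # 236.12
```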
Sum of squares error, SSE
Measures the variability within groups:
SSE = SST − SSG
[Unexplained variability, i.e. unexplained by the group variable, due to other reasons]
SSE = 3106.36 − 236.56 = 2869.80
Now we need a way to get from these measures of total variability to average variability: scaling by a measure that incorporates sample sizes and number of groups → degrees of freedom.
Application exercise: ANOVA output
Fill in the rest of the ANOVA table, and make a decision on the hypotheses. Submit your decision using your clicker.
The data provide convincing evidence that the:
(a) average vocabulary scores are different for all classes.
(b) average vocabulary score for middle class is higher than the average for the lower class.
(c) average vocabulary score is different for at least one pair of classes.
(d) average vocabulary scores are the same for all classes.
(e) average vocabulary scores are different for upper and lower classes.
Note that you will need access to R to calculate the p-value. You can use the following function:
> pf(F_score, df_group, df_error, lower.tail = FALSE)
Relevant formulas
Degrees of freedom associated with ANOVA
groups: dfG = k − 1, where k is the number of groups
total: dfT = n − 1, where n is the total sample size
error: dfE = dfT − dfG
Mean squares
Associated sum of squares divided by the associated df: MS = SS/df
Test statistic, F value
Ratio of the between group and within group variability: F = MSG / MSE
p-value
Probability of at least as large a ratio between the “between group” and “within group” variability as the one observed, if in fact the means of all groups are equal. It is calculated as the area under the F curve, with degrees of freedom dfG and dfE, above the observed F statistic.
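The formulas above can be chained to fill in the table from the two sums of squares. This is a Python sketch with illustrative variable names, not the R workflow the course uses. The p-value step is optional and assumes SciPy is installed; scipy.stats.f.sf plays the role of pf(..., lower.tail = FALSE) in R.

```python
# Inputs from the slides: 4 classes, 795 respondents, SSG and SST.
k, n = 4, 795
ssg, sst = 236.56, 3106.36

# Degrees of freedom.
df_g = k - 1
df_t = n - 1
df_e = df_t - df_g

# Sum of squares error, mean squares, and the F statistic.
sse = sst - ssg
msg = ssg / df_g
mse = sse / df_e
f_stat = msg / mse

print(df_g, df_e, round(sse, 2), round(msg, 3), round(mse, 3), round(f_stat, 2))

# Optional: upper-tail area under the F curve (requires SciPy).
try:
    from scipy.stats import f
    p_value = f.sf(f_stat, df_g, df_e)
except ImportError:
    p_value = None
```

Small differences from the printed table (78.855, 21.735) come from working with sums of squares rounded to two decimals.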
Solution
                      Df    Sum Sq    Mean Sq   F value   Pr(>F)
(Group) class         3     236.56    78.855    21.735    <0.0001
(Error) Residuals     791   2869.80   3.628
        Total         794   3106.36
ANOVA Checking conditions
(1) independence
If the data are a simple random sample from less than 10% of the population, this condition is satisfied.
Carefully consider whether the data may be independent (e.g. no pairing).
Always important, but sometimes difficult to check.
Does this condition appear to be satisfied?
(2) approximately normal
The observations within each group should be nearly normal (especially important when the sample sizes are small).
Does this condition appear to be satisfied?
[Figure: four normal Q-Q plots (theoretical quantiles vs. sample quantiles), one per class]
(3) constant variance
The variability across the groups should be about equal (especially important when the sample sizes differ between groups).
Does this condition appear to be satisfied?
[Figure: side-by-side box plots of vocabulary score (0 to 10) by class: LOWER CLASS, WORKING CLASS, MIDDLE CLASS, UPPER CLASS]

                 n     mean   sd
lower class      41    5.07   2.24
working class    407   5.75   1.87
middle class     331   6.76   1.89
upper class      16    6.19   2.34
Multiple comparisons & Type 1 error rate
Which means differ?
Earlier we concluded that at least one pair of means differ. The natural question that follows is “which ones?”
We can do two sample t tests for differences in each possible pair of groups.
Can you see any pitfalls with this approach?
The more tests we run, the higher the chance of making at least one Type 1 Error.
This issue is resolved by using a modified significance level.
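A rough feel for the inflation, as a Python sketch: if each of K tests is run at level α and the tests are treated as independent (an assumption made here only for illustration; pairwise tests that share groups are not independent), the chance of at least one Type 1 Error is 1 − (1 − α)^K.

```python
alpha = 0.05      # significance level of each individual test
num_tests = 6     # e.g. all pairs among 4 groups

# Probability of at least one false rejection across all tests,
# assuming independent tests (illustrative simplification).
familywise_error = 1 - (1 - alpha) ** num_tests

print(round(familywise_error, 4))  # 0.2649
```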
Multiple comparisons
The scenario of testing many pairs of groups is called multiple comparisons.
The Bonferroni correction suggests that a more stringent significance level is more appropriate for these tests:
α∗ = α/K
where K is the number of comparisons being considered.
If there are k groups, then usually all possible pairs are compared and K = k(k−1)/2.
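The two formulas combine into a one-line calculation. A minimal Python sketch; the function name bonferroni_alpha is hypothetical.

```python
def bonferroni_alpha(alpha, k):
    """Return (K, alpha*) when all pairs among k groups are compared."""
    comparisons = k * (k - 1) // 2   # K = k(k - 1) / 2
    return comparisons, alpha / comparisons

# Four self-reported classes in the vocabulary example.
num_pairs, alpha_star = bonferroni_alpha(0.05, 4)
print(num_pairs, round(alpha_star, 4))  # 6 0.0083
```

With 3 groups the same function gives K = 3 and α∗ ≈ 0.0167.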
Determining the modified α
Clicker question
In the GSS data set, class has 4 levels: lower, working, middle, and upper class. If α = 0.05, what should be the modified significance level for two sample t tests for determining which pairs of groups have significantly different means?
(a) α∗ = 0.05
(b) α∗ = 0.05/2 = 0.025
(c) α∗ = 0.05/4 = 0.0125
(d) α∗ = 0.05/6 = 0.0083
Which means differ?
Based on the box plots below, which means would you expect to be significantly different?
[Figure: side-by-side box plots of vocabulary score (0 to 10) by class: LOWER CLASS, WORKING CLASS, MIDDLE CLASS, UPPER CLASS]
Which means differ? (cont.)
When doing multiple comparisons after ANOVA, since the assumption of equal variability across groups must have been satisfied, we re-think how we measure the standard error and the degrees of freedom.
For all comparisons, use a consistent
SE: calculate SE using s_pooled = √MSE instead of s1 and s2.
$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \;\rightarrow\; SE = \sqrt{\frac{MSE}{n_1} + \frac{MSE}{n_2}}$$
df: use df = dfE from ANOVA instead of df calculated based on individual sample sizes n1 and n2.
$$df = \min(n_1 - 1, n_2 - 1) \;\rightarrow\; df = df_E$$
Finally, compare the p-value of this test to the modified significance level (α∗).
[Time permitting] Is there a difference between the average vocabulary scores of middle and lower class Americans?
$$T_{df_E} = \frac{\bar{x}_{middle} - \bar{x}_{lower}}{\sqrt{\frac{MSE}{n_{middle}} + \frac{MSE}{n_{lower}}}}$$
$$T_{791} = \frac{6.76 - 5.07}{\sqrt{\frac{3.628}{331} + \frac{3.628}{41}}} = \frac{1.69}{0.315} = 5.365$$
p-value = 1.06 × 10^−7 (two-sided)
α∗ = 0.05/6 = 0.0083
Reject H0: the data provide convincing evidence of a difference between the average vocabulary scores of those from the lower and middle classes.
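The calculation above can be retraced in Python. This is a sketch with illustrative variable names; the exact p-value needs the t distribution with 791 df, so the code below swaps in a standard normal approximation (reasonable with df this large, but an approximation and not the method used on the slide). Without rounding SE to 0.315 first, T comes out at about 5.36.

```python
import math
from statistics import NormalDist

# Values from the ANOVA table and the summary table.
mse, df_e = 3.628, 791
mean_middle, n_middle = 6.76, 331
mean_lower, n_lower = 5.07, 41

# Pooled standard error: MSE replaces both sample variances.
se = math.sqrt(mse / n_middle + mse / n_lower)
t_stat = (mean_middle - mean_lower) / se

# Two-sided p-value via the normal approximation to t with 791 df.
p_approx = 2 * NormalDist().cdf(-abs(t_stat))

# Bonferroni-adjusted significance level for all 6 pairs of classes.
alpha_star = 0.05 / 6
reject_h0 = p_approx < alpha_star

print(f"SE = {se:.3f}, T = {t_stat:.2f}, reject H0: {reject_h0}")
```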