AP Statistics: ANOVA Section 1
In section 13.1 A, we used a t-test to compare the means between
two groups. An ANOVA (ANalysis Of VAriance) test is used to test for
a difference in means among several groups.
EXAMPLE: As a young student in Australia, Dominic Kelly enjoyed watching ants gather on
pieces of sandwich. Later, as a university student, he decided to conduct a formal experiment. He chose three types of sandwich filling: Vegemite,
peanut butter, and ham & pickles. To conduct the experiment, he randomly chose a sandwich,
broke off a piece, and left it on the ground near an anthill. After several minutes, he placed a jar over the sandwich bit and counted the number of ants. He repeated this process until he had 8 samples for each of the three sandwich fillings.
We wish to determine if the differences in the means are statistically significant. Hypotheses:
$H_0: \mu_V = \mu_{PB} = \mu_H$
$H_a:$ At least one $\mu_i$ is different from the others,
where each $\mu_i$ is the mean number of ants gathered around the bit of sandwich.
The means and standard deviations for the data above are given at the right. We can see that the sample means are different and that the mean for ham & pickles is quite a bit larger than the others.
BUT, is the difference statistically significant?
An assessment of the difference in means between several groups depends upon two
kinds of variability: how different the means are from each other AND the
amount of variability within each sample.
The basic idea of ANOVA is to split the total variability into these two parts:
TOTAL variability of all data values (SST) = Variability BETWEEN groups (SSG) + Variability WITHIN groups (SSE)
$SST = SSG + SSE$
SST: sum of squares total
SSG: sum of squares for groups
SSE: sum of squares for error
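The split SST = SSG + SSE can be checked numerically. The sketch below is a minimal illustration, assuming the data are given as a plain list of groups; the function name `sums_of_squares` is hypothetical. It uses the Dataset A values from the two-group example later in this handout.

```python
def sums_of_squares(groups):
    """Return (SST, SSG, SSE) for a list of groups, each a list of numbers."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # SST: squared deviation of every value from the grand mean
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    # SSG: squared deviation of each group mean from the grand mean, weighted by group size
    ssg = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # SSE: squared deviation of each value from its own group mean
    sse = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        sse += sum((x - group_mean) ** 2 for x in g)
    return sst, ssg, sse

# Dataset A from the two-group example in this handout
group1 = [12, 20, 8, 21, 14]
group2 = [25, 18, 15, 28, 14]
sst, ssg, sse = sums_of_squares([group1, group2])
print(sst, ssg, sse)  # SST = 336.5, SSG = 62.5, SSE = 274.0
```

Here 62.5 + 274.0 = 336.5, so the between-group and within-group pieces add up exactly to the total.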
The variability between the groups (SSG) is a good measure of how much the group means vary, but we need to balance that against the background variation within
the groups (SSE).
We cannot compare these two pieces of variation directly, however. In our example, SSG is computed using only the 3
group means, while SSE is computed using all 24 data values (remember, if $H_0$ is
true, these 24 values come from populations with the same mean).
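To put them on comparable scales, we divide each sum of squares by its degrees of freedom. Here k is the number of groups and n is the total number of data values (k = 3 and n = 24 in our example).
Degrees of freedom for groups = k – 1
Degrees of freedom for error = n – k
Note: df for groups + df for error = total df, since $(k - 1) + (n - k) = n - 1$.
We can now compute the mean square for groups (MSG) and the mean square for
error (MSE):
$MSG = \frac{SSG}{k - 1}$, $MSE = \frac{SSE}{n - k}$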
An ANOVA test compares variability within groups to variability between groups. If the
ratio of variability between groups to variability within groups is higher than we would expect just by random chance, we have evidence of a difference in means.
This ratio is called the F-statistic: $F = \frac{MSG}{MSE}$
The table below is called an analysis of variance (ANOVA) table. The sums of squares for the example above are given. Fill in the missing parts of the table.
Completed table (SS Total = 1561 + 2913 = 4474):
Source    df              SS      MS                       F
Groups    3 – 1 = 2       1561    MSG = 1561/2 = 780.5     F = 780.5/138.7 = 5.63
Error     24 – 3 = 21     2913    MSE = 2913/21 = 138.7
Total     24 – 1 = 23     4474
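The arithmetic for filling in the ANOVA table can be sketched in Python. This is a minimal sketch, assuming only the two sums of squares, the number of groups k, and the total sample size n are known; the function name `anova_table` and its dictionary keys are hypothetical.

```python
def anova_table(ssg, sse, k, n):
    """Fill in a one-way ANOVA table from SSG, SSE, k groups, and n values."""
    df_groups = k - 1        # degrees of freedom for groups
    df_error = n - k         # degrees of freedom for error
    msg = ssg / df_groups    # mean square for groups
    mse = sse / df_error     # mean square for error
    return {
        "df": (df_groups, df_error, n - 1),   # groups, error, total
        "SS": (ssg, sse, ssg + sse),          # groups, error, total
        "MSG": msg,
        "MSE": mse,
        "F": msg / mse,
    }

# Sandwich example: k = 3 fillings, n = 24 sandwich pieces
table = anova_table(ssg=1561, sse=2913, k=3, n=24)
print(table["df"])             # (2, 21, 23)
print(table["SS"])             # (1561, 2913, 4474)
print(round(table["MSG"], 1))  # 780.5
print(round(table["MSE"], 1))  # 138.7
print(round(table["F"], 2))    # 5.63
```

Computing F from the unrounded MSE (138.714...) still gives 5.63 to two decimal places.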
Note that our test statistic is F = MSG/MSE, so we will need to use the F-distribution to find our p-value. Like the χ²-distribution, the
F-distribution is right-skewed. When referencing the F-distribution, the
numerator degrees of freedom are always given first, and the denominator degrees
of freedom are given second.
Note: The test statistic is always positive, and the test is always right-tailed.
We can use the F-distribution to find the p-value when the following conditions are true:
1. The data from each population should follow a Normal distribution. (Watch for clear skewness or outliers if the sample size is small.)
2. The variability should be roughly the same within each group. (General rule: the largest sd is not more than twice the smallest sd.)
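The equal-variability rule of thumb in condition 2 is a one-line check. A minimal sketch, with the function name `equal_variance_ok` chosen for illustration:

```python
def equal_variance_ok(std_devs):
    """Rule of thumb: the largest sample sd is not more than twice the smallest."""
    return max(std_devs) <= 2 * min(std_devs)

# Largest and smallest sample sds from the sandwich example
print(equal_variance_ok([9.25, 14.63]))  # True, since 14.63 <= 2 * 9.25 = 18.5
```

Only the largest and smallest standard deviations enter the rule, so the two values reported in the Conditions section are enough to apply it.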
Conditions:
The dotplots at the right show no real concerns about Normality.
The largest st. dev. (14.63) is less than twice the smallest st. dev. (9.25), since 2 × 9.25 = 18.5.
Calculations:
TI-83/84: 2nd VARS (DISTR) 9:Fcdf(lower limit, upper limit, df-numerator, df-denominator)
Fcdf(5.63, 10000, 2, 21) = .011
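Without a calculator, the same right-tail area can be computed directly. This sketch relies on a special case: when the numerator df is 2 (as here, k – 1 = 2), the F right-tail probability has the closed form (1 + 2F/df_denom) raised to the power –df_denom/2. The function name is hypothetical, and the formula does NOT apply for other numerator df; for general df, `scipy.stats.f.sf(F, dfn, dfd)` gives the right-tail area. The calculator's upper limit of 10000 simply stands in for infinity.

```python
def f_right_tail_df1_2(f_stat, df_denom):
    """Right-tail area P(F > f_stat) for an F-distribution with df (2, df_denom).

    Valid ONLY when the numerator df is 2: in that case the F right-tail
    probability reduces to (1 + 2*f/df_denom) ** (-df_denom/2).
    """
    return (1 + 2 * f_stat / df_denom) ** (-df_denom / 2)

# Sandwich example: F = 5.63 with df = (2, 21)
p_value = f_right_tail_df1_2(5.63, 21)
alpha = 0.05
print(round(p_value, 3))  # 0.011
print("reject H0" if p_value < alpha else "fail to reject H0")
```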
Conclusion:
Since .011 < .05, we reject $H_0$. We conclude that the mean number of ants differs for at least two of the sandwich fillings.
Example: Two sets of sample data, A and B, are given. Without doing any calculations, indicate in which set of sample data there
is likely to be stronger evidence of a difference in the two population means.
            Dataset A                 Dataset B
      Group 1    Group 2        Group 1    Group 2
        12         25             15         20
        20         18             14         21
         8         15             16         19
        21         28             15         19
        14         14             15         21
    x̄₁ = 15    x̄₂ = 20        x̄₁ = 15    x̄₂ = 20
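Answer: The variability between groups is pretty much the same in both datasets, since both have group means of 15 and 20. However, the variability within groups is much smaller in Dataset B. Dataset B provides stronger evidence of a difference in the two population means.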
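The question asks for no calculations, but the F-statistic confirms the reasoning. A minimal sketch, with the function name `one_way_f` chosen for illustration: because both datasets have the same group means, SSG is 62.5 in each, and the F-statistics differ only through MSE.

```python
def one_way_f(groups):
    """One-way ANOVA F-statistic, MSG / MSE, computed from raw data."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssg / (k - 1)) / (sse / (n - k))

dataset_a = [[12, 20, 8, 21, 14], [25, 18, 15, 28, 14]]
dataset_b = [[15, 14, 16, 15, 15], [20, 21, 19, 19, 21]]
print(round(one_way_f(dataset_a), 2))  # 1.82
print(round(one_way_f(dataset_b), 2))  # 83.33
```

Dataset B's much smaller within-group variability (SSE = 6 versus 274) produces a far larger F, matching the conclusion that B gives stronger evidence.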