
AP Statistics: ANOVA Section 1

In section 13.1 A, we used a t-test to compare the means between two groups. An ANOVA (ANalysis Of VAriance) test is used to test for a difference in means among several groups.

EXAMPLE: As a young student in Australia, Dominic Kelly enjoyed watching ants gather on pieces of sandwich. Later, as a university student, he decided to conduct a formal experiment. He chose three types of sandwich filling: Vegemite, peanut butter, and ham & pickles. To conduct the experiment, he randomly chose a sandwich, broke off a piece, and left it on the ground near an anthill. After several minutes, he placed a jar over the sandwich bit and counted the number of ants. He repeated this process until he had 8 samples for each of the three sandwich fillings.

We wish to determine if the differences in the means are statistically significant. Hypotheses:

$H_0: \mu_V = \mu_{PB} = \mu_{HP}$, where each $\mu_i$ is the mean number of ants gathered around the bit of sandwich

$H_a$: At least one $\mu_i$ is different from the others

The means and the standard deviations for the data above are given at the right. We can see that the sample means are different and that the mean of ham & pickles is quite a bit larger than the others.

But is the difference statistically significant?

An assessment of the difference in means between several groups depends upon two kinds of variability: how different the means are from each other AND the amount of variability within each sample.

The basic idea of ANOVA is to split the total variability into these two parts:

TOTAL variability of all data values (SST) = Variability BETWEEN groups (SSG) + Variability WITHIN groups (SSE)

SST: sum of squares total
SSG: sum of squares group
SSE: sum of squares error
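To see the split numerically, here is a minimal Python sketch. The helper name `sums_of_squares` is hypothetical (not from this handout), and the data are the Dataset A values from the two-group example later in this section. SSG weights each group mean's squared distance from the grand mean by the group size, and SSE measures each value's squared distance from its own group mean.

```python
from statistics import fmean

def sums_of_squares(groups):
    """Return (SST, SSG, SSE) for a list of groups (one list of values per group)."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    # SST: squared distance of every value from the grand mean
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    # SSG: squared distance of each group mean from the grand mean, weighted by group size
    ssg = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # SSE: squared distance of each value from its own group mean
    sse = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return sst, ssg, sse

# Dataset A: Group 1 and Group 2
dataset_a = [[12, 20, 8, 21, 14], [25, 18, 15, 28, 14]]
sst, ssg, sse = sums_of_squares(dataset_a)
print(sst, ssg, sse)  # 336.5 62.5 274.0  (and 62.5 + 274.0 = 336.5)
```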

The variability between the groups (SSG) is a good measure of how much the group means vary, but we need to balance it against the background variation within the groups (SSE).

We cannot compare these two pieces of variation directly, however. In our example, SSG is computed using only the 3 group means, while SSE is computed using all 24 data values (remember, if we assume H0 is true, these 24 values come from populations with the same mean).

To put them on comparable scales, we use degrees of freedom. With k groups and n total observations:

Degrees of freedom for groups = k – 1

Degrees of freedom for error = n – k

Note: df for groups + df for error = total df, since (k – 1) + (n – k) = n – 1.
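For the ant example, with k = 3 fillings and n = 3 × 8 = 24 observations, the degrees of freedom work out as:

```latex
df_{\text{groups}} = k - 1 = 3 - 1 = 2, \qquad
df_{\text{error}} = n - k = 24 - 3 = 21, \qquad
2 + 21 = 23 = n - 1
```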

We can now compute the mean square for groups (MSG) and the mean square for error (MSE) by dividing each sum of squares by its degrees of freedom:

MSG = SSG / (k – 1)        MSE = SSE / (n – k)

An ANOVA test compares variability within groups to variability between groups. If the ratio of variability between groups to variability within groups is higher than we would expect just by random chance, we have evidence of a difference in means.

This ratio, F = MSG / MSE, is called the F-statistic.

The table at the right is called an analysis of variance table. The information for the sums of squares for the example above is given. Fill in the missing parts of the table.

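Source | df | SS   | MS    | F
Groups | 2  | 1561 | 780.5 | 5.63
Error  | 21 | 2913 | 138.7 |
Total  | 23 | 4474 |       |

MSG = 1561 / 2 = 780.5
MSE = 2913 / 21 ≈ 138.7
F = 780.5 / 138.7 ≈ 5.63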
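The table entries can be checked with a short Python sketch. The function name `anova_table` is hypothetical; it simply applies the df and mean-square formulas above to the given sums of squares (SSG = 1561, SSE = 2913) with k = 3 and n = 24.

```python
def anova_table(ssg, sse, k, n):
    """Fill in a one-way ANOVA table from the sums of squares.

    k = number of groups, n = total number of observations.
    """
    df_groups = k - 1
    df_error = n - k
    msg = ssg / df_groups  # mean square for groups
    mse = sse / df_error   # mean square for error
    return {
        "df": (df_groups, df_error, df_groups + df_error),
        "SS": (ssg, sse, ssg + sse),  # SST = SSG + SSE
        "MSG": msg,
        "MSE": mse,
        "F": msg / mse,
    }

table = anova_table(ssg=1561, sse=2913, k=3, n=24)
print(table["df"], table["SS"])  # (2, 21, 23) (1561, 2913, 4474)
print(round(table["MSG"], 1), round(table["MSE"], 1), round(table["F"], 2))  # 780.5 138.7 5.63
```

Rounding only at the end keeps F at about 5.63, matching the table; rounding MSE first to 138.7 gives the same value to two decimals.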

Note that our test statistic is F = MSG / MSE, so we need to use the F-distribution to find our p-value. Like the χ²-distribution, the F-distribution is right-skewed. When referencing the F-distribution, the numerator degrees of freedom are always given first and the denominator degrees of freedom second; for our example that is F(2, 21).

Note: The test statistic is never negative, and the test is always right-tailed.

We can use the F-distribution to find the p-value when the following conditions are true.

1. The data from each population should follow a Normal distribution. (Watch for clear skewness or outliers if the sample size is small)

2. The variability should be roughly the same within each group. (General Rule: Largest sd not more than twice the smallest sd)

Conditions:

The dotplots at the right show no real concerns about normality.

The largest st. dev. (14.63) is less than twice the smallest st. dev. (9.25).
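The equal-spread rule of thumb from condition 2 is easy to express in code. This is a minimal sketch with a hypothetical function name; only the largest and smallest standard deviations matter for the rule, and those are the two values quoted above.

```python
def sd_rule_ok(sds):
    """Rule of thumb: the largest sample standard deviation should be
    no more than twice the smallest."""
    return max(sds) <= 2 * min(sds)

# Largest and smallest sample standard deviations quoted for the ant data
print(sd_rule_ok([14.63, 9.25]))  # True, since 14.63 <= 2 * 9.25 = 18.5
```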

Calculations:

TI-83/84: 2nd VARS (DISTR) 9:Fcdf(lower limit, upper limit, df-numerator, df-denominator)

Fcdf(5.63, 10000, 2, 21) ≈ .011
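Without a calculator, the same right-tail area can be sketched in Python. This relies on a fact specific to this example: when the numerator df is 2, the F-distribution's upper tail has the closed form P(F > f) = (1 + 2f/df_den)^(-df_den/2), so no special functions are needed. The shortcut does not apply to other numerator df; for the general case, `scipy.stats.f.sf(5.63, 2, 21)` (if SciPy is available) computes the same tail area. The upper limit 10000 in the Fcdf call plays the role of infinity here.

```python
def f_upper_tail_df1_2(f, df_den):
    """P(F > f) for an F-distribution with 2 numerator degrees of freedom.

    Valid ONLY for numerator df = 2, where the tail area has a closed form.
    """
    if f <= 0:
        return 1.0
    return (1 + 2 * f / df_den) ** (-df_den / 2)

p_value = f_upper_tail_df1_2(5.63, 21)  # F = 5.63 with df (2, 21)
alpha = 0.05
print(round(p_value, 3))  # 0.011
print("reject H0" if p_value < alpha else "fail to reject H0")  # reject H0
```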

Conclusion:

Since .011 < .05, we reject H0. We conclude that the mean number of ants differs for at least two of the sandwich fillings.

Example: Two sets of sample data, A and B, are given. Without doing any calculations, indicate in which set of sample data there is likely to be stronger evidence of a difference in the two population means.

A, Group 1 | A, Group 2 | B, Group 1 | B, Group 2
12         | 25         | 15         | 20
20         | 18         | 14         | 21
8          | 15         | 16         | 19
21         | 28         | 15         | 19
14         | 14         | 15         | 21
x̄ = 15     | x̄ = 20     | x̄ = 15     | x̄ = 20


Variability between groups: the same in both datasets (each has group means of 15 and 20).

Variability within groups: much smaller in Dataset B.

Dataset B provides stronger evidence of a difference in the two population means.
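As an after-the-fact check (the exercise itself asks for no calculations), a minimal Python sketch can compute the one-way ANOVA F-statistic for each dataset from the values in the table above. The function name `one_way_f` is hypothetical. Both datasets have the same group means, so MSG is identical; the much smaller within-group spread in Dataset B shrinks MSE and inflates F.

```python
from statistics import fmean

def one_way_f(groups):
    """One-way ANOVA F-statistic: MSG / MSE."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    ssg = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ssg / (k - 1)) / (sse / (n - k))

dataset_a = [[12, 20, 8, 21, 14], [25, 18, 15, 28, 14]]
dataset_b = [[15, 14, 16, 15, 15], [20, 21, 19, 19, 21]]
f_a, f_b = one_way_f(dataset_a), one_way_f(dataset_b)
print(round(f_a, 2), round(f_b, 2))  # 1.82 83.33
```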