Chapter 15 ANOVA. Comparing Means for Several Populations When we wish to test for differences in...

Chapter 15

ANOVA

Comparing Means for Several Populations

When we wish to test for differences in means for only 1 or 2 populations, we use one- or two-sample t inference.

(We did two-sample t inferences in MAT 212)

Testing for differences in 2 or more populations, or at several different levels (values) of a variable, involves a different approach.

This is called Analysis of Variance, or ANOVA.

ANOVA partitions the total sum of squares into two parts:

1. within treatment variability

2. between treatment variability
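The partition can be checked numerically. The sketch below uses small made-up samples (the data and the helper name `sum_of_squares_partition` are illustrative assumptions, not part of these notes) and shows that the total sum of squares equals the between-treatment part plus the within-treatment part.

```python
import math

def sum_of_squares_partition(groups):
    """Return (ss_total, ss_between, ss_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Total variability: every observation against the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        # Between: each sample mean against the grand mean, weighted by sample size.
        ss_between += len(g) * (mean - grand_mean) ** 2
        # Within: each observation against its own sample mean.
        ss_within += sum((x - mean) ** 2 for x in g)
    return ss_total, ss_between, ss_within

# Hypothetical measurements for three treatments.
samples = [[4, 5, 6], [7, 9, 11], [1, 2, 3]]
ss_total, ss_between, ss_within = sum_of_squares_partition(samples)
print(ss_total, ss_between, ss_within)
print(math.isclose(ss_total, ss_between + ss_within))
```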

Comparing Means for Several Populations

Example: Test 5 types of concrete for differences in moisture absorption.

The 5 types of concrete are the five levels of the treatment.

Within Variability – this seeks to quantify the variability in absorption for one particular type of concrete.

Between Variability – this seeks to quantify the differences between the types of concrete.

ANOVA seeks to answer the question “Are the differences between the 5 sample means what is expected purely from random variation alone?”

Definitions

• An experimental unit is an object, or subject, that produces a sample measurement.

• The experimental conditions that define the different populations in a completely randomized design are called treatments.

• Testing for differences in the treatments is equivalent to testing for differences in the population means.

Graphical demonstration: Employing two types of variability

[Figure: two panels showing the observations for Treatment 1, Treatment 2, and Treatment 3. In both panels the sample means are x̄1 = 10, x̄2 = 15, and x̄3 = 20.]

A small variability within the samples makes it easier to draw a conclusion about the population means.

The sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.
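The effect of within-sample variability can be sketched numerically. The two data sets below are made up for illustration (they are not the values plotted on the slide); both have sample means 10, 15, and 20, but the second has much more spread inside each sample. The helper `f_statistic` is a hypothetical name for a plain one-way ANOVA F calculation.

```python
def f_statistic(groups):
    """One-way ANOVA F: between-sample mean square over within-sample mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Same sample means (10, 15, 20), different within-sample variability.
tight = [[9, 10, 11], [14, 15, 16], [19, 20, 21]]
spread = [[1, 10, 19], [6, 15, 24], [11, 20, 29]]

f_tight = f_statistic(tight)
f_spread = f_statistic(spread)
print(f_tight, f_spread)
```

The tight samples give a far larger F than the spread-out samples, even though the sample means are identical in both cases.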

Assumptions for ANOVA

• 1. The samples are independent and random

– Selection of objects from any one population is unrelated to the selection of objects from any of the other populations. Selections are random (one individual has as much chance of being selected as another).

– Examples

• Different groups of people (no person in more than one group)

• Different types of music

• Different concentrations of chemicals

• Different models of automobiles

• 2. All populations are normal

• 3. Each population has the same standard deviation (which implies the same variance, σ²). But the values of the population standard deviations are not known before testing.

• 4. Each sample has a mean that can be calculated. This mean is somehow representative of the population mean for its population.


The following assumptions are required for a 1-way ANOVA:

• The k populations are independent.

• Each population is normally distributed.

• Each population has common standard deviation, σ.

• Each population has a mean, μi, for i = 1, 2, …, k.

So we now are testing whether all the treatment means are equal.

H0: μ1 = μ2 = … = μk

Ha: At least two of the population means are not equal

Test Statistic

• If the null hypothesis is true, we expect the k sample means to have reasonably similar values.

• In other words, if the population means are equal, we would expect the variability among the sample means to be relatively small.

• Variability among the sample means is one of the things we will be testing for.

Test Statistic

• Even if the null hypothesis is true, we do not expect the sample means to be exactly the same, because there is a chance factor in our choice of sample experimental units.

• We need to take into account the variability due to chance among the sample means.

Test Statistic

• This method is called “analysis of variance” or ANOVA because we are comparing two sources of variance: the variance among the sample means and the variation expected by chance among the sample means when the null hypothesis is true.

Test Statistic

• Our test statistic is called F.

• F = (Variability among the sample means) / (Variability expected by chance)

Degrees of freedom

• Group df = k – 1, where k is the number of samples (or groups)

• Total df = total number of units in the experiment – 1 = N – 1

• Error df = Total df – Group df, or

• Error df = N – k
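The degrees-of-freedom bookkeeping can be sketched as a small function. The function name and the values k = 6 groups and N = 24 units are illustrative assumptions.

```python
def anova_degrees_of_freedom(k, n_total):
    """Return (group_df, error_df, total_df) for a one-way ANOVA."""
    group_df = k - 1
    total_df = n_total - 1
    # Error df is what remains of the total: (N - 1) - (k - 1) = N - k.
    error_df = total_df - group_df
    return group_df, error_df, total_df

group_df, error_df, total_df = anova_degrees_of_freedom(k=6, n_total=24)
print(group_df, error_df, total_df)
```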

Technology

• We will use Minitab (or StatCrunch or Excel) to do our calculations.

• A typical Minitab display is on the next slide.

ANOVA Table: Tensile Strength for 6 Machines

Analysis of Variance for Tensile-Strength

Source    DF     SS     MS     F      P
Machine    5    5.34   1.07   0.31  0.902
Error     18   62.64   3.48
Total     23   67.98

SST = Sum of squares of treatment = SSMachine = 5.34 (sample mean variability), k = 6 machines

SSError = 62.64 (variability due to chance)

Notice how much larger the “chance” variability is than the other.

There is little to no evidence that the machines differ in mean tensile-strength. Look at that HUGE p-value!
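The MS and F columns of the table follow from the SS and DF columns. A minimal sketch, using the sums of squares and degrees of freedom from the Minitab display above (the function name is a hypothetical helper; the p-value is not computed here because it requires the F distribution with 5 and 18 degrees of freedom):

```python
def mean_squares_and_f(ss_group, df_group, ss_error, df_error):
    """Return (MS for groups, MS for error, F) from sums of squares and df."""
    ms_group = ss_group / df_group
    ms_error = ss_error / df_error
    return ms_group, ms_error, ms_group / ms_error

# Values taken from the Tensile Strength ANOVA table.
ms_machine, ms_error, f_value = mean_squares_and_f(5.34, 5, 62.64, 18)
print(round(ms_machine, 2), round(ms_error, 2), round(f_value, 2))  # 1.07 3.48 0.31
```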

Another Example

• A sociologist conducts an experiment to compare the mean grade-point averages of first-year college students associated with four socioeconomic groups. The sociologist defines the four categories of interest to be: Poor, Lower Middle Class, Upper Middle Class, and Well-to-do. The experimenter knows that the populations of grade-point averages are normally distributed with equal standard deviations. At the end of the school year, the sociologist selects independent random samples of 10 grade-point averages for first year students in each of the four socioeconomic groups.

• Do the data provide sufficient evidence to indicate a difference in mean grade-point averages for at least two of the four socioeconomic groups?

Socioeconomics and GPA

• Treatments = 4 socioeconomic groups

• Response variable = GPA

• H0: μ1 = μ2 = μ3 = μ4

• H1: At least two of the population means are not equal

• Decision rule: Accept H1 if the p-value < .05

• Test statistic: F

• F = (Variability among the sample means) / (Variability expected by chance)

• Variability among sample means = MST = SST / (k – 1)

• Variability due to chance = MSE = SSE / (N – k)

One-way ANOVA: GPA versus Group

Source   DF     SS      MS     F      P
Group     3   1.519   0.506   2.99  0.044
Error    36   6.091   0.169
Total    39   7.610


• F = 2.99, p-value = .044; p-value < .05

• There is sufficient evidence to say that there is a difference in the mean grade-point averages for at least two of the socioeconomic groups. We reach this conclusion at the 0.05 level of significance.

• Since we accepted the alternative hypothesis, we now need to state which means are different.


• We already have enough data to say that of the four groups, the Well-to-do have the highest mean GPA with 2.756, the Upper Middle is next with 2.717, followed by the Lower Middle with 2.520. The Poor have the lowest mean GPA with 2.264.

• But are these differences statistically significant?

Which means are different?

• We need to test each of the following pairs of hypotheses.

• Pair 1: H0: μ1 – μ2 = 0   Ha: μ1 – μ2 ≠ 0

• Pair 2: H0: μ1 – μ3 = 0   Ha: μ1 – μ3 ≠ 0

• Pair 3: H0: μ1 – μ4 = 0   Ha: μ1 – μ4 ≠ 0

• Pair 4: H0: μ2 – μ3 = 0   Ha: μ2 – μ3 ≠ 0

• Pair 5: H0: μ2 – μ4 = 0   Ha: μ2 – μ4 ≠ 0

• Pair 6: H0: μ3 – μ4 = 0   Ha: μ3 – μ4 ≠ 0
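The six pairs are simply every way of choosing 2 of the k = 4 means, which is k(k – 1)/2 comparisons. A short sketch that enumerates them in the same order (the variable names are illustrative):

```python
from itertools import combinations

k = 4  # number of population means being compared
pairs = list(combinations(range(1, k + 1), 2))

for number, (i, j) in enumerate(pairs, start=1):
    print(f"Pair {number}: H0: mu{i} - mu{j} = 0   Ha: mu{i} - mu{j} != 0")

# The count grows quickly with k: k * (k - 1) / 2 pairs in total.
print(len(pairs))
```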

Which means are different?

• To test each pair of hypotheses, we are only testing two means for a difference between them.

• This is the two-sample t-statistic that we used in Chapter 13 in MAT 212.

Which means are different

• However, it takes less time to calculate the confidence intervals for each pair and use these to make our inferences.

• If a confidence interval contains only positive numbers, we may conclude that the first mean is larger than the second

• If a confidence interval contains only negative numbers, we may conclude that the first mean is smaller than the second.

• If a confidence interval contains the number zero, there is insufficient evidence to conclude which mean is larger.

Which means are different

• To do this, we use StatCrunch and Tukey’s multiple comparisons.

• (The notes beside each interval on the following slide are the conclusions drawn; they are not part of the StatCrunch output.)

Group = Lower Middle subtracted from:

Group           Lower     Upper
Poor           -0.6331   0.1131   (-,+) Not significant
Upper Middle   -0.1801   0.5661   (-,+) Not significant
Well-to-do     -0.1411   0.6051   (-,+) Not significant

Group = Poor subtracted from:

Group           Lower     Upper
Upper Middle    0.0799   0.8261   (+,+) μ1 > μ2
Well-to-do      0.1189   0.8651   (+,+) μ1 > μ2

Group = Upper Middle subtracted from:

Group           Lower     Upper
Well-to-do     -0.3341   0.4121   (-,+) Not significant


• This shows that both Upper Middle Class and Well-to-do have higher mean GPA than Poor. There are no other statistically significant differences.
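The rule for reading the intervals can be sketched in code. The intervals below are the Tukey intervals for the GPA example; each key is (first group, second group), where the interval estimates the first mean minus the second mean. The function name and the dictionary layout are illustrative assumptions.

```python
def interpret_interval(lower, upper):
    """Classify a confidence interval for (first mean - second mean)."""
    if lower > 0:
        return "first mean larger"
    if upper < 0:
        return "first mean smaller"
    # The interval contains zero, so neither mean can be declared larger.
    return "not significant"

# "X subtracted from Y" means the interval is for mean(Y) - mean(X).
intervals = {
    ("Poor", "Lower Middle"): (-0.6331, 0.1131),
    ("Upper Middle", "Lower Middle"): (-0.1801, 0.5661),
    ("Well-to-do", "Lower Middle"): (-0.1411, 0.6051),
    ("Upper Middle", "Poor"): (0.0799, 0.8261),
    ("Well-to-do", "Poor"): (0.1189, 0.8651),
    ("Well-to-do", "Upper Middle"): (-0.3341, 0.4121),
}

results = {pair: interpret_interval(*ci) for pair, ci in intervals.items()}
significant = [pair for pair, verdict in results.items() if verdict != "not significant"]

for pair, verdict in results.items():
    print(pair, verdict)
```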

ANOVA – What is expected from you?

Be able to complete each of the following exercises:

• State the two hypotheses.

• State the decision rule.

• What is the test statistic, and what is its formula?

• What is the observed value of this test statistic?

• Is this valid?

• What is the p-value?

• State a conclusion.

• If you accepted the alternative hypothesis, you then need to find out which means are different.

Another Example

• Is hair color related to pain sensitivity? To study this, an experimenter divides men and women of various ages into four hair color categories: light blond, dark blond, light brunette, and dark brunette. There are six people in each of the four categories. Each participant in the study receives a pain threshold score based upon his or her performance in a pain sensitivity test (the higher the score, the lower the person’s pain tolerance.)

Hair Color vs Pain Sensitivity

• The treatments are the hair colors

• The response variable is the sensitivity to pain score.

• H0: μ1 = μ2 = μ3 = μ4

• H1: At least two of the population means of the scores are not equal

• Decision rule: Accept H1 if the p-value < .05

• Test statistic: F

• F = (Variability among the sample means) / (Variability expected by chance)

Hair Color vs Pain Sensitivity

One-way ANOVA: Score versus Hair Color

Source       DF      SS      MS     F      P
Hair Color    3    908.8   302.9   5.44  0.007
Error        20   1113.0    55.7
Total        23   2021.8

H0: μ_light_blond = μ_dark_blond = … = μ_dark_brunette

Ha: At least two population means are different.

F = 5.44 p-value = 0.007

At the .05 level of significance, there is strong evidence to conclude that there is a difference among mean pain thresholds for people possessing these four hair colors.

Minitab

One-way ANOVA: Score versus Hair Color

Source       DF      SS      MS     F      P
Hair Color    3    908.8   302.9   5.44  0.007
Error        20   1113.0    55.7
Total        23   2021.8

S = 7.460   R-Sq = 44.95%   R-Sq(adj) = 36.69%
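The summary line can be reproduced from the ANOVA table. The sketch below assumes the usual definitions: S is the square root of the error mean square, R-Sq is the group sum of squares as a share of the total, and R-Sq(adj) compares the error mean square with the total mean square. The variable names are illustrative.

```python
import math

# Values from the Score versus Hair Color ANOVA table.
ss_group, df_group = 908.8, 3
ss_error, df_error = 1113.0, 20
ss_total, df_total = 2021.8, 23

ms_error = ss_error / df_error
s = math.sqrt(ms_error)                          # pooled standard deviation
r_sq = ss_group / ss_total                       # share of variability explained
r_sq_adj = 1 - ms_error / (ss_total / df_total)  # adjusted for degrees of freedom

print(round(s, 3), round(100 * r_sq, 2), round(100 * r_sq_adj, 2))
```

The results agree with S = 7.460, R-Sq = 44.95%, and R-Sq(adj) = 36.69% after rounding.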

Hair Color = Dark Blond subtracted from:

Hair Color        Lower     Upper
Dark Brunette   -15.818     2.151
Light Blond       1.349    19.318
Light Brunette   -6.151    11.818

Hair Color = Dark Brunette subtracted from:

Hair Color        Lower     Upper
Light Blond       8.182    26.151
Light Brunette    0.682    18.651

Hair Color vs Pain Sensitivity

Examine Minitab’s output to make the following table:

Pair               From     To     Conclusion
D Brun – D Blon   -15.8     2.2    NS (No difference)
L Blon – D Blon     1.3    19.3    L Blon > D Blon
L Brun – D Blon    -6.2    11.8    NS (No difference)
L Blon – D Brun     8.2    26.2    L Blon > D Brun
L Brun – D Brun     0.7    18.7    L Brun > D Brun
L Brun – L Blon   -16.5     1.5    NS (No difference)

Summarize the results.

When should we use the multiple comparison method?

• The sample data are obtained from the k populations using a completely randomized design

• An analysis of variance F-test indicates that there are some differences among the k population means.

• The objective is to determine which of the k population means differ. It is usually of interest to determine which mean might be the largest (or smallest).
