Comparing Groups: Analysis of Variance Methods
faculty.chemeketa.edu/wbarber1/mth244/Agresti Chp 14.pdf


Chapter 14: Comparing Groups: Analysis of Variance Methods

14.1 One-Way ANOVA: Comparing Several Means
14.2 Estimating Differences in Groups for a Single Factor
14.3 Two-Way ANOVA


Example 1

Investigating Customer Satisfaction

Picture the Scenario
In recent years, many companies have increased the attention paid to measuring and analyzing customer satisfaction. Here are examples of two recent studies of customer satisfaction:

• A company that makes personal computers has provided a toll-free telephone number for owners of their PCs to call and seek technical support. For years the company had two service centers for these calls: San Jose, California, and Toronto, Canada. Recently the company outsourced many of the customer service jobs to a new center in Bangalore, India, because employee salaries are much lower there. The company wanted to compare customer satisfaction at the three centers.

• An airline has a toll-free telephone number that potential customers can call to make flight reservations. Usually the call volume is heavy and callers are placed on hold until an agent is free to answer. Researchers working for the airline recently conducted a randomized experiment to analyze whether callers would remain on hold longer if they heard (a) an advertisement about the airline and its current promotions, (b) recorded Muzak (“elevator music”), or (c) recorded classical music. Currently, messages are five minutes long and then repeated; the researchers also wanted to find out if it would make a difference if messages were instead 10 minutes long before repeating.

Questions to Explore
In the second study, the company's CEO had some familiarity with statistical methods, based on a course he took in college. He asked the researchers:

• In this experiment, are the sample mean times that callers stayed on hold before hanging up significantly different for the three recorded message types?

• What conclusions can you make if you take into account both the type of recorded message and whether it was repeated every five minutes or every ten minutes?

Thinking Ahead
Chapter 10 showed how to compare two means. In practice, there may be several means to compare, such as in the second example. This chapter shows how to use statistical inference to compare several means. We'll see how to determine whether a set of sample means is significantly different and how to estimate the differences among corresponding population means. To illustrate, we'll analyze data from the second study in Examples 2–4 and 7.

The methods introduced in this chapter apply when a quantitative response variable has a categorical explanatory variable. The categories of the explanatory variable identify the groups to be compared in terms of their means on the response variable. For example, the first study in Example 1 compared mean customer satisfaction for three groups—customers who call the service centers at the three locations. The response variable is customer satisfaction (on a scale of 0 to 10), and the explanatory variable is the service center location.


The inferential method for comparing means of several groups is called analysis of variance, denoted ANOVA. Section 14.1 shows that the name “analysis of variance” refers to the significance test's focus on two types of variability in the data. Section 14.2 shows how to construct confidence intervals comparing the means. It also shows that ANOVA methods are special cases of a multiple regression analysis.

Categorical explanatory variables in multiple regression and in ANOVA are often referred to as factors. ANOVA with a single factor, such as service center location, is called one-way ANOVA. Section 14.3 introduces ANOVA for two factors, called two-way ANOVA. The second study in Example 1 requires the use of two-way ANOVA to analyze how the mean telephone holding time varies across categories of recorded message type and categories defined by the length of time before repeating the message.

14.1 One-Way ANOVA: Comparing Several Means

The analysis of variance method compares means of several groups. Let g denote the number of groups. Each group has a corresponding population of subjects. The means of the response variable for the g populations are denoted by µ1, µ2, …, µg.

Hypotheses and Assumptions for the ANOVA Test Comparing Means
The analysis of variance is a significance test of the null hypothesis of equal population means,

H0: µ1 = µ2 = … = µg.

An example is H0: µ1 = µ2 = µ3 for testing population mean satisfaction at g = 3 service center locations. The alternative hypothesis is

Ha: at least two of the population means are unequal.

If H0 is false, perhaps all the population means differ, but perhaps merely one mean differs from the others. The test analyzes whether the differences observed among the sample means could have reasonably occurred by chance, if the null hypothesis of equal population means were true.

The assumptions for the ANOVA test comparing population means are as follows:

• The population distributions of the response variable for the g groups are normal, with the same standard deviation for each group.

• Randomization (depends on data collection method): In a survey sample, independent random samples are selected from each of the g populations. For an experiment, subjects are randomly assigned separately to the g groups.

Under the first assumption, when the population means are equal, the population distribution of the response variable is the same for each group. The population distribution does not depend on the group to which a subject belongs. The ANOVA test is a test of independence between the quantitative response variable and the group factor.


ANOVA hypotheses

Example 2
Tolerance of Being on Hold?

Picture the Scenario
Let's refer back to the second scenario in Example 1. An airline has a toll-free telephone number for reservations. Often the call volume is heavy, and callers are placed on hold until a reservation agent is free to answer. The airline hopes a caller remains on hold until the call is answered, so as not to lose a potential customer.

The airline recently conducted a randomized experiment to analyze whether callers would remain on hold longer, on average, if they heard (a) an advertisement about the airline and its current promotions, (b) Muzak, or (c) classical music (Vivaldi's Four Seasons). The company randomly selected one out of every 1000 calls in a particular week. For each call, they randomly selected one of the three recorded messages to play and then measured the number of minutes that the caller remained on hold before hanging up (these calls were purposely not answered). The total sample size was 15. The company kept the study small, hoping it could make conclusions without alienating too many potential customers! Table 14.1 shows the data. It also shows the mean and standard deviation for each recorded message type.

Table 14.1 Telephone Holding Times by Type of Recorded Message
Each observation is the number of minutes a caller remained on hold before hanging up, rounded to the nearest minute.

Recorded Message   Holding Time Observations   Sample Size   Mean   Standard Deviation
Advertisement      5, 1, 11, 2, 8              5              5.4   4.2
Muzak              0, 1, 4, 6, 3               5              2.8   2.4
Classical          13, 9, 8, 15, 7             5             10.4   3.4

Questions to Explore

a. What are the hypotheses for the ANOVA test?

b. Figure 14.1 displays the sample means. Since these means are quite different, is there sufficient evidence to conclude that the population means differ?

Figure 14.1 Sample Means of Telephone Holding Times for Callers Who Hear One of Three Recorded Messages. Question: Since the sample means are quite different, can we conclude that the population means differ?

Think It Through

a. Denote the holding time means for the population that these three random samples represent by µ1 for the advertisement, µ2 for Muzak, and µ3 for classical music. ANOVA tests whether these are equal. The null hypothesis is H0: µ1 = µ2 = µ3. The alternative hypothesis is that at least two of the population means are different.


b. The sample means are quite different. But even if the population means are equal, we expect the sample means to differ because of sampling variability. So these differences alone are not sufficient evidence to enable us to reject H0.

Insight
The strength of evidence against H0 will also depend on the sample sizes and the variability of the data. We'll next study how to test these hypotheses.

Try Exercise 14.1, parts a and b

Variability Between Groups and Within Groups Is the Key to Significance
The ANOVA method is used to compare population means. So, why is it called analysis of variance? The reason is that the test statistic uses evidence about two types of variability. Rather than presenting a formula now for this test statistic, which is rather complex, we'll discuss the reasoning behind it, which is quite simple.

Table 14.1 listed data for three groups. Figure 14.2a shows the same data with dot plots. Suppose the sample data were different, as shown in Figure 14.2b. The data in Figure 14.2b have the same means as the data in Figure 14.2a but have smaller standard deviations within each group. Which case do you think gives stronger evidence against H0: µ1 = µ2 = µ3?

What's the difference between the data in these two cases? The variability between pairs of sample means is the same in each case because the sample means are the same. However, the variability within each sample is much smaller in Figure 14.2b than in Figure 14.2a. The sample standard deviation is about 1.0 for each sample in Figure 14.2b whereas it is between 2.4 and 4.2 for the samples in Figure 14.2a. We'll see that the evidence against H0: µ1 = µ2 = µ3 is stronger when the variability within each sample is smaller. Figure 14.2b has less variability within each sample than Figure 14.2a. Therefore, it gives stronger evidence against H0.

Figure 14.2 Data from Table 14.1 in Figure 14.2a and Hypothetical Data in Figure 14.2b That Have the Same Means but Less Variability Within Groups. (Each panel shows dot plots of telephone holding time, from 0 to 14 minutes, for the Advertisement, Muzak, and Classical groups.) Question: Do the data in Figure 14.2b give stronger or weaker evidence against H0: µ1 = µ2 = µ3 than the data in Figure 14.2a? Why?


The evidence against H0 is also stronger when the variability between sample means increases (that is, when the sample means are farther apart) and as the sample sizes increase.

ANOVA F Test Statistic
The analysis of variance (ANOVA) F test statistic summarizes

F = (Between-groups variability)/(Within-groups variability).

The larger the variability between groups relative to the variability within groups, the larger the F test statistic tends to be. For instance, F = 6.4 for the data in Figure 14.2a, whereas F = 67.8 for the data in Figure 14.2b. The between-groups variability is the same in each figure, but Figure 14.2b has much less within-groups variability and thus a larger test statistic value. Later in the section, we'll see that the two types of variability described in the F test statistic are measured by estimates of variances.

The Test Statistic for Comparing Means Has the F Distribution
When H0 is true, the F test statistic has the F sampling distribution. The formula for the F test statistic is such that when H0 is true, the F distribution has a mean of approximately 1. When H0 is false, the F test statistic tends to be larger than 1, more so as the sample sizes increase. The larger the F test statistic value, the stronger the evidence against H0.

Recall that we used the F distribution in the F test that the slope parameters of a multiple regression model are all zero ( Section 13.3 ). As in that test, the P-value here is the probability (presuming that H0 is true) that the F test statistic is larger than the observed F value. That is, it is the right-hand tail probability, as shown in the margin figure, representing results even more extreme than observed. The larger the F test statistic, the smaller the P-value.

In Section 13.3 , we learned that the F distribution has two df values. For ANOVA with g groups and total sample size for all groups combined of N = n1 + n2 + . . . + ng,

df1 = g - 1 and df2 = N - g.

Table D in the appendix reports F values having P-value of 0.05, for various df1 and df2 values. For any given F value and df values, software provides the P-value. The following summarizes the steps of an ANOVA F test.

In Words The P-value for an ANOVA F test statistic is the right-tail probability from the F distribution.


SUMMARY: Steps of ANOVA F Test for Comparing Population Means of Several Groups

1. Assumptions: Independent random samples (either from random sampling or a randomized experiment), normal population distributions with equal standard deviations.

2. Hypotheses: H0: µ1 = µ2 = … = µg (equal population means for g groups), Ha: at least two of the population means are unequal.

3. Test statistic: F = (Between-groups variability)/(Within-groups variability). The F sampling distribution has df1 = g - 1 and df2 = N - g = total sample size - number of groups.

4. P-value: Right-tail probability above the observed F test statistic value.

5. Conclusion: Interpret in context. If a decision is needed, reject H0 if P-value ≤ significance level (such as 0.05).

Recall The F distribution was introduced in Section 13.3. It's used for tests about several parameters (rather than a single parameter or a difference between two parameters, for which we can use a t test).
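As a quick numerical companion to these steps, here is a minimal sketch (assuming Python with SciPy is available; it is not part of the textbook) that runs the one-way ANOVA F test on the Table 14.1 holding-time data:

```python
# One-way ANOVA F test for the holding-time data in Table 14.1.
from scipy import stats

advertisement = [5, 1, 11, 2, 8]
muzak = [0, 1, 4, 6, 3]
classical = [13, 9, 8, 15, 7]

f_stat, p_value = stats.f_oneway(advertisement, muzak, classical)
print(f"F = {f_stat:.2f}, P-value = {p_value:.3f}")   # about F = 6.43, P-value = 0.013
```

scipy.stats.f_oneway returns the F test statistic and its right-tail P-value, matching the values reported later in Example 3 and Table 14.2.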


One-way ANOVA

Example 3

Telephone Holding Times Picture the Scenario Examples 1 and 2 discussed a study of the length of time that 15 callers to an airline’s toll-free telephone number remain on hold before hanging up. The study compared three recorded messages: an advertisement about the airline, Muzak, and classical music. Let µ1, µ2, and µ3 denote the population mean telephone holding times for the three messages.

Questions to Explore

a. For testing H0: µ1 = µ2 = µ3 based on this experiment, what value of the F test statistic would have a P-value of 0.05?

b. For the data in Table 14.1 on page 682, we’ll see that software reports F = 6.4. Based on the answer to part a, will the P-value be larger, or smaller, than 0.05?

c. Can you reject H0, using a significance level of 0.05? What can you conclude from this?

Think It Through

a. With g = 3 groups and a total sample size of N = 15 (5 in each group), the test statistic has

df1 = g - 1 = 2 and df2 = N - g = 15 - 3 = 12.

From Table D (see the excerpt in the margin), with these df values an F test statistic value of 3.88 has a P-value of 0.05.

b. Since the F test statistic of 6.4 is farther out in the tail than 3.88 (see figure in margin), the right-tail probability above 6.4 is less than 0.05. So, the P-value is less than 0.05.

c. Since the P-value < 0.05, there is sufficient evidence to reject H0: µ1 = µ2 = µ3. We conclude that a difference exists among the three types of messages in the population mean amount of time that customers are willing to remain on hold.

Insight
We'll see that software reports a P-value = 0.013. This is quite strong evidence against H0. If H0 were true, there'd be only about a 1% chance of getting an F test statistic value larger than the observed F value of 6.4.

Try Exercise 14.2, parts a–c

F values with right-tail probability 0.05 (excerpt from Table D):

df2     df1 = 1   df1 = 2   df1 = 3
6        5.99      5.14      4.76
12       4.75      3.88      3.49
18       4.41      3.55      3.16
24       4.26      3.40      3.01
30       4.17      3.32      2.92
120      3.92      3.07      2.68
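The table lookup and the P-value quoted in the Insight can be reproduced directly; a minimal sketch (Python with SciPy assumed; not from the textbook), using the df values and observed F of Example 3:

```python
from scipy import stats

df1, df2 = 2, 12                           # g - 1 and N - g for Example 3
critical_f = stats.f.ppf(0.95, df1, df2)   # F value with right-tail probability 0.05
p_value = stats.f.sf(6.4, df1, df2)        # right-tail probability above the observed F
print(round(critical_f, 2), round(p_value, 3))   # roughly 3.89 (the table rounds to 3.88) and 0.013
```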

The Variance Estimates and the ANOVA Table
Now let's take a closer look at the F test statistic. Denote the group sample means by ȳ1, ȳ2, …, ȳg. (We use y rather than x because the quantitative variable is the response variable when used in a corresponding regression analysis.) We'll see that the F test statistic depends on these sample means, the sample standard deviations—s1, s2, …, sg—for the g groups, and the sample sizes.


One assumption for the ANOVA F test is that each population has the same standard deviation. Let σ denote the standard deviation for each of the g population distributions. The F test statistic for H0: µ1 = µ2 = … = µg is the ratio of two estimates of σ², the population variance for each group. Since we won't usually do computations by hand, we'll show formulas merely to give a better sense of what the F test statistic represents. The formulas are simplest when the group sample sizes are equal (as in Example 3), the case we'll show.

The estimate of σ² in the denominator of the F test statistic uses the variability within each group. The sample standard deviations s1, s2, …, sg summarize the variation of the observations within the groups around their means.

• With equal sample sizes, the within-groups estimate of the variance σ² is the mean of the g sample variances for the g groups,

Within-groups variance estimate s² = (s1² + s2² + … + sg²)/g.

When the sample sizes are not equal, the within-groups estimate is a weighted average of the sample variances, with greater weight given to samples with larger sample sizes. In either case, this estimate is unbiased: Its sampling distribution has σ² as its mean, regardless of whether H0 is true.

The estimate of σ² in the numerator of the F test statistic uses the variability between each sample mean and the overall sample mean ȳ for all the data.

• With equal sample sizes, n in each group, the between-groups estimate of the variance σ² is

Between-groups variance estimate = n[(ȳ1 - ȳ)² + (ȳ2 - ȳ)² + … + (ȳg - ȳ)²]/(g - 1).

If H0 is true, this estimate is also unbiased. We expect this estimate to take a similar value as the within-groups estimate, apart from sampling error. If H0 is false, however, the population means differ and the sample means tend to differ more greatly. Then, the between-groups estimate tends to overestimate σ².

The F test statistic is the ratio of these two estimates of the population variance,

F = (Between-groups estimate of σ²)/(Within-groups estimate of σ²).

Computer software displays the two estimates in an ANOVA table similar to tables displayed in regression. Table 14.2 shows the basic format, illustrating the test in Example 3:

• The MS column contains the two estimates, which are called mean squares.

• The ratio of the two mean squares is the F test statistic, F = 74.6/11.6 = 6.4.

This F test statistic has P-value = 0.013, also shown in Table 14.2.

Did You Know? Without n in the formula, the between-groups estimate is the sample variance of the g sample means. That sample variance of {ȳ1, ȳ2, …, ȳg} estimates the variance of the sampling distribution of each sample mean, which is σ²/n. Multiplying by n then gives an estimate of σ² itself. See Exercise 14.66.

Recall In Sections 12.4 and 13.3, we saw that a mean square (MS) is a ratio of a sum of squares (SS) to its df value.

Table 14.2 ANOVA Table for F Test Using Data From Table 14.1

Source DF SS MS F P

Group 2 149.2 74.6 6.43 0.013

Error 12 139.2 11.6

Total 14 288.4

(In Table 14.2, the Group row is the between-groups source and the Error row is the within-groups source; the F test statistic is the ratio of the two MS values, 74.6/11.6.)
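The two mean squares can also be computed directly from the raw data of Table 14.1. A minimal sketch (Python with NumPy assumed; not part of the textbook) that reproduces the MS values and F statistic shown in Table 14.2:

```python
import numpy as np

groups = [np.array([5, 1, 11, 2, 8]),    # advertisement
          np.array([0, 1, 4, 6, 3]),     # Muzak
          np.array([13, 9, 8, 15, 7])]   # classical

g = len(groups)
n = len(groups[0])                        # equal sample sizes in each group
grand_mean = np.mean(np.concatenate(groups))

# Within-groups estimate of sigma^2: the mean of the g sample variances.
ms_within = np.mean([grp.var(ddof=1) for grp in groups])

# Between-groups estimate: n times the sum of (group mean - grand mean)^2, divided by g - 1.
ms_between = n * sum((grp.mean() - grand_mean) ** 2 for grp in groups) / (g - 1)

print(ms_between, ms_within, ms_between / ms_within)   # about 74.6, 11.6, and 6.43
```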


The mean square in the group row of the ANOVA table is based on variability between the groups. It is the between-groups estimate of the population variance σ². This is 74.6 in Table 14.2, listed under MS. The mean square error is the within-groups estimate s² of σ². This is 11.6 in Table 14.2. The "Error" label for this MS refers to the fact that it summarizes the error from not being able to predict subjects' responses exactly if we know only the group to which they belong.

Each mean square equals a sum of squares (in the SS column) divided by a degrees of freedom value (in the DF column). The df values for Group and Error in Table 14.2 are the df values for the F distribution, df1 = g - 1 = 3 - 1 = 2 and df2 = N - g = 15 - 3 = 12.

The sum of the between-groups sum of squares and the within-groups (Error) sum of squares is the total sum of squares. This is the sum of squares of the combined sample of N observations around the overall sample mean. The analysis of variance partitions the total sum of squares into two independent parts, the between-groups SS and the within-groups SS. It can be shown that the total sum of squares equals

Total SS = Σ(y - ȳ)² = between-groups SS + within-groups SS.

In Table 14.2 , for example,

Total SS = 288.4 = 149.2 + 139.2.

The total SS divided by N - 1 (the df for the total SS) is the sample variance when the data for the g groups are combined and treated as a single sample. The margin shows the TI-83+/84 output for this one-way ANOVA.

Assumptions and the Effects of Violating Them
The assumptions for the ANOVA F test comparing population means are:

(1) The population distributions of the response variable for the g groups are normal,
(2) those distributions have the same standard deviation σ, and
(3) the data resulted from randomization.

Figure 14.3 portrays the population distribution assumptions.

Figure 14.3 The Assumptions About the Population Distributions: Normal With Equal Standard Deviations. (The figure shows three normal curves with means µ1, µ2, and µ3, each with the same standard deviation σ.) Question: In practice, what types of evidence would suggest that these assumptions are badly violated?

The assumptions that the population distributions are normal with identical standard deviations seem stringent. They are never satisfied exactly in practice. Moderate violations of the normality assumption are not serious. The F sampling distribution still provides a reasonably good approximation to the actual sampling distribution of the F test statistic. This becomes even more the case as the sample sizes increase, because the sampling distribution then has weaker dependence on the form of the population distributions.

Moderate violations of the equal population standard deviation assumption are also not serious.


When the sample sizes are identical for the groups, the F test still works well even with severe violations of this assumption. When the sample sizes are not equal, the F test works quite well as long as the largest group standard deviation is no more than about twice the smallest group standard deviation.

You can construct box plots or dot plots for the sample data distributions to check for extreme violations of normality. Misleading results may occur with the F test if the distributions are highly skewed and the sample size N is small, or if there are relatively large differences among the standard deviations (the largest sample standard deviation being more than double the smallest one) and the sample sizes are unequal. When the distributions are highly skewed, the mean may not even be a relevant summary measure.¹

1 Chapter 15 discusses methods that apply when these assumptions are badly violated.

In Practice Robustness of ANOVA F test Since the ANOVA F test is robust to moderate breakdowns in the population normality and equal standard deviation assumptions, in practice it is used unless (1) graphical methods show extreme skew for the response variable or (2) the largest group standard deviation is more than about double the smallest group standard deviation and the sample sizes are unequal.
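One quick way to apply this guideline is to compute the group standard deviations and their largest-to-smallest ratio. A minimal sketch (Python with NumPy assumed; not from the textbook), using the Table 14.1 data that Example 4 below examines:

```python
import numpy as np

groups = {"Advertisement": [5, 1, 11, 2, 8],
          "Muzak": [0, 1, 4, 6, 3],
          "Classical": [13, 9, 8, 15, 7]}

sds = {name: np.std(vals, ddof=1) for name, vals in groups.items()}
ratio = max(sds.values()) / min(sds.values())
print(sds)               # sample standard deviations: roughly 4.2, 2.4, 3.4
print(round(ratio, 2))   # roughly 1.7, well under the rule-of-thumb cutoff of about 2
```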

ANOVA assumptions

Example 4

Telephone Holding Time Study Picture the Scenario Let’s check the assumptions for the F test on telephone holding times ( Example 3 ).

Question to Explore Is it appropriate to apply ANOVA to the data in Table 14.1 to compare mean telephone holding times for three message types?

Think It Through
Subjects were selected randomly for the experiment and assigned randomly to the three recorded messages. From Table 14.1 (summarized in the margin), the largest sample standard deviation of 4.2 is less than twice the smallest standard deviation of 2.4. (In any case, the sample sizes are equal, so this assumption is not crucial.) The sample sizes in Table 14.1 are small, so it is difficult to make judgments about shapes of population distributions. However, the dot plots in Figure 14.2a did not show evidence of severe nonnormality. Thus, ANOVA is suitable for these data.

Insight As in other statistical inferences, the method used to gather the data is the most crucial factor. Inferences have greater validity when the data are from an experimental study that randomly assigned subjects to groups or from a sample survey that used random sampling.

Try Exercise 14.11, part c

Recall Table 14.1 showed the following:

Recorded Message   Sample Size   Mean   Std. Dev.
Advert.            5              5.4   4.2
Muzak              5              2.8   2.4
Classical          5             10.4   3.4

The ANOVA methods in this chapter are designed for independent samples. Recall that for independent samples, the subjects in one sample are distinct from those in other samples. Separate methods, beyond the scope of this text, handle dependent samples.

Recall See the beginning of Chapter 10 to review the distinction between independent samples and dependent samples.


Using One F Test or Several t Tests to Compare the Means
With two groups, Section 10.3 showed how a t test can compare the means under the assumption of equal population standard deviations. That test also uses between-groups and within-groups variation. The t test statistic has the difference between the two group means in the numerator and a denominator based on pooling variability within the two groups. See the formula in the margin. In fact, if we apply the ANOVA F test to data from g = 2 groups, it can be shown that the F test statistic equals the square of this t test statistic. The P-value for the F test is exactly the same as the two-sided P-value for the t test. We can use either test to conduct the analysis.

When there are several groups, instead of using the F test why not use a t test to compare each pair of means? One reason for doing the F test is that using a single test rather than multiple tests enables us to control the probability of a Type I error. With a significance level of 0.05 in the F test, for instance, the probability of incorrectly rejecting a true H0 is fixed at 0.05. When we do a separate t test for each pair of means, by contrast, a Type I error probability applies for each comparison. In that case, we are not controlling the overall Type I error rate for all the comparisons.

But the F test has its own disadvantages. With a small P-value, we can conclude that the population means are not identical. However, the result of the F test does not tell us which groups are different or how different they are. In Example 3, we have not concluded whether one recorded message works significantly better than the other two at keeping potential customers on the phone. We can address these issues using confidence intervals, as the next section shows.

Recall To compare two means while assuming equal population standard deviations, Section 10.3 showed the test statistic is

t = (ȳ1 - ȳ2) / (s √(1/n1 + 1/n2)),

where the standard deviation s pools information from within both samples. It has df = n1 + n2 - 2 = N - g for N = n1 + n2 and g = 2.

14.1 Practicing the Basics

14.1 Hotel satisfaction The CEO of a company that owns five resort hotels wants to evaluate and compare satisfaction with the five hotels. The company's research department randomly sampled 125 people who had stayed at any of the hotels during the past month and asked them to rate their expectations of the hotel before their stay and to rate the quality of the actual stay at the hotel. Both observations used a rating scale of 0–10, with 0 = very poor and 10 = excellent. The researchers compared the hotels on the gap between prior expectation and actual quality, using the difference score, y = performance gap = (prior expectation score - actual quality score).
a. Identify the response variable, the factor, and the categories that form the groups.
b. State the null and alternative hypotheses for conducting an ANOVA.
c. Explain why the df values for this ANOVA are df1 = 4 and df2 = 120.
d. How large an F test statistic is needed to get a P-value = 0.05 in this ANOVA?

14.2 Satisfaction with banking A bank conducts a survey in which it randomly samples 400 of its customers. The survey asks the customers which way they use the bank the most: (1) interacting with a teller at the bank, (2) using ATMs, or (3) using the bank's Internet banking service. It also asks their level of satisfaction with the service they most often use (on a scale of 0 to 10 with 0 = very poor and 10 = excellent). Does mean satisfaction differ according to how they most use the bank?
a. Identifying notation, state the null and alternative hypotheses for conducting an ANOVA with data from the survey.
b. Report the df values for this ANOVA. Above what F test statistic value does the P-value fall below 0.05?
c. For the data, F = 0.46 and the P-value equals 0.63. What can you conclude?
d. What were the assumptions on which the ANOVA was based? Which assumption is the most important?

14.3 What's the best way to learn French? The following table shows scores on the first quiz (maximum score 10 points) for eighth-grade students in an introductory level French course. The instructor grouped the students in the course as follows:
Group 1: Never studied foreign language before but have good English skills
Group 2: Never studied foreign language before and have poor English skills
Group 3: Studied at least one other foreign language


French scores on the quiz

          Group 1   Group 2   Group 3
           4         1         9
           6         5        10
           8                   5
Mean       6.0       3.0       8.0
Std. Dev.  2.000     2.828     2.646

Source   DF   SS      MS      F      P
Group     2   30.00   15.00   2.50   0.177
Error     5   30.00    6.00
Total     7   60.00

a. Defining notation and using results obtained with software, also shown in the table, report the five steps of the ANOVA test.
b. The sample means are quite different, but the P-value is not small. Name one important reason for this. (Hint: For given sample means, how do the results of the test depend on the sample sizes?)
c. Was this an experimental study, or an observational study? Explain how a lurking variable could be responsible for Group 3 having a larger mean than the others. (Thus, even if the P-value were small, it is inappropriate to assume that having studied at least one foreign language causes one to perform better on this quiz.)

14.4 What affects the F value? Refer to the previous exercise.
a. Suppose that the first observation in the second group was actually 9, not 1. Then the standard deviations are the same as reported in the table, but the sample means are 6, 7, and 8 rather than 6, 3, and 8. Do you think the F test statistic would be larger, the same, or smaller? Explain your reasoning, without doing any calculations.
b. Suppose you had the same means as shown in the table but the sample standard deviations were 1.0, 1.8, and 1.6, instead of 2.0, 2.8, and 2.6. Do you think the F test statistic would be larger, the same, or smaller? Explain your reasoning.
c. Suppose you had the same means and standard deviations as shown in the table but the sample sizes were 30, 20, and 30, instead of 3, 2, and 3. Do you think the F test statistic would be larger, the same, or smaller? Explain your reasoning.
d. In parts a, b, and c, would the P-value be larger, the same, or smaller? Why?

14.5 Outsourcing Example 1 at the beginning of this chapter mentioned a study to compare customer satisfaction at service centers in San Jose, California; Toronto, Canada; and Bangalore, India. Each center randomly sampled 100 people who called during a two-week period. Callers rated their satisfaction on a scale of 0 to 10, with higher scores representing greater satisfaction. The sample means were 7.6 for San Jose, 7.8 for Toronto, and 7.1 for Bangalore. The table shows the results of conducting an ANOVA.
a. Define notation and specify the null hypothesis tested in this table.
b. Explain how to obtain the F test statistic value reported in the table from the MS values shown, and report the values of df1 and df2 for the F distribution.
c. Interpret the P-value reported for this test. What conclusion would you make using a 0.05 significance level?

Customer satisfaction with outsourcing

Source   DF     SS       MS      F      P
Group      2    26.00    13.00   27.6   0.000
Error    297   140.00     0.47
Total    299   166.00

14.6 Good friends and astrological sign A recent General Social Survey asked subjects how many good friends they have. Is number of friends associated with the respondent's astrological sign (the 12 symbols of the Zodiac)? The ANOVA table for the GSS data reports F = 0.61 based on df1 = 11, df2 = 813.
a. Introduce notation, and specify the null hypothesis and alternative hypothesis for the ANOVA.
b. Based on what you know about the F distribution, would you guess that the test statistic value of 0.61 provides strong evidence against the null hypothesis? Explain.
c. Software reports a P-value of 0.82. Explain how to interpret this.

14.7 How many kids to have? A recent General Social Survey asked subjects, "What is the ideal number of kids for a family?" Do responses tend to depend on the subjects' religious affiliation? Results of an ANOVA are shown in the printout, for religion categories (Protestant, Catholic, Jewish, or Other).
a. Define notation and specify the null hypothesis tested in this printout.
b. Summarize the assumptions made to conduct this test.
c. Report the F test statistic value and the P-value for this test. Interpret the P-value.
d. Based on part c, can you conclude that every pair of religious affiliations has different population means for ideal family size? Explain.

Ideal number of kids in a family by religious affiliation

Source     DF     SS       MS     F      P
Religion     3    11.72    3.91   5.48   0.001
Error     1295   922.82    0.71
Total     1298   934.54

14.8 Smoking and personality A study about smoking and personality (by A. Terracciano and P. Costa, Addiction, vol. 99, 2004, pp. 472–481) used a sample of 1638 adults in the Baltimore Longitudinal Study on Aging. The subjects formed three groups according to smoking status (never, former, current). Each subject completed a personality questionnaire that provided scores on various personality scales designed to have overall means of about 50 and standard deviations of about 10. The table shows some results for three traits, giving the means with standard deviations in parentheses.

                    Never smokers   Former smokers   Current smokers
                    (n = 828)       (n = 694)        (n = 116)          F
Neuroticism         46.7 (9.6)      48.5 (9.2)       51.9 (9.9)         17.77
Extraversion        50.4 (10.3)     50.2 (10.0)      50.9 (9.4)          0.24
Conscientiousness   51.8 (10.1)     48.9 (9.7)       45.6 (10.3)        29.42

a. For the F test for the extraversion scale, using the 0.05 significance level, what conclusion would you make?
b. Refer to part a. Does this mean that the population means are necessarily equal?


14.9 Florida student data Refer to the Florida Student Survey data file on the text CD. For the response variable, use the number of weekly hours engaged in sports and other physical exercise. For the explanatory variable, use gender.
a. Using software, for each gender construct box plots and find the mean and standard deviation of the response variable.
b. Conduct an ANOVA. Report the hypotheses, F test statistic value, P-value, and interpret results.
c. To compare the means, suppose you instead used the two-sided t test from Section 10.3, which assumes equal population standard deviations. How would the t test statistic and P-value for that test relate to the F test statistic and P-value for ANOVA?

14.10 Software and French ANOVA Refer to Exercise 14.3. Using software,
a. Create the data file and find the sample means and standard deviations.
b. Find and report the ANOVA table. Interpret the P-value.
c. Change an observation in Group 2 so that the P-value will be smaller. Specify the value you changed, and report the resulting F test statistic and the P-value. Explain why the value you changed would have this effect.

14.11 Comparing therapies for anorexia The Anorexia data file on the text CD shows weight change for 72 anorexic teenage girls who were randomly assigned to one of three psychological treatments. Use software to analyze these data. (The change scores are given in the file for the control and cognitive therapy groups. You can create the change scores for the family therapy group.)
a. Construct box plots for the three groups. Use these and sample summary means and standard deviations to describe the three samples.
b. For the one-way ANOVA comparing the three mean weight changes, report the test statistic and P-value. Explain how to interpret.
c. State and check the assumptions for the test in part b.


14.2 Estimating Differences in Groups for a Single Factor

When an analysis of variance F test has a small P-value, the test does not specify which means are different or how different they are. In practice, we can estimate differences between population means with confidence intervals.

Confidence Intervals Comparing Pairs of Means
Let s denote the estimate of the residual standard deviation. This is the square root of the within-groups variance estimate s² of σ² that is the denominator of the F test statistic in ANOVA. It is also the square root of the mean square error that software reports (the MS in the row for "Error" of any ANOVA table).

Recall s² is an unbiased estimator of σ², regardless of whether H0 is true in the ANOVA F test.

SUMMARY: Confidence Interval Comparing Means
For two groups i and j, with sample means ȳi and ȳj having sample sizes ni and nj, the 95% confidence interval for µi - µj is

ȳi - ȳj ± t.025 s √(1/ni + 1/nj).

The t-score from the t table has df = N - g = total sample size - number of groups.

The df value of N - g for the t-score is also df2 for the F test. This is the df for the MS error. For g = 2 groups, N = n1 + n2 and df = N - g = (n1 + n2 - 2). This confidence interval is then identical to the one introduced in Section 10.3 for (µ1 - µ2) based on a pooled standard deviation. In the context of follow-up analyses after the ANOVA F test where we form this confidence interval to compare a pair of means, some software (such as MINITAB) refers to this method of comparing means as the Fisher method.
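To make the formula concrete, here is a minimal sketch (Python with NumPy and SciPy assumed; not part of the textbook) applying it to the holding-time study, with the MS error and df taken from Table 14.2; the classical versus Muzak comparison is chosen purely for illustration:

```python
import numpy as np
from scipy import stats

mean_classical, mean_muzak = 10.4, 2.8    # group means from Table 14.1
n_i = n_j = 5                             # group sample sizes
ms_error, df_error = 11.6, 12             # MS error and its df from Table 14.2

s = np.sqrt(ms_error)                     # residual standard deviation
t_crit = stats.t.ppf(0.975, df_error)     # t_.025 with df = N - g
margin = t_crit * s * np.sqrt(1 / n_i + 1 / n_j)
diff = mean_classical - mean_muzak
print(f"95% CI for mu_classical - mu_Muzak: ({diff - margin:.1f}, {diff + margin:.1f})")
```

The interval, roughly (2.9, 12.3), contains only positive values, suggesting a higher population mean holding time with classical music than with Muzak.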


When the confidence interval does not contain 0, we can infer that the population means are different. The interval shows just how different they may be.

Fisher method

Example 5
Number of Good Friends and Happiness

Picture the Scenario
Chapter 11 investigated the association between happiness and several categorical variables, using data from the General Social Survey. The respondents indicated whether they were very happy, pretty happy, or not too happy. Is happiness associated with having lots of friends? A recent GSS asked, “About how many good friends do you have?” Here, we could treat either happiness (variable HAPPY in GSS) or number of good friends (NUMFREND in GSS) as the response variable. If we choose number of good friends, then we are in the ANOVA setting, having a quantitative response variable and a categorical explanatory variable (happiness).

For each happiness category, Table 14.3 shows the sample mean, standard deviation, and sample size for the number of good friends. It also shows the ANOVA table for the F test comparing the population means. The small P-value of 0.03 suggests that at least two of the three population means are different.

Table 14.3 Summary of ANOVA for Comparing Mean Number of Good Friends for Three Happiness Categories The analysis is based on GSS data.

Very happy Pretty happy Not too happy

Mean 10.4 7.4 8.3

Standard deviation 17.8 13.6 15.6

Sample size 276 468 87

Source DF SS MS F P

Group 2 1626.8 813.4 3.47 0.032

Error 828 193900.9 234.2

Total 830 195527.7

Question to Explore Use 95% confidence intervals to compare the population mean number of good friends for the three pairs of happiness categories—very happy with pretty happy, very happy with not too happy, and pretty happy with not too happy.

Think It Through
From Table 14.3, the MS error is 234.2. The residual standard deviation is s = √234.2 = 15.3, with df = 828 (listed in the row for the MS error). For a 95% confidence interval with df = 828, software reports t.025 = 1.963. For comparing the very happy and pretty happy categories, the confidence interval for µ1 - µ2 is

(ȳ1 - ȳ2) ± t.025 s √(1/n1 + 1/n2) = (10.4 - 7.4) ± 1.963(15.3)√(1/276 + 1/468),

which is 3.0 ± 2.3, or (0.7, 5.3).


We infer that the population mean number of good friends is between about 1 and 5 higher for those who are very happy than for those who are pretty happy. Since the confidence interval contains only positive numbers, this suggests that µ1 - µ2 > 0; that is, µ1 exceeds µ2. On the average, people who are very happy have more good friends than people who are pretty happy.

For the other comparisons, you can find

Very happy, not too happy: 95% CI for µ1 - µ3 is (10.4 - 8.3) ± 3.7, or (-1.6, 5.8).
Pretty happy, not too happy: 95% CI for µ2 - µ3 is (7.4 - 8.3) ± 3.5, or (-4.4, 2.6).

These two confidence intervals contain 0. So there's not enough evidence to conclude that a difference exists between µ1 and µ3 or between µ2 and µ3.

Insight
The confidence intervals are quite wide, even though the sample sizes are fairly large. This is because the sample standard deviations (and hence s) are large. Table 14.3 reports that the sample standard deviations are larger than the sample means, suggesting that the three distributions are skewed to the right. The margin figure shows a box plot of the overall sample data distribution of number of good friends, except for many large outliers. It would also be sensible to compare the median responses, but these are not available at the GSS website. Do non-normal population distributions invalidate this inferential analysis? We'll discuss this next.

(Margin figure: box plot of number of good friends.)

Try Exercise 14.12

The Effects of Violating Assumptions
The t confidence intervals have the same assumptions as the ANOVA F test: (1) normal population distributions, (2) identical standard deviations, and (3) data that resulted from randomization. These inferences also are not highly dependent on the normality assumption, especially when the sample sizes are large, such as in Example 5. When the standard deviations are quite different, with the ratio of the largest to smallest exceeding about 2, it is preferable to use the confidence interval formula from Section 10.2 (see the margin Recall) that uses separate standard deviations for the groups rather than a single pooled value s. That approach does not assume equal standard deviations.

Recall From Section 10.2, a 95% confidence interval for µ1 - µ2 using separate standard deviations s1 and s2 is

ȳ1 - ȳ2 ± t.025 √(s1²/n1 + s2²/n2).

Software supplies the df value and the confidence interval.

In Example 5, the sample standard deviations are not very different (ranging from 13.6 to 17.8), and the GSS is a multistage random sample with properties similar to a simple random sample. Thus, Assumptions 2 and 3 are reasonably well satisfied. The sample sizes are fairly large, so Assumption 1 of normality is not crucial. It's justifiable to use ANOVA and follow-up confidence intervals with these data.

Controlling Overall Confidence With Many Confidence Intervals
With g groups, there are g(g - 1)/2 pairs of groups to compare. With g = 3, for instance, there are g(g - 1)/2 = 3(2)/2 = 3 comparisons: Group 1 with Group 2, Group 1 with Group 3, and Group 2 with Group 3.


The confidence interval method just discussed is mainly used when g is small or when only a few comparisons are of main interest. When there are many groups, the number of comparisons can be large. For example, when g = 10, there are g(g - 1)/2 = 45 pairs of means to compare. If we plan to construct 95% confidence intervals for these comparisons, an error probability of 0.05 applies to each comparison. On average, 45(0.05) = 2.25 of the confidence intervals would not contain the true differences of means.

For 95% confidence intervals, the confidence level of 0.95 is the probability that any particular confidence interval that we plan to construct will contain the parameter. The probability that all the confidence intervals will contain the parameters is considerably smaller than the confidence level for any particular interval. How can we construct the intervals so that the 95% confidence extends to the entire set of intervals rather than to each single interval? Methods that control the probability that all confidence intervals will contain the true differences in means are called multiple comparison methods. For these methods, all intervals are designed to contain the true parameters simultaneously with an overall fixed probability.
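A back-of-the-envelope sketch of why the overall confidence drops (plain Python; not from the textbook; the final line treats the intervals as independent, which they are not, so it is only a rough indication):

```python
from math import comb

g = 10
num_pairs = comb(g, 2)                 # g(g - 1)/2 = 45 pairwise comparisons
expected_misses = num_pairs * 0.05     # about 2.25 intervals expected to miss their target

# If the 45 separate 95% intervals were independent, the chance that all of them
# cover their true differences would be only about 0.95**45, roughly 0.10.
all_cover_if_independent = 0.95 ** num_pairs
print(num_pairs, expected_misses, round(all_cover_if_independent, 2))
```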

Multiple Comparisons for Comparing All Pairs of Means
Multiple comparison methods compare pairs of means with a confidence level that applies simultaneously to the entire set of comparisons rather than to each separate comparison.

The simplest multiple comparison method uses the confidence interval formula from the beginning of this section. However, for each interval it uses a t-score for a more stringent confidence level. This ensures that the overall confidence level is acceptably high. The desired overall error probability is split into equal parts for each comparison. Suppose we want a confidence level of 0.95 that all confidence intervals will be simultaneously correct. If we plan five confidence intervals comparing means, then the method uses error probability 0.05/5 = 0.01 for each one; that is, a 99% confidence level for each separate interval. This approach ensures that the overall confidence level is at least 0.95. (It is actually slightly larger.) Called the Bonferroni method, it is based on a special case of a probability theorem shown by the Italian probabilist Carlo Bonferroni in 1936 (Exercise 14.67).

We shall instead use the Tukey method. It is designed to give an overall confidence level very close to the desired value (such as 0.95), and it has the advantage that its confidence intervals are slightly narrower than the Bonferroni intervals. The Tukey method is more complex, using a sampling distribution pertaining to the difference between the largest and smallest of the g sample means. We do not present its formula, but it is easy to obtain with software.

Recall John Tukey was responsible for many statistical innovations, including box plots and other methods of exploratory data analysis (EDA). See On the Shoulders of . . . John Tukey in Section 2.6 to read more about Tukey and EDA.
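Software makes both approaches easy to apply. A minimal sketch (assuming Python with SciPy and statsmodels installed; not part of the textbook) applied to the Table 14.1 holding-time data, purely for illustration:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

times = np.array([5, 1, 11, 2, 8, 0, 1, 4, 6, 3, 13, 9, 8, 15, 7])
message = ["advert"] * 5 + ["muzak"] * 5 + ["classical"] * 5

# Bonferroni idea: split the overall 0.05 error probability across the 3 comparisons,
# so each separate interval uses the t-score for error probability 0.05/3.
t_bonferroni = stats.t.ppf(1 - (0.05 / 3) / 2, 12)

# Tukey method: simultaneous 95% confidence intervals for all pairs of means.
print(pairwise_tukeyhsd(times, message, alpha=0.05))
print(round(t_bonferroni, 2))   # the more stringent per-interval t-score
```

pairwise_tukeyhsd prints the three simultaneous intervals for the pairs of message types; the printed t-score is what each separate Bonferroni interval would use instead of t.025.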

Tukey method

Example 6

Number of Good Friends Picture the Scenario Example 5 compared the population mean numbers of good friends, for three levels of reported happiness. There, we constructed a separate 95% confidence interval for the difference between each pair of means. Table 14.4 displays these three intervals. It also displays the confidence intervals that software reports for the Tukey multiple comparison method.


Table 14.4 Multiple Comparisons of Mean Good Friends for Three Happiness Categories
An asterisk * indicates a significant difference, with the confidence interval not containing 0.

Groups                          Difference of means   Separate 95% CIs   Tukey 95% Multiple Comparison CIs
(Very happy, Pretty happy)      µ1 - µ2               (0.7, 5.3) *       (0.3, 5.7) *
(Very happy, Not too happy)     µ1 - µ3               (-1.6, 5.8)        (-2.6, 6.5)
(Pretty happy, Not too happy)   µ2 - µ3               (-4.4, 2.6)        (-5.1, 3.3)

Question to Explore

a. Explain how the Tukey multiple comparison confidence intervals differ from the separate confidence intervals in Table 14.4.

b. Summarize results shown for the Tukey multiple comparisons.

Think It Through

a. The Tukey confidence intervals hold with an overall confidence level of about 95%. This confidence applies to the entire set of three intervals. The Tukey confidence intervals are wider than the separate 95% confidence intervals because the multiple comparison approach uses a higher confidence level for each separate interval to ensure achieving the overall confidence level of 95% for the entire set of intervals.

b. The Tukey confidence interval for µ1 - µ2 contains only positive values, so we infer that µ1 > µ2. The mean number of good friends is higher, although perhaps barely so, for those who are very happy than for those who are pretty happy. The other two Tukey intervals contain 0, so we cannot infer that those pairs of means differ.

Insight Figure 14.4 summarizes the three Tukey comparisons from Table 14.4 . The intervals have different lengths because the group sample sizes are different.

Figure 14.4 Summary of Tukey Comparisons of Pairs of Means. (The figure plots the three Tukey intervals for µ1 - µ2, µ1 - µ3, and µ2 - µ3 on a common difference scale running from -5 to 10.)

Try Exercise 14.15

ANOVA and Regression
ANOVA can be presented as a special case of multiple regression. The factor enters the regression model using indicator variables. Each indicator variable takes only two values, 0 and 1, and indicates whether an observation falls in a particular group.

With three groups, we need two indicator variables to indicate the group membership. The first indicator variable is

x1 = 1 for observations from the first group
   = 0 otherwise.

Recall You can review indicator variables in Section 13.5. We used them there to include a categorical explanatory variable in a regression model.


The second indicator variable is

x2 = 1 for observations from the second group

= 0 otherwise.

The indicator variables identify the group to which an observation belongs as follows:

Group 1: x1 = 1 and x2 = 0

Group 2: x1 = 0 and x2 = 1

Group 3: x1 = 0 and x2 = 0.

We don’t need a separate indicator variable for the third group. We know an observation is in that group if x1 = 0 and x2 = 0.

With these indicator variables, the multiple regression equation for the mean of y is

µy = α + β1x1 + β2x2.

For observations from the third group, x1 = x2 = 0, and the equation reduces to

µy = α + β1(0) + β2(0) = α.

So the parameter α represents the population mean of the response variable y for the last group. For observations from the first group, x1 = 1 and x2 = 0, so

µy = α + β1(1) + β2(0) = α + β1

equals the population mean µ1 for that group. Similarly, α + β2 equals the population mean µ2 for the second group (let x1 = 0 and x2 = 1).

Since α + β1 = µ1 and α = µ3, the difference between the means

µ1 - µ3 = (α + β1) - α = β1.

That is, the coefficient β1 of the first indicator variable represents the difference between the first mean and the last mean. Likewise, β2 = µ2 - µ3. The beta coefficients of the indicator variables represent differences between the mean of each group and the mean of the last group. Table 14.5 summarizes the parameters of the regression model and their correspondence with the three population means.

Table 14.5 Interpretation of Coefficients of Indicator Variables in Regression Model. The indicator variables represent a categorical predictor with three categories specifying three groups.

Group   Indicator x1   Indicator x2   Mean of y      Interpretation of β
1       1              0              µ1 = α + β1    β1 = µ1 - µ3
2       0              1              µ2 = α + β2    β2 = µ2 - µ3
3       0              0              µ3 = α
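The coding scheme in Table 14.5 is easy to construct with software. The short Python sketch below is our own illustration, with hypothetical group labels; it builds x1 and x2 from a list of group memberships exactly as defined above.

# Hypothetical group labels for five observations
groups = ["group1", "group2", "group3", "group1", "group3"]

# Indicator coding from Table 14.5: group1 -> (1, 0), group2 -> (0, 1), group3 -> (0, 0)
x1 = [1 if g == "group1" else 0 for g in groups]
x2 = [1 if g == "group2" else 0 for g in groups]
print(list(zip(groups, x1, x2)))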

Using Regression for the ANOVA Comparison of Means

For three groups, the null hypothesis for the ANOVA F test is H0: µ1 = µ2 = µ3. If H0 is true, then µ1 - µ3 = 0 and µ2 - µ3 = 0. In the multiple regression model

µy = α + β1x1 + β2x2

Recall: Section 13.3 introduced the F test that all the beta parameters in a multiple regression model equal 0.


with indicator variables, recall that µ1 - µ3 = β1 and µ2 - µ3 = β2. Therefore, the ANOVA null hypothesis H0: µ1 = µ2 = µ3 is equivalent to H0: β1 = β2 = 0 in the regression model. If the beta parameters in the regression model all equal 0, then the mean of the response variable equals α for each group. We can perform the ANOVA test comparing means using the F test of H0: β1 = β2 = 0 for this regression model.
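As a quick check of this equivalence, here is a minimal Python sketch using the telephone holding-time data analyzed in Examples 2-4 (the raw values are listed in the table for Exercise 14.27). The use of scipy and statsmodels, and the variable names, are our own choices rather than anything from the text.

import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Holding times (minutes) by type of recorded message
ad        = [5, 1, 8, 11, 2]
muzak     = [0, 6, 1, 4, 3]
classical = [7, 9, 13, 8, 15]

# One-way ANOVA F test comparing the three means
print(stats.f_oneway(ad, muzak, classical))          # F = 6.43, P-value = 0.013

# The same test from the regression model mu_y = alpha + beta1*x1 + beta2*x2
phone = pd.DataFrame({"y": ad + muzak + classical,
                      "x1": [1]*5 + [0]*10,          # 1 = advertisement
                      "x2": [0]*5 + [1]*5 + [0]*5})  # 1 = Muzak
fit = smf.ols("y ~ x1 + x2", data=phone).fit()
print(fit.params)                                    # 10.4, -5.0, -7.6, as in Table 14.6
print(fit.fvalue, fit.f_pvalue)                      # same F = 6.43 and P-value = 0.013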

Regression analysis

Example 7

Telephone Holding Times Picture the Scenario Let's return to the data we analyzed in Examples 1–4 on telephone holding times for callers to an airline for which the recorded message is an advertisement, Muzak, or classical music.

Questions to Explore

a. Set up indicator variables to use regression to model the mean holding times with the type of recorded message as explanatory variable.

b. Table 14.6 shows a portion of a MINITAB printout for fitting this model. Use it to find the estimated mean holding time for the advertisement recorded message.

c. Use Table 14.6 to conduct the ANOVA F test (which Example 3 had shown).

Table 14.6 Printout for Regression Model µy = α + β1x1 + β2x2 for Telephone Holding Times and Type of Recorded Message. The indicator variables are x1 for the advertisement and x2 for Muzak.

Predictor Coef SE Coef T P

Constant 10.400 1.523 6.83 0.000

x1 -5.000 2.154 -2.32 0.039

x2 -7.600 2.154 -3.53 0.004

Analysis of Variance

Source DF SS MS F P

Regression 2 149.20 74.60 6.43 0.013

Residual Error 12 139.20 11.60

Total 14 288.40

Think It Through

a. The factor (type of recorded message) has three categories: advertisement, Muzak, and classical music. We set up indicator variables x1 and x2 with

x1 = 1 for the advertisement (and 0 otherwise), x2 = 1 for Muzak (and 0 otherwise),

so x1 = x2 = 0 for classical music.

The regression model for the mean of y = telephone holding time is then

µy = α + β1x1 + β2x2.


b. From Table 14.6, the prediction equation is

ŷ = 10.4 - 5.0x1 - 7.6x2.

For the advertisement, x1 = 1 and x2 = 0, so the estimated mean is ŷ = 10.4 - 5.0(1) - 7.6(0) = 5.4. This is the sample mean for the five subjects in that group.

c. From Table 14.6, the F test statistic for testing

H0: β1 = β2 = 0

is F = 6.43, with df1 = 2 and df2 = 12. This null hypothesis is equivalent to

H0: µ1 = µ2 = µ3.

Table 14.6 reports a P-value of 0.013. The regression approach provides the same F test statistic and P-value as the ANOVA did in Table 14.2 on page 686.

Insight Testing that the beta coefficients equal zero is a way of testing that the population means are equal. Likewise, confidence intervals for those coefficients give us confidence intervals for differences between means. For instance, since β1 = µ1 - µ3, a confidence interval for β1 is also a confidence interval comparing µ1 and µ3. From Table 14.6, the estimate -5.0 of β1 has se = 2.154. Since df = 12 for that interval, t.025 = 2.179, and a 95% confidence interval for µ1 - µ3 is

-5.0 ± 2.179(2.154), or -5.0 ± 4.7, which is (-9.7, -0.3).

This agrees with the 95% confidence interval you would obtain using the difference between the sample means and its standard error.

Try Exercise 14.17

Table 14.2: ANOVA Table for Comparing Means

Source   DF   SS      MS     F     P
Group    2    149.2   74.6   6.4   0.013
Error    12   139.2   11.6
Total    14   288.4

Table 14.6: ANOVA Table for Regression Model

Source       DF   SS      MS     F     P
Regression   2    149.2   74.6   6.4   0.013
Residual     12   139.2   11.6
Total        14   288.4

Notice the similarity between the ANOVA table for comparing means (Table 14.2) and the ANOVA table for regression (Table 14.6), both shown again above. The "between-groups sum of squares" for ordinary ANOVA is the "regression sum of squares" for the regression model. This is the variability explained by the indicator variables for the groups. The "error sum of squares" for ordinary ANOVA is the residual error sum of squares for the regression model. This represents the variability within the groups. This sum of squares divided by its degrees of freedom is the mean square error = 11.6 (MS in the error row), which is also the within-groups estimate of the variance of the observations within the groups. The regression mean square is the between-groups estimate = 74.6. The ratio of the regression mean square to the mean square error is the F test statistic (F = 6.4).

So far, this chapter has shown how to compare groups for a single factor. This is one-way ANOVA. Sometimes the groups to compare are the cells of a cross-classification of two or more factors. For example, the four groups (employed men, employed women, unemployed men, unemployed women) result from cross classifying employment status and gender. The next section presents two-way ANOVA, the procedure for comparing the mean of a quantitative response variable across categories of each of two factors.


14.2 Practicing the Basics

14.12 House prices and age For the House Selling Prices OR data file on the text CD, the output shows the result of conducting an ANOVA comparing mean house selling prices (in $1000) by Age Category (New = 0 to 24 years old, Medium = 25 to 50 years old, Old = 51 to 74 years old, Very Old = 75+ years old). It also shows a summary table of means and standard deviations of the selling prices, by condition. Consider the New and Medium ages, where 150 of the 200 observations fall.

a. Using information given in the tables, show how to construct a 95% confidence interval comparing the corresponding population means.

b. Interpret the confidence interval.

Age        N    Mean    StDev
Medium     71   243.6   79.5
New        78   305.8   125.9
Old        37   215.7   85.3
Very Old   14   311.6   188.5

Source          DF    SS        MS      F      P
Age Condition   3     281272    93757   7.70   0.000
Error           196   2387598   12182
Total           199   2668870

14.13 Religious importance and educational level An extensive survey by the Pew Forum on Religion & Public Life, conducted in 2007, details views on religion in the United States. Based on interviews with a representative sample of more than 35,000 Americans age 18 and older, the U.S. Religious Landscape Survey found that religious affiliation in the United States is both diverse and dependent on a lot of factors. The table shows results of ANOVA for comparing feelings about the importance of religion in life and the subject's level of education (scale increasing with education: 3 = HS Graduate, 5 = Some College or Associate Degree, 6 = College Graduate, 7 = Post-Graduate Training). It appears that as educational level increases, the importance of religion in daily life decreases. Construct a 95% confidence interval to compare the population mean educational level for the Very Important and Not at All Important religious attitude groups. Interpret the interval.

Summary of ANOVA for mean education level of subjects and the level of importance of religion in daily life

        Very important   Somewhat important   Not too important   Not at all important
Mean    4.55             4.61                 4.87                5.12
N       1959             776                  309                 341

Source   DF     SS        MS     F      P
Educ     8      69.17     8.65   6.65   0.000
Error    3403   4425.25   1.30
Total    3411   4494.43

14.14 Comparing telephone holding times Examples 2 and 3 analyzed whether telephone callers to an airline would stay on hold different lengths of time, on average, if they heard (a) an advertisement about the airline, (b) Muzak, or (c) classical music. The sample means were 5.4, 2.8, and 10.4, with n1 = n2 = n3 = 5. The ANOVA test had F = 74.6/11.6 = 6.4 and a P-value of 0.013.

a. A 95% confidence interval comparing the population mean times that callers are willing to remain on hold for classical music and Muzak is (2.9, 12.3). Interpret this interval.

b. The margin of error was 4.7 for this comparison. Without doing a calculation, explain why the margin of error is 4.7 for comparing each pair of means.

c. The 95% confidence intervals are (0.3, 9.7) for µ3 - µ1 and (-2.1, 7.3) for µ1 - µ2. Interpret these two confidence intervals. Using these two intervals and the interval from part a, summarize what the airline company learned from this study.

d. The confidence intervals are wide. In the design of this experiment, what could you change to estimate the differences in means more precisely?

14.15 Tukey holding time comparisons Refer to the previous exercise. We could instead use the Tukey method to construct multiple comparison confidence intervals. The Tukey confidence intervals having overall confidence level 95% have margins of error of 5.7, compared to 4.7 for the separate 95% confidence intervals in the previous exercise.

a. According to this method, which groups are significantly different?

b. Why are the margins of error larger than with the separate 95% intervals?

14.16 REM sleep A psychologist compares the mean amount of time of rapid-eye movement (REM) sleep for subjects under three conditions. She randomly assigns 12 subjects to the three groups, four per group. The sample means for the three groups were 18, 15, and 12. The table shows the ANOVA table from SPSS.

REM sleep
Source   DF   SS       MS      F      P
Group    2    72.00    36.00   0.79   0.481
Error    9    408.00   45.33
Total    11   480.00

a. Report and interpret the P-value for the ANOVA F test.

b. For the Tukey 95% multiple comparison confidence intervals comparing each pair of means, the margin of error for each interval is 13.3. Is it true or false that since all the confidence intervals contain 0, it is plausible that all three population means equal 0?

c. Would the margin of error for each separate 95% confidence interval be less than 13.3, equal to 13.3, or larger than 13.3? Explain why.


14.17 REM regression Refer to the previous exercise.

a. Set up indicator variables for a regression model so that an F test for the regression parameters is equivalent to the ANOVA test comparing the three means.

b. Express the null hypothesis both in terms of population means and in terms of regression parameters for the model in part a.

c. The sample means were 18, 15, and 12. Explain why the prediction equation is ŷ = 12 + 6x1 + 3x2. Interpret the three parameter estimates in this equation.

14.18 Outsourcing satisfaction Exercise 14.5 showed an ANOVA for comparing mean customer satisfaction scores for three service centers. The sample means on a scale of 0 to 10 were 7.60 in San Jose, 7.80 in Toronto, and 7.10 in Bangalore. Each sample size = 100, MS error = 0.47, and the F test statistic = 27.6 has P-value < 0.001.

a. Explain why the margin of error for separate 95% confidence intervals is the same for comparing the population means for each pair of cities. Show that this margin of error is 0.19.

b. Find the 95% confidence interval for the difference in population means for each pair of service centers. Interpret.

c. The margin of error for Tukey 95% multiple comparison confidence intervals for comparing the service centers is 0.23. Construct the intervals. Interpret.

d. Why are the confidence intervals different in part b and in part c? What is an advantage of using the Tukey intervals?

14.19 Regression for outsourcing Refer to the previous exercise.

a. Set up indicator variables to represent the three service centers.

b. The prediction equation is ŷ = 7.1 + 0.5x1 + 0.7x2. Show how the terms in this equation relate to the sample means of 7.6 for San Jose, 7.8 for Toronto, and 7.1 for Bangalore.

14.20 Advertising effect on sales Each of 100 restaurants in a fast-food chain is randomly assigned one of four media for an advertising campaign: A = radio, B = TV, C = newspaper, D = mailing. For each restaurant, the observation is the change in sales, defined as the difference between the sales for the month during which the advertising campaign took place and the sales in the same month a year ago (in thousands of dollars).

a. By creating indicator variables, write a regression equation for the analysis to compare mean change in sales for the four media.

b. Explain how you could use the regression model to test the null hypothesis of equal population mean change in sales for the four media.

c. The prediction equation is ŷ = 35 + 5x1 - 10x2 + 2x3. Estimate the difference in mean change in sales for media (i) A and D, (ii) A and B.

14.21 French ANOVA Refer to Exercise 14.3 about studying French, with data shown again below. Using software,

a. Compare the three pairs of means with separate 95% confidence intervals. Interpret.

b. Compare the three pairs of means with Tukey 95% multiple comparison confidence intervals. Interpret, and explain why the intervals are different than in part a.

Group 1 Group 2 Group 3 4 1 9 6 5 10 8 5

14.3 Two-Way ANOVA

Recall: A factor is a categorical explanatory variable. One-way ANOVA has a single factor.

One-way ANOVA is a bivariate (two-variable) method. It analyzes the relationship between the mean of a quantitative response variable and the groups that are categories of a factor. ANOVA extends to handle two or more factors. With multiple factors, the analysis is multivariate. We'll illustrate for the case of two factors. This extension is a two-way ANOVA. It enables us to study the effect of one factor at a fixed level of a second factor.

The great British statistician R. A. Fisher (see On the Shoulders of R. A. Fisher at the end of Section 8.5) developed ANOVA methods in the 1920s. Agricultural experiments were the source of many of the early ANOVA applications. For instance, ANOVA has often been used to compare the mean yield of a crop for different fertilizers.


Two factors

Example 8

Amounts of Fertilizer and Manure Picture the Scenario This example presents a typical ANOVA application, based on a study at Iowa State University.2 A large field was portioned into 20 equal-size plots. Each plot was planted with the same amount of corn seed, using a fixed spacing pattern between the seeds. The goal was to study how the yield of corn later harvested from the plots (in metric tons) depended on the levels of use of nitrogen-based fertilizer and manure. Each factor was measured in a binary manner. The fertilizer level was low = 45 kg per hectare or high = 135 kg per hectare. The manure level was low = 84 kg per hectare or high = 168 kg per hectare.

Questions to Explore

a. What are four treatments you can compare with this experiment?

b. What comparisons are relevant when you control for (keep fixed) manure level?

Think It Through

a. Four treatments result from cross-classifying the two binary factors: fertilizer level and manure level. Table 14.7 shows the four treatments, defined for the 2 × 2 = 4 combinations of categories of the two factors (fertilizer level and manure level).

Table 14.7 Four Groups for Comparing Mean Corn Yield These result from the two-way cross classification of fertilizer level with manure level.

Fertilizer

Manure Low High

Low

High

b. We can compare the mean corn yield for the two levels of fertilizer, controlling for manure level (that is, at a fixed level of manure use). For fields in which manure level was low , we can compare the mean yields for the two levels of fertilizer use. These refer to the first row of Table 14.7 . Likewise, for fields in which manure level was high , we can compare the mean yields for the two levels of fertilizer use. These refer to the second row of Table 14.7 .

Insight Among the questions we’ll learn how to answer in this section are: Does the mean corn yield depend significantly on the fertilizer level? Does it depend on the manure level? Does the effect of fertilizer depend on the manure level?

Try Exercise 14.22

2 Thanks to Dan Nettleton, Iowa State University, for data on which this example is based.

Did You Know? A metric ton is 1000 kilograms, which is about 2200 pounds. A hectare is 10,000 square meters (e.g., 100 meters by 100 meters), which is about 2.5 acres. !


Inference About Effects in Two-Way ANOVA

In two-way ANOVA, a null hypothesis states that the population means are the same in each category of one factor, at each fixed level of the other factor. For example, we could test

H0: Mean corn yield is equal for plots at the low and high levels of fertilizer, for each fixed level of manure.

Table 14.8a displays a set of population means satisfying this null hypothesis of “no effect of fertilizer level.”

Table 14.8 Population Mean Corn Yield Satisfying Null Hypotheses: (a) No Effect of Fertilizer Level, (b) No Effect of Manure Level

(a)            Fertilizer
    Manure     Low   High
    Low        10    10
    High       20    20

(b)            Fertilizer
    Manure     Low   High
    Low        10    20
    High       10    20

We could also test

H0: Mean corn yield is equal for plots at the low and high levels of manure, for each fixed level of fertilizer.

Table 14.8b displays a set of population means satisfying this null hypothesis of “no effect of manure level.” The effects of individual factors tested with these two null hypotheses are called main effects . We’ll discuss a third null hypothesis later in the section.

As in one-way ANOVA, the F tests of hypotheses in two-way ANOVA assume that

• The population distribution for each group is normal.
• The population standard deviations are identical.
• The data result from random sampling or a randomized experiment.

Here, each group refers to a cell in the two-way classification of the two factors. ANOVA procedures still usually work quite well if the population distributions are not normal with identical standard deviations. As in other ANOVA inferences, the quality of the sample is the most important assumption.

The test statistics have complex formulas. We’ll rely on software. As in one-way ANOVA, the test for a factor uses two estimates of the variance for each group. These estimates appear in the mean square (MS) column of the ANOVA table.

SUMMARY: F Test Statistics in Two-Way ANOVA

For testing the main effect for a factor, the test statistic is the ratio of mean squares,

F = (MS for the factor)/(MS error).

The MS for the factor is a variance estimate based on between-groups variation for that factor. The MS error is a within-groups variance estimate that is always unbiased.

When the null hypothesis of equal population means for the factor is true, the F test statistic values tend to fluctuate around 1. When it is false, they tend to be larger. The P-value is the right-tail probability above the observed F value. That is, it is the probability (presuming H0 is true) of even more extreme results than we observed in the sample.
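If your software reports only the F statistic, the right-tail probability is easy to compute directly. Here is a one-line sketch, assuming SciPy is available; the values used are the ones that appear in Table 14.10 of Example 9 below.

from scipy import stats

# Right-tail probability above F = 6.33 for df1 = 1 and df2 = 17
print(stats.f.sf(6.33, dfn=1, dfd=17))   # about 0.022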


Testing the main effects

Example 9

Corn Yield Picture the Scenario Let's analyze the relationship between corn yield and the two factors, fertilizer level and manure level. Table 14.9 shows the data and the sample mean and standard deviation for each group.

Table 14.9 Corn Yield by Fertilizer Level and Manure Level

Fertilizer Level   Manure Level   Plot 1   Plot 2   Plot 3   Plot 4   Plot 5   Sample Size   Mean   Std. Dev.
High               High           13.7     15.8     13.9     16.6     15.5     5             15.1   1.3
High               Low            16.4     12.5     14.1     14.4     12.2     5             13.9   1.7
Low                High           15.0     15.1     12.0     15.7     12.2     5             14.0   1.8
Low                Low            12.4     10.6     13.7     8.7      10.9     5             11.3   1.9

Questions to Explore

a. Summarize the factor effects as shown by the sample means.

b. Table 14.10 is an ANOVA table for two-way ANOVA. Specify the two hypotheses tested, give the test statistics and P-values, and interpret.

Table 14.10 Two-Way ANOVA for Corn Yield Data in Table 14.9

Source DF SS MS F P

Fertilizer 1 17.67 17.67 6.33 0.022

Manure 1 19.21 19.21 6.88 0.018

Error 17 47.44 2.79

Total 19 84.32

Think It Through

a. Table 14.9 (with means summarized in the margin) shows that for each manure level, the sample mean yield is higher for the plots using more fertilizer. For each fertilizer level, the sample mean yield is higher for the plots using more manure.

b. First, consider the hypothesis

H0: Mean corn yield is equal for plots at the low and high levels of fertilizer, for each fixed level of manure.

For the fertilizer main effect, Table 14.10 reports that the between-groups estimate of the variance is 17.67. This is the mean square (MS) for fertilizer in Table 14.10. The within-groups estimate is the MS error, or 2.79. The F test statistic is the ratio,

F = 17.67/2.79 = 6.33.

From Table 14.10, the df values are 1 and 17 for the two estimates. From the F distribution with df1 = 1 and df2 = 17, the P-value is 0.022, also reported in Table 14.10. If the population means were equal at the two levels of fertilizer, the probability of an F test statistic value larger than 6.33 would be only 0.022. There is strong evidence that the mean corn yield depends on fertilizer level.

Next, consider the hypothesis

H0: Mean corn yield is equal for plots at the low and high levels of manure, for each fixed level of fertilizer.

For the manure main effect, the F test statistic is F = 19.21/2.79 = 6.88. From Table 14.10, df1 = 1 and df2 = 17, and the P-value is 0.018. There is strong evidence that the mean corn yield also depends on the manure level.

(Margin note: The MS values 17.67 and 19.21 are the numerators of the F statistics; the MS error 2.79 is the denominator of each. Means from Table 14.9:

           Fertilizer
Manure     Low    High
Low        11.3   13.9
High       14.0   15.1)

Insight As with any significance test, the information gain is limited. We do not learn how large the fertilizer and manure effects are on the corn yield. We can use confidence intervals to investigate the sizes of the main effects. We'll now learn how to do this by using regression modeling with indicator variables.

Try Exercise 14.27

Regression Model with Indicator Variables for Two-Way ANOVA

Let f denote an indicator variable for fertilizer level and let m denote an indicator variable for manure level. Specifically,

f = 1 for plots with high fertilizer level
  = 0 for plots with low fertilizer level

m = 1 for plots with high manure level
  = 0 for plots with low manure level.

The multiple regression model for the mean corn yield with these two indicator variables is

µy = α + β1 f + β2 m.

To find the population means for the four groups, we substitute the possible combinations of values for the indicator variables. For example, for plots that have high fertilizer level (f = 1) and low manure level (m = 0), the mean corn yield is

µy = α + β1(1) + β2(0) = α + β1.

Table 14.11 shows the four means. The difference between the means at the high and low levels of fertilizer equals β1 for each manure level. That is, the coefficient β1 of the indicator variable f for fertilizer level equals the difference between the means at its high and low levels, controlling for manure level.


The null hypothesis of no fertilizer effect is H0: β1 = 0. Likewise, β2 is the difference between the means at the high and low levels of manure, for each fertilizer level.

Table 14.11 Population Mean Corn Yield for Fertilizer and Manure Levels

Fertilizer   Manure   Indicator f   Indicator m   Mean of y
High         High     1             1             α + β1 + β2
High         Low      1             0             α + β1
Low          High     0             1             α + β2
Low          Low      0             0             α

We do not need to use regression modeling to conduct the ANOVA F tests. They’re easy to do using software. But the modeling approach helps us to focus on estimating the means and the differences among them. We can compare means using an ordinary confidence interval for the regression parameter that equals the difference between those means. The 95% confidence interval has the usual form of

parameter estimate ± t0.025(se).

The df for the t -score is the df value for the MS error.
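As an illustration of this approach with software, here is a minimal Python sketch that fits the additive regression model to the corn yield data in Table 14.9. The choice of statsmodels and the variable names are ours rather than the text's; the printed values should match Tables 14.10 and 14.12 up to rounding.

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Corn yields from Table 14.9, with indicators f (1 = high fertilizer) and m (1 = high manure)
corn = pd.DataFrame({
    "y": [13.7, 15.8, 13.9, 16.6, 15.5,   # high fertilizer, high manure
          16.4, 12.5, 14.1, 14.4, 12.2,   # high fertilizer, low manure
          15.0, 15.1, 12.0, 15.7, 12.2,   # low fertilizer, high manure
          12.4, 10.6, 13.7,  8.7, 10.9],  # low fertilizer, low manure
    "f": [1]*10 + [0]*10,
    "m": [1]*5 + [0]*5 + [1]*5 + [0]*5,
})

fit = smf.ols("y ~ f + m", data=corn).fit()
print(anova_lm(fit))            # F = 6.33 for fertilizer and 6.88 for manure (Table 14.10)
print(fit.params)               # about 11.65, 1.88, 1.96 (Table 14.12)
print(fit.conf_int().loc["f"])  # 95% CI for the fertilizer effect, about (0.3, 3.5)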

Regression modeling

Example 10

Estimate and Compare Mean Corn Yields Picture the Scenario Table 14.12 shows the result of fitting the regression model for predicting corn yield with indicator variables for fertilizer level and manure level.

Table 14.12 Estimates of Regression Parameters for Two-Way ANOVA of the Mean Corn Yield by Fertilizer Level and Manure Level

Predictor Coef SE Coef T P

Constant 11.6500 0.6470 18.01 0.000

fertilizer 1.8800 0.7471 2.52 0.022

manure 1.9600 0.7471 2.62 0.018

Questions to Explore

a. Find and use the prediction equation to estimate the mean corn yield for each group.

b. Use a parameter estimate from the prediction equation to compare mean corn yields for the high and low levels of fertilizer, at each manure level.

c. Find a 95% confidence interval comparing the mean corn yield at the high and low levels of fertilizer, controlling for manure level. Interpret it.

Think It Through

a. From Table 14.12 , the prediction equation is (rounding to one decimal place)

ŷ = 11.6 + 1.9f + 2.0m.


The y-intercept equals 11.6. This is the estimated mean yield (in metric tons per hectare) when both indicator variables equal 0, that is, with fertilizer and manure at the low levels. The estimated means for the other cases result from substituting values for the indicator variables. For instance, at fertilizer level = high and manure level = low, f = 1 and m = 0, so the estimated mean yield is ŷ = 11.6 + 1.9(1) + 2.0(0) = 13.5. Doing this for all four groups, we get

Fertilizer

Manure Low High

Low 11.6 11.6 + 1.9 = 13.5

High 11.6 + 2.0 = 13.6 11.6 + 1.9 + 2.0 = 15.5
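The four estimated means in this table come directly from plugging the indicator values into the prediction equation; the short loop below is just our own quick check of that arithmetic.

import itertools

# Evaluate y-hat = 11.6 + 1.9 f + 2.0 m at the four (f, m) combinations
for f, m in itertools.product([0, 1], repeat=2):
    print(f, m, 11.6 + 1.9*f + 2.0*m)   # 11.6, 13.6, 13.5, 15.5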

b. The coefficient of the fertilizer indicator variable f is 1.9. This is the estimated difference in mean corn yield between the high and low levels of fertilizer, for each level of manure (for instance, 13.5 - 11.6 = 1.9 when manure level = low ).

c. Again, the estimate of the fertilizer effect is 1.9. Its standard error, reported in Table 14.12, is 0.747. From Table 14.10 (see page 703), the df for the MS error is 17. From Table B, t.025 = 2.11 when df = 17. The 95% confidence interval is

1.9 ± 2.11(0.747), which is (0.3, 3.5).

At each manure level, we estimate that the mean corn yield is between 0.3 and 3.5 metric tons per hectare higher at the high fertilizer level than at the low fertilizer level. The confidence interval contains only positive values (does not contain 0), reflecting the conclusion that the mean yield is signifi-cantly higher at the higher level of fertilizer. This agrees with the P-value falling below 0.05 in the test for the fertilizer effect.

Insight The estimated means are not the same as the sample means in Table 14.9. (Both sets are shown again in the margin.) The model smooths the sample means so that the difference between the estimated means for two categories of a factor is exactly the same at each category of the other factor. For example, from the estimated group means found previously, the fertilizer effect of 1.9 = 13.5 - 11.6 = 15.5 - 13.6.

Try Exercise 14.28

Estimated Means

Fertilizer

Manure Low High

Low 11.6 13.5

High 13.6 15.5

Table 14.9 : Sample Means

Fertilizer

Manure Low High

Low 11.3 13.9

High 14.0 15.1

The regression model assumes that the difference between means for the two categories for one factor is the same in each category of the other factor. The next section shows how to check this assumption. When it is reasonable, we can use a single comparison, rather than a separate one at each category of the other variable. In Example 10 , we estimated the difference in mean corn yield between the pair of fertilizer levels, a single comparison (with estimate 1.9) holding at each level of manure.

Exploring Interaction Between Factors in Two-Way ANOVA

Investigating whether interaction occurs is important whenever we analyze multivariate relationships. No interaction between two factors means that the effect of either factor on the response variable is the same at each category of the other factor. The regression model in Example 10 and the ANOVA tests of main effects assume there is no interaction between the factors. What does this mean in this context?

Recall: Section 13.5 introduced the concept of interaction between two explanatory variables in their effects on a response variable. In a regression context, no interaction implied parallel lines (common slopes).

From Example 10 , the estimated mean corn yields for the regression model having an indicator variable for fertilizer level and an indicator variable for manure level are shown in the table in the margin.

What pattern do these show? Let’s plot the means for the two fertilizer levels, within each level of manure. Figure 14.5 shows a plot in which the y -axis gives estimated mean corn yields, and points are shown for the four fertilizer–manure combinations. The horizontal axis is not a numerical scale but merely lists the two fertilizer levels. The drawn lines connect the means for the two fertilizer levels, for a given manure level. The absence of interaction is indicated by the parallel lines.

Fertilizer

Manure Low High

Low 11.6 13.5

High 13.6 15.5

[Figure 14.5 Mean Corn Yield, by Fertilizer and Manure Levels, Showing No Interaction. The plot shows estimated mean corn yield (vertical axis, roughly 10 to 15) at the low and high fertilizer levels, with one line for each manure level. The parallel lines reflect an absence of interaction. This implies that the difference in estimated means between the two fertilizer levels is the same for each manure level. Question: Is it also true that the difference in estimated means between the two manure levels is the same for each fertilizer level?]

The parallelism of lines occurs because the difference in the estimated mean corn yield between the high and low levels of fertilizer is the same for each manure level. The difference equals 1.9. Also, the difference between the high and low levels of manure in the estimated mean corn yield is 2.0 for each fertilizer level.

By contrast, Table 14.13 and Figure 14.6 show a set of means for which there is interaction. The difference between the high and low levels of fertilizer in the mean corn yield is 14 - 10 = 4 for low manure and 12 - 16 = -4 for high manure. Here, the difference in means depends on the manure level: According to these means, it’s better to use a high level of fertilizer when the manure level is low but it’s better to use a low level of fertilizer when the manure level is high. Similarly, the manure effect differs at the two fertilizer levels; for the low level, it is 16 - 10 = 6 whereas at the high level, it is 12 - 14 = -2. The lines in Figure 14.6 are not parallel.

Table 14.13 Means that Show Interaction Between the Factors in Their Effects on the Response The effect of fertilizer differs according to whether manure level is low or high. See Figure 14.6 .

Fertilizer

Manure Low High

Low 10 14

High 16 12

In Table 14.13, suppose the numbers of observations are the same for each group. Then the overall mean corn yield, ignoring manure level, is 13 for each fertilizer level (the average of 10 and 16, and the average of 14 and 12). The overall difference in means between the two fertilizer levels then equals 0. A one-way analysis of mean corn yield by fertilizer level would conclude that fertilizer level has no effect. However, a two-way analysis that allows for interaction would detect that fertilizer has an effect, but that effect differs according to the manure level.

Testing for Interaction

In conducting a two-way ANOVA, before testing the main effects, it is customary to test a third null hypothesis stating that there is no interaction between the factors in their effects on the response. The test statistic providing the sample evidence of interaction is

F = (MS for interaction)/(MS error).

When H0 is false, the F statistic tends to be large. Therefore, as usual, the P-value is the right-tail probability.

[Figure 14.6 Mean Corn Yield, by Fertilizer and Manure Levels, Displaying Interaction. The plot shows mean corn yield (vertical axis, roughly 10 to 15) at the low and high fertilizer levels, with one line for each manure level; the lines cross rather than being parallel. Question: What aspect of the plot reflects the interaction?]

Testing for interaction

Example 11

Corn Yield Data Picture the Scenario Let’s return to our analysis of the corn yield data, summarized in the margin table. In Example 10 we analyzed these data assuming no interaction. Let’s see if that analysis is valid. Table 14.14 is an ANOVA table for a model that allows interaction in assessing the effects of fertilizer level and manure level on the mean corn yield.

Table 14.14 Two-Way ANOVA of Mean Corn Yield by Fertilizer Level and Manure Level, Allowing Interaction

Source DF SS MS F P

Fertilizer 1 17.67 17.67 6.37 0.023

Manure 1 19.21 19.21 6.92 0.018

Interaction 1 3.04 3.04 1.10 0.311

Error 16 44.40 2.78

Total 19 84.32

(Margin note: The F statistic for the test of no interaction is the ratio of the interaction MS, 3.04, to the MS error, 2.78. Corn yields, summarized from Table 14.9:

Fertilizer Level   Manure Level   Mean   Std. Dev.
High               High           15.1   1.3
High               Low            13.9   1.7
Low                High           14.0   1.8
Low                Low            11.3   1.9)


Question to Explore
Give the result of the test of H0: no interaction, and interpret.

Think It Through
The test statistic for H0: no interaction is

F = (MS for interaction)/(MS error) = 3.04/2.78 = 1.10.

Based on the F distribution with df1 = 1 and df2 = 16 for these two mean squares, the ANOVA table reports P-value = 0.31. This is not much evidence of interaction. We would not reject H0 at the usual significance levels, such as 0.05.

Insight Because there is not much evidence of interaction, we are justified in conducting the simpler two-way ANOVA about main effects. The tests presented previously in Table 14.10 for effects of fertilizer and manure on mean corn yield are valid.

Try Exercise 14.29

It is not meaningful to test the main effects hypotheses when there is interaction. A small P-value in the test of H0: no interaction suggests that each factor has an effect, but the size of effect for one factor varies according to the category of the other factor. Then, you should investigate the nature of the interaction by plotting the sample cell means, using a plot like Figure 14.6. You should compare categories of one factor separately at different levels of the other factor.
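For readers following along in software, here is a minimal, self-contained Python sketch of this interaction test for the Table 14.9 corn data. The use of statsmodels and the column names are our own choices; the printed interaction row should show F of about 1.10 with a P-value of about 0.31, as in Table 14.14.

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Corn yields from Table 14.9; f = 1 for high fertilizer, m = 1 for high manure
corn = pd.DataFrame({
    "y": [13.7, 15.8, 13.9, 16.6, 15.5, 16.4, 12.5, 14.1, 14.4, 12.2,
          15.0, 15.1, 12.0, 15.7, 12.2, 12.4, 10.6, 13.7, 8.7, 10.9],
    "f": [1]*10 + [0]*10,
    "m": [1]*5 + [0]*5 + [1]*5 + [0]*5,
})

# The f:m term allows the fertilizer effect to differ by manure level
fit = smf.ols("y ~ f + m + f:m", data=corn).fit()
print(anova_lm(fit))   # interaction row: F about 1.10 on df (1, 16), P about 0.31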

In Practice: Check Interaction Before Main Effects
In practice, in two-way ANOVA you should first test the hypothesis of no interaction. If the evidence of interaction is not strong (that is, if the P-value is not small), then test the main effects hypotheses and/or construct confidence intervals for those effects. But if important evidence of interaction exists, plot and compare the cell means for a factor separately at each category of the other factor.

For comparing means in two cells using a confidence interval, use the formula from the box at the beginning of Section 14.2, shown again in the margin. Substitute the cell sample sizes for ni and nj and use the MS error for the two-way ANOVA that allows interaction.

Recall: From the box at the beginning of Section 14.2, the 95% confidence interval comparing two means is

(ȳi - ȳj) ± t.025(se), where se = s√(1/ni + 1/nj),

s is the square root of the MS error, and df for the t distribution is the df for the MS error.

Interactions and confidence interval

Example 12

Political Ideology by Gender and Race Picture the Scenario In most years, the General Social Survey asks subjects to report their political ideology, measured with seven categories in which 1 = extremely liberal, 4 = moderate, 7 = extremely conservative. Table 14.15 shows results from the 2008 General Social Survey on mean political ideology classified by gender and by race.


Table 14.15 Mean Political Ideology by Gender and by Race

Race

Gender Black White

Female 4.164 ( n = 165 ) 4.268 ( n = 840 )

Male 3.819 ( n = 116 ) 4.444 ( n = 719 )

For the test of H0: no interaction, software reports an F test statistic of 6.19 with df1 = 1 and df2 = 1836, for a P-value of 0.013. So, in comparing females and males on their mean political ideology, we should do it separately by race. The MS error for the model allowing interaction equals 2.534, so the residual standard deviation is s = √2.534 = 1.592.

Analysis of Variance for POLVIEWS, using Adjusted SS for Tests

Source DF Seq SS Adj SS Adj MS F P

SEX 1 5.138 1.652 1.652 0.65 0.420

RACE 1 24.786 30.780 30.780 12.15 0.001

SEX *RACE 1 15.693 15.693 15.693 6.19 0.013

Error 1836 4651.981 4651.981 2.534

Total 1839 4697.598

Questions to Explore

a. Interpret the significant interaction by comparing sample mean political ideology of females and males for each race descriptively and using 95% confidence intervals.

b. Interpret the confidence intervals derived in part a.

Think It Through

a. The sample means show that for black subjects, females are more conservative (have the higher mean). By contrast, for white subjects, males are more conservative. For a confidence interval comparing mean political ideology for females and males who are black, the standard error (using the sample sizes reported in the table) is

se = s√(1/n for black females + 1/n for black males) = 1.592√(1/165 + 1/116) = 0.193.

The 95% confidence interval is

(4.164 - 3.819) ± 1.96(0.193), which is 0.345 ± 0.378, or (-0.03, 0.72).

Likewise, you can find that the 95% confidence interval comparing mean political ideology for females and males who are white is -0.1768 ± 0.159, or (-0.33, -0.02).

b. Since the confidence interval for black subjects contains zero, we cannot infer that there is a difference in the populations. For white subjects, however, all values in the interval are negative. We infer that white females are less conservative than white males in the population.

Caution: When conducting a two-way ANOVA for samples of different sizes, the analysis (using software) will often have to be performed as a General Linear Model. Notice that the Analysis of Variance table here shows both a Seq SS and an Adj SS column. You should use the Adjusted SS and Adjusted MS for the tests.
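The interval arithmetic in parts a and b uses only the summary statistics reported above (cell means, cell sample sizes, and the MS error of 2.534). A short sketch, with names of our own choosing:

import math

s = math.sqrt(2.534)                          # residual standard deviation, about 1.592

# Black subjects: females (n = 165) vs. males (n = 116)
se_black = s * math.sqrt(1/165 + 1/116)
diff_black = 4.164 - 3.819
print(diff_black - 1.96*se_black, diff_black + 1.96*se_black)   # about (-0.03, 0.72)

# White subjects: females (n = 840) vs. males (n = 719)
se_white = s * math.sqrt(1/840 + 1/719)
diff_white = 4.268 - 4.444
print(diff_white - 1.96*se_white, diff_white + 1.96*se_white)   # about (-0.33, -0.02)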


Insight The confidence interval for white subjects has an endpoint that is close to 0. So, the true gender effect in the population could be small.

Try Exercise 14.33

Why Not Instead Perform Two Separate One-Way ANOVAs?

When you have two factors, rather than performing a two-way ANOVA, why not instead perform two separate one-way ANOVAs? For instance, in Example 12 you could compare the mean political ideology for females and males using a one-way ANOVA, ignoring the information about race. Likewise, you could perform a separate one-way ANOVA to compare the means for blacks and whites, ignoring the information about gender.

The main reason is that you learn more from two-way ANOVA. The two-way ANOVA indicates whether there is interaction. When there is, as in Example 12, it is more informative to compare levels of one factor separately at each level of the other factor. This enables us to investigate how the effect depends on that other factor. For instance, a one-way ANOVA of mean political ideology by gender might show no gender effect, whereas the two-way ANOVA has shown that there is a gender effect but it varies by race.

Similarly, in experimental studies, rather than carrying out one experiment to investigate the effect of one factor and a separate experiment to investigate the effect of a second factor, it is better to carry out a single experiment and study both factors at the same time. Besides the advantage of being able to check for interaction, this is more cost effective. If we have funds to experiment with 100 subjects, we can use a sample size of 100 for studying each factor with a two-way ANOVA, rather than use 50 subjects in one experiment about one factor and 50 subjects in a separate experiment about the other factor.

Yet another benefit of a two-way ANOVA is that the residual variability, which affects the MS error and the denominators of the F test statistics, tends to decrease. When we use two factors to predict a response variable, we usually tend to get better predictions (that is, less residual variability) than when we use one factor. With less residual (within-groups) variability, we get larger test statistics, and hence greater power for rejecting false null hypotheses.

Factorial ANOVA

The methods of two-way ANOVA extend to the analysis of several factors. A multifactor ANOVA with observations from all combinations of the factors is called factorial ANOVA. For example, with three factors, three-way ANOVA considers main effects for those factors as well as possible interactions.

In Practice: Use Regression With Categorical and Quantitative Predictors
With several explanatory variables, usually some are categorical and some are quantitative. Then, it is sensible to build a multiple regression model containing both types of predictors. That's what we did in Example 10 in the previous chapter when we modeled house selling prices in terms of house size and condition of the house.
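In formula-based software this mixing is a one-liner. The sketch below is our own illustration in Python (statsmodels); the tiny data set and the column names price, size, and condition are hypothetical placeholders, not the House Selling Prices data referred to in the text.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: selling price (in $1000s), house size (square feet), condition category
houses = pd.DataFrame({
    "price": [220, 305, 180, 410, 265, 350, 195, 288],
    "size": [1500, 2100, 1200, 2800, 1700, 2400, 1300, 1900],
    "condition": ["good", "excellent", "fair", "excellent", "good", "good", "fair", "excellent"],
})

# C(condition) creates indicator variables for the categories of the categorical predictor
fit = smf.ols("price ~ size + C(condition)", data=houses).fit()
print(fit.params)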


14.3 Practicing the Basics

14.22 Reducing cholesterol An experiment randomly assigns 100 subjects suffering from high cholesterol to one of four groups: low-dose Lipitor, high-dose Lipitor, low-dose Zocor, high-dose Zocor. After three months of treatment, the change in cholesterol level is measured.

a. Identify the response variable and the two factors.

b. What are four treatments that can be compared with this experiment?

c. What comparisons are relevant when we control for dose level?

14.23 Drug main effects For the previous exercise, show a hypothetical set of population means for the four groups that would have

a. A dose effect but no drug effect.

b. A drug effect but no dose effect.

c. A drug effect and a dose effect.

d. No drug effect and no dose effect.

14.24 Reasons for divorce The 26 students in a statistics class at the University of Florida were surveyed about their attitudes toward divorce. Each received a response score according to how many from a list of 10 possible reasons were regarded as legitimate for a woman to seek a divorce. The higher the score, the more willing the subject is to accept divorce as an option in a variety of circumstances. The students were also asked whether they were fundamentalist or nonfundamentalist in their religious beliefs and whether their church attendance frequency was frequent (more than once a month) or infrequent. The table displays the data and the results of a two-way ANOVA.

a. State the null hypothesis to which the F test statistic in the religion row refers.

b. Show how to use mean squares to construct the F test statistic for the religion main effect, report its P-value, and interpret.

Church Attendance   Fundamentalist                 Nonfundamentalist
Frequent            0, 3, 4, 0, 3, 2, 0, 1, 1, 3   2, 5, 1, 2, 3, 3
Infrequent          4, 3, 4                        6, 8, 6, 4, 6, 3, 7, 4

Source          DF   SS      MS      F       P
Religion        1    11.07   11.07   5.20    0.032
Church_attend   1    36.84   36.84   17.32   0.000
Error           23   48.93   2.13
Total           25   117.12

14.25 House prices, age, and bedrooms For the House Selling Prices OR data file on the text CD, the output shows the result of conducting a two-way ANOVA of house selling prices (in thousands) by the number of bedrooms in the house and the age (New, Medium, Old, Very Old; see Exercise 14.12) of the houses in Corvallis, Oregon.

a. For testing the main effect of age, report the F test statistic value, and show how it was formed from other values reported in the ANOVA table.

b. Report the P-value for the main effect test for age, and interpret.

Source          DF    SS        MS      F      P
Bedrooms        7     517868    68523   6.79   0.000
Age Condition   3     243063    81021   8.03   0.000
Error           189   1907939   10095
Total           199   2668870

14.26 Corn and manure In Example 10, the coefficient of the manure-level indicator variable m is 1.96.

a. Explain why this coefficient is the estimated difference in mean corn yield between the high and low levels of manure, for each level of fertilizer.

b. Explain why the 95% confidence interval for the difference in mean corn yield between the high and low levels of manure is 1.96 ± 2.11(0.747).

14.27 Hang up if message repeated? Example 2 described an experiment in which telephone callers to an airline were put on hold with an advertisement, Muzak, or classical music in the background. Each caller who was chosen was also randomly assigned to a category of a second factor: whether the message played was five minutes long or ten minutes long. (In each case, it was repeated at the end.) The table shows the data classified by both factors and the results of a two-way ANOVA.

Telephone holding times by type of recorded message and repeat time

                 Repeat Time
Message          Ten Minutes   Five Minutes
Advertisement    8, 11, 2      5, 1
Muzak            1, 4, 3       0, 6
Classical        13, 8, 15     7, 9

Source    DF   SS       MS      F      P
Message   2    149.20   74.60   7.09   0.011
Repeat    1    23.51    23.51   2.24   0.163
Error     11   115.69   10.52
Total     14   288.40

a. State the null hypothesis to which the F test statistic in the Message row refers.

b. Show how to use mean squares to construct the F test statistic for the Message main effect, report its P-value, and interpret.

c. On what assumptions is this analysis based?

14.28 Regression for telephone holding times Refer to the previous exercise. Let x1 = 1 for the advertisement and 0 otherwise, x2 = 1 for Muzak and 0 otherwise, and x1 = x2 = 0 for classical music. Likewise, let x3 = 1 for repeating in 10-minute cycles and x3 = 0 for repeating in 5-minute cycles. The display shows results of a regression of the telephone holding times on these indicator variables.

Regression for telephone holding times

Predictor   Coef     SE Coef   T       P
Constant    8.867    1.776     4.99    0.000
x1          -5.000   2.051     -2.44   0.033
x2          -7.600   2.051     -3.71   0.003
x3          2.556    1.709     1.50    0.163

000200010271675566_CH24.pdf 7/18/2012 11:17:09 AM000200010271675566_CH24.pdf 7/18/2012 11:17:09 AM

Page 35: Comparing Groups: Analysis of Variance Methodsfaculty.chemeketa.edu/wbarber1/mth244/Agresti Chp 14.pdf684 Chapter 14 Comparing Groups: Analysis of Variance Methods means increases

Section 14.3 Two-Way ANOVA 713

a. State the prediction equation. Interpret the parameter estimates.

b. Find the estimated means for the six groups in the two-way cross-classification of message type and repeat time.

c. Find the estimated difference between the mean holding times for 10-minute repeats and 5-minute repeats, for a fixed message type. How can you get this estimate from a coefficient of an indicator variable in the prediction equation?

14.29 Interaction between message and repeat time? Refer to the previous two exercises. When we allow interaction, two of the rows in the new ANOVA table are

Source         DF   SS       MS      F      P
Group*Repeat   2    15.02    7.51    0.67   0.535
Error          9    100.67   11.19

where Group*Repeat denotes the interaction effect.

a. Show the hypotheses, test statistic, and P-value for the test of H0: no interaction.

b. What is the implication of the result of this test? For instance, the analyses in previous exercises assumed a lack of interaction. Was it valid to do so?

14.30 Income by gender and job type In 2000, the population mean hourly wage for males was $22 for white-collar jobs, $11 for service jobs, and $14 for blue-collar jobs. For females the means were $15 for white-collar jobs, $8 for service jobs, and $10 for blue-collar jobs.3

a. Identify the response variable and the two factors.

b. Show these means in a two-way classification of the two factors.

c. Compare the differences between males and females for (i) white-collar jobs and (ii) blue-collar jobs. Explain why there is interaction, and describe it.

d. Show a set of six population mean incomes that would satisfy H0: no interaction.

14.31 Ideology by gender and race Refer to Example 12, the sample means from which are shown again below.

Mean political ideology

           Race
Gender     Black    White
Female     4.164    4.2675
Male       3.819    4.4443

a. Explain how to obtain the following interpretation for the interaction from the sample means: "For females there is no race effect on ideology. For males, whites are more conservative by about half an ideology category, on the average."

b. Suppose that instead of the two-way ANOVA, you performed a one-way ANOVA with gender as the predictor and a separate one-way ANOVA with race as the predictor. Suppose the ANOVA for gender does not show a significant effect. Explain how this could happen, even though the two-way ANOVA showed a gender effect for each race. (Hint: Will the overall sample means for females and males be more similar than they are for each race?)

c. Refer to part b. Summarize what you would learn about the gender effect from a two-way ANOVA that you would fail to learn from a one-way ANOVA.

14.32 Attractiveness and getting dates The results in the table are from a study of physical attractiveness and subjective well-being (E. Diener et al., Journal of Personality and Social Psychology, vol. 69, 1995, pp. 120-129). A panel rated a sample of college students on their physical attractiveness. The table presents the number of dates in the past three months for students rated in the top or bottom quartile of attractiveness.

a. Identify the response variable and the factors.

b. Do these data appear to show interaction? Explain.

c. Based on the results in the table, specify one of the ANOVA assumptions that these data violate. Is this the most important assumption?

Dates and attractiveness

                  Number of dates, MEN        Number of dates, WOMEN
ATTRACTIVENESS    Mean   Std. Dev.   n        Mean   Std. Dev.   n
More              9.7    10.0        35       17.8   14.2        33
Less              9.9    12.6        36       10.4   16.6        27

14.33 Diet and weight gain A randomized experiment 4 mea-sured weight gain (in grams) of male rats under six diets varying by source of protein (beef, cereal, pork) and level of protein (high, low). Ten rats were assigned to each diet. The data are shown in the table that follows and are also available in the Protein and Weight Gain data file on the book’s CD. a. Conduct a two-way ANOVA that assumes a lack of

interaction. Report the F test statistic and the P-value for testing the effect of the protein level. Interpret.

b. Now conduct a two-way ANOVA that also considers potential interaction. Report the hypotheses, test sta-tistic and P-value for a test of no interaction. What do you conclude at the 0.05 significance level? Explain.

c. Refer to part b. Allowing interaction, construct a 95% confidence interval to compare the mean weight gain for the two protein levels, for the beef source of protein.

Weight gain by source of protein and by level of protein

Source    High protein                                   Low protein
Beef      73, 102, 118, 104, 81, 107, 100, 87, 117, 111  90, 76, 90, 64, 86, 51, 72, 90, 95, 78
Cereal    98, 74, 56, 111, 95, 88, 82, 77, 86, 92        107, 95, 97, 80, 98, 74, 74, 67, 89, 58
Pork      94, 79, 96, 98, 102, 102, 108, 91, 120, 105    49, 82, 73, 86, 81, 97, 106, 70, 61, 82
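A minimal sketch of how the models in parts a and b might be fit in software, assuming Python with pandas and statsmodels (the choice of library and the column and variable names are illustrative assumptions; the data values are typed from the table above):

# Hedged sketch: two-way ANOVA for the weight-gain data, first assuming no
# interaction (part a), then allowing a source-by-level interaction (part b).
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

gains = {
    ("Beef",   "High"): [73, 102, 118, 104, 81, 107, 100, 87, 117, 111],
    ("Beef",   "Low"):  [90, 76, 90, 64, 86, 51, 72, 90, 95, 78],
    ("Cereal", "High"): [98, 74, 56, 111, 95, 88, 82, 77, 86, 92],
    ("Cereal", "Low"):  [107, 95, 97, 80, 98, 74, 74, 67, 89, 58],
    ("Pork",   "High"): [94, 79, 96, 98, 102, 102, 108, 91, 120, 105],
    ("Pork",   "Low"):  [49, 82, 73, 86, 81, 97, 106, 70, 61, 82],
}
rows = [{"source": s, "level": l, "gain": g}
        for (s, l), values in gains.items() for g in values]
df = pd.DataFrame(rows)

# Part a: main-effects model (assumes a lack of interaction).
main = smf.ols("gain ~ C(source) + C(level)", data=df).fit()
print(sm.stats.anova_lm(main, typ=2))

# Part b: model that also allows source-by-level interaction.
inter = smf.ols("gain ~ C(source) * C(level)", data=df).fit()
print(sm.stats.anova_lm(inter, typ=2))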

14.34 Regression of weight gain on diet Refer to the previous exercise.
a. Set up indicator variables for protein source and for protein level, and specify a regression model with the effects both of protein level and protein source on weight gain.

3 Source: Data from The State of Working America 2000–2001, Economic Policy Institute.
4 Source: Data from G. Snedecor and W. Cochran, Statistical Methods, 6th ed. (Iowa State University Press, 1967), p. 347.


b. Fit the model in part a, and explain how to interpret the parameter estimate for the protein level indicator variable.
c. Show how you could test a hypothesis about beta parameters in the model in part a to analyze the effect of protein source on weight gain.
d. Using the fit of the model, find the estimated mean for each of the six diets. Explain what it means when we say that these estimated means do not allow for interaction between protein level and source in their effects on weight gain.


Chapter Review

ANSWERS TO THE CHAPTER FIGURE QUESTIONS

Figure 14.1 No, even if the population means are equal, we would expect the sample means to vary due to sample-to-sample variability. Differences among the sample means are not sufficient evidence to conclude that the population means differ.

Figure 14.2 The data in Figure 14.2b give stronger evidence against H0 because Figure 14.2b has less variability within each sample than Figure 14.2a .

Figure 14.3 Evidence suggesting that the assumptions of normality and equal standard deviations are violated would be (1) graphical methods showing extreme skew for the response variable, or (2) the largest group standard deviation being more than about double the smallest group standard deviation when the sample sizes are unequal.

Figure 14.5 Yes. The parallelism of lines implies that no interaction holds no matter which factor we choose for making comparisons of means.

Figure 14.6 The lines are not parallel.

CHAPTER SUMMARY

Analysis of variance (ANOVA) methods compare several groups according to their means on a quantitative response variable. The groups are categories of categorical explanatory variables. A categorical explanatory variable is also called a factor.

• The one-way ANOVA F test compares means for a single factor. The groups are specified by categories of a single categorical explanatory variable.

• Two-way ANOVA methods compare means across categories of each of two factors, at fixed levels of the other factor. When there is interaction between the factors in their effects, differences between response means for categories of one factor change according to the category of the other factor.

• Multiple comparison methods such as the Tukey method compare means for each pair of groups while controlling the overall confidence level.

• Analysis of variance methods can be conducted by using multiple regression models. The regression model uses indicator variables as explanatory variables to represent the factors. Each indicator variable equals 1 for observations from a particular category and 0 otherwise. (A short software sketch follows this summary.)

• The ANOVA methods assume randomization and that the population distribution for each group is normal, with the same standard deviation for each group. In practice, the randomness assumption is the most important, and ANOVA methods are robust to moderate violations of the other assumptions.
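To make the indicator-variable representation in the summary concrete, here is a minimal sketch in Python; the data values, variable names, and use of statsmodels are illustrative assumptions, not taken from the text.

# Hedged sketch: one-way ANOVA expressed as a regression with indicator
# (dummy) variables, using hypothetical data for three groups.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "y":     [1, 2, 11, 3, 4, 15, 6, 7, 9],
    "group": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
})

# C(group) creates indicator variables: 1 for that category, 0 otherwise.
model = smf.ols("y ~ C(group)", data=df).fit()

# The regression F test here is the same test as the one-way ANOVA F test.
print(sm.stats.anova_lm(model, typ=2))
print(model.params)  # intercept: mean of the baseline group; slopes: differences in means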


CHAPTER PROBLEMS

Practicing the Basics

14.35 Good friends and marital status Is the number of good friends associated with marital status? For GSS data with marital status measured with the categories (married, widowed, divorced, separated, never married), an ANOVA table reports F = 0.80 based on df1 = 4, df2 = 612.
a. Introduce notation, and specify the null hypothesis and the alternative hypothesis for the ANOVA F test.
b. Based on what you know about the F distribution, would you guess that the test statistic value of 0.80 provides strong evidence against the null hypothesis? Explain.
c. Software reports a P-value of 0.53. Explain how to interpret it.

14.36 Going to bars and having friends Do people who go to bars and pubs a lot tend to have more friends? A recent GSS asked, "How often do you go to a bar or tavern?" The table shows results of ANOVA for comparing the mean number of good friends at three levels of this variable. The very often group reported going to a bar at least several times a week. The occasional group reported going occasionally, but not as often as several times a week.

Summary of ANOVA for mean number of good friends and going to bars

                 Very often   Occasional   Never
Mean             12.1         6.4          6.2
Standard dev.    21.3         10.2         14.0
Sample size      41           166          215

Source   DF    SS        MS      F      P
Group    2     1116.8    558.4   3.03   0.049
Error    419   77171.8   184.2
Total    421   78288.5

a. State the (i) hypotheses, (ii) test statistic value, and (iii) P-value for the significance test displayed in this table. Interpret the P-value.
b. Based on the assumptions needed to use the method in part a, is there any aspect of the data summarized here that suggests that the ANOVA test and follow-up confidence intervals may not be appropriate? Explain.

14.37 TV watching The 2008 General Social Survey asked 1324 subjects how many hours per day they watched TV, on the average. Are there differences in population means according to the race of the subject (white, black, other)? The sample means were 2.76 for whites (n = 1014), 4.38 for blacks (n = 188), and 2.70 for other (n = 122). In a one-way ANOVA, the between-groups estimate of the variance is 215.26 and the within-groups estimate is 6.76.
a. Conduct the ANOVA test and make a decision using the 0.05 significance level.
b. The 95% confidence interval comparing the population means is (1.1, 2.2) for blacks and whites, (-0.4, 0.5) for whites and the other category, and (1.0, 2.4) for blacks and the other category. Based on the three confidence intervals, indicate which pairs of means are significantly different. Interpret.
c. Based on the information given, show how to construct the confidence interval that compares the population mean TV watching for blacks and whites.
d. Refer to part c. Would the corresponding interval formed with the Tukey method be wider, or narrower? Explain why.

14.38 Comparing auto bumpers A consumer organization compares the sturdiness of three types of front bumpers. In the study, a particular brand of car is driven into a concrete wall at 15 miles per hour. The response is the amount of damage, as measured by the repair costs, in hundreds of dollars. Due to the potentially large costs, the study conducts only two tests with each bumper type, using six cars. The table shows the data and some ANOVA results. Show the (a) assumptions, (b) hypotheses, (c) test statistic and df values, (d) P-value, and (e) interpretation for testing the hypothesis that the true mean repair costs are the same for the three bumper types.

Bumper A   Bumper B   Bumper C
   1          2          11
   3          4          15

Source   DF   SS       MS      F       P
Bumper   2    148.00   74.00   18.50   0.021
Error    3    12.00    4.00
Total    5    160.00



14.39 Compare bumpers Refer to the previous exercise.

a. Find the margin of error for constructing a 95% confidence interval for the difference between any pair of the true means. Interpret by showing which pairs of bumpers (if any) are significantly different in their true mean repair costs.
b. For Tukey 95% multiple comparison confidence intervals, the margin of error is 8.4. Explain the difference between confidence intervals formed with this method and separate confidence intervals formed with the method in part a.
c. Set up indicator variables for a multiple regression model including bumper type.
d. The prediction equation for part c is ŷ = 13 − 11x1 − 10x2. Explain how to interpret the three parameter estimates in this model, and show how these estimates relate to the sample means for the three bumpers.

14.40 Segregation by region Studies of the degree of residential racial segregation often use the segregation index. This is the percentage of nonwhites who would have to change the block on which they live to produce a fully nonsegregated city, one in which the percentage of nonwhites living in each block is the same for all blocks in the city. This index can assume values ranging from 0 to 100, with higher values indicating greater segregation. (The national average for large U.S. metropolitan areas in 2009 was 27, down from 33 in 2000.) The table shows the index for a sample of cities for 2005–2009, classified by region.

Segregation index

Northeast                          North Central               South              West
Boston: 67                         Minneapolis-St. Paul: 56    New Orleans: 64    San Francisco-Oakland: 64
NY, Long Island, Northern NJ: 79   Detroit: 80                 Tampa: 58          Dallas-Ft Worth: 57
Philadelphia: 69                   Chicago: 78                 Miami: 66          Los Angeles: 70
Pittsburgh: 68                     Milwaukee: 81               Atlanta: 60        Seattle: 54

Source: Racial and Ethnic Residential Segregation in the United States, 1980–2000, U.S. Census Bureau Series CENSR-3, 2002. www.psc.isr.umich.edu/dis/census/segregation.html.

a. Report the mean and standard deviation for each of the four regions.

b. Define notation, and state the hypotheses for one-way ANOVA.

c. Report the F test statistic and its P-value. What do you conclude about the mean segregation indices for the four regions?

d. Suppose we took these data from the Census Bureau report by choosing only the cities in which we know people. Is the ANOVA valid? Explain.

14.41 Compare segregation means Refer to the previous exercise.

a. Using software, find the margin of error that pertains to each comparison using the Tukey method for 95% multiple comparison confidence intervals.

b. Using part a, determine which pairs of means, if any, are significantly different.
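A minimal sketch for parts a and b, assuming Python with statsmodels' Tukey HSD routine; the data-frame layout and variable names are illustrative assumptions, and the indices are typed from the table in the previous exercise.

# Hedged sketch: Tukey 95% multiple comparisons of segregation indices by region.
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

data = {
    "Northeast":     [67, 79, 69, 68],
    "North Central": [56, 80, 78, 81],
    "South":         [64, 58, 66, 60],
    "West":          [64, 57, 70, 54],
}
df = pd.DataFrame([{"region": r, "seg_index": v}
                   for r, values in data.items() for v in values])

result = pairwise_tukeyhsd(endog=df["seg_index"], groups=df["region"], alpha=0.05)
print(result)  # intervals not containing 0 indicate significantly different pairs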



14.42 Georgia political ideology The Georgia Student Survey file on the text CD asked students their political party affiliation (1 = Democrat, 2 = Republican, 3 = Independent) and their political ideology (on a scale from 1 = very liberal to 7 = very conservative). The table shows results of an ANOVA, with political ideology as the response variable.

Georgia political ideology and party affiliation

               Political ideology
Affiliation    N     Mean     StDev
Democrat       8     2.6250   1.0607
Republican     36    5.5000   1.0000
Independent    15    3.4667   0.9155

Source         DF    SS        MS       F       P
PoliticalAff   2     79.629    39.814   40.83   0.000
Error          56    54.608    0.975
Total          58    134.237

a. Does the ANOVA assumption of equal population standard deviations seem plausible, or is it so badly violated that ANOVA is inappropriate?

b. Are the population distributions normal? Why or why not? Which is more important, the normality assumption or the assumption that the groups are random samples from the population of interest?

c. The next table shows 95% confidence intervals comparing pairs of means. Interpret the confidence interval comparing Republicans and Democrats.

                            95% CI
Affiliations              Lower    Center   Upper
Republican–Democrat        2.10     2.87     3.65
Independent–Democrat      -0.02     0.84     1.71
Republican–Independent     1.43     2.03     2.64

d. Explain how you would summarize the results of the ANOVA F test and the confidence intervals to someone who has not studied statistics.

14.43 Comparing therapies for anorexia The Anorexia data file on the text CD shows weight change for 72 anorexic teenage girls who were randomly assigned to one of three psychological treatments.

a. Show how to construct 95% confidence intervals to investigate how the population means differ. Interpret them.

b. Report the Tukey 95% multiple comparison confidence intervals. Interpret, and explain why they are wider than the confidence intervals in part a.

14.44 Lot size varies by region? A geographer compares residential lot sizes in four quadrants of a city. To do this, she randomly samples 300 records from a city file on home residences and records the lot sizes (in thousands of square feet) by quadrant. The ANOVA table, shown in the table that follows, refers to a comparison of mean lot sizes for the northeast (NE), northwest (NW), southwest (SW), and southeast (SE) quadrants of the city. Fill in all the blanks in the table.

Lot sizes by quadrant of city

Source     DF      SS      MS      F       P-value
Quadrant   _____   _____   _____   _____   0.000
Error      296     1480    _____
Total      _____   4180

14.45 House with garage Refer to the House Selling Price OR data file on the text CD.

a. Set up an indicator variable for whether a house has a garage or not. Using software, put this as the sole predictor of house selling price (in thousands) in a regression model. Report the prediction equation, and interpret the intercept and slope estimates.

b. For the model fitted in part a, conduct the t test for the effect of the indicator variable in the regression analysis (that is, test H0: β = 0). Interpret.

c. Use software to conduct the F test for the analysis of variance comparing the mean selling prices of homes with and without a garage. Interpret.

d. Explain the connection between the value of t in part b and the value of F in part c.
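A hedged sketch of the analyses in parts a–d, using a small hypothetical data set in place of the House Selling Price OR file; the column names price (in thousands) and garage (1 = has a garage, 0 = not) are assumptions, not the book's.

# Hedged sketch: indicator-variable regression and the equivalent ANOVA F test.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical stand-in for the data file.
houses = pd.DataFrame({
    "price":  [180, 225, 260, 205, 320, 150, 170, 140, 195, 160],
    "garage": [1,   1,   1,   1,   1,   0,   0,   0,   0,   0],
})

# Part a: regression with the garage indicator as the sole predictor.
fit = smf.ols("price ~ garage", data=houses).fit()
print(fit.params)  # intercept: mean price without a garage; slope: difference in mean prices

# Part b: t test for the indicator's coefficient (H0: beta = 0).
print(fit.tvalues["garage"], fit.pvalues["garage"])

# Part c: the ANOVA F test comparing the two group means.
print(sm.stats.anova_lm(fit, typ=2))

# Part d: with a single indicator, the F statistic equals the square of t.
print(fit.tvalues["garage"] ** 2)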

14.46 Ideal number of kids by gender and race The GSS asks, “What is the ideal number of kids for a family?” When we use a recent GSS to evaluate how the mean response depends on gender and race (black or white), we get the results shown in the ANOVA table.

a. Identify the response variable and the factors.

b. Explain what it would mean if there were no interaction between gender and race in their effects. Show a hypothetical set of population means that would show a strong race effect and a weak gender effect and no interaction.
c. Using the table, specify the null and alternative hypotheses, test statistic, and P-value for the test of no interaction. Interpret the result.

ANOVA of ideal number of kids by gender and race

Source        DF     SS       MS      F       P
Gender        1      0.25     0.25    0.36    0.546
Race          1      16.98    16.98   24.36   0.000
Interaction   1      0.95     0.95    1.36    0.244
Error         1245   867.72   0.70
Total         1248   886.12

14.47 Regress kids on gender and race Refer to the previous exercise. Let f = 1 for females and 0 for males, and let r = 1 for blacks and 0 for whites. The regression model for predicting y = ideal number of kids is ŷ = 2.42 + 0.04f + 0.37r.

a. Interpret the coefficient of f. What is the practical implication of this estimate being so close to 0?

b. Find the estimated mean for each of the four combinations of gender and race.

c. Summarize what you learn about the effects based on the analyses in this and the previous exercise.

14.48 Florida student data For the FL Student Survey data file on the text CD, we use as the response variable sports (the number of weekly hours engaged in sports


and other physical exercise). For the explanatory variables, we use gender and whether the student is a vegetarian. The output shows results of a two-way ANOVA.

a. State the hypotheses for the vegetarian main effect.

b. Show how the F test statistic for part a was obtained from mean squares reported in the ANOVA table.

c. Report and interpret the P-value of the test for the vegetarian main effect.

Source       DF   SS       MS      F      P
vegetarian   1    7.89     7.89    0.56   0.457
gender       1    64.35    64.35   4.59   0.037
Error        57   799.80   14.03
Total        59   878.98
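As a worked check of the relation that part b asks about (added here as a sketch, using only the mean squares shown in the table):

\[
F_{\text{vegetarian}} \;=\; \frac{\text{MS}_{\text{vegetarian}}}{\text{MS}_{\text{error}}} \;=\; \frac{7.89}{14.03} \;\approx\; 0.56 .
\]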

14.49 Regress TV watching on gender and religion When we use a recent GSS and regress y = number of hours per day watching TV on g = gender (1 = male, 0 = female) and religious affiliation (r1 = 1 for Protestant, r2 = 1 for Catholic, r3 = 1 for Jewish, r1 = r2 = r3 = 0 for none or other), we get the prediction equation predicted TVHOURS = 2.65 − 0.186g + 0.666r1 + 0.373r2 − 0.270r3.

a. Interpret the gender effect.

b. Interpret the coefficient of r 1 .

c. State a corresponding model for the population, and indicate which parameters would need to equal zero for the response variable to be independent of religious affiliation, for each gender.

14.50 Income, gender, and education According to the U.S. Census Bureau, as of March 2009, the average earnings of full-time workers was estimated to be $31,666 for females with high school education, $43,493 for males with high school education, $60,293 for white females with a bachelor's degree, and $94,206 for males with a bachelor's degree.

a. Identify the response variable and the two factors.

b. Show these means in a two-way classification of the factors.

c. Compare the differences between males and females for (i) high school graduates and (ii) college graduates. If these are close estimates of the population means, explain why there is interaction. Describe its nature.

14.51 Birth weight, age of mother, and smoking A study on the effects of prenatal exposure to smoke (by J. Nigg and N. Breslau, Journal of the American Academy of Child & Adolescent Psychiatry, vol. 46, 2009, pp. 362–369) indicated that mean birth weight was significantly lower for babies born to mothers who smoked during pregnancy. It also suggested that increasing age of the mother resulted in increased effects. Explain how this suggests interaction between smoking status and age of mother in their effects on birth weight of the child.

14.52 TV watching by gender and race When we use the 2008 GSS to evaluate how the mean number of hours a day watching TV depends on gender and race, we get the results shown in the ANOVA table that follows.

a. Identify the response variable and the factors.

b. From the table, specify the test statistic and P-value for the test of no interaction. Interpret the result.

c. Is there a significant (i) gender effect and (ii) race effect? Explain.

d. The sample means were 2.82 for white females, 2.68 for white males, 4.52 for black females, and 4.19 for black males. Explain how these results are compatible with the results of the tests discussed in part c.

Analysis of Variance for TVHOURS, using Adjusted SS for Tests

Source       DF     Seq SS    Adj SS    Adj MS   F       P
RACE         1      419.60    399.12    399.12   58.32   0.000
SEX          1      8.74      8.74      8.74     1.28    0.259
RACE*SEX     1      1.42      1.42      1.42     0.21    0.649
Error        1198   8199.10   8199.10   6.84
Total        1201   8628.86

S = 2.61610   R-Sq = 4.98%   R-Sq(adj) = 4.74%

14.53 Salary and gender The American Association of University Professors (AAUP) reports yearly on faculty salaries for all types of higher education institutions across the United States. Regard Salary as the response variable, Gender as the explanatory variable, and Academic Rank as the control variable. A regression analysis using these data could include an indicator variable for Gender and an indicator variable for Rank. The estimated coefficients are -13 (thousands of dollars) for Gender (x1 = 1 for female and x1 = 0 for male) and -40 (thousands of dollars) for the lesser, assistant Rank (x2 = 1 for assistant professor and x2 = 0 for professor).
a. Interpret the coefficient for gender.
b. At particular settings of the other predictors, the estimated mean salary for female professors was 96.2 thousand dollars. Using the estimated coefficients, find the estimated means for (i) male professors and (ii) female assistant professors.

14.54 Political ideology by religion and gender The table shown summarizes responses on political ideology (where 1 = extremely liberal and 7 = extremely conservative) in a General Social Survey by religion and gender. The P-value is 0.414 for testing the null hypothesis of no interaction. Explain what this means in the context of this example. ( Hint : Is the difference between males and females in sample mean ideology about the same for each religion?)

Political ideology by religion and gender

                     Political ideology
                  Female               Male
Religion          Mean   Std. Dev.     Mean   Std. Dev.
Protestant        4.35   1.44          4.50   1.39
Catholic          3.97   1.29          4.04   1.34
Jewish            2.86   1.49          3.50   1.16
None              3.25   1.31          3.60   1.42

Concepts and Applications

14.55 Number of friends and degree Using the GSS Web site sda.berkeley.edu/GSS, analyze whether the number of good friends (the variable NUMFRIEND) depends on the subject's highest degree (the variable DEGREE). To do so, at this web address look under the Analysis tab to pick comparison of means and then enter the variable


names and check the ANOVA stats box. Prepare a short report summarizing your analysis and its interpretation.

14.56 Sketch within- and between-groups variability Sketch a dot plot of data for 10 observations in each of three groups such that
a. You believe the P-value would be very small for a one-way ANOVA. (You do not need to do the ANOVA; merely show points for which you think this would happen.)
b. The P-value in one-way ANOVA would not be especially small.

14.57 A ≈ B and B ≈ C, but A ≠ C? In multiple comparisons following a one-way ANOVA with equal sample sizes, the margin of error with a 95% confidence interval for comparing each pair of means equals 10. Give three sample means illustrating that it is possible that Group A is not significantly different from Group B and Group B is not significantly different from Group C, yet Group A is significantly different from Group C.

14.58 Multiple comparison confidence For four groups, explain carefully the difference between a confidence level of 0.95 for a single comparison of two means and a confidence level of 0.95 for a multiple comparison of all six pairs of means.

14.59 Another Simpson paradox The 25 women faculty members in the humanities division of a college have a mean salary of $65,000, whereas the five women faculty in the science division have a mean salary of $72,000. The 20 men in the humanities division have a mean salary of $64,000, and the 30 men in the science division have a mean salary of $71,000.
a. Construct a 2 × 2 table of sample mean incomes for the table of gender and division of college. Use weighted averages to find the overall means for men and women.
b. Discuss how the results of a one-way comparison of mean incomes by gender would differ from the results of a two-way comparison of mean incomes by gender, controlling for division of college. (Note: This reversal of which gender has the higher mean salary, according to whether one controls for division of college, illustrates Simpson's paradox.)

14.60 Multiple choice: ANOVA/regression similarities Analysis of variance and multiple regression have many similarities. Which of the following is not true?
a. The response variable is quantitative for each.
b. They both have F tests for testing that the response variable is statistically independent of the explanatory variable(s).
c. For inferential purposes, they both assume that the response variable y is normally distributed with the same standard deviation at all combinations of levels of the explanatory variable(s).
d. They are both appropriate when the main focus is on describing straight-line effects of quantitative explanatory variables.

14.61 Multiple choice: ANOVA variability One-way ANOVA provides relatively more evidence that H0: µ1 = ⋯ = µg is false:

a. The smaller the between-groups variation and the larger the within-groups variation.

b. The smaller the between-groups variation and the smaller the within-groups variation.

c. The larger the between-groups variation and the smaller the within-groups variation.

d. The larger the between-groups variation and the larger the within-groups variation.

14.62 Multiple choice: Multiple comparisons For four means, it is planned to construct Tukey 95% multiple comparison confidence intervals for the differences between the six pairs.

a. For each confidence interval, there is a 0.95 chance that the interval will contain the true difference.

b. The probability that all six confidence intervals will contain the true differences is 0.70.

c. The probability that all six confidence intervals will contain the true differences is about 0.95.

d. The probability that all six confidence intervals will contain the true differences is (0.95)⁶.

14.63 Multiple choice: Interaction There is interaction in a two-way ANOVA model when

a. The two factors are associated.

b. Both factors have significant effects in the model without interaction terms.

c. The difference in true means between two categories of one factor varies among the categories of the other factor.

d. The mean square for interaction is about the same size as the mean square error.

14.64 True or false: Interaction For subjects aged under 50, there is little difference in mean annual medical expenses for smokers and nonsmokers. For subjects aged over 50, there is a large difference. Is it true or false that there is interaction between smoking status and age in their effects on annual medical expenses?

14.65 What causes large or small F ? An experiment used four groups of five individuals each. The overall sample mean was 60.

a. What did the sample means look like if the one-way ANOVA for comparing the means had test statistic F = 0? (Hint: What would have to happen in order for the between-groups variability to be 0?)

b. What did the data look like in each group if F = infinity? ( Hint : What would have to happen in order for the within-groups variability to be 0?)

14.66 Between-subjects estimate This exercise motivates the formula for the between-subjects estimate of the variance in one-way ANOVA. Suppose each population mean equals µ (that is, H0 is true) and each sample size equals n. Then the sampling distribution of each sample mean ȳi has mean µ and variance σ²/n, and the sample mean of the ȳi values is the overall sample mean, ȳ.
a. Treating the g sample means as g observations having sample mean ȳ, explain why Σ(ȳi − ȳ)²/(g − 1) estimates the variance σ²/n of the distribution of the ȳi values.
b. Using part a, explain why the between-groups estimate Σn(ȳi − ȳ)²/(g − 1) estimates σ². (For the unequal sample size case, the formula replaces n by ni.)
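In symbols, the reasoning that parts a and b ask for can be sketched as follows (an added display, assuming g groups each of size n and H0 true):

\[
\operatorname{var}(\bar{y}_i) = \frac{\sigma^2}{n}
\;\;\Longrightarrow\;\;
\frac{\sum_{i=1}^{g}(\bar{y}_i - \bar{y})^2}{g-1}\ \text{estimates}\ \frac{\sigma^2}{n},
\quad\text{so}\quad
\frac{\sum_{i=1}^{g} n\,(\bar{y}_i - \bar{y})^2}{g-1}\ \text{estimates}\ \sigma^2 .
\]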

14.67 Bonferroni multiple comparisons The Bonferroni theorem states that the probability that at least one of a set of events occurs can be no greater than the sum of the separate probabilities of the events. For instance, if the probability of an error for each of five separate confidence intervals equals 0.01, then the probability that at least one confidence interval will be in error is no greater than (0.01 + 0.01 + 0.01 + 0.01 + 0.01) = 0.05.
a. Following Example 10, construct a confidence interval for each factor and guarantee that they both hold with overall confidence level at least 95%. [Hint: Each interval should use t.0125 = 2.46.]
b. Exercise 14.8 referred to a study comparing three groups (smoking status never, former, or current) on various personality scales. The study measured 35 personality scales and reported an F test comparing the three smoking groups for each scale. The researchers mentioned doing a Bonferroni correction for the 35 F tests. If the nominal overall probability of Type I error was 0.05 for the 35 tests, how small did the P-value have to be for a particular test to be significant? (Hint: What should the Type I error probability be for each of 35 tests in order for the overall Type I error probability to be no more than 0.05?)
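As a compact restatement of the bound described in this exercise (an added sketch, restating only the numbers already given above):

\[
P\!\left(E_1 \cup E_2 \cup \cdots \cup E_k\right) \;\le\; P(E_1) + P(E_2) + \cdots + P(E_k),
\]
\[
\text{so with } k = 5 \text{ and each } P(E_i) = 0.01:\quad
P(\text{at least one interval in error}) \;\le\; 5(0.01) = 0.05 .
\]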

14.68 Independent confidence intervals You plan to construct a 95% confidence interval in five different situations with independent data sets.

a. Assuming that the results of the confidence intervals are statistically independent, find the probability that all five confidence intervals will contain the parameters they are designed to estimate. (Hint: Use the binomial distribution.)
b. Which confidence level should you use for each confidence interval so that the probability that all five intervals contain the parameters equals exactly 0.95?

14.69 Regression or ANOVA? You want to analyze y = house selling price and x = number of bathrooms (1, 2, or 3) by testing whether x and y are independent.

a. You could conduct a test of independence using (i) the ANOVA F test for a multiple regression model with two indicator variables or (ii) a regression t test for the coefficient of the number of bathrooms when it is treated as a quantitative predictor in a straight-line regression model. Explain the difference between these two ways of treating the number of bathrooms in the analysis.

b. What do you think are the advantages and disadvantages of the straight-line regression approach to conducting the test?
c. Give an example of three population means for which the straight-line regression model would be less appropriate than the model with indicator variables.

14.70 Three factors An experiment analyzed how the mean corn yield varied according to three factors: nitrogen-based fertilizer, phosphate-based fertilizer, and potash (potassium chloride)-based fertilizer, each applied at low and at high levels.

a. How many groups result from the different combinations of the three factors?

b. Defining indicator variables, state the regression model that corresponds to an ANOVA assuming a lack of interaction.

c. Give possible estimates of the model parameters in part b for which the estimated corn yield would be highest at the high level of nitrogen fertilizer and phosphate fertilizer and the low level of potash.

Student Activities

14.71 Student survey data Refer to the student survey data file that your class created with Activity 3 in Chapter 1 . For variables chosen by your instructor, use ANOVA methods and related inferential statistical analyses. Interpret and summarize your findings in a short report, and prepare to discuss your findings in class.
