Comparing means of multiple groups - ANOVA (solutions to ...


Chapter 8

Comparing means of multiple groups- ANOVA (solutions to exercise)


Contents

8 Comparing means of multiple groups - ANOVA (solutions to exercise)
8.1 Environment action plans
8.2 Environment action plans (part 2)
8.3 Plastic film
8.4 Brass alloys
8.5 Plastic tubes
8.6 Joining methods
8.7 Remoulade
8.8 Transport times


8.1 Environment action plans

Exercise 8.1 Environment action plans

To investigate the effect of two recent national Danish aquatic environment action plans, the concentration of nitrogen (measured in g/m3) was measured in a particular river just before the national action plans were enforced (1998 and 2003) and in 2011. Each measurement is repeated 6 times during a short stretch of river. The results are shown in the following table:

        N1998    N2003    N2011
        5.01     5.59     3.02
        6.23     5.13     4.76
        5.98     5.33     3.46
        5.31     4.65     4.12
        5.13     5.52     4.51
        5.65     4.92     4.42

Mean    5.5517   5.1900   4.0483

Further, the total variation in the data is SST = 11.4944. You got the following output from R corresponding to a one-way analysis of variance (where most of the information, however, is replaced by the letters A-E as well as U and V):

> anova(lm(N ~ Year))
Analysis of Variance Table

Response: N
          Df  Sum Sq  Mean Sq  F value  Pr(>F)
Year       A       B        C        U       V
Residuals  D  4.1060        E

a) What numbers did the letters A-D substitute?


Solution

One should check the structure of the one-way ANOVA table: A and D are the degrees of freedom, A = k − 1 = 3 − 1 = 2 and D = n − k = 18 − 3 = 15. B is the treatment sum-of-squares

SS(Tr) = SST − SSE = 11.4944 − 4.1060 = 7.3884,

and finally, C is the MS(Tr)-value

MS(Tr) = SS(Tr)/2 = 7.3884/2 = 3.6942.
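As a cross-check, the missing entries can be reproduced in R from the numbers given in the exercise alone. This is only a minimal sketch: the variable names (k, n, SST, SSE and the letters themselves) are chosen here for illustration and are not part of the exercise.

```r
# Known quantities from the exercise text (variable names are illustrative)
k   <- 3        # number of groups (years)
n   <- 18       # total number of observations (3 groups of 6)
SST <- 11.4944  # total sum of squares, given in the exercise
SSE <- 4.1060   # residual sum of squares, read from the R output

A <- k - 1      # treatment degrees of freedom
D <- n - k      # residual degrees of freedom
B <- SST - SSE  # treatment sum of squares SS(Tr)
C <- B / A      # treatment mean square MS(Tr)
E <- SSE / D    # residual mean square MSE (needed in the later questions)

round(c(A = A, D = D, B = B, C = C, E = E), 4)
```

The value E = MSE is not asked for in this question, but it is the denominator of the F-statistic and the variance estimate used in the confidence interval later in the exercise.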

b) If you use the significance level α = 0.05, what critical value should be used for the hypothesis test carried out in the analysis (and in the table illustrated with the figures U and V)?

Solution

The relevant distribution for testing effects in ANOVA is the F-distribution, here with degrees of freedom k − 1 = 2 and n − k = 15. So the critical value is the 0.95 quantile,

F0.95(2, 15) = 3.682,

found in R as:

qf(p=0.95, df1=2, df2=15)

[1] 3.682

c) Can you with these data demonstrate statistically significant (at significance level α = 0.05) differences in N-mean values from year to year (both conclusion and argument must be valid)?


Solution

U is the F-statistic

$$F = \frac{C}{E} = \frac{(11.4944 - 4.1060)/2}{4.1060/15} = 13.496,$$

and V is the p-value (using the F(2, 15)-distribution)

P(F > 13.496) = 0.00044,

or in R:

1-pf(q=13.496, df1=2, df2=15)

[1] 0.0004435

So, the answer is, yes, as the number V is less than 0.05.

d) Compute the 90% confidence interval for the single mean difference between year 2011 and year 1998.

Solution

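We use the formula for a single pre-planned pairwise post hoc confidence interval. For a 90% interval, the 0.95 quantile of the t-distribution with n − k = 15 degrees of freedom is needed:

$$4.0483 - 5.5517 \pm t_{0.95}(15) \sqrt{MSE \cdot (1/6 + 1/6)},$$

that is,

$$-1.50 \pm 1.753 \sqrt{4.1060/15 \cdot (1/3)}.$$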

In R:

-1.50+c(-1,1)* 1.753*sqrt(4.1060/15*(1/3))

[1] -2.0295 -0.9705
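The same interval can be computed without typing in the rounded quantile 1.753 by hand. The following is a small sketch under the assumption that only the summary numbers from the exercise are available; the names diff_means, mse, half_width and ci are illustrative.

```r
# Summary numbers from the exercise (object names are illustrative)
diff_means <- 4.0483 - 5.5517   # mean 2011 minus mean 1998
mse        <- 4.1060 / 15       # MSE = SSE / (n - k)

# 90% interval: 5% in each tail, so the 0.95 quantile is used
half_width <- qt(0.95, df = 15) * sqrt(mse * (1/6 + 1/6))
ci <- diff_means + c(-1, 1) * half_width
ci
```

Since the whole interval lies below zero, the pre-planned comparison points to a lower mean nitrogen concentration in 2011 than in 1998.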


8.2 Environment action plans (part 2)

Exercise 8.2 Environment action plans (part 2)

This exercise uses the same data as the previous exercise, but let us repeat the description here. To investigate the effect of two recent national Danish aquatic environment action plans, the concentration of nitrogen (measured in g/m3) was measured in a particular river just before the national action plans were enforced (1998 and 2003) and in 2011. Each measurement is repeated 6 times during a short stretch of river. The results are shown in the following table, where we have now also added the variance computed within each group.

            N1998       N2003       N2011
            5.01        5.59        3.02
            6.23        5.13        4.76
            5.98        5.33        3.46
            5.31        4.65        4.12
            5.13        5.52        4.51
            5.65        4.92        4.42

Mean        5.5517      5.1900      4.0483
Variance    0.2365767   0.1313200   0.4532967

The data can be read into R and the means and variances computed by the following in R:

nitrogen <- c(5.01, 5.59, 3.02,
              6.23, 5.13, 4.76,
              5.98, 5.33, 3.46,
              5.31, 4.65, 4.12,
              5.13, 5.52, 4.51,
              5.65, 4.92, 4.42)
year <- factor(rep(c("1998", "2003", "2011"), 6))
tapply(nitrogen, year, mean)

 1998  2003  2011
5.552 5.190 4.048

tapply(nitrogen, year, var)

  1998   2003   2011
0.2366 0.1313 0.4533


mean(nitrogen)

[1] 4.93

a) Compute the three sums of squares (SST, SS(Tr) and SSE) using the three means and three variances, and the overall mean (show the formulas explicitly).

Solution

The treatment sum-of-squares SS(Tr) can (Theorem 8.2, Equation 8-6) be computed from the three means as

$$SS(Tr) = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2 = 6 \cdot (5.551667 - 4.93)^2 + 6 \cdot (5.190000 - 4.93)^2 + 6 \cdot (4.048333 - 4.93)^2 = 7.388439.$$

The residual error sum-of-squares SSE (see Theorem 8.2) is defined by

$$SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2.$$

So we see that the inner sum is (n_i − 1) times the sample variance of group i (using Equation 8-15 in Theorem 8.4):

$$SSE = \sum_{i=1}^{k} (n_i - 1) s_i^2,$$

and now we can insert the values

$$SSE = 5 s_1^2 + 5 s_2^2 + 5 s_3^2 = 5 \cdot 0.2365767 + 5 \cdot 0.1313200 + 5 \cdot 0.4532967 = 4.105967.$$

Finally, then (not that we would need this in real data analysis when we have the other two)

SST = SS(Tr) + SSE = 7.388439 + 4.105967 = 11.49441.
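The three formulas can be evaluated directly from the group summaries. A minimal sketch, assuming only the means and variances from the table are available; the object names n_i, means, vars and grand_mean are illustrative.

```r
# Group summaries from the table (object names are illustrative)
n_i   <- c(6, 6, 6)
means <- c(5.551667, 5.190000, 4.048333)
vars  <- c(0.2365767, 0.1313200, 0.4532967)

grand_mean <- sum(n_i * means) / sum(n_i)    # weighted overall mean
SS_Tr <- sum(n_i * (means - grand_mean)^2)   # between-group variation
SSE   <- sum((n_i - 1) * vars)               # within-group variation
SST   <- SS_Tr + SSE                         # total variation

c(SS_Tr = SS_Tr, SSE = SSE, SST = SST)
```

Writing the overall mean as a weighted average of the group means keeps the sketch valid for unequal group sizes as well, although all groups have 6 observations here.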

b) Find the SST-value in R using the sample variance function var.


Solution

The SST-value is "almost" just the variance of the observations ignoring the group information, or rather, it is the numerator of this variance calculation, so n − 1 = 17 times the variance will be SST (cf. Theorem 8.4):

D <- data.frame(nitrogen=c(5.01, 5.59, 3.02,
                           6.23, 5.13, 4.76,
                           5.98, 5.33, 3.46,
                           5.31, 4.65, 4.12,
                           5.13, 5.52, 4.51,
                           5.65, 4.92, 4.42),
                year=factor(rep(c("1998", "2003", "2011"), 6)))
tapply(D$nitrogen, D$year, mean)

 1998  2003  2011
5.552 5.190 4.048

tapply(D$nitrogen, D$year, var)

  1998   2003   2011
0.2366 0.1313 0.4533

mean(D$nitrogen)

[1] 4.93

17 * var(D$nitrogen)

[1] 11.49

c) Run the ANOVA in R and produce the ANOVA table in R.


Solution

It may be done as follows:

fit <- lm(nitrogen ~ year, data=D)
anova(fit)

Analysis of Variance Table

Response: nitrogen
          Df Sum Sq Mean Sq F value  Pr(>F)
year       2   7.39    3.69    13.5 0.00044 ***
Residuals 15   4.11    0.27
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

d) Do a complete post hoc analysis, where all the 3 years are compared pairwise.

Solution

We want to construct the M = 3 · 2/2 = 3 different confidence intervals using Method 8.9. As all n_i equal 6 in this case, all 3 confidence intervals will have the same width, and we can use Remark 8.13 and compute the (half) width of the confidence intervals, the LSD-value. Since there are 3 multiple comparisons, we will use α_Bonferroni = 0.05/3 = 0.01667:

$$LSD_{0.01667} = t_{1-(0.05/3)/2} \cdot \sqrt{2 \cdot 0.2737/6} = 0.8136.$$

LSD_0.01667 <- qt(1-(0.05/3)/2, 15) * sqrt(2*0.2737/6)
LSD_0.01667

[1] 0.8136

So, if we again study the three group means, we can see that the nitrogen level in 2011 is significantly smaller than in 2003 and 1998, whereas the levels in 1998 and 2003 are not significantly different.
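The comparison of each pairwise difference with the LSD-value can be made explicit in R. This is a sketch only: the group means are typed in from the table, and the names means, LSD, pair_names and diffs are illustrative.

```r
# Group means from the table and the Bonferroni-corrected LSD from above
means <- c("1998" = 5.5517, "2003" = 5.1900, "2011" = 4.0483)
LSD   <- qt(1 - (0.05/3)/2, df = 15) * sqrt(2 * 0.2737 / 6)

# Absolute differences for all 3 pairs of years
pair_names <- combn(names(means), 2)
diffs <- apply(pair_names, 2, function(p) abs(means[p[1]] - means[p[2]]))
names(diffs) <- apply(pair_names, 2, paste, collapse = " vs ")

# A pair is declared significantly different when its difference exceeds LSD
data.frame(difference = diffs, significant = diffs > LSD)
```

The differences are 0.36 (1998 vs 2003), 1.50 (1998 vs 2011) and 1.14 (2003 vs 2011), so only the two comparisons involving 2011 exceed the LSD-value.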


Solution

The differences could also be shown in the following plot, where the compact letter display (see page 368 of Chapter 8) has been added to the box plot:

# x-axis: year (a factor, so a box plot is drawn), y-axis: nitrogen concentration
plot(D$year, D$nitrogen, xlab="Year", ylab="Nitrogen")
text(1:3, c(5.7, 5.4, 4), c("a", "a", "b"), cex=2)

[Figure: box plot of the nitrogen measurements for 1998, 2003 and 2011, with the compact letter display a, a, b added to the three groups]

Hence, the groups are sorted from largest sample mean to lowest, and then the groups (here years) which are not significantly different share letters.

e) Use R to do model validation by residual analysis.

Solution

The box plot does not indicate clear variance differences (although it can be a bit difficult to know exactly how different such patterns should be for it to be a problem). Let us check for normality by doing a normal q-q plot of the residuals:

qqnorm(fit$residuals)
qqline(fit$residuals)

[Figure: Normal Q-Q Plot of the residuals; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles]

There appears to be no important deviation from normality. For more detailed investigations, see 8.17.


8.3 Plastic film

Exercise 8.3 Plastic film

A company is starting a production of a new type of patch. For the product a thin plastic film is to be used. Samples of products were received from 5 possible suppliers. Each sample consisted of 20 measurements of the film thickness and the following data were found:

             Average film thickness   Sample standard deviation
             x in µm                  s in µm
Supplier 1   31.4                     1.9
Supplier 2   30.6                     1.6
Supplier 3   30.5                     2.2
Supplier 4   31.3                     1.8
Supplier 5   29.2                     2.2

From the usual calculations for a one-way analysis of variance the following is obtained:

Source     Degrees of freedom   Sums of Squares
Supplier   4                    SS(Tr) = 62
Error      95                   SSE = 362.71

a) Is there a significant (α = 5%) difference between the mean film thicknesses for the suppliers (both conclusion and argument must be correct)?

Solution

The F-test statistic for one-way ANOVA is

$$F_{obs} = \frac{MS(Tr)}{MSE} = \frac{SS(Tr)/(k-1)}{SSE/(n-k)} = \frac{62/4}{362.71/95} = 4.06,$$

and the relevant critical value is F0.95(4, 95), to be found in R by: qf(p=0.95, df1=4, df2=95). So the answer is:

Yes, the null hypothesis is rejected, since Fobs = 4.06 is larger than the critical value 2.47.

Or we could find the p-value:

1-pf(4.06, 4, 95)

[1] 0.004406

and conclude that, since it is so small, we have strong evidence against the null hypothesis.
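The two sums of squares given in the exercise can themselves be traced back to the summary table, since all groups have the same size. The sketch below assumes only the averages and standard deviations from the table; the names xbar, s, m, SS_Tr, SSE, F_obs and p_val are illustrative.

```r
# Summary statistics per supplier (object names are illustrative)
xbar <- c(31.4, 30.6, 30.5, 31.3, 29.2)  # average film thickness
s    <- c(1.9, 1.6, 2.2, 1.8, 2.2)       # sample standard deviations
m    <- 20                               # measurements per supplier
k    <- length(xbar)                     # number of suppliers
n    <- k * m                            # total number of measurements

# With equal group sizes the overall mean is the mean of the group means
SS_Tr <- m * sum((xbar - mean(xbar))^2)  # reproduces SS(Tr) = 62
SSE   <- (m - 1) * sum(s^2)              # reproduces SSE = 362.71

F_obs <- (SS_Tr / (k - 1)) / (SSE / (n - k))
p_val <- 1 - pf(F_obs, df1 = k - 1, df2 = n - k)
c(SS_Tr = SS_Tr, SSE = SSE, F_obs = F_obs, p_val = p_val)
```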

b) Compute a 95% confidence interval for the difference in mean film thicknesses of Supplier 1 and Supplier 4 (considered as a "single pre-planned" comparison).

Solution

The "ANOVA post hoc" confidence interval is to be used

$$31.4 - 31.3 \pm t_{0.975}(95) \sqrt{MSE \left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$

where MSE = SSE/(n − k) = 362.71/95. Since t0.975(95) = 1.985 (to be found in R as qt(0.975, 95)), it becomes

$$0.1 \pm 1.985 \sqrt{\frac{362.71}{95} \left(\frac{1}{20} + \frac{1}{20}\right)}.$$

0.1 + c(-1,1) * qt(0.975, 95) * sqrt(362.71/(95*10))

[1] -1.127 1.327

So the answer is

$$0.1 \pm 1.985 \sqrt{\frac{362.71}{95} \left(\frac{1}{10}\right)} = [-1.13, 1.33].$$


8.4 Brass alloys

Exercise 8.4 Brass alloys

When brass is used in a production, the modulus of elasticity, E, of the material is often important for the functionality. The modulus of elasticity is measured for 6 different brass alloys. 5 samples from each alloy are tested. The results are shown in the table below, where the measured modulus of elasticity is given in GPa:

Brass alloys
M1      M2      M3      M4      M5      M6
82.5    82.7    92.2    96.5    88.9    75.6
83.7    81.9    106.8   93.8    89.2    78.1
80.9    78.9    104.6   92.1    94.2    92.2
95.2    83.6    94.5    87.4    91.4    87.3
80.8    78.6    100.7   89.6    90.1    83.8

In an R-run for one-way analysis of variance:

anova( lm(elasmodul ~ alloy) )

the following output is obtained (however, some of the values have been substituted by the symbols A, B, and C):

> anova( lm(elasmodul ~ alloy) )
Analysis of Variance Table

Response: elasmodul
          Df  Sum Sq Mean Sq F value    Pr(>F)
alloy      A 1192.51 238.501  9.9446 3.007e-05
Residuals  B       C  23.983

a) What are the values of A, B, and C?


Solution

A and B are the degrees of freedom, which in the one-way ANOVA are k − 1 and n − k, where k = 6 is the number of groups and n = 30 is the number of observations. C can be found by

C = SSE = MSE · (n − k) = 23.983 · 24 = 575.59.

So the answer is:

A = 5, B = 24 and C = 575.59.
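The answer can be checked by entering the data from the table and rerunning the analysis. The exercise does not show how elasmodul and alloy were created, so the construction below is an assumption made for illustration (one block of 5 values per alloy, labelled M1 to M6).

```r
# Data from the table, entered alloy by alloy (construction is an assumption)
elasmodul <- c(82.5, 83.7, 80.9, 95.2, 80.8,     # M1
               82.7, 81.9, 78.9, 83.6, 78.6,     # M2
               92.2, 106.8, 104.6, 94.5, 100.7,  # M3
               96.5, 93.8, 92.1, 87.4, 89.6,     # M4
               88.9, 89.2, 94.2, 91.4, 90.1,     # M5
               75.6, 78.1, 92.2, 87.3, 83.8)     # M6
alloy <- factor(rep(paste0("M", 1:6), each = 5))

tab <- anova(lm(elasmodul ~ alloy))
tab$Df            # A and B: degrees of freedom
tab$`Sum Sq`[2]   # C: residual sum of squares SSE

# Group means, used for the confidence interval in question c)
tapply(elasmodul, alloy, mean)
```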

b) The assumptions for using the one-way analysis of variance are (choose the answer that lists all the assumptions and that does NOT list any unnecessary assumptions):

1 ) The data must be normally and independently distributed within each group and the variances within each group should not differ significantly from each other

2 ) The data must be normally and independently distributed within each group

3 ) The data must be normally and independently distributed and have approximately the same mean and variance within each group

4 ) The data should not be too large or too small

5 ) The data must be normally and independently distributed within each group and have approximately the same IQR-value in each group

Solution

It is difficult to make a lot of arguments here, but simply emphasize that only in Answer 1 are all assumptions needed, and no unnecessary assumptions, listed.

c) Compute a 95% confidence interval for the single pre-planned difference between brass alloy 1 and 2.


Solution

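A pre-planned post hoc 95% confidence interval between two groups in a one-way ANOVA is

$$\bar{y}_1 - \bar{y}_2 \pm t_{0.975} \sqrt{MSE \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.$$

So we have to compute the means of the M1 and M2 groups,

$$\bar{y}_1 = 84.62, \quad \bar{y}_2 = 81.14,$$

whose difference is 3.48, and then plug in MSE = 23.983 and n1 = n2 = 5, using the t-distribution with n − k = 24 degrees of freedom:

3.48 + c(-1,1) * qt(0.975, 24) * sqrt(23.983*2/5)

[1] -2.912 9.872

So the answer is:

$$3.48 \pm t_{0.975} \sqrt{23.983 \left(\frac{2}{5}\right)} = [-2.91, 9.87].$$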


8.5 Plastic tubes

Exercise 8.5 Plastic tubes

Some plastic tubes for which the tensile strength is essential are to be produced. Hence, sample tube items are produced and tested, where the tensile strength is determined. Two different granules and four possible suppliers are used in the trial. The measurement results (in MPa) from the trial are listed in the table below:

             Granule
             g1     g2
Supplier a   34.2   33.1
Supplier b   34.8   31.2
Supplier c   31.3   30.2
Supplier d   31.9   31.6

The following is run in R:

D <- data.frame(
  strength=c(34.2,34.8,31.3,31.9,33.1,31.2,30.2,31.6),
  supplier=factor(c("a","b","c","d","a","b","c","d")),
  granule=factor(c(1,1,1,1,2,2,2,2))
)
anova(lm( strength ~ supplier + granule, data=D))

with the following result:


Analysis of Variance Table

Response: strength
          Df  Sum Sq Mean Sq F value Pr(>F)
supplier   3 10.0338  3.3446  3.2537 0.1792
granule    1  4.6512  4.6512  4.5249 0.1233
Residuals  3  3.0837  1.0279

a) Which distribution has been used to find the p-value 0.1792?

Solution

The p-value is from the F-test from a two-way ANOVA using the F(3, 3)-distribution.

Hence the correct answer is:

The F-distribution with the degrees of freedom ν1 = 3 and ν2 = 3.
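To see that the F(3, 3)-distribution is indeed the one behind the p-value, the tail probability can be computed by hand and compared with the table. A small sketch using the data frame from the exercise; the names tab, F_supplier and p_supplier are illustrative.

```r
# Data and model exactly as in the exercise
D <- data.frame(
  strength = c(34.2, 34.8, 31.3, 31.9, 33.1, 31.2, 30.2, 31.6),
  supplier = factor(c("a", "b", "c", "d", "a", "b", "c", "d")),
  granule  = factor(c(1, 1, 1, 1, 2, 2, 2, 2))
)
tab <- anova(lm(strength ~ supplier + granule, data = D))

# Upper tail of the F-distribution with (3, 3) degrees of freedom
F_supplier <- tab["supplier", "F value"]
p_supplier <- 1 - pf(F_supplier, df1 = 3, df2 = 3)

c(by_hand = p_supplier, from_table = tab["supplier", "Pr(>F)"])
```

The first degrees of freedom value comes from the supplier row (4 − 1 = 3) and the second from the residual row ((4 − 1)(2 − 1) = 3).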

b) What is the most correct conclusion based on the analysis among the following options (use α = 0.05)?

1 ) A significant difference has been found between the variances from the analysis of variance

2 ) A significant difference has been found between the means for the 2 granules but not for the 4 suppliers

3 ) No significant difference has been found between the means for either the 4 suppliers or the 2 granules

4 ) A significant difference has been found between the means for both the 4 suppliers and the 2 granules

5 ) A significant difference has been found between the means for the 4 suppliers but not for the 2 granules


Solution

Since both of the p-values are larger than 0.05, neither of the two usual hypothesis tests (of no group difference) is significant. So the correct answer is:

3 ) No significant difference has been found between the means for either the 4 suppliers or the 2 granules


8.6 Joining methods

Exercise 8.6 Joining methods

To compare alternative joining methods and materials a series of experiments are now performed where three different joining methods and four different choices of materials are compared.

Data from the experiment are shown in the table below:

                  Material                  Row
Joining methods   1     2     3     4       average
A                 242   214   254   248     239.50
B                 248   214   248   247     239.25
C                 236   211   245   243     233.75
Column average    242   213   249   246

In an R-run for two-way analysis of variance:

Strength <- c(242,214,254,248,248,214,248,247,236,211,245,243)
Joiningmethod <- factor(c("A","A","A","A","B","B","B","B","C","C","C","C"))
Material <- factor(c(1,2,3,4,1,2,3,4,1,2,3,4))
anova(lm(Strength ~ Joiningmethod + Material))

the following output is generated (where some of the values are replaced by the symbols A, B, C, D, E and F):

Analysis of Variance Table

Response: Strength
              Df Sum Sq Mean Sq F value    Pr(>F)
Joiningmethod  A   84.5       B       C   0.05041 .
Material       D      E  825.00       F 1.637e-05 ***
Residuals      6   49.5    8.25
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

a) What are the values for A, B and C?


Solution

A is the degrees of freedom for Joiningmethod, which is the number of groups/levels minus 1: A = 2 (and then actually the question can already be answered). But also:

B = MS(Joiningmethod) = 84.5/2 = 42.25,

and

$$C = \frac{MS(Joiningmethod)}{MSE} = \frac{42.25}{8.25} = 5.12.$$

So the answer becomes:

A = 2, B = 42.25 and C = 5.12.

b) What are the conclusions concerning the importance of the two factors in the experiment (using the usual level α = 5%)?

Solution

We can read off the answer from the two p-values given in the output: one of them is below α (the Material p-value) and one is NOT (the Joiningmethod p-value).

So the answer is:

Significant differences between materials can be concluded, but not between joining methods.

c) Do post hoc analysis for both the Materials and the Joining methods (confidence intervals for pairwise differences and/or hypothesis tests for those differences).


Solution

First we find the treatment and block means (and we print the ANOVA table):

options(digits=2)
Strength <- c(242,214,254,248,248,214,248,247,236,211,245,243)
Joiningmethod <- factor(c("A","A","A","A","B","B","B","B","C","C","C","C"))
Material <- factor(c(1,2,3,4,1,2,3,4,1,2,3,4))

fit <- lm(Strength ~ Joiningmethod + Material)
anova(fit)

Analysis of Variance Table

Response: Strength
              Df Sum Sq Mean Sq F value   Pr(>F)
Joiningmethod  2     85      42    5.12     0.05 .
Material       3   2475     825  100.00 0.000016 ***
Residuals      6     49       8
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

tapply(Strength, Joiningmethod, mean)

  A   B   C
240 239 234

tapply(Strength, Material, mean)

  1   2   3   4
242 213 249 246

We can find the 0.05/3 (Bonferroni-corrected) LSD-value from the two-way version of Remark 8.13 (see Section 8.3.3) for the comparison of the 3 Joiningmethods:

LSD_bonf <- qt(1-0.05/6, 6) * sqrt(2*8.25/4)
LSD_bonf

[1] 6.7

We see that none of the three Joining methods are significantly different from each other (although close), which matches fine with the p-value just above 0.05.


And we then do the same for the 4 Materials (that is, 6 pairwise comparisons): we can find the 0.05/6 (Bonferroni-corrected) LSD-value from the two-way version of Remark 8.13 for the comparison of the 4 Materials:

LSD_bonf <- qt(1-0.05/12, 6) * sqrt(2*8.25/3)
LSD_bonf

[1] 9.1
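For both factors, the pairwise differences between the means can be held up against the corresponding LSD-value in one step. The following is a sketch only, with the means typed in from the data table; the names method_means, material_means, sig_method and sig_material are illustrative.

```r
# Row and column averages from the data table (object names are illustrative)
method_means   <- c(A = 239.50, B = 239.25, C = 233.75)
material_means <- c("1" = 242, "2" = 213, "3" = 249, "4" = 246)
MSE    <- 8.25
df_res <- 6

# Bonferroni-corrected LSD values: 3 and 6 pairwise comparisons
LSD_method   <- qt(1 - 0.05/6,  df_res) * sqrt(2 * MSE / 4)
LSD_material <- qt(1 - 0.05/12, df_res) * sqrt(2 * MSE / 3)

# Matrices of absolute pairwise differences compared with the LSD values;
# TRUE marks a pair of levels that is declared significantly different
sig_method   <- abs(outer(method_means, method_means, "-")) > LSD_method
sig_material <- abs(outer(material_means, material_means, "-")) > LSD_material
sig_method
sig_material
```

The largest difference between joining methods is 239.50 − 233.75 = 5.75, just below the LSD-value 6.7, whereas Material 2 differs from the other materials by 29 or more, well above 9.1.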

Solution

So we see that Material 2 is significantly smaller than each of the other three, but none of these 3 are significantly different from each other:

plot(Strength ~ Material)
text(1:4, c(242, 213, 249, 246), c("a", "b", "a", "a"), cex=2, col=2)

[Figure: box plot of Strength for each Material (1 to 4), with the compact letter display a, b, a, a added to the four groups]

d) Do residual analysis to check for the assumptions of the model:

1. Normality

2. Variance homogeneity


Solution

First the residual normality plot:

qqnorm(fit$residuals)
qqline(fit$residuals)

[Figure: Normal Q-Q Plot of the residuals; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles]

Solution

Then the investigation of variance homogeneity:

plot(Joiningmethod, fit$residuals)

[Figure: box plot of the residuals for each joining method (A, B, C)]


plot(Material, fit$residuals)

[Figure: box plot of the residuals for each material (1 to 4)]

There may be some indications of lower variability within Materials 2 and 4 compared to 1 and 3. (We do not, however, have the methodology, i.e. a test of difference in variance, in the course to deal with this.)


8.7 Remoulade

Exercise 8.7 Remoulade

A supermarket has just opened a delicacy department wanting to make its own homemade "remoulade" (a Danish delicacy consisting of a certain mixture of pickles and dressing). In order to find the best recipe a taste experiment was conducted. 4 different kinds of dressing and 3 different types of pickles were used in the test. Taste evaluation of the individual "remoulade" versions was carried out on a continuous scale from 0 to 5.

The following measurement data were found:

                 Dressing type                  Row
Pickles type     A      B      C      D         average
I                4.0    3.0    3.8    2.4       3.30
II               4.3    3.1    3.3    1.9       3.15
III              3.9    2.3    3.0    2.4       2.90
Column average   4.06   2.80   3.36   2.23

In an R-run for two-way ANOVA:

anova(lm(Taste ~ Pickles + Dressing))

the following output is obtained (where some of the values have been substituted by the symbols A, B, C, D, E and F):

anova(lm(Taste ~ Pickles + Dressing))
Analysis of Variance Table

Response: Taste
          Df Sum Sq Mean Sq F value   Pr(>F)
Pickles    A 0.3267 0.16333       E 0.287133
Dressing   B 5.5367 1.84556       F 0.002273
Residuals  C      D 0.10556

a) What are the values of A, B, and C?


Solution

From the general definition of the two-way ANOVA table (see page 356 of Chapter 8) the degrees of freedom are k − 1, l − 1 and (k − 1)(l − 1), where k = 3 is the number of rows and l = 4 is the number of columns.

So the answer is:

A=2, B=3 and C=6.

b) What are the values of D, E, and F?

Solution

E and F are the observed F-statistics, which are:

$$F_{obs,Pickles} = \frac{MS_{Pickles}}{MSE} = \frac{0.16333}{0.10556} = 1.547$$

$$F_{obs,Dressing} = \frac{MS_{Dressing}}{MSE} = \frac{1.84556}{0.10556} = 17.48$$

Actually, only one answer option has these two values. D = SSE could be found from the total sum of squares, using the overall mean 37.4/12 = 3.117,

$$SST = \sum_{i=1}^{3} \sum_{j=1}^{4} (y_{ij} - 3.117)^2,$$

and then D = SSE = SST − 0.3267 − 5.5367.

Or more easily, using the residual degrees of freedom ((k − 1)(l − 1) = 6):

D = SSE = 6 · MSE = 6 · 0.10556 = 0.633.

In any case, the answer is:

D = 0.633, E = 1.55 and F = 17.48
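The full table can be reproduced by entering the 12 taste scores. The exercise does not show how Taste, Pickles and Dressing were constructed, so the layout below (row by row from the data table) is an assumption made for illustration.

```r
# Taste scores entered row by row (pickles type), columns are dressing A-D;
# this construction of the variables is an assumption
Taste <- c(4.0, 3.0, 3.8, 2.4,   # Pickles I
           4.3, 3.1, 3.3, 1.9,   # Pickles II
           3.9, 2.3, 3.0, 2.4)   # Pickles III
Pickles  <- factor(rep(c("I", "II", "III"), each = 4))
Dressing <- factor(rep(c("A", "B", "C", "D"), times = 3))

tab <- anova(lm(Taste ~ Pickles + Dressing))
tab

# Total sum of squares around the overall mean, as in the SST formula
SST <- sum((Taste - mean(Taste))^2)
SST - sum(tab$`Sum Sq`[1:2])     # D = SSE obtained by subtraction
```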

c) With a test level of α = 5%, what is the conclusion of the tests?


Solution

We look at the p-values in the ANOVA table, and observe that the Dressing p-value is BELOW 0.05 and the Pickles p-value is ABOVE 0.05, and hence the answer is:

Only the choice of the dressing type has a significant influence on the taste.


8.8 Transport times

Exercise 8.8 Transport times

In a study the transport delivery times for three transport firms are compared, and the study also involves the size of the transported item. For delivery times in days, the following data were found:

                 The size of the item              Row
                 Small   Intermediate   Large      average
Firm A           1.4     2.5            2.1        2.00
Firm B           0.8     1.8            1.9        1.50
Firm C           1.6     2.0            2.4        2.00
Column average   1.27    2.10           2.13

The following was run in R:

anova(lm(Time ~ Firm + Itemsize))

and the following output was obtained (wherein some of the values, however, have been replaced by the symbols A, B, C and D):

Analysis of Variance Table

Response: Time
          Df  Sum Sq Mean Sq F value  Pr(>F)
Firm       2       A       B  4.2857 0.10124
Itemsize   2 1.44667       C       D 0.01929
Residuals  4 0.23333 0.05833

a) What is A, B, C and D?

Solution

We have a two-way ANOVA situation. The definitions of the terms in the given ANOVA table can all be found in Section 8.3:

$$C = \frac{SS(Bl)}{DF(Bl)} = \frac{1.44667}{2} = 0.723,$$

$$D = \frac{MS(Bl)}{MSE} = \frac{C}{MSE} = \frac{0.723}{0.05833} = 12.4.$$

And B is given easily by A:

$$B = \frac{SS(Tr)}{DF(Tr)} = \frac{A}{2},$$

where DF(Tr) denotes the degrees of freedom for the treatment Firm. We just need to find A = SS(Tr). We could use the defining formula, and that the overall average is $\bar{y}_{\cdot\cdot} = 5.5/3 = 1.83$:

A = SS(Tr) = 3 · (2 − 1.83)^2 + 3 · (1.5 − 1.83)^2 + 3 · (2 − 1.83)^2 = 0.5,

but more easily we could find B from the F-value as

B = 4.2857 · 0.05833 = 0.25,

and then A = 2 · B = 0.5.

Hence the correct answer is:

A = 0.5, B = 0.25, C = 0.723 and D = 12.4.
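All four values can be confirmed by entering the 9 delivery times and rerunning the analysis. The exercise does not show how Time, Firm and Itemsize were created, so the construction below (row by row from the data table) is an assumption made for illustration.

```r
# Delivery times entered row by row (firm); columns are item sizes.
# This construction of the variables is an assumption.
Time <- c(1.4, 2.5, 2.1,   # Firm A: small, intermediate, large
          0.8, 1.8, 1.9,   # Firm B
          1.6, 2.0, 2.4)   # Firm C
Firm     <- factor(rep(c("A", "B", "C"), each = 3))
Itemsize <- factor(rep(c("Small", "Intermediate", "Large"), times = 3))

tab <- anova(lm(Time ~ Firm + Itemsize))

c(A = tab["Firm", "Sum Sq"],
  B = tab["Firm", "Mean Sq"],
  C = tab["Itemsize", "Mean Sq"],
  D = tab["Itemsize", "F value"])
```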

b) What is the conclusion of the analysis (with a significance level of 5%)?

Solution

We look at the two p-values, and see that the Itemsize p-value is less than 5% ("groups significantly different") and that the Firm p-value is NOT ("groups NOT significantly different"), and hence the correct answer is:

Only the size of the item has a significant influence on the delivery time.