Analysis of Variance (ANOVA)
description
Transcript of Analysis of Variance (ANOVA)
EGR 252 Spring 2009 - Ch.13 Part 1 1
Analysis of Variance (ANOVA)
A single-factor ANOVA can be used to compare more than two means.For example, suppose a manufacturer of paper used for grocery bags is concerned about the tensile strength of the paper. Product engineers believe that tensile strength is a function of the hardwood concentration and want to test several concentrations for the effect on tensile strength.
If there are 2 different hardwood concentrations (say, 5% and 15%), then a z-test or t-test is appropriate:
H0: μ1 = μ2
H1: μ1 ≠ μ2
EGR 252 Spring 2009 - Ch.13 Part 1 2
Comparing More Than Two Means
What if there are 3 different hardwood concentrations (say, 5%, 10%, and 15%)?
H0: μ1 = μ2 H0: μ1 = μ3 H0: μ2 = μ3
H1: μ1 ≠ μ2 H1: μ1 ≠ μ3 H1: μ2 ≠ μ3
How about 4 different concentrations (say, 5%, 10%, 15%, and 20%)?
All of the above, PLUS
H0: μ1 = μ4 H0: μ2 = μ4 H0: μ3 = μ4
H1: μ1 ≠ μ4 H1: μ2 ≠ μ4 H1: μ3 ≠ μ4
What about 5 concentrations? 10?
and and
and and
EGR 252 Spring 2009 - Ch.13 Part 1 3
Comparing Multiple Means - Type I Error Suppose α = 0.05 P(Type 1 error) = 0.05
(1 – α) = P (accept H0 | H0 is true) = 0.95Conducting multiple t-tests increases the
probability of a Type 1 errorThe greater the number of t-tests, the
greater the error probability 4 concentrations: (0.95)4 = 0.814
5 concentrations: (0.95)5 = 0.774
10 concentrations: (0.95)10 = 0.599Making the comparisons simultaneously
(as in an ANOVA) reduces the error back to 0.05
EGR 252 Spring 2009 - Ch.13 Part 1 4
Analysis of Variance (ANOVA) Terms
Independent variable: that which is variedTreatment Factor
Level: the selected categories of the factor In a single–factor experiment there are a levels
Dependent variable: the measured resultObservations Replicates (N observations in the total experiment)
Randomization: performing experimental runs in random order so that other factors don’t influence results.
EGR 252 Spring 2009 - Ch.13 Part 1 5
The Experimental Design
Suppose a manufacturer is concerned about the tensile strength of the paper used to produce grocery bags. Product engineers believe that tensile strength is a function of the hardwood concentration and want to test several concentrations for the effect on tensile strength. Six specimens were made at each of the 4 hardwood concentrations (5%, 10%, 15%, and 20%). The 24 specimens were tested in random order on a tensile test machine.
Terms Factor: Hardwood Concentration Levels: 5%, 10%, 15%, 20% a = 4 N = 24
EGR 252 Spring 2009 - Ch.13 Part 1 6
The Results and Partial Analysis
The experimental results consist of 6 observations at each of 4 levels for a total of N = 24 items. To begin the analysis, we calculate the average and total for each level.
Hardwood Observations
Concentration 1 2 3 4 5 6 Totals Averages
5% 7 8 15 11 9 10 60 10.00
10% 12 17 13 18 19 15 94 15.67
15% 14 18 19 17 16 18 102 17.00
20% 19 25 22 23 18 20 127 21.17
383 15.96
EGR 252 Spring 2009 - Ch.13 Part 1 7
To determine if there is a difference in the response at the 4 levels …
1. Calculate sums of squares
2. Calculate degrees of freedom
3. Calculate mean squares
4. Calculate the F statistic
5. Organize the results in the ANOVA table
6. Conduct the hypothesis test
EGR 252 Spring 2009 - Ch.13 Part 1 8
Calculate the sums of squares
TreatmentTotalError
a
i
iTreatment
a
i
n
jijTotal
SSSSSS
N
y
n
ySS
N
yySS
2
1
2
2
1 1
2
96.51224
383109111587
2222222 TotalSS
79.382
24
383
6
1271029460 22222
TreatmentSS
1667.130 TreatmentTotalError SSSSSS
EGR 252 Spring 2009 - Ch.13 Part 1 9
Additional Calculations
Calculate Degrees of Freedomdftreat = a – 1 = 3
df error = a(n – 1) = 20
dftotal = an – 1 = 23
Mean Square, MS = SS/df
MStreat = 382.7917/3 = 127.5972
MSE = 130.1667 /20 = 6.508333
Calculate F = MStreat / MSError = 127.58 / 6.51 = 19.61
EGR 252 Spring 2009 - Ch.13 Part 1 10
Organizing the Results
Build the ANOVA table
Determine significance
fixed α-level compare to Fα,a-1, a(n-1)
p – value find p associated with this F with degrees of freedom a-1, a(n-1)
ANOVA
Source of Variation SS df MS F P-value F crit
Treatment 382.79 3 127.6 19.6 3.6E-06 3.1
Error 130.17 20 6.5083
Total 512.96 23
EGR 252 Spring 2009 - Ch.13 Part 1 11
Conduct the Hypothesis Test
Null Hypothesis: The mean tensile strength is the same for each hardwood concentration.
Alternate Hypothesis: The mean tensile strength differs for at least one hardwood concentration
Compare Fcrit to Fcalc
Draw the graphic
State your decision with respect to the null hypothesis
State your conclusion based on the problem statement
EGR 252 Spring 2009 - Ch.13 Part 1 12
Hypothesis Test Results
Null Hypothesis: The mean tensile strength is the same for each hardwood concentration.
Alternate Hypothesis: The mean tensile strength differs for at least one hardwood concentration
Fcrit less than Fcalc
Draw the graphic
Reject the null hypothesis
Conclusion: The mean tensile strength differs for at least one hardwood concentration.
EGR 252 Spring 2009 - Ch.13 Part 1 13
Hypothesis Test Results
Null Hypothesis: The mean tensile strength is the same for each hardwood concentration.
Alternate Hypothesis: The mean tensile strength differs for at least one hardwood concentration
Fcrit less than Fcalc
Draw the graphic
Reject the null hypothesis
Conclusion: The mean tensile strength differs for at least one hardwood concentration.
EGR 252 Spring 2009 - Ch.13 Part 1 14
Post-hoc Analysis: “Hand Calculations”
1. Calculate and check residuals, eij = Oi - Ei
plot residuals vs treatments normal probability plot
2. Perform ANOVA and determine if there is a difference in the means
3. If the decision is to reject the null hypothesis, identify which means are different using Tukey’s procedure:
4. Model: yij = μ + αi + εij
nMSkqyy Esr /1),,()(
EGR 252 Spring 2009 - Ch.13 Part 1 15
Graphical Methods - Computer Individual 95% CIs For Mean Based on
Pooled StDev
Level N Mean StDev +---------+---------+---------+---------
5% 6 10.000 2.828 (----*----)
10% 6 15.667 2.805 (----*-----)
15% 6 17.000 1.789 (----*-----)
20% 6 21.167 2.639 (-----*----)
+---------+---------+---------+---------
8.0 12.0 16.0 20.0
Data
20%15%10%5%
25
20
15
10
5
Boxplot of 5% , 10% , 15% , 20%
EGR 252 Spring 2009 - Ch.13 Part 1 16
Numerical Methods - Computer
Tukey’s test Duncan’s Multiple Range test Easily performed in Minitab
Tukey 95% Simultaneous Confidence Intervals (partial results)
10% subtracted from:
Lower Center Upper ----+---------+---------+---------+-----15% -2.791 1.333 5.458 (-----*-----)
20% 1.376 5.500 9.624 (-----*-----)
----+---------+---------+---------+-----
-7.0 0.0 7.0 14.0
EGR 252 Spring 2009 - Ch.13 Part 1 17
Blocking
Creating a group of one or more people, machines, processes, etc. in such a manner that the entities within the block are more similar to each other than to entities outside the block.
Balanced design: n = 1 for each treatment/block category
Model: yij = μ + α i + βj + εij
EGR 252 Spring 2009 - Ch.13 Part 1 18
Example:Robins Air Force Base uses CO2 to strip paint from F-15’s. You have been asked to design a test to determine the optimal pressure for spraying the CO2. You realize that there are five machines that are being used in the paint stripping operation. Therefore, you have designed an experiment that uses the machines as blocking variables. You emphasized the importance of balanced design and a random order of testing. The test has been run with these results (values are minutes to strip one fighter):
EGR 252 Spring 2009 - Ch.13 Part 1 19
ANOVA: One-Way with Blocking
1. Construct the ANOVA table
Where,
SSBSSASSTSSE
yykSSB
yybSSA
yySST
b
jj
k
ii
k
i
b
jij
1
2
1
2
1 1
2
)(
)(
)(
EGR 252 Spring 2009 - Ch.13 Part 1 20
Blocking Example
Your turn: fill in the blanks in the following ANOVA table (from Excel):
2. Make decision and draw conclusions:
ANOVA
Source of Variation SS df MS F P-value F crit
Rows 89.733 2 44.867 8.492 0.0105 4.458968
Columns 77.733 ___ _____ ____ 0.0553 _______
Error 42.267 8 5.2833
Total 209.73 ___
EGR 252 Spring 2009 - Ch.13 Part 1 21
Two-Way ANOVA
Blocking is used to keep extraneous factors from masking the effects of the one treatment you are interested in studying.
A two-way ANOVA is used when you are interested in determining the effect of two treatments.
Model: yijk = μ + α i + βj + (α β)ijk + εij
α is the main effect of Treatment Aβ is the main effect of Treatment BThe α β component is the interaction effect
EGR 252 Spring 2009 - Ch.13 Part 1 22
Two-Way ANOVA w/ Replication Your fame as an experimental design expert grows. You
have been called in as a consultant to help the Pratt and Whitney plant in Columbus determine the best method of applying the reflective stripe that is used to guide the Automated Guided Vehicles (AGVs) along their path. There are two ways of applying the stripe (paint and coated adhesive tape) and three types of flooring (linoleum and two types of concrete) in the facilities using the AGVs. You have set up two identical “test tracks” on each type of flooring and applied the stripe using the two methods under study. You run 3 replications in random order and count the number of tracking errors per 1000 ft of track. The results are as follows:
EGR 252 Spring 2009 - Ch.13 Part 1 23
Two-Way ANOVA Example
Analysis is the similar to the one-way ANOVA; however we are now concerned with interaction effects
The two-way ANOVA table displays three calculated F values
EGR 252 Spring 2009 - Ch.13 Part 1 24
Two-Way ANOVA
EGR 252 Spring 2009 - Ch.13 Part 1 25
Your Turn
Fill in the blanks …
What does this mean?
ANOVA
Source of Variation SS df MS F P-value F crit
Sample 0.4356 1
_____ 2.3976 0.14748 4.74722
Columns 4.48 2 2.24 12.33 0.00123 3.88529
Interaction 0.9644 ___ 0.4822 _____ 0.11104 3.88529
Within 2.18 ___ 0.1817
Total 8.06 17
EGR 252 Spring 2009 - Ch.13 Part 1 26
What if Interaction Effects are Significant?
For example, suppose a new test was run using different types of paint and adhesive, with the following results:
Linoleum Concrete I Concrete IIPaint 10.7 10.8 12.2
10.9 11.1 12.311.3 10.7 12.5
Adhesive 11.2 11.9 10.911.6 12.2 11.610.9 11.7 11.9
ANOVA
Source of Variation SS df MS F P-value F crit
Sample 0.109 1 0.1089 1.071 0.3211 4.7472
Columns 1.96 2 0.98 9.639 0.0032 3.8853
Interaction 2.831 2 1.4156 13.92 0.0007 3.8853
Within 1.22 12 0.1017
Total 6.12 17
EGR 252 Spring 2009 - Ch.13 Part 1 27
Understanding Interaction EffectsGraphical methods:
graph means vs factors identify where the effect will change the result for one factor
based on the value of the other.
Interaction
10.5
11
11.5
12
12.5
0 1 2 3 4
Floor Type
Tra
ckin
g E
rro
rs
Paint
Adhesive