Post on 21-Dec-2015
Stat 470-4
• Today: Multiple comparisons, diagnostic checking, an example
• After these notes, we will have looked at 1.1-1.3 (skip figures 1.2 and 1.3, last two paragraphs of section 1.3), 1.6 (skip matrix notation and constraints), 1.7 (Tukey method only) and 1.9 (ignore H matrix notation on page 35), 2.1, 2.2
• We will not do 1.5 or 1.8
• Assignment 1:
Multiple Comparisons
• In previous example, we saw that there was a significant treatment effect…so what?
• If an ANOVA is conducted and the analysis suggests that there is a significant treatment effect, then a reasonable question to ask is
Multiple Comparisons
• Would like to see if there is a difference between treatments i and j
• Can use two-sample t-test statistic to do this
• For testing $H_0: \tau_i = \tau_j$ versus $H_A: \tau_i \neq \tau_j$, reject if
• Perform many of these tests
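The pairwise tests described above can be sketched in code. This is a minimal illustration, assuming the usual pooled-variance form of the statistic, $t_{ij} = (\bar{y}_{j.} - \bar{y}_{i.}) / (\hat{\sigma}\sqrt{1/n_j + 1/n_i})$, with rejection when $|t_{ij}| > t_{N-k,\alpha/2}$. The data values and the function name `pairwise_t_tests` are hypothetical and are not the class example.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Hypothetical responses for k = 4 treatments, 3 observations each (not the class data).
data = {
    "T1": [4.1, 4.8, 5.0],
    "T2": [5.9, 6.4, 6.1],
    "T3": [6.8, 7.3, 7.1],
    "T4": [5.0, 5.6, 5.2],
}


def pairwise_t_tests(groups, alpha=0.05):
    """Pooled-variance t test for every pair of treatments (illustrative helper)."""
    k = len(groups)
    N = sum(len(v) for v in groups.values())
    # Pooled error variance: the within-treatment mean square from the ANOVA.
    sse = sum(((np.asarray(v) - np.mean(v)) ** 2).sum() for v in groups.values())
    mse = sse / (N - k)
    t_crit = stats.t.ppf(1 - alpha / 2, df=N - k)
    results = {}
    for a, b in combinations(groups, 2):
        diff = np.mean(groups[b]) - np.mean(groups[a])
        se = np.sqrt(mse * (1 / len(groups[a]) + 1 / len(groups[b])))
        t_stat = diff / se
        results[(a, b)] = (t_stat, abs(t_stat) > t_crit)
    return results


results = pairwise_t_tests(data)
for pair, (t_stat, reject) in results.items():
    print(pair, round(t_stat, 2), "reject H0" if reject else "do not reject H0")
```

With k treatments there are k(k-1)/2 such tests, so running each at level alpha inflates the overall error rate. The Tukey method addresses this by replacing the t critical value with one based on the studentized range distribution.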
Diagnostic Checking – Residual Analysis
• To support the assumptions on which the analysis is based, we need to check for
– have all effects been captured?
– unequal variances
– non-Normality
– sequence effects
• Should do this before hypothesis testing and multiple comparisons
[Figure: Dotplots of y by Treatment (T1 to T4); group means are indicated by lines. x-axis: Treatment; y-axis: y]
The data plot (limited data) shows no strong evidence of non-Normality or unequal variances
Diagnostic Checking
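• ANOVA model: $y_{ij} = \mu + \tau_i + e_{ij}$
• Predicted response: $\hat{y}_i = \hat{\mu} + \hat{\tau}_i$, where
– $\hat{\mu} = \bar{y}_{..}$
– $\hat{\tau}_i = (\bar{y}_{i.} - \bar{y}_{..})$
• Residual: $r_{ij} = (y_{ij} - \hat{y}_i)$
• Estimates error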
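The predicted responses and residuals defined above can be computed directly from the group means. The following is a small sketch with hypothetical data (not the class example); the variable names are illustrative.

```python
import numpy as np

# Hypothetical data: 4 treatments, 3 observations each (not the class data).
y = np.array([4.1, 4.8, 5.0, 5.9, 6.4, 6.1, 6.8, 7.3, 7.1, 5.0, 5.6, 5.2])
treatment = np.repeat(["T1", "T2", "T3", "T4"], 3)

grand_mean = y.mean()  # estimate of the overall mean (ybar..)
labels = np.unique(treatment)
group_means = {t: y[treatment == t].mean() for t in labels}
# Estimated treatment effect: group mean minus grand mean.
tau_hat = {t: group_means[t] - grand_mean for t in labels}

# Predicted response for each observation: grand mean plus treatment effect,
# which is just the mean of that observation's treatment group.
fitted = np.array([grand_mean + tau_hat[t] for t in treatment])
residuals = y - fitted

for t in labels:
    print(t, np.round(residuals[treatment == t], 3))
```

Because each fitted value equals its group mean, the residuals within every treatment sum to zero.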
Diagnostic Plots
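• Errors are assumed to be normally distributed
– Useful plot: normal probability plot of residuals
• Errors are assumed to be independent
– Useful plot: residuals in time sequence
• Equal variances in each group
– Useful plot: residuals versus treatment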
Normality Check
• Dot plot or histogram of residuals
• Normal probability plot of residuals (via software or by hand - see class handout)
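As a rough software counterpart to the normal probability plot, the sketch below pairs the ordered residuals with their expected normal quantiles. It uses `scipy.stats.probplot`, which may compute plotting positions differently from the class handout, and the residual values are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical residuals from a one-way ANOVA fit (not the class data).
residuals = np.array([-0.53, 0.17, 0.37, -0.23, 0.27, -0.03,
                      -0.27, 0.23, 0.03, -0.27, 0.33, -0.07])

# probplot returns the theoretical normal quantiles, the sorted data,
# and a least-squares line fitted through the plotted points.
(theoretical_q, ordered_resid), (slope, intercept, r) = stats.probplot(
    residuals, dist="norm"
)

for q, res in zip(theoretical_q, ordered_resid):
    print(f"{q:6.2f}  {res:6.2f}")
print("correlation of the plotted points:", round(r, 3))
```

Points that fall close to a straight line are consistent with the Normality assumption; strong curvature or isolated outlying points suggest otherwise.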
[Figure: Normal Q-Q Plot of Residual for RESPONSE. x-axis: Observed Value; y-axis: Expected Normal Value]
Independence Check
• Plot residuals in the time sequence in which the data were collected
• X-axis denotes the sequence, Y-axis denotes the residual values
• Should observe
Independence Check
• Suppose the sequence of the observations (going across rows from top to bottom in the tabled data) is 1, 2, 11, 9, 5, 7, 6, 3, 4, 12, 10, 8
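To build the time plot, the residuals have to be rearranged from tabled order into collection order. The sketch below does this with the sequence given above; the residual values are hypothetical, and the lag-1 correlation is an optional numeric summary that is not part of the notes.

```python
import numpy as np

# Run order from the notes: the k-th tabled observation was collected at time sequence[k].
sequence = np.array([1, 2, 11, 9, 5, 7, 6, 3, 4, 12, 10, 8])

# Hypothetical residuals listed in tabled order (not the class data).
residuals = np.array([-0.53, 0.17, 0.37, -0.23, 0.27, -0.03,
                      -0.27, 0.23, 0.03, -0.27, 0.33, -0.07])

# Indices that sort the observations by collection time.
order = np.argsort(sequence)
resid_in_time = residuals[order]

for time, res in zip(sequence[order], resid_in_time):
    print(time, res)

# Assumption: correlation between consecutive residuals as a rough check for drift.
lag1 = np.corrcoef(resid_in_time[:-1], resid_in_time[1:])[0, 1]
print("lag-1 correlation:", round(lag1, 3))
```

Plotting `resid_in_time` against 1 to 12 gives the time plot described in the notes.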
[Figure: Time Plot of residuals. x-axis: Sequence; y-axis: Residual for RESPONSE]
Equal Variances
[Figure: Plot of Residual Versus Treatment. x-axis: Packaging; y-axis: Residual for RESPONSE]
Comments
• The F-test is fairly robust – it is not very sensitive to departures from the assumption of Normal distributions.
• Often, simple transformations, such as the logarithm or square root, can make the Normal distribution assumption and the equal variance assumption more appropriate (Chapter 2)
Summary: Completely Randomized Design, One-Way ANOVA
• Method: Random assignment of treatments to experimental units
• ANOVA: Compare variation among treatments to variation within treatments to assess evidence of a difference among treatments
• Investigate and identify differences among Treatments, if any. Act on the findings
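The comparison of variation among treatments to variation within treatments can be written out step by step. This is a minimal sketch with hypothetical data (not the class example), cross-checked against `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

# Hypothetical responses for 4 treatments, 3 units each (not the class data).
groups = [
    np.array([4.1, 4.8, 5.0]),
    np.array([5.9, 6.4, 6.1]),
    np.array([6.8, 7.3, 7.1]),
    np.array([5.0, 5.6, 5.2]),
]

k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# Variation among treatments (between-group sum of squares).
ss_treat = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation within treatments (error sum of squares).
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_treat = ss_treat / (k - 1)
ms_error = ss_error / (N - k)
F = ms_treat / ms_error
p_value = stats.f.sf(F, k - 1, N - k)  # upper-tail probability of the F distribution

print("F =", round(F, 2), "p =", p_value)
print("scipy check:", stats.f_oneway(*groups))
```

A large F ratio means the treatment means differ by more than the within-treatment variation would explain.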
Comment: One-Way Model
• The one-way model, $y_{ij} = \mu + \tau_i + e_{ij}$, $e_{ij} \sim NID(0, \sigma^2)$, can be and is applied to data obtained in ways other than a completely randomized design
• Example: starting salaries for MBAs at different companies. Company is not a treatment that is applied to experimental units
• Analyzing the data according to the above model can answer whether apparent differences between companies are real or could be just due to chance.
• The randomness involved comes from the randomness of the hiring and salary-determination processes, not the random assignment of treatments to experimental units
General Linear Model
• ANOVA model can be viewed as a special case of the general linear model or regression model
• Suppose we have a response, y, which is thought to be related to p predictors (sometimes called explanatory variables or regressors)
• Predictors: x1, x2,…,xp
• Model:
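In conventional notation (an assumption, since the formula is not reproduced here), the general linear model is usually written $y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \epsilon$. The sketch below illustrates how the one-way ANOVA model fits this form when the predictors are treatment indicators. The data are hypothetical, and the baseline coding (treatment 1 as reference) is one of several possible constraints.

```python
import numpy as np

# Hypothetical data: 4 treatments coded 0..3, 3 observations each (not the class data).
y = np.array([4.1, 4.8, 5.0, 5.9, 6.4, 6.1, 6.8, 7.3, 7.1, 5.0, 5.6, 5.2])
treatment = np.repeat(np.arange(4), 3)

# Design matrix: intercept column plus 0/1 indicators for treatments 2 to 4.
# Treatment 1 is the baseline (an illustrative choice of constraint).
X = np.column_stack(
    [np.ones(len(y))] + [(treatment == j).astype(float) for j in range(1, 4)]
)

# Least-squares estimates of the regression coefficients.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

group_means = np.array([y[treatment == j].mean() for j in range(4)])
print("coefficients:", np.round(beta, 3))
print("group means: ", np.round(group_means, 3))
```

Under this coding the intercept estimates the baseline treatment mean, each remaining coefficient estimates a difference from the baseline, and the fitted values equal the treatment means, just as in the ANOVA fit.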
Example: Rainfall (Exercise 2.16)
• In winter, a plastic rain gauge cannot be used to collect precipitation because it will freeze and crack. Instead, metal cans are used to collect snowfall and the snow is allowed to melt indoors. The water is then poured into a plastic rain gauge and a measurement recorded. An estimate of snowfall is obtained by multiplying this measurement by 0.44.
• One observer questions this and decides to collect data to test the validity of this approach
• For each rainfall in a summer, she measures: (i) rainfall using a plastic rain gauge, (ii) using a metal can
• What is the current model being used?
Example: Rainfall (Exercise 2.16)
[Figure: Scatter Plot of Rainfall Data. x-axis: Rain Collected in Metal Can (x); y-axis: Rain Collected in Plastic Gauge]
Example: Rainfall (Exercise 2.16)
• Seems to be a linear relationship
• Will use regression to establish linear relationship between x and y
• What should the slope be?
Example: Rainfall (Exercise 2.16)
Coefficients (Dependent Variable: Y)
Model 1       B           Std. Error   Beta    t        Sig.
(Constant)    3.579E-02   .012                 2.931    .005
X             .444        .006         .995    76.264   .000
B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.

ANOVA (Predictors: (Constant), X; Dependent Variable: Y)
Model 1       Sum of Squares   df   Mean Square   F          Sig.
Regression    25.860           1    25.860        5816.213   .000
Residual      .245             55   .004
Total         26.105           56

Model Summary (Predictors: (Constant), X; Dependent Variable: Y)
Model   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .995   .991       .990                .06668
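One way to relate this output to the 0.44 conversion factor is to test whether the fitted slope differs from 0.44 and whether the intercept differs from zero. The sketch below is only a rough check, assuming the current practice corresponds to y = 0.44x with no intercept; it uses the rounded values printed in the output, so the results are approximate.

```python
from scipy import stats

# Values copied from the regression output (rounded as printed).
b0, se_b0 = 3.579e-02, 0.012   # intercept and its standard error
b1, se_b1 = 0.444, 0.006       # slope and its standard error
df_resid = 55                  # residual degrees of freedom

# Test H0: slope = 0.44 (the conversion factor currently in use).
t_slope = (b1 - 0.44) / se_b1
p_slope = 2 * stats.t.sf(abs(t_slope), df_resid)

# Test H0: intercept = 0. The printed t of 2.931 differs slightly
# because the standard error above is rounded.
t_intercept = b0 / se_b0
p_intercept = 2 * stats.t.sf(abs(t_intercept), df_resid)

print("slope vs 0.44:  t =", round(t_slope, 3), "p =", round(p_slope, 3))
print("intercept vs 0: t =", round(t_intercept, 3), "p =", round(p_intercept, 4))
```

On these rounded numbers the slope is within about one standard error of 0.44, while the intercept appears to differ from zero, which is worth keeping in mind when judging the current conversion rule.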