Lecture 29 RCBD & Unequal Cell Sizes - Purdue University
29-1
Lecture 29
RCBD & Unequal Cell Sizes
STAT 512
Spring 2011
Background Reading
KNNL: 21.1-21.6, Chapter 23
29-2
Topic Overview
• Randomized Complete Block Designs
(RCBD)
• ANOVA with unequal sample sizes
29-3
RCBD
• Randomized complete block designs are
useful whenever the experimental units are
non-homogeneous.
• Grouping EU’s into “blocks” of
homogeneous units helps reduce the SSE
and increase the likelihood that we will be
able to see differences among treatments.
• A “block” consists of a complete replication
of the set of treatments. Blocks and
treatments are assumed not to interact.
29-4
RCBD Model
• Assuming no replication, same as two-way ANOVA
with one observation per cell. No interaction between
block and treatment.
$$Y_{ijk} = \mu + \rho_i + \tau_j + \varepsilon_{ijk}$$
where $\varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$ and $\sum_i \rho_i = \sum_j \tau_j = 0$
• We refer to $\rho_i$ as the block effects and $\tau_j$ as the
treatment effects.
• We are really only interested in further analysis on
the treatment effects.
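As a minimal sketch of how this additive model partitions the variation, the Python below computes the block, treatment, and error sums of squares for an RCBD with one observation per cell. The data values and the function name `rcbd_anova` are hypothetical, made up for illustration; they do not come from the lecture.

```python
# Hypothetical data (not from the lecture): rows are blocks, columns are treatments.
y = [
    [10.0, 12.0, 15.0],
    [8.0, 11.0, 12.0],
    [13.0, 14.0, 18.0],
    [9.0, 9.0, 13.0],
]


def rcbd_anova(y):
    """Partition SSTotal into block, treatment, and error parts (no replication)."""
    b, t = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (b * t)
    block_means = [sum(row) / t for row in y]
    trt_means = [sum(row[j] for row in y) / b for j in range(t)]

    ss_block = t * sum((m - grand) ** 2 for m in block_means)
    ss_trt = b * sum((m - grand) ** 2 for m in trt_means)
    # Residual from the additive fit: y_ij - block mean - treatment mean + grand mean
    ss_error = sum(
        (y[i][j] - block_means[i] - trt_means[j] + grand) ** 2
        for i in range(b)
        for j in range(t)
    )
    ss_total = sum((v - grand) ** 2 for row in y for v in row)

    df_trt, df_error = t - 1, (b - 1) * (t - 1)
    f_trt = (ss_trt / df_trt) / (ss_error / df_error)
    return {
        "block": ss_block,
        "trt": ss_trt,
        "error": ss_error,
        "total": ss_total,
        "F_trt": f_trt,
    }


result = rcbd_anova(y)
print(result)
```

With one observation per cell the three pieces add up to SSTotal, which is why the error term here is what would otherwise be the block-by-treatment interaction.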
29-5
RCBD Example
• Want to study the effects of three different
sealers on protecting concrete patios from
the weather.
• Ten unsealed patios are available spread
across Indianapolis.
• Separate each patio into three portions, and
apply the treatments (randomly) in such a
way that each patio receives each treatment
for 1/3 of the surface.
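The within-block randomization described here could be sketched as follows. The sealer labels, the seed, and the variable names are assumptions for illustration only.

```python
import random

SEALERS = ["A", "B", "C"]   # hypothetical labels for the three sealers
PATIOS = range(1, 11)       # ten patios, each one a block

rng = random.Random(512)    # arbitrary seed so the layout can be reproduced

# For each patio, draw a random order of the three sealers;
# position k in the list is the sealer applied to portion k of that patio.
layout = {patio: rng.sample(SEALERS, k=len(SEALERS)) for patio in PATIOS}

for patio, order in layout.items():
    print(patio, order)
```

Each patio gets its own independent shuffle, so every treatment appears exactly once in every block.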
29-6
RCBD Example (2)
• Patio (location) is a blocking factor. Probably
the weather will be different in each location;
some patios may be better sheltered (trees,
etc.)
• If patio location is important, then failing to
block on patio location would probably mean
that the MSE will be overestimated.
• Blocking uses up error DF (9 in this case), but
if the blocking variable turns out to be unimportant,
the MSE with and without blocking will usually be
about the same.
29-7
RCBD Example (3)
Source   DF     SS    MS   F Value
patio     9    900   100      10.0
sealer    2    100    50       5.0
Error    18    180    10
Total    29   1180
• If the ANOVA results are as above, then
blocking is clearly important. If we do not
block here...
Source   DF     SS    MS   F Value
sealer    2    100    50      1.25
Error    27   1080    40
29-8
RCBD Example (4)
Source   DF     SS    MS   F Value
patio     9    108    12       1.2
sealer    2    100    50       5.0
Error    18    180    10
Total    29    388
• If the ANOVA results are as above, then
blocking doesn’t appear to have been as
important. In this case if we fail to block...
Source   DF     SS     MS   F Value
sealer    2    100     50      4.69
Error    27    288   10.7
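Not blocking amounts to pooling the block SS and DF into the error term. A small sketch of that arithmetic, using the numbers from the two ANOVA tables in this example (the function name `unblocked_f` is an assumption for illustration):

```python
def unblocked_f(ss_block, df_block, ss_trt, df_trt, ss_error, df_error):
    """F statistic for treatments when the block SS and DF are pooled into error."""
    pooled_mse = (ss_block + ss_error) / (df_block + df_error)
    return (ss_trt / df_trt) / pooled_mse


# Blocking important: SSBlock = 900 on 9 DF
f_important = unblocked_f(900, 9, 100, 2, 180, 18)
# Blocking unimportant: SSBlock = 108 on 9 DF
f_unimportant = unblocked_f(108, 9, 100, 2, 180, 18)

print(round(f_important, 2), round(f_unimportant, 2))
```

In the first case the sealer F drops from 5.0 to 1.25; in the second it barely moves (5.0 to about 4.69).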
29-9
Big Picture
• Failing to block when you should block can
cost you the ability to see treatment effects
• Blocking when there is no need usually
doesn’t cost much at all (though it can if the
SSBlock is small enough relative to its df).
• Blocking effectively requires foresight. An
experimenter must guess what sources of
variation will exist in order to block on them.
29-10
Other Advantages of RCBD
• Reasonably simple analysis to perform
• Effective grouping makes results much more
precise.
• Can drop an entire block or treatment if
necessary, without complicating the
analysis.
• Can deliberately introduce extra variability
into the EU’s to widen the range of validity
of the results without sacrificing precision.
29-11
Some Disadvantages of RCBD
• Missing observations are a complex problem
(since generally each treatment is
represented exactly once in each block)
• Loss of error degrees of freedom
• Additional assumptions are required for the
model (additivity, constant variance across
blocks)
29-12
Multiple Blocking Variables
• It is often the case that the EU’s have
multiple characteristics on which you
could block.
• Example: Consider the effect of three
treatments for asthma. Might block on
both AGE and GENDER.
• Each treatment would be represented once at
each AGE*GENDER combination.
29-13
More on Blocking...
• Quite a bit more information in Chapter 21
o More than one replicate per block
o Factorial treatments
• Would discuss this and related topics in
STAT 514.
29-14
Unequal Sample Sizes
• Encountered for a variety of reasons
including:
o Convenience – usually if we have an
observational study, we have very little
control over the cell sizes.
o Cost Effectiveness – sometimes the cost of
samples is different, and we may use larger
sample sizes when the cost is less
o In experimental studies, you may start with
a balanced design, but lose that balance if
problems occur.
29-15
Unequal Sample Sizes (2)
• What changes?
o Loss of balance brings “intercorrelation”
among the predictors (i.e., variables are no
longer orthogonal)
o Type I and III SS will be different; typically
Type III SS should be used for testing
o LSMeans should be used for testing
o Standard errors for cell means and for
multiple comparisons will be different
o Confidence intervals will have different
widths
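To illustrate the loss of orthogonality, the sketch below compares the cell sizes of the growth hormone example used later in this lecture with a balanced layout. It sums, over observations, the product of a +1/-1 gender code and a centered indicator for each bone category. The balanced layout and all names are assumptions for illustration.

```python
BONES = ("severe", "moderate", "mild")

# Cell sizes n[gender][bone]: the lecture's unbalanced example ...
unbalanced = {
    "male": {"severe": 3, "moderate": 2, "mild": 2},
    "female": {"severe": 1, "moderate": 3, "mild": 3},
}
# ... and a hypothetical balanced layout with 2 observations per cell.
balanced = {g: {b: 2 for b in BONES} for g in ("male", "female")}


def cross_products(n):
    """Cross-product of the gender code with each centered bone indicator."""
    code = {"male": 1.0, "female": -1.0}
    total = sum(sum(row.values()) for row in n.values())
    result = {}
    for level in BONES:
        share = sum(n[g][level] for g in n) / total  # mean of the indicator
        # Centering one variable is enough for a cross-product of centered variables.
        result[level] = sum(
            n[g][b] * code[g] * ((1.0 if b == level else 0.0) - share)
            for g in n
            for b in BONES
        )
    return result


print("balanced:  ", cross_products(balanced))
print("unbalanced:", cross_products(unbalanced))
```

The balanced cross-products are all zero (orthogonal predictors); the unbalanced ones are not, so the order in which terms enter the model starts to matter.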
29-16
Example
• Examine the effects of gender (A) and bone
development (B) on the rate of growth
induced by a synthetic growth hormone.
• Three categories of Bone Development
Depression (Severe, Moderate, and Mild)
• We categorize people on this basis after they
are in the study (it is an observational
factor); we wouldn’t want to throw away
data just to keep a balanced design.
• Page 954, growth.sas
29-17
Data / Sample Sizes
         Severe          Moderate        Mild
Male     1.4, 2.4, 2.2   2.1, 1.7        0.7, 1.1
Female   2.4             2.5, 1.8, 2.0   0.5, 0.9, 1.3
29-18
[Figure: interaction plot of growth by gender and bone development]
29-19
Interpretation
• Same as any interaction plot
• Effect seems to be greater if disease is more
severe.
• Effect seems greater for women than men.
• Possibly an interaction. The effect of bone
development is enhanced (greater) for
women as compared to men.
• We aren’t saying anything about
significance here – we’ll do that when we
look at the ANOVA.
29-20
ANOVA Output
Source   DF      SS      MS   F Value   Pr > F
Model     5   4.474   0.895      5.51   0.0172
Error     8   1.300   0.163
Total    13   5.774

R-Square   Root MSE   growth Mean
0.774864   0.403113      1.642857
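The overall ANOVA numbers can be reproduced from the raw data, since the full model with interaction fits each cell by its own mean. A minimal sketch (variable names are assumptions; the p-value is not computed here):

```python
# Growth data from the lecture, keyed by (gender, bone development).
growth = {
    ("male", "severe"): [1.4, 2.4, 2.2],
    ("male", "moderate"): [2.1, 1.7],
    ("male", "mild"): [0.7, 1.1],
    ("female", "severe"): [2.4],
    ("female", "moderate"): [2.5, 1.8, 2.0],
    ("female", "mild"): [0.5, 0.9, 1.3],
}

values = [v for cell in growth.values() for v in cell]
n_total = len(values)
grand_mean = sum(values) / n_total

ss_total = sum((v - grand_mean) ** 2 for v in values)
# Error SS: squared deviations of each observation from its own cell mean
ss_error = sum(
    (v - sum(cell) / len(cell)) ** 2 for cell in growth.values() for v in cell
)
ss_model = ss_total - ss_error

df_model = len(growth) - 1        # 6 cells - 1 = 5
df_error = n_total - len(growth)  # 14 - 6 = 8
f_model = (ss_model / df_model) / (ss_error / df_error)
r_square = ss_model / ss_total

print(round(ss_model, 3), round(ss_error, 3), round(ss_total, 3))
print(round(f_model, 2), round(r_square, 4))
```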
29-21
Type I / III SS
Source     DF   Type I SS        MS   F Value   Pr > F
gender      1     0.00286   0.00286      0.02   0.8978
bone        2     4.39600   2.19800     13.53   0.0027
gen*bone    2     0.07543   0.03771      0.23   0.7980

Source     DF   Type III SS      MS   F Value   Pr > F
gender      1      0.1200    0.1200      0.74   0.4152
bone        2      4.1897    2.0949     12.89   0.0031
gen*bone    2      0.0754    0.0377      0.23   0.7980
29-22
Type X SS
• There are actually four relevant types of
sums of squares.
o I – Sequential
o II – Added Last (Observation)
o III – Added Last (Cell)
o IV – Added Last (Empty Cells)
29-23
Type I SS
• Sequential Sums of Squares, appropriate for
equal cell sizes.
• SS(A), SS(B|A), SS(A*B|A,B)
• Each observation is weighted equally, with
the result that treatments are weighted in
proportion to their cell size (if unequal,
then not all treatments get the same weight
in the analysis)
29-24
Type II SS
• Each effect is adjusted for all other effects
that do not contain it; appropriate for
equal cell sizes.
• SS(A|B), SS(B|A), SS(A*B|A,B)
• Each observation is weighted equally
29-25
Type III SS
• Variable Added Last SS, appropriate for
unequal cell sizes.
• SS(A|B,A*B), SS(B|A,A*B), SS(A*B|A,B)
• Each cell/treatment is weighted equally,
but observations are weighted differently.
Type III SS adjusts for the fact that cell
sizes are different, unequal weighting of
observations.
29-26
Type IV SS
• Variable Added Last SS, necessary if there
are empty cells
• SS(A|B,A*B), SS(B|A,A*B), SS(A*B|A,B)
• Like Type III SS but additionally takes into
account the possibility of empty cells.
29-27
Data: Design Chart
Severe Moderate Mild
Male xxx xx xx
Female x xxx xxx
29-28
Example: Type I Hypotheses
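Main Effect Gender
$H_0: \tfrac{3}{7}\mu_{11} + \tfrac{2}{7}\mu_{12} + \tfrac{2}{7}\mu_{13} = \tfrac{1}{7}\mu_{21} + \tfrac{3}{7}\mu_{22} + \tfrac{3}{7}\mu_{23}$
Main Effect Bone
$H_0: \tfrac{3}{4}\mu_{11} + \tfrac{1}{4}\mu_{21} = \tfrac{2}{5}\mu_{12} + \tfrac{3}{5}\mu_{22} = \tfrac{2}{5}\mu_{13} + \tfrac{3}{5}\mu_{23}$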
Observations weighted equally, treatment
weighted by sample size.
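The weights in these hypotheses are just the cell sizes divided by the corresponding marginal size. A short sketch that derives them from the design chart (names are assumptions for illustration):

```python
from fractions import Fraction

BONES = ("severe", "moderate", "mild")

# Cell sizes from the design chart: n[gender][bone]
n = {
    "male": {"severe": 3, "moderate": 2, "mild": 2},
    "female": {"severe": 1, "moderate": 3, "mild": 3},
}

# Gender hypothesis: each cell mean is weighted by n_ij / n_i. (row total)
gender_weights = {
    g: {b: Fraction(n[g][b], sum(n[g].values())) for b in BONES} for g in n
}

# Bone hypothesis: each cell mean is weighted by n_ij / n_.j (column total)
bone_weights = {
    b: {g: Fraction(n[g][b], sum(n[h][b] for h in n)) for g in n} for b in BONES
}

for g in n:
    print(g, [str(gender_weights[g][b]) for b in BONES])
for b in BONES:
    print(b, {g: str(w) for g, w in bone_weights[b].items()})
```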
29-29
Example: Type III Hypotheses
Main Effect Gender
$H_0: \tfrac{1}{3}(\mu_{11} + \mu_{12} + \mu_{13}) = \tfrac{1}{3}(\mu_{21} + \mu_{22} + \mu_{23})$
Main Effect Bone
$H_0: \tfrac{1}{2}(\mu_{11} + \mu_{21}) = \tfrac{1}{2}(\mu_{12} + \mu_{22}) = \tfrac{1}{2}(\mu_{13} + \mu_{23})$
Treatments are weighted equally, observations not
weighted equally.
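For the 1-DF gender effect, both sums of squares can be computed by hand, which makes the difference between the two hypotheses concrete. The sketch below assumes gender is entered first in the model (so its Type I SS is unadjusted) and uses the standard contrast formula SS = L² / Σ(c²/n) for the Type III hypothesis; variable names are assumptions.

```python
GENDERS = ("male", "female")
BONES = ("severe", "moderate", "mild")

growth = {
    ("male", "severe"): [1.4, 2.4, 2.2],
    ("male", "moderate"): [2.1, 1.7],
    ("male", "mild"): [0.7, 1.1],
    ("female", "severe"): [2.4],
    ("female", "moderate"): [2.5, 1.8, 2.0],
    ("female", "mild"): [0.5, 0.9, 1.3],
}
cell_n = {k: len(v) for k, v in growth.items()}
cell_mean = {k: sum(v) / len(v) for k, v in growth.items()}

# Type I (gender first): gender means pool all observations, so cells are
# implicitly weighted by their sample sizes.
all_values = [v for cell in growth.values() for v in cell]
grand_mean = sum(all_values) / len(all_values)
ss_type1 = 0.0
for g in GENDERS:
    vals = [v for b in BONES for v in growth[(g, b)]]
    ss_type1 += len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2

# Type III: contrast of unweighted averages of cell means (coefficients +1/3, -1/3).
coef = {(g, b): (1 / 3 if g == "male" else -1 / 3) for g in GENDERS for b in BONES}
estimate = sum(coef[k] * cell_mean[k] for k in growth)
ss_type3 = estimate ** 2 / sum(coef[k] ** 2 / cell_n[k] for k in growth)

print(round(ss_type1, 5), round(ss_type3, 4))
```

These reproduce the gender rows of the Type I and Type III tables (0.00286 and 0.1200).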
29-30
General Strategy
• Remember that Type I SS and Type III SS
examine different null hypotheses.
• Type III SS are preferred when sample sizes
are not equal, but can be somewhat
misleading if sample sizes differ greatly.
• Type IV SS are appropriate if there are
empty cells.
• Can obtain Type II/IV SS if necessary by
adding the options / ss1 ss2 ss3 ss4 to the
MODEL statement
29-31
Example: Type III SS
Source DF Type III SS MS F Value Pr > F
gender 1 0.1200 0.1200 0.74 0.4152
bone 2 4.1897 2.0949 12.89 0.0031
gen*bone 2 0.0754 0.0377 0.23 0.7980
• The interaction and gender effects are not
significant.
• Now look at comparing different levels of
bone; should not ‘change’ models at this
point, so need to average over gender.
29-32
Multiple Comparisons
• Suppose we keep model as is, and examine
effect of bone.
• Output from MEANS statement (WRONG):
Level of        ------------growth-----------
bone        N        Mean          Std Dev
mild        5       0.900       0.31622777
moderate    5       2.020       0.31144823
severe      4       2.100       0.47609523
• These numbers are not adjusted for gender.
29-33
Multiple Comparisons (2)
• Output from LSMeans (means are correctly
adjusted for the level of gender – can think
of these as the means for the “average”
gender).
                growth    LSMEAN
bone            LSMEAN    Number
mild        0.90000000         1
moderate    2.00000000         2
severe      2.20000000         3
29-34
Multiple Comparisons (3)
• Example – Severe Case
MEANS: $\dfrac{1.4 + 2.4 + 2.2 + 2.4}{4} = 2.1$
(Sum up all severe cases and divide by number of severe
cases regardless of gender)
LSMEANS: $\dfrac{\frac{1.4 + 2.4 + 2.2}{3} + 2.4}{2} = 2.2$
(For severe cases, get averages for men and women and
then take the average – accounts for gender)
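The same two calculations can be applied to every bone category. This sketch assumes the full model with interaction, in which the least squares mean for a bone level is the unweighted average of its two cell means; variable names are assumptions.

```python
GENDERS = ("male", "female")
BONES = ("severe", "moderate", "mild")

growth = {
    ("male", "severe"): [1.4, 2.4, 2.2],
    ("male", "moderate"): [2.1, 1.7],
    ("male", "mild"): [0.7, 1.1],
    ("female", "severe"): [2.4],
    ("female", "moderate"): [2.5, 1.8, 2.0],
    ("female", "mild"): [0.5, 0.9, 1.3],
}

raw_mean = {}
ls_mean = {}
for b in BONES:
    # MEANS: pool every observation at this bone level, ignoring gender
    pooled = [v for g in GENDERS for v in growth[(g, b)]]
    raw_mean[b] = sum(pooled) / len(pooled)
    # LSMEANS: average within each gender first, then average the two cell means
    gender_means = [sum(growth[(g, b)]) / len(growth[(g, b)]) for g in GENDERS]
    ls_mean[b] = sum(gender_means) / len(gender_means)

for b in BONES:
    print(f"{b:9s} MEANS={raw_mean[b]:.3f}  LSMEANS={ls_mean[b]:.3f}")
```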
29-35
Examining Differences (LSMEANS)
Least Squares Means for effect bone
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: growth
i/j         1         2         3
1                0.0072    0.0059
2      0.0072              0.7845
3      0.0059    0.7845
• Mild group is significantly different from
the moderate and severe groups (those
groups are aided more by the hormone)
29-36
Examining Differences (LSMEANS)
bone       LSMEAN   95% Confidence Limits
mild        0.900   0.475707    1.324293
moderate    2.000   1.575707    2.424293
severe      2.200   1.663307    2.736693
• Growth rate increased for each group, but
increased by about 1 cm / month more in
the moderate/severe groups than in the
mild group
• Note that the widths of these CI’s differ
because the cell sizes differ (severe is
wider, since it has fewer observations)
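A sketch of where these limits come from: each LSMean is an average of two cell means, so its variance is MSE × (1/2²) × Σ(1/n_ij) over the two gender cells. The MSE is taken from the ANOVA output (1.300 on 8 DF), and the t critical value for 8 DF is hard-coded as an assumption rather than computed; variable names are hypothetical.

```python
import math

GENDERS = ("male", "female")

# Cell sizes from the design chart
cell_n = {
    ("male", "severe"): 3, ("male", "moderate"): 2, ("male", "mild"): 2,
    ("female", "severe"): 1, ("female", "moderate"): 3, ("female", "mild"): 3,
}
ls_mean = {"mild": 0.9, "moderate": 2.0, "severe": 2.2}  # from the LSMEANS output

mse = 1.300 / 8      # error SS / error DF from the ANOVA output
t_crit = 2.306004    # assumed: 0.975 quantile of t with 8 DF, hard-coded


def lsmean_ci(bone):
    """95% confidence limits for the LSMean of one bone category."""
    a = len(GENDERS)
    se = math.sqrt(mse * sum(1 / cell_n[(g, bone)] for g in GENDERS) / a ** 2)
    return ls_mean[bone] - t_crit * se, ls_mean[bone] + t_crit * se


limits = {b: lsmean_ci(b) for b in ls_mean}
for b, (lo, hi) in limits.items():
    print(f"{b:9s} {lo:.6f} {hi:.6f}  width={hi - lo:.3f}")
```

Mild and moderate each have cells of size 2 and 3, so their intervals have the same width; severe has a cell of size 1, which inflates its standard error.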
29-37
Upcoming in Lecture 30
• A few more examples of unequal sample
sizes.
• Analysis of Covariance