Lecture 29 RCBD & Unequal Cell Sizes - Purdue University

29-1

Lecture 29

RCBD & Unequal Cell Sizes

STAT 512

Spring 2011

Background Reading

KNNL: 21.1-21.6, Chapter 23

29-2

Topic Overview

• Randomized Complete Block Designs

(RCBD)

• ANOVA with unequal sample sizes

29-3

RCBD

• Randomized complete block designs are

useful whenever the experimental units are

non-homogeneous.

• Grouping EU’s into “blocks” of

homogeneous units helps reduce the SSE

and increase the likelihood that we will be

able to see differences among treatments.

• A “block” consists of a complete replication

of the set of treatments. Blocks and

treatments are assumed not to interact.

29-4

RCBD Model

• Assuming no replication, same as two-way ANOVA

with one observation per cell. No interaction between

block and treatment.

$Y_{ijk} = \mu + \rho_i + \tau_j + \varepsilon_{ijk}$

where $\varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$ and $\sum_i \rho_i = \sum_j \tau_j = 0$

• We refer to $\rho_i$ as the block effects and $\tau_j$ as the treatment effects.

• We are really only interested in further analysis on

the treatment effects.
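As a rough illustration of the additive model above, the sketch below builds a noise-free table from made-up effects and checks that the usual mean-based estimates recover them. The values of `mu`, `rho`, and `tau` are hypothetical and chosen only for this example; they are not from the lecture.

```python
# Hypothetical effects chosen only for illustration; both sets satisfy the
# sum-to-zero constraints of the RCBD model.
mu = 10.0
rho = [-2.0, 0.0, 2.0]   # block effects, sum to 0
tau = [-1.0, 0.5, 0.5]   # treatment effects, sum to 0

# Noise-free responses: y[i][j] = mu + rho_i + tau_j (one observation per cell).
y = [[mu + r + t for t in tau] for r in rho]

n_blocks, n_trts = len(rho), len(tau)
grand_mean = sum(sum(row) for row in y) / (n_blocks * n_trts)
block_means = [sum(row) / n_trts for row in y]
trt_means = [sum(y[i][j] for i in range(n_blocks)) / n_blocks
             for j in range(n_trts)]

# Estimates: mu_hat = grand mean, rho_hat_i = block mean - grand mean,
# tau_hat_j = treatment mean - grand mean.
mu_hat = grand_mean
rho_hat = [m - grand_mean for m in block_means]
tau_hat = [m - grand_mean for m in trt_means]

# Residuals from the additive fit; all zero here because no noise was added.
resid = [[y[i][j] - (mu_hat + rho_hat[i] + tau_hat[j]) for j in range(n_trts)]
         for i in range(n_blocks)]
sse = sum(e * e for row in resid for e in row)

print(mu_hat, rho_hat, tau_hat, sse)
```

With real data the residuals would not vanish, and `sse` would be the error sum of squares on (blocks - 1)(treatments - 1) degrees of freedom.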

29-5

RCBD Example

• Want to study the effects of three different

sealers on protecting concrete patios from

the weather.

• Ten unsealed patios are available spread

across Indianapolis.

• Separate each patio into three portions, and

apply the treatments (randomly) in such a

way that each patio receives each treatment

for 1/3 of the surface.
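The randomization described above can be sketched as follows: within every patio, the three sealers are assigned to the three portions in an independent random order. The labels `sealer_A`, `patio_1`, and so on, the helper name `randomize_rcbd`, and the seed are assumptions made for illustration.

```python
import random

def randomize_rcbd(blocks, treatments, seed=None):
    """Return {block: treatments in random order}, using an independent
    random ordering of the full treatment set inside every block."""
    rng = random.Random(seed)
    return {b: rng.sample(treatments, k=len(treatments)) for b in blocks}

patios = [f"patio_{i}" for i in range(1, 11)]    # ten patios (blocks)
sealers = ["sealer_A", "sealer_B", "sealer_C"]   # hypothetical labels
layout = randomize_rcbd(patios, sealers, seed=512)  # seed is arbitrary

for patio, order in layout.items():
    # order[k] is the sealer applied to portion k + 1 of this patio
    print(patio, order)
```

Every block receives each treatment exactly once, which is what makes the design "complete".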

29-6

RCBD Example (2)

• Patio (location) is a blocking factor. Probably

the weather will be different in each location;

some patios may be better sheltered (trees,

etc.)

• If patio location is important, then failing to

block on patio location would probably mean

that the MSE will be overestimated.

• Blocking requires DF (9 in this case), but

usually if blocking variable is unimportant,

the MSE with/without blocking will be about

the same.

29-7

RCBD Example (3)

Source   DF     SS     MS    F Value
patio     9    900    100     10.0
sealer    2    100     50      5.0
Error    18    180     10
Total    29   1180

• If the ANOVA results are as above, then blocking is clearly important. If we do not block here...

Source   DF     SS     MS    F Value
sealer    2    100     50     1.25
Error    27   1080     40

29-8

RCBD Example (4)

Source   DF    SS     MS    F Value
patio     9   108     12      1.2
sealer    2   100     50      5.0
Error    18   180     10
Total    29   388

• If the ANOVA results are as above, then blocking doesn’t appear to have been as important. In this case if we fail to block...

Source   DF    SS     MS    F Value
sealer    2   100     50     4.69
Error    27   288   10.7
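The arithmetic behind the two "fail to block" tables is that the block SS and DF are pooled into the error term. A small sketch, using the SS and DF values from the two patio examples (the function name `sealer_f` is made up here):

```python
def sealer_f(ss_block, df_block, ss_trt, df_trt, ss_err, df_err, block=True):
    """Return (F, MSE) for the treatment. If block=False, the block SS and DF
    are pooled into the error term, as happens when blocking is ignored."""
    if not block:
        ss_err, df_err = ss_err + ss_block, df_err + df_block
    mse = ss_err / df_err
    return (ss_trt / df_trt) / mse, mse

# SS and DF values copied from the two ANOVA tables.
strong = dict(ss_block=900, df_block=9, ss_trt=100, df_trt=2, ss_err=180, df_err=18)
weak = dict(ss_block=108, df_block=9, ss_trt=100, df_trt=2, ss_err=180, df_err=18)

for name, tab in [("strong block effect", strong), ("weak block effect", weak)]:
    f_blocked, mse_blocked = sealer_f(**tab, block=True)
    f_unblocked, mse_unblocked = sealer_f(**tab, block=False)
    print(f"{name}: F = {f_blocked:.2f} (MSE {mse_blocked:.1f}) with blocking, "
          f"F = {f_unblocked:.2f} (MSE {mse_unblocked:.1f}) without")
```

In the first case the pooled MSE rises from 10 to 40 and the sealer F drops from 5.0 to 1.25; in the second the MSE barely moves (10 to about 10.7).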

29-9

Big Picture

• Failing to block when you should block can

cost you the ability to see treatment effects

• Blocking when there is no need usually doesn’t cost much at all (though it can if the SSBlock is small enough relative to the df it uses).

• Blocking effectively requires foresight. An

experimenter must guess what sources of

variation will exist in order to block on them.

29-10

Other Advantages of RCBD

• Reasonably simple analysis to perform

• Effective grouping makes results much more

precise.

• Can drop an entire block or treatment if

necessary, without complicating the

analysis.

• Can deliberately introduce extra variability

into the EU’s to widen the range of validity

of the results without sacrificing precision.

29-11

Some Disadvantages of RCBD

• Missing observations are a complex problem

(since generally each treatment is

represented exactly once in each block)

• Loss of error degrees of freedom

• Additional assumptions are required for the

model (additivity, constant variance across

blocks)

29-12

Multiple Blocking Variables

• It is often the case that the EU’s have multiple characteristics on which you could block.

• Example: Consider the effect of three

treatments for asthma. Might block on

both AGE and GENDER.

• Each treatment would be represented once at

each AGE*GENDER combination.

29-13

More on Blocking...

• Quite a bit more information in Chapter 21

o More than one replicate per block

o Factorial treatments

• Would discuss this and related topics in

STAT 514.

29-14

Unequal Sample Sizes

• Encountered for a variety of reasons

including:

o Convenience – usually if we have an observational study, we have very little control over the cell sizes.

o Cost Effectiveness – sometimes the cost of samples is different, and we may use larger sample sizes when the cost is less

o In experimental studies, you may start with a balanced design, but lose that balance if problems occur.

29-15

Unequal Sample Sizes (2)

• What changes?

o Loss of balance brings “intercorrelation” among the predictors (i.e., variables are no longer orthogonal)

o Type I and III SS will be different; typically Type III SS should be used for testing

o LSMeans should be used for testing

o Standard errors for cell means and for multiple comparisons will be different

o Confidence intervals will have different widths

29-16

Example

• Examine the effects of gender (A) and bone

development (B) on the rate of growth

induced by a synthetic growth hormone.

• Three categories of Bone Development

Depression (Severe, Moderate, and Mild)

• We categorize people on this basis after they

are in the study (it is an observational

factor); we wouldn’t want to throw away

data just to keep a balanced design.

• Page 954, growth.sas

29-17

Data / Sample Sizes

          Severe           Moderate         Mild
Male      1.4, 2.4, 2.2    2.1, 1.7         0.7, 1.1
Female    2.4              2.5, 1.8, 2.0    0.5, 0.9, 1.3
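For reference, the cell sizes and cell means implied by this data table can be tabulated with a few lines of Python. The dictionary layout and the lowercase labels are choices made for this sketch, not part of growth.sas.

```python
# Growth-rate observations keyed by (gender, bone development) cell.
growth = {
    ("male", "severe"): [1.4, 2.4, 2.2],
    ("male", "moderate"): [2.1, 1.7],
    ("male", "mild"): [0.7, 1.1],
    ("female", "severe"): [2.4],
    ("female", "moderate"): [2.5, 1.8, 2.0],
    ("female", "mild"): [0.5, 0.9, 1.3],
}

cell_n = {cell: len(obs) for cell, obs in growth.items()}
cell_mean = {cell: sum(obs) / len(obs) for cell, obs in growth.items()}

for cell in growth:
    print(cell, "n =", cell_n[cell], "mean =", round(cell_mean[cell], 3))

total_n = sum(cell_n.values())
grand_mean = sum(sum(obs) for obs in growth.values()) / total_n
print("N =", total_n, "grand mean =", round(grand_mean, 6))
```

The cell sizes range from 1 to 3, so the design is unbalanced but has no empty cells.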

29-19

Interpretation

• Same as any interaction plot

• Effect seems to be greater if disease is more

severe.

• Effect seems greater for women than men.

• Possibly an interaction. The effect of bone

development is enhanced (greater) for

women as compared to men.

• We aren’t saying anything about

significance here – we’ll do that when we

look at the ANOVA.

29-20

ANOVA Output

Source   DF      SS       MS     F Value   Pr > F
Model     5    4.474    0.895     5.51     0.0172
Error     8    1.300    0.163
Total    13    5.774

R-Square    Root MSE    growth Mean
0.774864    0.403113    1.642857

29-21

Type I / III SS

Source      DF   Type I SS      MS      F Value   Pr > F
gender       1    0.00286    0.00286     0.02     0.8978
bone         2    4.39600    2.19800    13.53     0.0027
gen*bone     2    0.07543    0.03771     0.23     0.7980

Source      DF   Type III SS    MS      F Value   Pr > F
gender       1    0.1200     0.1200      0.74     0.4152
bone         2    4.1897     2.0949     12.89     0.0031
gen*bone     2    0.0754     0.0377      0.23     0.7980
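One way to see where the two sets of sums of squares come from is to compare error sums of squares between nested regression models. The sketch below is not the SAS implementation: it assumes sum-to-zero (effect) coding and a small hand-written least-squares helper, `sse`, and uses the growth data from the data slide. Type I SS add terms in sequence; Type III SS drop one term from the full model while keeping all the others.

```python
def sse(columns, y):
    """Residual SS from least-squares regression of y on the given columns,
    solved through the normal equations (full column rank assumed)."""
    p = len(columns)
    # Augmented matrix [X'X | X'y].
    a = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(p)]
         + [sum(u * v for u, v in zip(columns[i], y))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    fitted = [sum(b * col[k] for b, col in zip(beta, columns))
              for k in range(len(y))]
    return sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))

# (gender, bone, growth) triples from the data slide.
data = [("M", "severe", 1.4), ("M", "severe", 2.4), ("M", "severe", 2.2),
        ("M", "moderate", 2.1), ("M", "moderate", 1.7),
        ("M", "mild", 0.7), ("M", "mild", 1.1),
        ("F", "severe", 2.4),
        ("F", "moderate", 2.5), ("F", "moderate", 1.8), ("F", "moderate", 2.0),
        ("F", "mild", 0.5), ("F", "mild", 0.9), ("F", "mild", 1.3)]
y = [row[2] for row in data]

# Effect-coded design columns.
one = [1.0] * len(y)
g = [1.0 if row[0] == "M" else -1.0 for row in data]
b1 = [{"severe": 1.0, "moderate": 0.0, "mild": -1.0}[row[1]] for row in data]
b2 = [{"severe": 0.0, "moderate": 1.0, "mild": -1.0}[row[1]] for row in data]
gb1 = [u * v for u, v in zip(g, b1)]
gb2 = [u * v for u, v in zip(g, b2)]

full = sse([one, g, b1, b2, gb1, gb2], y)   # error SS of the full model

type1 = {  # sequential: SS(A), SS(B|A), SS(A*B|A,B)
    "gender": sse([one], y) - sse([one, g], y),
    "bone": sse([one, g], y) - sse([one, g, b1, b2], y),
    "gen*bone": sse([one, g, b1, b2], y) - full,
}
type3 = {  # each term dropped from the full model, all others kept
    "gender": sse([one, b1, b2, gb1, gb2], y) - full,
    "bone": sse([one, g, gb1, gb2], y) - full,
    "gen*bone": sse([one, g, b1, b2], y) - full,
}
print(type1)
print(type3)
```

Dividing each SS by its DF and by the MSE (`full / 8`) gives the F values in the table.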

29-22

Type X SS

• There are actually four relevant types of

sums of squares.

o I – Sequential

o II – Added Last (Observation)

o III – Added Last (Cell)

o IV – Added Last (Empty Cells)

29-23

Type I SS

• Sequential Sums of Squares, appropriate for

equal cell sizes.

• SS(A), SS(B|A), SS(A*B|A,B)

• Each observation is weighted equally, with

the result that treatments are weighted in

proportion to their cell size (if unequal,

then not all treatments get the same weight

in the analysis)

29-24

Type II SS

• Variable Added Last SS, appropriate for

equal cell sizes.

• SS(A|B,A*B), SS(B|A,A*B), SS(A*B|A,B)

• Each observation is weighted equally

29-25

Type III SS

• Variable Added Last SS, appropriate for

unequal cell sizes.


• Each cell/treatment is weighted equally,

but observations are weighted differently.

Type III SS adjusts for the fact that cell

sizes are different, unequal weighting of

observations.

29-26

Type IV SS

• Variable Added Last SS, necessary if there

are empty cells


• Like Type III SS but additionally takes into

account the possibility of empty cells.

29-27

Data: Design Chart

Severe Moderate Mild

Male xxx xx xx

Female x xxx xxx

29-28

Example: Type I Hypotheses

Main Effect Gender

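$H_0: \tfrac{3}{7}\mu_{11} + \tfrac{2}{7}\mu_{12} + \tfrac{2}{7}\mu_{13} = \tfrac{1}{7}\mu_{21} + \tfrac{3}{7}\mu_{22} + \tfrac{3}{7}\mu_{23}$

Main Effect Bone

$H_0: \tfrac{3}{4}\mu_{11} + \tfrac{1}{4}\mu_{21} = \tfrac{2}{5}\mu_{12} + \tfrac{3}{5}\mu_{22} = \tfrac{2}{5}\mu_{13} + \tfrac{3}{5}\mu_{23}$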

Observations weighted equally, treatment

weighted by sample size.

29-29

Example: Type III Hypotheses

Main Effect Gender

$H_0: \tfrac{1}{3}(\mu_{11} + \mu_{12} + \mu_{13}) = \tfrac{1}{3}(\mu_{21} + \mu_{22} + \mu_{23})$

Main Effect Bone

$H_0: \tfrac{1}{2}(\mu_{11} + \mu_{21}) = \tfrac{1}{2}(\mu_{12} + \mu_{22}) = \tfrac{1}{2}(\mu_{13} + \mu_{23})$

Treatments are weighted equally, observations not

weighted equally.
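The two gender hypotheses are single-degree-of-freedom contrasts among the six cell means, so their sums of squares can be checked by hand with the standard contrast formula SS = L^2 / sum(w_ij^2 / n_ij). In the sketch below the cell means are computed from the growth data, and `contrast_ss` is a helper name made up for this example.

```python
# Cell sizes and cell means from the growth data
# (rows: male, female; columns: severe, moderate, mild).
n = [[3, 2, 2], [1, 3, 3]]
ybar = [[2.0, 1.9, 0.9], [2.4, 2.1, 0.9]]

def contrast_ss(weights):
    """SS for H0: sum(w_ij * mu_ij) = 0, computed as L^2 / sum(w_ij^2 / n_ij)."""
    est = sum(w * m for wr, mr in zip(weights, ybar) for w, m in zip(wr, mr))
    scale = sum(w * w / k for wr, nr in zip(weights, n) for w, k in zip(wr, nr))
    return est * est / scale

# Type I gender hypothesis: each cell mean weighted by n_ij / n_i. (here n_i. = 7).
type1_w = [[k / sum(n[0]) for k in n[0]], [-k / sum(n[1]) for k in n[1]]]
# Type III gender hypothesis: every cell mean weighted equally (1/3).
type3_w = [[1 / 3] * 3, [-1 / 3] * 3]

ss_gender_type1 = contrast_ss(type1_w)
ss_gender_type3 = contrast_ss(type3_w)
print(round(ss_gender_type1, 5), round(ss_gender_type3, 4))
```

The two results agree with the gender rows of the Type I and Type III tables (0.00286 and 0.1200), which shows concretely that the two SS test different null hypotheses.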

29-30

General Strategy

• Remember that Type I SS and Type III SS

examine different null hypotheses.

• Type III SS are preferred when sample sizes

are not equal, but can be somewhat

misleading if sample sizes differ greatly.

• Type IV SS are appropriate if there are

empty cells.

• Can obtain Type II/IV SS if necessary by

using /ss1 ss2 ss3 ss4 in MODEL

statement

29-31

Example: Type III SS

Source      DF   Type III SS    MS      F Value   Pr > F
gender       1    0.1200     0.1200      0.74     0.4152
bone         2    4.1897     2.0949     12.89     0.0031
gen*bone     2    0.0754     0.0377      0.23     0.7980

• The interaction and gender effects are not

significant.

• Now look at comparing different levels of

bone; should not ‘change’ models at this

point, so need to average over gender.

29-32

Multiple Comparisons

• Suppose we keep model as is, and examine

effect of bone.

• Output from MEANS statement (WRONG):

Level of          ------------growth-----------
bone         N    Mean     Std Dev
mild         5    0.900    0.31622777
moderate     5    2.020    0.31144823
severe       4    2.100    0.47609523

• These numbers are not adjusted for gender.

29-33

Multiple Comparisons (2)

• Output from LSMeans (means are correctly

adjusted for the level of gender – can think

of these as the means for the “average”

gender).

bone        growth LSMEAN    LSMEAN Number
mild         0.90000000           1
moderate     2.00000000           2
severe       2.20000000           3

29-34

Multiple Comparisons

• Example – Severe Case

MEANS: $(1.4 + 2.4 + 2.2 + 2.4) / 4 = 2.1$

(Sum up all severe cases and divide by number of severe cases regardless of gender)

LSMEANS: $\left[ (1.4 + 2.4 + 2.2)/3 + 2.4 \right] / 2 = 2.2$

(For severe cases, get averages for men and women and then take the average – accounts for gender)
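The same two calculations can be repeated for every bone level. The sketch below assumes the full two-way model with interaction, in which the least squares mean for a bone level is the unweighted average of its two cell means; the variable names are chosen for this example.

```python
growth = {
    ("male", "severe"): [1.4, 2.4, 2.2],
    ("male", "moderate"): [2.1, 1.7],
    ("male", "mild"): [0.7, 1.1],
    ("female", "severe"): [2.4],
    ("female", "moderate"): [2.5, 1.8, 2.0],
    ("female", "mild"): [0.5, 0.9, 1.3],
}
genders = ["male", "female"]
levels = ["mild", "moderate", "severe"]

raw_mean, ls_mean = {}, {}
for bone in levels:
    # MEANS: pool every observation at this bone level, ignoring gender.
    pooled = [v for g in genders for v in growth[(g, bone)]]
    raw_mean[bone] = sum(pooled) / len(pooled)
    # LSMEANS: average the two gender cell means with equal weight.
    cell_means = [sum(growth[(g, bone)]) / len(growth[(g, bone)]) for g in genders]
    ls_mean[bone] = sum(cell_means) / len(cell_means)

for bone in levels:
    print(f"{bone:9s} MEANS {raw_mean[bone]:.3f}   LSMEANS {ls_mean[bone]:.3f}")
```

The two columns differ only where the male and female cell means differ and the cell sizes are unequal (moderate and severe).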

29-35

Examining Differences (LSMEANS)

Least Squares Means for effect bone

Pr > |t| for H0: LSMean(i)=LSMean(j)

Dependent Variable: growth

i/j        1         2         3
1                 0.0072    0.0059
2       0.0072              0.7845
3       0.0059    0.7845

• Mild group is significantly different from

the moderate and severe groups (those

groups are aided more by the hormone)

29-36

Examining Differences (LSMEANS)

bone        LSMEAN    95% Confidence Limits
mild        0.900     0.475707    1.324293
moderate    2.000     1.575707    2.424293
severe      2.200     1.663307    2.736693

• Growth rate increased for each group, but

increased by about 1 cm / month more in

the moderate/severe groups than in the

mild group

• Note that the widths of these CI’s are different due to different sample sizes (severe is wider, since it has fewer observations)
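The differing widths can be reproduced from the ANOVA output. Each bone LSMEAN is the average of two independent cell means, so its variance is MSE x (1/4)(1/n_male + 1/n_female). The sketch below hard-codes the t quantile for 8 error DF rather than calling a statistics library; that constant and the helper name `lsmean_ci` are assumptions of this example.

```python
import math

mse = 1.300 / 8        # Error SS / Error DF from the ANOVA output
t_crit = 2.306004      # 0.975 quantile of t with 8 df (hard-coded)

# Cell sizes (male, female) and LSMEANS for each bone level.
n = {"mild": (2, 3), "moderate": (2, 3), "severe": (3, 1)}
lsmean = {"mild": 0.9, "moderate": 2.0, "severe": 2.2}

def lsmean_ci(level):
    """95% confidence limits for the LSMEAN of one bone level."""
    k = len(n[level])                                   # cells averaged (2)
    se = math.sqrt(mse * sum(1 / c for c in n[level]) / k ** 2)
    half = t_crit * se
    return lsmean[level] - half, lsmean[level] + half

for level in n:
    lo, hi = lsmean_ci(level)
    print(f"{level:9s} {lo:.6f}  {hi:.6f}  width {hi - lo:.3f}")
```

The severe interval is widest because one of its two cells contains a single observation.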

29-37

Upcoming in Lecture 30

• A few more examples of unequal sample

sizes.

• Analysis of Covariance
