15. ANOVA p1

65
Assignment #7 Chapter 12: 18, 24 Chapter 13: 28 Due next Friday Nov. 20 th by 2pm in your TA’s homework box

Transcript of 15. ANOVA p1

Page 1: 15. ANOVA p1

Assignment #7

Chapter 12: 18, 24 Chapter 13: 28 Due next Friday Nov. 20th by 2pm in your TA’s homework box

Page 2: 15. ANOVA p1

Lab Report

•  Posted on web-site •  Dates

–  Rough draft due to TAs homework box on Monday Nov. 16th –  Rough draft returned in your lab section the week of Nov. 23rd –  Final draft due at start of your registered lab section the week of Nov. 30th

•  10% of course grade –  Rough Draft - 5% –  Final draft - 5% –  If you’re happy with your rough draft mark, you can tell your TA to use it for

the final draft

•  Read the “Writing a Lab Report” section of your lab notebook for guidance!!

Page 3: 15. ANOVA p1

Reading

For Today: Chapter 15 For Thursday: Chapter 15

Page 4: 15. ANOVA p1

In-class Exercise Do people use more paper when they know it will be recycled? • People given paper and told to test scissors. • Recycling bin either present or not No recycling bin: 4,4,4,4,4,4,4,5,8,9,9,9,9,12,12,13,14,14,14,14,15,23 Recycling bin: 4,5,8,8,8,9,9,9,12,14,14,15,16,19,23,28,40,43,129,130 1.  Make histograms and identify options for test 2.  Choose an test that you can do in class and conduct it

Page 5: 15. ANOVA p1

No Bin Present

Freq

uenc

y

0 20 40 60 80 100 120

02

46

8

Bin Present

Freq

uenc

y

0 20 40 60 80 100 120

0.0

1.0

2.0

3.0

Page 6: 15. ANOVA p1

Mann-Whitney U-Test

H0: The two groups do not differ in paper use HA: The two groups differ in paper use

Page 7: 15. ANOVA p1

Value   4   4   4   4   4   4   4   5   8   9   9   9   9   12  12  13   14   14   14   14   15   23  Rank   4.5   4.5   4.5   4.5   4.5   4.5   4.5   9.5   12.5  18  18  18  18  23  23  25  28.5  28.5  28.5  28.5  32.5  36.5  

Value   4   5   8   8   8   9   9   9   12   14   14   15   16   19   23   28   40   43   129   130  Rank   4.5   9.5   12.5   12.5   12.5   18   18   18   23   28.5   28.5   32.5   34   35   36.5   38   39   40   41   42  

No Bin Present:

Bin Present:

n1= 22 R1=379.5

n2= 20 R2=523.5

Page 8: 15. ANOVA p1

Calculating the test statistic, U

U1 = n1n2 +n1 n1+1( )2

− R1

U2 = n1n2 −U1

U1 = 22(20)+22 23( )2

−379.5= 313.5

U2 = 22(20)−313.5=126.5

U = 313.5

Page 9: 15. ANOVA p1

Mann-Whitney: Large sample approximation For n1 and n2 both greater than 10, use

Z =2U − n1n2

n1n2 n1+ n2 +1( ) / 3

Z = 2(313.5)− 22(20)22(20) 22+ 20+1( ) / 3

= 2.34

P = 2(0.00964) = 0.01928

P<0.05 so we reject the null hypothesis. People use more paper when there is a recycling bin

Page 10: 15. ANOVA p1

Chapter 14 Review

Page 11: 15. ANOVA p1

Goals of experiments

•  Eliminate bias

•  Reduce sampling error (increase precision and power)

Page 12: 15. ANOVA p1

Precise Imprecise

Biased

Unbiased

Page 13: 15. ANOVA p1

Design features that reduce bias

•  Controls

•  Random assignment to treatments

•  Blinding

Page 14: 15. ANOVA p1

Controls

A group which is identical to the experimental treatment in all respects

aside from the treatment itself.

Page 15: 15. ANOVA p1

Randomization

The random assignment of treatments to

units in an experimental study

Breaks the association between possible confounding variables and the

explanatory variable.

Page 16: 15. ANOVA p1

Randomization

Supplemental Oxygen (Explanatory variable)

Survive Mt. Everest (Response variable)

Preparedness (Confounding variable)

?

Page 17: 15. ANOVA p1

Blinding

•  Preventing knowledge of patient and/or experimenter of which treatment is given to whom –  Single blind – blind patient –  Double blind – blind patient and experimenter

•  Unblinded studies usually find much larger effects (sometimes threefold higher), showing the bias that results from lack of blinding

Page 18: 15. ANOVA p1

Reducing sampling error

t =Y 1 −Y 2

sp2 1

n1+1n2

#

$ %

&

' (

Increasing the signal to noise ratio

"Signal"

"Noise"

Page 19: 15. ANOVA p1

Reducing sampling error Increasing the signal to noise ratio

If the "noise" is smaller, it is easier to detect a given "signal".

Can be achieved with smaller s or larger n. €

sp2 1n1

+1n2

"

# $

%

& ' .

Page 20: 15. ANOVA p1

Design features that reduce the effects of sampling error

•  Replication

•  Balance

•  Blocking

•  Extreme treatments

Page 21: 15. ANOVA p1

Replication

The application of every treatment to

multiple, independent experimental units

Page 22: 15. ANOVA p1

Balance

In a balanced experimental design, all

treatments have equal sample size.

Page 23: 15. ANOVA p1

Balance increases precision

SEY 1 − Y 2

= sp2 1

n1+1n2

#

$ %

&

' ( .

For a given total sample size (n1+n2), the standard error is smallest when n1=n2.

Page 24: 15. ANOVA p1

Blocking

The grouping of experimental units that have similar properties.

Within each block, treatments are

randomly assigned to experimental units.

Page 25: 15. ANOVA p1

Extreme Treatments

•  Treatment effects are easiest to detect when they are large.

•  Stronger treatments can increase the signal-to-noise ratio.

•  Caution: effects may not scale linearly

Page 26: 15. ANOVA p1

Experiments with more than one factor

A factor is a single treatment variable whose effects are of interest to the researcher Multiple factors to: •  Make more efficient use of money and

resources •  Estimate effects of interaction between

factors

Page 27: 15. ANOVA p1

Interaction between explanatory variables

The effect of one variable depends on the

state of a second variable

Page 28: 15. ANOVA p1

Example of factorial design and interaction

Page 29: 15. ANOVA p1

What if we can’t do experimental studies?

Observational studies are still useful to detect patterns and generate hypotheses

Best observational studies Minimize bias: •  Controls •  Randomization •  Blinding Minimize sampling error: •  Replication •  Balance •  Blocking •  Extreme treatments

Page 30: 15. ANOVA p1

Matching Every individual in the treatment group is paired with a control individual having the same or very similar values for the suspected confounding variables Does not account for all confounding variables (like randomization does), but only those used to match participants.

Page 31: 15. ANOVA p1

Chapter 15: ANOVA

Comparing the means of more than two groups

Page 32: 15. ANOVA p1

Analysis of variance (ANOVA)

•  Like a t-test, but can compare more than two groups

•  Asks whether any of two or more means is different from any other.

•  In other words, is the variance among groups greater than 0?

Page 33: 15. ANOVA p1

Null hypothesis for simple ANOVA

• H0 : Variance among groups = 0

OR

• H0 : µ1 = µ2 = µ3 = µ4 = ... µk

Page 34: 15. ANOVA p1

X1 X2 X3

Freq

uenc

yµ1 = µ2 = µ3

X1 X2 X3

Freq

uenc

y

Not all µ's equal

H0: all populations have equal means

HA: at least one population mean is different.

Page 35: 15. ANOVA p1

ANOVA's v. t-tests

An ANOVA with 2 groups is mathematically equivalent to a two-tailed 2-sample t-test.

Page 36: 15. ANOVA p1

µ = 67.4

σ = 3.9

µY = µ = 67.4

σY =σn

=3.95

=1.7

NSampling

distribution of Y

Sampling distribution of Y

If we draw multiple samples from the same population, we are also drawing sample means

from an expected distribution.

Page 37: 15. ANOVA p1

σ x =σ x

n

Under the null hypothesis, the sample mean of each group should vary because of sampling error. The standard deviation of sample means, when the true mean is constant, is the standard error:

Page 38: 15. ANOVA p1

σx 2 =

σx2

n

Squaring the standard error, the variance among groups due to sampling error should be:

In ANOVA, we work with variances rather than standard deviations.

Page 39: 15. ANOVA p1

σ x 2 =

σ x2

n+ Variance µi[ ]

If the null hypothesis is not true, the variance among groups should be equal to the variance due to sampling error plus the real variance among population means.

Page 40: 15. ANOVA p1

With ANOVA, we test whether the variance among true group means is greater than zero. We do this by asking whether the observed variance among groups is greater than expected by chance (assuming the null is true):

σ x 2 >

σ x2

n?

Page 41: 15. ANOVA p1

σ x 2 >

σ x2

n?

nσ x 2 > σ x

2 ?

nσ x 2 is estimated by the

“Mean Squares Group”

σ x2 is the variance within groups,

estimated by the “Mean Squares Error”

MSgroup

MSerror

Population parameters Estimates from sample

Page 42: 15. ANOVA p1

Example

Treatment Mean Standard deviation n Control -0.3088 0.6176 8 Knees -0.3357 0.7908 7 Eyes -1.5514 0.7063 7

It is know that circadian rhythm is affected by the schedule of light to the eyes. Can circadian rhythm be affected by the schedule of light to the back of the knees?

Page 43: 15. ANOVA p1

ANOVA Table Source of variation Sum of squares df Mean squares F-ratio P

Groups (treatment) 7.224 2 3.6122 7.29 0.004

Error 9.415 19 0.4955

Total 16.639 21

An ANOVA table is a convenient way to keep track of the important calculations.

Scientific papers often report ANOVA results with

ANOVA tables.

Page 44: 15. ANOVA p1

Partitioning the sum of squares

Yij −Y = (Yi −Y )+ (Yij −Yi )For each data point:

Page 45: 15. ANOVA p1

Partitioning the sum of squares

SStotal = ΣiΣ j (Yij −Y )2 = Σini (Yi −Y )

2 +ΣiΣ j (Yij −Yi )2

For the total variation in the dataset:

Page 46: 15. ANOVA p1

n σ x 2 + Variance µi[ ]( )

MSgroup

Mean squares group

Abbreviation:

Estimates this parameter:

Formula:

MSgroups =SSgroupsdfgroups

Page 47: 15. ANOVA p1

Mean squares group

SSgroup = ni X i − X ( )2∑

dfgroup = k-1

MSgroups =SSgroupsdfgroups

Page 48: 15. ANOVA p1

Mean squares error

Abbreviation:

Estimates this parameter:

Formula:

MSerror =SSerrordferror

σ x2

MSerror

Page 49: 15. ANOVA p1

Mean squares error Error sum of squares =

SSerror = ΣiΣ j (Yij −Yi )2 = dfisi

2 = si2 (ni −1)∑∑

Error degrees of freedom =

dferror = dfi = ni −1( )∑∑ = N − k

MSerror is like the pooled variance in a 2-sample t-test:

MSerror =SSerrordferror

=si2(ni −1)∑N − k

sp2 =

df1s12 + df2s2

2

df1+ df2

Page 50: 15. ANOVA p1

Test statistic: F

If H0 is true, then

n σ x 2 = σ x

2

In other words:

F =n σ x

2

σ x2 = 1

But, the above refer to population parameters. We must estimate F from samples with: MSgroup / MSerror

Page 51: 15. ANOVA p1

F if null hypothesis is false:

We test whether the F ratio is greater than one, as it would be if H0 is false:

F =n σ x

2 + Variance µi[ ]( )σ x2 >1

But we must take into account sampling error. Often, F calculated from data will be greater than one even when the null is true. Hence we must compare F to a null distribution.

Page 52: 15. ANOVA p1

ANOVA Table Source of variation Sum of squares df Mean squares F-ratio P

Groups (treatment) 7.224 2 3.6122 7.29 0.004

Error 9.415 19 0.4955

Total 16.639 21

An ANOVA table is a convenient way to keep track of the important calculations.

Scientific papers often report ANOVA results with

ANOVA tables.

Page 53: 15. ANOVA p1

Example: Body temperature of squirrels in low, medium and

hot environments

Wooden & Walsberg (2004) Body temperature and locomotor capacity in a heterothermic rodent. Journal of Experimental Biology 207:41-46.

Page 54: 15. ANOVA p1

Squirrel body temperature data (°C) Cold: 30.4, 31.0, 31.2, 31.0, 31.5, 30.4, 30.6, 31.1,

31.3, 31.9, 31.4, 31.6, 31.5, 31.4, 30.3, 30.5, 30.3, 30.0, 30.8, 31.0

Warm: 36.3, 37.5, 36.9, 37.2, 37.5, 37.7, 37.5, 37.7,

38.0, 38.0, 37.6, 37.4, 37.9, 37.2, 36.3, 36.2, 36.4, 36.7, 36.8, 37.0, 37.7

Hot: 40.7, 40.6, 40.9, 41.1, 41.5, 40.8, 40.5, 41.0,

41.3, 41.5, 41.3, 41.2, 40.7, 40.3, 40.2, 41.3, 40.7, 41.6, 41.5, 40.5

Page 55: 15. ANOVA p1

Hypotheses

H0: Mean body temperature is the same for all three groups of squirrels. HA: At least one of the three is different from the others.

Page 56: 15. ANOVA p1

Summary data

Group s n

Cold 31.0 0.551 20

Warm 37.2 0.582 21

Hot 41.0 0.430 20 €

x

Total sample size:

N = n∑ = 20 + 21= 20 = 61

Page 57: 15. ANOVA p1

Error Mean square for squirrels

SSerror = dfisi2∑

= 19 0.551( )2 + 20 0.582( )2 +19 0.430( )2

= 16.1

dferror = 19 + 20 +19 = 58

MSerror =16.158

= 0.277

Page 58: 15. ANOVA p1

Squirrel Mean Squares Group:

X =20 31.0( ) + 21 37.2( ) + 20 41.0( )

20 + 21+ 20= 36.4

SSgroup = 20(31.0-36.4)2 +21(37.2-36.4)2 +20(41.0-36.4)2

=1015.7 €

SSgroup = ni X i − X ( )2∑

Page 59: 15. ANOVA p1

Squirrel Mean Squares Group:

dfgroup= k – 1 = 3-1=2

MSgroups =SSgroupsdfgroups

=1015.72

= 507.9

Page 60: 15. ANOVA p1

The test statistic for ANOVA is F

F =MSgroupMSerror

=507.90.277

= 1834.7

MSgroupis always in the numerator, MSerror is always in the denominator

Page 61: 15. ANOVA p1

Compare to Fα(1),df_group,df_error

F0.05(1),2,58 = 3.15. Since 1835 > 3.15, we know P < 0.05 and we can reject the null hypothesis. The variance in sample group means is bigger than expected given the variance within sample groups. Therefore, at least one of the groups has a population mean different from another group.

Page 62: 15. ANOVA p1

ANOVA table – squirrel data

Source SS df MS F P Group 1015.7 2 507.9 1834.7 <0.0001 Error 16.1 58 0.277 Total 1031.8 60

Page 63: 15. ANOVA p1

Variation Explained: R2

R2 =SSgroupsSStotal

The fraction of variation in Y that is “explained” by differences among groups

Page 64: 15. ANOVA p1

Assumptions of ANOVA

(1)  Random samples

(2)  Normal distributions for each population

(3) Equal variances for all populations.

Page 65: 15. ANOVA p1

Kruskal-Wallis test

•  A non-parametric test similar to a single factor ANOVA

•  Uses the ranks of the data points