15. ANOVA p1
Assignment #7
Chapter 12: 18, 24 Chapter 13: 28 Due next Friday Nov. 20th by 2pm in your TA’s homework box
Lab Report
• Posted on web-site
• Dates
– Rough draft due to TA's homework box on Monday Nov. 16th
– Rough draft returned in your lab section the week of Nov. 23rd
– Final draft due at the start of your registered lab section the week of Nov. 30th
• 10% of course grade
– Rough draft: 5%
– Final draft: 5%
– If you're happy with your rough-draft mark, you can tell your TA to use it for the final draft
• Read the “Writing a Lab Report” section of your lab notebook for guidance!!
Reading
For Today: Chapter 15
For Thursday: Chapter 15
In-class Exercise
Do people use more paper when they know it will be recycled?
• People were given paper and told to test scissors.
• A recycling bin was either present or not.
No recycling bin: 4, 4, 4, 4, 4, 4, 4, 5, 8, 9, 9, 9, 9, 12, 12, 13, 14, 14, 14, 14, 15, 23
Recycling bin: 4, 5, 8, 8, 8, 9, 9, 9, 12, 14, 14, 15, 16, 19, 23, 28, 40, 43, 129, 130
1. Make histograms and identify options for a test
2. Choose a test that you can do in class and conduct it
[Histograms of the two groups (frequency vs. amount of paper used, 0–120): "No Bin Present" and "Bin Present". The bin-present distribution is strongly right-skewed, with outliers at 129 and 130.]
Mann-Whitney U-Test
H0: The two groups do not differ in paper use.
HA: The two groups differ in paper use.

No bin present:
Value: 4 4 4 4 4 4 4 5 8 9 9 9 9 12 12 13 14 14 14 14 15 23
Rank: 4.5 4.5 4.5 4.5 4.5 4.5 4.5 9.5 12.5 18 18 18 18 23 23 25 28.5 28.5 28.5 28.5 32.5 36.5

Bin present:
Value: 4 5 8 8 8 9 9 9 12 14 14 15 16 19 23 28 40 43 129 130
Rank: 4.5 9.5 12.5 12.5 12.5 18 18 18 23 28.5 28.5 32.5 34 35 36.5 38 39 40 41 42

No bin present: n1 = 22, R1 = 379.5
Bin present: n2 = 20, R2 = 523.5
Calculating the test statistic, U:

U1 = n1·n2 + n1(n1 + 1)/2 − R1
U2 = n1·n2 − U1

U1 = 22(20) + 22(23)/2 − 379.5 = 313.5
U2 = 22(20) − 313.5 = 126.5

U = 313.5 (the larger of U1 and U2)
Mann-Whitney: large-sample approximation
For n1 and n2 both greater than 10, use

Z = (2U − n1·n2) / √( n1·n2·(n1 + n2 + 1)/3 )

Z = (2(313.5) − 22(20)) / √( 22(20)(22 + 20 + 1)/3 ) = 2.34

P = 2(0.00964) = 0.0193
P < 0.05, so we reject the null hypothesis: people use more paper when a recycling bin is present.
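The hand calculation above can be checked with `scipy.stats.mannwhitneyu` (a sketch, assuming SciPy is available; note that SciPy reports the U statistic for the first sample, which here corresponds to U2 = 126.5 rather than the larger U1):

```python
from scipy import stats

no_bin = [4, 4, 4, 4, 4, 4, 4, 5, 8, 9, 9, 9, 9, 12, 12, 13,
          14, 14, 14, 14, 15, 23]
bin_present = [4, 5, 8, 8, 8, 9, 9, 9, 12, 14, 14, 15, 16, 19,
               23, 28, 40, 43, 129, 130]

# Two-sided test; with ties present, SciPy falls back to the
# normal (large-sample) approximation used in the slides.
res = stats.mannwhitneyu(no_bin, bin_present, alternative="two-sided")

# U1 + U2 = n1*n2, so the larger U is recovered as:
u_larger = max(res.statistic, len(no_bin) * len(bin_present) - res.statistic)
print(u_larger, res.pvalue)  # U = 313.5, P ≈ 0.02
```
The P-value differs slightly from the hand calculation because SciPy applies a tie correction and a continuity correction.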
Chapter 14 Review
Goals of experiments
• Eliminate bias
• Reduce sampling error (increase precision and power)
[2×2 diagram of target plots: precise vs. imprecise crossed with biased vs. unbiased.]
Design features that reduce bias
• Controls
• Random assignment to treatments
• Blinding
Controls
A group which is identical to the experimental treatment in all respects
aside from the treatment itself.
Randomization
The random assignment of treatments to
units in an experimental study
Breaks the association between possible confounding variables and the
explanatory variable.
Randomization
[Diagram: Supplemental oxygen (explanatory variable) → Survive Mt. Everest (response variable), with Preparedness as a possible confounding variable linked to both.]
Blinding
• Preventing the patient and/or the experimenter from knowing which treatment is given to whom
– Single-blind: the patient is blinded
– Double-blind: both patient and experimenter are blinded
• Unblinded studies usually find much larger effects (sometimes threefold higher), showing the bias that results from lack of blinding
Reducing sampling error
Increasing the signal-to-noise ratio:

t = (Ȳ1 − Ȳ2) / √( sp²(1/n1 + 1/n2) )

The numerator, Ȳ1 − Ȳ2, is the "signal"; the denominator is the "noise".

If the "noise" is smaller, it is easier to detect a given "signal". A smaller denominator, √( sp²(1/n1 + 1/n2) ), can be achieved with a smaller s or a larger n.
Design features that reduce the effects of sampling error
• Replication
• Balance
• Blocking
• Extreme treatments
Replication
The application of every treatment to
multiple, independent experimental units
Balance
In a balanced experimental design, all
treatments have equal sample size.
Balance increases precision
SE(Ȳ1 − Ȳ2) = √( sp²(1/n1 + 1/n2) )

For a given total sample size (n1 + n2), the standard error is smallest when n1 = n2.
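A quick numerical check (a sketch using illustrative values, sp² = 1 and a total sample size of 20) confirms that the balanced split minimizes the standard error:

```python
import math

def se_diff(sp2, n1, n2):
    # standard error of Ybar1 - Ybar2 with pooled variance sp2
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))

total = 20
# standard error for every possible split of 20 units into two groups
ses = {(n1, total - n1): se_diff(1.0, n1, total - n1)
       for n1 in range(1, total)}
best_split = min(ses, key=ses.get)
print(best_split)  # (10, 10)
```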
Blocking
The grouping of experimental units that have similar properties.
Within each block, treatments are
randomly assigned to experimental units.
Extreme Treatments
• Treatment effects are easiest to detect when they are large.
• Stronger treatments can increase the signal-to-noise ratio.
• Caution: effects may not scale linearly
Experiments with more than one factor
A factor is a single treatment variable whose effects are of interest to the researcher.
Multiple factors are used to:
• Make more efficient use of money and resources
• Estimate effects of interaction between factors
Interaction between explanatory variables
The effect of one variable depends on the
state of a second variable
Example of factorial design and interaction
What if we can’t do experimental studies?
Observational studies are still useful to detect patterns and generate hypotheses
Best observational studies
Minimize bias:
• Controls
• Randomization
• Blinding
Minimize sampling error:
• Replication
• Balance
• Blocking
• Extreme treatments
Matching
Every individual in the treatment group is paired with a control individual having the same or very similar values for the suspected confounding variables.
Matching does not account for all confounding variables (as randomization does), but only those used to match participants.
Chapter 15: ANOVA
Comparing the means of more than two groups
Analysis of variance (ANOVA)
• Like a t-test, but can compare more than two groups
• Asks whether any of two or more means is different from any other.
• In other words, is the variance among groups greater than 0?
Null hypothesis for simple ANOVA
• H0 : Variance among groups = 0
OR
• H0 : µ1 = µ2 = µ3 = µ4 = ... µk
[Frequency distributions for three groups X1, X2, X3 under two scenarios: µ1 = µ2 = µ3 (null hypothesis true) and not all µ's equal.]
H0: all populations have equal means
HA: at least one population mean is different.
ANOVAs vs. t-tests
An ANOVA with 2 groups is mathematically equivalent to a two-tailed 2-sample t-test.
Sampling distribution of Ȳ

If we draw multiple samples from the same population, we are also drawing sample means from an expected distribution.

Population: µ = 67.4, σ = 3.9

µ_Ȳ = µ = 67.4
σ_Ȳ = σ/√n = 3.9/√5 = 1.7
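A small simulation (a sketch using only the standard library, with the population values from the slide and samples of size 5) illustrates this: the standard deviation of many sample means approaches σ/√n = 3.9/√5 ≈ 1.7.

```python
import math
import random
import statistics

random.seed(1)
mu, sigma, n, reps = 67.4, 3.9, 5, 20000

# draw many samples of size n and record each sample mean
means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(reps)]

# the spread of the sample means estimates the standard error
print(round(statistics.stdev(means), 2))  # close to 3.9 / sqrt(5) = 1.74
```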
Under the null hypothesis, the sample mean of each group should vary because of sampling error. The standard deviation of sample means, when the true mean is constant, is the standard error:

σ_X̄ = σ_X / √n

In ANOVA, we work with variances rather than standard deviations. Squaring the standard error, the variance among groups due to sampling error should be:

σ_X̄² = σ_X² / n

If the null hypothesis is not true, the variance among groups should be equal to the variance due to sampling error plus the real variance among population means:

σ_X̄² = σ_X² / n + Variance[µi]

With ANOVA, we test whether the variance among true group means is greater than zero. We do this by asking whether the observed variance among groups is greater than expected by chance (assuming the null hypothesis is true):
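The two cases can be illustrated by simulation (a sketch with made-up population means; σ² = 1 and n = 10 per group, so the sampling-error contribution σ²/n is 0.1):

```python
import random
import statistics

random.seed(2)
sigma, n, reps = 1.0, 10, 5000

def mean_variance_among_groups(mus):
    # average observed variance among the k sample means over many replicates
    results = []
    for _ in range(reps):
        means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
                 for mu in mus]
        results.append(statistics.variance(means))
    return statistics.mean(results)

null_var = mean_variance_among_groups([5, 5, 5])  # H0 true: ~ sigma^2/n = 0.1
alt_var = mean_variance_among_groups([4, 5, 6])   # H0 false: ~ 0.1 + Variance[mu_i]
print(round(null_var, 2), round(alt_var, 2))
```
When the population means differ, the variance among sample means exceeds what sampling error alone would produce, exactly as the formula above predicts.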
σ_X̄² > σ_X² / n ?

Multiplying both sides by n:

n·σ_X̄² > σ_X² ?

Population parameters and their estimates from samples:
n·σ_X̄² is estimated by the "mean squares group", MSgroup.
σ_X² is the variance within groups, estimated by the "mean squares error", MSerror.
Example
It is known that circadian rhythm is affected by the schedule of light to the eyes. Can circadian rhythm also be affected by the schedule of light to the back of the knees?

Treatment  Mean     Standard deviation  n
Control    -0.3088  0.6176              8
Knees      -0.3357  0.7908              7
Eyes       -1.5514  0.7063              7
ANOVA Table
Source of variation  Sum of squares  df  Mean squares  F-ratio  P
Groups (treatment)   7.224           2   3.6122        7.29     0.004
Error                9.415           19  0.4955
Total                16.639          21
An ANOVA table is a convenient way to keep track of the important calculations.
Scientific papers often report ANOVA results with
ANOVA tables.
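The mean squares and F-ratio in the table follow directly from the sums of squares and degrees of freedom (a quick check in Python):

```python
# values from the circadian-rhythm ANOVA table above
ss_group, df_group = 7.224, 2
ss_error, df_error = 9.415, 19

ms_group = ss_group / df_group  # mean squares group
ms_error = ss_error / df_error  # mean squares error
f_ratio = ms_group / ms_error   # test statistic

print(round(ms_group, 4), round(ms_error, 4), round(f_ratio, 2))
```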
Partitioning the sum of squares
For each data point:

Yij − Ȳ = (Ȳi − Ȳ) + (Yij − Ȳi)

For the total variation in the dataset:

SStotal = Σi Σj (Yij − Ȳ)² = Σi ni(Ȳi − Ȳ)² + Σi Σj (Yij − Ȳi)²
Mean squares group
Abbreviation: MSgroup
Estimates this parameter: n(σ_X²/n + Variance[µi])
Formulas:

SSgroup = Σ ni(X̄i − X̄)²

dfgroup = k − 1

MSgroup = SSgroup / dfgroup
Mean squares error
Abbreviation: MSerror
Estimates this parameter: σ_X², the variance within groups
Formula: MSerror = SSerror / dferror

Error sum of squares:

SSerror = Σi Σj (Yij − Ȳi)² = Σ dfi·si² = Σ si²(ni − 1)

Error degrees of freedom:

dferror = Σ dfi = Σ (ni − 1) = N − k

MSerror is like the pooled variance in a 2-sample t-test:

MSerror = SSerror / dferror = Σ si²(ni − 1) / (N − k)

sp² = (df1·s1² + df2·s2²) / (df1 + df2)
Test statistic: F
If H0 is true, then n·σ_X̄² = σ_X². In other words:

F = n·σ_X̄² / σ_X² = 1

But the above are population parameters. We must estimate F from samples with MSgroup / MSerror.

F if the null hypothesis is false:
We test whether the F-ratio is greater than one, as it would be if H0 were false:

F = n(σ_X²/n + Variance[µi]) / σ_X² > 1

But we must take sampling error into account. Often, F calculated from data will be greater than one even when the null hypothesis is true. Hence we must compare F to a null distribution.
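That null distribution is an F distribution with (dfgroup, dferror) degrees of freedom. For the circadian-rhythm example (df = 2 and 19), `scipy.stats.f` gives both the critical value and the P-value (a sketch assuming SciPy):

```python
from scipy import stats

df_group, df_error = 2, 19

# critical value at alpha = 0.05 and P-value for the observed F = 7.29
f_crit = stats.f.ppf(0.95, dfn=df_group, dfd=df_error)
p_value = stats.f.sf(7.29, dfn=df_group, dfd=df_error)

print(round(f_crit, 2), round(p_value, 3))  # 3.52, 0.004
```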
Example: Body temperature of squirrels in low, medium, and hot environments
Wooden & Walsberg (2004) Body temperature and locomotor capacity in a heterothermic rodent. Journal of Experimental Biology 207:41-46.
Squirrel body temperature data (°C)
Cold: 30.4, 31.0, 31.2, 31.0, 31.5, 30.4, 30.6, 31.1, 31.3, 31.9, 31.4, 31.6, 31.5, 31.4, 30.3, 30.5, 30.3, 30.0, 30.8, 31.0
Warm: 36.3, 37.5, 36.9, 37.2, 37.5, 37.7, 37.5, 37.7, 38.0, 38.0, 37.6, 37.4, 37.9, 37.2, 36.3, 36.2, 36.4, 36.7, 36.8, 37.0, 37.7
Hot: 40.7, 40.6, 40.9, 41.1, 41.5, 40.8, 40.5, 41.0, 41.3, 41.5, 41.3, 41.2, 40.7, 40.3, 40.2, 41.3, 40.7, 41.6, 41.5, 40.5
Hypotheses
H0: Mean body temperature is the same for all three groups of squirrels. HA: At least one of the three is different from the others.
Summary data
Group  X̄     s      n
Cold   31.0  0.551  20
Warm   37.2  0.582  21
Hot    41.0  0.430  20

Total sample size: N = Σn = 20 + 21 + 20 = 61
Error mean square for squirrels:

SSerror = Σ dfi·si² = 19(0.551)² + 20(0.582)² + 19(0.430)² = 16.1

dferror = 19 + 20 + 19 = 58

MSerror = 16.1/58 = 0.277
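The same arithmetic in code, computed from the summary standard deviations:

```python
# (standard deviation, n) for the cold, warm, and hot groups
groups = [(0.551, 20), (0.582, 21), (0.430, 20)]

ss_error = sum((n - 1) * s ** 2 for s, n in groups)  # error sum of squares
df_error = sum(n - 1 for s, n in groups)             # N - k
ms_error = ss_error / df_error

print(round(ss_error, 1), df_error, round(ms_error, 3))  # 16.1 58 0.277
```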
Squirrel mean squares group:

X̄ = [20(31.0) + 21(37.2) + 20(41.0)] / (20 + 21 + 20) = 36.4

SSgroup = Σ ni(X̄i − X̄)²
= 20(31.0 − 36.4)² + 21(37.2 − 36.4)² + 20(41.0 − 36.4)²
= 1015.7

dfgroup = k − 1 = 3 − 1 = 2

MSgroup = SSgroup/dfgroup = 1015.7/2 = 507.9
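The grand mean and SSgroup can be recomputed from the summary table (a quick check; carrying the unrounded grand mean gives roughly 1019.8 rather than the slide's 1015.7, a small discrepancy that comes from rounding in the summary statistics):

```python
# (mean, n) for the cold, warm, and hot groups
groups = [(31.0, 20), (37.2, 21), (41.0, 20)]
k = len(groups)

# weighted grand mean over all N = 61 squirrels
grand_mean = sum(m * n for m, n in groups) / sum(n for m, n in groups)
ss_group = sum(n * (m - grand_mean) ** 2 for m, n in groups)
ms_group = ss_group / (k - 1)

print(round(grand_mean, 1), round(ss_group, 1), round(ms_group, 1))
```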
The test statistic for ANOVA is F:

F = MSgroup/MSerror = 507.9/0.277 = 1834.7

MSgroup is always in the numerator; MSerror is always in the denominator.
Compare to Fα(1),df_group,df_error
F0.05(1),2,58 = 3.15. Since 1835 > 3.15, we know P < 0.05 and we can reject the null hypothesis. The variance in sample group means is bigger than expected given the variance within sample groups. Therefore, at least one of the groups has a population mean different from another group.
ANOVA table – squirrel data
Source  SS      df  MS     F       P
Group   1015.7  2   507.9  1834.7  <0.0001
Error   16.1    58  0.277
Total   1031.8  60
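The whole squirrel ANOVA can also be run from the raw data with `scipy.stats.f_oneway` (a sketch assuming SciPy; F lands close to the hand-calculated 1834.7, with small differences from rounding the summary statistics):

```python
from scipy import stats

cold = [30.4, 31.0, 31.2, 31.0, 31.5, 30.4, 30.6, 31.1, 31.3, 31.9,
        31.4, 31.6, 31.5, 31.4, 30.3, 30.5, 30.3, 30.0, 30.8, 31.0]
warm = [36.3, 37.5, 36.9, 37.2, 37.5, 37.7, 37.5, 37.7, 38.0, 38.0, 37.6,
        37.4, 37.9, 37.2, 36.3, 36.2, 36.4, 36.7, 36.8, 37.0, 37.7]
hot = [40.7, 40.6, 40.9, 41.1, 41.5, 40.8, 40.5, 41.0, 41.3, 41.5,
       41.3, 41.2, 40.7, 40.3, 40.2, 41.3, 40.7, 41.6, 41.5, 40.5]

# one-way (single-factor) ANOVA across the three temperature groups
f_stat, p_value = stats.f_oneway(cold, warm, hot)
print(round(f_stat, 1), p_value)
```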
Variation Explained: R2
R² = SSgroup / SStotal
The fraction of variation in Y that is “explained” by differences among groups
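For the squirrel data, R² follows directly from the ANOVA table (a quick check):

```python
# sums of squares from the squirrel ANOVA table
ss_group, ss_total = 1015.7, 1031.8

r_squared = ss_group / ss_total  # fraction of variation explained by groups
print(round(r_squared, 3))  # 0.984
```
Nearly all of the variation in body temperature is explained by the temperature treatment.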
Assumptions of ANOVA
(1) Random samples
(2) Normal distributions for each population
(3) Equal variances for all populations.
Kruskal-Wallis test
• A non-parametric test similar to a single factor ANOVA
• Uses the ranks of the data points
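As a sketch (assuming SciPy), the Kruskal-Wallis test applied to the paper-recycling data from earlier in the lecture gives a result consistent with the Mann-Whitney U-test; with only two groups, the two tests are equivalent in their large-sample form:

```python
from scipy import stats

no_bin = [4, 4, 4, 4, 4, 4, 4, 5, 8, 9, 9, 9, 9, 12, 12, 13,
          14, 14, 14, 14, 15, 23]
bin_present = [4, 5, 8, 8, 8, 9, 9, 9, 12, 14, 14, 15, 16, 19,
               23, 28, 40, 43, 129, 130]

# rank-based test of whether the group distributions differ
h_stat, p_value = stats.kruskal(no_bin, bin_present)
print(round(h_stat, 2), round(p_value, 3))
```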