Statistical Methods in Computer Science
Hypothesis Testing II: Single-Factor Experiments
Ido Dagan
Empirical Methods in Computer Science © 2006-now Gal Kaminka
Single-Factor Experiments

A generalization of treatment experiments: determine the effect of the
(nominal) independent variable's values on the dependent variable.

    treatment1:  Ind1 & Ex1 & Ex2 & ... & Exn ==> Dep1
    treatment2:  Ind2 & Ex1 & Ex2 & ... & Exn ==> Dep2
    control:            Ex1 & Ex2 & ... & Exn ==> Dep3

Here Ind1, Ind2, ... are values of the independent variable, and Dep1,
Dep2, ... are the resulting values of the dependent variable.

Example: compare the performance of algorithm A to B to C, and so on.
A control condition is optional (e.g., to establish a baseline).
Single-Factor Experiments: Definitions

The independent variable is called the factor; its values (the ones being
tested) are called levels.

Our goal: determine whether the levels have an effect.
    Null hypothesis: there is no effect.
    Alternative hypothesis: at least one level causes an effect.

Tool: one-way ANOVA, a simple special case of the general Analysis of
Variance.
The Case for Single-Factor ANOVA (One-Way ANOVA)

We have k samples (k levels of the factor), each with its own sample mean
and sample standard deviation for the dependent variable. We want to
determine whether at least one is different.

    treatment1:  Ind1 & Ex1 & Ex2 & ... & Exn ==> Dep1
    ...
    treatmentk:  Indk & Ex1 & Ex2 & ... & Exn ==> Depk
    control:            Ex1 & Ex2 & ... & Exn ==> Dep3

The values of the independent variable are the levels of the factor;
Dep1, ..., Depk are values of the dependent variable.

    H0: M1 = M2 = M3 = M4
    H1: there exist i, j such that Mi ≠ Mj

    Level   S. mean (Mi)   S. stdev. (Si)   N (sample)
    1       4.24           0.91             29
    2       3.75           1.38             120
    3       2.85           1.38             59
    4       2.63           1.41             59

We cannot use the tests we learned so far. Why not use a t-test to compare
every pair Mi, Mj?
Multiple Paired Comparisons

Let ac be the probability of an error in a single comparison (the alpha
level: the probability of incorrectly rejecting the null hypothesis). Then:

    1 − ac:               probability of no error in a single comparison
    (1 − ac)^m:           probability of no error in m comparisons (the experiment)
    ae = 1 − (1 − ac)^m:  probability of an error in the experiment,
                          under the assumption of independent comparisons

ae quickly becomes large as m increases.
Example

Suppose we want to contrast 15 levels of the factor: 15 groups, k = 15.
The total number of pairwise comparisons is m = 15 × (15 − 1) / 2 = 105.
Suppose ac = 0.05. Then:

    ae = 1 − (1 − ac)^m = 1 − (1 − 0.05)^105 = 0.9954

We are very likely to make a type I error!
Possible Solutions?

Reduce ac until the overall ae level is 0.05 (or as needed).
    Risk: the per-comparison alpha target may become unattainable.

Ignore the experiment null hypothesis and focus on the comparisons:
carry out the m comparisons anyway. The expected number of errors in m
comparisons is m × ac; e.g., for m = 105 and ac = 0.05 we expect 5.25
errors. But which comparisons are the erroneous ones?
One-Way ANOVA

A method for testing the experiment null hypothesis
    H0: all levels' population means are equal to each other.

Key idea:
    Estimate a variance B under the assumption that H0 is true.
    Estimate a "real" variance W (regardless of H0).
    Use an F-test to test the hypothesis that B = W.

Assumes the variance of all groups is the same.
Some Preliminaries

Let xi,j be the jth element in sample i.
Let Mi be the sample mean of sample i.
Let Vi be the sample variance of sample i.

For example (here x1,2 = 15.2 and x3,4 = 6.8):

            Class 1   Class 2   Class 3
            14.9      11.1      5.7
            15.2      9.5       6.6
            17.9      10.9      6.7
            15.6      11.7      6.8
            10.7      11.8      6.9
    Mi      14.86     11        6.54
    Vi      6.8       0.85      0.23
Some Preliminaries (continued)

In addition, let M be the grand sample mean (over all elements in all
samples), and let V be the grand sample variance.
The Variance Contributing to a Value

Every element xi,j can be rewritten as

    xi,j = M + ei,j

where ei,j is some error component. We can focus on the error component

    ei,j = xi,j − M

which we will rewrite as

    ei,j = (xi,j − Mi) + (Mi − M)
Within-Group and Between-Group

The rewritten form of the error component has two parts:

    ei,j = (xi,j − Mi) + (Mi − M)

Within-group component: variance with respect to the group mean.
Between-group component: variance with respect to the grand mean.

For example, in the table above: x1,1 = 14.9, M1 = 14.86, M = 10.8, so

    e1,1 = (14.9 − 14.86) + (14.86 − 10.8) = 0.04 + 4.06 = 4.1

Note the two components: most of the error (variance) here is due to the
between-group part! Can we use this in a more general fashion?
No Within-Group Variance

    M = 10.67          Class 1   Class 2   Class 3
    V = 14.52          15        11        6
                       15        11        6
                       15        11        6
                       15        11        6
                       15        11        6
               Mi      15        11        6
               Vi      0         0         0

No variance within any group, for any element.
No Between-Group Variance

    M = 15             Class 1   Class 2   Class 3
    V = 24.86          17        11        22
                       26        13        14
                       9         18        12
                       11        18        8
                       12        15        19
               Mi      15        15        15
               Vi      46.5      9.5       31

No variance between the groups: every group has the same mean.
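Both toy tables can be verified with Python's statistics module
(statistics.variance is the sample variance, matching the slides' Vi and
the grand V):

```python
# Checking the two toy tables with the standard-library statistics module.
import statistics

# No within-group variance: every group is constant.
groups_a = [[15] * 5, [11] * 5, [6] * 5]
all_a = [x for g in groups_a for x in g]
print(round(statistics.mean(all_a), 2))      # grand M = 10.67
print(round(statistics.variance(all_a), 2))  # grand V = 14.52
print([statistics.variance(g) for g in groups_a])  # all zero

# No between-group variance: every group mean is 15.
groups_b = [[17, 26, 9, 11, 12], [11, 13, 18, 18, 15], [22, 14, 12, 8, 19]]
print([statistics.mean(g) for g in groups_b])      # [15, 15, 15]
print([statistics.variance(g) for g in groups_b])  # [46.5, 9.5, 31]
```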
Comparing Within-Group and Between-Group Components

The error component of a single element is

    ei,j = (xi,j − M) = (xi,j − Mi) + (Mi − M)

Let us relate this to the sample and grand sums of squares. It can be
shown that

    Σi Σj (xi,j − M)² = Σi Σj (xi,j − Mi)² + Σi Σj (Mi − M)²

Let us rewrite this as

    SStotal = SSwithin + SSbetween
From Sums of Squares (SS) to Variances

We know

    SStotal = SSwithin + SSbetween

... and we convert each sum of squares to a Mean Square (a variance
estimate) by dividing by its degrees of freedom, where N is the total
number of elements and I is the number of levels (samples):

    MSwithin = SSwithin / dfwithin = Σi Σj (xi,j − Mi)² / (N − I)

    MSbetween = SSbetween / dfbetween = Σi Σj (Mi − M)² / (I − 1)
              = Σi Ni (Mi − M)² / (I − 1)
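The sums of squares are straightforward to transcribe into code. A sketch
in Python (the helper names are ours, not from the slides), using the
earlier Class 1/2/3 data to check the decomposition SStotal = SSwithin +
SSbetween:

```python
# Direct transcription of the SS formulas, checked on the class data.

def grand_mean(groups):
    xs = [x for g in groups for x in g]
    return sum(xs) / len(xs)

def ss_within(groups):
    # sum_i sum_j (x_ij - M_i)^2
    return sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

def ss_between(groups):
    # sum_i N_i (M_i - M)^2
    M = grand_mean(groups)
    return sum(len(g) * (sum(g) / len(g) - M) ** 2 for g in groups)

def ss_total(groups):
    M = grand_mean(groups)
    return sum((x - M) ** 2 for g in groups for x in g)

groups = [[14.9, 15.2, 17.9, 15.6, 10.7],
          [11.1, 9.5, 10.9, 11.7, 11.8],
          [5.7, 6.6, 6.7, 6.8, 6.9]]

# The identity SStotal = SSwithin + SSbetween holds up to float rounding.
assert abs(ss_total(groups) - (ss_within(groups) + ss_between(groups))) < 1e-9
print(ss_between(groups), ss_within(groups))  # close to the slides' 173.3 and 31.5
```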
Determining the Final Alpha Level

MSwithin is an estimate of the (inherent) population variance, which does
not depend on the null hypothesis (M1 = M2 = ... = MI).
    Intuition: it is an "average" of the variances in the individual groups.
MSbetween estimates the population variance plus the treatment effect, so
it does depend on the null hypothesis.
    Intuition: it is similar to an estimate of the variance of the sample
    means, where each component is multiplied by Ni.
    Recall: N × (variance of the sample mean) = population variance.
If the null hypothesis is true, the two values estimate the same inherent
variance, and should be equal up to sampling variation.

So now we have two variance estimates for testing. Use an F-test:

    F = MSbetween / MSwithin

Compare to the F distribution with (dfbetween, dfwithin) degrees of
freedom and determine the alpha level (significance).
Example

    M = 10.8           Class 1   Class 2   Class 3
    V = 14.64          14.9      11.1      5.7
                       15.2      9.5       6.6
                       17.9      10.9      6.7
                       15.6      11.7      6.8
                       10.7      11.8      6.9
               Mi      14.86     11        6.54
               Vi      6.8       0.85      0.23

    SSbetween = 5(14.86 − 10.8)² + 5(11 − 10.8)² + 5(6.54 − 10.8)² = 173.3

    MSbetween = SSbetween / dfbetween = 173.3 / (3 − 1) = 86.7
Example (continued)

Using the same table:

    SSwithin = (14.9 − 14.86)² + ... + (10.7 − 14.86)²
             + (11.1 − 11)² + ... + (11.8 − 11)²
             + ... + (6.9 − 6.54)²
             = 31.5

    MSwithin = SSwithin / dfwithin = 31.5 / (15 − 3) = 2.6
Example (continued)

    F = MSbetween / MSwithin = 86.7 / 2.6 ≈ 32.97

Check the F distribution with (2, 12) degrees of freedom: significant!
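The whole worked example fits in a few lines of plain Python. This sketch
recomputes F from the raw class data (keeping full precision throughout,
which is why it reproduces the slide's 32.97 rather than the rounded
86.7 / 2.6 = 33.3):

```python
# One-way ANOVA F statistic for the Class 1/2/3 data, from first principles.

groups = [[14.9, 15.2, 17.9, 15.6, 10.7],
          [11.1, 9.5, 10.9, 11.7, 11.8],
          [5.7, 6.6, 6.7, 6.8, 6.9]]

N = sum(len(g) for g in groups)            # 15 elements in total
I = len(groups)                            # 3 levels
M = sum(x for g in groups for x in g) / N  # grand mean = 10.8

ss_b = sum(len(g) * (sum(g) / len(g) - M) ** 2 for g in groups)
ss_w = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

ms_b = ss_b / (I - 1)   # MSbetween ≈ 86.7
ms_w = ss_w / (N - I)   # MSwithin  ≈ 2.6
F = ms_b / ms_w
print(round(F, 2))      # 32.97, matching the slide
```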
Reading the Results from Statistics Software

You can use statistics software to run a one-way ANOVA. It will output
something like this:

    Source    df    SS      MS     F       p
    between    2    173.3   86.7   32.97   p < 0.001
    within    12    31.5    2.6
    total     14    204.9

You should have no problem reading this now.
Analogy to Linear Regression

The same decomposition appears in linear regression, where the variance of
the observations is composed of the variance of the predictions plus the
variance of the deviations from those predictions: that is, explained
variance (according to the prediction) vs. unexplained variance (due to
deviations from the prediction).
Summary

Treatment and single-factor experiments:
    Independent variable: categorical
    Dependent variable: "numerical" (ratio/interval)

Multiple comparisons are a problem for experiment-level hypotheses, so run
one-way ANOVA instead. It assumes:
    the populations are normal
    the populations have equal variances
    independent random samples (with replacement)
Moderate deviation from normality is still fine, particularly with large
samples, and somewhat different variances are fine for roughly equal
sample sizes.

If the result is significant, run additional tests for the details:
Tukey's procedure (the T method), LSD, Scheffe, ...