The logic behind a statistical test. A statistical test is the comparison of the probabilities in...

The logic behind a statistical test.

A statistical test is the comparison of the probabilities in favour of a hypothesis H1 with the respective probabilities of an appropriate null hypothesis H0.

| | Hypothesis correct | Hypothesis wrong |
|---|---|---|
| Hypothesis accepted | 1 - β (power of a test) | Type I error (α) |
| Hypothesis rejected | Type II error (β) | 1 - α |

Accepting the wrong hypothesis H1 is termed type I error. Rejecting the correct hypothesis H1 is termed type II error.

Lecture 11: Parametric hypothesis testing

Testing simple hypotheses

Karl Pearson tossed a coin 24000 times and wanted to see whether deviations from the expectation of 12000 numbers (tails) and 12000 eagles (heads) occur in the real world. He got numbers 12012 times. Does this result deviate from our expectation?

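The exact solution of the binomial

$$p(X \ge 12012) = \sum_{i=12012}^{24000} \binom{24000}{i}\, 0.5^{24000}$$

The normal approximation

$$\mu = np = 24000 \cdot 0.5 = 12000, \qquad \sigma^2 = npq = np(1-p) = 24000 \cdot 0.5 \cdot 0.5 = 6000$$

$$p(X > 12012) = 1 - p(X \le 12012) = 1 - \Phi\left(\frac{12012 - 12000}{\sqrt{6000}}\right) = 0.438$$

$$CL_{0.95} = \bar{x} \pm 1.96\, s = 12000 \pm 1.96\sqrt{6000} = 12000 \pm 151.8$$

χ² test

$$\chi^2 = \frac{(12000 - 11988)^2}{12000} + \frac{(12000 - 12012)^2}{12000} = 0.024$$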
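The three calculations for the coin example can be retraced with a short script. This is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the lecture. The exact binomial tail is not quoted on the slide, so it is computed here without a reference value.

```python
import math

n, p, observed = 24000, 0.5, 12012

# Exact binomial tail P(X >= 12012), summed from exact integer coefficients.
coef = math.comb(n, observed)
tail = 0
for i in range(observed, n + 1):
    tail += coef
    coef = coef * (n - i) // (i + 1)   # C(n, i+1) derived from C(n, i)
p_exact = tail / 2**n

# Normal approximation with mu = np and sigma^2 = np(1 - p).
mu = n * p
var = n * p * (1 - p)
z = (observed - mu) / math.sqrt(var)
p_normal = 0.5 * math.erfc(z / math.sqrt(2))   # 1 - Phi(z)

# Half width of the 95% confidence limit around the expectation.
half_width = 1.96 * math.sqrt(var)

# Chi-square statistic for the two outcomes (11988 and 12012 observations).
chi2 = (mu - (n - observed)) ** 2 / mu + (mu - observed) ** 2 / mu

print(round(p_exact, 3), round(p_normal, 3), round(half_width, 1), round(chi2, 3))
```

The normal approximation and the χ² value reproduce the numbers given above (0.438, 151.8 and 0.024).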

Assume a sum of variances of Z-transformed variables

$$\chi^2_n = \sum_{i=1}^{n} (x_i - E[x])^2 = (x_1 - E[x])^2 + (x_2 - E[x])^2 + (x_3 - E[x])^2 + \dots + (x_n - E[x])^2$$

Each variance is one. Thus the expected value of χ² is n.

The χ² distribution is a family of distributions of variances that depends on the number of elements n.

Observed values of χ² can be compared to predicted values, which allows for statistical hypothesis testing.
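That the expected value of such a sum equals n can be checked by simulation. The sketch below is only an illustration under assumed settings (n = 5 variables, 20000 repetitions, a fixed random seed); none of these numbers come from the lecture.

```python
import random

random.seed(1)   # fixed seed so that the run is repeatable
n = 5            # number of Z-transformed variables in each sum
runs = 20000     # number of simulated sums

# Each sum adds n squared standard normal deviates.
sums = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(runs)]
mean_chi2 = sum(sums) / runs

print(round(mean_chi2, 2))   # close to n
```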

Pearson’s coin example

Probability of H0

• 9 times green, yellow seed
• 3 times green, green seed
• 3 times yellow, yellow seed
• 1 time yellow, green seed

| Combination | Ratio | Observed | Predicted |
|---|---|---|---|
| GY | 9 | 61 | 65.25 |
| GG | 3 | 16 | 21.75 |
| YY | 3 | 28 | 21.75 |
| YG | 1 | 11 | 7.25 |
| Sum | 16 | 116 | 116 |

[Figure: bar chart of observed and predicted numbers of observations for each character combination (GY, GG, YY, YG).]

Does the observation confirm the prediction?

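$$\chi^2 = \sum_{i=1}^{K} \frac{(\text{expected value}_i - \text{observed value}_i)^2}{\text{expected value}_i}$$

$$\chi^2 = \frac{(65.25 - 61)^2}{65.25} + \frac{(21.75 - 16)^2}{21.75} + \frac{(21.75 - 28)^2}{21.75} + \frac{(7.25 - 11)^2}{7.25}$$

The χ² test has K - 1 degrees of freedom.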
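The seed example can be worked through numerically. The following sketch uses only the standard library; the helper `chi2_sf` is an illustrative name (not from the lecture) and uses the closed-form upper tail of the χ² distribution for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable, integer df."""
    h = x / 2.0
    if df % 2 == 0:
        return math.exp(-h) * sum(h**j / math.factorial(j) for j in range(df // 2))
    series = sum(h ** (j - 0.5) / math.gamma(j + 0.5)
                 for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * series

observed = {"GY": 61, "GG": 16, "YY": 28, "YG": 11}
ratio = {"GY": 9, "GG": 3, "YY": 3, "YG": 1}

total = sum(observed.values())
parts = sum(ratio.values())
expected = {k: total * r / parts for k, r in ratio.items()}

chi2 = sum((expected[k] - observed[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1          # K - 1 degrees of freedom
p = chi2_sf(chi2, df)

print(round(chi2, 2), df, round(p, 3))
```

With these data χ² is about 5.53 at 3 degrees of freedom, so the deviation from the 9:3:3:1 expectation is not significant at the 5% level.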


All statistical programs give the probability level of the null hypothesis H0, that is, the probability of a deviation at least as large as the observed one if H0 holds.

Advice for applying a χ²-test

• χ²-tests compare observations and expectations. Total numbers of observations and expectations must be equal.
• The absolute values should not be too small (as a rule the smallest expected value should be larger than 10). At small event numbers the Yates correction should be used.
• The classification of events must be unequivocal.
• χ²-tests were found to be quite robust. That means they are conservative and rather favour H0, the hypothesis of no deviation.
• The applicability of the χ²-test does not depend on the underlying distributions. They need not be normally or binomially distributed.

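Dealing with frequencies

$$\chi^2 = N \sum_{i=1}^{K} \frac{(\text{expected frequency}_i - \text{observed frequency}_i)^2}{\text{expected frequency}_i}$$

Yates correction

$$\chi^2 = \sum_{i=1}^{K} \frac{(|\text{expected value}_i - \text{observed value}_i| - 0.5)^2}{\text{expected value}_i}$$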

G-test or log likelihood test

χ² relies on absolute differences between observed and expected frequencies. However, it is also possible to take the quotient L = observed / expected as a measure of goodness of fit.

$$G = 2\ln(L) = 2\ln\left(\frac{p_{observed}}{p_{expected}}\right)$$

$$G = 2\sum_{i=1}^{k} O_i \ln\left(\frac{O_i}{E_i}\right)$$

G is approximately χ² distributed with k - 1 degrees of freedom.

A species-area relation is expected to follow a power function of the form S = 10A^0.3. Do the following data points (area, species number) confirm this expectation: A1 (1,12), A2 (2,18), A3 (4,14), A4 (8,30), A5 (16,35), A6 (32,38), A7 (64,33), A8 (128,35), A9 (256,56), A10 (512,70)? We try different tests.

| Area | Richness | Estimate | Chi2 test | G test |
|---|---|---|---|---|
| 1 | 12 | 10 | 0.4 | 2.187859 |
| 2 | 18 | 12.31144 | 2.628422 | 6.837165 |
| 4 | 14 | 15.15717 | 0.088343 | -1.11183 |
| 8 | 30 | 18.66066 | 6.890466 | 14.24339 |
| 16 | 35 | 22.97397 | 6.295189 | 14.73452 |
| 32 | 38 | 28.28427 | 3.337381 | 11.22065 |
| 64 | 33 | 34.82202 | 0.095335 | -1.7735 |
| 128 | 35 | 42.87094 | 1.445074 | -7.09961 |
| 256 | 56 | 52.78032 | 0.196406 | 3.315948 |
| 512 | 70 | 64.98019 | 0.387787 | 5.208893 |
| Sum | | | 21.7644 | 95.52699 |
| df | | | 9 | 9 |
| Chi2 distribution | | | 0.009656 | 1.26E-16 |

[Figure: species richness against area.]

Both tests indicate that the regression line doesn’t fit
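Both test statistics in the table can be recomputed from the ten data points. This is a sketch with the standard library only; `chi2_sf` is an illustrative helper name and implements the closed-form upper tail of the χ² distribution for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable, integer df."""
    h = x / 2.0
    if df % 2 == 0:
        return math.exp(-h) * sum(h**j / math.factorial(j) for j in range(df // 2))
    series = sum(h ** (j - 0.5) / math.gamma(j + 0.5)
                 for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * series

areas = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
richness = [12, 18, 14, 30, 35, 38, 33, 35, 56, 70]
estimate = [10 * a**0.3 for a in areas]      # expected model S = 10 A^0.3

# Chi-square: squared differences relative to the expectation.
chi2 = sum((e - o) ** 2 / e for o, e in zip(richness, estimate))
# G: twice the sum of O * ln(O / E).
g = 2 * sum(o * math.log(o / e) for o, e in zip(richness, estimate))

df = len(areas) - 1
p_chi2 = chi2_sf(chi2, df)
p_g = chi2_sf(g, df)

print(round(chi2, 4), round(g, 3), p_chi2, p_g)
```

The script reproduces the sums 21.76 and 95.53 and the two probabilities listed in the last row of the table.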

[Figure: species richness against area in a double log plot.]

The pattern is better seen in a double log plot.

We have seven points above and 3 points below the regression line.

Is there a systematic error?


Tests for systematic errors.

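The binomial

$$p(x \le 3) = \sum_{i=0}^{3} \binom{10}{i} \left(\frac{1}{2}\right)^{10} = 0.17$$

The χ² test

$$\chi^2 = \frac{(7 - 5)^2}{5} + \frac{(3 - 5)^2}{5} = 1.6$$

$$p(1.6; 1) = 0.21$$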
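Both tests for a systematic error are short enough to verify directly. A minimal sketch with the standard library; for one degree of freedom the upper tail of the χ² distribution reduces to the complementary error function.

```python
import math

n_points, below = 10, 3   # ten points, three of them below the regression line

# Binomial: probability of three or fewer points below the line if p = 1/2.
p_binom = sum(math.comb(n_points, i) for i in range(below + 1)) / 2**n_points

# Chi-square: seven above and three below against an expectation of five each.
expected = n_points / 2
chi2 = (7 - expected) ** 2 / expected + (3 - expected) ** 2 / expected
p_chi2 = math.erfc(math.sqrt(chi2 / 2))   # upper tail for 1 degree of freedom

print(round(p_binom, 2), round(chi2, 1), round(p_chi2, 2))
```

Neither probability falls below 0.05, so seven points above and three below the line do not indicate a systematic error.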

Now we try the best fit model

The best fit model: y = 13.584x^0.2515

| Area | Richness | Estimate | Chi2 test | G test |
|---|---|---|---|---|
| 1 | 12 | 13.584 | 0.184707 | -1.48783 |
| 2 | 18 | 16.17099 | 0.206868 | 1.928747 |
| 4 | 14 | 19.25067 | 1.432132 | -4.45884 |
| 8 | 30 | 22.91684 | 2.189268 | 8.079756 |
| 16 | 35 | 27.28122 | 2.183902 | 8.720228 |
| 32 | 38 | 32.47677 | 0.939318 | 5.968316 |
| 64 | 33 | 38.66179 | 0.829135 | -5.22536 |
| 128 | 35 | 46.0247 | 2.640844 | -9.58406 |
| 256 | 56 | 54.78984 | 0.026729 | 1.223428 |
| 512 | 70 | 65.22425 | 0.349683 | 4.946478 |
| Sum | | | 10.98258 | 20.22174 |
| df | | | 9 | 9 |
| Chi2 distribution | | | 0.276905 | 0.016592 |

[Figure: species richness against area with the best fit model.]

The G-test identified even the best fit model as having larger deviations than expected from a simple normal random sample model.
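The slides do not state how the best fit model was obtained. Assuming an ordinary least squares regression on the log-transformed data (an assumption, not confirmed by the source), the following sketch arrives at the same parameters.

```python
import math

areas = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
richness = [12, 18, 14, 30, 35, 38, 33, 35, 56, 70]

# Linearise the power function S = a * A^b as ln S = ln a + b * ln A.
x = [math.log(a) for a in areas]
y = [math.log(s) for s in richness]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Ordinary least squares slope and intercept on the log-log scale.
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

a, b = math.exp(intercept), slope
print(round(a, 3), round(b, 4))
```

The fitted values agree with y = 13.584x^0.2515 to the printed precision.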

Observation and expectation can be compared by a Kolmogorov-Smirnov test. The test compares the maximum cumulative deviation with that expected from a normal distribution.

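| Area | Richness | Estimate | Difference | Cumulative difference |
|---|---|---|---|---|
| 1 | 12 | 13.584 | -1.584 | -1.584 |
| 2 | 18 | 16.17099 | 1.829006 | 0.245006 |
| 4 | 14 | 19.25067 | -5.25067 | -5.00566 |
| 8 | 30 | 22.91684 | 7.083156 | 2.077496 |
| 16 | 35 | 27.28122 | 7.718776 | 9.796272 |
| 32 | 38 | 32.47677 | 5.523225 | 15.3195 |
| 64 | 33 | 38.66179 | -5.66179 | 9.657709 |
| 128 | 35 | 46.0247 | -11.0247 | -1.36699 |
| 256 | 56 | 54.78984 | 1.210161 | -0.15683 |
| 512 | 70 | 65.22425 | 4.775754 | 4.618923 |
| Maximum | | | | 15.3195 |
| df | | | | 9 |
| Chi2 distribution | | | | 0.082526 |
| Probability of difference | | | | 0.917474 |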
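The two Kolmogorov-Smirnov columns of the table are the differences between observation and estimate and their running sum. The sketch below only mirrors this arithmetic with the rounded model parameters 13.584 and 0.2515; it is not a full implementation of the test, and the variable names are illustrative.

```python
from itertools import accumulate

areas = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
richness = [12, 18, 14, 30, 35, 38, 33, 35, 56, 70]
estimate = [13.584 * a**0.2515 for a in areas]   # best fit model

# Difference between observation and expectation for each area.
diff = [o - e for o, e in zip(richness, estimate)]
# Running (cumulative) sum of the differences.
cumulative = list(accumulate(diff))
# Largest absolute cumulative deviation.
max_dev = max(abs(c) for c in cumulative)

print(round(max_dev, 2))
```

The maximum is reached at area 32 and matches the value of about 15.32 in the table.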

Kolmogorov-Smirnov test

Both results are qualitatively identical but differ quantitatively.

The programs use different algorithms

2x2 contingency table

1000 Drosophila flies with normal and curled wings and two alleles A and B supposed to influence wing form.

| | A | B | Sum |
|---|---|---|---|
| Curled | 110 | 475 | 585 |
| Normal | 90 | 325 | 415 |
| Sum | 200 | 800 | 1000 |

Do flies with allele A have curled wings more often than flies with allele B?

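| Combination | Observed | Predicted | Chi2 |
|---|---|---|---|
| A-curled | 110 | 117 | 0.418803 |
| A-normal | 90 | 83 | 0.590361 |
| B-curled | 475 | 468 | 0.104701 |
| B-normal | 325 | 332 | 0.14759 |
| Sum | 1000 | 1000 | 1.261456 |
| Chi2 distribution | | | 0.73830541 |

Sum curled: 585. Sum normal: 415. Sum A: 200. Sum B: 800.

$$\chi^2 = \frac{(117 - 110)^2}{117} + \frac{(83 - 90)^2}{83} + \frac{(468 - 475)^2}{468} + \frac{(332 - 325)^2}{332} = 1.26$$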

A contingency table χ² test with n rows and m columns has (n-1) * (m-1) degrees of freedom.

The 2x2 table has 1 degree of freedom.

Predicted number of allele A and curled wings:

$$P(A \cap \text{curled}) = \frac{585 \cdot 200}{1000} = 117$$
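The predicted counts, the χ² value and the degrees of freedom of the Drosophila table follow from the row and column sums. A minimal sketch with the standard library; the variable names are illustrative. The last lines evaluate the upper tail of the χ² distribution for 1 and for 3 degrees of freedom, because the value 0.738 listed in the table is reproduced only with 3 degrees of freedom.

```python
import math

observed = [[110, 475],   # curled wings: allele A, allele B
            [90, 325]]    # normal wings: allele A, allele B

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
total = sum(row_sums)

# Predicted count of each cell = row sum * column sum / total.
expected = [[r * c / total for c in col_sums] for r in row_sums]

chi2 = sum((e - o) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

h = chi2 / 2
p_df1 = math.erfc(math.sqrt(h))                              # 1 degree of freedom
p_df3 = p_df1 + math.sqrt(2 * chi2 / math.pi) * math.exp(-h)  # 3 degrees of freedom

print(expected, round(chi2, 4), df, round(p_df1, 3), round(p_df3, 3))
```

With 1 degree of freedom the probability is about 0.26. Either way the difference between the alleles is not significant.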

Relative abundance distributions

[Figure: relative abundance against species rank order (the hollow curve), and log relative abundance against species rank order with dominant, intermediate and rare species.]

Evenness

Abundance is the total number of individuals in a population. Density refers to the number of individuals in a unit of measurement.

The log-normal distribution

[Figure: log relative abundance against species rank order.]

The distribution of species abundance distributions across vertebrates and invertebrates

3 types of distributions: log-series, power function, lognormal.

We compare 99 such distributions from all over the world.

| Distribution | Number | Good fit | Intermediate fit | Bad fit |
|---|---|---|---|---|
| Lognormal | 59 | 29 | 21 | 9 |
| Logseries | 59 | 17 | 14 | 28 |
| Power function | 59 | 13 | 24 | 22 |

[Table: good fit, intermediate fit and bad fit for vertebrates and invertebrates.]

Row and column sums are identical due to our classification. We expect equal entries for each cell:

$$P(\text{Distr} \cap \text{Fit class}) = \frac{(29 + 17 + 13)(29 + 21 + 9)}{59 + 59 + 59} = \frac{59}{3} = 19.67$$

Do vertebrates and invertebrates differ in abundance distributions?

29

2840

59*19

Vert

Inv

Obs

Exp

But if we take the whole pattern we get

Number of log-normal best fits only:

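Bivariate comparisons of means

Student's t-test for equal sample sizes and similar variances

$$t = \sqrt{n}\, \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2 + s_2^2}}$$

$$t^2 = \frac{n(\bar{x}_1 - \bar{x}_2)^2}{s_1^2 + s_2^2} = F$$

Welch t-test for unequal variances and sample sizes

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

F-test

$$F = \frac{\sigma_1^2}{\sigma_2^2}, \qquad df_1 = n_1 - 1, \qquad df_2 = n_2 - 1$$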

In a physiological experiment mean metabolism rates were measured. A first treatment gave mean = 100, variance = 45, a second treatment mean = 120, variance = 55. In the first case 30 animals and in the second case 50 animals were tested. Do means and variances differ?

Degrees of freedom: N1 + N2 - 2

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{120 - 100}{\sqrt{\dfrac{45}{30} + \dfrac{55}{50}}} = 12.4$$

The probability level for the null hypothesis.

The comparison of variances

$$F = \frac{\sigma_1^2}{\sigma_2^2}, \qquad F(50; 30) = \frac{55}{45} = 1.22$$

Degrees of freedom: N - 1

The probability for the null hypothesis of no difference, H0.

One sided test: 1 - 0.287 = 0.713 is the probability that the first variance (sample of 50) is larger than the second (sample of 30).

Two sided test: 0.57 = 2 * 0.287. Past gives the probability for a two sided test that one variance is either larger or smaller than the second.
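The test statistics of the metabolism example can be recomputed from the summary values. This sketch covers only t, F and the degrees of freedom; the probabilities quoted above come from the statistics program and are not recomputed here. Variable names are illustrative.

```python
import math

mean1, var1, n1 = 100, 45, 30   # first treatment
mean2, var2, n2 = 120, 55, 50   # second treatment

# t statistic with separate variance terms for both samples.
se1, se2 = var1 / n1, var2 / n2
t = (mean2 - mean1) / math.sqrt(se1 + se2)

# Degrees of freedom: N1 + N2 - 2, and the Welch approximation.
df_student = n1 + n2 - 2
df_welch = (se1 + se2) ** 2 / (se1**2 / (n1 - 1) + se2**2 / (n2 - 1))

# F ratio with the larger variance in the numerator, as in 55 / 45.
f_ratio = max(var1, var2) / min(var1, var2)

print(round(t, 1), df_student, round(df_welch, 1), round(f_ratio, 2))
```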

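Power analysis

$$t = \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2 + \sigma_2^2}} \sqrt{N}$$

$$N = \frac{t^2 (\sigma_1^2 + \sigma_2^2)}{(\mu_1 - \mu_2)^2}$$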

Effect size

In an experiment you estimated two means:

$$\bar{x}_1 = 150,\ s_1 = 20; \qquad \bar{x}_2 = 180,\ s_2 = 50$$

Each time you took 20 replicates. Was this sample size large enough to confirm differences between both means?

We use the t-distribution with 19 degrees of freedom.

$$N = \frac{2.09^2 (20^2 + 50^2)}{(180 - 150)^2} \approx 15$$

You needed 15 replicates to confirm a difference at the 5% error level.
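The required number of replicates follows from solving the t formula for N. A minimal sketch; the function name `required_n` is illustrative, and the critical value 2.09 for 19 degrees of freedom is taken from the example above.

```python
import math

def required_n(t, s1, s2, mean1, mean2):
    """Sample size N = t^2 (s1^2 + s2^2) / (mean1 - mean2)^2."""
    return t**2 * (s1**2 + s2**2) / (mean1 - mean2) ** 2

n_raw = required_n(2.09, 20, 50, 150, 180)
n_needed = math.ceil(n_raw)   # round up to whole replicates

print(round(n_raw, 2), n_needed)
```

The raw value is about 14.1, which rounds up to the 15 replicates stated above, so 20 replicates were sufficient.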

The t-test can be used to estimate the number of observations to detect a significant signal for a given effect size.

In a physiological experiment we want to test whether a certain drug enhances short-term memory.

How many persons should you test (with and without the treatment) to confirm a difference in memory of about 5%?

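We don't know the variances and assume a Poisson random sample. Hence σ² = μ. Expressing the means relative to the untreated mean (μ1 = 1, μ2 = 1.05):

$$t = \frac{1.05\mu_1 - \mu_1}{\sqrt{1.05\mu_1 + \mu_1}} \sqrt{N} = \frac{0.05}{\sqrt{2.05}} \sqrt{N}$$

$$N = \frac{2.05}{0.05^2}\, t^2 = 820\, t^2$$

We don't know the degrees of freedom. We use a large number and get t = 1.96:

$$N = 820 \cdot (1.96)^2 = 3150$$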
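The arithmetic of this estimate is short enough to check by script. The sketch assumes, as above, a Poisson sample with means expressed relative to the untreated mean and t = 1.96 for a large number of degrees of freedom; the variable names are illustrative.

```python
effect = 0.05                          # relative difference to detect (5%)

# With variance = mean and an untreated mean of 1, the sum of the
# variances is 1.05 + 1 = 2.05 and the difference of the means is 0.05.
factor = (2 + effect) / effect**2      # 2.05 / 0.05^2
t_large_df = 1.96                      # t for a large number of degrees of freedom
n_persons = factor * t_large_df**2

print(round(factor), round(n_persons))
```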

Homework and literature

Refresh:

• χ² test
• Mendel rules
• t-test
• F-test
• Contingency table
• G-test

Prepare for the next lecture:

• Coefficient of correlation
• Maximum, minimum of functions
• Matrix multiplication
• Eigenvalue

Literature:

Łomnicki: Statystyka dla biologów
