Bonferroni correction

Page 1: Bonferroni correction

Bonferroni correction

Adapted from a presentation by A. V. Rubanovich (Рубанович А.В.)

Page 2: Bonferroni correction

Experiments in finding people with paranormal powers:

Joseph Rhine (1950)

1000 people guessed the sequence of 10 cards: red or black?

12 people guessed 9 of the 10 cards, two of them all 10 cards

None of these “psychics” confirmed their paranormal abilities in further experiments
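How many such "successes" would pure guessing produce? A minimal sketch in R, assuming each card is an independent fair guess (probability 0.5 for red or black). The variable names are illustrative and not part of the original slides.

```r
# Chance of guessing k of 10 red/black cards correctly by pure luck
# (ASSUMPTION: independent guesses, each correct with probability 0.5)
n_people <- 1000
p_nine <- dbinom(9, size = 10, prob = 0.5)    # exactly 9 correct: 10/1024
p_ten  <- dbinom(10, size = 10, prob = 0.5)   # all 10 correct: 1/1024

# Expected number of people reaching each result among 1000 guessers
expected_nine_or_more <- n_people * (p_nine + p_ten)
expected_ten <- n_people * p_ten
print(c(nine_or_more = expected_nine_or_more, all_ten = expected_ten))
```

Under these assumptions roughly 11 of 1000 people are expected to guess at least 9 cards and about 1 to guess all 10, so counts like those on the slide need no paranormal explanation.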

Page 3: Bonferroni correction
Page 4: Bonferroni correction
Page 5: Bonferroni correction

The problem of “multiple comparisons”

Genome-wide association: gene expression studies with DNA chips – 500 000 SNPs. At a significance level of 0.01 we can expect up to 5000 false associations

Meta-studies: joining and comparison of different results obtained by different authors

Multiple testing is dangerous: there is a large probability of finding a false association!
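The figure of 5000 false associations is simply the number of tests times the significance level. A small sketch, assuming that no SNP is truly associated and that p-values are then uniformly distributed on [0, 1]; the simulation itself is an illustration and is not taken from the slides.

```r
set.seed(1)
m <- 500000      # number of SNPs tested
alpha <- 0.01    # significance level of each single test

# Expected number of false associations when every null hypothesis is true
expected_false <- m * alpha

# Simulated check: under the null hypothesis p-values are uniform on [0, 1]
p_null <- runif(m)
observed_false <- sum(p_null < alpha)

print(c(expected = expected_false, simulated = observed_false))
```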

Page 6: Bonferroni correction

Let us generate two identically distributed samples of 100 persons each with 20-locus genotypes

    gencase  gencontrol  ORreal  p
1   6        7           0.85    0.782
2   12       9           1.4     0.513
3   8        7           1.2     0.796
4   9        9           1.0     1.00
5   9        9           1.0     1.00
6   9        11          0.80    0.655
7   10       10          1.0     1.00
8   8        11          0.70    0.491
9   8        9           0.88    0.808
10  7        14          0.46    0.127
11  12       9           1.4     0.513
12  8        10          0.78    0.637
13  12       10          1.2     0.670
14  10       10          1.0     1.00
15  7        6           1.2     0.782
16  12       11          1.1     0.835
17  10       9           1.1     0.819
18  11       9           1.2     0.655
19  13       10          1.3     0.532
20  7        5           1.4     0.564

    gencase  gencontrol  ORreal  p
1   14       7           2.2     0.127
2   12       5           2.6     0.0896
3   15       14          1.1     0.853
4   8        12          0.64    0.371
5   5        8           0.61    0.405
6   9        10          0.89    0.819
7   7        6           1.2     0.782
8   10       12          0.81    0.670
9   12       9           1.4     0.513
10  10       12          0.81    0.670
11  12       12          1.0     1.00
12  14       9           1.6     0.297
13  15       7           2.3     0.0881
14  7        9           0.76    0.617
15  6        6           1.0     1.00
16  9        10          0.89    0.819
17  11       7           1.6     0.346
18  8        7           1.2     0.796
19  15       13          1.2     0.705
20  11       6           1.9     0.225

    gencase  gencontrol  ORreal  p
1   10       9           1.1     0.819
2   12       10          1.2     0.670
3   12       13          0.91    0.841
4   10       16          0.58    0.239
5   11       13          0.83    0.683
6   13       13          1.0     1.00
7   12       4           3.3     0.0455
8   12       8           1.6     0.371
9   13       12          1.1     0.841
10  14       11          1.3     0.549
11  17       10          1.8     0.178
12  9        5           1.9     0.285
13  8        9           0.88    0.808
14  15       9           1.8     0.221
15  11       9           1.2     0.655
16  9        16          0.52    0.162
17  10       14          0.68    0.414
18  11       10          1.1     0.827
19  10       10          1.0     1.00
20  11       6           1.9     0.225

    gencase  gencontrol  ORreal  p
1   7        8           0.87    0.796
2   10       2           5.4     0.0209
3   17       5           3.9     0.0105
4   13       12          1.1     0.841
5   12       11          1.1     0.835
6   7        10          0.68    0.467
7   10       12          0.81    0.670
8   14       9           1.6     0.297
9   14       8           1.9     0.201
10  9        12          0.73    0.513
11  13       9           1.5     0.394
12  9        14          0.61    0.297
13  10       13          0.74    0.532
14  8        8           1.0     1.00
15  14       12          1.2     0.695
16  17       7           2.7     0.0412
17  11       13          0.83    0.683
18  10       8           1.3     0.637
19  16       10          1.7     0.239
20  12       8           1.6     0.371

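How does it happen? The appearance of false associations

Column legend: row number = gene (locus), gencase = sample 1 (cases), gencontrol = sample 2 (controls), ORreal = odds ratio (without association OR=1), p = p-value

The samples are identically distributed, so every odds ratio should be OR=1, yet some loci come out significant (p < 0.05)!

In the last table all 3 significant loci appear to be associated with a disease!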
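One run of such an experiment can be sketched as follows. This is only an illustration under stated assumptions: the carrier frequency of 0.1 per locus and the use of `fisher.test` for the p-value are choices made here, not necessarily those used for the slides. The column names follow the tables above.

```r
set.seed(1)
n <- 100      # persons per sample (cases and controls)
m <- 20       # number of loci
freq <- 0.1   # ASSUMPTION: same carrier frequency in both samples (no association)

# Carrier counts per locus, drawn from the same distribution for both samples
gencase <- rbinom(m, size = n, prob = freq)
gencontrol <- rbinom(m, size = n, prob = freq)

# Odds ratio from the 2x2 table: carriers vs non-carriers, cases vs controls
ORreal <- (gencase * (n - gencontrol)) / ((n - gencase) * gencontrol)

# p-value per locus (ASSUMPTION: Fisher's exact test on the 2x2 table)
p <- mapply(function(a, b) {
  fisher.test(matrix(c(a, n - a, b, n - b), nrow = 2))$p.value
}, gencase, gencontrol)

sim <- data.frame(gencase, gencontrol, ORreal = signif(ORreal, 2), p = signif(p, 3))
print(sim)

# Loci that look "associated" although the true odds ratio is 1 everywhere
n_significant <- sum(sim$p < 0.05)
print(n_significant)
```

Changing the seed and rerunning shows that, from time to time, one or several loci fall below 0.05 purely by chance.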

Page 7: Bonferroni correction

How to avoid false associations?

When m independent statistical tests are applied, each at significance level α, the probability of at least one false association should satisfy

$1 - (1 - \alpha)^m < 0.05$

Carlo Bonferroni (1935): when m independent statistical tests are applied, only results with

$p < 0.05 / m$

are significant
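The effect of the number of tests on the probability of at least one false association can be checked numerically. A short sketch, assuming independent tests; the particular values of m are arbitrary examples.

```r
alpha <- 0.05
m <- c(1, 2, 5, 10, 20, 100)   # example numbers of independent tests

# Probability of at least one false association without any correction
fwer_uncorrected <- 1 - (1 - alpha)^m

# The same probability when each test uses the Bonferroni level alpha / m
fwer_bonferroni <- 1 - (1 - alpha / m)^m

print(data.frame(m,
                 fwer_uncorrected = round(fwer_uncorrected, 3),
                 fwer_bonferroni = round(fwer_bonferroni, 4)))
```

With 20 uncorrected tests the probability of at least one false association is already about 0.64, while the Bonferroni level keeps it just below 0.05 for every m.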

Page 8: Bonferroni correction

Bonferroni correction kills the significance of certain results:

Two mutations associated with the disease

            Control (100)  Cases (100)  OR    p
Mutation 1  1              8            8.61  0.044
Mutation 2  5              15           3.35  0.024

1 against 8 with equal-size samples (Mutation 1)

But adjusted by Bonferroni it should be: p < 0.05/2 = 0.025, so Mutation 1 (p = 0.044) is no longer significant
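The same comparison can be made with R's built-in `p.adjust`, which multiplies each p-value by the number of tests instead of dividing the threshold. A minimal sketch using the two p-values from the table; the vector names are illustrative.

```r
# p-values of the two mutations from the table
p <- c(mutation1 = 0.044, mutation2 = 0.024)

# Bonferroni adjustment: each p-value is multiplied by the number of tests (capped at 1)
p_bonf <- p.adjust(p, method = "bonferroni")
print(p_bonf)

# Comparing adjusted p-values with 0.05 is the same as comparing raw ones with 0.05/2
significant <- p_bonf < 0.05
print(significant)
```

Only the second mutation stays below the threshold after the adjustment.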

Page 9: Bonferroni correction

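# Mutation 1: 8 carriers among 100 cases, 1 carrier among 100 controls
case_mut1 <- matrix(1, 8, 1)
case_non_mut1 <- matrix(0, 92, 1)
control_mut1 <- matrix(1, 1, 1)
control_non_mut1 <- matrix(0, 99, 1)
# Carrier status for all 200 persons; res marks cases (1) and controls (0)
data <- rbind(case_mut1, case_non_mut1, control_mut1, control_non_mut1)
res <- rbind(matrix(1, 100, 1), matrix(0, 100, 1))
mylogit <- glm(res ~ data, family = binomial(link = "logit"), na.action = na.pass)
exp(mylogit$coefficients[2])                        # odds ratio
summary(mylogit)[["coefficients"]][, "Pr(>|z|)"]    # p-values

# Mutation 2: 15 carriers among 100 cases, 5 carriers among 100 controls
case_mut1 <- matrix(1, 15, 1)
case_non_mut1 <- matrix(0, 85, 1)
control_mut1 <- matrix(1, 5, 1)
control_non_mut1 <- matrix(0, 95, 1)
data <- rbind(case_mut1, case_non_mut1, control_mut1, control_non_mut1)
res <- rbind(matrix(1, 100, 1), matrix(0, 100, 1))
mylogit <- glm(res ~ data, family = binomial(link = "logit"), na.action = na.pass)
exp(mylogit$coefficients[2])                        # odds ratio
summary(mylogit)[["coefficients"]][, "Pr(>|z|)"]    # p-values

Example: computing the OR and p-value with logistic regression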

Page 10: Bonferroni correction

Assessment of individual sensitivity to ionizing radiation and DNA repair efficiency in a healthy population

F. Marcon, C. Andreoli, et al. Mut. Res., 541 (2003)

Not significant! According to Bonferroni it should be:

Genotypes

Page 11: Bonferroni correction

High-Throughput Detection of GST Polymorphic Alleles in a Pediatric Cancer Population
P. Barnette, R. Scholl, et al. Cancer Epidemiology, Biomarkers & Prevention, Vol. 13, 304–313, 2004

8 diseases

13 genotypes

OR=6.4 P=0.007

OR=2.3 P=0.018

Not significant! Bonferroni correction requires:

Homozygosity in GST prevents cancer!

Control
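The threshold shown on the slide is not preserved in this transcript. As an illustration only, the sketch below assumes that every disease was tested against every genotype, giving 8 × 13 = 104 comparisons; that count is an assumption made here, not a figure from the slide or the paper.

```r
n_diseases <- 8
n_genotypes <- 13

# ASSUMPTION: one test per disease-genotype pair
m <- n_diseases * n_genotypes
bonferroni_threshold <- 0.05 / m

# The two p-values reported on the slide
p_reported <- c(0.007, 0.018)

print(bonferroni_threshold)
print(p_reported < bonferroni_threshold)
```

Under that assumption neither reported p-value would pass the corrected threshold.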

Page 12: Bonferroni correction

The Bonferroni method creates more problems than it solves (Thomas Perneger, 1998):

The Bonferroni correction leads to a very high probability of missing a true association!

“Bonferroni adjustments are, at best, unnecessary and, at worst, deleterious to sound statistical inference…”

Page 13: Bonferroni correction

Errors in statistical testing

Null hypothesis: usually states the absence of differences between two samples

Type I error: probability of rejecting a true null hypothesis = probability of finding differences where there are none = probability of a false discovery

Type II error: probability of accepting a false null hypothesis = probability of not finding existing differences = probability of missing a true discovery

Test power = 1 - Type II error = probability of correctly rejecting the null hypothesis = probability of making a discovery

Traditionally a biologist tries to avoid Type I errors, i.e. to guarantee the avoidance of false discoveries

… and does not care about the possibility of missing a discovery (Type II error)
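Both error rates can be estimated by simulation. A rough sketch, assuming normally distributed measurements, 50 observations per sample and a true difference of half a standard deviation; all of these settings are illustrative choices and are not taken from the slides.

```r
set.seed(42)
n_rep <- 2000   # number of simulated experiments
n <- 50         # observations per sample (ASSUMPTION)
alpha <- 0.05

# Null hypothesis true: both samples come from the same distribution
p_null <- replicate(n_rep, t.test(rnorm(n), rnorm(n))$p.value)

# Null hypothesis false: the second sample is shifted by 0.5 SD (ASSUMPTION)
p_alt <- replicate(n_rep, t.test(rnorm(n), rnorm(n, mean = 0.5))$p.value)

type1 <- mean(p_null < alpha)    # false discoveries
type2 <- mean(p_alt >= alpha)    # missed discoveries
power <- 1 - type2

print(c(type1 = type1, type2 = type2, power = power))
```

The estimated Type I error stays near the chosen significance level, while the Type II error depends on the sample size and on the size of the true difference.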

Page 14: Bonferroni correction

Figure: Dependence of Type II error on the number of tests when using the Bonferroni correction (x-axis: number of tests, 0 to 20; y-axis: Type II error, 0 to 1)

Probability of missing a gene with OR=2.7 with sample sizes of 100 (cases) and 100 (controls)

In a single test the probability of missing the discovery is 0.2

With 5 comparisons we miss 50% of discoveries

For m=100 the probability of error is 0.88: with 100 comparisons, to guarantee the avoidance of 1 false discovery, we miss 88% of true discoveries!
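The shape of this dependence can be reproduced approximately with `power.prop.test` from base R. The slides do not state the carrier frequency in controls or the test used, so the frequency of 0.1 below is an assumption and the resulting numbers need not match the figure exactly.

```r
or <- 2.7
p_control <- 0.10   # ASSUMPTION: carrier frequency in controls (not given on the slide)

# Carrier frequency in cases implied by the odds ratio
odds_case <- or * p_control / (1 - p_control)
p_case <- odds_case / (1 + odds_case)

m <- c(1, 5, 20, 100)   # number of tests

# Type II error of one test at the Bonferroni level 0.05 / m, 100 persons per group
type2 <- sapply(m, function(k) {
  1 - power.prop.test(n = 100, p1 = p_control, p2 = p_case,
                      sig.level = 0.05 / k)$power
})

print(data.frame(m, bonferroni_alpha = 0.05 / m, type2 = round(type2, 2)))
```

Whatever frequency is assumed, the probability of missing the gene grows as the number of tests increases, which is the point of the figure.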

Page 15: Bonferroni correction

A new algorithm for testing statistical hypotheses: FDR control

False Discovery Rate control: Benjamini, Hochberg (1995)

The traditional principle

Probability of false discovery < significance level (Type I error < 0.05)

is replaced by

Average fraction of false discoveries < chosen significance level

>105 papers in

Page 16: Bonferroni correction

Algorithm of FDR control (Benjamini, Hochberg, 1995)

Order the tests according to p-value:

$p_1 < p_2 < \dots < p_m$

For FDR control at level α (e.g. 0.05), we find

$j^* = \max\left\{ j : p_j \le \frac{j}{m}\,\alpha \right\}$

where j is the order number of the gene, α is the required significance level, m is the total number of tests (genes), and $p_j$ is the p-value for the j-th test (gene).

Differences are assumed to be significant for j = 1, …, j*.

For j > j* differences are assumed not to be significant
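The procedure above can be sketched in a few lines of R. The function name `bh_cutoff` and the demo p-values are hypothetical; the last line compares the result with R's built-in `p.adjust(..., method = "BH")`.

```r
# Step-up rule: find the largest j with p_(j) <= (j / m) * alpha
bh_cutoff <- function(p, alpha = 0.05) {
  m <- length(p)
  ord <- order(p)                                   # ascending order of p-values
  passed <- which(p[ord] <= seq_len(m) / m * alpha)
  j_star <- if (length(passed) > 0) max(passed) else 0
  significant <- rep(FALSE, m)
  significant[ord[seq_len(j_star)]] <- TRUE         # the j* smallest p-values
  list(j_star = j_star, significant = significant)
}

p_demo <- c(0.30, 0.012, 0.001, 0.045, 0.02)        # hypothetical p-values
res <- bh_cutoff(p_demo)
print(res)

# Cross-check against the built-in Benjamini-Hochberg adjustment
same_as_builtin <- identical(res$significant, p.adjust(p_demo, method = "BH") <= 0.05)
print(same_as_builtin)
```

Note that every test up to j* is declared significant, even if some intermediate p-value lies above its own threshold.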

Page 17: Bonferroni correction

Example: multiple comparisons on 10 tests

Test  pi      Bonferroni correction  FDR correction
1     0.001   0.005                  0.005
2     0.0055  0.005                  0.010
3     0.01    0.005                  0.015
4     0.015   0.005                  0.020
5     0.02    0.005                  0.025
6     0.04    0.005                  0.030
7     0.3     0.005                  0.035
8     0.5     0.005                  0.040
9     0.6     0.005                  0.045
10    0.8     0.005                  0.050

Order the tests in ascending order of p-value

Significant p-values without correction: tests 1 to 6 (p < 0.05)

Bonferroni correction leaves only the first value

FDR correction: the first cell holds the Bonferroni p-value, the second a value two times larger, the third three times larger, and so on…

For the 6th test the p-value is larger than the FDR threshold

Significant after FDR control: tests 1 to 5

That’s it!!!
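The same example can be reproduced with `p.adjust`, which returns adjusted p-values to be compared with 0.05 instead of adjusted thresholds. A minimal sketch using the ten p-values from the table; the variable names are illustrative.

```r
# The ten p-values from the example, already in ascending order
p <- c(0.001, 0.0055, 0.01, 0.015, 0.02, 0.04, 0.3, 0.5, 0.6, 0.8)
alpha <- 0.05

n_raw  <- sum(p < alpha)                                      # no correction
n_bonf <- sum(p.adjust(p, method = "bonferroni") <= alpha)    # Bonferroni
n_fdr  <- sum(p.adjust(p, method = "BH") <= alpha)            # Benjamini-Hochberg FDR

print(c(uncorrected = n_raw, bonferroni = n_bonf, fdr = n_fdr))
```

This gives 6 significant tests without correction, 1 after Bonferroni and 5 after FDR control.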

Page 18: Bonferroni correction

Example: expression of 3051 genes in leukemia

Golub T.R. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. // Science. 1999, v.286.

t-test: 1045 genes, for which p<0.05
Bonferroni correction: 98 genes with p'<0.000016
FDR: 681 genes, for which FDR<0.05

Figure: x-axis: t-statistic for the comparison of gene expression in healthy and ill patients; y-axis: number of genes with this level of t-statistic
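The three counts can be imitated on simulated data. The sketch below does not use the Golub data set: the group size of 20, the 600 truly different genes and the shift of one standard deviation are all hypothetical settings, so only the ordering of the counts is meaningful, not their values.

```r
set.seed(7)
m <- 3051           # number of genes, as in the example
n_per_group <- 20   # ASSUMPTION: hypothetical group size
n_diff <- 600       # ASSUMPTION: hypothetical number of truly different genes

# Mean shift per gene: 1 SD for the truly different genes, 0 for the rest
shift <- c(rep(1, n_diff), rep(0, m - n_diff))

# One two-sample t-test per gene on simulated expression values
p <- sapply(seq_len(m), function(g) {
  t.test(rnorm(n_per_group, mean = shift[g]), rnorm(n_per_group))$p.value
})

counts <- c(raw = sum(p < 0.05),
            bonferroni = sum(p.adjust(p, method = "bonferroni") < 0.05),
            fdr = sum(p.adjust(p, method = "BH") < 0.05))
print(counts)

# Bonferroni threshold for 3051 tests, matching p' < 0.000016 in the example
bonferroni_threshold <- 0.05 / m
print(bonferroni_threshold)
```

As in the real example, FDR control keeps far more genes than the Bonferroni correction while still discarding part of the uncorrected list.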