Bonferroni correction+
Adapted from a presentation by A. V. Rubanovich (Рубанович А.В.)
Experiments in finding people with paranormal powers:
Joseph Rhine (1950)
1000 people guessed a sequence of 10 cards: red or black?
12 people guessed at least 9 of the 10 cards; two of them guessed all 10
None of these “psychics” confirmed their paranormal abilities in further experiments
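A quick check of what pure guessing would produce is useful here. The sketch below assumes that every card is an independent 50/50 guess (an assumption, not stated on the slide) and uses R's binomial functions to estimate how many of 1000 people would score at least 9 of 10 by luck alone.

```r
# Assumption: each of the 10 guesses is an independent fair red/black guess.
n_people <- 1000
n_cards <- 10

# P(at least 9 correct) = P(X > 8) for X ~ Binomial(10, 0.5)
p_at_least_9 <- pbinom(8, size = n_cards, prob = 0.5, lower.tail = FALSE)
# P(all 10 correct)
p_all_10 <- dbinom(n_cards, size = n_cards, prob = 0.5)

expected_at_least_9 <- n_people * p_at_least_9   # about 10.7 people
expected_all_10 <- n_people * p_all_10           # about 1 person

print(c(expected_at_least_9, expected_all_10))
```

Under this assumption about 11 "successful" guessers are expected by chance, so 12 is not remarkable.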
The problem of “multiple comparisons”
Genome-wide association and gene expression studies with DNA chips: 500 000 SNPs. At a significance level of 0.01 we can expect about 5000 false associations
Meta-studies: joining and comparing different results obtained by different authors
Multiple testing is dangerous: there is a large probability of finding a false association!
Let us generate two identically distributed samples of 100 persons each, with 20-locus genotypes
Run 1
gene  gencase  gencontrol  ORreal  p
1     6        7           0.85    0.782
2     12       9           1.4     0.513
3     8        7           1.2     0.796
4     9        9           1.0     1.00
5     9        9           1.0     1.00
6     9        11          0.80    0.655
7     10       10          1.0     1.00
8     8        11          0.70    0.491
9     8        9           0.88    0.808
10    7        14          0.46    0.127
11    12       9           1.4     0.513
12    8        10          0.78    0.637
13    12       10          1.2     0.670
14    10       10          1.0     1.00
15    7        6           1.2     0.782
16    12       11          1.1     0.835
17    10       9           1.1     0.819
18    11       9           1.2     0.655
19    13       10          1.3     0.532
20    7        5           1.4     0.564

Run 2
gene  gencase  gencontrol  ORreal  p
1     14       7           2.2     0.127
2     12       5           2.6     0.0896
3     15       14          1.1     0.853
4     8        12          0.64    0.371
5     5        8           0.61    0.405
6     9        10          0.89    0.819
7     7        6           1.2     0.782
8     10       12          0.81    0.670
9     12       9           1.4     0.513
10    10       12          0.81    0.670
11    12       12          1.0     1.00
12    14       9           1.6     0.297
13    15       7           2.3     0.0881
14    7        9           0.76    0.617
15    6        6           1.0     1.00
16    9        10          0.89    0.819
17    11       7           1.6     0.346
18    8        7           1.2     0.796
19    15       13          1.2     0.705
20    11       6           1.9     0.225

Run 3
gene  gencase  gencontrol  ORreal  p
1     10       9           1.1     0.819
2     12       10          1.2     0.670
3     12       13          0.91    0.841
4     10       16          0.58    0.239
5     11       13          0.83    0.683
6     13       13          1.0     1.00
7     12       4           3.3     0.0455
8     12       8           1.6     0.371
9     13       12          1.1     0.841
10    14       11          1.3     0.549
11    17       10          1.8     0.178
12    9        5           1.9     0.285
13    8        9           0.88    0.808
14    15       9           1.8     0.221
15    11       9           1.2     0.655
16    9        16          0.52    0.162
17    10       14          0.68    0.414
18    11       10          1.1     0.827
19    10       10          1.0     1.00
20    11       6           1.9     0.225

Run 4
gene  gencase  gencontrol  ORreal  p
1     7        8           0.87    0.796
2     10       2           5.4     0.0209
3     17       5           3.9     0.0105
4     13       12          1.1     0.841
5     12       11          1.1     0.835
6     7        10          0.68    0.467
7     10       12          0.81    0.670
8     14       9           1.6     0.297
9     14       8           1.9     0.201
10    9        12          0.73    0.513
11    13       9           1.5     0.394
12    9        14          0.61    0.297
13    10       13          0.74    0.532
14    8        8           1.0     1.00
15    14       12          1.2     0.695
16    17       7           2.7     0.0412
17    11       13          0.83    0.683
18    10       8           1.3     0.637
19    16       10          1.7     0.239
20    12       8           1.6     0.371
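How does it happen? The appearance of false associations
Columns: gene, Sample 1 (cases), Sample 2 (controls), odds ratio (OR), p. Without association, OR = 1
Every OR should be 1, because both samples come from the same distribution
Yet some loci come out significant (p < 0.05)!
In the last run, all 3 such loci appear to be associated with a disease!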
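The experiment above can be repeated with a short simulation. This is only a sketch under stated assumptions: the carrier frequency of 0.1 per locus is a guess read off the counts in the tables, the seed is arbitrary, and Fisher's exact test is swapped in for whatever test produced the original p-values.

```r
# Hypothetical settings: 100 cases, 100 controls, 20 loci,
# carrier frequency 0.1 in BOTH groups (so the true OR is 1 everywhere).
set.seed(1)
n <- 100
n_loci <- 20
freq <- 0.1

simulate_locus <- function() {
  case_carriers <- rbinom(1, n, freq)
  control_carriers <- rbinom(1, n, freq)
  # 2x2 table: columns = cases / controls, rows = carriers / non-carriers
  tab <- matrix(c(case_carriers, n - case_carriers,
                  control_carriers, n - control_carriers), nrow = 2)
  # Sample odds ratio and Fisher's exact test p-value
  or <- (case_carriers * (n - control_carriers)) /
        ((n - case_carriers) * control_carriers)
  c(case = case_carriers, control = control_carriers,
    OR = or, p = fisher.test(tab)$p.value)
}

result <- t(replicate(n_loci, simulate_locus()))
print(round(result, 3))
# Loci that look "significant" although no association exists
print(which(result[, "p"] < 0.05))
```

Changing the seed gives a new run; some runs show no "significant" locus, others show several, just as in the four tables.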
How to avoid false associations?
When m independent statistical tests are applied, each with significance level α, the probability of at least one false association should satisfy
$1 - (1 - \alpha)^m < 0.05$
Carlo Bonferroni (1935): when applying m independent statistical tests, the only significant results are those with
$p < 0.05 / m$
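To see how quickly the probability of at least one false association grows, and what the Bonferroni threshold does to it, the formula can be evaluated directly. The values of m below are chosen only for illustration.

```r
alpha <- 0.05
m <- c(1, 5, 10, 20, 100)

# Probability of at least one false positive among m independent tests
fwer_uncorrected <- 1 - (1 - alpha)^m
# Same probability when each test uses the Bonferroni threshold alpha / m
fwer_bonferroni <- 1 - (1 - alpha / m)^m

print(data.frame(m, threshold = alpha / m,
                 fwer_uncorrected = round(fwer_uncorrected, 3),
                 fwer_bonferroni = round(fwer_bonferroni, 4)))
```

For 20 independent tests at level 0.05 the uncorrected probability is already about 0.64, while with the threshold 0.05/m it stays at or below 0.05 for every m.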
The Bonferroni correction kills the significance of certain results:
Two mutations associated with the disease

            Control (100)  Cases (100)  OR    p
Mutation 1  1              8            8.61  0.044
Mutation 2  5              15           3.35  0.024

But adjusted by Bonferroni, the requirement becomes p < 0.05/2 = 0.025, so Mutation 1 (p = 0.044) is no longer significant
Example: computing the OR and p-value in R

1 against 8 carriers with equal sample sizes (Mutation 1):

    # Genotype indicator: 1 = carrier, 0 = non-carrier
    case_mut1 = matrix(1, 8, 1)
    case_non_mut1 = matrix(0, 92, 1)
    control_mut1 = matrix(1, 1, 1)
    control_non_mut1 = matrix(0, 99, 1)
    data = rbind(case_mut1, case_non_mut1, control_mut1, control_non_mut1)
    # Outcome: 1 = case, 0 = control
    res = rbind(matrix(1, 100, 1), matrix(0, 100, 1))
    # Logistic regression of disease status on genotype
    mylogit <- glm(res ~ data, family = binomial(link = "logit"), na.action = na.pass)
    exp(mylogit$coefficients[2])                       # odds ratio
    summary(mylogit)[["coefficients"]][, "Pr(>|z|)"]   # p-values

5 against 15 carriers with equal sample sizes (Mutation 2):

    case_mut1 = matrix(1, 15, 1)
    case_non_mut1 = matrix(0, 85, 1)
    control_mut1 = matrix(1, 5, 1)
    control_non_mut1 = matrix(0, 95, 1)
    data = rbind(case_mut1, case_non_mut1, control_mut1, control_non_mut1)
    res = rbind(matrix(1, 100, 1), matrix(0, 100, 1))
    mylogit <- glm(res ~ data, family = binomial(link = "logit"), na.action = na.pass)
    exp(mylogit$coefficients[2])                       # odds ratio
    summary(mylogit)[["coefficients"]][, "Pr(>|z|)"]   # p-values
Assessment of individual sensitivity to ionizing radiation and DNA repair efficiency in a healthy population
F. Marcon, C. Andreoli, et al. Mut. Res., 541 (2003)
Not significant! According to Bonferroni, it should be:
Genotypes
High-Throughput Detection of GST Polymorphic Alleles in a Pediatric Cancer Population. P. Barnette, R. Scholl, et al. Cancer Epidemiology, Biomarkers & Prevention, Vol. 13, 304–313, 2004
8 diseases, 13 genotypes
OR = 6.4, P = 0.007
OR = 2.3, P = 0.018
Not significant! With 8 × 13 = 104 comparisons, the Bonferroni correction requires p < 0.05/104 ≈ 0.0005
Homozygosity in GST prevents cancer!
The Bonferroni method creates more problems than it solves (Thomas Perneger, 1998):
The Bonferroni correction leads to a very high probability of missing a real association!
“Bonferroni adjustments are, at best, unnecessary and, at worst, deleterious to sound statistical inference…”
Errors in statistical testing
Type I error: probability of rejecting a true null hypothesis = probability of finding differences where there are none = probability of a false discovery
Type II error: probability of accepting a false null hypothesis = probability of not finding existing differences = probability of missing a real discovery
Test power = 1 - Type II error = probability of correctly rejecting the null hypothesis = probability of making a discovery
Null hypothesis: usually states the absence of differences between two samples
Traditionally, a biologist tries to avoid Type I errors, i.e. to guarantee the avoidance of false discoveries
… and does not care about the possibility of missing a discovery (Type II error)
[Figure: Dependence of Type II error on the number of tests when using the Bonferroni correction. x-axis: number of tests; y-axis: Type II error]
Probability of missing a gene with OR = 2.7 with sample sizes of 100 (cases) and 100 (controls):
In a single test, the probability of missing the discovery is 0.2
With 5 comparisons we miss 50% of discoveries
For m = 100 the probability of error is 0.88: with 100 comparisons, guaranteeing the avoidance of 1 false discovery means missing 88% of real discoveries!
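The shape of this dependence can be sketched with R's `power.prop.test`. The slides do not state the carrier frequency or the test behind the curve, so the control frequency of 0.15 and the two-proportion test used below are assumptions, and the resulting numbers will not match the quoted ones exactly.

```r
# Assumptions (not from the slides): carrier frequency 0.15 in controls,
# two-sided two-proportion test with the normal approximation.
odds_ratio <- 2.7
p_control <- 0.15
odds_case <- odds_ratio * p_control / (1 - p_control)
p_case <- odds_case / (1 + odds_case)

type2_error <- function(m, n = 100, alpha = 0.05) {
  # Power of a single test run at the Bonferroni threshold alpha / m
  pw <- power.prop.test(n = n, p1 = p_control, p2 = p_case,
                        sig.level = alpha / m)$power
  1 - pw
}

m_values <- c(1, 5, 10, 20, 100)
beta <- sapply(m_values, type2_error)
print(data.frame(m = m_values, type2_error = round(beta, 2)))
```

Whatever frequency is assumed, the Type II error rises steadily as the number of tests m grows, which is the point of the figure.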
New algorithm for testing statistical hypotheses: FDR control
False Discovery Rate control: Benjamini, Hochberg (1995)
The traditional principle
Probability of a false discovery < significance level (Type I error < 0.05)
is replaced by
Average fraction of false discoveries < chosen significance level
>105 papers in
Algorithm of FDR control (Benjamini, Hochberg, 1995)
Order the tests according to p-value:
$p_1 < p_2 < \dots < p_m$
For FDR control at level α (e.g. 0.05), we find
$j^* = \max\{\, j : p_j \le \frac{j}{m}\alpha \,\}$
Differences are assumed to be significant for j = 1, …, j*.
For j > j*, differences are assumed not to be significant.
Here j is the order number of the gene, $\frac{j}{m}\alpha$ is the significance level required, m is the total number of tests (genes), and $p_j$ is the p-value for the j-th test (gene).
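The ordering and threshold steps can be written out by hand in a few lines of R. This is a minimal sketch: the function name `bh_significant` and the p-values are hypothetical, and the result is cross-checked against R's built-in `p.adjust`.

```r
# Benjamini-Hochberg step-up rule written out by hand.
# Returns a logical vector (in the original order of p) marking
# the tests declared significant at FDR level alpha.
bh_significant <- function(p, alpha = 0.05) {
  m <- length(p)
  ord <- order(p)                          # indices that sort p ascending
  thresholds <- seq_len(m) / m * alpha     # (j / m) * alpha for j = 1..m
  passing <- which(p[ord] <= thresholds)
  significant <- rep(FALSE, m)
  if (length(passing) > 0) {
    j_star <- max(passing)                 # largest j with p_j <= (j/m) alpha
    significant[ord[seq_len(j_star)]] <- TRUE
  }
  significant
}

# Hypothetical p-values, for illustration only
p <- c(0.20, 0.001, 0.025, 0.012, 0.70)
print(bh_significant(p))
# Cross-check against R's built-in adjustment
print(p.adjust(p, method = "BH") <= 0.05)
```

In practice `p.adjust(p, method = "BH")` is the simpler choice; the hand-written version only makes the search for j* explicit.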
Example: multiple comparisons on 10 tests

Order the tests in ascending order of p-value:

Test  p_i     Bonferroni correction  FDR correction
1     0.001   0.005                  0.005
2     0.0055  0.005                  0.010
3     0.01    0.005                  0.015
4     0.015   0.005                  0.020
5     0.02    0.005                  0.025
6     0.04    0.005                  0.030
7     0.3     0.005                  0.035
8     0.5     0.005                  0.040
9     0.6     0.005                  0.045
10    0.8     0.005                  0.050

Without correction, the p-values of tests 1 to 6 are significant (p < 0.05)
The Bonferroni correction leaves only the first value
FDR thresholds: the first cell holds the Bonferroni p-value, the second is two times larger, the third three times larger, and so on
For the 6th test the p-value is larger than the FDR threshold
Tests 1 to 5 remain significant after FDR control
That’s it!!!
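The same 10-test example can be checked with R's built-in `p.adjust`, which rescales the p-values instead of the thresholds; comparing the adjusted values with 0.05 gives the same decisions.

```r
p <- c(0.001, 0.0055, 0.01, 0.015, 0.02, 0.04, 0.3, 0.5, 0.6, 0.8)
alpha <- 0.05

uncorrected <- p < alpha
bonferroni <- p.adjust(p, method = "bonferroni") < alpha
fdr <- p.adjust(p, method = "BH") < alpha

print(data.frame(test = seq_along(p), p, uncorrected, bonferroni, fdr))
# Number of significant tests under each rule: 6, 1 and 5
print(c(uncorrected = sum(uncorrected),
        bonferroni = sum(bonferroni),
        fdr = sum(fdr)))
```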
Example: expression of 3051 genes in leukemia
Golub T.R. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. // Science. 1999, v.286.
t-test: 1045 genes with p < 0.05
Bonferroni correction: 98 genes with p’ < 0.000016
FDR: 681 genes with FDR < 0.05
[Figure: distribution of t-statistics for the comparison of gene expression in healthy and ill patients; y-axis: number of genes with this level of t-statistic]