Higher criticism for estimating proportion of non-null effect in high-dimensional multiple comparison
Annika Tillander, joint work with Tatjana Pavlenko
Karolinska Institutet, Department of Medical Epidemiology and Biostatistics
LinStat 26 August 2014
Annika Tillander bHC for estimating proportion of non-null 1 / 39
Outline
◮ Introduction
◮ Covariance structure approximation
◮ Separation strength
◮ Detection boundary
◮ Estimating proportion of non-null effect
◮ Higher criticism
◮ Application
Classification
We have $n$ observations, where each observation $x = (x_1, \dots, x_p)$ consists of $p$ features.
Supervised classification problem with $C$ classes and class labels $y_j = c$, where $j = 1, \dots, n$ and $c \in \{1, \dots, C\}$.
$$T = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$$
Using the training data $T$, a prediction model is built that enables prediction of new observations for which the outcome is unknown.
Linear Discriminant Analysis (LDA)
Assume that the outcome in each class is modeled by the Gaussian distribution, i.e. $x_c \sim N(\mu_c, \Sigma_c)$, where $\mu_c$ is the class mean and $\Sigma_c$ is the class-wise covariance matrix.
$$D_c(x) = x'\Sigma_c^{-1}\mu_c - \frac{1}{2}\mu_c'\Sigma_c^{-1}\mu_c + \log \pi_c$$
$\pi_c$ is the prior probability of class $c$ and $\sum_{c=1}^{C} \pi_c = 1$.
$$c^* = \arg\max_{c = 1, \dots, C} D_c(x)$$
Two classes
Assign $x$ to class 1 if
$$\log\frac{\pi_1}{\pi_2} + \left(x - \frac{1}{2}(\mu_1 + \mu_2)\right)'\Sigma^{-1}(\mu_1 - \mu_2) \ge 0$$
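The two-class rule can be illustrated with a few lines of NumPy. This is only a sketch with the population parameters treated as known; the function names and the toy values are assumptions made for illustration and are not taken from the slides.

```python
import numpy as np

def lda_two_class_score(x, mu1, mu2, sigma, pi1=0.5, pi2=0.5):
    """log(pi1/pi2) + (x - (mu1 + mu2)/2)' Sigma^{-1} (mu1 - mu2)."""
    x, mu1, mu2 = (np.asarray(v, dtype=float) for v in (x, mu1, mu2))
    # Solve Sigma w = mu1 - mu2 rather than forming the inverse explicitly.
    w = np.linalg.solve(np.asarray(sigma, dtype=float), mu1 - mu2)
    return np.log(pi1 / pi2) + (x - 0.5 * (mu1 + mu2)) @ w

def assign_class(x, mu1, mu2, sigma, pi1=0.5, pi2=0.5):
    """Class 1 if the score is non-negative, class 2 otherwise."""
    return 1 if lda_two_class_score(x, mu1, mu2, sigma, pi1, pi2) >= 0 else 2

# Toy parameters (made up): two features, identity covariance.
mu1 = np.array([1.0, 0.0])
mu2 = np.array([-1.0, 0.0])
sigma = np.eye(2)

label_a = assign_class([0.5, 2.0], mu1, mu2, sigma)
label_b = assign_class([-0.5, 2.0], mu1, mu2, sigma)
print(label_a, label_b)
```

With equal priors the log-ratio term vanishes and only the position of $x$ relative to the midpoint of the class means matters.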
High dimensionality
It is the relation between the number of observations ($n$) and the number of features ($p$) that decides the dimensionality of the data.
In the case with more features than available observations, the problem is said to be "high-dimensional" ($p > n$).
Standard asymptotics: $p$ is fixed and $n \to \infty$
Asymptotics when $p$ is not fixed: $p = p_n$ grows with $n$, $p_n \gg n$ for $n \to \infty$
gLasso
Using the correlation matrix $K^{-1}$ instead of the covariance matrix allows for faster convergence in the high-dimensional setting, Rothman et al. (2008).
Let $\Gamma$ denote the diagonal matrix of true standard deviations
$$\Sigma^{-1} = \Gamma^{-1} K \Gamma^{-1}$$
Sparse inverse covariance estimation with the graphical lasso, Friedman et al. (2009)
$$\hat{K}_\lambda = \arg\min_{K \succ 0} \left\{ \mathrm{trace}\left(K \hat{K}^{-1}\right) - \log\det K + \lambda \left\| K^{-1} \right\|_1 \right\}$$
Cuthill-McKee ordering
Reducing the bandwidth of sparse symmetric matrices, Cuthill & McKee (1969)
Let $S$ be a $p \times p$ symmetric matrix where $i$ denotes rows and $j$ denotes columns.
The bandwidth of $S$ is the maximum value of $|i - j|$ over the non-zero elements.
Determine a permutation matrix $P$ such that the non-zero elements cluster about the main diagonal
$$S_C = P S P^T$$
$$S_C = P S P^T$$
[Figure: the matrices $S$ and $S_C$]
True block-structure
1 0 0 0 1 0 1 0 0 1
0 1 0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0 1 0
0 1 0 1 0 0 0 1 0 0
1 0 0 0 1 0 1 0 0 1
0 0 1 0 0 1 0 0 1 0
1 0 0 0 1 0 1 0 0 1
0 1 0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0 1 0
1 0 0 0 1 0 1 0 0 1
S=
1 1 1 1 0 0 0 0 0 01 1 1 1 0 0 0 0 0 01 1 1 1 0 0 0 0 0 01 1 1 1 0 0 0 0 0 00 0 0 0 1 1 1 0 0 00 0 0 0 1 1 1 0 0 00 0 0 0 1 1 1 0 0 00 0 0 0 0 0 0 1 1 10 0 0 0 0 0 0 1 1 10 0 0 0 0 0 0 1 1 1
SC=
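The reordering can be tried on the 10 × 10 pattern of this slide. The sketch below uses SciPy's `reverse_cuthill_mckee`, which implements the reverse variant of the Cuthill-McKee ordering, so the blocks may come out in a different order than on the slide; the helper `bandwidth` and the variable names are assumptions for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(m):
    """Maximum |i - j| over the non-zero entries of m."""
    rows, cols = np.nonzero(m)
    return int(np.max(np.abs(rows - cols)))

# Rebuild the scattered pattern S: three groups of variables (sizes 4, 3, 3)
# whose members are all connected to each other (0-based indices).
groups = [[0, 4, 6, 9], [1, 3, 7], [2, 5, 8]]
S = np.zeros((10, 10), dtype=int)
for g in groups:
    S[np.ix_(g, g)] = 1

# perm[k] is the original index placed at position k of the new ordering.
perm = reverse_cuthill_mckee(csr_matrix(S), symmetric_mode=True)

# Permuting rows and columns together corresponds to P S P^T.
S_C = S[np.ix_(perm, perm)]

bw_before = bandwidth(S)
bw_after = bandwidth(S_C)
print(bw_before, bw_after)
```

Because the three groups are disconnected from each other, each one ends up contiguous in the new ordering, and the bandwidth drops to the size of the largest block minus one.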
Algorithm
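Combining gLasso and Cuthill-McKee ordering
Bootstrap sample, calculate $\hat{K}_j^{-1}$
Estimate $\hat{K}_j[\lambda_i]$ with gLasso
$$S_{ij} = \mathbf{1}\left\{\hat{K}_j[\lambda_i] > 0\right\}$$
$$\tilde{S}_{ik} = \mathbf{1}\left\{\frac{1}{r}\sum_{j=1}^{r} S_{ij} > q_k\right\}$$
Find permutation matrix $P_{ik}$ for skeleton $\tilde{S}_{ik}$ with Cuthill-McKee ordering algorithm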
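The thresholding step that turns $r$ bootstrap estimates into one skeleton can be sketched as follows. The gLasso fits themselves are not reproduced here: the function takes already estimated matrices as input, and both the name `stable_skeleton` and the tolerance `tol` (an entry counts as selected when its absolute value exceeds `tol`) are assumptions for illustration.

```python
import numpy as np

def stable_skeleton(estimates, q, tol=1e-8):
    """0/1 matrix of entries selected in more than a fraction q of the estimates."""
    # Indicator pattern S_j of each bootstrap estimate (tol is an assumption).
    patterns = [(np.abs(np.asarray(k, dtype=float)) > tol).astype(float)
                for k in estimates]
    frequency = np.mean(patterns, axis=0)  # (1/r) * sum_j S_j
    return (frequency > q).astype(int)

# Three made-up 3 x 3 estimates standing in for r = 3 bootstrap gLasso fits.
k1 = [[1.0, 0.20, 0.0], [0.20, 1.0, 0.0], [0.0, 0.0, 1.0]]
k2 = [[1.0, 0.30, 0.0], [0.30, 1.0, 0.1], [0.0, 0.1, 1.0]]
k3 = [[1.0, 0.25, 0.0], [0.25, 1.0, 0.0], [0.0, 0.0, 1.0]]

skeleton = stable_skeleton([k1, k2, k3], q=0.5)
print(skeleton)
```

The edge between the first two variables appears in all three estimates and is kept, while the edge that appears only once falls below the limit `q` and is dropped.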
Identifying block-structure
[Figure: panels (a), (b) and (c). Panel titles: M1 ($\lambda = 0.63$, limit $= 0.99$), M2 ($\lambda = 0.66$, limit $= 0.99$), M3 ($\lambda = 0.66$, limit $= 0.99$), M4 ($\lambda = 0.65$, limit $= 0.99$)]
Additive classifier
Block diagonal segmentation
$$\Sigma^{-1} = \mathrm{diag}\left[\Sigma_1^{-1}, \dots, \Sigma_b^{-1}\right]$$
where $b$ is the number of blocks. Both the class means $\mu_c$ and the observed vector $x$ can be partitioned into $b$ disjoint subsets $\mu_{c,i} = (\mu_{c,i_1}, \dots, \mu_{c,i_{p_i}})$ and $x_i = (x_{i_1}, \dots, x_{i_{p_i}})$, where $p_i$ is the block size, $i = 1, \dots, b$, such that for any $i \ne j$, $x_i$ and $x_j$ are conditionally independent given the class variable $y$.
Two-class linear function with additive structure
$$D(x) = \sum_{i=1}^{b} \left(x_i - \frac{1}{2}(\mu_{1,i} + \mu_{2,i})\right)'\Sigma_i^{-1}(\mu_{1,i} - \mu_{2,i})$$
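A small numerical check shows that the block-wise sum agrees with the full linear rule when $\Sigma$ is block diagonal. The sketch assumes known parameters and equal priors; the function name, the block layout and all numbers are made up for illustration.

```python
import numpy as np

def additive_score(x, mu1, mu2, block_covs, blocks):
    """Sum over blocks of (x_i - (mu1_i + mu2_i)/2)' Sigma_i^{-1} (mu1_i - mu2_i)."""
    x, mu1, mu2 = (np.asarray(v, dtype=float) for v in (x, mu1, mu2))
    total = 0.0
    for idx, sigma_i in zip(blocks, block_covs):
        shift = mu1[idx] - mu2[idx]
        centred = x[idx] - 0.5 * (mu1[idx] + mu2[idx])
        total += centred @ np.linalg.solve(sigma_i, shift)
    return total

# Hypothetical example: p = 3 features split into blocks {0, 1} and {2}.
blocks = [[0, 1], [2]]
block_covs = [np.array([[2.0, 0.5], [0.5, 1.0]]), np.array([[3.0]])]
mu1 = np.array([1.0, 0.0, 2.0])
mu2 = np.array([-1.0, 1.0, 0.0])
x = np.array([0.3, -0.2, 1.5])

# Full rule with the assembled block-diagonal covariance, for comparison.
sigma = np.zeros((3, 3))
for idx, sigma_i in zip(blocks, block_covs):
    sigma[np.ix_(idx, idx)] = sigma_i
full_score = (x - 0.5 * (mu1 + mu2)) @ np.linalg.solve(sigma, mu1 - mu2)

block_score = additive_score(x, mu1, mu2, block_covs, blocks)
print(np.isclose(block_score, full_score))
```

Only the small per-block systems are solved in `additive_score`, which is the practical appeal of the additive structure when $p$ is large.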
Mahalanobis distance
Let $\pi_1 = \pi_2 = 1/2$; then the optimal misclassification probability can be expressed as
$$\varepsilon = \Phi\left(-\frac{1}{2}\sqrt{\delta^2}\right)$$
where $\Phi(\cdot)$ is the Gaussian cumulative distribution function and $\delta^2 = \left\|\Sigma^{-1/2}\mu\right\|^2$ is the squared Mahalanobis norm of the shift vector, where $\mu = \mu_1 - \mu_2$ is the shift vector and $\|\cdot\|$ denotes the $\ell_2$ norm.
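As a numerical illustration of the formula, the following sketch computes $\delta^2$ and the corresponding error probability. The function names and the toy parameters are assumptions; $\Phi$ is evaluated through the error function from the standard library.

```python
import math
import numpy as np

def mahalanobis_sq(mu1, mu2, sigma):
    """delta^2 = (mu1 - mu2)' Sigma^{-1} (mu1 - mu2)."""
    shift = np.asarray(mu1, dtype=float) - np.asarray(mu2, dtype=float)
    return float(shift @ np.linalg.solve(np.asarray(sigma, dtype=float), shift))

def optimal_error(delta_sq):
    """Phi(-sqrt(delta_sq) / 2), using Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    z = -0.5 * math.sqrt(delta_sq)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Toy parameters (made up): class means two units apart, identity covariance.
delta_sq = mahalanobis_sq([1.0, 0.0], [-1.0, 0.0], np.eye(2))
error = optimal_error(delta_sq)
print(delta_sq, error)
```

With no separation ($\delta^2 = 0$) the formula gives an error of one half, and the error decreases monotonically as $\delta^2$ grows.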
Separation strength
The $i$th block separation strength
$$\delta_i^2 = \left\|\Sigma_i^{-1/2}\mu_i\right\|^2$$
Rescaled estimate of the $i$th separation strength
$$S_i^2 = \eta\,\hat{\mu}_i'\hat{\Sigma}_i^{-1}\hat{\mu}_i$$
where $\eta = \frac{n_1 n_2}{n}$, $\hat{\mu}_i = \hat{\mu}_{1i} - \hat{\mu}_{2i}$ is the shift vector of the sample class means and $\hat{\Sigma}_i$ is the maximum likelihood estimate of the covariance matrix of the $i$th block. $S^2 \sim \chi^2(p_0, \omega^2)$, where $p_0$ is the degrees of freedom and $\omega^2 = \eta\delta^2$ is the non-centrality parameter.
Misclassification
[Figure: two panels, "Data from West et al." and "Data from Pawitan et al."; x-axis: number of variables; y-axis: misclassification, LDA]
Data from West et al. (2001) and Pawitan et al. (2005)
Sparse and weak setting
[Figure: two histograms, "Traditional mixture" and "Sparse and weak mixture"; y-axis: frequency; legend: non informative, informative]
Detection boundary
$\beta$: sparsity parameter, proportion of non-null effect
$\omega^2$: weakness parameter, effect strength
Asymptotic sparse and weak model
$\beta = b^{-\gamma}$ where $\gamma \in (0, 1)$
$\omega^2 = 2r \log b$ where $r \in (0, 1)$
For $p_0 = 1$ and $p = b$
$H_0: S_i \sim N(0, 1)$ i.i.d., $1 \le i \le p$
$H_1: S_i \sim (1 - \beta)N(0, 1) + \beta N\left(\sqrt{\omega^2}, 1\right)$ i.i.d., $1 \le i \le p$
Ingster (1999), Donoho & Jin (2008), Klaus & Strimmer (2013)
[Figure: phase diagram in the $(\gamma, r)$ plane, with $\gamma$ from 0.5 to 1.0 and $r$ from 0.0 to 1.0; regions labelled undetectable, detectable and estimable]
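The curve separating the undetectable from the detectable region is not written out on the slide. For the normal mixture above, with $\beta = p^{-\gamma}$ and $\omega^2 = 2r \log p$, the detection boundary given by Donoho & Jin (2004) is $r = \gamma - 1/2$ for $1/2 < \gamma \le 3/4$ and $r = (1 - \sqrt{1 - \gamma})^2$ for $3/4 < \gamma < 1$. The sketch below encodes only that curve; the function names are assumptions, and the boundary of the estimable region is not covered.

```python
import math

def detection_boundary(gamma):
    """Detection boundary r(gamma) for the sparse normal mixture, 1/2 < gamma < 1."""
    if not 0.5 < gamma < 1.0:
        raise ValueError("gamma must lie in (1/2, 1)")
    if gamma <= 0.75:
        return gamma - 0.5
    return (1.0 - math.sqrt(1.0 - gamma)) ** 2

def is_detectable(gamma, r):
    """True if (gamma, r) lies strictly above the detection boundary."""
    return r > detection_boundary(gamma)

print(detection_boundary(0.75))
print(is_detectable(0.6, 0.2), is_detectable(0.9, 0.2))
```

The two branches meet at $\gamma = 3/4$, where both give $r = 1/4$, so the curve is continuous.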
Detection boundary for blocks
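$H_0: S_i^2 \sim \chi^2_{p_0}(\cdot\,; 0)$ i.i.d., $1 \le i \le b$
$H_1: S_i^2 \sim (1 - \beta)\chi^2_{p_0}(\cdot\,; 0) + \beta\chi^2_{p_0}(\cdot\,; \omega^2)$ i.i.d., $1 \le i \le b$
$$LR_i = \frac{f_{H_1}(s_i^2)}{f_{H_0}(s_i^2)} = \frac{(1 - \beta)\chi^2_{p_0}(s_i^2; 0) + \beta\chi^2_{p_0}(s_i^2; \omega^2)}{\chi^2_{p_0}(s_i^2; 0)}$$
$$LR_b = LR_b(s_1^2, s_2^2, \dots, s_b^2; \beta, \omega^2)$$
Reject $H_0$ iff
$$\log(LR_b) > 0$$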
Estimating proportion of non-null effect
Let $\pi_i$ denote the significance level ("p-value")
Under the global null hypothesis
$$\pi_i \sim U(0, 1)$$
where $U$ denotes the uniform distribution.
$\beta$ denotes the proportion of false null hypotheses
$$\beta = \frac{1}{b}\sum_{i=1}^{b} \mathbf{1}\{\omega_i \ne 0\}$$
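Rank the p-values in increasing order $\pi_{(1)} \le \pi_{(2)} \le \dots \le \pi_{(b)}$
Storey & Tibshirani (2003)
$$\hat{\beta} = 1 - \frac{\sum_{i=1}^{b} \mathbf{1}\{\pi_{(i)} > t\}}{(1 - t)\,b}$$
Meinshausen & Rice (2006)
$$\hat{\beta} = \max_{t \in (0, 1)} \frac{F_b(t) - t - B_{b,\alpha}\Delta(t)}{1 - t}$$
where $F_b(t) = \frac{1}{b}\sum_{i=1}^{b} \mathbf{1}\{\pi_i \le t\}$ is the empirical distribution of p-values, $\Delta(t) = \sqrt{t(1 - t)}$ the standard deviation-proportional bounding function and $B_{b,\alpha}$ the bounding sequence for $\Delta(t)$ at level $\alpha$ given by $B_{b,\alpha} = \frac{G^{-1}(1 - \alpha) + m_b}{n_b}$, where $G$ is the Gumbel distribution, $m_b = 2\log_2 b + \frac{1}{2}\log_3 b - \frac{1}{2}\log 4\pi$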
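The Storey & Tibshirani estimator is simple enough to sketch directly from its formula. The function name, the default threshold `t = 0.5` and the artificial p-values are assumptions for illustration; the Meinshausen & Rice estimator is not implemented here because the slide does not define $n_b$.

```python
import numpy as np

def storey_proportion(pvalues, t=0.5):
    """beta_hat = 1 - #{p_i > t} / ((1 - t) * b)."""
    p = np.asarray(pvalues, dtype=float)
    b = p.size
    return 1.0 - np.sum(p > t) / ((1.0 - t) * b)

# Artificial p-values: 80 nulls spread evenly over (0, 1) and 20 very small
# p-values standing in for non-null effects, so the true proportion is 0.2.
null_p = (np.arange(1, 81) - 0.5) / 80
signal_p = np.full(20, 0.001)
pvalues = np.concatenate([null_p, signal_p])

beta_hat = storey_proportion(pvalues, t=0.5)
print(beta_hat)
```

The estimator relies on p-values above `t` coming almost entirely from true nulls, so in the sparse and weak setting, where non-null p-values are not all small, it can be expected to underestimate $\beta$.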
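Higher criticism
Donoho & Jin (2004, 2008, 2009)
$$HC_{i,\pi_{(i)}} = \sqrt{p}\,\frac{i/p - \pi_{(i)}}{\sqrt{i/p\,(1 - i/p)}}$$
$i = 1, \dots, p$ and for fixed $\alpha_0 \in (0, 1)$ the HC test statistic is
$$HC^* = \max_{1 \le i \le \alpha_0 \times p} HC_{i,\pi_{(i)}}$$
Block higher criticism
$$bHC_{i,\pi_{(i)}} = \sqrt{b}\,\frac{i/b - \pi_{(i)}}{\sqrt{i/b\,(1 - i/b)}}$$
$i = 1, \dots, b$ and for fixed $\alpha_0 \in (0, 1)$ the bHC test statistic is
$$bHC^* = \max_{1 \le i \le \alpha_0 \times b} bHC_{i,\pi_{(i)}}$$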
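The statistic can be computed from a vector of block p-values in a few lines. This is a sketch of the formula as written above, not the authors' implementation; the function name, the default `alpha0 = 0.5` and the example p-values are assumptions.

```python
import numpy as np

def higher_criticism(pvalues, alpha0=0.5):
    """Return (HC*, i*) with the maximum taken over 1 <= i <= alpha0 * b."""
    p_sorted = np.sort(np.asarray(pvalues, dtype=float))
    b = p_sorted.size
    i = np.arange(1, b + 1)
    keep = i <= alpha0 * b  # alpha0 < 1 also excludes i = b, where the denominator is 0
    if not 0.0 < alpha0 < 1.0 or not keep.any():
        raise ValueError("need 0 < alpha0 < 1 and alpha0 * b >= 1")
    frac = i[keep] / b
    hc = np.sqrt(b) * (frac - p_sorted[keep]) / np.sqrt(frac * (1.0 - frac))
    best = int(np.argmax(hc))
    return float(hc[best]), best + 1

# p-values lying exactly on the uniform quantiles i/b give HC_i = 0 for all i.
uniform_like = np.arange(1, 11) / 10
hc_null, _ = higher_criticism(uniform_like)

# One very small p-value among otherwise unremarkable ones (made-up numbers).
with_signal = [1e-6, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]
hc_signal, i_star = higher_criticism(with_signal)
print(hc_null, hc_signal, i_star)
```

The returned index `i_star` is the rank at which the maximum is attained; in HC thresholding the p-values ranked at or below that index are the ones selected.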
False Discovery Rates
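Benjamini & Hochberg (1995)
$$\mathrm{Fdr}(\pi_i) = P(\text{Non-informative} \mid \Pi \le \pi_i) = \frac{(1 - \beta)\pi_i}{F(\pi_i)} \le \frac{p}{i}\,\pi_{(i)}$$
Efron et al. (2001), Efron (2004)
$$\mathrm{Lfdr}(\pi_i) = P\{\text{Non-informative} \mid \pi_i\} = \frac{(1 - \beta) f_{H_0}(\pi_i)}{f(\pi_i)}$$
Sun & Cai (2007)
$$\mathrm{Ofdr}(s_i^2) = \frac{(1 - \beta)\chi^2_{p_0}(s_i^2; 0)}{(1 - \beta)\chi^2_{p_0}(s_i^2; 0) + \beta\chi^2_{p_0}(s_i^2; \omega^2)}$$
$$k = \max_{1 \le i \le b}\left\{ i : \frac{1}{i}\sum_{j=1}^{i} \mathrm{Ofdr}_{(j)} \le \alpha \right\}$$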
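The last display is a step-up rule on the ordered Ofdr values: sort them in increasing order and keep the largest $i$ whose running mean stays at or below $\alpha$. A minimal sketch, taking the Ofdr values as given; the function name and the example values are assumptions.

```python
import numpy as np

def ofdr_step_up(ofdr, alpha):
    """Return (k, indices of the k selected blocks) for the running-mean rule."""
    values = np.asarray(ofdr, dtype=float)
    order = np.argsort(values)  # blocks from smallest to largest Ofdr
    running_mean = np.cumsum(values[order]) / np.arange(1, values.size + 1)
    passing = np.nonzero(running_mean <= alpha)[0]
    k = int(passing[-1]) + 1 if passing.size else 0
    return k, order[:k]

# Made-up Ofdr values for four blocks.
k, selected = ofdr_step_up([0.5, 0.01, 0.9, 0.02], alpha=0.1)
print(k, selected)
```

Because the values are sorted, the running mean is non-decreasing, so the last index that passes the threshold is also the maximum in the definition of $k$.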
Simulated data
                  bHC                                     Fdr
                  b̃            fpr          mc            b̃           fpr          mc
β      ε          m    sd      m     sd     m     sd      m    sd     m     sd     m     sd
0.010  0.005      36   41.1    0.03  0.04   0.02  0.02    11   5.9    0.00  0.00   0.06  0.08
0.020  0.005      42   64.5    0.03  0.06   0.04  0.03    12   7.6    0.00  0.00   0.09  0.09
0.030  0.005      94   106.6   0.08  0.10   0.04  0.05    7    7.7    0.00  0.00   0.17  0.11
0.010  0.010      39   42.1    0.03  0.04   0.03  0.03    9    6.1    0.00  0.00   0.10  0.10
0.020  0.010      62   68.8    0.05  0.07   0.04  0.05    7    7.2    0.00  0.00   0.15  0.11
0.030  0.010      99   103.7   0.09  0.10   0.05  0.08    3    4.0    0.00  0.00   0.26  0.10
0.010  0.050      53   55.0    0.05  0.05   0.06  0.07    2    2.1    0.00  0.00   0.26  0.08
0.020  0.050      30   58.2    0.03  0.06   0.14  0.11    1    0.4    0.00  0.00   0.33  0.05
0.030  0.050      21   24.5    0.02  0.02   0.16  0.13    1    0.4    0.00  0.00   0.34  0.04
0.010  0.100      21   21.2    0.02  0.02   0.14  0.12    1    0.6    0.00  0.00   0.32  0.05
0.020  0.100      14   20.5    0.01  0.02   0.20  0.13    1    0.2    0.00  0.00   0.34  0.03
0.030  0.100      19   34.4    0.02  0.03   0.17  0.12    1    0.3    0.00  0.00   0.33  0.05
0.010  0.150      23   32.1    0.02  0.03   0.18  0.13    1    0.2    0.00  0.00   0.34  0.04
0.020  0.150      12   16.1    0.01  0.02   0.22  0.12    1    0.0    0.00  0.00   0.34  0.04
0.030  0.150      17   31.1    0.02  0.03   0.18  0.12    1    0.0    0.00  0.00   0.34  0.03

Table: Number of blocks selected as informative ($\tilde{b}$), proportion of falsely selected blocks (fpr) and the misclassification rate (mc) averaged over 100 runs, presented as mean (m) and standard deviation (sd) for block size $p_0 = 20$.
Real data

Block    No. selected blocks      Misclassification rate
size     bHC    Fdr    Lfdr       bHC    Fdr    Lfdr   All

Breast cancer data I
1        657    999    583        0.24   0.23   0.24   -
2        328    1461   804        0.23   0.23   0.22   0.28
5        131    1219   929        0.22   0.27   0.25   0.26
10       65     657    601        0.20   0.26   0.25   0.28
15       43     438    393        0.21   0.28   0.26   0.28
20       32     328    308        0.19   0.28   0.28   0.26
30       21     219    209        0.14   0.26   0.25   0.30
40       16     164    156        0.17   0.26   0.25   0.26
50       13     131    121        0.15   0.25   0.26   0.25

Prostate cancer data
1        126    808    442        0.10   0.20   0.13   -
2        630    5547   4082       0.14   0.38   0.35   0.36
5        252    2520   2427       0.09   0.26   0.28   0.25
10       126    1260   1254       0.08   0.19   0.19   0.17
15       84     840    838        0.07   0.14   0.13   0.15
20       63     630    620        0.06   0.13   0.12   0.13
30       42     420    411        0.05   0.11   0.13   0.10
40       31     315    309        0.05   0.11   0.09   0.12
50       25     252    247        0.04   0.12   0.09   0.13

Breast cancer data II
1        712    165    61         0.02   0.08   0.04   -
2        712    1254   696        0.06   0.06   0.04   0.10
5        285    1313   759        0.04   0.08   0.06   0.14
10       142    712    705        0.04   0.12   0.10   0.08
15       95     475    443        0.06   0.10   0.14   0.16
20       71     356    351        0.02   0.16   0.10   0.10
30       1      237    235        0.10   0.08   0.08   0.10
40       1      178    176
50
Real data

Block    No. inf. blocks          Misclassification rate
size     bHC    Fdr    Lfdr       bHC    Fdr    Lfdr   All

Breast cancer data I
1        328    550    315        0.22   0.21   0.23   -
2        164    614    342        0.22   0.22   0.21   0.28
5        65     565    384        0.21   0.26   0.23   0.26
10       32     328    286        0.21   0.30   0.28   0.27
15       21     219    188        0.20   0.27   0.28   0.26
20       16     164    152        0.16   0.28   0.27   0.28
30       10     109    105        0.18   0.24   0.28   0.26
40       8      82     77         0.18   0.25   0.27   0.26
50       6      65     60         0.20   0.27   0.24   0.26

Prostate cancer data
1        315    89     43         0.36   0.34   0.28   -
2        157    150    54         0.27   0.30   0.23   0.38
5        63     260    99         0.16   0.25   0.20   0.31
10       31     218    101        0.09   0.19   0.14   0.25
15       21     188    127        0.06   0.21   0.16   0.23
20       15     155    134        0.11   0.18   0.16   0.19
30       10     105    97         0.11   0.14   0.10   0.15
40       7      78     72         0.09   0.10   0.12   0.13
50       6      63     60         0.15   0.14   0.14   0.16

Breast cancer data II
1        356    230    133        0.22   0.20   0.22   -
2        307    217    97         0.02   0.02   0.02   0.27
5        142    461    289        0.02   0.06   0.02   0.14
10       71     356    271        0.02   0.12   0.06   0.14
15       47     237    225        0.04   0.12   0.16   0.12
20       35     178    174        0.02   0.12   0.12   0.18
30       1      118    115        0.12   0.10   0.08   0.10
40       1      89     87         0.18   0.16   0.12   0.16
50
Thank You
Benjamini, Y. & Hochberg, Y. (1995), ‘Controlling the false discovery rate: A practical and powerful approach to multiple testing’, Journal of the Royal Statistical Society. Series B (Methodological) 57(1), pp. 289–300. URL: http://www.jstor.org/stable/2346101
Cuthill, E. & McKee, J. (1969), Reducing the bandwidth of sparse symmetric matrices, in ‘Proceedings of the 1969 24th national conference’, ACM ’69, ACM, New York, NY, USA, pp. 157–172. URL: http://doi.acm.org/10.1145/800195.805928
Donoho, D. & Jin, J. (2004), ‘Higher criticism for detecting sparse heterogeneous mixtures’, Ann. Statist pp. 962–994.
Donoho, D. & Jin, J. (2008), ‘Higher criticism thresholding: Optimal feature selection when useful features are rare and weak’, Proceedings of the National Academy of Sciences 105(39), 14790–14795. URL: http://www.pnas.org/content/105/39/14790.abstract
Donoho, D. & Jin, J. (2009), ‘Feature selection by higher criticism thresholding achieves the optimal phase diagram’, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 367(1906), 4449–4470. URL: http://rsta.royalsocietypublishing.org/content/367/1906/4449.abstract
Efron, B. (2004), Local false discovery rates, Technical report, Department of Statistics, Stanford University.
Efron, B., Storey, J. D. & Tibshirani, R. (2001), ‘Microarrays, empirical bayes methods, and false discovery rates’, Genet. Epidemiol 23, 70–86.
Friedman, J., Hastie, T. & Tibshirani, R. (2009), Graphical lasso - estimation of Gaussian graphical models. Manual to the R-package glasso.
Ingster, Y. I. (1999), ‘Minimax detection of a signal for lpn balls’, Mathematical Methods of Statistics 7, 401–428.
Klaus, B. & Strimmer, K. (2013), ‘Signal identification for rare and weak features: higher criticism or false discovery rates?’, Biostatistics 14, 129.
Meinshausen, N. & Rice, J. (2006), ‘Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses’, Annals of Statistics 34(1), 373–393.
Pawitan, Y., Bjohle, J., Amler, L., Borg, A., Egyhazi, S., Hall, P., Han, X., Holmberg, L., Huang, F., Klaar, S., Liu, E. T., Miller, L., Nordgren, H., Ploner, A., Sandelin, K., Shaw, P. M., Smeds, J., Skoog, L., Wedren, S. & Bergh, J. (2005), ‘Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts’, Breast Cancer Research 7(6), 953–964.
Rothman, A., Bickel, P., Levina, E. & Zhu, J. (2008), ‘Sparse permutation invariant covariance estimation’, Electronic Journal of Statistics 2, 494–515.
Storey, J. D. & Tibshirani, R. (2003), ‘Statistical significance for genomewide studies’, Proceedings of the National Academy of Sciences 100(16), 9440–9445. URL: http://www.pnas.org/content/100/16/9440.abstract
Sun, W. & Cai, T. T. (2007), ‘Oracle and adaptive compound decision rules for false discovery rate control’, Journal of the American Statistical Association 102(479), 901–912. URL: http://amstat.tandfonline.com/doi/abs/10.1198/016214507000000545
West, M., Blanchette, C., Dressman, H., Huang, E., Ishida, S., Spang, R., Zuzan, H., Olson, J. A., Marks, J. R. & Nevins, J. R. (2001), ‘Predicting the clinical status of human breast cancer by using gene expression profiles’, PNAS 98(20), 11462–11467.