Testing a single regression coefficient in high
dimensional linear models
Wei Lan
Ping-Shou Zhong Runze Li
Hansheng Wang Chih-Ling Tsai
Technical Report Series #13-125
College of Health and Human Development The Pennsylvania State University
KEYWORDS: False discovery rate; key confounder controlling; high dimensional data; single coefficient test
Address correspondence to Wei Lan, Statistics School and Center of Statistical Research, Southwestern University of Finance and Economics, Chengdu, P.R. China. [email protected].
Testing a Single Regression Coefficient in High Dimensional Linear Models
Wei Lan, Ping-Shou Zhong, Runze Li, Hansheng Wang and Chih-Ling Tsai
November 30, 2013
Abstract
In linear regression models with high dimensional data, the classical z-test (or t-test) for testing the significance of each single regression coefficient is no longer applicable. This is mainly because the number of covariates exceeds the sample size. In this paper, we propose a simple and novel alternative by introducing the Key conFounder Controlling (KFC) method to control for predictors that are highly correlated with the target covariate. Accordingly, the classical ordinary least squares approach can be employed to obtain the parameter estimator. In addition, we demonstrate that the resulting estimator is consistent and asymptotically normal. This enables us to apply the z-test to assess the significance of each covariate. Based on the p-value obtained from testing the significance of each covariate, we further conduct multiple hypothesis testing by controlling the false discovery rate at the nominal level. Then, we show that the multiple hypothesis testing achieves consistent model selection. Simulation studies and empirical examples are presented to illustrate the finite sample performance and the usefulness of the proposed method, respectively.
∗Wei Lan is Assistant Professor, Statistics School and Center of Statistical Research, Southwestern Uni-versity of Finance and Economics, Chengdu, P.R. China. Email: [email protected]. Ping-Shou Zhongis Assistant Professor, Department of Statistics and Probability, Michigan State University, East Lansing,MI 48823. Email: [email protected]. His research was supported by NSF grant DMS-1309156. RunzeLi is Distinguished Professor, Department of Statistics and The Methodology Center, The PennsylvaniaState University, University Park, PA 16802-2111. Email: [email protected]. His research was supported byNational Institute on Drug Abuse (NIDA) grant P50-DA10075, and National Cancer Institute (NCI) grantR01 CA168676. The content is solely the responsibility of the authors and does not necessarily representthe official views of the NIH or NIDA. Hansheng Wang is Professor, Department of Business Statistics andEconometrics, Guanghua School of Management, Peking University, Beijing, 100871, P.R. China. Email:[email protected]. His research was supported in part by National Natural Science Foundation ofChina (NSFC, 11131002, 11271032), Fox Ying Tong Education Foundation, the Business Intelligence Re-search Center at Peking University, and the Center for Statistical Science at Peking University. Chih-LingTsai is Robert W. Glock Chair professor, Graduate School of Management, University of California, Davis,CA 95616-8609 E-mail: [email protected]
1. INTRODUCTION
In linear regression models, it is a common practice to employ the z-test (or t-test)
to assess whether an individual predictor (or covariate) is significant when the number of
covariates (p) is smaller than the sample size (n). This test has been widely applied across
various fields and is available in most statistical software. One usually applies the ordinary
least squares (OLS) approach to estimate regression coefficients and standard errors for
constructing a z-test (or t-test); see, for example, Draper and Smith (1998). However, the
z-test (or t-test) is no longer applicable in a high dimensional linear model since p exceeds n.
This motivates us to modify the classical z-test (or t-test) to accommodate high dimensional
data.
In high dimensional regression analysis, hypothesis testing has attracted considerable at-
tention (Goeman et al., 2006, 2011; Zhong and Chen, 2011). Since these papers mainly focus
on testing a large set of coefficients against a high dimensional alternative, their approaches
are not applicable for testing the significance of a single coefficient. Hence, Buhlmann (2013)
recently applied the ridge estimation approach and obtained a test statistic to examine the
significance of an individual coefficient. His proposed test involves a bias correction, which
is different from the classical z-test (or t-test) via the OLS approach. In the meantime,
Zhang and Zhang (2013) proposed a low dimensional projection procedure to construct the
confidence intervals for a linear combination of a small subset of regression coefficients. The
key assumption behind their procedure is the existence of good initial estimators for the
unknown regression coefficients and the unknown standard deviation of random errors. To
this end, the penalty function with a tuning parameter is required to implement Zhang and
Zhang’s (2013) procedure. Later, van de Geer et al. (2013) extended the results of Zhang
and Zhang (2013) to broader models and general loss functions. In addition, they showed that
their estimator is optimal and pointed out that the ridge estimator of Buhlmann (2013) is
not optimal.
Instead of the ridge estimation and low dimensional projection, Fan and Lv (2008) used
the correlation approach to screen out those covariates that have weak correlations with the
response variable. As a result, the total number of predictors that are highly correlated
with the response variable is smaller than the sample size. Motivated by their approach,
we introduce the Key conFounder Controlling (KFC) method to control for predictors that
are highly correlated with the target covariate before a hypothesis test is conducted. If
the total number of key confounders resulting from the KFC procedure is smaller than
the sample size, their effects can be profiled out from both the response and the target
predictor via projections. Based on the profiled response and the profiled predictor, we are
able to employ a classical simple regression model to obtain the OLS estimate of the target
regression coefficient. We then demonstrate that the resulting estimator is √n-consistent
and asymptotically normal. Accordingly, a z-test statistic can be constructed for testing
the target coefficient. Under some mild conditions, we show that the p-values obtained by
the asymptotic normal distribution satisfy the weak dependence assumption of Storey et al.
(2004). As a result, the multiple hypothesis testing procedure of Storey et al. (2004) can be
directly applied to control the false discovery rate (FDR). Finally, we demonstrate that the
proposed multiple testing procedure achieves model selection consistency.
The rest of the article is organized as follows. Section 2 introduces model notation and
proposes the KFC method. The theoretical properties of hypothesis tests via the KFC as well
as the FDR procedures are obtained. Section 3 presents simulation studies, while Section 4
provides real data analyses. Some concluding remarks are given in Section 5. All technical
details are relegated to the Appendix.
2. THE METHODOLOGY
2.1. The KFC Method
Let (Yi, Xi) be a random vector collected from the i-th subject (1 ≤ i ≤ n), where Yi ∈ R1
is the response variable and Xi = (Xi1, · · · , Xip)> ∈ Rp is the associated p-dimensional
predictor vector with E(Xi) = 0 and cov(Xi) = Σ = (σj1j2) ∈ Rp×p. Unless explicitly stated
otherwise, we hereafter assume that p ≫ n and that n tends to infinity for asymptotic behavior.
In addition, the predictors have been standardized such that E(Xij) = 0 and var(Xij) = 1,
which yields σjj = 1 for every 1 ≤ j ≤ p. Then, consider the linear regression model,
Yi = X>i β + εi, (2.1)
where β = (β1, · · · , βp)> ∈ Rp is an unknown regression coefficient vector and the εi are i.i.d. random errors with mean 0 and finite variance σ2, independent of Xi. To assess the significance of a single coefficient, we test the null hypothesis H0 : βj = 0 for any given j. Without
loss of generality, we focus on testing the first regression coefficient. That is,
H0 : β1 = 0 vs. H1 : β1 ≠ 0, (2.2)
and the same testing procedure is applicable to the rest of the individual regression coeffi-
cients.
For the sake of convenience, let Y = (Y1, · · · , Yn)> ∈ Rn be a vector of responses, X = (X1, · · · , Xn)> ∈ Rn×p be the design matrix with j-th column Xj ∈ Rn, and E = (ε1, · · · , εn)> ∈ Rn. In addition, let I be an arbitrary index set with cardinality |I|. Then, define XiI = (Xij : j ∈ I)> ∈ R|I|, XI = (X1I , · · · , XnI)> = (Xj : j ∈ I) ∈ Rn×|I|, ΣI = (σj1j2 : j1 ∈ I, j2 ∈ I) ∈ R|I|×|I|, and ΣIj = Σ>jI = (σj1j2 : j1 ∈ I, j2 = j) ∈ R|I|. Moreover, define ΣIaIb = (σj1j2 : j1 ∈ Ia, j2 ∈ Ib) ∈ R|Ia|×|Ib| for any two arbitrary index sets Ia and Ib, which implies ΣII = ΣI.
Before constructing the test statistic, we first control those predictors that are highly
correlated with Xi1 and could generate a confounding effect. To this end, define a predictor Xij with j ≠ 1 to be a key confounder of Xi1 if |σ1j| > c for some pre-specified threshold value c. Then, denote S = {j : |σ1j| > c} to be the set of all key confounders associated with Xi1. The choice of c (i.e., of the KFC set S) will be discussed in Remark 1. To remove the confounding effect due to XiS, we construct the profiled response and predictor as Ỹ = QSY and X̃1 = QSX1, respectively, where QS = In − XS(X>SXS)^{−1}X>S ∈ Rn×n and In ∈ Rn×n is
an n by n identity matrix. We next follow the OLS approach and obtain the estimate of the
target coefficient β1,

β̂1 = (X̃>1 X̃1)^{−1}(X̃>1 Ỹ) = (X>1 QSX1)^{−1}(X>1 QSY).

We refer to the above procedure as the Key conFounder Controlling (KFC) method, to β̂1 as the KFC estimator of β1, and to S as the KFC set of Xi1.
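As a stand-alone sketch (not from the paper), the profiling and estimation steps above can be written in a few lines of NumPy. The function name, interface, and the fixed threshold `c` are our own illustrative choices; the data-driven choice of the KFC set is discussed in Remark 1 below.

```python
import numpy as np

def kfc_estimate(X, y, c=0.3, target=0):
    """KFC estimate of the coefficient of X[:, target] (illustrative sketch).

    X is an (n, p) standardized design matrix and y the response vector.
    The KFC set S collects the predictors whose sample correlation with the
    target exceeds the threshold c; their effect is profiled out with the
    projection Q_S = I - X_S (X_S' X_S)^{-1} X_S'.
    """
    n, p = X.shape
    corr = X.T @ X[:, target] / n                 # sample correlations with target
    S = [j for j in range(p) if j != target and abs(corr[j]) > c]
    Q = np.eye(n)
    if S:
        XS = X[:, S]
        Q -= XS @ np.linalg.solve(XS.T @ XS, XS.T)
    x1, yt = Q @ X[:, target], Q @ y              # profiled predictor and response
    beta1_hat = (x1 @ yt) / (x1 @ x1)             # OLS on the profiled data
    return beta1_hat, S
```

When no predictor passes the threshold, S is empty, QS is the identity, and the estimator reduces to the marginal univariate OLS slope.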
2.2. Asymptotic Normality of the KFC Estimator and Test Statistic
To study asymptotic properties of the KFC estimator β̂1, we define ρj1j2(S) = σj1j2 − Σ>Sj1 Σ_S^{−1} ΣSj2 ∈ R1, which measures the partial covariance of Xij1 and Xij2 after controlling for the effect of XiS = (Xij : j ∈ S)> ∈ R|S|. The following regularity conditions are assumed
throughout this paper.
(C1) Gaussian Condition. Assume that the Xis are independent and normally distributed
with mean 0 and covariance matrix Σ.
(C2) Identification Condition. Let λmin(A) and λmax(A) stand for, respectively, the minimum
and maximum eigenvalues of any arbitrary semi-positive definite matrix A. We then
require that 0 < λmin < λmin(Σ) ≤ λmax(Σ) < λmax < ∞ for some constants λmin and
λmax.
(C3) Predictor Dimension. There exist two positive constants ~ < 1 and ν > 0 such that
log p ≤ νn~ for every n > 0.
(C4) Regression Coefficients. Assume |β|1 = ∑_{j=1}^{p} |βj| < Cβ for some constant Cβ > 0.
(C5) Partial Covariance. There exists a constant ξ > 3/2 such that max_{j∉S} |ρ1j(S)| = O(|S|^{−ξ}) as |S| → ∞.
(C6) Dimension of the KFC Set. There exist a KFC set S and two positive constants Ca and Cb such that Ca n^{ν1} ≤ |S| ≤ Cb n^{ν2}, where ν1 and ν2 are two positive constants with 1/(2ξ) < ν1 ≤ ν2 < 1/3 and ~ + 3ν2 < 1.
Condition (C1) is a common condition used for high dimensional data to simplify theoretical
proofs; see for example, Wang (2009) and Zhang and Zhang (2013). This condition can be
relaxed to the sub-Gaussian random variables (Wang, 2012; Li et al., 2012) and our theoret-
ical results still hold. Condition (C2) assures the asymptotic identifiability. Otherwise, the
regression model (2.1) becomes asymptotically non-identifiable (Fan and Lv, 2008; Wang,
2009, 2012). By Condition (C3), the predictor dimension could diverge exponentially with
the sample size, so that the predictor dimension can be much larger than the sample size.
Condition (C4) is satisfied when β is sparse with only a finite number of nonzero coefficients. Condition (C5) requires that the partial covariance between the target covariate and any other predictor not belonging to S, after controlling for the effect of XiS (i.e., the key confounders), converges to 0 at a fast rate as |S| → ∞. This condition is satisfied
for many typical covariance structures, e.g., diagonal, compound symmetric, and autore-
gressive structures. It is worth noting that, according to (C1), the conditional distribution
of Xi1 given Xi,−1 = (Xi2, · · · , Xip)> ∈ Rp−1 remains normal. As a result, there exists a coefficient vector θ(1) = (θ(1),1, · · · , θ(1),p−1)> ∈ Rp−1 such that Xi1 = X>i,−1 θ(1) + ei1, where ei1 is a random error that is independent of Xi,−1. Furthermore, under the sparsity and regularity conditions given by Fan and Lv (2008), S ⊃ Sθ = {j : θ(1),j ≠ 0}, which implies that max_{j∉S} |ρ1j(S)| = 0. Hence, Condition (C5) is satisfied. Furthermore, this condition is
closely related to the assumption given in Theorem 5 of Zhang and Zhang (2013). Lastly,
Condition (C6) assures that the theoretical results in the theorem given below are valid even
though the size of the KFC set is much smaller than the sample size under Condition (C5).
It is a sufficient condition used to guarantee that max_{j∉S} |ρ1j(S)| is of order o(n^{−1/2}), so that the bias of β̂1 vanishes. Note that the size of the KFC set in Condition (C6) depends on the rate at which max_{j∉S} |ρ1j(S)| → 0. Thus, Condition (C6) can be dropped if θ(1) has finitely many non-zero elements or Σ follows an autoregressive structure, so that max_{j∉S} |ρ1j(S)| = O{exp(−ζ|S|^η)} for some positive constants ζ and η. Under the above conditions, we obtain the following result.
Theorem 1. Assume that Conditions (C1)-(C6) hold. We then have n^{1/2}(β̂1 − β1) →d N(0, σ2β1), where σ2β1 = τ2β1 (σ11 − Σ1S Σ_S^{−1} ΣS1)^{−1}, τ2β1 = β>S∗ (ΣS∗ − ΣS∗S+ Σ_{S+}^{−1} ΣS+S∗) βS∗ + σ2, S+ = {1} ∪ S, and S∗ = {j : j ∉ S+}.
Using the results of two lemmas in Appendix A, we are able to prove the above theorem; see
the detailed proofs in Appendix B. By Theorem 1, we construct the test statistic,

Z1 = n^{1/2} β̂1/σ̂β1, (2.3)

where σ̂2β1 = τ̂2β1 (n^{−1}X>1 QSX1)^{−1}, τ̂2β1 = (n − |S+|)^{−1} Ê>S+ ÊS+, ÊS+ is the residual vector obtained by regressing Yi on XiS+, and XiS+ = (Xij : j ∈ S+)>. Applying techniques similar to those used in the proof of Theorem 1, we can verify that σ̂2β1 is a consistent estimator of σ2β1. As a result, Z1 is asymptotically standard normal under H0, and one can reject the null hypothesis if |Z1| > z1−α/2, where zα stands for the α-th quantile of the standard normal distribution. Note that if p < n and S = {j : j ≠ 1}, the test statistic Z1 coincides with the classical z-test statistic.
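In effect, Z1 in (2.3) is the classical t-statistic from regressing Y on the target predictor together with its KFC set. A self-contained sketch (the function name and interface are ours, and the standard-normal tail is computed via erfc):

```python
import numpy as np
from math import erfc, sqrt

def kfc_z_test(X, y, S, target=0):
    """Z-statistic of (2.3) for H0: beta_target = 0, given a KFC set S (sketch)."""
    n = X.shape[0]
    Splus = [target] + [j for j in S if j != target]   # S+ = {target} U S
    XSp = X[:, Splus]
    coef, *_ = np.linalg.lstsq(XSp, y, rcond=None)
    resid = y - XSp @ coef                             # residuals from Y on X_{S+}
    tau2 = resid @ resid / (n - len(Splus))            # estimate of tau^2
    Q = np.eye(n)
    if len(Splus) > 1:
        XS = X[:, Splus[1:]]
        Q -= XS @ np.linalg.solve(XS.T @ XS, XS.T)     # Q_S profiles out S
    x1 = X[:, target]
    x1Qx1 = x1 @ Q @ x1
    beta1_hat = (x1 @ Q @ y) / x1Qx1                   # KFC estimate
    z = beta1_hat * sqrt(x1Qx1 / tau2)                 # = sqrt(n)*beta1_hat/sigma_hat
    pval = erfc(abs(z) / sqrt(2.0))                    # 2{1 - Phi(|z|)}
    return z, pval
```

The identity z = β̂1 {X>1 QSX1 / τ̂2}^{1/2} follows by substituting σ̂2β1 into (2.3).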
Remark 1. To make the testing procedure practically useful, one needs to select the KFC set S. According to Condition (C1), the random vector Xi follows a multivariate normal distribution. As a result, max_{j∉S+} |ρ1j(S)| = 0 if and only if Xi1 and Xij are conditionally independent given XiS for every j ∉ S+. This indicates that finding the KFC set is equivalent to identifying the nodes (variables) that have edges connected to the variable Xi1 in a Gaussian graph; see Kalisch and Buhlmann (2007). Hence, we adopt their approach to find the edges associated with Xi1. To this end, let ρ̂1j(S) = n^{−1} X̃>1 X̃j, with X̃j = QSXj, be the sample counterpart of ρ1j(S), and define F1j(S) = 2^{−1} log[{1 + ρ̂1j(S)}/{1 − ρ̂1j(S)}], in the spirit of Fisher's Z-transformation. Then, we select the smallest set S (i.e., the largest threshold value c) such that (n − |S| − 3)^{1/2} |F1j(S)| < z1−γ/2 for every j ∉ S+, where γ is a pre-specified significance level.
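One greedy way to implement this selection rule is to add, at each step, the remaining predictor with the largest partial correlation with the target, stopping once no Fisher-transformed partial correlation is significant. The greedy ordering and the interface below are our own assumptions; Remark 1 only specifies the stopping rule.

```python
import numpy as np
from math import log, sqrt
from statistics import NormalDist

def select_kfc_set(X, target=0, gamma=0.05):
    """Greedy sketch of the KFC-set selection rule in Remark 1."""
    n, p = X.shape
    zcrit = NormalDist().inv_cdf(1 - gamma / 2)          # z_{1-gamma/2}
    S, candidates = [], [j for j in range(p) if j != target]
    while candidates and len(S) < n // 3:                # keep |S| well below n
        Q = np.eye(n)
        if S:
            XS = X[:, S]
            Q -= XS @ np.linalg.solve(XS.T @ XS, XS.T)   # profile out current S
        x1 = Q @ X[:, target]
        best_j, best_stat = None, -1.0
        for j in candidates:
            xj = Q @ X[:, j]
            r = (x1 @ xj) / sqrt((x1 @ x1) * (xj @ xj))  # partial correlation
            f = 0.5 * log((1 + r) / (1 - r))             # Fisher Z-transform
            stat = sqrt(n - len(S) - 3) * abs(f)
            if stat > best_stat:
                best_j, best_stat = j, stat
        if best_stat < zcrit:                            # stopping rule of Remark 1
            break
        S.append(best_j)
        candidates.remove(best_j)
    return sorted(S)
```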
Remark 2. We use a single screening approach to obtain the KFC set of the target covariate,
which yields the KFC estimator of the target regression coefficient. On the other hand, Zhang
and Zhang (2013) employed the scaled lasso procedure (Sun and Zhang, 2012) to obtain the
initial estimators of all regression coefficients and the scale parameter estimator. Then, they
applied the classical lasso procedure to find the low dimensional projection vector. In sum,
Zhang and Zhang (2013) applied the lasso approach to find the low dimensional projection
estimator (LDPE) for the target regression coefficient. When X has orthogonal columns and
p < n, both approaches lead to the same parameter estimator as that obtained from the
marginal univariate regression (MUR). However, these two approaches are quite different,
and it seems nearly impossible to find the exact relationship between the KFC estimator
and LDPE when the columns of X are not orthogonal.
2.3. Controlling the False Discovery Rate (FDR)
In identifying significant coefficients among the high dimensional regression coefficients
βj (j = 1, · · · , p), a multiple testing procedure can be considered by testing H0j : βj = 0
simultaneously. Denote the p-value obtained by testing each individual null hypothesis, H0j,
as pj = 2{1 − Φ(|Zj|)
}, where Zj is the test statistic and can be constructed similarly to
that in equation (2.3). To guard against false discoveries, we next develop a procedure to
control the false discovery rate (Benjamini and Hochberg, 1995).
Let N0 = {j : βj = 0} be the set of variables whose associated coefficients are truly
zero and N1 = {j : βj 6= 0} be the set of variables whose associated coefficients are truly
nonzero. For any significance level t ∈ [0, 1], let V (t) = #{j ∈ N0 : pj ≤ t} be the number of
falsely rejected hypotheses, S(t) = #{j ∈ N1 : pj ≤ t} be the number of correctly rejected
hypotheses, and R(t) = #{j : pj ≤ t} be the total number of rejected hypotheses. We adopt
the approach of Storey et al. (2004) to implement the multiple testing procedure, which is
less conservative than the method of Benjamini and Hochberg (1995) and is applicable under
a general weak dependence structure (Storey et al., 2004). To this end, define FDP(t) = V(t)/{R(t) ∨ 1} and FDR(t) = E[V(t)/{R(t) ∨ 1}], where R(t) ∨ 1 = max{R(t), 1}. Then, the estimator proposed by Storey (2002) is

F̂DRλ(t) = π̂0(λ) t / [{R(t) ∨ 1}/p], (2.4)

where π̂0(λ) = {(1 − λ)p}^{−1}{p − R(λ)} is an estimate of π0 = p0/p and λ ∈ [0, 1) is a tuning parameter. Then, for any pre-specified significance level q and a fixed λ, consider the cutoff point chosen by the thresholding rule tq(F̂DRλ) = sup{0 ≤ t ≤ 1 : F̂DRλ(t) ≤ q}. We reject the null hypotheses whose p-values are less than or equal to tq(F̂DRλ).
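A sketch of the estimator (2.4) and the thresholding rule, restricting the candidate thresholds t to the observed p-values (function and variable names are ours):

```python
import numpy as np

def storey_threshold(pvals, q=0.05, lam=0.5):
    """Storey's FDR estimate (2.4) and the cutoff t_q(FDR_lambda) (sketch)."""
    pvals = np.asarray(pvals, dtype=float)
    p = pvals.size
    pi0 = (p - np.sum(pvals <= lam)) / ((1.0 - lam) * p)  # pi0 estimate at lambda
    ts = np.sort(pvals)                                   # candidate thresholds
    R = np.arange(1, p + 1)                               # R(t) at t = ts
    fdr_hat = pi0 * ts * p / np.maximum(R, 1)             # equation (2.4)
    ok = ts[fdr_hat <= q]
    t_q = ok.max() if ok.size else 0.0
    return t_q, pi0
```

Null hypotheses with pj ≤ t_q are then rejected.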
To study the theoretical property of F̂DRλ(t), we begin by introducing some notation and definitions. Let p0 = |N0| be the number of true null hypotheses and p be the total number of hypotheses. In addition, let T1,n(t) = p^{−1} ∑_{j=1}^{p} P(pj ≤ t) be the average rejection probability across all hypotheses. Define Sj to be the KFC set of covariate Xij. We next demonstrate that tq(F̂DRλ) asymptotically provides strong control of the FDR at the pre-specified nominal level q.
Theorem 2. Assume that p0/p → 1 as p goes to infinity, limn→∞ T1,n(t) = T1(t), and, for any k ∈ N0, ∑_{j∈N0} ∑_{l∈Sj} σ2lk = o(p/Λ0^2), where T1(t) is a continuous function and Λ0 = max{max_{j∈N0} |Sj|, |N1|}. Under Conditions (C1)-(C6), we have

lim sup_{n→∞} FDR{tq(F̂DRλ)} ≤ q.
The proof is given in Appendix C. In general, the dependencies among the test statistics Zj become stronger as the overlap among the KFC sets increases. Hence, to keep the dependence among the test statistics under control, the covariates must become more weakly dependent as the size of the overlap increases. Accordingly, the condition ∑_{j∈N0} ∑_{l∈Sj} σ2lk = o(p/Λ0^2) for any k ∈ N0 in Theorem 2 controls the overall dependence between the covariates in the union of the KFC sets ∪_{l∈N0} Sl and any fixed k ∈ N0; see Fan et al. (2012) for a similar condition on dependence. In addition, Λ0 provides an upper bound on the size of the overlap among the KFC sets. Assume that Σ follows an autoregressive structure such that the |Sj| are small compared with n. This, together with Condition (C2), implies that ∑_j σ2jk ≤ λ2max for any 1 ≤ k ≤ p. Hence, the above condition is satisfied.
2.4. Model Selection Consistency
According to Theorem 2, for any given significance level q > 0, the FDR can be con-
trolled asymptotically by setting the threshold at t = tq(FDRλ). This result motivates us
to further investigate the model selection consistency by letting q → 0. In fact, the model
selection consistency in high dimensional linear models has been intensively studied in the
variable selection literature. There is a large body of papers discussing the model selection
consistency via the penalized likelihood approach (e.g., Meinshausen and Buhlmann, 2006;
Zhao and Yu, 2007; Huang et al., 2007). However, the use of p-values for model selection
has not received considerable attention. Some exceptions include Bunea et al. (2006) who
considered the variable selection consistency using p-values under the condition p = o(n1/2),
and Meinshausen et al. (2009), who investigated the consistency of a two-step procedure involving screening followed by a multiple testing procedure. It is worth noting that the p-values obtained in Meinshausen et al. (2009) are not designed for assessing the significance of a single coefficient. The aim of this section is to study the model selection consistency using the p-values obtained from the z-tests in Section 2.2.
For any given nominal level αn, let N̂1^{αn} = {j : pj ≤ αn} be an estimate of N1, the set containing all the variables whose associated coefficients are truly nonzero. Assume that αn → 0 as n → ∞. By Theorem 2, the probability of obtaining false discoveries is P{|N̂1^{αn} ∩ N0| ≥ 1} ≤ αn → 0, which implies that P{N̂1^{αn} ⊂ N1} → 1. Thus, this procedure further requires the sure screening property P{N1 ⊂ N̂1^{αn}} → 1 to obtain model selection consistency.
Before demonstrating this property, two additional assumptions are given below.
(C7) There exist two positive constants κ and Cκ such that min_{j∈N1} |βj| > Cκ n^{−κ} with κ + ~ < 1/2, where ~ is defined in (C3).

(C8) There exists a positive constant Ce such that for any ℓ > 0 and 1 ≤ j ≤ p, P(n^{−1}|X>j E| > ℓ) ≤ exp(−Ce n ℓ2).
Condition (C7) is a minimum signal assumption, and similar conditions are commonly considered in the variable screening literature (Fan and Lv, 2008; Wang, 2009). If we further assume that the random errors εi are independent and normally distributed, then, using the fact that n^{−1}‖Xj‖2 → 1, n^{−1/2}X>j E follows a normal distribution with finite variance for j = 1, · · · , p, and Condition (C8) is satisfied. The above conditions, together with Conditions (C1) to (C6),
lead to the following result.
Theorem 3. Under Conditions (C1)-(C8), there exists a sequence of significance levels αn → 0 such that P{N̂1^{αn} = N1} → 1.
The proof of Theorem 3 is given in Appendix D. According to the proof, one can select αn at the level of αn = 2{1 − Φ(n^ι)} for some constant ι with ~ < ι < 1/2 − κ. This selection implies that pαn/log(p) → 0 as n → ∞, which is similar to assumption (Cq) in Bunea et al. (2006). Compared with the penalized likelihood method, the proposed testing procedure is able to control the false discovery rate and the family-wise error rate for the given αn. This is especially important in the finite sample case; see Meinshausen et al. (2009) for a detailed discussion.
3. SIMULATION STUDIES
To demonstrate the finite sample performance of the proposed methods, we consider three
simulation studies with different covariance patterns and distributions among predictors.
Each simulation includes three different sample sizes (n=100, 200, 500) and two different
dimensions of predictors (p=500 and 1,000). All simulation results presented in this section
were based on 1,000 realizations. The nominal level α of the KFC test and the significance
level q of the FDR are both set to 5%. Moreover, to determine the KFC set for each predictor, three different significance levels γ (defined in Remark 1) were considered: γ = 0.01, 0.05, and 0.10. Since the results were similar, we only report the case with γ = 0.05.
To study the significance of each individual regression coefficient, consider the proposed test statistic Zrj for testing the j-th coefficient in the r-th simulation, where j = 1, · · · , p and r = 1, · · · , 1000. Then, define an indicator measure Irj = I(|Zrj| > z1−α/2) and compute the empirical rejection probability (ERP) for the j-th coefficient test, ERPj = 1000^{−1} ∑_{r=1}^{1000} Irj. As a result, ERPj is the empirical size under the null hypothesis H0j : βj = 0, while it is the empirical power under the alternative hypothesis. Subsequently, define the average empirical size (ES) and the average empirical power (EP) as ES = |N0|^{−1} ∑_{j∈N0} ERPj and EP = |N1|^{−1} ∑_{j∈N1} ERPj, respectively. Accordingly, ES and
EP provide overall measures for assessing the performance of the single coefficient test. Based
on the p-values of the Zrj tests, we next employ the multiple testing procedure of Storey et
al. (2004) to study the performance of multiple tests via the empirical FDR discussed in
Section 2.3. It is worth noting that we adopt the commonly used tuning parameter λ = 1/2
in the first two examples, and its robustness is evaluated in Example 3. To assess model selection consistency, we examine the average true rate TR = |N̂1^α ∩ N1|/|N1| and the average false rate FR = |N̂1^α ∩ N0|/|N0|. When the true model can be identified consistently,
TR and FR should approach 1 and 0, respectively, as the sample size gets large. For the
sake of comparison, we also examine the marginal univariate regression (MUR) test (i.e.,
the classical t-test obtained from the marginal univariate regression model) in Monte Carlo
studies.
Example 1: Autocorrelated Predictors
Consider a linear regression model with autocorrelated predictors Xi generated from a multivariate normal distribution with mean 0 and covariance Σ = (σj1j2) ∈ Rp×p, where σj1j2 = 0.5^{|j1−j2|}. Although different predictors are correlated with each other, the correlation decreases to 0 as the distance |j1 − j2| between Xij1 and Xij2 increases. The regression coefficient vector β is such that β1 = 3, β4 = 1.5, β7 = 2, and βj = 0 for any j ∉ {1, 4, 7}; the variance of ε is σ2 = 1.
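For concreteness, one realization of this design can be generated as follows (a sketch; the function name and seed handling are ours):

```python
import numpy as np

def simulate_example1(n=100, p=500, rho=0.5, seed=0):
    """One realization of Example 1: AR(1)-correlated Gaussian predictors with
    sigma_{j1 j2} = rho^{|j1 - j2|}, beta_1 = 3, beta_4 = 1.5, beta_7 = 2."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    Sigma = rho ** np.abs(np.subtract.outer(idx, idx))   # AR(1) covariance
    L = np.linalg.cholesky(Sigma)
    X = rng.standard_normal((n, p)) @ L.T                # rows ~ N(0, Sigma)
    beta = np.zeros(p)
    beta[[0, 3, 6]] = [3.0, 1.5, 2.0]                    # 1-based indices 1, 4, 7
    y = X @ beta + rng.standard_normal(n)                # noise variance 1
    return X, y, beta
```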
Table 1 shows that KFC is slightly better than MUR in controlling the size, while MUR
has a larger power than KFC. After closely examining the MUR’s performance, however, we
find that its ES can be misleading. For example, Xi2 ∈ N0 is moderately correlated with the nonzero predictor Xi1 ∈ N1. As a result, the empirical size for testing H0 : β2 = 0 obtained from MUR can be as large as 0.90; that is, MUR falsely rejects this null hypothesis in most realizations. On the other hand, most predictors in N0 are nearly independent of the predictors in N1 and the response
variable. Accordingly, MUR can have a reasonable average empirical size and a high average
true rate TR. This misleading result can be detected by its empirical false discovery rate,
which is much greater than the nominal level. Moreover, FDR gets larger as the sample size
increases, and this phenomenon is also observed at the average false rate FR. Therefore, the
MUR approach should be used with caution when testing a single coefficient, conducting
multiple hypothesis tests, or selecting variables.
In contrast to MUR, the KFC approach not only controls the size well but also yields an empirical FDR that approaches the nominal level as the sample size increases. Furthermore, the average TR increases to 1 and the average FR decreases to 0, which is consistent with the theoretical findings. In sum, KFC performs well for testing a single coefficient, and the resulting p-values are reliable for multiple hypothesis testing and model selection.
Example 2: Equally Correlated Predictors
Consider a model with equally correlated predictors Xi generated from a multivariate normal distribution with mean 0 and a compound symmetric covariance matrix Σ = (σj1j2) ∈ Rp×p, where σj1j2 = 1 if j1 = j2 and σj1j2 = 0.5 for any j1 ≠ j2. In addition, the regression coefficients are set as follows: βj = 5 for 1 ≤ j ≤ 3 and βj = 0 for j > 3, and the variance of ε is σ2 = 1.
Table 2 presents the results of ES, EP, FDR, TR, and FR for both KFC and MUR
methods. It indicates that MUR performs poorly in terms of both ES and FDR measures.
This finding is not surprising because every predictor in N0 is equally correlated with those
predictors in N1. As a result, the marginal correlation between any predictor in N0 and the
response variable is well bounded away from 0. Thus, MUR’s empirical rejection probability
is close to 100%, which leads to highly inflated ES and FDR. Furthermore, FR tends to 1 as the sample size increases, which implies that MUR tends to over-reject the null hypothesis
in large samples. On the other hand, the ES and FDR of KFC are close to the nominal level.
Moreover, TR and EP increase to 1 and FR decreases to 0 as the sample size gets larger.
The above results are supported by the theoretical findings and demonstrate that KFC is
reliable.
From the above two simulation studies, we find that the FDR plays an important role in examining the reliability of the test statistics. Hence, we next study the accuracy of the FDR estimator discussed in Section 2.3. Since we are interested in the statistical behavior of the number of false discoveries V(t), we follow the suggestion of Fan et al. (2012) and compare F̂DRλ(t) in (2.4), with λ = 1/2, to FDP(t) calculated via V(t)/{R(t) ∨ 1}. For the sake of
illustration, we consider the same simulation settings as given in Examples 1 and 2 with
n = 100 and p = 500. Panels A and B in Figure 1 depict F̂DRλ(t) and FDP(t), obtained via the KFC method for Examples 1 and 2, respectively, across various t values. In contrast, Panels C and D are calculated via the MUR approach. Figure 1 clearly shows that F̂DRλ(t) calculated from the p-values of KFC is reliable and consistent with the theoretical finding in Theorem 2. However, MUR does not provide accurate estimates of FDP, and it should
be used with caution in high dimensional data analysis.
Example 3: Robustness of Covariate Distribution and λ Parameter
In the first two examples, the covariate vectors Xi were generated from a multivariate normal distribution, and the tuning parameter was set to λ = 1/2. To assess the robustness of KFC with respect to the covariate distribution and λ, we conduct simulation studies for various values of λ and three distributions of Xi = Σ1/2Zi, where each element of Zi is randomly generated from the standard normal distribution, the standardized exponential distribution exp(1), or the normal mixture distribution 0.1N(0, 32) + 0.9N(0, 1), respectively, for i = 1, · · · , n, and Σ is defined as in Examples 1 and 2. Since all results are qualitatively similar, we only report the case where λ = 0.1 and Zi follows the standardized exponential distribution. The
results in Tables 3 and 4 show similar findings to those in Tables 1 and 2, respectively.
Hence, Monte Carlo studies indicate that the KFC approach is robust against the covariate
distribution and the tuning parameter λ.
4. REAL DATA ANALYSIS
To illustrate the usefulness of the proposed method, we consider two empirical examples.
The first example studies supermarket data in China (Wang, 2009), in which p (6,398) is
much larger than n (464). The second example analyzes a corn spectroscopic data set, with
p = 700 and n = 80, that has been intensively studied by Lee et al. (2011).
4.1. Supermarket Data
This data set contains a total of $n = 464$ daily records. For each record, the response variable $Y_i$ is the number of customers and the predictors $(X_{i1},\cdots,X_{ip})$ are the sales volumes of $p = 6{,}398$ products. Consider a linear relationship between $Y_i$ and $X_i = (X_{i1},\cdots,X_{ip})^\top \in \mathbb{R}^p$, given by $Y_i = X_i^\top\beta + \varepsilon_i$, where both the response and the predictors are standardized to have zero sample mean and unit sample variance.
To identify important products (predictors) associated with $Y_i$, we apply the proposed KFC and MUR methods and test the significance of each regression coefficient, namely $H_{0j}: \beta_j = 0$ vs. $H_{1j}: \beta_j \neq 0$. If the underlying coefficient structure is highly sparse, the number of predictors in $\mathcal{N}_1$ is very small. This implies that most of the p-values are computed for cases where the null hypothesis is true, so the asymptotic distribution of those p-values should be Uniform[0,1]. This, together with the small number of p-values calculated for $H_{0j}$ with $j \in \mathcal{N}_1$, would yield a very flat histogram except for a bump near 0. Figure 2 shows the histogram of the p-values for testing $H_{0j}$ ($j = 1,\cdots,p$) using the proposed KFC method. The pattern of the histogram indicates that most of the $H_{0j}$ are true and that the p-values are asymptotically valid. A total of 1,426 p-values are less than the significance level $\alpha = 5\%$. After controlling the false discovery rate at the level $q = 5\%$ via the method of Storey et al. (2004), 132 hypotheses $H_{0j}$ are rejected. In other words, we have identified the 132 most important products, on which the supermarket decision maker (or manager) might perform further analysis. In contrast, the histogram of the p-values generated from the MUR tests in Figure 2 shows a heavily skewed pattern with a very thin tail, which suggests that most of its p-values are very small. MUR rejects a total of 5,648 hypotheses $H_{0j}$ after controlling the FDR at the level $q = 5\%$. In sum, KFC provides a simple and efficient approach to identifying important products from high dimensional data, while MUR fails to accomplish this.
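The multiple-testing step above can be sketched as follows. This is a hedged stand-in: it uses the Benjamini–Hochberg step-up rule combined with Storey's null-proportion estimate rather than the full Storey et al. (2004) procedure, and the synthetic p-values are purely illustrative:

```python
import numpy as np

def bh_reject(pvals, q=0.05):
    """Benjamini-Hochberg step-up rule at level q."""
    p = len(pvals)
    order = np.argsort(pvals)
    thresh = q * np.arange(1, p + 1) / p        # k*q/p for k = 1, ..., p
    below = np.nonzero(pvals[order] <= thresh)[0]
    reject = np.zeros(p, dtype=bool)
    if below.size:
        reject[order[: below[-1] + 1]] = True   # reject the smallest k p-values
    return reject

def storey_pi0(pvals, lam=0.5):
    """Storey's estimate of the null proportion using p-values above lam."""
    return min(1.0, np.mean(pvals > lam) / (1.0 - lam))

rng = np.random.default_rng(1)
# synthetic p-values: 950 true nulls (uniform) and 50 strong signals
pvals = np.concatenate([rng.uniform(size=950), rng.uniform(0, 1e-4, size=50)])
pi0 = storey_pi0(pvals)
rejected = bh_reject(pvals, q=0.05 / pi0)       # Storey-adaptive BH
print(round(pi0, 3), rejected.sum())
```

On p-values like those in Figure 2, the flat bulk drives the $\pi_0$ estimate while the bump near zero drives the rejections.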
4.2. Corn Spectroscopic Data
The data consist of 80 samples ($n = 80$) of corn measured on three different near-infrared (NIR) spectrometers (m5spec, mp5spec, and mp6spec). Each spectrum contains 700 channels measured at wavelengths ranging from 1100nm to 2498nm at 2nm intervals. The response variables of interest are the moisture, oil, protein, and starch values of each sample. In this study, we only consider moisture as the response variable ($Y_i \in \mathbb{R}^1$), and the covariates ($X_i \in \mathbb{R}^p$) are measured on the m5spec spectrometer, with $p = 700$. Following Lee et al.'s (2011) study, we assume a linear relationship between $Y_i$ and $X_i$, namely $Y_i = X_i^\top\beta + \varepsilon_i$ for some independent random error $\varepsilon_i$. The main goal of this analysis is to identify important variables that are associated with $Y_i$.
We apply the proposed KFC and MUR methods to test the significance of each regression coefficient, namely $H_{0j}: \beta_j = 0$ vs. $H_{1j}: \beta_j \neq 0$. Figure 3 depicts the two histograms of the p-values computed via the KFC and MUR methods, respectively. It indicates that, for the KFC method, most of the $H_{0j}$s are true and the histogram of the p-values is nearly flat. Specifically, 95 variables are significant at the significance level $\alpha = 5\%$. After controlling the false discovery rate at the level $q = 5\%$, the number of rejected hypotheses $H_{0j}$ decreases to 6. In contrast, for the MUR method, the histogram of the p-values is extremely skewed with a very thin tail. This finding is not surprising since the covariates in the model are highly multicollinear, as observed by Lee et al. (2011). In addition, all of the variables remain significant even after controlling the false discovery rate. This implies that the information obtained from the MUR method is quite limited, and the resulting p-values might be invalid. In sum, the KFC method effectively identifies the 6 most important channels that are significantly associated with the moisture of the 80 corn samples.

The two examples above indicate that the KFC method not only provides a simple and efficient way to compute the p-value for testing a single coefficient in a high dimensional linear model, but also yields reliable p-values for multiple hypothesis testing.
5. DISCUSSION
In linear regression models with high dimensional data, we propose a simple screening procedure, Key conFounder Controlling (KFC), to control for predictors that are highly correlated with the target covariate. This allows us to employ the classical ordinary least squares approach to obtain the parameter estimator. We then demonstrate that the resulting estimator is asymptotically normal. Accordingly, we extend the classical t-test (or z-test) for a single coefficient to high dimensional data. Based on the p-value obtained from testing the significance of each covariate, multiple hypothesis testing is carried out by controlling the false discovery rate at the nominal level. Moreover, we show that the multiple hypothesis testing leads to consistent model selection.
The proposed KFC method can be extended to testing a small subset of regression coefficients. Consider the hypothesis
\[
H_0: \beta_{\mathcal{M}} = 0 \quad\text{vs.}\quad H_1: \beta_{\mathcal{M}} \neq 0, \tag{5.1}
\]
where $\mathcal{M}$ is a pre-specified index set with a fixed size and $\beta_{\mathcal{M}} = (\beta_j : j\in\mathcal{M})^\top \in \mathbb{R}^{|\mathcal{M}|}$ is the subvector of $\beta$ corresponding to $\mathcal{M}$. Without loss of generality, we assume that $\mathcal{M} = \{j: 1\le j\le |\mathcal{M}|\}$ and $1 < |\mathcal{M}| \ll n$. Then, define an overall KFC set of $\mathcal{M}$ as $S_{\mathcal{M}} = \bigcup_{j\in\mathcal{M}} S_j$, where $S_j$ is the KFC set for the $j$th predictor in $\mathcal{M}$. Accordingly, the target parameter $\beta_{\mathcal{M}} = (\beta_j: j\in\mathcal{M})^\top \in \mathbb{R}^{|\mathcal{M}|}$ can be estimated by $\hat\beta_{\mathcal{M}} = (X_{\mathcal{M}}^\top Q_{S_{\mathcal{M}}}X_{\mathcal{M}})^{-1}(X_{\mathcal{M}}^\top Q_{S_{\mathcal{M}}}Y)$, where $X_{\mathcal{M}} = (X_j : j\in\mathcal{M}) \in \mathbb{R}^{n\times|\mathcal{M}|}$. Applying similar techniques to those used in the proof of Theorem 1, we can show that $n^{1/2}(\hat\beta_{\mathcal{M}} - \beta_{\mathcal{M}}) \to_d N(0, \Sigma_\beta)$, where $\Sigma_\beta = \sigma_{\mathcal{M}}^2 (\Sigma_{\mathcal{M}} - \Sigma_{\mathcal{M}S_{\mathcal{M}}}\Sigma_{S_{\mathcal{M}}}^{-1}\Sigma_{S_{\mathcal{M}}\mathcal{M}})^{-1}$ with $\sigma_{\mathcal{M}}^2 = \beta_{S_{\mathcal{M}}^*}^\top (\Sigma_{S_{\mathcal{M}}^*} - \Sigma_{S_{\mathcal{M}}^* S_{\mathcal{M}}^+}\Sigma_{S_{\mathcal{M}}^+}^{-1}\Sigma_{S_{\mathcal{M}}^+ S_{\mathcal{M}}^*})\beta_{S_{\mathcal{M}}^*} + \sigma^2$, $S_{\mathcal{M}}^+ = \cup_{j\in\mathcal{M}} S_j^+$, and $S_{\mathcal{M}}^* = \{j: j\notin S_{\mathcal{M}}^+\}$. Consequently, an $F$-type test statistic can be constructed to test (5.1).
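A rough numerical sketch of such an F-type statistic (an assumed implementation, not the paper's: the index set M and the overall KFC set S_M are supplied by the caller, and the statistic is the classical partial F comparing the fits with and without X_M after projecting out the controls):

```python
import numpy as np

def subset_f_test(Y, X, M, S_M):
    """Partial F-type statistic for H0: beta_M = 0 after projecting out
    the overall KFC set S_M (indices of the control predictors)."""
    n = len(Y)
    XS = X[:, S_M]
    Q = np.eye(n) - XS @ np.linalg.pinv(XS)          # projects off the controls
    Yt, XMt = Q @ Y, Q @ X[:, M]
    beta_M, *_ = np.linalg.lstsq(XMt, Yt, rcond=None)
    rss1 = np.sum((Yt - XMt @ beta_M) ** 2)          # fit with X_M
    rss0 = np.sum(Yt ** 2)                           # fit without X_M
    df1, df2 = len(M), n - len(M) - len(S_M)
    F = ((rss0 - rss1) / df1) / (rss1 / df2)
    return beta_M, F
```

Under (5.1), large values of `F` are evidence against $H_0$; the reference distribution would follow from the asymptotic normality result stated above.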
To broaden the usefulness of the proposed method, we conclude the article by discussing
two possible research avenues. Firstly, from the model aspect, it would be practically useful
to extend the KFC method to generalized linear models, single index models, partial linear
models, and survival models. Secondly, from the data aspect, it is important to generalize the proposed KFC method to accommodate categorical explanatory variables, repeated measurements, and missing observations. We believe these extensions would enhance the usefulness of KFC in high dimensional data analysis.
TECHNICAL APPENDIX
Appendix A. Four Useful Lemmas
Before proving the theoretical results, we present the following four lemmas which are
needed in the proofs. The first lemma is directly borrowed from Lemma A.3 of Bickel and
Levina (2008), and the second lemma can be found in Bendat and Piersol (1966). As a
result, we only verify the third and fourth lemmas.
Lemma 1. Let $\hat\sigma_{j_1j_2} = n^{-1}\sum_i X_{ij_1}X_{ij_2}$ and assume that Condition (C1) holds. Then, there exist three positive constants $\zeta_0 > 0$, $C_1 > 0$, and $C_2 > 0$, such that $P(|\hat\sigma_{j_1j_2} - \sigma_{j_1j_2}| > \zeta) \le C_1\exp(-C_2 n\zeta^2)$ for any $0 < \zeta < \zeta_0$ and every $1\le j_1, j_2\le p$.
Lemma 2. Let $(U_1, U_2, U_3, U_4)^\top \in \mathbb{R}^4$ be a 4-dimensional normal random vector with $E(U_j) = 0$ and $\mathrm{var}(U_j) = 1$ for $1\le j\le 4$. We then have $E(U_1U_2U_3U_4) = \delta_{12}\delta_{34} + \delta_{13}\delta_{24} + \delta_{14}\delta_{23}$, where $\delta_{ij} = E(U_iU_j)$.
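Lemma 2 is Isserlis' theorem for fourth moments of a standardized Gaussian vector, and it is easy to check by simulation (the correlation matrix below is an arbitrary positive definite choice):

```python
import numpy as np

# Monte Carlo check of Lemma 2: E(U1 U2 U3 U4) = d12*d34 + d13*d24 + d14*d23.
rng = np.random.default_rng(0)
R = np.array([[1.0, 0.3, 0.2, 0.1],
              [0.3, 1.0, 0.4, 0.2],
              [0.2, 0.4, 1.0, 0.3],
              [0.1, 0.2, 0.3, 1.0]])
assert np.all(np.linalg.eigvalsh(R) > 0)    # R is a valid correlation matrix
U = rng.multivariate_normal(np.zeros(4), R, size=500_000)
mc = np.mean(U[:, 0] * U[:, 1] * U[:, 2] * U[:, 3])
exact = R[0, 1] * R[2, 3] + R[0, 2] * R[1, 3] + R[0, 3] * R[1, 2]
print(round(mc, 4), exact)                  # should agree to roughly two decimals
```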
Lemma 3. Assume that Conditions (C1)–(C3) hold, and $m = O(n^{\nu_2})$ for some positive constant $\nu_2$ satisfying $3\nu_2 + \hbar < 1$, where $\hbar$ is given in (C3). Then, $\max_{|S|\le m}|n^{-1}X_1^\top Q_S X_1 - (\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1})| \to_p 0$.
Proof. Since $n^{-1}X_1^\top Q_S X_1 = n^{-1}X_1^\top X_1 - (n^{-1}X_1^\top X_S)(n^{-1}X_S^\top X_S)^{-1}(n^{-1}X_S^\top X_1)$ and $n^{-1}X_1^\top X_1 \to_p \sigma_{11}$, it suffices to show that
\[
\max_{|S|\le m}\Big|(n^{-1}X_1^\top X_S)(n^{-1}X_S^\top X_S)^{-1}(n^{-1}X_S^\top X_1) - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1}\Big| \to_p 0. \tag{A.1}
\]
Denote $\|A\| = \{\mathrm{tr}(AA^\top)\}^{1/2}$ for any arbitrary matrix $A$. Then, we obtain (A.1) if the three uniform convergence results given below can be proved:
\[
\max_{|S|\le m}\big\|n^{-1}X_S^\top X_1 - \Sigma_{S1}\big\| = o_p(1), \tag{A.2}
\]
\[
\max_{|S|\le m}\big\|n^{-1}X_S^\top X_1\big\| < 2\lambda_{\max}, \tag{A.3}
\]
and
\[
\max_{|S|\le m}\big\|n^{-1}X_S^\top X_S - \Sigma_S\big\| = o_p(1). \tag{A.4}
\]
By Condition (C2), one can easily verify that $\|\Sigma_{S1}\|^2 \le \lambda_{\max}^2$. This, in conjunction with (A.2), implies (A.3). Accordingly, it suffices to demonstrate (A.2) and (A.4).

It is noteworthy that, for any $S$ satisfying $|S|\le m$, we have
\[
\big\|n^{-1}X_S^\top X_1 - \Sigma_{S1}\big\| = \Big\{\sum_{j\in S}(\hat\sigma_{1j} - \sigma_{1j})^2\Big\}^{1/2} \le m^{1/2}\max_{j\in S}|\hat\sigma_{1j} - \sigma_{1j}|.
\]
This, together with the Bonferroni inequality, Condition (C1), Lemma 1, and the fact that $\#\{S\subset\{1,\cdots,p\}: |S|\le m\}\le p^m$, implies
\begin{align*}
P\Big(\max_{|S|\le m}\big\|n^{-1}X_S^\top X_1 - \Sigma_{S1}\big\| > \epsilon\Big)
&\le \sum_{|S|\le m} P\big(\big\|n^{-1}X_S^\top X_1 - \Sigma_{S1}\big\| > \epsilon\big)
\le \sum_{|S|\le m} P\Big(\max_{j\in S}|\hat\sigma_{1j} - \sigma_{1j}| > \epsilon/m^{1/2}\Big)\\
&\le \sum_{|S|\le m}\sum_{j\in S} P\big(|\hat\sigma_{1j} - \sigma_{1j}| > \epsilon/m^{1/2}\big)
\le p^m m C_1\exp(-C_2 n m^{-1}\epsilon^2) = C_1\exp\big(-C_2 n m^{-1}\epsilon^2 + \log m + m\log p\big). \tag{A.5}
\end{align*}
Furthermore, by the assumptions in Lemma 3 ($m = O(n^{\nu_2})$) and Condition (C3) ($\log p \le \nu n^{\hbar}$), we have that $m\log p = O(n^{\nu_2+\hbar})$. Moreover, using the assumptions in Lemma 3 again ($3\nu_2 + \hbar < 1$), the right-hand side of (A.5) converges to 0 as $n\to\infty$. Hence, we have proved (A.2). Applying similar techniques to those used in the proof of (A.2), we can also demonstrate (A.4). This completes the entire proof.
Lemma 4. Assume that (a) $\lim_{p\to\infty}V(t)/p_0 = G_0(t)$ and $\lim_{p\to\infty}S(t)/(p-p_0) = G_1(t)$, where $G_0(t)$ and $G_1(t)$ are continuous functions; (b) $0 < G_0(t)\le t$ for $t\in(0,1]$; (c) $\lim_{p\to\infty}p_0/p = 1$. Then, we have $\limsup_{p\to\infty}\mathrm{FDR}\{t_\alpha(\widehat{\mathrm{FDR}}_\lambda)\} \le \alpha$.

Proof. By slightly modifying the proof of Theorem 4 in Storey et al. (2004), we can demonstrate the result. The detailed proof can be obtained from the authors upon request.
Appendix B. Proof of Theorem 1

Let $T_1 = (X_1^\top Q_S X_1)^{-1}X_1^\top Q_S\mathcal{E}$ and $T_2 = (X_1^\top Q_S X_1)^{-1}X_1^\top Q_S X_{S^*}\beta_{S^*}$. Then, $\hat\beta_1 - \beta_1 = T_1 + T_2$. Using the fact that $E(\varepsilon_i|X_i) = 0$, one can show that $\mathrm{cov}(T_1, T_2) = E(T_1T_2) - E(T_1)E(T_2) = 0$. Therefore, $T_1$ and $T_2$ are uncorrelated. To prove the theorem, it hence suffices to show that $(\sqrt n T_1, \sqrt n T_2)^\top$ is asymptotically bivariate normal. By Conditions (C1)–(C3) and Lemma 3, we obtain that $\|n^{-1}X_1^\top X_S - \Sigma_{1S}\| \to_p 0$, $\|n^{-1}X_S^\top X_S - \Sigma_S\| \to_p 0$, and $\max_S|n^{-1}X_1^\top Q_S X_1 - (\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1})| \to_p 0$. Accordingly, we have
\begin{align*}
\sqrt n T_1 &= \{1+o_p(1)\}\big(\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1}\big)^{-1}\big(n^{-1/2}X_1^\top\mathcal{E} - n^{-1/2}\Sigma_{1S}\Sigma_S^{-1}X_S^\top\mathcal{E}\big)\\
&= \{1+o_p(1)\}\big(\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1}\big)^{-1}\Big\{n^{-1/2}\sum_{i=1}^n(X_{i1} - \Sigma_{1S}\Sigma_S^{-1}X_{iS})\varepsilon_i\Big\}.
\end{align*}
Applying the same arguments as those given above, we also obtain that
\[
\sqrt n T_2 = \{1+o_p(1)\}\big(\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1}\big)^{-1}\Big\{n^{-1/2}\sum_{i=1}^n(X_{i1} - \Sigma_{1S}\Sigma_S^{-1}X_{iS})X_{iS^*}^\top\beta_{S^*}\Big\}.
\]
Let $\xi_{i1} = (X_{i1} - \Sigma_{1S}\Sigma_S^{-1}X_{iS})\varepsilon_i$ and $\delta_{i1} = (X_{i1} - \Sigma_{1S}\Sigma_S^{-1}X_{iS})X_{iS^*}^\top\beta_{S^*}$. Then, it can be shown that $E(\xi_{i1}) = 0$, $\mathrm{var}(\xi_{i1}) = \sigma^2(\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1})$, and $E(\delta_{i1}) = (\Sigma_{1S^*} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{SS^*})\beta_{S^*}$. Using Conditions (C4)–(C6), we further obtain that $\sqrt n|(\Sigma_{1S^*} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{SS^*})\beta_{S^*}| < C_\beta\sqrt n\max_{j\notin S^+}|\varrho_{1j}(S)| \to 0$. Moreover,
\begin{align*}
\mathrm{var}(\delta_{i1}) \to E(\delta_{i1}^2) &= E\big\{(X_{i1} - \Sigma_{1S}\Sigma_S^{-1}X_{iS})^2\beta_{S^*}^\top X_{iS^*}X_{iS^*}^\top\beta_{S^*}\big\}\\
&= E\big[(X_{i1} - \Sigma_{1S}\Sigma_S^{-1}X_{iS})^2\beta_{S^*}^\top E\{X_{iS^*}X_{iS^*}^\top|X_{iS^+}\}\beta_{S^*}\big]\\
&\to \big(\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1}\big)\beta_{S^*}^\top\big(\Sigma_{S^*} - \Sigma_{S^*S^+}\Sigma_{S^+}^{-1}\Sigma_{S^+S^*}\big)\beta_{S^*}.
\end{align*}
The bivariate central limit theorem, together with the above results, implies that
\[
\big(\sqrt n T_1, \sqrt n T_2\big)^\top = \{1+o_p(1)\}\big(\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1}\big)^{-1}\Big\{n^{-1/2}\sum_{i=1}^n(\xi_{i1}, \delta_{i1})^\top\Big\}
\]
is asymptotically bivariate normal with mean zero and diagonal covariance matrix $V = \mathrm{Diag}(V_{11}, V_{22})$, where $V_{11} = \sigma^2(\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1})^{-1}$ and $V_{22} = (\sigma_{11} - \Sigma_{1S}\Sigma_S^{-1}\Sigma_{S1})^{-1}\beta_{S^*}^\top(\Sigma_{S^*} - \Sigma_{S^*S^+}\Sigma_{S^+}^{-1}\Sigma_{S^+S^*})\beta_{S^*}$. Consequently, $\sqrt n(T_1 + T_2)$ is asymptotically normal with mean zero and variance $V_{11} + V_{22}$, which completes the proof.
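The limiting normality above is what licenses a plain z-test for a single coefficient. A minimal numerical sketch (assuming, purely for illustration, that the KFC set is formed by thresholding absolute sample correlations with the target covariate at λ and capping its size; the paper's formal construction of S_j is not reproduced here):

```python
import math
import numpy as np

def kfc_z_test(Y, X, j, lam=0.5, max_size=20):
    """z-test for beta_j: control for the predictors whose absolute sample
    correlation with X_j exceeds lam (capped at max_size), then run OLS."""
    n, p = X.shape
    corr = np.corrcoef(X, rowvar=False)[j]
    S = [k for k in np.argsort(-np.abs(corr))
         if k != j and abs(corr[k]) > lam][:max_size]
    if S:
        XS = X[:, S]
        Q = np.eye(n) - XS @ np.linalg.pinv(XS)      # project off the KFC set
    else:
        Q = np.eye(n)
    xj, Yt = Q @ X[:, j], Q @ Y
    beta_hat = (xj @ Yt) / (xj @ xj)
    resid = Yt - beta_hat * xj
    se = math.sqrt((resid @ resid) / (n - len(S) - 1) / (xj @ xj))
    z = beta_hat / se
    pval = math.erfc(abs(z) / math.sqrt(2.0))        # two-sided normal p-value
    return beta_hat, z, pval
```

Repeating the call for each $j = 1,\cdots,p$ yields the per-coefficient p-values fed into the FDR step.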
Appendix C. Proof of Theorem 2

We mainly apply Lemma 4 to prove Theorem 2. To this end, we need to show the following two results as $p\to\infty$:
\[
\frac1p\sum_{j=1}^p I(p_j\le t) - T_{1,n}(t) \to 0 \ \text{a.s.} \tag{A.6}
\]
and
\[
\frac1{p_0}\sum_{j\in\mathcal N_0} I(p_j\le t) - G_{0,n}(t) \to 0 \ \text{a.s.}, \tag{A.7}
\]
where $G_{0,n}(t) = p_0^{-1}\sum_{j\in\mathcal N_0}P(p_j\le t)$. Since the proofs for (A.6) and (A.7) are quite similar, we only verify (A.6). By the law of large numbers, it is enough to show that
\[
\mathrm{var}\Big\{\frac1p\sum_{j=1}^p I(p_j\le t)\Big\} = O(p^{-\delta}) \quad\text{for some } \delta > 0. \tag{A.8}
\]
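The role of (A.8) is transparent in the simplest case: for independent Uniform[0,1] p-values, the variance of the empirical CDF at $t$ equals $t(1-t)/p$, which vanishes at rate $p^{-1}$. The proof handles the dependent case; the following idealized independent check is only a sanity illustration:

```python
import numpy as np

# Numeric illustration of (A.8) in the independent case: for p independent
# Uniform[0,1] p-values, var{p^{-1} sum_j I(p_j <= t)} = t(1 - t)/p = O(1/p).
rng = np.random.default_rng(0)
t = 0.05
for p in (100, 1000, 10000):
    frac = np.mean(rng.uniform(size=(2000, p)) <= t, axis=1)  # 2000 replicates
    print(p, round(frac.var(), 8), round(t * (1 - t) / p, 8))
```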
It is worth noting that the left-hand side of (A.8) is equivalent to
\begin{align*}
\mathrm{var}\Big\{\frac1p\sum_{j=1}^p I(|Z_j|\ge z_{1-t/2})\Big\}
&= \mathrm{var}\Big\{\frac1p\sum_{j\in\mathcal N_0} I(|Z_j|\ge z_{1-t/2})\Big\}
+ \mathrm{var}\Big\{\frac1p\sum_{j\in\mathcal N_1} I(|Z_j|\ge z_{1-t/2})\Big\}\\
&\quad+ \frac{2}{p^2}\sum_{j_1\in\mathcal N_0}\sum_{j_2\in\mathcal N_1}\mathrm{cov}\big\{I(|Z_{j_1}|\ge z_{1-t/2}),\, I(|Z_{j_2}|\ge z_{1-t/2})\big\}
:= J_1 + J_2 + 2J_3. \tag{A.9}
\end{align*}
Using the fact that $\mathrm{var}\{I(|Z_j|\ge z_{1-t/2})\} \le E\{I(|Z_j|\ge z_{1-t/2})\} \le 1$ and the assumption that $p_0/p \to 1$, together with the Cauchy–Schwarz inequality, we have that $J_2 \le p^{-2}|\mathcal N_1|\sum_{j\in\mathcal N_1}\mathrm{var}\{I(|Z_j|\ge z_{1-t/2})\} \le (p-p_0)^2/p^2 \to 0$. In addition, applying the Cauchy–Schwarz inequality, we obtain
\[
J_3^2 \le \mathrm{var}\Big\{\frac1p\sum_{j\in\mathcal N_0} I(|Z_j|\ge z_{1-t/2})\Big\}\,\mathrm{var}\Big\{\frac1p\sum_{j\in\mathcal N_1} I(|Z_j|\ge z_{1-t/2})\Big\} \le (p-p_0)^2 p_0^2/p^4 \to 0.
\]
Accordingly, to prove (A.8), we only need to show that $J_1 = O(p^{-\delta})$ for some $\delta > 0$. It can be seen that
\[
J_1 = \frac1{p^2}\sum_{j\in\mathcal N_0}\mathrm{var}\big\{I(|Z_j|\ge z_{1-t/2})\big\} + \frac1{p^2}\sum_{\substack{j_1\ne j_2\\ j_1,j_2\in\mathcal N_0}}\mathrm{cov}\big\{I(|Z_{j_1}|\ge z_{1-t/2}),\, I(|Z_{j_2}|\ge z_{1-t/2})\big\} := J_{11} + J_{12}.
\]
Since $J_{11} \le p_0/p^2 \to 0$ as $p\to\infty$, it suffices to show that $J_{12} = O(p^{-\delta})$ for some $\delta > 0$. Note that
\[
\mathrm{cov}\big\{I(|Z_{j_1}|\ge z_{1-t/2}),\, I(|Z_{j_2}|\ge z_{1-t/2})\big\} := I_1 + I_2 + I_3 + I_4,
\]
where $I_1 = E\{I(Z_{j_1}\ge z_{1-t/2})I(Z_{j_2}\ge z_{1-t/2})\} - E\{I(Z_{j_1}\ge z_{1-t/2})\}E\{I(Z_{j_2}\ge z_{1-t/2})\}$, $I_2 = E\{I(Z_{j_1}\ge z_{1-t/2})I(Z_{j_2}\le -z_{1-t/2})\} - E\{I(Z_{j_1}\ge z_{1-t/2})\}E\{I(Z_{j_2}\le -z_{1-t/2})\}$, $I_3 = E\{I(Z_{j_1}\le -z_{1-t/2})I(Z_{j_2}\ge z_{1-t/2})\} - E\{I(Z_{j_1}\le -z_{1-t/2})\}E\{I(Z_{j_2}\ge z_{1-t/2})\}$, and $I_4 = E\{I(Z_{j_1}\le -z_{1-t/2})I(Z_{j_2}\le -z_{1-t/2})\} - E\{I(Z_{j_1}\le -z_{1-t/2})\}E\{I(Z_{j_2}\le -z_{1-t/2})\}$. Since the proofs for $I_1$ to $I_4$ are essentially the same, we only focus on $I_1$.
Applying the asymptotic expansion of $Z_j$ given in the proof of Theorem 1, we have
\[
Z_j = \frac{\sqrt n\,\hat\beta_j}{\sigma_{\beta_j}} = \frac{\sqrt n\,\beta_j}{\sigma_{\beta_j}} + n^{-1/2}\sum_{i=1}^n u_{ij} + o_p(1), \tag{A.10}
\]
where $u_{ij} = \sigma_{\beta_j}^{-1}(\delta_{ij} + \xi_{ij})$, $\delta_{ij} = (X_{ij} - \Sigma_{jS_j}\Sigma_{S_j}^{-1}X_{iS_j})X_{iS_j^*}^\top\beta_{S_j^*}$, $\xi_{ij} = (X_{ij} - \Sigma_{jS_j}\Sigma_{S_j}^{-1}X_{iS_j})\varepsilon_i$, and $\sigma_{\beta_j}^2 = (\sigma_{jj} - \Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jj})\{\beta_{S_j^*}^\top(\Sigma_{S_j^*} - \Sigma_{S_j^*S_j^+}\Sigma_{S_j^+}^{-1}\Sigma_{S_j^+S_j^*})\beta_{S_j^*} + \sigma^2\}$. As a result, for any $j\in\mathcal N_0$, $Z_j = n^{-1/2}\sum_{i=1}^n u_{ij} + o_p(1)$; that is, $Z_j$ can be expressed as a sum of independent and identically distributed (i.i.d.) random variables $u_{ij}$. In addition, Condition (C1) implies that $u_{ij}$ has an exponential tail. This, together with the bivariate large deviation result (Zhong et al., 2013), leads to
\[
E\big\{I(Z_{j_1}\ge z_{1-t/2})I(Z_{j_2}\ge z_{1-t/2})\big\} = U\big(z_{1-t/2}, z_{1-t/2}; \rho_{j_1j_2}\big)\{1 + o(1)\},
\]
where $\rho_{j_1j_2} = \mathrm{corr}(u_{ij_1}, u_{ij_2})$ and
\[
U(a,b;\rho) = \big\{2\pi(1-\rho^2)^{1/2}\big\}^{-1}\int_a^\infty\!\!\int_b^\infty \exp\Big\{-\frac{1}{2(1-\rho^2)}\big(y_1^2 + y_2^2 - 2\rho y_1y_2\big)\Big\}\,dy_1\,dy_2.
\]
Without loss of generality, we assume that $\rho_{j_1j_2} = \mathrm{corr}(u_{ij_1}, u_{ij_2}) > 0$ and $z_{1-t/2} > 0$. Then, writing $\bar\Phi = 1 - \Phi$ and using the inequality in Willink (2004), we have
\[
\bar\Phi(z_{1-t/2})\bar\Phi(\zeta z_{1-t/2}) \le U(z_{1-t/2}, z_{1-t/2}; \rho_{j_1j_2}) \le \bar\Phi(z_{1-t/2})\bar\Phi(\zeta z_{1-t/2})(1 + \rho_{j_1j_2}), \tag{A.11}
\]
where $\zeta = \{(1-\rho_{j_1j_2})/(1+\rho_{j_1j_2})\}^{1/2}$. Accordingly, we obtain that
\[
\bar\Phi(z_{1-t/2})\big\{\bar\Phi(\zeta z_{1-t/2}) - \bar\Phi(z_{1-t/2})\big\} \le I_1 \le \bar\Phi(z_{1-t/2})\big\{\bar\Phi(\zeta z_{1-t/2})(1 + \rho_{j_1j_2}) - \bar\Phi(z_{1-t/2})\big\}.
\]
After algebraic simplification, $(1-\zeta)/\rho_{j_1j_2} \to 1$ as $\rho_{j_1j_2}\to 0$. Hence, $I_1/(C_I\rho_{j_1j_2}) \to 1$ for a positive constant $C_I$, which implies that $\mathrm{cov}\{I(|Z_{j_1}|\ge z_{1-t/2}),\, I(|Z_{j_2}|\ge z_{1-t/2})\} \approx C_I|\rho_{j_1j_2}|$. Consequently, if $\sum_{j\in\mathcal N_0}|\rho_{jk}| = o(p)$ for any $k\in\mathcal N_0$, then $J_{12} = O(p^{-\delta})$ for some $\delta > 0$.
To complete the proof, we next verify the above condition $\sum_{j\in\mathcal N_0}|\rho_{jk}| = o(p)$ for any $k\in\mathcal N_0$. By the Cauchy–Schwarz inequality, it suffices to show that $\sum_{j\in\mathcal N_0}\rho_{jk}^2 = o(p)$. Conditions (C2) and (C4) yield, respectively, that $\sigma_{jj} - \Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jj} \le \lambda_{\min}^{-1}$ uniformly and that $\beta_{S_j^*}^\top(\Sigma_{S_j^*} - \Sigma_{S_j^*S_j^+}\Sigma_{S_j^+}^{-1}\Sigma_{S_j^+S_j^*})\beta_{S_j^*}$ is bounded, for any $j\in\mathcal N_0$. Hence, $\max_j\sigma_{\beta_j}^2 < \infty$. In addition, (A.10) implies $\mathrm{var}(u_{ij}) = 1$. As a result, we only need to demonstrate that $\sum_{j\in\mathcal N_0}v_{jk}^2 = o(p)$, where $v_{jk} = \mathrm{cov}(u_{ij}, u_{ik})$.

It can be shown that $v_{jk} = v_{jk,1} + v_{jk,2}$, where $v_{jk,1} = \mathrm{cov}(\xi_{ij}, \xi_{ik})$, $v_{jk,2} = \mathrm{cov}(\delta_{ij}, \delta_{ik})$, and $\xi_{ij}$ and $\delta_{ij}$ are defined after equation (A.10). Hence, to complete the proof, it suffices to show the following results:
\[
p^{-1}\sum_{j\in\mathcal N_0}v_{jk,1}^2 = o(1) \quad\text{and}\quad p^{-1}\sum_{j\in\mathcal N_0}v_{jk,2}^2 = o(1). \tag{A.12}
\]
We begin with proving the first equation of (A.12). Direct calculation gives $v_{jk,1} = \sigma^2(\sigma_{jk} - \Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jk} - \Sigma_{kS_k}\Sigma_{S_k}^{-1}\Sigma_{S_kj} + \Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jS_k}\Sigma_{S_k}^{-1}\Sigma_{S_kk})$, so that, by the Cauchy–Schwarz inequality,
\[
v_{jk,1}^2 \le 4\sigma^4\big\{\sigma_{jk}^2 + (\Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jk})^2 + (\Sigma_{kS_k}\Sigma_{S_k}^{-1}\Sigma_{S_kj})^2 + (\Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jS_k}\Sigma_{S_k}^{-1}\Sigma_{S_kk})^2\big\}.
\]
We then study the above four components separately. According to Condition (C2), one can easily demonstrate that $\sum_{j\in\mathcal N_0}\sigma_{jk}^2 \le \lambda_{\max}^2$, which leads to $p^{-1}\sum_{j\in\mathcal N_0}\sigma_{jk}^2 \le p^{-1}\lambda_{\max}^2 = o(1)$. In addition, we have
\[
(\Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jk})^2 = \Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jk}\Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jk} \le \mathrm{tr}\big(\Sigma_{S_j}^{-1}\Sigma_{S_jj}\Sigma_{jS_j}\Sigma_{S_j}^{-1}\big)\|\Sigma_{S_jk}\|^2 = \Sigma_{jS_j}\Sigma_{S_j}^{-2}\Sigma_{S_jj}\|\Sigma_{S_jk}\|^2 \le \lambda_{\min}^{-2}\|\Sigma_{jS_j}\|^2\|\Sigma_{S_jk}\|^2.
\]
This, together with the assumption of Theorem 2 and Condition (C2), leads to
\[
p^{-1}\sum_{j\in\mathcal N_0}(\Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jk})^2 \le p^{-1}\sum_{j\in\mathcal N_0}\lambda_{\min}^{-2}\|\Sigma_{jS_j}\|^2\|\Sigma_{S_jk}\|^2 \le \lambda_{\min}^{-2}\lambda_{\max}^2\, p^{-1}\sum_{j\in\mathcal N_0}\|\Sigma_{S_jk}\|^2 = \lambda_{\min}^{-2}\lambda_{\max}^2\, p^{-1}\sum_{j\in\mathcal N_0}\sum_{l\in S_j}\sigma_{lk}^2 = o(1).
\]
Employing similar techniques, we can show that $p^{-1}\sum_j(\Sigma_{kS_k}\Sigma_{S_k}^{-1}\Sigma_{S_kj})^2 = o(1)$ and $p^{-1}\sum_j(\Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jS_k}\Sigma_{S_k}^{-1}\Sigma_{S_kk})^2 = o(1)$. The above results complete the proof of the first equation in (A.12).
We next prove the second equation of (A.12). According to the result in the proof of Theorem 1, $E(\delta_{ij}) \le C_\delta\max_{l\notin S_j^+}|\varrho_{jl}(S_j)| = o(1)$ for some positive constant $C_\delta$. It follows that $p^{-1}\sum_{j\in\mathcal N_0}E^2(\delta_{ij})E^2(\delta_{ik}) = o(1)$; hence, we only need to show that $p^{-1}\sum_{j\in\mathcal N_0}E^2(\delta_{ij}\delta_{ik}) = o(1)$. After algebraic simplification, we obtain that
\begin{align*}
E(\delta_{ij}\delta_{ik}) &= E\big(X_{ij}X_{iS_j^*}^\top\beta_{S_j^*}X_{ik}X_{iS_k^*}^\top\beta_{S_k^*}\big) - E\big(\Sigma_{jS_j}\Sigma_{S_j}^{-1}X_{iS_j}X_{iS_j^*}^\top\beta_{S_j^*}X_{ik}X_{iS_k^*}^\top\beta_{S_k^*}\big)\\
&\quad- E\big(X_{ij}X_{iS_j^*}^\top\beta_{S_j^*}\Sigma_{kS_k}\Sigma_{S_k}^{-1}X_{iS_k}X_{iS_k^*}^\top\beta_{S_k^*}\big) + E\big(\Sigma_{jS_j}\Sigma_{S_j}^{-1}X_{iS_j}X_{iS_j^*}^\top\beta_{S_j^*}\Sigma_{kS_k}\Sigma_{S_k}^{-1}X_{iS_k}X_{iS_k^*}^\top\beta_{S_k^*}\big)\\
&:= Q_{1j} - Q_{2j} - Q_{3j} + Q_{4j}.
\end{align*}
For the sake of simplicity, we suppress the subscript $i$ in the rest of the proof.

For any fixed $k\in\mathcal N_0$, let $Q_1 := \sum_{j\in\mathcal N_0}Q_{1j}^2$, which can be bounded as follows:
\begin{align*}
Q_1 &\le C_Q\Lambda_\beta\sum_{j\in\mathcal N_0}\sum_{l\in S_j^*\cap\mathcal N_1}\sum_{u\in S_k^*\cap\mathcal N_1}E^2(X_jX_kX_lX_u)|\beta_l||\beta_u|\\
&= C_Q\Lambda_\beta\sum_{j\in\mathcal N_0}\sum_{u\in S_j^*\cap S_k^*\cap\mathcal N_1}E^2(X_jX_kX_u^2)\beta_u^2 + C_Q\Lambda_\beta\sum_{j\in\mathcal N_0}\sum_{l\in S_j^*\cap\mathcal N_1}\sum_{\substack{u\in S_k^*\cap\mathcal N_1\\ u\ne l}}E^2(X_jX_kX_lX_u)|\beta_l||\beta_u|\\
&:= C_Q\Lambda_\beta Q_{11} + C_Q\Lambda_\beta Q_{12},
\end{align*}
where $C_Q$ is a positive constant and $\Lambda_\beta = \max_{j,k\in\mathcal N_0}\sum_{l\in S_j^*\cap\mathcal N_1}\sum_{u\in S_k^*\cap\mathcal N_1}|\beta_l||\beta_u|$.

By Lemma 2, $E(X_jX_kX_u^2) = \sigma_{jk}\sigma_{uu} + 2\sigma_{ju}\sigma_{ku}$. Then, using the fact that $\sum_{j\in\mathcal N_0}\sigma_{ju}^2 < \infty$ for any $u\in\{1,\cdots,p\}$, we have $\sum_{u\in\mathcal N_1}\sum_{j\in\mathcal N_0}\sigma_{ju}^2 = O(|\mathcal N_1|) = o(p)$. This, in conjunction with Condition (C4), for any $k\in\mathcal N_0$, leads to
\[
Q_{11} \le \sum_{j\in\mathcal N_0}\sum_{u\in S_j^*\cap S_k^*\cap\mathcal N_1}\big(2\sigma_{jk}^2\sigma_{uu}^2\beta_u^2 + 8\sigma_{ju}^2\sigma_{ku}^2\beta_u^2\big) = o(p).
\]
Furthermore, by Lemma 2, for any distinct $j$, $k$, $l$, $u$, we have $E(X_jX_kX_lX_u) = \sigma_{jk}\sigma_{lu} + \sigma_{jl}\sigma_{ku} + \sigma_{ju}\sigma_{kl}$. After algebraic simplification, we obtain that
\[
Q_{12} \le C_{12}\sum_{j\in\mathcal N_0}\sum_{l\in S_j^*\cap\mathcal N_1}\sum_{\substack{u\in S_k^*\cap\mathcal N_1\\ u\ne l}}\big(\sigma_{jk}^2\sigma_{lu}^2|\beta_l\beta_u| + \sigma_{jl}^2\sigma_{ku}^2|\beta_l\beta_u| + \sigma_{ju}^2\sigma_{kl}^2|\beta_l\beta_u|\big)
\]
for some positive constant $C_{12} > 0$. This, together with the fact that $\sum_{q\in\mathcal N_1}\sum_{j\in\mathcal N_0}\sigma_{jq}^2 = o(p)$ and Condition (C4), yields $Q_{12} = o(p)$. Moreover, Condition (C4) implies that $\Lambda_\beta < C_\beta^2 < \infty$. Combining the above results, we finally have $Q_1 = o(p)$.

We subsequently study the bounded behavior of $Q_4 = \sum_{j\in\mathcal N_0}Q_{4j}^2$. Let $\Sigma_{jS_j}\Sigma_{S_j}^{-1} := (d_{j1},\cdots,d_{j|S_j|})$. By Condition (C2), the $d_{jl}$ are bounded constants such that $\sum_l d_{jl}^2 < \infty$ for $l = 1,\cdots,|S_j|$. We then apply Lemma 2 and obtain that
\begin{align*}
Q_{4j} &= \sum_{\substack{q_1\in S_j,\, q_2\in S_j^*\cap\mathcal N_1\\ l_1\in S_k,\, l_2\in S_k^*\cap\mathcal N_1}} d_{jq_1}d_{kl_1}\beta_{q_2}\beta_{l_2}E(X_{q_1}X_{q_2}X_{l_1}X_{l_2})\\
&= \sum_{\substack{q_1\in S_j,\, q_2\in S_j^*\cap\mathcal N_1\\ l_1\in S_k,\, l_2\in S_k^*\cap\mathcal N_1}} d_{jq_1}d_{kl_1}\beta_{q_2}\beta_{l_2}\big(\sigma_{q_1q_2}\sigma_{l_1l_2} + \sigma_{q_1l_1}\sigma_{q_2l_2} + \sigma_{q_1l_2}\sigma_{q_2l_1}\big).
\end{align*}
By the Cauchy–Schwarz inequality, we have
\[
Q_{4j}^2 \le \sum_{\substack{q_1\in S_j,\, q_2\in S_j^*\cap\mathcal N_1\\ l_1\in S_k,\, l_2\in S_k^*\cap\mathcal N_1}} d_{jq_1}^2 d_{kl_1}^2\beta_{q_2}^2\beta_{l_2}^2 \times \sum_{\substack{q_1\in S_j,\, q_2\in S_j^*\cap\mathcal N_1\\ l_1\in S_k,\, l_2\in S_k^*\cap\mathcal N_1}}\big(\sigma_{q_1q_2}\sigma_{l_1l_2} + \sigma_{q_1l_1}\sigma_{q_2l_2} + \sigma_{q_1l_2}\sigma_{q_2l_1}\big)^2 := Q_{41,j}\times Q_{42,j}.
\]
Conditions (C2) and (C4) imply that
\[
Q_{41,j} \le \sum_{q_1\in S_j}d_{jq_1}^2\sum_{q_2\in S_j^*}\beta_{q_2}^2\sum_{l_1\in S_k}d_{kl_1}^2\sum_{l_2\in S_k^*}\beta_{l_2}^2 < \infty.
\]
In addition, by the Cauchy–Schwarz inequality, we obtain
\[
Q_{42,j} \le 3\sum_{\substack{q_1\in S_j,\, q_2\in S_j^*\cap\mathcal N_1\\ l_1\in S_k,\, l_2\in S_k^*\cap\mathcal N_1}}\big(\sigma_{q_1q_2}^2\sigma_{l_1l_2}^2 + \sigma_{q_1l_1}^2\sigma_{q_2l_2}^2 + \sigma_{q_1l_2}^2\sigma_{q_2l_1}^2\big) := 3(Q_{421,j} + Q_{422,j} + Q_{423,j}).
\]
Let $Q_{421} = \sum_{j\in\mathcal N_0}Q_{421,j}$. After algebraic simplification, we have that
\[
Q_{421} = \sum_{j\in\mathcal N_0}\sum_{\substack{q_1\in S_j,\, q_2\in S_j^*\cap\mathcal N_1\\ l_1\in S_k,\, l_2\in S_k^*\cap\mathcal N_1}}\sigma_{q_1q_2}^2\sigma_{l_1l_2}^2 = \sum_{j\in\mathcal N_0}\,\sum_{q_1\in S_j,\, q_2\in S_j^*\cap\mathcal N_1}\sigma_{q_1q_2}^2\sum_{l_1\in S_k,\, l_2\in S_k^*\cap\mathcal N_1}\sigma_{l_1l_2}^2.
\]
By Condition (C2), we obtain that $\sum_{l_1\in S_k,\, l_2\in S_k^*\cap\mathcal N_1}\sigma_{l_1l_2}^2 \le \sum_{l_2\in S_k^*\cap\mathcal N_1}\sum_{l_1\in S_k}\sigma_{l_1l_2}^2 = O(|\mathcal N_1|)$. In addition, the assumption of Theorem 2 leads to
\[
\sum_{j\in\mathcal N_0}\sum_{q_1\in S_j,\, q_2\in S_j^*\cap\mathcal N_1}\sigma_{q_1q_2}^2 \le \sum_{q_2\in\mathcal N_1}\sum_{j\in\mathcal N_0}\sum_{q_1\in S_j}\sigma_{q_1q_2}^2 = \sum_{q_2\in\mathcal N_1}o\big(p/|\mathcal N_1|^2\big) = o\big(p/|\mathcal N_1|\big).
\]
Combining the above results, we obtain that $Q_{421} = o(p)$. Analogously, we can demonstrate that $Q_{423} = \sum_{j\in\mathcal N_0}Q_{423,j} = o(p)$.

Subsequently, let $Q_{422} = \sum_{j\in\mathcal N_0}Q_{422,j}$, which can be simplified as
\[
Q_{422} = \sum_{j\in\mathcal N_0}\sum_{\substack{q_1\in S_j,\, q_2\in S_j^*\cap\mathcal N_1\\ l_1\in S_k,\, l_2\in S_k^*\cap\mathcal N_1}}\sigma_{q_1l_1}^2\sigma_{q_2l_2}^2 = \sum_{j\in\mathcal N_0}\,\sum_{q_1\in S_j,\, l_1\in S_k}\sigma_{q_1l_1}^2\sum_{q_2\in S_j^*\cap\mathcal N_1,\, l_2\in S_k^*\cap\mathcal N_1}\sigma_{q_2l_2}^2.
\]
By Condition (C2), we obtain that $\sum_{q_2\in S_j^*\cap\mathcal N_1,\, l_2\in S_k^*\cap\mathcal N_1}\sigma_{q_2l_2}^2 \le \sum_{q_2\in\mathcal N_1}\sum_{l_2\in\mathcal N_1}\sigma_{q_2l_2}^2 = O(|\mathcal N_1|)$. In addition, the assumption of Theorem 2 leads to
\[
\sum_{j\in\mathcal N_0}\sum_{q_1\in S_j,\, l_1\in S_k}\sigma_{q_1l_1}^2 = \sum_{l_1\in S_k}\sum_{j\in\mathcal N_0}\sum_{q_1\in S_j}\sigma_{q_1l_1}^2 \le \max_k|S_k|\, o\big(p/\Lambda_0^2\big) = o(1).
\]
As a result, we have that $Q_{422} = o(p)$. This, together with $Q_{421} = o(p)$ and $Q_{423} = o(p)$, yields $Q_{42} = o(p)$. In sum, we obtain $Q_4 = o(p)$. Employing similar techniques to those used in the proofs of $Q_1 = o(p)$ and $Q_4 = o(p)$, we are able to show that $Q_2 = o(p)$ and $Q_3 = o(p)$. Consequently, we complete the proof of the second equation in (A.12), which implies that (A.6) holds.

It is worth noting that $G_{0,n}(t) \to t$. This, in conjunction with (A.6), (A.7), and the assumptions that $T_{1,n}(t) \to T_1(t)$ with $T_1(t)$ continuous and $p_0/p \to 1$ as $p\to\infty$, indicates that Conditions (a), (b), and (c) in Lemma 4 hold. Accordingly, the proof of Theorem 2 is complete.
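In practice, Theorem 2 licenses plugging the KFC p-values into a Storey-type FDP estimate such as the $\widehat{\mathrm{FDR}}_{0.5}(t)$ displayed in Figure 1. A minimal sketch with synthetic p-values (the 900/100 null/signal split and the Beta signal distribution are illustrative assumptions):

```python
import numpy as np

def fdp_estimate(pvals, t, lam=0.5):
    """Storey-type point estimate of the false discovery proportion at
    threshold t: pi0_hat * p * t / max(#{p_j <= t}, 1)."""
    p = len(pvals)
    pi0_hat = min(1.0, np.mean(pvals > lam) / (1.0 - lam))
    return pi0_hat * p * t / max(int(np.sum(pvals <= t)), 1)

rng = np.random.default_rng(0)
# 900 null p-values (uniform) and 100 signals concentrated near zero
pvals = np.concatenate([rng.uniform(size=900), rng.beta(1, 200, size=100)])
true_fdp = np.sum(pvals[:900] <= 0.01) / max(int(np.sum(pvals <= 0.01)), 1)
est_fdp = fdp_estimate(pvals, t=0.01)
print(round(true_fdp, 3), round(est_fdp, 3))
```

In well-behaved settings like this one, the estimate tracks the true FDP, which is the qualitative behavior of the KFC panels in Figure 1.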
Appendix D. Proof of Theorem 3

Let $Z_j = n^{1/2}\hat\beta_j/\hat\sigma_{\beta_j}$ be the test statistic and $p_j$ be the corresponding p-value for $j = 1,\cdots,p$. Define $\alpha_n = 2\{1 - \Phi(n^{\iota})\}$ for some constant $\iota$ with $\hbar < \iota < 1/2 - \kappa$, where $\hbar$ and $\kappa$ are given in (C7); hence, $\alpha_n \to 0$ as $n\to\infty$. Note that $z_{1-\alpha_n/2} = n^{\iota}$ by construction. To prove the theorem, it suffices to show that
\[
\lim_{n\to\infty}P\{V(\alpha_n) > 0\} \to 0 \quad\text{and}\quad \lim_{n\to\infty}P\{S(\alpha_n)/(p - p_0) = 1\} \to 1.
\]
It is worth noting that $n\to\infty$ implicitly implies $p\to\infty$. We demonstrate the above equations in the following two steps accordingly.

STEP I. We show that $P\{V(\alpha_n) > 0\} \to 0$. Using the fact that $\tau_{\beta_j}^2 \ge \sigma^2$ for $1\le j\le p$, we have
\[
|Z_j| \le \sigma^{-1}\big|(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}\mathcal{E}\big| + \sigma^{-1}\big|(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}X_{S_j^*}\beta_{S_j^*}\big|.
\]
This, together with Bonferroni's inequality, leads to
\begin{align*}
P\{V(\alpha_n) > 0\} &= P\Big(\max_{j\in\mathcal N_0}|Z_j| > z_{1-\alpha_n/2}\Big)\\
&\le P\Big(\max_{j\in\mathcal N_0}\big|(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}\mathcal{E}/\sigma\big| > n^{\iota}/2\Big) + P\Big(\max_{j\in\mathcal N_0}\big|(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}X_{S_j^*}\beta_{S_j^*}\big| > \sigma n^{\iota}/2\Big). \tag{A.13}
\end{align*}
Consider the quantity $\big|(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}\mathcal{E}/\sigma\big|$, which is in the first term on the right-hand side of the above equation. Employing the same technique as used in the proof of Lemma 3, we obtain that $\max_j|n^{-1}X_j^\top Q_{S_j}X_j - (\sigma_{jj} - \Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jj})| = o_p(1)$. By Condition (C2), one can easily verify that $\sigma_{jj} - \Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jj} \ge \lambda_{\max}^{-1}$. The above two results lead to $n^{-1}X_j^\top Q_{S_j}X_j \ge \lambda_{\max}^{-1}$ uniformly for any $j$, with probability tending to one. In addition, (A.2), (A.3), and (A.4) imply that $\max_j|X_j^\top Q_{S_j}\mathcal{E}| = \max_j|X_j^\top\mathcal{E} - \Sigma_{jS_j}\Sigma_{S_j}^{-1}X_{S_j}^\top\mathcal{E}|\times\{1 + o_p(1)\}$. Accordingly, there exists some constant $C_m$ such that
\[
\max_{j\in\mathcal N_0}\big|(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}\mathcal{E}/\sigma\big| \le C_m\max_{j\in\mathcal N_0}n^{-1/2}|X_j^\top\mathcal{E}| + C_m\max_{j\in\mathcal N_0}n^{-1/2}|\Sigma_{jS_j}\Sigma_{S_j}^{-1}X_{S_j}^\top\mathcal{E}|.
\]
By the Cauchy–Schwarz inequality, we have $|\Sigma_{jS_j}\Sigma_{S_j}^{-1}X_{S_j}^\top\mathcal{E}| \le \|\Sigma_{jS_j}\Sigma_{S_j}^{-1}\|\max_{l\in S_j}|X_l^\top\mathcal{E}|$. By Condition (C2), we further obtain $\|\Sigma_{jS_j}\Sigma_{S_j}^{-1}\|^2 = \Sigma_{jS_j}\Sigma_{S_j}^{-2}\Sigma_{S_jj} \le \lambda_{\max}^2\lambda_{\min}^{-2}$. As a result, there exists some constant $C_\lambda$ such that
\[
\max_{j\in\mathcal N_0}\big|(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}\mathcal{E}/\sigma\big| \le C_\lambda\max_j n^{-1/2}|X_j^\top\mathcal{E}|.
\]
This, in conjunction with Bonferroni's inequality, yields
\begin{align*}
P\Big(\max_{j\in\mathcal N_0}\big|(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}\mathcal{E}/\sigma\big| > n^{\iota}/2\Big) &\le \sum_j P\big(n^{-1}|X_j^\top\mathcal{E}| > C_\lambda^{-1}n^{\iota-1/2}/2\big)\\
&\le 2p\exp\{-C_h n^{2\iota}\} \le 2\exp\{-C_h n^{2\iota} + \nu n^{\hbar}\}
\end{align*}
for some positive constant $C_h$. By definition, $\hbar < 2\iota$. Thus, the first term in the exponent, $-C_h n^{2\iota}$, dominates the second term $\nu n^{\hbar}$, which immediately leads to
\[
\lim_{n\to\infty}P\Big(\max_{j\in\mathcal N_0}\big|(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}\mathcal{E}/\sigma\big| > n^{\iota}/2\Big) \to 0.
\]
We next consider the quantity $(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}X_{S_j^*}\beta_{S_j^*}$, which is in the second term on the right-hand side of equation (A.13). It is worth noting that
\[
(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}X_{S_j^*}\beta_{S_j^*} = (X_j^\top Q_{S_j}X_j/n)^{-1/2}\, n^{1/2}\!\!\sum_{j^*\in S_j^*}\hat\varrho_{jj^*}(S_j)\beta_{j^*} \le C_\beta\Big\{\min_j n^{-1}X_j^\top Q_{S_j}X_j\Big\}^{-1/2}\max_{j^*\in S_j^*}\big|n^{1/2}\hat\varrho_{jj^*}(S_j)\big|.
\]
Using the results $\max_j|n^{-1}X_j^\top Q_{S_j}X_j - (\sigma_{jj} - \Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jj})| = o_p(1)$ and $\sigma_{jj} - \Sigma_{jS_j}\Sigma_{S_j}^{-1}\Sigma_{S_jj} \ge \lambda_{\max}^{-1}$ discussed after (A.13), we have that $\{\min_j n^{-1}X_j^\top Q_{S_j}X_j\}^{-1/2} = O_p(1)$. In addition, Condition (C5), together with the fact that $S_j$ satisfies Condition (C6), leads to $\max_{j^*\in S_j^*}|n^{1/2}\varrho_{jj^*}(S_j)| = o(1)$. By Corollary 1 of Kalisch and Buhlmann (2007), we immediately obtain
\[
\max_{j,S_j^*}P\Big\{\max_{j^*\in S_j^*}\big|n^{1/2}\hat\varrho_{jj^*}(S_j)\big| > O(n^{b/2})\Big\} \to 0
\]
for every $\hbar < b < 1$. Taking $b = (\hbar + \iota)/2$, we then have $\max_{j^*\in S_j^*}|n^{1/2}\hat\varrho_{jj^*}(S_j)| = o_p(n^{b/2}) = o_p(n^{\iota})$. This, in conjunction with the above result $\{\min_j n^{-1}X_j^\top Q_{S_j}X_j\}^{-1/2} = O_p(1)$, results in
\[
P\Big(\max_{j\in\mathcal N_0}\big|(X_j^\top Q_{S_j}X_j)^{-1/2}X_j^\top Q_{S_j}X_{S_j^*}\beta_{S_j^*}\big| > \sigma n^{\iota}/2\Big) \to 0.
\]
In sum, both terms on the right-hand side of (A.13) vanish, so that $P\{V(\alpha_n) > 0\} \to 0$.

STEP II. We prove that $\lim_{n\to\infty}P\{S(\alpha_n)/(p - p_0) = 1\} \to 1$. By definition, we have
\[
(p - p_0)^{-1}S(\alpha_n) = (p - p_0)^{-1}\sum_{j\in\mathcal N_1}I\big(|n^{1/2}\hat\beta_j/\hat\sigma_{\beta_j}| > n^{\iota}\big).
\]
Applying the asymptotic results established in Step I, we have $\max_j|n^{1/2}(\hat\beta_j - \beta_j)/\sigma_{\beta_j}| = o_p(n^{\iota})$. Then, using the theorem's assumption that $\min_{j\in\mathcal N_1}|\beta_j| \ge C_\kappa n^{-\kappa}$ for some constants $C_\kappa > 0$ and $\kappa > 0$, we further obtain $\min_{j\in\mathcal N_1}|n^{1/2}\beta_j/\sigma_{\beta_j}| = O(n^{1/2-\kappa})$. Moreover, by Bonferroni's inequality and the fact that $\iota + \kappa < 1/2$, we have
\[
P\big((p - p_0)^{-1}S(\alpha_n) = 1\big) = P\Big(\min_{j\in\mathcal N_1}|n^{1/2}\hat\beta_j/\hat\sigma_{\beta_j}| > n^{\iota}\Big)
\ge P\Big(\min_{j\in\mathcal N_1}|n^{1/2}\beta_j/\sigma_{\beta_j}| > 2n^{\iota}\Big) - P\Big(\max_{j\in\mathcal N_1}|n^{1/2}(\hat\beta_j - \beta_j)/\sigma_{\beta_j}| > n^{\iota}\Big) \to 1,
\]
which completes the proof of Step II. Consequently, the entire proof is complete.
REFERENCES
Bickel, P. and Levina, E. (2008). "Regularized estimation of large covariance matrices," The
Annals of Statistics, 36, 199–227.
Bendat, J. S. and Piersol, A. G. (1966), Measurement and Analysis of Random Data, New
York: Wiley.
Benjamini, Y. and Hochberg, Y. (1995). “Controlling the false discovery rate: A practical
and powerful approach to multiple testing,” Journal of the Royal Statistical Society,
Series B, 57, 289–300.
Buhlmann, P. (2013). “Statistical significance in high-dimensional linear models,” Bernoulli,
to appear.
Bunea, F., Wegkamp, M., and Auguste, A. (2006). “Consistent variable selection in high
dimensional regression via multiple testing,” Journal of Statistical Planning and Infer-
ence, 136, 4349-4364.
Draper, N. R., and Smith, H. (1998). Applied Regression Analysis, 3rd Edition, New York:
Wiley.
Fan, J. and Lv, J. (2008). “Sure independence screening for ultra-high dimensional feature
space (with discussion),” Journal of the Royal Statistical Society, Series B, 70, 849–911.
Fan, J., Han, X., and Gu, W. (2012). “Estimating False Discovery Proportion Under
Arbitrary Covariance Dependence,” Journal of the American Statistical Association,
107, 1019–1035.
Goeman, J., Geer, V. D., and Houwelingen, V. (2006). “Testing against a high-dimensional
alternative,” Journal of the Royal Statistical Society, Series B, 68, 477–493.
Goeman, J., Houwelingen, V., and Finos, L. (2011). “Testing against a high dimensional al-
ternative in the generalized linear model: asymptotic type I error control,” Biometrika,
98, 381–390.
Huang, J., Ma, S., and Zhang, C.H. (2007). “Adaptive Lasso for sparse high-dimensional
regression models,” Statistica Sinica, 18, 1603-1618.
Kalisch, M. and Buhlmann, P. (2007). “Estimating high-dimensional directed acyclic
graphs with the PC-algorithm,” Journal of Machine Learning Research, 8, 613–636.
Lee, D., Lee, W., Lee, Y., and Pawitan, Y. (2011). "Sparse partial least-squares regression
and its applications to high-throughput data analysis," Chemometrics and Intelligent
Laboratory Systems, 109, 1–8.
Li, R., Zhong, W., and Zhu, L. P. (2012). “Feature screening via distance correlation
learning,” Journal of the American Statistical Association, 107, 1129–1139.
Meinshausen, N. and Buhlmann, P. (2006). “High-dimensional graphs and variable selection
with the Lasso,” Annals of Statistics, 34, 1436-1462.
Meinshausen, N., Meier, L., and Buhlmann, P. (2009). “P-values for high-dimensional
regression,” Journal of the American Statistical Association, 104, 1671-1681.
Storey, J. D. (2002). “A direct approach to false discovery rates”, Journal of the Royal
Statistical Society, Series B, 64, 479–498.
Storey, J. D., Taylor, J. E., and Siegmund, D. (2004). “Strong control, conservative point
estimation and simultaneous conservative consistency of false discovery rates: a unified
approach,” Journal of the Royal Statistical Society, Series B, 66, 187–205.
Sun, T. and Zhang, C. H. (2012). “Scaled sparse linear regression,” Biometrika, 99, 879–
898.
Wang, H. (2009). “Forward regression for ultra-high dimensional variable screening,” Jour-
nal of the American Statistical Association, 104, 1512–1524.
Wang, H. (2012). “Factor profiled independence screening,” Biometrika, 99, 15–28.
Willink, R. (2004). “Bounds on the bivariate normal distribution function,” Communica-
tions in Statistics: Theory and Methods, 33, 2281–2297.
van de Geer, S., Buhlmann, P., and Ritov, Y. (2013). “On asymptotically optimal confi-
dence regions and tests for high-dimensional models,” Working Paper.
Zhang, C. H. and Zhang, S. S. (2013). “Confidence intervals for low dimensional parameters
in high dimensional linear models,” Journal of the Royal Statistical Society: Series B,
in Press.
Zhao, P. and Yu, B. (2006). “On model selection consistency of Lasso,” Journal of Machine
Learning Research, 7, 2541-2563.
Zhong, P. S. and Chen, S. X. (2011). “Tests for high dimensional regression coefficients
with factorial designs,” Journal of the American Statistical Association, 106, 260–274.
Zhong, P. S., Chen, S. X., and Xu, M. (2013). “Tests alternative to higher criticism for
high dimensional means under sparsity and column-wise dependence,” The Annals of
Statistics, to appear.
Table 1: Simulation results for Example 1 with α = 5% and q = 5%.

    p     n    Method    ES     EP     FDR    TR     FR
   500   100   MUR      0.062  0.988  0.349  0.805  0.004
               KFC      0.058  0.779  0.103  0.495  0.001
         200   MUR      0.062  1.000  0.539  0.992  0.008
               KFC      0.054  0.950  0.072  0.767  0.001
         500   MUR      0.062  1.000  0.642  1.000  0.011
               KFC      0.051  0.999  0.055  0.965  0.000
  1000   100   MUR      0.058  0.986  0.328  0.771  0.002
               KFC      0.058  0.778  0.111  0.440  0.000
         200   MUR      0.057  1.000  0.508  0.983  0.005
               KFC      0.053  0.944  0.072  0.704  0.000
         500   MUR      0.057  1.000  0.658  1.000  0.007
               KFC      0.051  1.000  0.058  0.957  0.000
Table 2: Simulation results for Example 2 with α = 5% and q = 5%.

    p     n    Test      ES     EP     FDR    TR     FR
   500   100   MUR      1.000  1.000  0.994  1.000  1.000
               KFC      0.055  0.739  0.093  0.442  0.000
         200   MUR      1.000  1.000  0.994  1.000  1.000
               KFC      0.050  0.936  0.053  0.813  0.000
         500   MUR      1.000  1.000  0.994  1.000  1.000
               KFC      0.052  0.999  0.050  1.000  0.000
  1000   100   MUR      1.000  1.000  0.997  1.000  1.000
               KFC      0.060  0.701  0.140  0.353  0.000
         200   MUR      1.000  1.000  0.997  1.000  1.000
               KFC      0.052  0.907  0.073  0.753  0.000
         500   MUR      1.000  1.000  0.997  1.000  1.000
               KFC      0.049  0.999  0.049  1.000  0.000
Table 3: Simulation results for Example 3 with α = 5%, q = 5%, Σ as given in Example 1, λ = 0.1, and the $X_i$ generated from a standardized exponential distribution.

    p     n    Test      ES     EP     FDR    TR     FR
   500   100   MUR      0.062  0.985  0.364  0.804  0.004
               KFC      0.057  0.752  0.072  0.456  0.000
         200   MUR      0.061  1.000  0.541  0.989  0.008
               KFC      0.053  0.947  0.049  0.717  0.000
         500   MUR      0.062  1.000  0.644  1.000  0.011
               KFC      0.051  0.999  0.050  0.963  0.000
  1000   100   MUR      0.058  0.988  0.342  0.757  0.002
               KFC      0.058  0.756  0.083  0.403  0.000
         200   MUR      0.057  1.000  0.513  0.983  0.004
               KFC      0.053  0.941  0.058  0.580  0.000
         500   MUR      0.056  1.000  0.639  1.000  0.005
               KFC      0.052  1.000  0.048  0.944  0.000
Table 4: Simulation results for Example 3 with α = 5%, q = 5%, Σ as given in Example 2, λ = 0.1, and the $X_i$ generated from a standardized exponential distribution.

    p     n    Test      ES     EP     FDR    TR     FR
   500   100   MUR      1.000  1.000  0.994  1.000  1.000
               KFC      0.062  0.801  0.100  0.390  0.000
         200   MUR      1.000  1.000  0.994  1.000  1.000
               KFC      0.054  0.995  0.054  0.765  0.000
         500   MUR      1.000  1.000  0.994  1.000  1.000
               KFC      0.052  1.000  0.053  1.000  0.001
  1000   100   MUR      1.000  1.000  0.997  1.000  1.000
               KFC      0.071  0.754  0.110  0.310  0.000
         200   MUR      1.000  1.000  0.997  1.000  1.000
               KFC      0.058  0.986  0.076  0.632  0.000
         500   MUR      1.000  1.000  0.997  1.000  1.000
               KFC      0.053  1.000  0.051  1.000  0.000
[Figure 1: four panels (A–D), each plotting probability (0 to 1) against the threshold value t (0 to 0.038), with one curve for the true FDP and one for the estimated FDP.]

Figure 1: Panels A and B depict the estimated FDP value (i.e., $\widehat{\mathrm{FDR}}_{0.5}(t)$) against the true FDP value obtained via the KFC method for Examples 1 and 2, respectively, while Panels C and D are obtained via the MUR approach.
[Figure 2: two histograms of p-values (x-axis 0 to 1), one for KFC and one for MUR; y-axis frequency, 0 to 6000.]

Figure 2: Supermarket data. The histograms of the p-values for the KFC and MUR tests.
[Figure 3: two histograms of p-values, one for KFC (x-axis 0 to 1) and one for MUR (x-axis 0 to 1.5×10⁻⁵); y-axis frequency, 0 to 600.]

Figure 3: Corn spectroscopic data. The histograms of the p-values for the KFC and MUR tests.