The Potential of Temporary Immersion Bioreactors (TIBs) in Meeting ...
A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp ›...
Transcript of A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp ›...
![Page 1: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/1.jpg)
1
A significance test for the lasso
Robert Tibshirani, Stanford University
Gold medal address, SSC 2013
Joint work with Richard Lockhart (SFU), Jonathan Taylor (Stanford), and RyanTibshirani (Carnegie-Mellon Univ.)
Reaping the benefits of LARS: A special thanks to Brad Efron, Trevor Hastie andIain Johnstone
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 2: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/2.jpg)
2
Richard Lockhart Jonathan Taylor Ryan Tibshirani Simon Fraser University Asst Prof, CMU Vancouver (PhD Student of Taylor,(PhD . Student of David Blackwell, Stanford 2011) Berkeley, 1979)
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 3: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/3.jpg)
3
An Intense and Memorable Collaboration!
With substantial and unique contributions from all four authors:
Quarterback and cheerleader
Expert in “elementary” theory
Expert in “advanced” theory
The closer: pulled together the elementary and advanced views into a coherent whole
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 4: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/4.jpg)
4
Overview
• Not a review, but instead some recent (unpublished work) on inference in thelasso.
• Although this is “yet another talk on the lasso”, it may have something tooffer mainstream statistical practice.
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 5: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/5.jpg)
5
Talk Outline
1 Review of lasso, LARS, forward stepwise
2 The covariance test statistic
3 Null distribution of the covariance statistic
4 Theory for orthogonal case
5 Simulations of null distribution
6 General X results
7 Example
8 Case of Unknown σ
9 Extensions to elastic net, generalized linear models, Cox model
10 Discussion and Future work
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 6: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/6.jpg)
6
Talk Outline
1 Review of lasso, LARS, forward stepwise
2 The covariance test statistic
3 Null distribution of the covariance statistic
4 Theory for orthogonal case
5 Simulations of null distribution
6 General X results
7 Example
8 Case of Unknown σ
9 Extensions to elastic net, generalized linear models, Cox model
10 Discussion and Future work
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 7: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/7.jpg)
7
The Lasso
Observe n predictor-response pairs (xi , yi ), where xi ∈ Rp and yi ∈ R. FormingX ∈ Rn×p, with standardized columns, the Lasso is an estimator defined by thefollowing optimization problem (??):
minimizeβ0∈R, β∈Rp
1
2‖y − β01− Xβ‖2 + λ‖β‖1
• Penalty =⇒ sparsity (feature selection)
• Convex problem (good for computation and theory)
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 8: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/8.jpg)
8
The LassoWhy does `1-penalty give sparse βλ?
minimizeβ∈Rp
1
2‖y − Xβ‖2 subject to ‖β‖1 ≤ s
+βOLS
●
βλ β1
β2
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 9: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/9.jpg)
9
Prostate cancer exampleN = 88, p = 8. Predicting log-PSA, in men after prostate cancer surgery
Shrinkage Factor s
Coe
ffici
ents
0.0 0.2 0.4 0.6 0.8 1.0
−0.
20.
00.
20.
40.
6
lcavol
lweight
age
lbph
svi
lcp
gleason
pgg45
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 10: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/10.jpg)
10
Emerging themes
• Lasso (`1) penalties have powerful statistical and computational advantages
• `1 penalties provide a natural to encourage/enforce sparsity and simplicity inthe solution.
• “Bet on sparsity principle” (In the Elements of Statistical learning). Assumethat the underlying truth is sparse and use an `1 penalty to try to recover it.If you’re right, you will do well. If you’re wrong— the underlying truth is notsparse—, then no method can do well. [Bickel, Buhlmann, Candes, Donoho,Johnstone,Yu ...]
• `1 penalties are convex and the assumed sparsity can lead to significantcomputational advantages
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 11: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/11.jpg)
11
Old SSC logo
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 12: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/12.jpg)
12
New SSC logo? (Thanks to Jacob Bien)
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 13: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/13.jpg)
13
Setup and basic question
• Given an outcome vector y ∈ Rn and a predictor matrix X ∈ Rn×p, weconsider the usual linear regression setup:
y = Xβ∗ + σε, (1)
where β∗ ∈ Rp are unknown coefficients to be estimated, and thecomponents of the noise vector ε ∈ Rn are i.i.d. N(0, 1).
• Given fitted lasso model at some λ can we produce a p-value for eachpredictor in the model? Difficult! (but we have some ideas for this- futurework)
• What we show today: how to provide a p-value for each variable as it isadded to lasso model
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 14: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/14.jpg)
14
Forward stepwise regression
• This procedure enters predictors one a time, choosing the predictor that mostdecreases the residual sum of squares at each stage.
• Defining RSS to be the residual sum of squares for the model containing kpredictors, and RSSnull the residual sum of squares before the kth predictorwas added, we can form the usual statistic
Rk = (RSSnull − RSS)/σ2 (2)
(with σ assumed known for now), and compare it to a χ21 distribution.
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 15: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/15.jpg)
15
Simulated example- Forward stepwise- F statistic
0 2 4 6 8 10
05
1015
Chi−squared on 1 df
Test
sta
tistic
N = 100, p = 10, true model null
Test is too liberal: for nominal size 5%, actual type I error is 39%.Can get proper p-values by sample splitting: but messy, loss of power
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 16: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/16.jpg)
16
Degrees of Freedom
Degrees of Freedom used by a procedure, y = h(y):
dfh =1
σ2
n∑i=1
cov(yi , yi )
where y ∼ N(µ, σ2In) [?].
Measures total self-influence of yi ’s on their predictions.
Stein’s formula can be used to evaluate df [?].
For fixed (non-adaptive) linear model fit on k predictors, df = k.
But for forward stepwise regression, df after adding k predictors is > k.
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 17: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/17.jpg)
17
Degrees of Freedom – Lasso
• Remarkable result for the Lasso:
dflasso = E [#nonzero coefficients]
• For least angle regression, df is exactly k after k steps (under conditions).So LARS spends one degree of freedom per step!
• Result has been generalized in multiple ways in (Ryan Tibs & Taylor) ?, e.g.for general X , p, n.
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 18: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/18.jpg)
18
Question that motivated today’s work
Is there a statistic for testing in lasso/LARS having onedegree of freedom?
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 19: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/19.jpg)
19
Quick review of least angle regression
Least angle regression is a method for constructing the path of lasso solutions.
A more democratic version of forward stepwise regression.
• find the predictor most correlated with the outcome,
• move the parameter vector in the least squares direction until some otherpredictor has as much correlation with the current residual.
• this new predictor is added to the active set, and the procedure is repeated.
• If a non-zero coefficient hits zero, that predictor is dropped from the activeset, and the process is restarted. [This is “lasso” mode, which is what weconsider here.]
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 20: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/20.jpg)
20
Least angle regression
−1 0 1 2
−1
01
23
45
− log(λ)
Coe
ffici
ents
λ1 λ2 λ3 λ4 λ5
1
1
1
1
3
3
3
2
2
4
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 21: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/21.jpg)
21
Talk Outline
1 Review of lasso, LARS, forward stepwise
2 The covariance test statistic
3 Null distribution of the covariance statistic
4 Theory for orthogonal case
5 Simulations of null distribution
6 General X results
7 Example
8 Case of Unknown σ
9 Extensions to elastic net, generalized linear models, Cox model
10 Discussion and Future work
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 22: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/22.jpg)
22
The covariance test statisticSuppose that we want a p-value for predictor 2, entering at the 3rd step.
−1 0 1 2
−1
01
23
45
− log(λ)
Coe
ffici
ents
λ1 λ2 λ3 λ4 λ5
1
1
1
1
3
3
3
2
2
4
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 23: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/23.jpg)
23
Compute covariance at λ4: 〈y,Xβ(λ4)〉
−1 0 1 2
−1
01
23
45
− log(λ)
Coe
ffici
ents
λ1 λ2 λ3 λ4 λ5
1
1
1
1
3
3
3
2
2
4
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 24: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/24.jpg)
24
Drop x2, yielding active yet A; refit at λ4, and compute resulting covariance at λ4,giving
T =(〈y,Xβ(λ4)〉 − 〈y,XAβA(λ4)〉
)/σ2
−1 0 1 2
−1
01
23
45
− log(λ)
Coe
ffici
ents
λ1 λ2 λ3 λ4 λ5
1
1
1
1
3
3
3
2
2
4
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 25: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/25.jpg)
25
The covariance test statistic: formal definition
• Suppose that we wish to test significance of predictor that enters LARS at λj .
• Let A be the active set before this predictor added
• Let the estimates at the end of this step be β(λj+1)
• We refit the lasso, keeping λ = λj+1 but using just the variables in A. This
yields estimates βA(λj+1). The proposed covariance test statistic is definedby
Tj =1
σ2·(〈y,Xβ(λj+1)〉 − 〈y,XAβA(λj+1)〉
). (3)
• measures how much of the covariance between the outcome and the fittedmodel can be attributed to the predictor which has just entered the model.
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 26: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/26.jpg)
26
Talk Outline
1 Review of lasso, LARS, forward stepwise
2 The covariance test statistic
3 Null distribution of the covariance statistic
4 Theory for orthogonal case
5 Simulations of null distribution
6 General X results
7 Example
8 Case of Unknown σ
9 Extensions to elastic net, generalized linear models, Cox model
10 Discussion and Future work
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 27: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/27.jpg)
27
Main result
Under the null hypothesis that all signal variables are in the model:
Tj =1
σ2·(〈y,Xβ(λj+1)〉 − 〈y,XAβA(λj+1)〉
)→ Exp(1)
as p, n→∞.
More details to come
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 28: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/28.jpg)
28
Comments on the covariance test
Tj =1
σ2·(〈y,Xβ(λj+1)〉 − 〈y,XAβA(λj+1)〉
). (4)
• Generalization of standard χ2 or F test, designed for fixed linear regression,to adaptive regression setting.
• Exp(1) is the same as χ22/2; its mean is 1, like χ2
1: overfitting due to adaptiveselection is offset by shrinkage of coefficients
• Form of the statistic is very specific- uses covariance rather than residual sumof squares (they are equivalent for projections)
• Covariance must be evaluated at specific knot λj+1
• Applies when p > n or p ≤ n: although asymptotic in p, Exp(1) seem to be avery good approximation even for small p
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 29: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/29.jpg)
29
Simulated example- Lasso- Covariance statisticN = 100, p = 10, true model null
0 1 2 3 4 5 6 7
05
1015
Exp(1)
Test
sta
tistic
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 30: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/30.jpg)
30
Example: Prostate cancer data
Stepwise Lasso
lcavol 0.000 0.000lweight 0.000 0.052
svi 0.041 0.174lbph 0.045 0.929
pgg45 0.226 0.353age 0.191 0.650lcp 0.065 0.051
gleason 0.883 0.978
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 31: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/31.jpg)
31
Simplifications
• For any design, the first covariance test T1 can be shown to equalλ1(λ1 − λ2).
• For orthonormal design, Tj = λj(λj − λj+1) for all j ; for general designs,Tj = cjλj(λj − λj+1)
• For orthonormal design, λj = |β(j)|, the jth largest univariate coefficient inabsolute value. Hence
Tj = (|β(j)|(|β(j)| − |β(j+1)|). (5)
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 32: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/32.jpg)
32
Rough summary of theoretical results
Under somewhat general conditions, after all signal variables are in the model,distribution of T for kth null predictor → Exp(1/k)
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 33: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/33.jpg)
33
Talk Outline
1 Review of lasso, LARS, forward stepwise
2 The covariance test statistic
3 Null distribution of the covariance statistic
4 Theory for orthogonal case
5 Simulations of null distribution
6 General X results
7 Example
8 Case of Unknown σ
9 Extensions to elastic net, generalized linear models, Cox model
10 Discussion and Future work
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 34: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/34.jpg)
34
Theory for orthogonal case
Global null case: first predictor to enter
Recall that in this setting,
Tj = λj(λj − λj+1)
and λj = |β(j)|, βj ∼ N(0, 1)
So the question is:
Suppose V1 > V2 . . . > Vn are the order statistics from a χ1 distribution (absolutevalue of a standard Gaussian).
What is the asymptotic distribution of V1(V1 − V2)?
[Ask Richard Lockhart!]
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 35: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/35.jpg)
35
Theory for orthogonal caseGlobal null case: first predictor to enter
Lemma
Lemma 1: Top two order statistics: Suppose V1 > V2 . . . > Vp are the orderstatistics from a χ1 distribution (absolute value of a standard Gaussian) withcumulative distribution function F (x) = (2Φ(x)− 1)1(x > 0), where Φ(x) isstandard normal cumulative distribution function. Then
V1(V1 − V2)→ Exp(1). (6)
Lemma
Lemma 2: All predictors. Under the same conditions as Lemma 1,
(V1(V1 − V2), . . . ,Vk(Vk − Vk+1))→ (Exp(1),Exp(1/2), . . .Exp(1/k))
Proof uses a uses theorem from ?. We were unable to find these remarkablysimple results in the literature.Robert Tibshirani, Stanford University A significance test for the lasso
![Page 36: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/36.jpg)
36
Heuristically, the Exp(1) limiting distribution for T1 can be seen as follows:
• The spacings |β(1)| − |β(2)| have an exponential distribution asymptotically,
while |β(1)| has an extreme value distribution with relatively small variance.
• It turns out that |β(1)| is just the right (stochastic) scale factor to give thespacings an Exp(1) distribution.
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 37: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/37.jpg)
37
Talk Outline
1 Review of lasso, LARS, forward stepwise
2 The covariance test statistic
3 Null distribution of the covariance statistic
4 Theory for orthogonal case
5 Simulations of null distribution
6 General X results
7 Example
8 Case of Unknown σ
9 Extensions to elastic net, generalized linear models, Cox model
10 Discussion and Future work
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 38: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/38.jpg)
38
Simulations of null distribution
TABLES OF SIMULATION RESULTS ARE BORING !!!!
SHOW SOME MOVIES INSTEAD
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 39: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/39.jpg)
39
Proof sketch
We use a theorem from ? on the asymptotic distributions of extreme orderstatistics.
1 Let E1,E2 be independent standard exponentials. There are constants an andbn such that
W1n ≡ bn(V1 − an) −→W1 = log(E1)
2 For those same constants put W2n = bn(X2 − an). Then
(W1n,W2n) −→ (W1,W2) = (− log(E1),− log(E1 + E2))
3 The quantity of interest T is a function of W1n,W2n. A change of variablesshows that T −→ log(E2 + E1)− log(E1) = log(1 + E2/E1), which is Exp(1).
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 40: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/40.jpg)
40
Talk Outline
1 Review of lasso, LARS, forward stepwise
2 The covariance test statistic
3 Null distribution of the covariance statistic
4 Theory for orthogonal case
5 Simulations of null distribution
6 General X results
7 Example
8 Case of Unknown σ
9 Extensions to elastic net, generalized linear models, Cox model
10 Discussion and Future work
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 41: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/41.jpg)
41
General X results
Under appropriate condition on X, as p,N →∞,
1 Global null case: T1 = λ1(λ1 − λ2)→ Exp(1).
2 Non-null case: After the k strong signal variables have entered, under thenull hypothesis that the rest are weak,
Tk+1
n,p→∞≤ Exp(1)
Jon Taylor: “Something magical happens in the math”
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 42: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/42.jpg)
42
Sketch of proof: k = 1
• Assume that y ∼ N(Xβ0, σ2I ), and, for simplicity diag(XTX ) = 1. Let
Uj = XTj y , R = XTX ,
• We are interested in T1 = λ1(λ1 − λ2)/σ2. Can show thatλ1 = ‖XT y‖∞ = maxj,sj sjX
Tj y and
λ2 = maxj 6=j1, s∈{−1,1}
sUj − sRj,j1 Uj1
1− ss1Rj,j1
. (7)
• Defineg(j , s) = sUj for j = 1, . . . p, s ∈ {−1, 1}. (8)
h(j1,s1)(j , s) =g(j , s)− ss1Rj,j1 g(j1, s1)
1− ss1Rj,j1
for j 6= j1, s ∈ {−1, 1}. (9)
M(j1, s1) = maxj 6=j1, s
h(j1,s1)(j , s), (10)
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 43: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/43.jpg)
43
Sketch of proof— continued
• Key fact:{g(j1, s1) > g(j , s) for all j , s
}={
g(j1, s1) > M(j1, s1)},
and M(j1, s1) is independent of g(j1, s1). Motivated from expected Eulercharacteristic for a Gaussian random field [Adler, Taylor, Worsley]
• Use this to write
P(T1 > t) =∑j1,s1
P(
g(j1, s1)(g(j1, s1)−M(j1, s1)
)/σ2 > t, g(j1, s1) > M(j1, s1)
).
Condition on M, assume M →∞, and use Mill’s ratio applied to tail ofGaussian to get the result.
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 44: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/44.jpg)
44
Conditions on X
• The main condition is that for each (j , sj) the random variable Mj,s(g) growssufficiently fast.
• A sufficient condition: for any j , we require the existence of a subset S notcontaining j such that the variables Ui , i ∈ S are not too correlated, in thesense that the conditional variance of any one on all the others is boundedbelow. This subset S has to be of size at least log p.
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 45: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/45.jpg)
45
Talk Outline
1 Review of lasso, LARS, forward stepwise
2 The covariance test statistic
3 Null distribution of the covariance statistic
4 Theory for orthogonal case
5 Simulations of null distribution
6 General X results
7 Example
8 Case of Unknown σ
9 Extensions to elastic net, generalized linear models, Cox model
10 Discussion and Future work
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 46: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/46.jpg)
46
HIV mutation dataN = 1057 samplesp = 217 mutation sites (xij=0 or 1)y = a measure of drug resistance
2 4 6 8 10
0.0
0.4
0.8
Step
Pva
lue
Forward stepwiseLasso
2 4 6 8 10
040
00
Step
Test
err
or
The data were randomly divided 50 times into training and test sets of size (150,907).
Top row shows the training set p-values for forward stage regression and the lasso. The
bottom panels show the test set error for the models of each size.
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 47: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/47.jpg)
47
Talk Outline
1 Review of lasso, LARS, forward stepwise
2 The covariance test statistic
3 Null distribution of the covariance statistic
4 Theory for orthogonal case
5 Simulations of null distribution
6 General X results
7 Example
8 Case of Unknown σ
9 Extensions to elastic net, generalized linear models, Cox model
10 Discussion and Future work
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 48: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/48.jpg)
48
Case of Unknown σ
LetWk =
(〈y ,X β(λk+1)〉 − 〈y ,XAβA(λk+1)〉
). (11)
and assuming n > p, let σ2 =∑n
i=1(yi − µfull)2/(n − p). Then asymptotically
Fk =Wk
σ2∼ F2,n−p (12)
[Wj/σ2 is asymptotically Exp(1) which is the same as χ2
2/2, (n − p) · σ2/σ2 isasymptotically χ2
n−p and the two are independent.]
When p > n, σ2 miust be estimated with more care.
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 49: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/49.jpg)
49
Talk Outline
1 Review of lasso, LARS, forward stepwise
2 The covariance test statistic
3 Null distribution of the covariance statistic
4 Theory for orthogonal case
5 Simulations of null distribution
6 General X results
7 Example
8 Case of Unknown σ
9 Extensions to elastic net, generalized linear models, Cox model
10 Discussion and Future work
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 50: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/50.jpg)
50
Extensions
• Elastic Net: Tj is simply scaled by (1 + λ2), where λ2 multiplies the `2
penalty.
• Generalized likelihood models:
Tj = [S0I−1/20 Xβ(λj+1)− S0
T I−1/20 XAβA(λj+1)]/2
where S0, I0 are null score and information matrices, respectively. Works e.g.for generalized linear models and Cox model.
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 51: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/51.jpg)
51
Talk Outline
1 Review of lasso, LARS, forward stepwise
2 The covariance test statistic
3 Null distribution of the covariance statistic
4 Theory for orthogonal case
5 Simulations of null distribution
6 General X results
7 Example
8 Case of Unknown σ
9 Extensions to elastic net, generalized linear models, Cox model
10 Discussion and Future work
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 52: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/52.jpg)
52
Future work
• Generic (non-sequential) lasso testing problem: given a lasso fit at a knotλ = λk , what is the p-value for dropping any predictor from model? Wethink we know how to do this, but the details are yet to be worked out
• model selection and FDR using the p-values proposed here
• More general framework! For essentially any regularized loss+penaltyproblem, can derive a p-value for each event along the path. [Group lasso,Clustering, PCA, graphical models ...]
• Software: R library
covTest(larsobj,x,y),
where larsobj is fit from LARS or glmpath [logistic or Cox model (Park andHastie)]. Produces p-values for predictors as they are entered.
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 53: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/53.jpg)
53
Stepping back: food for thought
• Does this work suggest something fundamental about lasso/LARS, and theknots λ1, λ2, . . .?
• perhaps LARS/lasso is more “correct” than forward stepwise?
In forward stepwise, a predictor needs to win just one “correlation contest” toenter the model, and then its coefficient is unconstrained; − > overfitting
In LARS, a predictor needs to win a continuous series of correlation contests,at every step, to increase its coefficient.
The covariance test suggests that LARS is taking exactly the right-sized step.
Robert Tibshirani, Stanford University A significance test for the lasso
![Page 54: A significance test for the lasso - Stanford Universitystatweb.stanford.edu › ~tibs › ftp › covtest-talk.pdfThe covariance test statistic: formal de nition Suppose that we wish](https://reader033.fdocuments.in/reader033/viewer/2022060211/5f04d3067e708231d40fe40d/html5/thumbnails/54.jpg)
54
References
Robert Tibshirani, Stanford University A significance test for the lasso