Variable Selection Penalization, Solution Paths and TGDR Applying TGDR Extensions Final Thoughts
TGDR: An Introduction
Julian Wolfson
Student Seminar
March 28, 2007
1 Variable Selection
2 Penalization, Solution Paths and TGDR
3 Applying TGDR
4 Extensions
5 Final Thoughts
Some motivating examples
We are interested in identifying which covariates from a set X = {X_1, ..., X_p} best predict an outcome Y measured on n individuals, where p >> n. For example:
Y is blood pressure at age 50, X is a set of answers from a lengthy Food Frequency Questionnaire
Y is an indicator of volcano activity, X is a set of geological measurements in the vicinity of the volcano
Y is a survival endpoint (T, C) representing time to acquisition of HIV drug resistance, X is a portion of the viral genome
For the last example, which we will pursue, a typical dataset might have n = 300 individuals with amino acid sequences of length 500.
500 sites × 21 possible AAs per site ≈ 10000 covariates.
The Problem
When p >> n, standard regression approaches yield estimates with huge variance and poor predictive ability
Cox regression typically fails with even modestly large numbers of covariates (≈ 100)
Standard approaches typically force small/no bias of the parameter estimates, and so do not “trade off” bias and variance.
$\mathrm{MSE} = \mathrm{Var} + \mathrm{Bias}^2$
Idea: Accept some bias in exchange for more stable estimates with better predictive power
Select a subset of variables which “best” predicts the outcome
Use the available data to estimate their relative importance
Loss functions
Estimation is based on a loss function L:
Squared-error loss (linear regression):
$L = \sum_{i=1}^{n} (Y_i - X_i\beta)^2$
Negative log-likelihood (many contexts):
$L = -\ell(\beta; X)$
Negative log partial likelihood (Cox regression):
$L = -\ell_p(\beta; X)$
Penalization
Common way to trade off bias and variance: penalize the loss function L via P(β)
Yields modified loss $L^*$.
Two common penalties:
1. $P(\beta) = \sum_{i=1}^{p} \beta_i^2$ (Ridge regression)
2. $P(\beta) = \sum_{i=1}^{p} |\beta_i|$ (LASSO)
Examples
Linear regression, ridge penalty:
$L^* = \sum_{i=1}^{n} (Y_i - X_i\beta)^2 + \lambda \sum_{i=1}^{p} \beta_i^2$
Cox regression, LASSO penalty:
$L^* = -\ell_p(\beta; X) + \lambda \sum_{i=1}^{p} |\beta_i|$
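We seek
$\hat\beta = \arg\min_\beta L^* \equiv \arg\min_\beta \left[ L + \lambda P(\beta) \right]$
Constrained optimization problem (equivalent to “$\arg\min_\beta L$ subject to $P(\beta) \le t$”, for a bound t that corresponds to λ)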
λ controls how much the estimates are penalized
It also indexes a one-dimensional path through the parameter space
“Optimal” λ usually chosen via cross-validation
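To make the idea of a path indexed by λ concrete, here is a minimal sketch for the ridge case only, where the minimizer of the penalized squared-error loss has the closed form (X'X + λI)^{-1} X'Y. The function name `ridge_path` and the simulated data are illustrative assumptions, not part of the talk.

```python
import numpy as np

def ridge_path(X, y, lambdas):
    """Ridge estimates for each lambda, one row per value of lambda.

    Each row minimizes sum_i (y_i - x_i beta)^2 + lambda * sum_j beta_j^2,
    whose solution is (X'X + lambda I)^{-1} X'y.
    """
    p = X.shape[1]
    XtX = X.T @ X
    Xty = X.T @ y
    return np.array(
        [np.linalg.solve(XtX + lam * np.eye(p), Xty) for lam in lambdas]
    )

# Illustrative data with p > n (simulated, not from the talk).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 50))
y_demo = rng.normal(size=20)
demo_path = ridge_path(X_demo, y_demo, [0.1, 1.0, 10.0, 100.0])

# Larger lambda pulls the whole coefficient vector toward zero.
print(np.linalg.norm(demo_path, axis=1))
```

Each row of the result is one point on the ridge path; cross-validation would then pick among these points.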
Solution Paths
Problems of Penalization?
Choice of penalty P(β) defines a set of possible paths, but what if none of these paths passes near the true parameter value?
We might prefer a technique which does not require us to choose a penalty function a priori
Constrained optimization procedures can be tricky to use
Enter TGDR
TGDR: Threshold Gradient Descent Regularization
Suggested by Friedman and Popescu (2004)
Idea: Construct paths in the parameter space iteratively
Choose a point on the constructed path which is “closest” to the true parameter value (usually via cross-validation)
Iterative path construction
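Basic calculus: $g(\beta) = -\partial f / \partial \beta$ gives the direction of steepest descent
Steepest descent algorithm for finding the minimum of a function f:
$\hat\beta(\lambda + \Delta\lambda) = \hat\beta(\lambda) + \Delta\lambda \cdot g(\beta)\big|_{\beta = \hat\beta(\lambda)}$
To reduce instability of estimates, consider instead the step
$\hat\beta(\lambda + \Delta\lambda) = \hat\beta(\lambda) + \Delta\lambda \cdot T(\beta) \cdot g(\beta)\big|_{\beta = \hat\beta(\lambda)}$
$T_i(\beta) = 1\left[\, |g_i| \ge \tau \cdot \max_{k=1,\ldots,p} |g_k| \,\right]$
Here the product T(β) · g(β) is taken componentwise and 0 ≤ τ ≤ 1, so only the coordinates with the largest gradient components move at each step.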
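The thresholded step can be sketched in a few lines for squared-error loss. This is a minimal illustration under stated assumptions, not the implementation used in the talk: the function name `tgdr_path`, the fixed step size, the starting point β = 0, and the 1/n scaling of the loss (which only rescales the step size) are all choices made here.

```python
import numpy as np

def tgdr_path(X, y, tau, step=0.01, n_steps=100):
    """Thresholded gradient descent path for squared-error loss.

    Returns an array of shape (n_steps + 1, p); row m holds the
    coefficients after m steps, starting from beta = 0.
    """
    n, p = X.shape
    beta = np.zeros(p)
    path = [beta.copy()]
    for _ in range(n_steps):
        # g = -dL/dbeta for L = (1/n) * sum_i (y_i - x_i beta)^2
        g = 2.0 * X.T @ (y - X @ beta) / n
        # T_i = 1 if |g_i| >= tau * max_k |g_k|, else 0
        T = (np.abs(g) >= tau * np.max(np.abs(g))).astype(float)
        # Move only the coordinates that pass the threshold
        beta = beta + step * T * g
        path.append(beta.copy())
    return np.array(path)

# Illustrative simulated data with p >> n and three true signals.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 200))
beta_true = np.zeros(200)
beta_true[:3] = [2.0, -1.5, 1.0]
y_demo = X_demo @ beta_true + rng.normal(size=50)

for tau in (0.0, 0.5, 1.0):
    final = tgdr_path(X_demo, y_demo, tau)[-1]
    print(tau, np.count_nonzero(final))
```

With τ = 0 every coordinate moves at every step (plain gradient descent), while τ = 1 moves only the coordinate with the largest gradient component, so larger τ gives sparser paths.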
Thresholding
Recap
We now have a general method for constructing paths in the parameter space. To apply it, we need:
A (differentiable) loss function (squared error, log-likelihood, etc.)
A way to choose threshold parameter τ
A way to choose path parameter λ
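One simple way to choose the path parameter is to score every point on a constructed path against held-out data and keep the best one. The sketch below assumes squared-error loss and a single validation split rather than full cross-validation; the function name `select_on_path` and the toy numbers are hypothetical. Repeating the same search over a grid of τ values would choose the threshold as well.

```python
import numpy as np

def select_on_path(path, X_val, y_val):
    """Pick the point on a coefficient path with the lowest validation loss.

    path has shape (n_points, p); returns (best_index, losses), where
    losses[m] is the mean squared prediction error of path[m].
    """
    preds = path @ X_val.T  # shape (n_points, n_val)
    losses = np.mean((preds - y_val) ** 2, axis=1)
    return int(np.argmin(losses)), losses

# Toy path with three points, and validation data generated by the middle one.
toy_path = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
X_val_demo = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_val_demo = X_val_demo @ np.array([1.0, 0.0])

best_index, val_losses = select_on_path(toy_path, X_val_demo, y_val_demo)
print(best_index, val_losses)
```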
TGDR for Cox regression
Gui and Li (2005) extended TGDR for Cox regression (partial likelihood loss)
Recall:
$L = -\ell_p(\beta; X)$
$g = -\partial L / \partial \beta$
We started by adapting TGDR to handle time-varying covariates
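For the Cox case, the step direction g = −∂L/∂β is the partial-likelihood score. The sketch below is a simplified illustration for time-fixed covariates only (not the time-varying extension mentioned above), using the Breslow form of the partial likelihood. The function names and the `event` array (1 if the failure was observed, 0 if censored) are assumptions made for the example.

```python
import numpy as np

def cox_neg_log_partial_likelihood(beta, X, time, event):
    """L = -l_p(beta; X) for time-fixed covariates (Breslow form)."""
    eta = X @ beta
    total = 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]  # subjects still at risk at time[i]
        total -= eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return total

def cox_score(beta, X, time, event):
    """g = -dL/dbeta: the partial-likelihood score vector."""
    w = np.exp(X @ beta)
    g = np.zeros(X.shape[1])
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]
        # Risk-set average of the covariates, weighted by exp(x_j beta)
        weighted_mean = (w[at_risk] @ X[at_risk]) / w[at_risk].sum()
        g += X[i] - weighted_mean
    return g

# Illustrative simulated data (not the ACTG 398 data).
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(30, 4))
time_demo = rng.exponential(size=30)
event_demo = rng.integers(0, 2, size=30).astype(bool)
event_demo[0] = True  # guarantee at least one observed event
print(cox_score(np.zeros(4), X_demo, time_demo, event_demo))
```

Plugging `cox_score` in as g, with the same threshold rule for T, gives one TGDR step for this simplified Cox setting.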
Application: ACTG 398
Relevant Data
HIV envelope protein sequences collected post-infection for approximately two years
Current drug regimen
Endpoint of Interest
(T ,C ), where
T is the time until a patient “fails” a drug regimen
C is the censoring indicator
Question
Which amino acid positions on HIV (mutations, insertions, deletions) are associated with time until drug regimen failure?
Results: ACTG 398 Data
Estimated coefficients from training set (60% of data)
Columns: 70R 74V 103N 108I 118I 122E 123E 181C 184V 190A
Second header row: K L K V V K D Y M G
Rows with fewer than ten entries had blank cells whose column positions are not recoverable here; values are listed in their original order.
τ = 0.5: 0.134 0.258 0.134 −0.164 0.131
τ = 0.55: 0.115 0.421 0.096 0.092 0.117 −0.255 0.128
τ = 0.6: 0.115 0.421 0.117 −0.164 0.128
τ = 0.65: 0.118 0.434 0.125 −0.143 0.128
τ = 0.7: 0.092 0.535 0.086 0.088 0.207 −0.143 0.229
τ = 0.75: 0.105 0.542 0.078 −0.080 0.085 0.075 0.184 −0.143 0.221
τ = 0.8: 0.434 −0.143
τ = 0.85: −0.063 0.087 0.554 0.143 −0.082 0.088 0.142 0.119 −0.201 0.368
τ = 0.9: −0.069 0.083 0.554 0.147 −0.082 0.087 0.079 0.119 −0.202 0.310
τ = 0.95: −0.062 0.145 0.541 0.206 −0.207 0.147 0.141 0.105 −0.204 0.380
τ = 0.96: −0.062 0.092 0.541 0.206 −0.148 0.144 0.141 0.094 −0.203 0.387
τ = 0.97: −0.066 0.098 0.535 0.208 −0.149 0.082 0.143 0.087 −0.204 0.386
τ = 0.98: −0.066 0.092 0.535 0.146 −0.149 0.084 0.143 0.094 −0.205 0.381
τ = 0.99: −0.066 0.086 0.535 0.147 −0.150 0.087 0.143 0.094 −0.205 0.380
Results (cont’d)
Get η̂ = Xβ̂ from test set (40% of data)
HR = Hazard ratio comparing group with η̂ ≥ 0 (“high risk”) to η̂ < 0 (“low risk”)
τ      HR      95% CI
0.5    2.258   (1.438, 3.546)
0.55   2.360   (1.499, 3.716)
0.6    2.025   (1.290, 3.178)
0.65   2.025   (1.290, 3.178)
0.7    2.384   (1.492, 3.810)
0.75   2.349   (1.476, 3.739)
0.8    2.054   (1.311, 3.217)
0.85   2.441   (1.549, 3.846)
0.9    2.475   (1.571, 3.900)
0.95   2.429   (1.537, 3.837)
0.96   2.429   (1.537, 3.837)
0.97   2.463   (1.558, 3.893)
0.98   2.463   (1.558, 3.893)
0.99   2.463   (1.558, 3.893)
Extensions
For log-likelihood (or log partial likelihood) loss, the descent direction is just $g = \partial\ell / \partial\beta \equiv \dot\ell$, the score function.
Extensive literature on modified/adapted/approximate/quasi score functions which allow for:
Missing data
Measurement error
Heteroskedasticity
...
Straightforward to incorporate these methods, which propose some modification $g^*$ of our original step direction g.
Currently working on allowing TGDR to handle missing data (based on work of Lin and Ying) and measurement error (Augustin)
Crazy ideas (i.e. future work)
TGDR with more sophisticated steps (Newton-Raphson, BFGS, etc.)
Incorporating biological knowledge (restricting some coefficients > 0, etc.)
TGDR for GEE? (based on estimating functions...)
TGDR as a meta-method? (TGDR with LASSO loss...)
In Conclusion
TGDR is...
Variable selection based on thresholded gradient descent
Beautifully simple
Computationally tractable
Easy to extend to more complex data structures
But TGDR is not...
Popular (yet)
Particularly amenable to inference (confidence intervals?)
Well studied from a theoretical perspective:
When does it work?
How well does it work?
How does it compare to competing methods?
A word about LaTeX and presentations
This presentation is a PDF file generated from a LaTeX (text) document, with the help of a package called beamer. More info available at
http://latex-beamer.sourceforge.net/
Ask me if you have any questions... but no guarantees.