
TGDR: An Introduction

Julian Wolfson

Student Seminar

March 28, 2007


1 Variable Selection

2 Penalization, Solution Paths and TGDR

3 Applying TGDR

4 Extensions

5 Final Thoughts


Some motivating examples

We are interested in identifying which covariates from a set X = {X1, . . . , Xp} best predict an outcome Y measured on n individuals, where p >> n. For example:

Y is blood pressure at age 50, X is a set of answers from a lengthy Food Frequency Questionnaire

Y is an indicator of volcano activity, X is a set of geological measurements in the vicinity of the volcano

Y is a survival endpoint (T, C) representing time to acquisition of HIV drug resistance, X is a portion of the viral genome


For the last example, which we will pursue, a typical dataset might have n = 300 individuals with amino acid sequences of length 500.

500 sites × 21 possible AAs per site ≈ 10000 covariates.


The Problem

When p >> n, standard regression approaches yield estimates with huge variance and poor predictive ability

Cox regression typically fails with even modestly large numbers ofcovariates (≈ 100)

Standard approaches typically force small/no bias of the parameter estimates, and so do not “trade off” bias and variance.

MSE = Var + Bias²

Idea: Accept some bias in exchange for more stable estimates with better predictive power (see the sketch below)

Select a subset of variables which “best” predicts the outcome

Use the available data to estimate their relative importance
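To see the tradeoff numerically, here is a minimal simulation sketch (illustrative, not from the talk; the setup and all names are my own): with p close to n, a deliberately biased shrinkage estimator can beat unbiased least squares in estimation MSE.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 50, 40, 1.0
beta_true = np.zeros(p)
beta_true[:5] = 1.0                     # only a few covariates matter

def estimation_mse(lam, reps=200):
    """Monte Carlo E||beta_hat - beta||^2 for ridge (lam = 0 gives OLS)."""
    total = 0.0
    for _ in range(reps):
        X = rng.normal(size=(n, p))
        y = X @ beta_true + sigma * rng.normal(size=n)
        beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
        total += np.sum((beta_hat - beta_true) ** 2)
    return total / reps

for lam in [0.0, 1.0, 10.0, 100.0]:
    print(f"lambda = {lam:6.1f}   MSE = {estimation_mse(lam):.2f}")
```

In this setup the shrunken (biased) fits typically show markedly smaller MSE than the unbiased λ = 0 fit.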


Loss functions

Estimation is based on a loss function L (a short code sketch follows this list):

Squared-error loss (linear regression):

L = ∑(Yi − Xiβ)²

Negative Log-likelihood (many contexts):

L = −ℓ(β; X)

Negative Log partial likelihood (Cox regression):

L = −ℓp(β; X)
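In code, the first of these losses and its gradient (a minimal sketch; function names are illustrative) look like:

```python
import numpy as np

def squared_error_loss(beta, X, Y):
    """L = sum_i (Y_i - X_i beta)^2 for linear regression."""
    resid = Y - X @ beta
    return resid @ resid

def squared_error_grad(beta, X, Y):
    """dL/dbeta = -2 X^T (Y - X beta); TGDR will need this later."""
    return -2.0 * X.T @ (Y - X @ beta)
```

For the likelihood-based losses one substitutes −ℓ (or −ℓp) and its gradient, the negative score.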


Penalization

Common way to trade off bias and variance: penalize the loss function L via P(β)

Yields modified loss L∗.

Two common penalties:

1 P(β) = ∑βi² (Ridge regression)

2 P(β) = ∑|βi| (LASSO)

Examples

Linear regression, ridge penalty:

L∗ = ∑(Yi − Xiβ)² + λ∑βi²

Cox regression, LASSO penalty:

L∗ = −ℓp(β; X) + λ∑|βi|
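A minimal sketch of the two penalized objectives above (again illustrative names, not from the talk):

```python
import numpy as np

def ridge_objective(beta, X, Y, lam):
    """L* = sum (Y_i - X_i beta)^2 + lam * sum beta_i^2."""
    resid = Y - X @ beta
    return resid @ resid + lam * (beta @ beta)

def lasso_penalty(beta, lam):
    """lam * sum |beta_i|; added to e.g. the negative log partial likelihood."""
    return lam * np.abs(beta).sum()
```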


We seek

β̂ = arg minβ L∗ ≡ arg minβ [L + λP(β)]

Constrained optimization problem (equivalent to “arg minβ L subject to P(β) ≤ s” for some bound s corresponding to λ)

λ controls how much the estimates are penalized

It also indexes a one-dimensional path through the parameter space

“Optimal” λ usually chosen via cross-validation (sketched below)
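A sketch of that cross-validation step for the ridge case (the fold scheme and all names are illustrative assumptions, not from the talk):

```python
import numpy as np

def ridge_fit(X, Y, lam):
    """Closed-form ridge estimate arg min L*."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

def cv_error(X, Y, lam, K=5, seed=0):
    """Average held-out squared error over K folds."""
    n = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), K)
    err = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        beta = ridge_fit(X[train], Y[train], lam)
        err += np.sum((Y[test] - X[test] @ beta) ** 2)
    return err / n

# "optimal" lambda on a grid:
# lam_best = min(lam_grid, key=lambda lam: cv_error(X, Y, lam))
```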


Solution Paths


Problems of Penalization?

Choice of penalty P(β) defines a set of possible paths, but what if none of these paths passes near the true parameter value?

We might prefer a technique which does not require us to choose a penalty function a priori

Constrained optimization procedures can be tricky to use


Enter TGDR

TGDR: Threshold Gradient Descent Regularization, suggested by Friedman and Popescu (2004)

Idea: Construct paths in the parameter space iteratively

Choose a point on the constructed path which is “closest” to the true parameter value (usually via cross-validation)


Iterative path construction

Basic calculus: the negative gradient g(β) = −∂f/∂β gives the direction of steepest descent

Steepest descent algorithm for finding the minimum of a function f:

β̂(λ + ∆λ) = β̂(λ) + ∆λ · g(β̂(λ))

To reduce instability of estimates, consider instead the thresholded step (the product T · g taken componentwise)

β̂(λ + ∆λ) = β̂(λ) + ∆λ · T(β̂(λ)) · g(β̂(λ))

where Ti(β) = 1[ |gi(β)| ≥ τ · maxk=1,...,p |gk(β)| ]
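Putting the two displays together, a minimal TGDR sketch (my own illustrative implementation of the thresholded update, not Friedman and Popescu's code; function names are assumptions):

```python
import numpy as np

def tgdr_path(grad, beta0, tau, delta=0.01, n_steps=500):
    """Threshold gradient descent regularization.

    grad(beta) must return the descent direction g = -dL/dbeta.
    tau in [0, 1] controls thresholding: only coordinates whose
    |g_i| is within a factor tau of max_k |g_k| are updated.
    Returns the whole path; a point on it is then chosen by CV.
    """
    beta = np.asarray(beta0, dtype=float).copy()
    path = [beta.copy()]
    for _ in range(n_steps):
        g = grad(beta)
        T = (np.abs(g) >= tau * np.abs(g).max()).astype(float)  # T_i(beta)
        beta = beta + delta * T * g
        path.append(beta.copy())
    return np.array(path)

# Squared-error example: g = -dL/dbeta = 2 X^T (Y - X beta)
# path = tgdr_path(lambda b: 2 * X.T @ (Y - X @ b), np.zeros(p), tau=0.9)
```

Note that τ = 0 updates every coordinate (plain gradient descent, ridge-like paths), while τ = 1 updates only the coordinate(s) with the largest gradient (stagewise, LASSO-like paths).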

Thresholding

Recap

We now have a general method for constructing paths in the parameter space. To apply it, we need (a combined sketch follows this list):

A (differentiable) loss function (squared error, log-likelihood, etc.)

A way to choose threshold parameter τ

A way to choose path parameter λ
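As a combined sketch (reusing the hypothetical tgdr_path from above; the tuning scheme is illustrative, not necessarily the one used in the talk): run TGDR on training folds over a grid of τ and pick the (τ, stopping step) pair with the smallest held-out loss.

```python
import numpy as np
# assumes tgdr_path from the earlier sketch is in scope

def select_tau_and_stop(X, Y, taus, K=5, delta=0.01, n_steps=300, seed=1):
    """K-fold CV over tau and the stopping step k, squared-error loss."""
    n, p = X.shape
    folds = np.array_split(np.random.default_rng(seed).permutation(n), K)
    best_tau, best_k, best_loss = None, None, np.inf
    for tau in taus:
        cv_loss = np.zeros(n_steps + 1)
        for test in folds:
            train = np.setdiff1d(np.arange(n), test)
            g = lambda b: 2 * X[train].T @ (Y[train] - X[train] @ b)
            path = tgdr_path(g, np.zeros(p), tau, delta, n_steps)
            cv_loss += ((Y[test] - path @ X[test].T) ** 2).sum(axis=1)
        k = int(np.argmin(cv_loss))
        if cv_loss[k] < best_loss:
            best_tau, best_k, best_loss = tau, k, cv_loss[k]
    return best_tau, best_k
```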


TGDR for Cox regression

Gui and Li (2005) extended TGDR for Cox regression (partial likelihood loss)

Recall: L = −ℓp(β; X) and g = −∂L/∂β (sketched in code below)

We started by adapting TGDR to handle time-varying covariates
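For the Cox loss, the descent direction is the partial-likelihood score. A minimal sketch (assuming no tied failure times and time-fixed covariates; names are illustrative):

```python
import numpy as np

def cox_score(beta, X, time, event):
    """g = -dL/dbeta = score of the log partial likelihood.

    X: (n, p) covariates; time: (n,) follow-up times;
    event: (n,) with 1 = failure observed, 0 = censored.
    """
    w = np.exp(X @ beta)                        # relative risks
    g = np.zeros_like(beta)
    for i in np.flatnonzero(event == 1):
        at_risk = time >= time[i]               # risk set R(t_i)
        w_r = w[at_risk]
        xbar = (w_r @ X[at_risk]) / w_r.sum()   # risk-set weighted mean covariate
        g += X[i] - xbar
    return g

# plug into the TGDR sketch:
# path = tgdr_path(lambda b: cox_score(b, X, time, event), np.zeros(p), tau=0.9)
```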


Application: ACTG 398

Relevant Data: HIV envelope protein sequences collected post-infection for approximately two years

Current drug regimen

Endpoint of Interest

(T, C), where

T is the time until a patient “fails” a drug regimen

C is the censoring indicator

Question

Which amino acid positions on HIV (mutations, insertions, deletions) are associated with time until drug regimen failure?


Results: ACTG 398 Data

Estimated coefficients from training set (60% of data)

Positions (wild-type amino acid in parentheses): 70R (K), 74V (L), 103N (K), 108I (V), 118I (V), 122E (K), 123E (D), 181C (Y), 184V (M), 190A (G). For τ ≥ 0.85 all ten positions are selected and the coefficients below are listed in that order; at smaller τ only a subset of positions is selected.

τ       estimated coefficients (nonzero entries)
0.50     0.134   0.258   0.134  −0.164   0.131
0.55     0.115   0.421   0.096   0.092   0.117  −0.255   0.128
0.60     0.115   0.421   0.117  −0.164   0.128
0.65     0.118   0.434   0.125  −0.143   0.128
0.70     0.092   0.535   0.086   0.088   0.207  −0.143   0.229
0.75     0.105   0.542   0.078  −0.080   0.085   0.075   0.184  −0.143   0.221
0.80     0.434  −0.143
0.85    −0.063   0.087   0.554   0.143  −0.082   0.088   0.142   0.119  −0.201   0.368
0.90    −0.069   0.083   0.554   0.147  −0.082   0.087   0.079   0.119  −0.202   0.310
0.95    −0.062   0.145   0.541   0.206  −0.207   0.147   0.141   0.105  −0.204   0.380
0.96    −0.062   0.092   0.541   0.206  −0.148   0.144   0.141   0.094  −0.203   0.387
0.97    −0.066   0.098   0.535   0.208  −0.149   0.082   0.143   0.087  −0.204   0.386
0.98    −0.066   0.092   0.535   0.146  −0.149   0.084   0.143   0.094  −0.205   0.381
0.99    −0.066   0.086   0.535   0.147  −0.150   0.087   0.143   0.094  −0.205   0.380


Results (cont’d)

Get η̂ = Xβ̂ from the test set (40% of data)

HR = hazard ratio comparing the group with η̂ ≥ 0 (“high risk”) to the group with η̂ < 0 (“low risk”)

τ       HR      95% CI
0.50    2.258   (1.438, 3.546)
0.55    2.360   (1.499, 3.716)
0.60    2.025   (1.290, 3.178)
0.65    2.025   (1.290, 3.178)
0.70    2.384   (1.492, 3.810)
0.75    2.349   (1.476, 3.739)
0.80    2.054   (1.311, 3.217)
0.85    2.441   (1.549, 3.846)
0.90    2.475   (1.571, 3.900)
0.95    2.429   (1.537, 3.837)
0.96    2.429   (1.537, 3.837)
0.97    2.463   (1.558, 3.893)
0.98    2.463   (1.558, 3.893)
0.99    2.463   (1.558, 3.893)


Extensions

For log-likelihood (or log partial likelihood) loss, the descent direction is just g = ∂ℓ/∂β ≡ ℓ̇, the score function.

Extensive literature on modified/adapted/approximate/quasi score functions which allow for:

Missing data

Measurement error

Heteroskedasticity

. . .

Straightforward to incorporate methods which propose some modification g∗ of our original step direction g (see the sketch below).

Currently working on allowing TGDR to handle missing data (based on work of Lin and Ying) and measurement error (Augustin)
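The key point is that the TGDR update only needs a step direction, so any modified score g∗ can be dropped in where g was. Schematically (a hypothetical sketch, not the actual Lin and Ying or Augustin corrections):

```python
import numpy as np

def tgdr_step(beta, step_dir, tau, delta=0.01):
    """One thresholded step along an arbitrary direction.

    step_dir(beta) may return the ordinary score g, or a modified
    score g* adjusted for missing data, measurement error, etc.
    """
    g = step_dir(beta)
    T = (np.abs(g) >= tau * np.abs(g).max()).astype(float)
    return beta + delta * T * g
```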


Crazy ideas (i.e., future work)

TGDR with more sophisticated steps (Newton-Raphson, BFGS, etc.)

Incorporating biological knowledge (restricting some coefficients to be > 0, etc.)

TGDR for GEE? (based on estimating functions...)

TGDR as a meta-method? (TGDR with LASSO loss...)


In Conclusion

TGDR is...

Variable selection based on thresholded gradient descent

Beautifully simple

Computationally tractable

Easy to extend to more complex data structures

But TGDR is not...

Popular (yet)

Particularly amenable to inference (confidence intervals?)

Well studied from a theoretical perspective:

When does it work?

How well does it work?

How does it compare to competing methods?


A word about LaTeX and presentations

This presentation is a PDF file generated from a LaTeX (text) document, with the help of a package called beamer. More info available at

http://latex-beamer.sourceforge.net/

Ask me if you have any questions... but no guarantees.


Acknowledgements

Prof. Peter Gilbert (thesis supervisor)

Prof. Victor DeGruttola (for providing ACTG data)

Thanks!

Questions?