Reading the Lasso 1996 paper by Robert Tibshirani
READING SEMINAR ON CLASSICS
Regression Shrinkage and Selection via the LASSO
By Robert Tibshirani
Presented by Ulcinaite Agne
November 4, 2012
Presented by Ulcinaite Agne LASSO November 4, 2012 1 / 41
Outline

1 Introduction
2 OLS estimates
  OLS criticisms
  Standard improving techniques
3 LASSO
  Definition
  Motivation for LASSO
  Orthonormal design case
  Function forms
  Example of prostate cancer
  Prediction error and estimation of t
4 Algorithm for finding LASSO solutions
5 Simulation
6 Conclusions
Introduction
The Article
Regression Shrinkage and Selection via the LASSO, by Robert Tibshirani
Published in 1996 in the Journal of the Royal Statistical Society, Series B (Methodological), Vol. 58, No. 1
OLS estimates
We consider the usual regression situation. The data are $(x_i, y_i)$, $i = 1, \ldots, N$, where $x_i = (x_{i1}, \ldots, x_{ip})^T$ are the regressors and $y_i$ is the response for the ith observation.

The ordinary least squares (OLS) estimates minimize the residual sum of squares (RSS):

$$\mathrm{RSS} = \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2$$
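The OLS criterion can be minimized with a single linear-algebra call; the following is a minimal NumPy sketch (the data are synthetic and purely illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: N observations, p regressors (illustrative only)
N, p = 50, 3
X = rng.normal(size=(N, p))
beta_true = np.array([2.0, 0.0, -1.0])
y = 1.5 + X @ beta_true + rng.normal(scale=0.1, size=N)

# Minimize RSS = sum_i (y_i - beta_0 - sum_j x_ij beta_j)^2 by appending
# an intercept column of ones and solving the least squares problem.
X1 = np.column_stack([np.ones(N), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
beta0_hat, beta_hat = coef[0], coef[1:]
rss = np.sum((y - beta0_hat - X @ beta_hat) ** 2)
```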
OLS criticisms

The two reasons why data analysts are often not satisfied with OLS estimates:

Prediction accuracy: OLS estimates have low bias but large variance

Interpretation: with a large number of predictors, we would prefer to determine a smaller subset that exhibits the strongest effects
Standard improving techniques

Subset selection: small changes in the data can result in very different models

Ridge regression:

$$\hat\beta^{\mathrm{ridge}} = \arg\min_{\beta} \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_j \beta_j x_{ij} \Big)^2 \quad \text{subject to} \quad \sum_j \beta_j^2 \le t$$

Ridge regression does not set any of the coefficients to 0 and hence does not give an easily interpretable model
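For illustration, ridge regression is usually computed in its equivalent penalized (Lagrangian) form, where each penalty λ ≥ 0 corresponds to some bound t in the constrained form above. A minimal NumPy sketch on synthetic, centred data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic, centred data (illustrative): with centred X and y the
# intercept is handled separately, so we solve for beta only.
N, p = 60, 4
X = rng.normal(size=(N, p))
X = X - X.mean(axis=0)
beta_true = np.array([1.0, -2.0, 0.5, 0.0])
y = X @ beta_true + rng.normal(scale=0.1, size=N)
y = y - y.mean()

def ridge(X, y, lam):
    # Penalized form: solve (X'X + lam I) beta = X'y. Each lam >= 0
    # corresponds to some bound t in the constrained form.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

beta_ols = ridge(X, y, 0.0)    # lam = 0 recovers OLS
beta_r = ridge(X, y, 10.0)     # coefficients shrunk towards 0
```

Note that the shrunk coefficients are smaller but none is exactly 0, which is precisely the interpretability criticism on the slide.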
Definition

We consider the same data as in the OLS estimation case: $(x_i, y_i)$, $i = 1, \ldots, N$, where $x_i = (x_{i1}, \ldots, x_{ip})^T$.

The LASSO (Least Absolute Shrinkage and Selection Operator) estimate $(\hat\alpha, \hat\beta)$ is defined by

$$(\hat\alpha, \hat\beta) = \arg\min \sum_{i=1}^{N} \Big( y_i - \alpha - \sum_j \beta_j x_{ij} \Big)^2 \quad \text{subject to} \quad \sum_j |\beta_j| \le t$$
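The paper states the constrained form; a common equivalent is the penalized form min Σᵢ(yᵢ − Σⱼβⱼxᵢⱼ)² + λΣⱼ|βⱼ|, which can be solved by coordinate descent with soft thresholding. This is not the paper's own algorithm, and the data below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative data with centred y and standardized columns
N, p = 80, 5
X = rng.normal(size=(N, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
beta_true = np.array([3.0, 0.0, 0.0, -1.5, 0.0])
y = X @ beta_true + rng.normal(scale=0.2, size=N)
y = y - y.mean()

def soft_threshold(z, g):
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for the penalized LASSO
    min_beta sum_i (y_i - sum_j x_ij beta_j)^2 / ... + lam * sum_j |beta_j|.
    Each lam >= 0 corresponds to some bound t in the constrained form."""
    N, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            beta[j] = soft_threshold(X[:, j] @ r, lam) / col_ss[j]
    return beta

beta_hat = lasso_cd(X, y, lam=30.0)
```

Unlike the ridge fit, some coefficients come out exactly 0, which is the selection property the paper emphasizes.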
Definition
The amount of shrinkage is controlled by the parameter t ≥ 0, which is applied to the estimates.

Let $\hat\beta^o_j$ be the full least squares estimates and let $t_0 = \sum_j |\hat\beta^o_j|$.

Values $t < t_0$ shrink the solutions towards 0, setting some coefficients exactly equal to 0.

For example, taking $t = t_0/2$ has an effect roughly similar to finding the best subset of size p/2.
Motivation for LASSO

LASSO came from the proposal of Breiman (1993). Breiman's non-negative garotte minimizes

$$\sum_{i=1}^{N} \Big( y_i - \alpha - \sum_j c_j \hat\beta^o_j x_{ij} \Big)^2 \quad \text{subject to} \quad c_j \ge 0, \; \sum_j c_j \le t$$
Orthonormal design case
Let X be the n × p design matrix with ijth entry $x_{ij}$, and suppose $X^T X = I$. The solution of the previous minimization problem is the soft-thresholding rule

$$\hat\beta_j = \mathrm{sign}(\hat\beta^o_j)\,(|\hat\beta^o_j| - \gamma)^+$$

For comparison:

Best subset selection (of size k): keep the k largest coefficients $\hat\beta^o_j$ in absolute value

Ridge regression solutions: $\hat\beta^o_j / (1 + \gamma)$

Garotte estimates: $(1 - \gamma/\hat\beta^{o\,2}_j)^+ \hat\beta^o_j$
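The shrinkage rules in the orthonormal case are one-liners; a small NumPy sketch (the vector of OLS estimates below is invented for illustration):

```python
import numpy as np

def lasso_shrink(b, g):
    # Soft thresholding: sign(b) * (|b| - g)_+
    return np.sign(b) * np.maximum(np.abs(b) - g, 0.0)

def ridge_shrink(b, g):
    # Proportional shrinkage: b / (1 + g)
    return b / (1.0 + g)

def garotte_shrink(b, g):
    # Garotte: (1 - g / b^2)_+ * b
    return np.maximum(1.0 - g / b ** 2, 0.0) * b

b = np.array([-3.0, -0.5, 0.2, 1.0, 4.0])   # OLS estimates (illustrative)
g = 1.0
```

Applying the three rules to the same `b` shows the qualitative difference on the slide: LASSO and the garotte zero out small coefficients, ridge only rescales them.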
Function forms
Figure: functional forms of (a) subset regression, (b) ridge regression, (c) the LASSO and (d) the garotte
Figure: estimation picture for (a) the LASSO and (b) ridge regression
Example of prostate cancer
Data examined: from a study by Stamey et al. (1989). The factors:

log(cancer volume) lcavol
log(prostate weight) lweight
age
log(benign prostatic hyperplasia amount) lbph
seminal vesicle invasion svi
log(capsular penetration) lcp
Gleason score gleason
percentage of Gleason scores 4 or 5 pgg45

A linear model is fitted to log(prostate specific antigen) lpsa
Statistics of the example
Estimated coefficients and test error results for different subset and shrinkage methods applied to the prostate data. The blank entries correspond to variables omitted.
Prediction error and estimation of t
Methods for the estimation of the LASSO parameter t:
Cross-validation
Generalized cross-validation
Analytical unbiased estimate of risk
Strictly speaking, the first two methods are applicable in the 'X-random' case, and the third method applies to the X-fixed case.
Prediction error and estimation of t
Suppose that

$$Y = \eta(X) + \varepsilon$$

where $E(\varepsilon) = 0$ and $\mathrm{var}(\varepsilon) = \sigma^2$. For a fitted model $\hat\eta(X)$, the mean-squared error (ME) and prediction error (PE) are

$$\mathrm{ME} = E\{\hat\eta(X) - \eta(X)\}^2$$

$$\mathrm{PE} = E\{Y - \hat\eta(X)\}^2 = \mathrm{ME} + \sigma^2$$
Cross-validation
The prediction error (PE) is estimated by fivefold cross-validation. The LASSO is indexed in terms of the normalised parameter $s = t / \sum_j |\hat\beta^o_j|$, and PE is estimated over a grid of values of s from 0 to 1 inclusive:

Create a 5-fold partition of the dataset

For each fold, all but one of the chunks are used for training and the remaining chunk for testing

Repeat 5 times so that each chunk is used once for testing

The value $\hat s$ yielding the lowest estimated PE is selected
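A sketch of the fivefold cross-validation idea. For simplicity it tunes the penalized-form parameter λ rather than the normalised parameter s used in the paper (each λ corresponds to some s), reuses a small coordinate-descent LASSO solver, and runs on synthetic, centred data:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative data (centred by construction: no intercept term)
N, p = 100, 5
X = rng.normal(size=(N, p))
beta_true = np.array([3.0, 0.0, 0.0, 1.5, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=N)

def lasso_cd(X, y, lam, n_iter=100):
    # Penalized-form LASSO by coordinate descent (Lagrangian of the
    # constrained form in the paper; each lam maps to some bound t).
    N, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r
            beta[j] = np.sign(z) * max(abs(z) - lam, 0.0) / col_ss[j]
    return beta

def cv_lasso(X, y, lams, k=5):
    # Fivefold cross-validation of the prediction error over a grid:
    # each chunk is held out once, the model is fit on the rest.
    N = X.shape[0]
    folds = np.array_split(rng.permutation(N), k)
    pe = []
    for lam in lams:
        errs = []
        for f in folds:
            train = np.setdiff1d(np.arange(N), f)
            b = lasso_cd(X[train], y[train], lam)
            errs.append(np.mean((y[f] - X[f] @ b) ** 2))
        pe.append(np.mean(errs))
    return np.array(pe)

lams = np.array([0.1, 1.0, 10.0, 100.0, 1000.0])
pe = cv_lasso(X, y, lams)
best_lam = lams[np.argmin(pe)]
```

The largest penalty shrinks everything to 0 and is penalized by a large estimated PE, so the grid search picks a smaller value.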
Generalized Cross-validation
The constraint is re-written as $\sum_j \beta_j^2 / |\beta_j| \le t$, so the constrained solution $\tilde\beta$ can be expressed as a ridge regression estimator

$$\tilde\beta = (X^T X + \lambda W^-)^{-1} X^T y$$

where $W = \mathrm{diag}(|\tilde\beta_j|)$ and $W^-$ denotes a generalized inverse. The number of effective parameters in the constrained fit $\tilde\beta$ may be approximated by

$$p(t) = \mathrm{tr}\{ X (X^T X + \lambda W^-)^{-1} X^T \}$$

The generalised cross-validation style statistic is

$$\mathrm{GCV}(t) = \frac{1}{N} \frac{\mathrm{RSS}(t)}{\{1 - p(t)/N\}^2}$$
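A sketch of the GCV computation, assuming a LASSO fit is already available (here faked by soft-thresholding the OLS estimate). Taking $W^-$ as the diagonal generalized inverse with zero entries where the coefficient is 0 is this sketch's reading of the formula, not a detail fixed by the slide:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative data
N, p = 60, 4
X = rng.normal(size=(N, p))
beta_true = np.array([2.0, 0.0, -1.0, 0.0])
y = X @ beta_true + rng.normal(scale=0.3, size=N)

def gcv_statistic(X, y, beta, lam):
    # GCV(t) = (1/N) * RSS(t) / {1 - p(t)/N}^2 with
    # p(t) = tr{X (X'X + lam W^-)^{-1} X'}, W = diag(|beta_j|).
    N = X.shape[0]
    W_minus = np.diag([1.0 / abs(b) if abs(b) > 1e-10 else 0.0 for b in beta])
    H = X @ np.linalg.pinv(X.T @ X + lam * W_minus) @ X.T
    p_t = np.trace(H)
    rss = np.sum((y - X @ beta) ** 2)
    return (rss / N) / (1.0 - p_t / N) ** 2

# Stand-in for a LASSO fit: soft-thresholded OLS (illustrative only)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta = np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - 0.1, 0.0)
gcv = gcv_statistic(X, y, beta, lam=1.0)
```

In practice one would evaluate this statistic over a grid of t (or λ) values and pick the minimizer, as with ordinary cross-validation.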
Unbiased estimate of risk
This method is based on Stein's (1981) unbiased estimate of risk. Denote the estimated standard error of $\hat\beta^o_j$ by $\hat\tau = \hat\sigma / \sqrt{N}$, where $\hat\sigma^2 = \sum_i (y_i - \hat y_i)^2 / (N - p)$. Then the formula

$$R\{\hat\beta(\gamma)\} \approx \hat\tau^2 \Big( p - 2\,\#(j : |\hat\beta^o_j/\hat\tau| < \gamma) + \sum_{j=1}^{p} \max(|\hat\beta^o_j/\hat\tau|, \gamma)^2 \Big)$$

is derived as an approximately unbiased estimate of the risk. Hence an estimate of $\gamma$ can be obtained as the minimizer of $R\{\hat\beta(\gamma)\}$:

$$\hat\gamma = \arg\min_{\gamma \ge 0} \big[ R\{\hat\beta(\gamma)\} \big]$$

From this we obtain an estimate of the LASSO parameter t:

$$\hat t = \sum_j (|\hat\beta^o_j| - \hat\gamma)^+$$
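A sketch of the Stein-based selection of $\hat\gamma$ and $\hat t$ on invented numbers. One detail is this sketch's assumption: γ is searched on the standardized scale $|\hat\beta^o_j/\hat\tau|$, so it is rescaled by $\hat\tau$ when forming $\hat t$:

```python
import numpy as np

# Illustrative OLS estimates and estimated error scale
# (numbers invented for the sketch, not from the paper)
N = 100
beta_ols = np.array([3.0, 0.1, -2.0, 0.05, 0.0])
sigma_hat = 0.5
tau = sigma_hat / np.sqrt(N)          # tau = sigma / sqrt(N)

def risk_estimate(gamma):
    # R{beta(gamma)} ~ tau^2 * (p - 2*#{j: |b_j/tau| < gamma}
    #                             + sum_j max(|b_j/tau|, gamma)^2)
    z = np.abs(beta_ols) / tau
    p = len(beta_ols)
    return tau ** 2 * (p - 2 * np.sum(z < gamma)
                       + np.sum(np.maximum(z, gamma) ** 2))

# Minimize the risk estimate over a grid of gamma >= 0
grid = np.linspace(0.0, np.abs(beta_ols).max() / tau, 2001)
risks = [risk_estimate(g) for g in grid]
gamma_hat = grid[int(np.argmin(risks))]

# LASSO parameter: t = sum_j (|b_j| - gamma)_+, gamma back on the raw scale
t_hat = np.sum(np.maximum(np.abs(beta_ols) - gamma_hat * tau, 0.0))
```

The minimizer sits just above the standardized size of the small coefficients, so they are thresholded away while the two large coefficients survive almost intact.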
Algorithm for finding LASSO solutions
We fix t ≥ 0. The minimization problem of

$$\sum_{i=1}^{N} \Big( y_i - \sum_j \beta_j x_{ij} \Big)^2$$

subject to $\sum_j |\beta_j| \le t$ can be seen as a least squares problem with $2^p$ inequality constraints.

Denote by G an $m \times p$ matrix corresponding to the m linear inequality constraints on the p-vector β; for our problem, $m = 2^p$.

Denote $g(\beta) = \sum_{i=1}^{N} (y_i - \sum_j \beta_j x_{ij})^2$.

The set E is the equality set, corresponding to those constraints which are exactly met.
Algorithm for finding LASSO solutions

Outline of the algorithm

1 Start with E = {i0}, where δ_{i0} = sign(β̂0), the sign vector of the overall least squares estimate β̂0
2 Find β̂ to minimize g(β) subject to G_E β ≤ t1
3 While ∑_j |β̂_j| > t,
4 add i to the set E, where δ_i = sign(β̂), and find β̂ to minimize g(β) = ∑_{i=1}^N (y_i − ∑_j β_j x_{ij})² subject to G_E β ≤ t1

This procedure must converge in a finite number of steps, since one element is added to the set E at each step and there are only 2^p distinct sign vectors in total.
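A compact sketch of this active-set scheme (NumPy and SciPy assumed; the generic `SLSQP` solver stands in for the constrained least squares routine used in the paper, so this is an illustration rather than the original implementation):

```python
import numpy as np
from scipy.optimize import minimize

def lasso_active_set(X, y, t, tol=1e-8):
    """Minimize g(b) = sum_i (y_i - sum_j b_j x_ij)^2 s.t. sum_j |b_j| <= t,
    adding violated sign constraints delta^T b <= t one at a time."""
    g = lambda b: np.sum((y - X @ b) ** 2)
    b = np.linalg.lstsq(X, y, rcond=None)[0]          # start from the OLS fit
    E = [np.sign(b)]                                  # E = {i0}, delta_i0 = sign(OLS)
    for _ in range(2 ** X.shape[1]):                  # at most 2^p sign constraints
        if np.abs(b).sum() <= t + tol:                # l1 constraint satisfied: done
            break
        cons = [{"type": "ineq", "fun": lambda b, d=d: t - d @ b} for d in E]
        b = minimize(g, b, method="SLSQP", constraints=cons).x
        E.append(np.sign(b))                          # add the new sign constraint
    return b

# Tiny worked example: OLS gives b = (2, 1), so sum|b| = 3 exceeds t = 1.5
# and one constrained solve shrinks it onto the constraint boundary.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([2.0, 1.0, 3.0])
b = lasso_active_set(X, y, t=1.5)
```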
Least angle regression algorithm (Efron 2004)

Least Angle Regression Algorithm

1 Standardize the predictors to have mean zero and unit norm. Start with the residual r = y − ȳ and β1, . . . , βp = 0.
2 Find the predictor xj most correlated with r.
3 Move βj from 0 towards its least-squares coefficient ⟨xj, r⟩, until some other competitor xk has as much correlation with the current residual as does xj.
4 Move βj and βk in the direction defined by their joint least-squares coefficient of the current residual on (xj, xk), until some other competitor xl has as much correlation with the current residual.
5 If a non-zero coefficient hits zero, drop its variable from the active set of variables and recompute the current joint least-squares direction.
6 Continue in this way until all p predictors have been entered. After min(N − 1, p) steps, we arrive at the full least-squares solution.
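The steps above are implemented in scikit-learn's `lars_path`; a quick illustration on synthetic data (scikit-learn assumed):

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([3.0, 1.5, 0.0, 2.0]) + rng.normal(size=50)

# method="lasso" adds the drop-a-variable modification of step 5,
# so the computed path coincides with the LASSO solution path.
alphas, active, coefs = lars_path(X, y, method="lasso")
# `active`: order in which predictors entered the model;
# `coefs`: one column of coefficients per breakpoint of the path.
```

Since here N > p, the last column of `coefs` is the full least-squares solution, matching step 6.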
Simulation
In the example, 50 data sets, each consisting of 20 observations, were simulated from the model

y = β^T x + σε,

where β = (3, 1.5, 0, 0, 2, 0, 0, 0)^T and ε is standard normal.

[Figure: mean-squared errors over 200 simulations from the model]
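A rough re-creation of this setup (scikit-learn assumed; i.i.d. standard-normal predictors and an arbitrary penalty α = 1.0, whereas the paper uses correlated predictors and a cross-validated bound t, so the numbers are only qualitatively comparable):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

beta = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
sigma = 3.0                                   # noise s.d. (sigma = 3 in the paper's example)
rng = np.random.default_rng(1)
err_ols, err_lasso = [], []
for _ in range(50):                           # 50 simulated data sets of 20 observations
    X = rng.normal(size=(20, 8))              # simplified i.i.d. design
    y = X @ beta + sigma * rng.normal(size=20)
    b_ols = LinearRegression(fit_intercept=False).fit(X, y).coef_
    b_l1 = Lasso(alpha=1.0, fit_intercept=False).fit(X, y).coef_
    err_ols.append(np.sum((b_ols - beta) ** 2))
    err_lasso.append(np.sum((b_l1 - beta) ** 2))

print(np.mean(err_ols), np.mean(err_lasso))   # average coefficient estimation error
```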
Simulation
[Tables: most frequent models selected by the LASSO, and most frequent models selected by subset regression]
Conclusions
The LASSO is a worthy competitor to subset selection and ridge regression.

Performance in different scenarios:

Small number of large effects: subset selection does best, the LASSO not quite as well, and ridge regression quite poorly.
Small to moderate number of moderate-size effects: the LASSO does best, followed by ridge regression and then subset selection.
Large number of small effects: ridge regression does best, followed by the LASSO and then subset selection.
References
Robert Tibshirani (1996). Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society, Series B 58(1), 267–288.

Bradley Efron, Trevor Hastie, Iain Johnstone, Robert Tibshirani (2004). Least Angle Regression. The Annals of Statistics 32(2), 407–499.

Trevor Hastie, Robert Tibshirani, Jerome Friedman (2008). The Elements of Statistical Learning. Springer-Verlag, 57–73.

Abhimanyu Das, David Kempe. Algorithms for Subset Selection in Linear Regression.

Yizao Wang (2007). A Note on the LASSO in Model Selection.
The End