Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall...
![Page 1: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/1.jpg)
Lectures 15,16 – Additive Models, Trees, and Related Methods
Rice ECE697
Farinaz Koushanfar
Fall 2006
![Page 2: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/2.jpg)
Summary
• Generalized Additive Models
• Tree-Based Methods
• PRIM – Bump Hunting
• Multivariate Adaptive Regression Splines (MARS)
• Missing Data
![Page 3: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/3.jpg)
Additive Models
• In real life, effects are often nonlinear
Note: Some slides are borrowed from Tibshirani
![Page 4: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/4.jpg)
Examples
![Page 5: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/5.jpg)
The Price for Additivity
Data from a study of diabetic children, predicting log C-peptide (a blood measurement)
![Page 6: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/6.jpg)
Generalized Additive Models (GAM): Two-class Logistic Regression
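For reference, the two-class logistic GAM relates the mean of a binary response to the predictors through the logit link, with $\mu(X) = \Pr(Y = 1 \mid X)$ and each $f_j$ an unspecified smooth function:

$$\log\frac{\mu(X)}{1 - \mu(X)} = \alpha + f_1(X_1) + \cdots + f_p(X_p)$$

Ordinary linear logistic regression is the special case $f_j(X_j) = \beta_j X_j$.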
![Page 7: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/7.jpg)
Other Examples
![Page 8: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/8.jpg)
Fitting Additive Models
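• Model: $Y = \alpha + \sum_{j=1}^{p} f_j(X_j) + \varepsilon$, where the mean of the error term $\varepsilon$ is zero
• Given observations $(x_i, y_i)$, a criterion like the penalized sum of squares (PRSS) can be specified for this problem, where the $\lambda_j \ge 0$ are tuning parameters:
$$\mathrm{PRSS}(\alpha, f_1, \ldots, f_p) = \sum_{i=1}^{N} \Big\{ y_i - \alpha - \sum_{j=1}^{p} f_j(x_{ij}) \Big\}^2 + \sum_{j=1}^{p} \lambda_j \int f_j''(t_j)^2 \, dt_j$$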
![Page 9: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/9.jpg)
Fitting Additive Models
![Page 10: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/10.jpg)
The Backfitting Algorithm for Additive Models
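• Initialize: $\hat\alpha = \frac{1}{N}\sum_{i=1}^{N} y_i$, $\;\hat f_j \equiv 0 \;\; \forall\, i, j$
• Cycle: j = 1, 2, …, p, 1, 2, …, p, 1, …
$$\hat f_j \leftarrow S_j\Big[\Big\{ y_i - \hat\alpha - \sum_{k \neq j} \hat f_k(x_{ik}) \Big\}_{1}^{N}\Big]$$
$$\hat f_j \leftarrow \hat f_j - \frac{1}{N}\sum_{i=1}^{N} \hat f_j(x_{ij})$$
where $S_j$ is the smoother for the $j$-th predictor, applied to the partial residuals
• Until the functions $\hat f_j$ change less than a prespecified threshold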
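A minimal Python sketch of the backfitting loop, assuming NumPy. The PRSS criterion leads to cubic smoothing splines for each $S_j$; to keep the example short, a Gaussian-kernel (Nadaraya-Watson) smoother is swapped in instead. The names `kernel_smoother` and `backfit`, the `bandwidth` value and the simulated data are illustrative assumptions, not part of the slides.

```python
import numpy as np

def kernel_smoother(x, r, bandwidth):
    """Gaussian-kernel (Nadaraya-Watson) smoother: the fitted value at x_i is a
    weighted average of r, with weights decaying in |x_i - x_k|. Stand-in for S_j."""
    d = (x[:, None] - x[None, :]) / bandwidth
    w = np.exp(-0.5 * d ** 2)
    return (w @ r) / w.sum(axis=1)

def backfit(X, y, bandwidth=0.3, tol=1e-6, max_iter=100):
    """Backfitting for y = alpha + sum_j f_j(X_j) + error; returns alpha-hat and
    an (N, p) array whose column j holds f_j-hat evaluated at x_ij."""
    n, p = X.shape
    alpha = y.mean()                      # initialize: alpha-hat = mean of y
    f = np.zeros((n, p))                  # initialize: every f_j-hat = 0
    for _ in range(max_iter):
        f_old = f.copy()
        for j in range(p):                # cycle j = 1, ..., p
            # partial residuals: remove alpha and every f_k with k != j
            partial = y - alpha - (f.sum(axis=1) - f[:, j])
            f[:, j] = kernel_smoother(X[:, j], partial, bandwidth)
            f[:, j] -= f[:, j].mean()     # re-center so f_j-hat averages to zero
        if np.max(np.abs(f - f_old)) < tol:   # stop when the f_j barely change
            break
    return alpha, f

# Simulated example with two additive effects (hypothetical data)
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)
alpha, f = backfit(X, y)
fitted = alpha + f.sum(axis=1)
mse = np.mean((y - fitted) ** 2)
print(f"training MSE: {mse:.4f}")
```

Because each $\hat f_j$ is re-centered after smoothing, $\hat\alpha$ stays equal to the mean of $y$, matching the initialization step.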
![Page 11: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/11.jpg)
Fitting Additive Models (Cont’d)
![Page 12: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/12.jpg)
Example: Penalized Least Squares
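A discrete analogue of the penalized least-squares criterion shows how the tuning parameter trades fit against smoothness for a single function. This is a minimal sketch, assuming NumPy and a response sorted by an equally spaced predictor, where squared second differences replace $\int f''(t)^2\,dt$. It is a swapped-in discretization (a Whittaker-type smoother), not the cubic smoothing spline itself; the function name and λ values are illustrative.

```python
import numpy as np

def penalized_ls_smoother(y, lam):
    """Minimize sum_i (y_i - f_i)^2 + lam * sum_i (f_{i-1} - 2 f_i + f_{i+1})^2.

    Assumes y is sorted by an equally spaced predictor, so squared second
    differences stand in for the integral of f''(t)^2. Closed form:
    f = (I + lam * D^T D)^{-1} y, with D the second-difference matrix."""
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)     # (n - 2) x n second differences
    A = np.eye(n) + lam * (D.T @ D)
    return np.linalg.solve(A, y)

t = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * t) + np.random.default_rng(1).normal(scale=0.2, size=50)
rough = penalized_ls_smoother(y, lam=0.0)      # no penalty: reproduces y
medium = penalized_ls_smoother(y, lam=10.0)    # smooth curve through the data
stiff = penalized_ls_smoother(y, lam=1e8)      # huge penalty: almost a straight line
```

At λ = 0 the fit reproduces the data; as λ grows the fit approaches the least-squares line, because linear sequences have zero second differences and are never penalized.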
![Page 13: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/13.jpg)
Example: Fitting GAM for Logistic Regression (Newton-Raphson Algorithm)
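The Newton-Raphson fit of a logistic GAM (often called local scoring) alternates two steps. It forms a working response $z_i = \hat\eta_i + (y_i - \hat p_i)/\{\hat p_i(1 - \hat p_i)\}$ and weights $w_i = \hat p_i(1 - \hat p_i)$ from the current fit, then refits an additive model to $z$ by weighted backfitting. A minimal sketch, assuming NumPy: a weighted Gaussian-kernel smoother stands in for the weighted smoothing spline, and the function names, bandwidth, iteration counts and simulated data are illustrative rather than taken from the slides.

```python
import numpy as np

def weighted_kernel_smoother(x, r, w, bandwidth):
    """Weighted Gaussian-kernel smoother (stand-in for a weighted smoothing spline)."""
    k = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2) * w[None, :]
    return (k @ r) / k.sum(axis=1)

def local_scoring(X, y, bandwidth=0.4, n_outer=20, n_inner=5):
    """Fit logit Pr(Y=1|X) = alpha + sum_j f_j(X_j) by Newton-Raphson (IRLS)
    steps, each solved approximately with weighted backfitting."""
    n, p = X.shape
    ybar = y.mean()
    alpha = np.log(ybar / (1 - ybar))          # start: logit of the mean, f_j = 0
    f = np.zeros((n, p))
    for _ in range(n_outer):
        eta = alpha + f.sum(axis=1)            # current additive predictor
        prob = 1 / (1 + np.exp(-eta))
        w = np.clip(prob * (1 - prob), 1e-6, None)   # Newton/IRLS weights
        z = eta + (y - prob) / w               # working response
        for _ in range(n_inner):               # weighted backfitting of z on X
            alpha = np.average(z - f.sum(axis=1), weights=w)
            for j in range(p):
                partial = z - alpha - (f.sum(axis=1) - f[:, j])
                f[:, j] = weighted_kernel_smoother(X[:, j], partial, w, bandwidth)
                f[:, j] -= np.average(f[:, j], weights=w)
    eta = alpha + f.sum(axis=1)
    return alpha, f, 1 / (1 + np.exp(-eta))

# Hypothetical binary data with an additive logit
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 2))
true_eta = np.sin(2 * X[:, 0]) + X[:, 1]
y = (rng.uniform(size=300) < 1 / (1 + np.exp(-true_eta))).astype(float)
alpha, f, prob = local_scoring(X, y)
```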
![Page 14: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/14.jpg)
Example: Predicting Email Spam
• Data from 4601 email messages (spam = 1, email = 0); the filter is trained for each user separately
• Goal: predict whether an email is spam (junk mail) or good email
• Input features: relative frequencies, within a message, of 57 of the most commonly occurring words and punctuation marks across all training messages
• Not all errors are equal: we want to avoid filtering out good email, whereas letting spam through, while undesirable, is less serious in its consequences
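The unequal error costs in the last bullet can be made concrete with a cost-weighted decision rule. A minimal sketch, assuming the classifier outputs a spam probability $p$ and that both costs are known; the 10:1 cost ratio and the function names are hypothetical, not from the slides. Predicting spam costs $(1 - p)\,c_{fp}$ in expectation and predicting email costs $p\,c_{fn}$, so spam is predicted when $p > c_{fp}/(c_{fp} + c_{fn})$.

```python
def spam_threshold(cost_false_positive, cost_false_negative):
    """Probability cutoff that minimizes expected cost.

    False positive: good email filtered out. False negative: spam let through.
    Predict spam when (1 - p) * cost_fp < p * cost_fn."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)

def classify(p_spam, threshold):
    """1 = spam, 0 = email, for each predicted spam probability."""
    return [int(p > threshold) for p in p_spam]

# Hypothetical costs: losing a good email is 10x worse than letting spam through
t = spam_threshold(cost_false_positive=10.0, cost_false_negative=1.0)
labels = classify([0.5, 0.95, 0.99], t)
print(round(t, 3), labels)
```

With equal costs the cutoff is the usual 0.5; raising the cost of filtering good email pushes the cutoff toward 1.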
![Page 15: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/15.jpg)
Predictors
![Page 16: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/16.jpg)
Details
![Page 17: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/17.jpg)
Some Important Features
![Page 18: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/18.jpg)
Results
• Test data confusion matrix for the additive logistic regression model fit to the spam training data
• The overall test error rate is 5.3%
![Page 19: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/19.jpg)
Summary of Additive Logistic Fit
• Significant predictors from the additive model fit to the spam training data. The coefficients represent the linear part of $\hat f_j$, along with their standard errors and Z-scores.
• The nonlinear p-value represents a test of nonlinearity of $\hat f_j$.
![Page 20: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/20.jpg)
Example: Plots for Spam Analysis
Figure 9.1. Spam analysis: estimated functions for significant predictors. The rug plot along the bottom of each frame indicates the observed values of the corresponding predictor. For many predictors, the nonlinearity picks up the discontinuity at zero.
![Page 21: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/21.jpg)
In Summary
• Additive models are a useful extension to linear models, making them more flexible
• The backfitting procedure is simple and modular
• Limitations for large data mining applications
• Backfitting fits all predictors, which is not desirable when a large number are available
![Page 22: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/22.jpg)
Tree-Based Methods
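Tree-based methods partition the feature space into rectangles and fit a simple model, such as a constant, in each one. A minimal sketch of the greedy split search commonly used to grow a regression tree under squared-error loss, assuming NumPy; `best_split` is a hypothetical helper, and placing the split midway between adjacent observed values is one common convention, not necessarily the one in the slides.

```python
import numpy as np

def best_split(X, y):
    """Return (j, s, rss) minimizing the two-region residual sum of squares
    sum_{x_j <= s} (y - c1)^2 + sum_{x_j > s} (y - c2)^2, where c1 and c2 are
    the mean responses in the two regions."""
    n, p = X.shape
    best = (None, None, np.inf)
    for j in range(p):
        order = np.argsort(X[:, j])
        xs, ys = X[order, j], y[order]
        for i in range(1, n):
            if xs[i] == xs[i - 1]:
                continue                    # can only split between distinct values
            left, right = ys[:i], ys[i:]
            rss = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
            if rss < best[2]:
                best = (j, (xs[i - 1] + xs[i]) / 2, rss)   # split midway
    return best

# Toy data: the response jumps between x_0 = 2 and x_0 = 3
X = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]])
y = np.array([1.0, 1.2, 3.0, 3.1])
j, s, rss = best_split(X, y)
print(j, s, rss)
```

Growing a full tree applies this search recursively to each resulting region.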
![Page 23: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/23.jpg)
Node Impurity Measures
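For a classification tree, with $\hat p_{mk}$ the proportion of class-$k$ observations in node $m$, the three usual impurity measures are misclassification error $1 - \max_k \hat p_{mk}$, the Gini index $\sum_k \hat p_{mk}(1 - \hat p_{mk})$ and cross-entropy $-\sum_k \hat p_{mk}\log \hat p_{mk}$. A small NumPy sketch computing all three for one node (the function name is illustrative):

```python
import numpy as np

def node_impurities(labels, n_classes):
    """Misclassification error, Gini index and cross-entropy for one node,
    from the class proportions p_mk of the integer labels in that node."""
    p = np.bincount(labels, minlength=n_classes) / len(labels)
    misclassification = 1.0 - p.max()
    gini = np.sum(p * (1.0 - p))
    nonzero = p[p > 0]                        # treat 0 * log(0) as 0
    cross_entropy = -np.sum(nonzero * np.log(nonzero))
    return misclassification, gini, cross_entropy

mixed = node_impurities(np.array([0, 0, 1, 1]), n_classes=2)   # 50/50 node: all three at their two-class maximum
pure = node_impurities(np.array([1, 1, 1, 1]), n_classes=2)    # single-class node: all three are zero
print(mixed, pure)
```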
![Page 24: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/24.jpg)
Results for Spam Example
![Page 25: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/25.jpg)
Pruned tree for the Spam Example
![Page 26: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/26.jpg)
Classification Rules Fit to the Spam Data
![Page 27: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/27.jpg)
PRIM – Bump Hunting
![Page 28: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/28.jpg)
Number of Observations in a Box
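PRIM (Patient Rule Induction Method) looks for boxes in feature space where the response mean is high. Its top-down peeling phase repeatedly removes a small fraction α of the observations from whichever face of the current box leaves the largest mean in what remains, stopping when the box holds a minimum number of observations. A minimal sketch of that phase only (no pasting, no cross-validated choice of the final box), assuming NumPy; `prim_peel`, `alpha=0.1`, `min_obs=10` and the simulated data are illustrative. The returned path records each box with its mean and size, the quantities one would plot against the number of observations in the box.

```python
import numpy as np

def prim_peel(X, y, alpha=0.1, min_obs=10):
    """Top-down peeling phase of PRIM.

    Returns a list of (box, mean response, count) along the peeling path,
    where box[j] = (low_j, high_j)."""
    inside = np.ones(len(y), dtype=bool)
    box = [(X[:, j].min(), X[:, j].max()) for j in range(X.shape[1])]
    path = [(list(box), y.mean(), len(y))]
    while inside.sum() > min_obs:
        best = None
        for j in range(X.shape[1]):
            xj = X[inside, j]
            lo_cut, hi_cut = np.quantile(xj, alpha), np.quantile(xj, 1 - alpha)
            # candidate peels: drop the lowest or the highest alpha fraction on input j
            for side, keep in (("low", X[:, j] > lo_cut), ("high", X[:, j] < hi_cut)):
                new_inside = inside & keep
                if new_inside.sum() < min_obs or new_inside.sum() == inside.sum():
                    continue
                m = y[new_inside].mean()
                if best is None or m > best[0]:
                    best = (m, j, side, lo_cut if side == "low" else hi_cut, new_inside)
        if best is None:
            break                              # no admissible peel remains
        m, j, side, cut, inside = best
        lo, hi = box[j]
        box[j] = (cut, hi) if side == "low" else (lo, cut)
        path.append((list(box), m, int(inside.sum())))
    return path

# Hypothetical data with a "bump" where both inputs exceed 0.6
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))
y = ((X[:, 0] > 0.6) & (X[:, 1] > 0.6)).astype(float) + rng.normal(scale=0.1, size=500)
path = prim_peel(X, y)
for box, mean, count in path[::5]:
    print(count, round(mean, 3))
```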
![Page 29: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/29.jpg)
Basis Functions
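MARS builds its model from reflected pairs of piecewise-linear basis functions $(x - t)_+$ and $(t - x)_+$, with a knot $t$ at each observed value $x_{ij}$ of each input, giving $2Np$ candidate functions when the values are distinct. A short NumPy sketch of one pair and of the candidate set (function names and data are illustrative):

```python
import numpy as np

def reflected_pair(x, t):
    """MARS reflected pair with knot t: (x - t)_+ and (t - x)_+."""
    return np.maximum(x - t, 0.0), np.maximum(t - x, 0.0)

def candidate_set(X):
    """One reflected pair per input j and per distinct observed value t of X_j,
    i.e. 2 * N * p basis functions when all values are distinct."""
    return [(j, t, *reflected_pair(X[:, j], t))
            for j in range(X.shape[1])
            for t in np.unique(X[:, j])]

x = np.array([0.0, 0.5, 1.0])
pos, neg = reflected_pair(x, 0.5)      # pos - neg recovers the linear function x - t
X = np.random.default_rng(0).uniform(size=(5, 3))
cands = candidate_set(X)               # 5 knots x 3 inputs = 15 pairs
```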
![Page 30: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/30.jpg)
MARS Forward Modeling Procedure
![Page 31: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/31.jpg)
Multiplication of Basis Functions
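In the forward pass, each new pair of terms is the product of a function already in the model with a reflected pair, and the candidate whose products reduce the residual sum of squares most is added. A simplified sketch, assuming NumPy and ordinary least-squares refits; it omits the backward deletion and GCV steps of full MARS, `mars_forward` and `max_terms` are illustrative, and it enforces the usual restriction that an input appears at most once in a product.

```python
import numpy as np

def mars_forward(X, y, max_terms=11):
    """Greedy forward pass of MARS (no backward deletion, no GCV).

    The model set starts with the constant function. Each step tries every
    product of an existing term with a reflected pair (X_j - t)_+, (t - X_j)_+
    and adds the pair that gives the smallest residual sum of squares."""
    n, p = X.shape
    B = [np.ones(n)]          # columns of the current basis matrix
    used = [set()]            # inputs already appearing in each term
    while len(B) + 2 <= max_terms:
        best = None
        for m, bm in enumerate(B):
            for j in range(p):
                if j in used[m]:
                    continue      # an input may appear at most once in a product
                for t in np.unique(X[:, j]):
                    pair = [bm * np.maximum(X[:, j] - t, 0), bm * np.maximum(t - X[:, j], 0)]
                    M = np.column_stack(B + pair)
                    coef, *_ = np.linalg.lstsq(M, y, rcond=None)
                    rss = np.sum((y - M @ coef) ** 2)
                    if best is None or rss < best[0]:
                        best = (rss, pair, used[m] | {j})
        if best is None:
            break
        B.extend(best[1])
        used.extend([best[2], best[2]])
    M = np.column_stack(B)
    coef, *_ = np.linalg.lstsq(M, y, rcond=None)
    return M, coef

# Hypothetical data: a hinge in X_0 plus a linear effect of X_1
rng = np.random.default_rng(0)
X = rng.uniform(size=(60, 2))
y = 2 * np.maximum(X[:, 0] - 0.5, 0) + X[:, 1] + rng.normal(scale=0.05, size=60)
M, coef = mars_forward(X, y, max_terms=7)
print(M.shape, np.mean((y - M @ coef) ** 2))
```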
![Page 32: Lectures 15,16 – Additive Models, Trees, and Related Methods Rice ECE697 Farinaz Koushanfar Fall 2006.](https://reader035.fdocuments.in/reader035/viewer/2022062804/5697bf851a28abf838c878ba/html5/thumbnails/32.jpg)
MARS on Spam Example