Fitting Models to Data


Page 3: Fitting Models to Data

Fitting Models to Data
Linear and Quadratic Discriminant Analysis
Decision Trees

Page 4: Fitting Models to Data

Year  What                                       Notes                       Who
1963  AID: Automatic Interaction Detector        Continuous                  James Morgan, John Sonquist
1973  THAID: THeta AID                           Categorical                 James Morgan, Robert Messenger
1980  CHAID: CHi-Square AID                      Multiple splits             Gordon Kass
1984  CART: Classification and Regression Trees  Popular approach            Leo Breiman
1986  ID3: Iterative Dichotomiser 3              Categorical                 Ross Quinlan
1993  C4.5 Algorithm                             Continuous and categorical  Ross Quinlan
1994  Bagging                                    Resampling                  Leo Breiman
      Boosting                                   Cascading small trees       Rob Schapire, Jerry Friedman
2001  Random Forests                             Many trees                  Leo Breiman, Adele Cutler

Page 5: Fitting Models to Data

AID: Automatic Interaction Detector

Association
Co-Occurrence

Page 6: Fitting Models to Data

CHAID

Page 7: Fitting Models to Data

CART: Classification and Regression Trees
The CART family is oriented toward statistics and uses the concept of impurity.
Impurity measures how well the two classes are separated. Ideally we would like to separate all 0s from all 1s.

http://freakonometrics.hypotheses.org/1279
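
To make the idea of impurity concrete, here is a minimal sketch in Python. It assumes the Gini index as the impurity measure, which is a common choice for CART but is not stated on the slide. The function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(xs, ys, threshold):
    """Size-weighted Gini impurity of the two groups produced by x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(xs, ys):
    """Try midpoints between sorted distinct x values; return (threshold, impurity).

    Assumes xs contains at least two distinct values.
    """
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(((t, split_impurity(xs, ys, t)) for t in candidates),
               key=lambda pair: pair[1])


# Hypothetical toy data: one feature, binary labels (0s and 1s).
xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(xs, ys)
print(threshold, impurity)
```

An impurity of 0 after the split means every 0 landed on one side and every 1 on the other, which is the ideal case the slide describes.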

Page 8: Fitting Models to Data

Fitting Models to Data

Page 9: Fitting Models to Data

Overfitting

Page 11: Fitting Models to Data

Bagging
• Builds multiple decision trees by repeatedly resampling the training data with replacement
• Fits a model to each sample
• Votes across the trees for a consensus prediction
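
The three bagging steps can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: each "tree" is a one-split stump on a single feature, and the names `fit_stump`, `bagging`, `n_trees`, and `seed` are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(xs, ys):
    """Fit a one-split tree: pick the threshold with the fewest training errors."""
    overall = majority(ys)
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Each side predicts its majority class; an empty side uses the overall one.
        left_label = majority(left)
        right_label = majority(right) if right else overall
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label


def bagging(xs, ys, n_trees=25, seed=0):
    """Fit n_trees stumps on bootstrap samples and return a voting predictor."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_trees):
        # Resample the training data with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        # Fit a model to this sample.
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        # Vote across the trees for a consensus prediction.
        return majority([model(x) for model in models])

    return predict


# Hypothetical toy data: one feature, binary labels.
xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
predict = bagging(xs, ys)
print(predict(0.0), predict(10.0))
```

An odd number of trees avoids tied votes when there are two classes.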

Page 12: Fitting Models to Data

Boosting
• Learns slowly
• Given the current model, we fit a decision tree to the residuals (misclassifications) from the model.
• We then add this new decision tree into the fitted function in order to update the residuals.
• Each of these trees can be rather small, with just a few terminal nodes, determined by the parameter d in the algorithm.
• By fitting small trees to the residuals, we slowly improve the fit in areas where it does not perform well.
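
The residual-fitting loop described on this slide can be sketched as below. The sketch makes several assumptions that are not on the slide: it uses a numeric response rather than class labels, each tree has a single split (d = 1), and a shrinkage factor controls how slowly the model learns. The names `fit_regression_stump`, `boost`, `n_trees`, and `shrinkage` are hypothetical.

```python
def fit_regression_stump(xs, residuals):
    """One-split regression tree (d = 1) minimising squared error on the residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    values = sorted(set(xs))
    for a, b in zip(values, values[1:]):
        t = (a + b) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def boost(xs, ys, n_trees=200, shrinkage=0.1):
    """Fit small trees to the residuals one after another; return a predictor."""
    trees = []
    residuals = list(ys)  # the initial model predicts 0, so residuals start as y
    for _ in range(n_trees):
        tree = fit_regression_stump(xs, residuals)
        trees.append(tree)
        # Add a shrunken copy of the new tree to the fit, then update the residuals.
        residuals = [r - shrinkage * tree(x) for x, r in zip(xs, residuals)]

    def predict(x):
        return shrinkage * sum(tree(x) for tree in trees)

    return predict


# Hypothetical toy data: a step function of one feature.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]

slow = boost(xs, ys, n_trees=5)   # few trees: the fit is still far from the data
full = boost(xs, ys)              # many trees: the fit has caught up
print(slow(7), full(7))
```

With a small shrinkage value, each tree removes only a fraction of the remaining residual, so the fit after a handful of trees is still well short of the target. This is the "learns slowly" behaviour in the first bullet.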

Page 13: Fitting Models to Data

Random Forests

Page 16: Fitting Models to Data

Gradient Boosting

Page 17: Fitting Models to Data

Many Algorithms

Decision Trees
• rpart (CART)
• tree (CART)
• ctree (conditional inference tree)
• CHAID (chi-squared automatic interaction detection)
• evtree (evolutionary algorithm)
• mvpart (multivariate CART)
• knnTree (nearest-neighbor-based trees)
• RWeka (J4.8, M5', LMT)
• LogicReg (Logic Regression)
• BayesTree
• TWIX (with extra splits)
• party (conditional inference trees, model-based trees)

Random Forests
• randomForest (CART-based random forests)
• randomSurvivalForest (for censored responses)
• party (conditional random forests)
• gbm (tree-based gradient boosting)
• mboost (model-based and tree-based gradient boosting)