Fitting Models to Data

Linear and Quadratic Discriminant Analysis

Decision Trees

Year  What                                       Notes                       Who
1963  AID: Automatic Interaction Detector        Continuous                  James Morgan, John Sonquist
1973  THAID: THeta AID                           Categorical                 James Morgan, Robert Messenger
1980  CHAID: CHi-Square AID                      Multiple Splits             Kass
1984  CART: Classification and Regression Trees  Popular Approach            Leo Breiman
1986  Iterative Dichotomiser 3 (ID3)             Categorical                 Ross Quinlan
1994  C4.5 Algorithm                             Continuous and Categorical  Ross Quinlan
1994  Bagging                                    Resampling                  Leo Breiman
      Boosting                                   Cascading Small Trees       Rob Schapire, Jerry Friedman
2001  Random Forests                             Many trees                  Leo Breiman, Adele Cutler

AID: Automatic Interaction Detector

Association, Co-Occurrence

CART: Classification and Regression Trees

The CART family is oriented to statistics and uses the concept of impurity, which measures how well the two classes are separated. Ideally we would like to separate all 0s and 1s.

http://freakonometrics.hypotheses.org/1279
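To make the idea of impurity concrete, here is a minimal sketch in plain Python. It assumes the Gini index as the impurity measure (one common choice for CART; the slides do not say which measure is meant) and a single numeric feature. The function names `gini`, `split_impurity`, and `best_split` are illustrative, not taken from any library.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(xs, ys, threshold):
    """Weighted Gini impurity of the two groups produced by x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(xs, ys):
    """Try midpoints between consecutive distinct x values; return the purest split."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda t: split_impurity(xs, ys, t))

# Toy data (made up for illustration): the classes separate cleanly at 3.5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]
threshold = best_split(xs, ys)
print(gini(ys), threshold, split_impurity(xs, ys, threshold))
```

An impurity of 0 after the split means each side contains only one class, which is the ideal case described above.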



Overfitting

Bagging

• Builds multiple decision trees by repeatedly resampling the training data with replacement
• Fits a model to each sample
• Votes across the trees for a consensus prediction
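The bagging steps can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: each "tree" is a one-split stump on a single numeric feature, and the helper names (`fit_stump`, `bagging`, `predict`) and the toy data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a one-split tree: choose the threshold with the fewest training errors."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
    best = None
    for t in candidates:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Each side predicts its majority class; an empty side borrows from the other.
        l_pred = Counter(left or right).most_common(1)[0][0]
        r_pred = Counter(right or left).most_common(1)[0][0]
        errors = sum(y != l_pred for y in left) + sum(y != r_pred for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, l_pred, r_pred)
    _, t, l_pred, r_pred = best
    return lambda x: l_pred if x <= t else r_pred

def bagging(xs, ys, n_trees=25, seed=0):
    """Fit one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees

def predict(trees, x):
    """Majority vote across all trees."""
    return Counter(tree(x) for tree in trees).most_common(1)[0][0]

# Toy data (made up for illustration).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
trees = bagging(xs, ys)
print(predict(trees, 0.0), predict(trees, 9.0))
```

Because every stump sees a different bootstrap sample, individual thresholds vary, and the vote averages out that variation.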

Boosting

• Learns slowly
• Given the current model, we fit a decision tree to the residuals (misclassifications) from the model.
• We then add this new decision tree into the fitted function in order to update the residuals.
• Each of these trees can be rather small, with just a few terminal nodes, determined by the parameter d in the algorithm.
• By fitting small trees to the residuals, we slowly improve the fit in areas where it does not perform well.
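The boosting loop can be sketched as follows for a regression problem. This is a rough illustration, not a reference implementation: it assumes trees with a single split (d = 1), squared-error loss, and a `shrinkage` factor that controls how slowly the model learns. The `shrinkage` name and its value, the function names, and the toy data are assumptions added for the example.

```python
def fit_regression_stump(xs, residuals):
    """Fit a tree with one split (d = 1) that minimises squared error on the residuals."""
    values = sorted(set(xs))  # needs at least two distinct x values
    best = None
    for a, b in zip(values, values[1:]):
        t = (a + b) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        l_mean = sum(left) / len(left)
        r_mean = sum(right) / len(right)
        sse = sum((r - l_mean) ** 2 for r in left) + sum((r - r_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, l_mean, r_mean)
    _, t, l_mean, r_mean = best
    return lambda x: l_mean if x <= t else r_mean

def boost(xs, ys, n_trees=200, shrinkage=0.1):
    """Repeatedly fit a small tree to the residuals and add a shrunken copy to the model."""
    trees = []
    residuals = list(ys)  # the initial model predicts 0, so residuals equal y
    for _ in range(n_trees):
        tree = fit_regression_stump(xs, residuals)
        trees.append(tree)
        # Update the residuals with the shrunken contribution of the new tree.
        residuals = [r - shrinkage * tree(x) for x, r in zip(xs, residuals)]
    return lambda x: sum(shrinkage * tree(x) for tree in trees)

# Toy data (made up for illustration): a curve no single stump can fit.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [x * x for x in xs]
model = boost(xs, ys)
mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
mean_y = sum(ys) / len(ys)
baseline = sum((y - mean_y) ** 2 for y in ys) / len(ys)  # error of predicting the mean
print(mse, baseline)
```

A smaller `shrinkage` makes each tree contribute less, so more trees are needed, which is the "learns slowly" trade-off.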

Random Forests

http://www.stat.berkeley.edu/~breiman/RandomForests/



Gradient Boosting

Many Algorithms

Decision Trees

• rpart (CART)
• tree (CART)
• ctree (conditional inference tree)
• CHAID (chi-squared automatic interaction detection)
• evtree (evolutionary algorithm)
• mvpart (multivariate CART)
• knnTree (nearest-neighbor-based trees)
• RWeka (J4.8, M5', LMT)
• LogicReg (Logic Regression)
• BayesTree
• TWIX (with extra splits)
• party (conditional inference trees, model-based trees)

Random Forests

• randomForest (CART-based random forests)
• randomSurvivalForest (for censored responses)
• party (conditional random forests)
• gbm (tree-based gradient boosting)
• mboost (model-based and tree-based gradient boosting)
