Page 1: Rules Rules Rules! Cubist Regression Models - Meetingfiles.meetup.com/1781511/160310_cubist_MaxKuhn.pdf · Cubist Cubist does not use the Separate and Conquer approach to creating

Rules Rules Rules! Cubist Regression Models Meeting

Max Kuhn

Pfizer R&D

Tree–based Regression Models

Classification and Regression Trees (CART) are a framework for machine learning models.

A CART searches through each predictor to find a value of a single variable that best splits the data into two groups.

Typically, the best split minimizes the RMSE of the outcome in the resulting data subsets.

For the two resulting groups, the process is repeated until a hierarchical structure (a tree) is created.

In effect, trees partition the X space into rectangular sections that assign a single value to samples within each rectangle.

To demonstrate, we’ll walk through the first two iterations of this process.
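A minimal base-R sketch of the split search described above, assuming numeric predictors only (CART's handling of factors such as zip, which groups levels, is not shown). `best_split()` and `best_split_all()` are hypothetical helper names. Minimizing the pooled sum of squared errors is equivalent to minimizing the RMSE, since the number of samples is fixed.

```r
# Best split point for one numeric predictor: try the midpoint between each
# pair of consecutive unique values and keep the one with the smallest
# within-group sum of squared errors (SSE).
best_split <- function(x, y) {
  vals <- sort(unique(x))
  cuts <- (head(vals, -1) + tail(vals, -1)) / 2
  sse <- vapply(cuts, function(s) {
    left <- y[x < s]
    right <- y[x >= s]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  }, numeric(1))
  list(cut = cuts[which.min(sse)], sse = min(sse))
}

# Search every predictor (the columns of a data frame) and keep the best one.
best_split_all <- function(df, y) {
  fits <- lapply(df, best_split, y = y)
  sse <- vapply(fits, function(f) f$sse, numeric(1))
  best <- which.min(sse)
  list(variable = names(df)[best], cut = fits[[best]]$cut, sse = sse[[best]])
}

# Hypothetical toy data: the outcome jumps between a = 5 and a = 6
toy <- data.frame(a = 1:10, b = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3))
y <- c(rep(1, 5), rep(10, 5))
best_split_all(toy, y)$variable   # "a"
best_split_all(toy, y)$cut        # 5.5
```

Growing a full tree would apply `best_split_all()` again to each of the two resulting subsets until a stopping rule is met.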

Kuhn (Pfizer R&D) Cubist 2 / 48

Example Data

The data used to illustrate the models are sale prices of homes in Sacramento, CA.

The original data were obtained from the website for the SpatialKey software. From their website:

The Sacramento real estate transactions file is a list of 985 real estate transactions in the Sacramento area reported over a five-day period, as reported by the Sacramento Bee.

Google was used to fill in missing/incorrect data.

Example Data

> library(caret)
> data(Sacramento)
> str(Sacramento, vec.len = 1)
'data.frame': 932 obs. of 9 variables:
 $ city     : Factor w/ 37 levels "ANTELOPE","AUBURN",..: 34 34 ...
 $ zip      : Factor w/ 68 levels "z95603","z95608",..: 64 52 ...
 $ beds     : int 2 3 ...
 $ baths    : num 1 1 ...
 $ sqft     : int 836 1167 ...
 $ type     : Factor w/ 3 levels "Condo","Multi_Family",..: 3 3 ...
 $ price    : int 59222 68212 ...
 $ latitude : num 38.6 ...
 $ longitude: num -121 ...

Example Data

A random split, stratified on the outcome, was used to create a test set with 20% of the data. The data are split as follows:

> set.seed(955)
> in_train <- createDataPartition(log10(Sacramento$price), p = .8, list = FALSE)
>
> training <- Sacramento[ in_train,]
> testing  <- Sacramento[-in_train,]
> nrow(training)
[1] 747
> nrow(testing)
[1] 185
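For a numeric outcome, createDataPartition() samples within percentile-based groups of the outcome. A rough base-R sketch of that idea follows; `stratified_split()` is a hypothetical helper, the five quantile bins are an assumption, and caret's exact grouping and rounding may differ.

```r
# Outcome-stratified random split (a sketch, not caret's code): bin the
# outcome at its quantiles, then sample the fraction p within each bin.
stratified_split <- function(y, p = 0.8, groups = 5) {
  breaks <- unique(quantile(y, probs = seq(0, 1, length.out = groups + 1)))
  bins <- cut(y, breaks = breaks, include.lowest = TRUE)
  rows_by_bin <- split(seq_along(y), bins)
  picked <- lapply(rows_by_bin, function(rows) {
    # sample.int() avoids sample()'s special case for length-one vectors
    rows[sample.int(length(rows), round(p * length(rows)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(955)
price <- 10^rnorm(100, mean = 5.3, sd = 0.2)   # hypothetical prices
in_train <- stratified_split(log10(price), p = 0.8)
test_rows <- setdiff(seq_along(price), in_train)
```

Stratifying keeps the distribution of the outcome similar in the training and test sets, which matters when the outcome is skewed, as house prices usually are.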

Training in Blue, Testing in Red

First Split of a CART Tree

[Figure: node 1 splits on sqft (< 1594 vs ≥ 1594), giving Node 2 (n = 426) and Node 3 (n = 321); each terminal node panel plots the outcome on a 4.5 to 6 scale.]

Second Split

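[Figure: the tree after the second split. Node 1 splits on sqft (< 1594 vs ≥ 1594) and each branch then splits on zip. The sqft < 1594 branch (node 2) gives Node 3 (n = 200) and Node 4 (n = 226); the sqft ≥ 1594 branch (node 5) gives Node 6 (n = 180) and Node 7 (n = 141). Branch labels give the number of ZIP codes sent each way (29 vs 29 and 17 vs 43).]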

Full Tree

[Figure: the full CART tree, splitting on sqft, zip and beds (split points include sqft < 1594, beds < 2.5, sqft < 1168 and sqft < 1998.5, plus several ZIP groupings). Its ten terminal nodes hold n = 57, 11, 132, 59, 106, 61, 105, 75, 78 and 63 training samples.]

The Good and Bad of Trees

Trees can be computed very quickly and have simple interpretations.

Also, they have built-in feature selection; if a predictor was not used in any split, the model is completely independent of that predictor.

Unfortunately, trees do not usually have optimal performance when compared to other methods.

Also, small changes in the data can drastically affect the structure of a tree.

This last point has been exploited to improve the performance of trees via ensemble methods, where many trees are fit and predictions are aggregated across the trees. Examples are bagging, boosting and random forests.

Trees may not fit the data well in the extremes of the outcome range.

Poor Fits in the Tails

[Figure: predicted value versus test set value; both axes run from about −10 to 0.]

Model Trees

The model tree approach described in Quinlan (1992), called M5, is similar to regression trees except that:

the splitting criterion is different,

the terminal nodes predict the outcome using a linear model (as opposed to the simple average), and

when a sample is predicted, the prediction is often a combination of the predictions from different models along the same path through the tree.

The main implementation of this technique is a “rational reconstruction” of this model called M5', which is described by Wang and Witten (1997) and is included in the Weka software package.

Model Tree Structure

When model trees make a split of the data, they fit a linear model to the current subset using all the predictors involved in the splits along the path.

This process proceeds until there are not enough samples to split and/or fit the model.

A pruning stage is later used to simplify the model.

Note: Many of the models here are fit with and without encoding categorical predictors as dummy variables. Tree- and rule-based models usually do not require dummy variables.

Model Tree Structure

[Figure: schematic of a model tree with four splits (Split 1 to Split 4), each branching on two conditions (a and b), and eight linear models (Model 1 to Model 8) attached to its nodes.]

Model Tree Predictions

When a sample is predicted, all of the linear models along the path are combined using:

$$\hat{y}_{par} = \frac{n_{kid}\,\hat{y}_{kid} + c\,\hat{y}_{par}}{n_{kid} + c}$$

where $\hat{y}_{kid}$ is the prediction from the child node, $n_{kid}$ is the number of training set data points in the child node, $\hat{y}_{par}$ is the prediction from the parent node, and $c$ is a constant with a default value of 15.
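The smoothing formula can be sketched in base R for a single path. This assumes the formula is applied recursively from the terminal node up to the root, with each smoothed value becoming the child prediction at the next level; `m5_smooth()` is a hypothetical helper, not RWeka code.

```r
# M5-style smoothing along one path through a model tree (slide formula).
# preds:    linear-model predictions for one sample, ordered from the
#           terminal node up to the root
# n:        training-set sample counts for the same nodes, in the same order
# c_smooth: the smoothing constant c (default 15, as on the slide)
m5_smooth <- function(preds, n, c_smooth = 15) {
  y_hat <- preds[1]                       # start with the terminal-node model
  for (j in seq_along(preds)[-1]) {
    n_kid <- n[j - 1]                     # size of the child we came from
    y_hat <- (n_kid * y_hat + c_smooth * preds[j]) / (n_kid + c_smooth)
  }
  y_hat
}

# Hypothetical path: the leaf model predicts 6, its parent 5 and the root 4
m5_smooth(c(6, 5, 4), n = c(15, 45, 200))   # 5.125
```

Small child nodes (small $n_{kid}$ relative to $c$) are pulled more strongly toward their parent's prediction, which stabilizes predictions from leaves fit on few samples.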

For the example data, the unpruned model had 81 paths through the tree and the pruned version used 2 paths.

From Trees to Rules

Tree-based models consist of one or more nested if-then statements for the predictors that partition the data.

Within these partitions, a model is used to predict the outcome.

For example, a very simple tree could be defined as:

if X1 >= 1.7 then
| if X2 >= 202.1 then Y = 1.3
| else Y = 5.6
else Y = 2.5

From Trees to Rules

Notice that the if-then statements generated by a tree define a unique route to one terminal node for any sample.

A rule is a set of if-then conditions (possibly created by a tree) that have been collapsed into independent conditions.

For the example above, there would be three rules:

if X1 >= 1.7 & X2 >= 202.1 then Y = 1.3

if X1 >= 1.7 & X2 < 202.1 then Y = 5.6

if X1 < 1.7 then Y = 2.5

Rules can be simplified or pruned in a way that samples are covered by multiple rules, e.g.:

if X1 >= 1.7
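The equivalence between the nested tree and its three flattened rules can be checked with a small base-R sketch; `predict_tree()`, `tree_rules` and `predict_rules()` are hypothetical names used only for illustration.

```r
# Nested if-then form of the toy tree
predict_tree <- function(X1, X2) {
  if (X1 >= 1.7) {
    if (X2 >= 202.1) 1.3 else 5.6
  } else {
    2.5
  }
}

# Flattened form: each rule is an independent condition plus a prediction
tree_rules <- list(
  list(cond = function(X1, X2) X1 >= 1.7 && X2 >= 202.1, value = 1.3),
  list(cond = function(X1, X2) X1 >= 1.7 && X2 < 202.1,  value = 5.6),
  list(cond = function(X1, X2) X1 < 1.7,                 value = 2.5)
)

predict_rules <- function(X1, X2) {
  fired <- Filter(function(r) r$cond(X1, X2), tree_rules)
  # Unpruned rules taken from a tree are mutually exclusive: exactly one fires
  stopifnot(length(fired) == 1)
  fired[[1]]$value
}

# Both forms agree on a small grid of hypothetical inputs
grid <- expand.grid(X1 = c(0, 1.7, 3), X2 = c(100, 202.1, 300))
same <- mapply(function(a, b) predict_tree(a, b) == predict_rules(a, b),
               grid$X1, grid$X2)
```

Once rules are pruned, the `length(fired) == 1` check no longer holds, because a sample can then satisfy more than one rule.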

Rule–Based Models

One path to a terminal node in an unpruned model is:

sqft <= 1594 &
zip not in {z95631, z95833, z95758, z95670, 45 others} &
beds <= 2.5 &
latitude > 38.543 &
latitude > 38.615 &
latitude > 38.637 &
latitude <= 38.688 &
latitude <= 38.673

We can convert our model tree to a rule-based model. Many conditions can be simplified; for example, the five latitude conditions above reduce to 38.637 < latitude <= 38.673.
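The latitude simplification can be sketched in base R, assuming a hypothetical data-frame representation of the conditions (one row per variable, operator and threshold). The set-valued zip condition is left out to keep the sketch short, and `simplify_conditions()` is a hypothetical helper.

```r
# Numeric conditions from the path above (the zip condition is omitted)
conds <- data.frame(
  var   = c("sqft", "beds", rep("latitude", 5)),
  op    = c("<=", "<=", ">", ">", ">", "<=", "<="),
  value = c(1594, 2.5, 38.543, 38.615, 38.637, 38.688, 38.673),
  stringsAsFactors = FALSE
)

# For each variable keep only the tightest lower bound (largest ">")
# and the tightest upper bound (smallest "<=")
simplify_conditions <- function(conds) {
  pieces <- lapply(split(conds, conds$var), function(sub) {
    lower <- sub$value[sub$op == ">"]
    upper <- sub$value[sub$op == "<="]
    rbind(
      if (length(lower) > 0)
        data.frame(var = sub$var[1], op = ">", value = max(lower),
                   stringsAsFactors = FALSE),
      if (length(upper) > 0)
        data.frame(var = sub$var[1], op = "<=", value = min(upper),
                   stringsAsFactors = FALSE)
    )
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

simplified <- simplify_conditions(conds)
```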

“Separate and Conquer” Approach to Rules

First, an initial model tree is created and only the rule with the largest coverage is saved from this model.

The samples covered by the rule are removed from the training set and another model tree is created with the remaining data.

Again, only the rule with the maximum coverage is retained.

This process repeats until all the training set data have been covered by at least one rule.

A new sample is predicted by determining which rule(s) it falls under and then applying the linear model associated with the rule that has the largest coverage.

For our data, the unpruned model has 81 rules, which can be reduced to two rules based on sqft ≤ 1594.
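The prediction step described above can be sketched as follows. The two rules, their coverage counts and their linear models are hypothetical numbers chosen for illustration, not the rules fitted to the Sacramento data, and `predict_largest_coverage()` is a hypothetical helper.

```r
# Hypothetical rule set: a condition, the number of training samples the rule
# covered, and the rule's linear model (here a simple function of sqft)
sc_rules <- list(
  list(cond = function(x) x$sqft <= 1594, coverage = 300,
       model = function(x) 4.6 + 0.0003 * x$sqft),
  list(cond = function(x) x$sqft > 1000, coverage = 450,
       model = function(x) 4.8 + 0.0002 * x$sqft)
)

# Among the rules that cover the sample, use the one with the largest coverage
predict_largest_coverage <- function(rules, x) {
  fired <- Filter(function(r) r$cond(x), rules)
  if (length(fired) == 0) stop("no rule covers this sample")
  coverage <- vapply(fired, function(r) r$coverage, numeric(1))
  fired[[which.max(coverage)]]$model(x)
}

predict_largest_coverage(sc_rules, list(sqft = 1200))   # both fire; rule 2 used
predict_largest_coverage(sc_rules, list(sqft = 900))    # only rule 1 fires
```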

An Example of a Terminal Node Model

log10(price) =
  - 6.3388
  - 0.0032 * city in {GALT, POLLOCK_PINES, ..., GRANITE_BAY}
  + 0.0209 * zip in {z95820, z95822, z95626, ..., z95746}
  + 0.0015 * zip in {z95673, z95832, z95621, ..., z95746}
  + 0.0098 * zip in {z95631, z95833, z95758, ..., z95746}
  + 0.0091 * zip in {z95818, z95608, z95662, ..., z95746}
  + 0.0033 * zip in {z95814, z95765, z95667, ..., z95746}
  + 0.0005 * beds
  + 0.0001 * sqft
  + 0.3097 * latitude
  + 0.0056 * longitude

Effect of Smoothing and Pruning Results

[Figure: repeated cross-validation RMSE (about 0.14 to 0.17) for smoothed = Yes/No and pruned = Yes/No, in separate panels for dummy variables and factors.]

Model Trees in R

> library(RWeka)
> model_tree <- M5P(log10(price) ~ ., data = training,
+                   ## Make the minimum number of instances per
+                   ## leaf higher than the default of 4
+                   control = Weka_control(M = 15))
>
> ## N = TRUE requests an unpruned tree
> model_tree_unpruned <- M5P(log10(price) ~ ., data = training,
+                            control = Weka_control(M = 15, N = TRUE))

Note that the formula method is used, but factors are not converted to dummy variables.

Tuning Model Trees in R

> ctrl <- trainControl(method = "repeatedcv", repeats = 5)
>
> mt_grid <- expand.grid(rules = "Yes",
+                        pruned = c("No", "Yes"),
+                        smoothed = c("No", "Yes"))
>
> ## will use dummy variables:
> set.seed(139)
> mt_tune_dv <- train(log10(price) ~ ., data = training,
+                     method = "M5",
+                     tuneGrid = mt_grid,
+                     trControl = ctrl)
> ## will not (column 7 is price, so it is dropped from the predictors):
> set.seed(139)
> mt_tune <- train(x = training[, -7], y = log10(training$price),
+                  method = "M5",
+                  tuneGrid = mt_grid,
+                  trControl = ctrl)

Setting the seed prior to each call ensures that the same resamples are used.

Cubist

Some specific differences between Cubist and the previously described approaches for model trees and their rule-based variants are:

the specific techniques used for linear model smoothing, creating rules and pruning are different,

an optional boosting-like procedure called committees can be used, and

the predictions generated by the model rules can be adjusted using nearby points from the training set data.

We are indebted to the work of Chris Keefer, who extensively studied the Cubist source code to figure out the details.

Cubist

Cubist does not use the Separate and Conquer approach to creating rules from trees.

A single tree is created and then “flattened” into a set of rules.

The pruning and smoothing procedures are similar to those implemented in M5, but...

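Smoothing Models in Cubist

Cubist has a different formula for combining models up the tree:

$$\hat{y}_{par} = a \times \hat{y}_{kid} + (1 - a) \times \hat{y}_{par}$$

where

$$a = \frac{\mathrm{Var}(\hat{y}_{par}) - b}{\mathrm{Var}(\hat{y}_{par}) + \mathrm{Var}(\hat{y}_{kid}) - 2b}, \qquad b = \frac{S_{12} - \frac{1}{n} S_1 S_2}{n - 1}$$

$$S_1 = \sum_{i=1}^{n} (y_i - \hat{y}_{i,par}), \qquad S_2 = \sum_{i=1}^{n} (y_i - \hat{y}_{i,kid}), \qquad S_{12} = \sum_{i=1}^{n} (y_i - \hat{y}_{i,kid})(y_i - \hat{y}_{i,par})$$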
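A minimal base-R sketch of the Cubist smoothing weight, assuming the Var terms are read as the variances of the parent and child residuals (y minus each model's predictions), which is the reading under which b is their sample covariance. Under that reading, a is the weight that minimizes the variance of the combined residual a·e_kid + (1 − a)·e_par. `cubist_smooth()` is a hypothetical helper, not a function from the Cubist package.

```r
# Cubist-style smoothing weight from parent and child predictions on the
# same n training samples (assumption: Var terms are residual variances)
cubist_smooth <- function(y, pred_par, pred_kid) {
  n <- length(y)
  e_par <- y - pred_par                      # parent-model residuals
  e_kid <- y - pred_kid                      # child-model residuals
  S1 <- sum(e_par)
  S2 <- sum(e_kid)
  S12 <- sum(e_kid * e_par)
  b <- (S12 - S1 * S2 / n) / (n - 1)         # sample covariance of residuals
  a <- (var(e_par) - b) / (var(e_par) + var(e_kid) - 2 * b)
  list(a = a, b = b, smoothed = a * pred_kid + (1 - a) * pred_par)
}

# Hypothetical data: the child model is more accurate than the parent
set.seed(1)
y <- rnorm(50)
pred_par <- y + rnorm(50, sd = 1)
pred_kid <- y + rnorm(50, sd = 0.5)
res <- cubist_smooth(y, pred_par, pred_kid)
res$a   # most of the weight goes to the child model
```

If the child model fit the training data perfectly, its residual variance and the covariance would both be zero and a would equal 1, so the parent model would be ignored.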

Cubist in R

> library(Cubist)
> cb <- cubist(x = training[, -7], y = log10(training$price))
> ## To see the rules + models
> summary(cb)

Cubist Base–Model Results

A basic Cubist model resulted in 8 rules. For example:

Rule 1: [58 cases, mean 4.980517, range 4.477121 to 5.523746, est err 0.152796]
    if zip in {z95621, z95626, z95660, z95673, z95683, z9581, z95817, z95820,
               z95822, z95823, z95824, z95826, z95827, z95828, z95832, z95838
               z95841, z95842, z95843} and
       beds <= 2 then
    outcome = 7.944631 + 0.323 beds + 4e-05 sqft + 0.03 longitude

Rule 2: [126 cases, mean 5.200466, range 4.788875 to 5.662758, est err 0.090147]
    if zip in {z95626, z95660, z95683, z95815, z95823, z95824, z95827, z95832,
               z95838, z95841} and
       beds > 2 then
    outcome = 8.524561 - 0.056 beds + 0.000342 sqft + 0.03 longitude + 0.003 baths

Plotting the Splits

> dotplot(cb)

[Figure: for each of the 8 rules, the split points on beds and sqft, shown as percentiles (0 to 1) of each predictor.]

Plotting the Slopes

> dotplot(cb, what = "coefs")

[Figure: for each of the 8 rules, the linear model coefficients, in panels for (Intercept), beds, baths, sqft, latitude and longitude.]

Cubist Committees

Model committees can be created by generating a sequence of rule-based models (similar to boosting).

The training set outcome is adjusted based on the prior model fit, and a new set of rules is then built using this pseudo-response.

Specifically, the kth committee model uses an adjusted response:

$$y_{i(k)} = 2\,y_{i(k-1)} - \hat{y}_{i(k-1)}$$

where $\hat{y}_{i(k-1)}$ is the prediction of the (k − 1)th committee model for training sample i.

Once the full set of committee models is created, new samples are predicted using each model and the final rule-based prediction is the simple average of the individual model predictions.

> cb <- cubist(x = training[, -7], y = log10(training$price), committees = 17)
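The committee adjustment and the final averaging can be sketched as two small functions. This sketch assumes `y_prev` is the response used to fit the previous committee member (the original outcome for the first member); the function names are hypothetical, and the rule-based model fitting itself is not shown.

```r
# Pseudo-response for committee member k (slide formula):
# 2 * (response used for member k - 1) - (member k - 1's prediction)
next_response <- function(y_prev, y_hat_prev) 2 * y_prev - y_hat_prev

# Final committee prediction: simple average across members.
# pred_matrix: one row per sample, one column per committee member
committee_predict <- function(pred_matrix) rowMeans(pred_matrix)

# Hypothetical numbers: the previous member under-predicted sample 1
# and over-predicted sample 2
y <- c(5.0, 5.0, 5.5)
y_hat <- c(4.8, 5.3, 5.5)
next_response(y, y_hat)   # 5.2 4.7 5.5
```

An under-predicted sample gets a larger target for the next member and an over-predicted sample a smaller one, so successive members lean against the previous member's errors before their predictions are averaged.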

Committee Results

[Figure: repeated cross-validation RMSE (about 0.1325 to 0.1425) versus the number of committees (0 to 30), for dummy variables and factors.]

Neighbor–Based Adjustments

Cubist has the ability to adjust the model prediction using samples from the training set (Quinlan 1993).

When predicting a new sample, the K most similar neighbors are determined from the training set. The adjusted prediction is

$$\hat{y} = \frac{1}{K}\sum_{\ell=1}^{K} w_\ell \left[(t_\ell - \hat{t}_\ell) + \hat{y}\right]$$

where $\hat{y}$ inside the brackets is the model prediction for the new sample, $t_\ell$ is the observed outcome for a training set neighbor, $\hat{t}_\ell$ is the model prediction of that neighbor, and $w_\ell$ is a weight calculated using the distance of the training set neighbors to the new sample.

> predict(cb, newdata = testing, neighbors = 4)
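A base-R sketch of the neighbor correction formula. The weight function 1/(d + 0.5), the use of unscaled Euclidean distance, and the rescaling of the weights to sum to K (which turns the formula into a weighted average) are assumptions made for illustration, not details confirmed by the slides; `neighbor_adjust()` is a hypothetical helper.

```r
# Neighbor-based adjustment of one model prediction.
# y_hat_new: model prediction for the new sample
# x_new:     predictor vector for the new sample (length p)
# x_train:   n x p numeric matrix of training predictors
# t_obs, t_hat: observed outcomes and model predictions for the training set
neighbor_adjust <- function(y_hat_new, x_new, x_train, t_obs, t_hat, K = 4) {
  # Euclidean distance from the new sample to every training sample
  # (assumption: predictors are used unscaled)
  d <- sqrt(colSums((t(x_train) - x_new)^2))
  nn <- order(d)[seq_len(K)]                 # indices of the K nearest neighbors
  w <- 1 / (d[nn] + 0.5)                     # hypothetical distance-based weights
  w <- K * w / sum(w)                        # rescale so the weights sum to K
  # slide formula: (1/K) * sum of w * (neighbor residual + model prediction)
  sum(w * ((t_obs[nn] - t_hat[nn]) + y_hat_new)) / K
}

# Hypothetical data: every neighbor's outcome is 0.1 above its prediction
x_train <- matrix(1:10, ncol = 1)
t_hat <- as.numeric(1:10)
t_obs <- t_hat + 0.1
neighbor_adjust(5, 5.2, x_train, t_obs, t_hat, K = 3)   # about 5.1
```

When the neighbors' residuals are all zero, the adjustment leaves the model prediction unchanged; otherwise the prediction is shifted by a distance-weighted average of the neighbors' residuals.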

Tuning Cubist in R

> cb_grid <- expand.grid(committees = 1:35, neighbors = c(0, 1, 3, 5, 7, 9))
> set.seed(139)
> cb_tune_dv <- train(log10(price) ~ ., data = training,
+                     method = "cubist",
+                     tuneGrid = cb_grid,
+                     trControl = ctrl)
> set.seed(139)
> cb_tune <- train(x = training[, -7], y = log10(training$price),
+                  method = "cubist",
+                  tuneGrid = cb_grid,
+                  trControl = ctrl)
> ggplot(cb_tune) ## to see the profiles

Results with Neighbor Correction

[Figure: repeated cross-validation RMSE (about 0.13 to 0.15) versus the number of committees (0 to 30), in panels for dummy variables and factors, for each number of neighbors (0, 1, 3, 5, 7, 9).]

Comparisons with Other Models

[Figure: RMSE (0.12 to 0.16) for each model, with test results in red; DV denotes dummy variables. Models, in plotted order: GBM (DV), Cubist, Random Forest, Cubist (DV), SVM (Poly), GBM, SVM (RBF), MARS, Random Forest (DV), CART (DV), CART, KNN.]

Results with APM's Concrete Data Analysis

[Figure: RMSE (5 to 8) for each model, in plotted order: Boosted Tree, Cubist, Neural Networks, Random Forest, Model Tree, SVM, MARS, Bagged Tree, Elastic Net, PLS, CART, Linear Reg, Cond Inf Tree.]

Results with APM's Solubility Data Analysis

[Figure: cross-validation and test set RMSE (0.6 to 1.1) for each model, in plotted order: Cubist, SVMr, SVMp, Boosted Tree, Random Forest, Elastic Net, Neural Net, MARS, Ridge, PLS, Linear Reg., M5, Bagged Cond. Tree, Cond. Random Forest, Bagged Tree, Tree, Cond. Tree, KNN.]

Backup Slides

CART Profiles

[Figure: repeated cross-validation RMSE (about 0.16 to 0.20) versus Cp (0.01 to 0.10), for dummy variables and factors.]

Boosted Tree Profiles

[Figure: repeated cross-validation RMSE (about 0.150 to 0.225) versus boosting iterations (250 to 1000), in panels by learning rate (0.001, 0.010) and by dummy variables versus factors, for tree depths 1, 3, 5 and 7.]

Random Forest Profiles

[Figure: repeated cross-validation RMSE (about 0.13 to 0.21) versus the number of randomly selected predictors (2 to 8), for dummy variables and factors.]

MARS Profiles

[Figure: repeated cross-validation RMSE (about 0.140 to 0.160) versus the number of terms (10 to 30), for product degrees 1 and 2.]

SVM (RBF) Profiles using Random Search

[Figure: randomly searched tuning parameter values, cost (1 to 100) versus sigma (0.001 to 0.010), with RMSE (0.14 to 0.20) indicated for each point.]

SVM (Poly) Profiles using Random Search

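[Figure: randomly searched tuning parameter values, scale (1e−04 to 1e+00) versus cost (1 to 100), by polynomial degree (1, 2, 3), with RMSE (0.2 to 0.6) indicated for each point.]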

KNN Profiles

[Figure: repeated cross-validation RMSE (about 0.160 to 0.175) versus the maximum number of neighbors (2.5 to 10.0), for the rectangular, triangular, cos, gaussian and rank kernels.]

Thanks

Kirk Mettler for the invitation to speak tonight

Chris Keefer for his work with Cubist

Steve Weston, Chris Keefer and Nathan Coulter for adapting the Cubist C code to R.

References

Quinlan R (1992). “Learning with Continuous Classes.” Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, pp. 343-348.

Quinlan R (1993). “Combining Instance-Based and Model-Based Learning.” Proceedings of the Tenth International Conference on Machine Learning, pp. 236-243.

Wang Y, Witten I (1997). “Inducing Model Trees for Continuous Classes.” Proceedings of the Ninth European Conference on Machine Learning, pp. 128-137.
