MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical...

49

Transcript of MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical...

Page 1: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

;

Mehdi Cherti (Appstat, LAL/CNRS) Supervised by Balázs Kégl (LAL/CNRS) and Alexandre Gramfort (CNRS LTCI)Py-Earth: Multivariate Adaptive Regression Splines (MARS) in PythonOctober 26, 2015 1 / 25

Page 2: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

;

Py-Earth: Multivariate Adaptive RegressionSplines (MARS) in Python

Mehdi Cherti (Appstat, LAL/CNRS)Supervised by Balázs Kégl (LAL/CNRS) and

Alexandre Gramfort (CNRS LTCI)

October 26, 2015

Mehdi Cherti (Appstat, LAL/CNRS) Supervised by Balázs Kégl (LAL/CNRS) and Alexandre Gramfort (CNRS LTCI)Py-Earth: Multivariate Adaptive Regression Splines (MARS) in PythonOctober 26, 2015 1 / 25

Page 3: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

October 26, 2015 2 / 25

Page 4: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Introduction

MARS is a regression technique for high dimensional data:

I introduced �rst by Jerome H.Friedman in 1991

it is non-linear and non-parametric

Py-earth is an implementation of MARS in Python, created byJason Crudy

The goal is to bring Py-earth into scikit-learn

October 26, 2015 2 / 25

Page 5: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Introduction

MARS is a regression technique for high dimensional data:

I introduced �rst by Jerome H.Friedman in 1991

it is non-linear and non-parametric

Py-earth is an implementation of MARS in Python, created byJason Crudy

The goal is to bring Py-earth into scikit-learn

October 26, 2015 2 / 25

Page 6: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Introduction

MARS is a regression technique for high dimensional data:

I introduced �rst by Jerome H.Friedman in 1991

it is non-linear and non-parametric

Py-earth is an implementation of MARS in Python, created byJason Crudy

The goal is to bring Py-earth into scikit-learn

October 26, 2015 2 / 25

Page 7: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Introduction

MARS is a regression technique for high dimensional data:

I introduced �rst by Jerome H.Friedman in 1991

it is non-linear and non-parametric

Py-earth is an implementation of MARS in Python, created byJason Crudy

The goal is to bring Py-earth into scikit-learn

October 26, 2015 2 / 25

Page 8: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Introduction

MARS is a regression technique for high dimensional data:

I introduced �rst by Jerome H.Friedman in 1991

it is non-linear and non-parametric

Py-earth is an implementation of MARS in Python, created byJason Crudy

The goal is to bring Py-earth into scikit-learn

October 26, 2015 2 / 25

Page 9: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Setup

Setup : Multivariate regression with multiple outputs : (Xi ,Yi ) where

I i is the ith exampleI Xi is a vector describing each example iI Yi is a real-valued vector describing the true outputs of the example i

We want to �nd a model f which predicts Y from X with lowgeneralization mean squared error (MSE)

October 26, 2015 3 / 25

Page 10: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Setup

Setup : Multivariate regression with multiple outputs : (Xi ,Yi ) where

I i is the ith exampleI Xi is a vector describing each example iI Yi is a real-valued vector describing the true outputs of the example i

We want to �nd a model f which predicts Y from X with lowgeneralization mean squared error (MSE)

October 26, 2015 3 / 25

Page 11: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Setup

Setup : Multivariate regression with multiple outputs : (Xi ,Yi ) where

I i is the ith exampleI Xi is a vector describing each example iI Yi is a real-valued vector describing the true outputs of the example i

We want to �nd a model f which predicts Y from X with lowgeneralization mean squared error (MSE)

October 26, 2015 3 / 25

Page 12: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Setup

Setup : Multivariate regression with multiple outputs : (Xi ,Yi ) where

I i is the ith exampleI Xi is a vector describing each example iI Yi is a real-valued vector describing the true outputs of the example i

We want to �nd a model f which predicts Y from X with lowgeneralization mean squared error (MSE)

October 26, 2015 3 / 25

Page 13: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

How does MARS work ?

Basic building block : Hinge functions,

y = max(x − k , 0)

or

y = max(k − x , 0)

October 26, 2015 4 / 25

Page 14: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

How does MARS work ?

The algorithm constructs adaptively a set of basis functions : Bk(X )

Each basis function is a product of hinge functions, for instance :

I Bk(X ) = max(X1 − 5, 0)max(X2 + 4, 0)

The model is a linear combination of those basis functions

I Y =∑

K

k=1αkBk(X )

The speci�city of MARS comes from how the basis functions arecreated and added to the model

October 26, 2015 5 / 25

Page 15: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

How does MARS work ?

The algorithm constructs adaptively a set of basis functions : Bk(X )

Each basis function is a product of hinge functions, for instance :

I Bk(X ) = max(X1 − 5, 0)max(X2 + 4, 0)

The model is a linear combination of those basis functions

I Y =∑

K

k=1αkBk(X )

The speci�city of MARS comes from how the basis functions arecreated and added to the model

October 26, 2015 5 / 25

Page 16: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

How does MARS work ?

The algorithm constructs adaptively a set of basis functions : Bk(X )

Each basis function is a product of hinge functions, for instance :

I Bk(X ) = max(X1 − 5, 0)max(X2 + 4, 0)

The model is a linear combination of those basis functions

I Y =∑

K

k=1αkBk(X )

The speci�city of MARS comes from how the basis functions arecreated and added to the model

October 26, 2015 5 / 25

Page 17: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

How does MARS work ?

The algorithm constructs adaptively a set of basis functions : Bk(X )

Each basis function is a product of hinge functions, for instance :

I Bk(X ) = max(X1 − 5, 0)max(X2 + 4, 0)

The model is a linear combination of those basis functions

I Y =∑

K

k=1αkBk(X )

The speci�city of MARS comes from how the basis functions arecreated and added to the model

October 26, 2015 5 / 25

Page 18: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

How does MARS work ?

The algorithm constructs adaptively a set of basis functions : Bk(X )

Each basis function is a product of hinge functions, for instance :

I Bk(X ) = max(X1 − 5, 0)max(X2 + 4, 0)

The model is a linear combination of those basis functions

I Y =∑

K

k=1αkBk(X )

The speci�city of MARS comes from how the basis functions arecreated and added to the model

October 26, 2015 5 / 25

Page 19: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

How does MARS work ?

The algorithm constructs adaptively a set of basis functions : Bk(X )

Each basis function is a product of hinge functions, for instance :

I Bk(X ) = max(X1 − 5, 0)max(X2 + 4, 0)

The model is a linear combination of those basis functions

I Y =∑

K

k=1αkBk(X )

The speci�city of MARS comes from how the basis functions arecreated and added to the model

October 26, 2015 5 / 25

Page 20: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

How does MARS work ?

Two steps, the forward pass and the pruning pass

We over-generate a set of basis functions in the forward pass

We prune unncessary basis functions in the pruning pass

October 26, 2015 6 / 25

Page 21: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

How does MARS work ?

Two steps, the forward pass and the pruning pass

We over-generate a set of basis functions in the forward pass

We prune unncessary basis functions in the pruning pass

October 26, 2015 6 / 25

Page 22: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

How does MARS work ?

Two steps, the forward pass and the pruning pass

We over-generate a set of basis functions in the forward pass

We prune unncessary basis functions in the pruning pass

October 26, 2015 6 / 25

Page 23: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

How does MARS work ? The forward pass

October 26, 2015 7 / 25

Page 24: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

How does MARS work ? The forward pass

October 26, 2015 8 / 25

Page 25: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

How does MARS work ? The forward pass

October 26, 2015 9 / 25

Page 26: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

How does MARS work ? The forward pass

October 26, 2015 10 / 25

Page 27: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

How does MARS work ? The forward pass

y = (α0 + α1max(x1 − 3, 0) + α2max(3− x1, 0)+

α3max(x1 − 3, 0)max(x2 − 7, 0) + α4max(x1 − 3, 0)max(7− x2, 0)+

α5max(x2 − 12, 0) + α6max(12− x2, 0))

(1)October 26, 2015 11 / 25

Page 28: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

What have been done ?

Github repo : https://github.com/jcrudy/py-earth

I git clone https://github.com/jcrudy/py-earthI cd py-earthI python setup.py install

The state of the code:

I Py-earth supported already a lot of features and the important parts

were there

F However, it was not ready to be deployable to scikit-learnF it was not supporting multiple outputs

October 26, 2015 12 / 25

Page 29: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

What have been done ?

Github repo : https://github.com/jcrudy/py-earth

I git clone https://github.com/jcrudy/py-earthI cd py-earthI python setup.py install

The state of the code:

I Py-earth supported already a lot of features and the important parts

were there

F However, it was not ready to be deployable to scikit-learnF it was not supporting multiple outputs

October 26, 2015 12 / 25

Page 30: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

What have been done ? improve code quality

� Clean the code (pep8) and adapt it to coding guidelines ofscikit-learn

� Enchance documentation� Add more unit tests

October 26, 2015 13 / 25

Page 31: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

What have been done ? improve code quality

� Clean the code (pep8) and adapt it to coding guidelines ofscikit-learn

� Enchance documentation� Add more unit tests

October 26, 2015 13 / 25

Page 32: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

What have been done ? improve code quality

� Clean the code (pep8) and adapt it to coding guidelines ofscikit-learn

� Enchance documentation� Add more unit tests

October 26, 2015 13 / 25

Page 33: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

What have been done ? new features

Support for multiple outputs

Support for output weights

Support of estimation of variable importance

Implement FastMARS (Jerome H.Friedman, 1993)

October 26, 2015 14 / 25

Page 34: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

What have been done ? new features

Support for multiple outputs

Support for output weights

Support of estimation of variable importance

Implement FastMARS (Jerome H.Friedman, 1993)

October 26, 2015 14 / 25

Page 35: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

What have been done ? new features

Support for multiple outputs

Support for output weights

Support of estimation of variable importance

Implement FastMARS (Jerome H.Friedman, 1993)

October 26, 2015 14 / 25

Page 36: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

What have been done ? new features

Support for multiple outputs

Support for output weights

Support of estimation of variable importance

Implement FastMARS (Jerome H.Friedman, 1993)

October 26, 2015 14 / 25

Page 37: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Example : 1D example

October 26, 2015 15 / 25

Page 38: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Example : 2D example

October 26, 2015 16 / 25

Page 39: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Example : multiple outputs

October 26, 2015 17 / 25

Page 40: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Example : multiple outputs20 inputs, 3 outputs, only 2 informative inputs, the rest is noise

October 26, 2015 18 / 25

Page 41: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Example : multiple outputs

The graph (a crop of it) of basis functions looks like this :

October 26, 2015 19 / 25

Page 42: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Example : variable importance

October 26, 2015 20 / 25

Page 43: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Example : variable importance

y = sin(πx0x1) + 20(x2 − 0.5)2 + 10x3 + 5x4 + 5 ∗ N(0, 1)The code in the previous slide gives:

October 26, 2015 21 / 25

Page 44: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Example : FastMARS

October 26, 2015 22 / 25

Page 45: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Example : FastMARSTop : Normal, Bottom : FastMARS

October 26, 2015 23 / 25

Page 46: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Future

Close current issues, keep working on code quality to merge it intoscikit-learn

Still, some features are missing, new features:

I Deal with missing valuesI Support categorical variables

October 26, 2015 24 / 25

Page 47: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Future

Close current issues, keep working on code quality to merge it intoscikit-learn

Still, some features are missing, new features:

I Deal with missing valuesI Support categorical variables

October 26, 2015 24 / 25

Page 48: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Future

Close current issues, keep working on code quality to merge it intoscikit-learn

Still, some features are missing, new features:

I Deal with missing valuesI Support categorical variables

October 26, 2015 24 / 25

Page 49: MARS - IJCLab Events Directory (Indico) · I Deal with missing values I Support categorical variables October 26, 2015 24 / 25. Future Closecurrent issues, keep working oncode qualityto

Thank you

Thank you for listening

October 26, 2015 25 / 25