MARS - Multivariate Adaptive Regression Splines - Visionday


Giulia Prando - MSc Mathematical Modelling and Computation

Department of Informatics and Mathematical Modelling

Introduction

What? A method for multivariate regression

When? Introduced in 1990 by Jerome H. Friedman

Why? To overcome disadvantages of already existing methods:
- Global parametric modelling
- Non-parametric modelling
- Adaptive methods
  - Regression trees (CART)

Main Characteristics

MARS introduces the following modifications to the CART algorithm:

- a new type of basis function: the Heaviside functions H[±(x − t)] are replaced by the two-sided truncated power splines [±(x − t)]_+^q, where t is the knot site (see the sketch after this list)
  - usually q = 1, so that the approximating function is continuous
  - a solution with a continuous derivative is then derived from this solution
- the parent basis function is not removed after it has been split
  - the parent and both its daughters are eligible for further splitting
  - the corresponding regions overlap
- the product associated with each basis function is restricted to factors involving distinct predictor variables
  - products of splines with powers greater than q in any individual variable are not allowed
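To make the change of basis concrete, a minimal Python/NumPy sketch follows; the function names are illustrative, not Friedman's.

```python
import numpy as np

def heaviside_pair(x, t):
    """CART-style step basis H[+(x - t)] and H[-(x - t)] (piecewise constant)."""
    return (x - t >= 0).astype(float), (x - t < 0).astype(float)

def truncated_power_pair(x, t, q=1):
    """MARS-style two-sided truncated power splines [+(x - t)]_+^q and
    [-(x - t)]_+^q; with q = 1 each side is a linear hinge."""
    return np.maximum(x - t, 0.0) ** q, np.maximum(t - x, 0.0) ** q
```

With q = 1, the truncated pair forms two hinges that meet at the knot t, so the fitted function is continuous there, whereas the Heaviside pair jumps.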

Two stages of MARS

Forward Stage

1. start with only the constant function h_0(x) = 1

2. at each stage, consider all products of a basis function h_m in the model set M with one of the pairs in the candidate set C

3. add to the model M the new pair of basis functions

h_l(x) · (x_j − t)_+^q ,    h_l(x) · (t − x_j)_+^q ,    h_l ∈ M,

and the term

a_{M+1} h_l(x) · (x_j − t)_+^q + a_{M+2} h_l(x) · (t − x_j)_+^q

that gives the largest decrease in the training error (a sketch of this greedy loop follows below)
- the coefficients a_{M+1} and a_{M+2} are estimated by least squares.
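A minimal NumPy sketch of this forward pass is given below; it assumes the candidate knots are the observed values of each variable and, for brevity, omits the distinct-variables restriction from the previous section.

```python
import numpy as np

def forward_stage(X, y, max_terms=11, q=1):
    """Greedy forward pass: repeatedly add the hinge pair that most
    reduces the training RSS (a sketch, not Friedman's full algorithm)."""
    n, p = X.shape
    basis = [np.ones(n)]                      # model set M: h_0(x) = 1
    while len(basis) + 2 <= max_terms:
        B = np.column_stack(basis)
        best = None                           # (rss, plus_column, minus_column)
        for m in range(len(basis)):           # parents stay eligible for splitting
            for j in range(p):
                for t in np.unique(X[:, j]):  # candidate knots: observed values
                    plus = basis[m] * np.maximum(X[:, j] - t, 0.0) ** q
                    minus = basis[m] * np.maximum(t - X[:, j], 0.0) ** q
                    Bt = np.column_stack([B, plus, minus])
                    coef, *_ = np.linalg.lstsq(Bt, y, rcond=None)
                    rss = np.sum((y - Bt @ coef) ** 2)
                    if best is None or rss < best[0]:
                        best = (rss, plus, minus)
        basis += [best[1], best[2]]
    coef, *_ = np.linalg.lstsq(np.column_stack(basis), y, rcond=None)
    return basis, coef                        # evaluated basis and a_0 .. a_M
```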

Backward Stage

Why pruning? To reduce the dimension of the model and thus avoid overfitting.

How to prune? At each step, the term whose removal causes the smallest increase in RSS is removed.

When to stop pruning? When the model reaches the optimal size λ*, selected using Generalized Cross-Validation (GCV).
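A minimal sketch of GCV-guided pruning, reusing the evaluated basis returned by the forward-stage sketch above; charging a penalty of roughly 3 per knot is a common MARS convention, but the exact effective-parameter bookkeeping here is a simplification.

```python
import numpy as np

def gcv(y, y_hat, n_terms, penalty=3.0):
    """Generalized Cross-Validation score: RSS inflated by an
    effective-parameter count (terms plus ~penalty per knot)."""
    n = len(y)
    n_knots = (n_terms - 1) / 2               # terms arrive in pairs after h_0
    c = n_terms + penalty * n_knots           # effective number of parameters
    return np.sum((y - y_hat) ** 2) / (n * (1.0 - c / n) ** 2)

def backward_stage(basis, y):
    """Remove, one at a time, the term whose deletion raises RSS least;
    return the visited subset with the smallest GCV (the size λ*)."""
    def fitted(idx):
        B = np.column_stack([basis[i] for i in idx])
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        return B @ coef
    keep = list(range(len(basis)))
    best_idx, best_gcv = keep[:], gcv(y, fitted(keep), len(keep))
    while len(keep) > 1:
        drop = min((i for i in keep if i != 0),  # never drop the constant
                   key=lambda i: np.sum((y - fitted([k for k in keep if k != i])) ** 2))
        keep.remove(drop)
        score = gcv(y, fitted(keep), len(keep))
        if score < best_gcv:
            best_idx, best_gcv = keep[:], score
    return best_idx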

Example - Regression of a bivariate function
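For readers who want to reproduce this kind of experiment, the third-party py-earth package provides a MARS implementation with a scikit-learn-style interface; the bivariate target below is an arbitrary stand-in, not the function used in these figures.

```python
import numpy as np
from pyearth import Earth   # third-party py-earth package

# stand-in bivariate target (not the demo function from the figures)
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
y = np.sin(3 * X[:, 0]) * np.cos(2 * X[:, 1]) + 0.1 * rng.standard_normal(500)

# max_degree=1 keeps basis functions univariate; max_degree=2 allows
# products of hinges in two distinct variables, as in the bivariate runs
model = Earth(max_terms=21, max_degree=2, enable_pruning=True)
model.fit(X, y)
print(model.summary())
```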

Linear Splines - Univariate basis functions

[Figures (a)-(g): fits with 1, 3, 5 and 7 basis functions; final model without pruning (15 basis functions); final model with pruning (4 basis functions); pruned model, other view.]

Linear Splines - Bivariate basis functions

[Figures (h)-(n): fits with 1, 3, 5 and 7 basis functions; final model without pruning (23 basis functions); final model with pruning (7 basis functions); pruned model, other view.]

Cubic Splines - Univariate basis functions

[Figures (o)-(q): final model without pruning (15 basis functions); final model with pruning (4 basis functions); pruned model, other view.]

Cubic Splines - Bivariate basis functions

[Figures (r)-(t): final model without pruning (23 basis functions); final model with pruning (7 basis functions); pruned model, other view.]

Comparison with CART on a blob image

On data that could be considered "categorical", CART performs better than MARS, as the example below shows:
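A minimal sketch of such a comparison, with scikit-learn's DecisionTreeRegressor standing in for CART and py-earth for MARS; the piecewise-constant target is an illustrative assumption, not the blob image itself.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor   # CART-style trees
from pyearth import Earth                        # third-party py-earth package

# piecewise-constant target: axis-aligned plateaus suit CART's step basis
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(1000, 2))
y = ((X[:, 0] > 0.4) & (X[:, 1] > 0.6)).astype(float)

for name, model in [("CART", DecisionTreeRegressor(max_depth=4)),
                    ("MARS", Earth(max_terms=21, max_degree=2))]:
    model.fit(X, y)
    mse = np.mean((model.predict(X) - y) ** 2)
    print(f"{name}: training MSE = {mse:.4f}")
```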

True data

[Figures (u) and (v): the true data.]

Performance of MARS

[Figure (w): univariate basis functions, at most 21 basis functions, pruning performed. Figure (x): bivariate basis functions, at most 41 basis functions, no pruning.]

Performance of CART

[Figure (y).]