Regression Theory

Regression Theory with Additive Models and CMARS

4th International Summer School "Achievements and Applications of Contemporary Informatics, Mathematics and Physics", National University of Technology of the Ukraine, Kiev, Ukraine, August 5-16, 2009

Gerhard-Wilhelm Weber*, Inci Batmaz, Gülser Köksal, Fatma Yerlikaya, Pakize Taylan**, Elcin Kartal, Efsun Kürüm, Ayse Özmen

Institute of Applied Mathematics, Middle East Technical University, Ankara, Turkey
* Faculty of Economics, Management and Law, University of Siegen, Germany, and Center for Research on Optimization and Control, University of Aveiro, Portugal
** Department of Mathematics, Dicle University, Turkey

Description

AACIMP 2009 Summer School lecture by Gerhard-Wilhelm Weber, from the "Modern Operational Research and Its Mathematical Methods" course.

Transcript of Regression Theory

Page 1: Regression Theory

Regression Theory with Additive Models and CMARS

4th International Summer School "Achievements and Applications of Contemporary Informatics, Mathematics and Physics", National University of Technology of the Ukraine, Kiev, Ukraine, August 5-16, 2009

Gerhard-Wilhelm Weber*, Inci Batmaz, Gülser Köksal, Fatma Yerlikaya, Pakize Taylan**, Elcin Kartal, Efsun Kürüm, Ayse Özmen

Institute of Applied Mathematics, Middle East Technical University, Ankara, Turkey
* Faculty of Economics, Management and Law, University of Siegen, Germany, and Center for Research on Optimization and Control, University of Aveiro, Portugal
** Department of Mathematics, Dicle University, Turkey

Page 2: Regression Theory

Content

• Introduction, Motivation
• Regression
• Additive Models
• MARS
• PRSS for MARS
• CQP for MARS
• Tikhonov Regularization for MARS
• Numerical Experience and Comparison
• Research Extensions
• Conclusion

Page 3: Regression Theory

Introduction

Learning from data has become very important in every field of science and technology, e.g., in

• the financial sector,
• quality improvement in manufacturing,
• computational biology,
• medicine, and
• engineering.

Learning enables estimation and prediction.

Regression is mainly based on the problems and methods of

• least squares estimation,
• maximum likelihood estimation, and
• classification.

New tools for data analysis, based on nonparametric regression and smoothing:

• additive (and multiplicative) models.

Page 4: Regression Theory

Introduction

CART vs. MARS

Page 5: Regression Theory

Introduction

Additive (and multiplicative) models (studied at IAM, METU):

• spline regression in additive models,
• spline regression in generalized additive models,
• MARS: piecewise linear (per dimension) regression in multiplicative models,
• spline regression for stochastic differential equations via additive and nonlinear models.

Page 6: Regression Theory

Regression: a Motivation

One of the motivations of this research has been the approximation of financial data points (x, y), e.g., coming from

• the stock market,
• credit rating,
• economic factors,
• company properties.

For example, to estimate the probability of default of a particular credit, one of the last three kinds of data above is used.

There are different approaches for estimating the probability of default.
• Regression models (binary choice) are one of them.
• For example, we assume that the dependent variable Y, with Y = 1 ("default") or Y = 0 ("no default"), satisfies

Y = F(X) + ε,

where X is the vector of independent variable(s) (input), such as a credit rating.

Page 7: Regression Theory

Regression: a Motivation

• An estimate for the default probability of a corporate bond can be obtained via the estimation of the default probability P,

P = E[F(X) + ε] = F(X).

• This estimation can also be done via the following linear regression:

Y = α + β^T X + ε.

Here, α and β are unknown parameters. They can be estimated via linear regression methods or maximum likelihood estimation. In many important cases, these just mean least squares estimation. Then,

P = α + β^T X.

Page 8: Regression Theory

Regression

Input vector X = (X_1, X_2, ..., X_m)^T and output variable Y;

linear regression:

• E(Y | X) is linear (...), and

Y = E(Y | X_1, ..., X_m) + ε = β_0 + Σ_{j=1}^m X_j β_j + ε;

• the parameter vector β = (β_0, β_1, ..., β_m)^T minimizes

RSS(β) := Σ_{i=1}^N (y_i − x_i^T β)²,   or   RSS(β) = (y − Xβ)^T (y − Xβ),

which yields

β̂ = (X^T X)^{−1} X^T y,   Cov(β̂) = (X^T X)^{−1} σ².
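
A minimal numerical sketch of these least-squares formulas (not part of the original slides; the data and variable names are illustrative):

```python
import numpy as np

# Toy data: N observations, m inputs (illustrative only).
rng = np.random.default_rng(0)
N, m = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, m))])  # intercept column prepended
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=N)

# Least squares estimate: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Covariance estimate: Cov(beta_hat) = (X^T X)^{-1} sigma^2
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (N - (m + 1))
cov_beta = np.linalg.inv(X.T @ X) * sigma2_hat
print(beta_hat, np.sqrt(np.diag(cov_beta)))
```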

Page 9: Regression Theory

Regression, Additive Models

• Classical understanding: additive separation of variables.

• New interpretation (in the input space): separation of clusters and corresponding enumeration.

Page 10: Regression Theory

Regression, Additive Models

• Additive model (A):

E(Y | x_{i1}, x_{i2}, ..., x_{im}) = β_0 + Σ_{j=1}^m f_j(x_{ij}).

• The functions f_j are estimated by a smoothing on a single coordinate.

• Standard convention at x_j: E(f_j(x_j)) = 0.

• Backfitting algorithm (Gauss–Seidel algorithm).
• This procedure depends on the partial residual against x_{ij}:

r_{ij} := y_i − β̂_0 − Σ_{k≠j} f̂_k(x_{ik}).

Page 11: Regression Theory

Regression, Additive Models

• Estimate each smooth function by holding all the other ones fixed.

Initialization:  β̂_0 := ave(y_i | i = 1, ..., N),   f̂_j(x_{ij}) ≡ 0 for all i, j.

Cycle:  j = 1, ..., m, 1, ..., m, 1, ..., m, ...

f̂_j is updated by smoothing the partial residuals

r_{ij} := y_i − β̂_0 − Σ_{k≠j} f̂_k(x_{ik})   (i = 1, ..., N)

against x_{ij}, until the functions almost do not change.

• Convergence (condition).
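
A compact sketch of the backfitting cycle, using a simple moving-average smoother as a stand-in for the coordinate-wise smoothers (the smoother choice and stopping tolerance are illustrative, not prescribed by the slides):

```python
import numpy as np

def smooth(x, r, window=7):
    """Very simple smoother: moving average of the partial residuals r,
    ordered by the coordinate x (a stand-in for a spline smoother)."""
    order = np.argsort(x)
    r_sorted = r[order]
    kernel = np.ones(window) / window
    r_smooth = np.convolve(r_sorted, kernel, mode="same")
    out = np.empty_like(r)
    out[order] = r_smooth
    return out

def backfit(X, y, n_cycles=20, tol=1e-6):
    N, m = X.shape
    beta0 = y.mean()                  # beta0_hat := ave(y_i | i = 1, ..., N)
    F = np.zeros((N, m))              # column j holds f_j(x_ij), initialized to 0
    for _ in range(n_cycles):
        F_old = F.copy()
        for j in range(m):
            # partial residual r_ij = y_i - beta0 - sum_{k != j} f_k(x_ik)
            r = y - beta0 - F.sum(axis=1) + F[:, j]
            F[:, j] = smooth(X[:, j], r)
            F[:, j] -= F[:, j].mean()  # enforce the convention E(f_j) = 0
        if np.max(np.abs(F - F_old)) < tol:
            break
    return beta0, F
```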

Page 12: Regression Theory

Regression, Additive Models

• Convergence of the backfitting: writing f = (f_1^T, ..., f_m^T)^T, each update is a map

T̂_j : IR^{Nm} → IR^{Nm},   f ↦ (f_1, ..., f_{j−1}, S_j(y − Σ_{k≠j} f_k), f_{j+1}, ..., f_m).

• Full cycle: T̂ := T̂_m T̂_{m−1} ··· T̂_1; then T̂^l corresponds to l full cycles.

• Backfitting always converges if all smoothers are symmetric and all eigenvalues of T̂ are either +1 or in the interior of the unit ball: |λ| < 1.

Page 13: Regression Theory

Regression, Generalized Additive Models

• To extend the additive model to a wide range of distribution families: generalized additive models (GAM):

G(µ(X)) = ψ(X) = β_0 + Σ_{j=1}^m f_j(X_j),

• the f_j are unspecified, G: link function;

• f_j: elements of a finite-dimensional space consisting, e.g., of splines;

• spline orders (or degrees): suitably chosen, depending on the density and variation properties of the corresponding data in the x and y components, respectively;

• the problem of specifying θ := (β_0, f_1, ..., f_m)^T becomes a finite-dimensional parameter estimation problem.

Page 14: Regression Theory

Regression, Generalized Additive Models, Splines

• Let x_0, ..., x_N be N + 1 distinct knots of [a, b], with a = x_0 < x_1 < ... < x_N = b.

• The function g_k(x) on the interval [a, b] is a spline of degree k relative to the knots x_j if

(1) f_{k,j} := f_k|[x_j, x_{j+1}] ∈ IP_k (polynomial of degree ≤ k; j = 0, ..., N − 1),

(2) f_k ∈ C^{k−1}[a, b].

• The space of splines of degree k on [a, b] relative to the N + 1 distinct knots is called ℘_k; then dim ℘_k = N + k.

• In practice, a spline is represented by a different polynomial on each subinterval, and for this reason there could be a discontinuity in its k-th derivative at the internal knots x_1, ..., x_{N−1}.

Page 15: Regression Theory

Regression, Generalized Additive Models, Splines

• To characterize a spline of degree k: with f_{k,j} := f_k|[x_j, x_{j+1}], it can be represented by

f_{k,j}(x) = Σ_{i=0}^k g_{ij} (x − x_j)^i,   if x ∈ [x_j, x_{j+1}],

i.e., (k + 1)N coefficients g_{ij} to be determined.

• To hold:

f_{k,j−1}^{(l)}(x_j) = f_{k,j}^{(l)}(x_j)   (j = 1, ..., N − 1; l = 0, ..., k − 1),

there are k(N − 1) conditions, and the remaining degrees of freedom are

(k + 1)N − k(N − 1) = N + k.
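
The count of N + k free coefficients matches the dimension of the truncated power basis, one standard basis choice for this spline space; a small sketch (the grid and knot values are illustrative):

```python
import numpy as np

def truncated_power_basis(x, knots, k):
    """Basis 1, x, ..., x^k, (x - x_1)_+^k, ..., (x - x_{N-1})_+^k for splines of
    degree k on [a, b] with knots a = x_0 < ... < x_N = b.
    Number of columns: (k + 1) + (N - 1) = N + k, the dimension of the spline space."""
    internal = np.asarray(knots)[1:-1]                      # internal knots x_1, ..., x_{N-1}
    poly = np.vander(x, k + 1, increasing=True)             # 1, x, ..., x^k
    hinge = np.maximum(x[:, None] - internal[None, :], 0.0) ** k
    return np.hstack([poly, hinge])

knots = np.linspace(0.0, 1.0, 6)          # N = 5 subintervals
x = np.linspace(0.0, 1.0, 200)
B = truncated_power_basis(x, knots, k=3)  # 200 x (5 + 3) design matrix
```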

Page 16: Regression Theory

Clustering for Generalized Additive Models

• Financial markets have different kinds of trading activities. These activities work with

• short-, mid- or long-term horizons,
• from days and weeks to months and years.

• These data can sometimes be problematic for use in the models, e.g., given a longer horizon with sometimes less frequently recorded data, but at other times highly frequent measurements.

• The structure of the data may have particular properties:

i. larger variability,
ii. outliers,
iii. some data do not have any meaning.

Page 17: Regression Theory

Clustering for Generalized Additive Models

Page 18: Regression Theory

Clustering for Generalized Additive Models

• data variation;

• for the sake of simplicity: N_j ≡ N for each interval I_j.

Page 19: Regression Theory

Clustering for Generalized Additive Models

• Density: given intervals I_1, ..., I_m, the density of the input data in the j-th interval is

D_j := (number of points x_{ij} in I_j) / (length of I_j).

• Variation: if over the interval I_j the data are (x_{1j}, y_{1j}), ..., (x_{N_j j}, y_{N_j j}), then

V_j := Σ_{i=1}^{N−1} |y_{i+1,j} − y_{i,j}|.

• If this value is big, then at many data points the curvature of any approximating curve could be big:

– occurrence of outliers,
– instability of the model.
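
A short sketch of computing the density D_j and variation V_j of one interval (the data layout and function names are ours, for illustration):

```python
import numpy as np

def density_and_variation(x_j, y_j, interval):
    """D_j = (#points in I_j) / length(I_j);  V_j = sum_i |y_{i+1,j} - y_{i,j}|."""
    a, b = interval
    inside = (x_j >= a) & (x_j <= b)
    D_j = inside.sum() / (b - a)
    order = np.argsort(x_j[inside])            # order the points in I_j by x
    V_j = np.abs(np.diff(y_j[inside][order])).sum()
    return D_j, V_j

# Ind_j := D_j * V_j would then be the index of data variation of interval I_j.
```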

Page 20: Regression Theory

Clustering for Generalized Additive Models

• Intervals I_1, ..., I_p (or cubes Q_1, ..., Q_m) according to how the data are grouped.
• For each interval I_j (cube Q_j), the associated index of data variation is

Ind_j := D_j V_j   or   Ind_j := d_j(D_j) v_j(V_j).

• In fact, from both the viewpoints of data fitting and complexity (or stability):

o cases with a high variation distributed over a very long interval are much less problematic than cases with a high variation over a short interval;
o oscillation,
o curvature,
o up to nonsmoothness,
o penalty!

Page 21: Regression Theory

Regression, Additive Models

• The additive model can be fit to data. Given observations (y_i, x_i) (i = 1, 2, ..., N),

• penalized residual sum of squares (PRSS):

PRSS(β_0, f_1, ..., f_m) := Σ_{i=1}^N ( y_i − β_0 − Σ_{j=1}^m f_j(x_{ij}) )² + Σ_{j=1}^m µ_j ∫_a^b ( f_j''(t_j) )² dt_j,

with smoothing parameters µ_j ≥ 0 (tradeoff between fit and smoothness).

• Large values of µ_j yield smoother curves, smaller ones result in more fluctuation.

• New estimation methods for the additive model with CQP:

0jµ ≥

Page 22: Regression Theory

0, ,

2

20

1 1

2''

min ,

subject to ( ) , 0,

( ) ( 1,2,..., ).

t β f

N m

i j iji= j

j j j j

t

y β f x t t

f t dt M j m

=

− − ≤ ≥

≤ =

∑ ∑

jdj jθ=∑

Regression, Additive Models

• The functions are splines:

• Then, we get

jf1

( ) ( ).j jj l l

l

f x h xθ=

=∑

0, ,

2 20 2

2

0 2

min ,

subject to ( , ) , 0,

( , ) ( 1,..., ).

t β f

j j

t

W t t

V M j m

β θ

β θ

≤ ≥

≤ =

Page 23: Regression Theory

Regression, Additive Models

http://144.122.137.55/gweber/

Page 24: Regression Theory

MARS: Multivariate Adaptive Regression Splines

• To estimate general functions of high-dimensional arguments.

• An adaptive procedure.

• A nonparametric regression procedure.

• No specific assumption about the underlying functional relationship between the dependent and independent variables.

• Ability to estimate the contributions of the basis functions so that both the additive and the interactive effects of the predictors are allowed to determine the response variable.

• Uses expansions in piecewise linear basis functions of the form

c⁺(x, τ) = [+(x − τ)]₊,   c⁻(x, τ) = [−(x − τ)]₊,   where [q]₊ := max{0, q}.

{ }[ ] : max 0,q q+ =

Page 25: Regression Theory

MARS

[Figure: data points (x, y) and the reflected pair of piecewise linear basis functions c⁺(x, τ) = [+(x − τ)]₊ and c⁻(x, τ) = [−(x − τ)]₊ with knot τ.]

Basic elements in the regression with MARS:

• Let us consider Y = f(X) + ε, with X = (X_1, X_2, ..., X_p)^T.

• The goal is to construct reflected pairs for each input X_j (j = 1, 2, ..., p).

Page 28: Regression Theory

MARS

• Set of basis functions:

℘ := { (X_j − τ)₊, (τ − X_j)₊ | τ ∈ {x_{1,j}, x_{2,j}, ..., x_{N,j}}, j ∈ {1, 2, ..., p} }.

• Thus, f(X) can be represented by

Y = θ_0 + Σ_{m=1}^M θ_m ψ_m(X) + ε.

• The ψ_m (m = 1, 2, ..., M) are basis functions from ℘ or products of two or more such functions; interaction basis functions are created by multiplying an existing basis function with a truncated linear function involving a new variable.

• Provided the observations are represented by the data (x_i, y_i) (i = 1, 2, ..., N):

ψ_m(x) := Π_{j=1}^{K_m} [ s_{κ_j^m} · ( x_{κ_j^m} − τ_{κ_j^m} ) ]₊.
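
A sketch of how such product basis functions and the resulting model can be evaluated; encoding each ψ_m as a list of (variable index, sign, knot) triples is our illustrative convention, not part of the slides:

```python
import numpy as np

def mars_basis(x, terms):
    """Evaluate psi_m(x) = prod_j [ s_j * (x[kappa_j] - tau_j) ]_+ .

    x     : (p,) input vector
    terms : list of (kappa_j, s_j, tau_j) triples, with s_j in {+1, -1}
    """
    value = 1.0
    for kappa, s, tau in terms:
        value *= max(0.0, s * (x[kappa] - tau))
    return value

def mars_predict(x, theta0, basis_list, theta):
    """Y_hat = theta_0 + sum_m theta_m * psi_m(x)."""
    return theta0 + sum(t * mars_basis(x, terms)
                        for t, terms in zip(theta, basis_list))
```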

Page 29: Regression Theory

MARS

• Two subalgorithms:

(i) Forward stepwise algorithm:
• Search for the basis functions.
• Minimization of some "lack of fit" criterion.
• The process stops when a user-specified value M_max is reached.
• Overfitting: so a backward deletion procedure is applied, decreasing the complexity of the model without degrading the fit to the data.

(ii) Backward stepwise algorithm:

Page 30: Regression Theory

MARS

• Remove from the model basis functions that contribute the smallest increase in the residual squared error at each stage, producing an optimally estimated model f̂_α with respect to each number of terms, called α.

• α is related to some complexity of the estimation.
• To estimate the optimal value of α, an alternative is generalized cross-validation:

GCV(α) := (1/N) Σ_{i=1}^N ( y_i − f̂_α(x_i) )² / ( 1 − M(α)/N )²,

where M(α) := u + dK, with

N : number of samples,
u : number of independent basis functions,
K : number of knots selected by the forward stepwise algorithm,
d : cost of optimal basis selection.
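
A direct transcription of the GCV formula, assuming the fitted values and the quantities u, K, d are already available (how d is chosen is not specified on the slide, so it is left as an input):

```python
import numpy as np

def gcv(y, y_hat, u, K, d):
    """GCV(alpha) = (1/N) * sum_i (y_i - f_hat_alpha(x_i))^2 / (1 - M(alpha)/N)^2,
    with M(alpha) = u + d*K."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    N = y.size
    M_alpha = u + d * K
    return np.mean((y - y_hat) ** 2) / (1.0 - M_alpha / N) ** 2
```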

Page 31: Regression Theory

PRSS for MARS

PRSS := Σ_{i=1}^N ( y_i − f(x_i) )² + Σ_{m=1}^{M_max} λ_m Σ_{|α|=1,2; α=(α_1,α_2)^T} Σ_{r<s; r,s ∈ V(m)} ∫ θ_m² [ D^α_{r,s} ψ_m(t^m) ]² dt^m,

where

V(m) := { κ_j^m | j = 1, 2, ..., K_m },
t^m := ( t_{m_1}, t_{m_2}, ..., t_{m_{K_m}} )^T,
|α| := α_1 + α_2   (α_1, α_2 ∈ {0, 1}),
D^α_{r,s} ψ_m(t^m) := ∂^{|α|} ψ_m / ( ∂^{α_1} t_r^m ∂^{α_2} t_s^m ) (t^m).

• Tradeoff between both accuracy and complexity.
• Penalty parameters λ_m.

Page 32: Regression Theory

Knot Selection

Page 33: Regression Theory

Grid Selection

Page 34: Regression Theory

Grid Selection

Page 35: Regression Theory

CQP and Tikhonov Regularization for MARS

• Collect the basis-function values and parameters in vectors:

ψ(d_i) := ( 1, ψ_1(x_i^1), ..., ψ_M(x_i^M), ψ_{M+1}(x_i^{M+1}), ..., ψ_{M_max}(x_i^{M_max}) )^T,
θ := ( θ_0, θ_1, ..., θ_{M_max} )^T,
d_i := ( x_i^1, x_i^2, ..., x_i^M, x_i^{M+1}, ..., x_i^{M_max} )^T,
ψ(d) := ( ψ(d_1), ψ(d_2), ..., ψ(d_N) )^T.

• The multiple integrals in the penalty terms are approximated by Riemann sums over a grid built from the input values in the coordinates contributing to ψ_m: the grid points are denoted x̂_i^m, with indices σ_{κ_j^m} ∈ {0, 1, 2, ..., N + 1} (j = 1, 2, ..., K_m), and the corresponding grid-cell volumes are

Δx̂_i^m := Π_{j=1}^{K_m} ( x^{κ_j^m}_{σ_{κ_j^m}+1} − x^{κ_j^m}_{σ_{κ_j^m}} ).

• With these, the discretized penalty coefficients are

L_{im} := [ Σ_{|α|=1,2; α=(α_1,α_2)^T} Σ_{r<s; r,s ∈ V(m)} ( D^α_{r,s} ψ_m(x̂_i^m) )² Δx̂_i^m ]^{1/2}.

L is an (M_max + 1) × (M_max + 1) matrix.

Page 36: Regression Theory

CQP and Tikhonov Regularization for MARS

• For a short representation, we can rewrite the approximate relation as

PRSS ≈ ‖ψ(d)θ − y‖₂² + Σ_{m=1}^{M_max} λ_m Σ_{i=1}^{(N+1)^{K_m}} L_{im}² θ_m².

• In case of the same penalty parameter λ (:= λ_m = φ²) for all m, then:

PRSS = ‖ψ(d)θ − y‖₂² + λ ‖Lθ‖₂²

— Tikhonov regularization.
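
For a fixed λ, this Tikhonov-regularized least-squares problem has the usual regularized normal-equations solution θ_λ = (Ψ^T Ψ + λ L^T L)^{−1} Ψ^T y; a minimal sketch (the variable names are ours):

```python
import numpy as np

def tikhonov(Psi, y, L, lam):
    """Minimize ||Psi @ theta - y||_2^2 + lam * ||L @ theta||_2^2."""
    A = Psi.T @ Psi + lam * (L.T @ L)
    return np.linalg.solve(A, Psi.T @ y)
```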

Page 37: Regression Theory

CQP for MARS

• Conic quadratic programming:

min_{t, θ}  t,

subject to   ‖ψ(d)θ − y‖₂ ≤ t,   ‖Lθ‖₂ ≤ √M.

• In general:

min_x  c^T x,   subject to   ‖D_i x − d_i‖₂ ≤ p_i^T x − q_i   (i = 1, 2, ..., k).
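
This conic quadratic program can be handed to any SOCP solver (the slides use MOSEK); a sketch using the cvxpy modelling layer, assuming Psi, y, L and the bound sqrt_M are given:

```python
import cvxpy as cp

def cmars_cqp(Psi, y, L, sqrt_M):
    """min t  s.t.  ||Psi theta - y||_2 <= t,  ||L theta||_2 <= sqrt_M."""
    n = Psi.shape[1]
    theta = cp.Variable(n)
    t = cp.Variable()
    constraints = [cp.norm(Psi @ theta - y, 2) <= t,
                   cp.norm(L @ theta, 2) <= sqrt_M]
    problem = cp.Problem(cp.Minimize(t), constraints)
    problem.solve()   # e.g. problem.solve(solver=cp.MOSEK) if MOSEK is installed
    return theta.value, t.value
```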

Page 39: Regression Theory

CQP for MARS

• Moreover, (t, θ, χ, η, ω_1, ω_2) is a primal-dual optimal solution if and only if

χ := [ ψ(d)  0_N ; 0^T_{M_max+1}  1 ] (θ^T, t)^T + (−y^T, 0)^T,

η := [ L  0_{M_max+1} ; 0^T_{M_max+1}  0 ] (θ^T, t)^T + (0^T_{M_max+1}, √M)^T,

[ ψ(d)^T  0_{M_max+1} ; 0^T_N  1 ] ω_1 + [ L^T  0_{M_max+1} ; 0^T_{M_max+1}  0 ] ω_2 = (0^T_{M_max+1}, 1)^T,

ω_1^T χ = 0,   ω_2^T η = 0,

ω_1, χ ∈ L^{N+1},   ω_2, η ∈ L^{M_max+2},

where L^{N+1} and L^{M_max+2} denote the (N+1)- and (M_max+2)-dimensional ice-cream (second-order, Lorentz) cones.

Page 40: Regression Theory

• CQPs belong to the class of well-structured convex problems.

• Interior Point Methods.

• Better complexity bounds.

CQP for MARS

• Better practical performance.

C-MARS

Page 41: Regression Theory

• We had the following data:

X1 1,5554 1,5326 -0,1823 0,1627 0,5687 0,1706 0,2041 -0,1823 -0,82 -0,7234 0,4446 -0,3291 -1,5583 1,2706 1,7555

X2 0,1849 1,1538 0,7586 -1,5363 1,906 0,3761 1,3323 -0,0064 -1,7275 1,141 0,3761 0,5673 -0,1976 0,7586 0,1849

X3 1,264 1,2023 -1,0995 0,8529 1,3051 -0,3802 -0,7913 0,1336 0,2363 -1,0995 -0,0719 -0,894 -1,0995 0,9557 1,5722

X4 1,2843 1,0175 -0,9676 0,7408 1,0635 -0,506 -0,7937 -0,0564 0,0455 -0,9676 -0,2482 -0,8557 -0,9676 0,8707 1,7339

X5 -0,7109 0,1777 0,1422 0,0355 3,2699 0,3554 -0,1777 1,5283 -0,0711 0,3554 0,8886 0,4621 -0,9241 -0,9241 -0,0711

Y 0,67 0,9047 -0,197 -1,0108 0,1616 0,2984 -0,6039 0,8823 -1,6832 0,9531 -0,3208 0,0507 -0,3916 0,44 0,263

Numerical Experience and Comparison


X1 0,0474 -0,8713 -0,2158 0,2179 1,5426 -1,16 0,9857 0,6752 0,5402 -1,4528 1,9349 -0,8299 -0,681 0,7304 -1,1305

X2 0,9498 -0,1976 -1,7275 -0,9626 1,3323 -0,9626 0,1849 -1,345 1,3323 -0,0064 0,1849 0,3761 -1,345 -0,7713 -0,0064

X3 0,0308 -0,6885 1,0584 0,5446 0,5446 -0,483 0,4419 1,264 0,0308 -1,3051 2,086 -0,5857 -0,2775 1,5722 -1,3051

X4 0,1543 -0,7278 1,0046 0,3752 0,3752 -0,5839 0,2613 1,2843 -0,1543 -1,0635 2,5631 -0,6578 -0,4241 1,7339 -1,0635

X5 1,1018 0,6753 -0,391 -0,2843 1,4217 0,4621 -0,8175 0,7819 0,2488 1,5283 -0,1777 -1,7771 0,4621 -1,0307 0,3554

Y 1,1477 -0,3916 -0,4624 -1,0993 2,8639 -1,0285 0,1923 -0,7631 2,05 1,0238 0,9177 -1,2055 -0,3208 -0,5862 -0,6216

Page 42: Regression Theory

Numerical Experience and Comparison

• We constructed model functions for these data using the Salford MARS software, where we selected the maximum number of basis functions M_max = 5. Then,

Model 1 (ω = 1):
BF1 = max{0, X2 + 1.728};
Y = -1.081 + 0.626 * BF1.

Model 2 (ω = 2):
BF1 = max{0, X2 + 1.728};
BF2 = max{0, X5 - 0.462} * BF1;
Y = -1.073 + 0.499 * BF1 + 0.656 * BF2.

Model 3 (ω = 3, best model):
BF1 = max{0, X2 + 1.728};
BF2 = max{0, X5 - 0.462} * BF1;
BF4 = max{0, X3 + 0.586} * BF1;
Y = -1.176 + 0.422 * BF1 + 0.597 * BF2 + 0.236 * BF4.
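
For instance, the reconstructed Model 3 can be evaluated directly; a tiny sketch (function name is ours):

```python
def model3_predict(X2, X3, X5):
    """Salford MARS Model 3 (omega = 3) as reconstructed above."""
    bf1 = max(0.0, X2 + 1.728)
    bf2 = max(0.0, X5 - 0.462) * bf1
    bf4 = max(0.0, X3 + 0.586) * bf1
    return -1.176 + 0.422 * bf1 + 0.597 * bf2 + 0.236 * bf4
```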

Page 43: Regression Theory

Numerical Experience and Comparison

• and, finally,

Model 4 (ω = 4):
BF1 = max{0, X2 + 1.728};
BF2 = max{0, X5 - 0.462} * BF1;
BF3 = max{0, 0.462 - X5} * BF1;
BF4 = max{0, X3 + 0.586} * BF1;
Y = -1.242 + 0.555 * BF1 + 0.484 * BF2 - 0.093 * BF3 + 0.2246 * BF4.

Model 5 (ω = 5):
BF1 = max{0, X2 + 1.728};
BF2 = max{0, X5 - 0.462} * BF1;
BF3 = max{0, 0.462 - X5} * BF1;
BF4 = max{0, X3 + 0.586} * BF1;
BF5 = max{0, -0.586 - X3} * BF1;
Y = -1.248 + 0.487 * BF1 + 0.486 * BF2 - 0.118 * BF3 + 0.282 * BF4 + 0.263 * BF5.

Page 44: Regression Theory

Numerical Experience and Comparison

• Then, we considered a large model with five basis functions; we found (writing a MATLAB code):

L = diag(0, 1.8419, 0.7514, 0.9373, 2.1996, 0.3905).

• We constructed models using different values of √M in the optimization problem, which was solved by MOSEK (CQP).

• Our algorithm always constructs a model with 5 parameters; in the case of Salford MARS, there are 1, 2, 3, 4 or 5 parameters.

Page 45: Regression Theory

Numerical Experience and Comparison

RESULTS OF SALFORD MARS

ω   z = RSS    t = ‖ψ(d)θ − y‖₂   ‖Lθ‖₂    GCV
1   17.6425    4.2003             1.1531   0.771
2   11.1870    3.3447             1.0430   0.613
3    7.7824    2.7897             1.0368   0.550
4    6.6126    2.5715             1.1967   0.626
5    6.2961    2.5092             1.1600   0.840

Page 46: Regression Theory

Numerical Experience and Comparison

RESULTS OF OUR APPROACH

√M       ω   z = RSS     ‖Lθ‖₂    |   √M      ω   z = RSS     ‖Lθ‖₂
0.05     5   5.16894     0.05     |   0.2940  5   4.2024      0.2940
0.1      5   4.959342    0.1      |   0.2945  5   4.2006      0.2945
0.15     5   4.755559    0.15     |   0.295   5   4.1988      0.2950
0.2      5   4.557617    0.2      |   0.3     5   4.180557    0.3
0.25     5   4.365811    0.25     |   0.35    5   4.002338    0.35
0.265    5   4.3095      0.2650   |   0.4     5   3.831675    0.4
0.275    5   4.2723      0.2750   |   0.45    5   3.669118    0.45
0.285    5   4.2354      0.2850   |   0.5     5   3.515233    0.5
0.2865   5   4.2299      0.2865   |   0.55    5   3.370588    0.55
0.2875   5   4.2262      0.2875   |   0.552   5   3.3650      0.5520
0.2885   5   4.2226      0.2885   |   0.555   5   3.3567      0.5550
0.2895   5   4.2189      0.2895   |   0.558   5   3.3483      0.558
0.28965  5   4.2183      0.2897   |   0.560   5   3.3428      0.5600
0.28975  5   4.2180      0.2897   |   0.561   5   3.3401      0.5610
0.28985  5   4.2176      0.2899   |   0.562   5   3.3373      0.5620
0.28995  5   4.2172      0.2899   |   0.565   5   3.3291      0.5650

Page 47: Regression Theory

Numerical Experience and Comparison

RESULTS OF OUR APPROACH (continued)

√M      ω   z = RSS     ‖Lθ‖₂    |   √M     ω   z = RSS     ‖Lθ‖₂
0.575   5   3.3019      0.5750   |   0.96   5   2.5968      0.96
0.585   5   3.2751      0.5850   |   0.97   5   2.5880      0.97
0.595   5   3.2488      0.5950   |   0.98   5   2.5797      0.98
0.6     5   3.235746    0.6      |   0.99   5   2.5718      0.99
0.65    5   3.111253    0.65     |   1      5   2.564459    1
0.7     5   2.997622    0.7      |   2      5   2.509165    1.16009
0.75    5   2.895324    0.75     |   2.1    5   2.509165    1.16009
0.8     5   2.804764    0.8      |   2.2    5   2.509165    1.16009
0.805   5   2.7964      0.8050   |   2.3    5   2.509165    1.16007
0.810   5   2.7881      0.8100   |   2.4    5   2.509165    1.16008
0.820   5   2.7719      0.8200   |   2.5    5   2.509165    1.16001
0.830   5   2.7562      0.8300   |   2.6    5   2.509165    1.16007
0.840   5   2.7410      0.8400   |   2.7    5   2.509165    1.16007
0.85    5   2.726261    0.85     |   2.8    5   2.509165    1.16009
0.9     5   2.660023    0.9      |   2.9    5   2.509165    1.16009
0.95    5   2.60612     0.95     |   3      5   2.509165    1.16009
                                 |   4      5   2.509165    1.160084

Page 48: Regression Theory

Numerical Experience and Comparison

• We drew L-curves, plotting ‖ψ(d)θ − y‖ against ‖Lθ‖₂ for the fitted models.

[Figure: L-curve panels, ‖ψ(d)θ − y‖ (vertical axis) versus ‖Lθ‖₂ (horizontal axis, 0 to about 1.4), for Salford MARS and for our approach.]

• Conclusion: based on the L-curve criterion and for the given data, our solution is better than the Salford solution for MARS.

Page 49: Regression Theory

• All test data sets are also compared according to performance measures such as MSE, MAE, the correlation coefficient, R², PRESS, Mallows' Cp, etc.

• These measures are based on the average of nine values (one for each fold and each replication).

Numerical Experience and Comparison

C-MARS

Page 50: Regression Theory

Numerical Experience and Comparison

Please find much more numerical experience and comparison in:

Yerlikaya, Fatma, A New Contribution to Nonlinear Robust Regression and Classification with MARS and Its Application to Data Mining for Quality Control in Manufacturing, M.Sc. thesis, Institute of Applied Mathematics, METU, Ankara, 2008.

Page 51: Regression Theory

Piecewise Linear Functions - Stock Market

figures generated by Erik Kropat

Page 52: Regression Theory

Forward Stepwise Algorithm Revisited

high complexity

Page 53: Regression Theory

Forward Stepwise Algorithm Revisited

Page 54: Regression Theory

Forward Stepwise Algorithm Revisited

Page 55: Regression Theory

Forward Stepwise Algorithm Revisited

Page 56: Regression Theory

Forward Stepwise Algorithm Revisited

Page 57: Regression Theory

Regularization & Uncertainty Robust Optimization

Laurent El Ghaoui


Page 58: Regression Theory

Regularization & Uncertainty Robust Optimization

Page 59: Regression Theory

• Aster, A., Borchers, B., and Thurber, C., Parameter Estimation and Inverse Problems, Academic Press, 2004.

• Breiman, L., Friedman, J.H., Olshen, R., and Stone, C., Classification and Regression Trees, Belmont, CA: Wadsworth Int. Group, 1984.

• Craven, P., and Wahba, G., Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation, Numerische Mathematik 31 (1979) 377-403.

• Friedman, J.H., Multivariate adaptive regression splines, The Annals of Statistics 19, 1 (1991) 1-141.

• Hansen, P.C., Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, SIAM, Philadelphia, 1998.

References

• Hastie, T., Tibshirani, R., and Friedman, J.H., The Elements of Statistical Learning, Springer Verlag, NY, 2001.

• MOSEK SOFTWARE, http://www.mosek.com/ .

• Myers, R.H., and Montgomery, D.C., Response Surface Methodology: Process and Product Optimization Using Designed Experiments, New York: Wiley, 2002.

• Nemirovski, A., Lectures on Modern Convex Optimization, Israel Institute of Technology (2002), http://iew3.technion.ac.il/Labs/Opt/LN/Final.pdf.

• Nesterov, Y.E., and Nemirovskii, A.S., Interior Point Methods in Convex Programming, SIAM, 1993.

• Taylan, P., Weber, G.-W., and Beck, A., New approaches to regression by generalized additive models and continuous optimization for modern applications in finance, science and technology, Optimization, 56, 5–6, October–December (2007) 675–698.

• Taylan, P., Weber, G.-W., and Yerlikaya, F., Continuous optimization applied in MARS for modern applications in finance, science and technology, in ISI Proceedings of 20th Mini-EURO Conference "Continuous Optimization and Knowledge-Based Technologies", Neringa, Lithuania, May 20-23, 2008.