Regression Theory

Regression Theory with Additive Models and CMARS

4th International Summer School "Achievements and Applications of Contemporary Informatics, Mathematics and Physics", National University of Technology of the Ukraine, Kiev, Ukraine, August 5-16, 2009

Gerhard-Wilhelm Weber*, Inci Batmaz, Gülser Köksal, Fatma Yerlikaya, Pakize Taylan**, Elcin Kartal, Efsun Kürüm, Ayse Özmen

Institute of Applied Mathematics, Middle East Technical University, Ankara, Turkey
* Faculty of Economics, Management and Law, University of Siegen, Germany, and Center for Research on Optimization and Control, University of Aveiro, Portugal
** Department of Mathematics, Dicle University, Turkey

Description

AACIMP 2009 Summer School lecture by Gerhard-Wilhelm Weber, from the "Modern Operational Research and Its Mathematical Methods" course.

Transcript of Regression Theory

Page 1: Regression Theory

Regression Theory with Additive Models and CMARS

4th International Summer School "Achievements and Applications of Contemporary Informatics, Mathematics and Physics", National University of Technology of the Ukraine, Kiev, Ukraine, August 5-16, 2009

Gerhard-Wilhelm Weber*, Inci Batmaz, Gülser Köksal, Fatma Yerlikaya, Pakize Taylan**, Elcin Kartal, Efsun Kürüm, Ayse Özmen

Institute of Applied Mathematics, Middle East Technical University, Ankara, Turkey
* Faculty of Economics, Management and Law, University of Siegen, Germany, and Center for Research on Optimization and Control, University of Aveiro, Portugal
** Department of Mathematics, Dicle University, Turkey

Page 2: Regression Theory

Content

• Introduction, Motivation
• Regression
• Additive Models
• MARS
• PRSS for MARS
• CQP for MARS
• Tikhonov Regularization for MARS
• Numerical Experience and Comparison
• Research Extensions
• Conclusion

Page 3: Regression Theory

Introduction

Learning from data has become very important in every field of science and technology, e.g., in

• the financial sector,
• quality improvement in manufacturing,
• computational biology,
• medicine, and
• engineering.

Learning enables estimation and prediction.

Regression is mainly based on the problems and methods of

• least squares estimation,
• maximum likelihood estimation, and
• classification.

New tools for data analysis, based on nonparametric regression and smoothing:

• additive (and multiplicative) models.

Page 4: Regression Theory

Introduction

CART vs. MARS

Page 5: Regression Theory

Introduction

Additive (and multiplicative) models (studied at IAM, METU):

• spline regression in additive models,
• spline regression in generalized additive models,
• MARS: piecewise linear (per dimension) regression in multiplicative models,
• spline regression for stochastic differential equations via additive and nonlinear models.

Page 6: Regression Theory

Regression: a Motivation

One of the motivations of this research has been the approximation of financial data points (x, y), e.g., coming from

• the stock market,
• credit rating,
• economic factors,
• company properties.

For example, to estimate the probability of default of a particular credit, one of the last three kinds of data above is used.

There are different approaches for estimating the probability of default.
• Regression models (binary choice) are one of them.
• For example, we assume that the dependent variable Y, with Y = 1 ("default") or Y = 0 ("no default"), satisfies

Y = F(X) + ε,

where X is the vector of independent variable(s) (input), such as a credit rating.

Page 7: Regression Theory

Regression: a Motivation

• An estimate for the default probability of a corporate bond can be obtained via the estimation of the default probability P,

P = E[F(X) + ε] = F(X).

• This estimation can also be done via the following linear regression:

Y = α + β^T X + ε.

Here, α and β are unknown parameters. They can be estimated via linear regression methods or maximum likelihood estimation. In many important cases, these just mean least squares estimation. Then,

P = α + β^T X.

Page 8: Regression Theory

Regression

Input vector X = (X_1, X_2, ..., X_m)^T and output variable Y;

linear regression:

• E(Y | X) is linear (...), and

Y = E(Y | X_1, ..., X_m) + ε = β_0 + Σ_{j=1}^m X_j β_j + ε;

• the parameter vector β = (β_0, β_1, ..., β_m)^T minimizes

RSS(β) := Σ_{i=1}^N (y_i − x_i^T β)²,   or   RSS(β) = (y − Xβ)^T (y − Xβ),

which yields

β̂ = (X^T X)^{−1} X^T y,   Cov(β̂) = (X^T X)^{−1} σ².
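
A minimal numerical sketch of these least-squares formulas (not part of the original slides; the data and variable names are illustrative):

```python
import numpy as np

# Toy data: N observations, m inputs (illustrative only).
rng = np.random.default_rng(0)
N, m = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, m))])  # intercept column prepended
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=N)

# Least squares estimate: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Covariance estimate: Cov(beta_hat) = (X^T X)^{-1} sigma^2
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (N - (m + 1))
cov_beta = np.linalg.inv(X.T @ X) * sigma2_hat
print(beta_hat, np.sqrt(np.diag(cov_beta)))
```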

Page 9: Regression Theory

Regression, Additive Models

• Classical understanding: additive separation of variables.

• New interpretation (in the input space): separation of clusters and corresponding enumeration.

Page 10: Regression Theory

Regression, Additive Models

• Additive model (A):

E(Y | x_{i1}, x_{i2}, ..., x_{im}) = β_0 + Σ_{j=1}^m f_j(x_{ij}).

• The functions f_j are estimated by a smoothing on a single coordinate.

• Standard convention at x_j: E(f_j(x_j)) = 0.

• Backfitting algorithm (Gauss–Seidel algorithm).
• This procedure depends on the partial residual against x_{ij}:

r_{ij} := y_i − β̂_0 − Σ_{k≠j} f̂_k(x_{ik}).

Page 11: Regression Theory

Regression, Additive Models

• Estimate each smooth function by holding all the other ones fixed.

Initialization:  β̂_0 := ave(y_i | i = 1, ..., N),   f̂_j(x_{ij}) ≡ 0 for all i, j.

Cycle:  j = 1, ..., m, 1, ..., m, 1, ..., m, ...

f̂_j is updated by smoothing the partial residuals

r_{ij} := y_i − β̂_0 − Σ_{k≠j} f̂_k(x_{ik})   (i = 1, ..., N)

against x_{ij}, until the functions almost do not change.

• Convergence (condition).
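
A compact sketch of the backfitting cycle, using a simple moving-average smoother as a stand-in for the coordinate-wise smoothers (the smoother choice and stopping tolerance are illustrative, not prescribed by the slides):

```python
import numpy as np

def smooth(x, r, window=7):
    """Very simple smoother: moving average of the partial residuals r,
    ordered by the coordinate x (a stand-in for a spline smoother)."""
    order = np.argsort(x)
    r_sorted = r[order]
    kernel = np.ones(window) / window
    r_smooth = np.convolve(r_sorted, kernel, mode="same")
    out = np.empty_like(r)
    out[order] = r_smooth
    return out

def backfit(X, y, n_cycles=20, tol=1e-6):
    N, m = X.shape
    beta0 = y.mean()                  # beta0_hat := ave(y_i | i = 1, ..., N)
    F = np.zeros((N, m))              # column j holds f_j(x_ij), initialized to 0
    for _ in range(n_cycles):
        F_old = F.copy()
        for j in range(m):
            # partial residual r_ij = y_i - beta0 - sum_{k != j} f_k(x_ik)
            r = y - beta0 - F.sum(axis=1) + F[:, j]
            F[:, j] = smooth(X[:, j], r)
            F[:, j] -= F[:, j].mean()  # enforce the convention E(f_j) = 0
        if np.max(np.abs(F - F_old)) < tol:
            break
    return beta0, F
```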

Page 12: Regression Theory

Regression, Additive Models

• Convergence of the backfitting: writing f = (f_1^T, ..., f_m^T)^T, each update is a map

T̂_j : IR^{Nm} → IR^{Nm},   f ↦ (f_1, ..., f_{j−1}, S_j(y − Σ_{k≠j} f_k), f_{j+1}, ..., f_m).

• Full cycle: T̂ := T̂_m T̂_{m−1} ··· T̂_1; then T̂^l corresponds to l full cycles.

• Backfitting always converges if all smoothers are symmetric and all eigenvalues of T̂ are either +1 or in the interior of the unit ball: |λ| < 1.

Page 13: Regression Theory

Regression, Generalized Additive Models

• To extend the additive model to a wide range of distribution families: generalized additive models (GAM):

G(µ(X)) = ψ(X) = β_0 + Σ_{j=1}^m f_j(X_j),

• the f_j are unspecified, G: link function;

• f_j: elements of a finite-dimensional space consisting, e.g., of splines;

• spline orders (or degrees): suitably chosen, depending on the density and variation properties of the corresponding data in the x and y components, respectively;

• the problem of specifying θ := (β_0, f_1, ..., f_m)^T becomes a finite-dimensional parameter estimation problem.

Page 14: Regression Theory

Regression, Generalized Additive Models, Splines

• Let x_0, ..., x_N be N + 1 distinct knots of [a, b], with a = x_0 < x_1 < ... < x_N = b.

• The function g_k(x) on the interval [a, b] is a spline of degree k relative to the knots x_j if

(1) f_{k,j} := f_k|[x_j, x_{j+1}] ∈ IP_k (polynomial of degree ≤ k; j = 0, ..., N − 1),

(2) f_k ∈ C^{k−1}[a, b].

• The space of splines of degree k on [a, b] relative to the N + 1 distinct knots is called ℘_k; then dim ℘_k = N + k.

• In practice, a spline is represented by a different polynomial on each subinterval, and for this reason there could be a discontinuity in its k-th derivative at the internal knots x_1, ..., x_{N−1}.

Page 15: Regression Theory

Regression, Generalized Additive Models, Splines

• To characterize a spline of degree k: with f_{k,j} := f_k|[x_j, x_{j+1}], it can be represented by

f_{k,j}(x) = Σ_{i=0}^k g_{ij} (x − x_j)^i,   if x ∈ [x_j, x_{j+1}],

i.e., (k + 1)N coefficients g_{ij} to be determined.

• To hold:

f_{k,j−1}^{(l)}(x_j) = f_{k,j}^{(l)}(x_j)   (j = 1, ..., N − 1; l = 0, ..., k − 1),

there are k(N − 1) conditions, and the remaining degrees of freedom are

(k + 1)N − k(N − 1) = N + k.
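
The count of N + k free coefficients matches the dimension of the truncated power basis, one standard basis choice for this spline space; a small sketch (the grid and knot values are illustrative):

```python
import numpy as np

def truncated_power_basis(x, knots, k):
    """Basis 1, x, ..., x^k, (x - x_1)_+^k, ..., (x - x_{N-1})_+^k for splines of
    degree k on [a, b] with knots a = x_0 < ... < x_N = b.
    Number of columns: (k + 1) + (N - 1) = N + k, the dimension of the spline space."""
    internal = np.asarray(knots)[1:-1]                      # internal knots x_1, ..., x_{N-1}
    poly = np.vander(x, k + 1, increasing=True)             # 1, x, ..., x^k
    hinge = np.maximum(x[:, None] - internal[None, :], 0.0) ** k
    return np.hstack([poly, hinge])

knots = np.linspace(0.0, 1.0, 6)          # N = 5 subintervals
x = np.linspace(0.0, 1.0, 200)
B = truncated_power_basis(x, knots, k=3)  # 200 x (5 + 3) design matrix
```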

Page 16: Regression Theory

Clustering for Generalized Additive Models

• Financial markets have different kinds of trading activities. These activities work with

• short-, mid- or long-term horizons,
• from days and weeks to months and years.

• These data can sometimes be problematic for use in the models, e.g., given a longer horizon with sometimes less frequently recorded data, but at other times highly frequent measurements.

• The structure of the data may have particular properties:

i. larger variability,
ii. outliers,
iii. some data do not have any meaning.

Page 17: Regression Theory

Clustering for Generalized Additive Models

Page 18: Regression Theory

Clustering for Generalized Additive Models

• data variation;

• for the sake of simplicity: N_j ≡ N for each interval I_j.

Page 19: Regression Theory

Clustering for Generalized Additive Models

• Density: given intervals I_1, ..., I_m, the density of the input data in the j-th interval is

D_j := (number of points x_{ij} in I_j) / (length of I_j).

• Variation: if over the interval I_j the data are (x_{1j}, y_{1j}), ..., (x_{N_j j}, y_{N_j j}), then

V_j := Σ_{i=1}^{N−1} |y_{i+1,j} − y_{i,j}|.

• If this value is big, then at many data points the curvature of any approximating curve could be big:

– occurrence of outliers,
– instability of the model.
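
A short sketch of computing the density D_j and variation V_j of one interval (the data layout and function names are ours, for illustration):

```python
import numpy as np

def density_and_variation(x_j, y_j, interval):
    """D_j = (#points in I_j) / length(I_j);  V_j = sum_i |y_{i+1,j} - y_{i,j}|."""
    a, b = interval
    inside = (x_j >= a) & (x_j <= b)
    D_j = inside.sum() / (b - a)
    order = np.argsort(x_j[inside])            # order the points in I_j by x
    V_j = np.abs(np.diff(y_j[inside][order])).sum()
    return D_j, V_j

# Ind_j := D_j * V_j would then be the index of data variation of interval I_j.
```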

Page 20: Regression Theory

Clustering for Generalized Additive Models

• Intervals I_1, ..., I_p (or cubes Q_1, ..., Q_m) according to how the data are grouped.
• For each interval I_j (cube Q_j), the associated index of data variation is

Ind_j := D_j V_j   or   Ind_j := d_j(D_j) v_j(V_j).

• In fact, from both the viewpoints of data fitting and complexity (or stability):

o cases with a high variation distributed over a very long interval are much less problematic than cases with a high variation over a short interval;
o oscillation,
o curvature,
o up to nonsmoothness,
o penalty!

Page 21: Regression Theory

Regression, Additive Models

• The additive model can be fit to data. Given observations (y_i, x_i) (i = 1, 2, ..., N),

• penalized residual sum of squares (PRSS):

PRSS(β_0, f_1, ..., f_m) := Σ_{i=1}^N ( y_i − β_0 − Σ_{j=1}^m f_j(x_{ij}) )² + Σ_{j=1}^m µ_j ∫_a^b ( f_j''(t_j) )² dt_j,

with smoothing parameters µ_j ≥ 0 (tradeoff between fit and smoothness).

• Large values of µ_j yield smoother curves, smaller ones result in more fluctuation.

• New estimation methods for the additive model with CQP:

0jµ ≥

Page 22: Regression Theory

0, ,

2

20

1 1

2''

min ,

subject to ( ) , 0,

( ) ( 1,2,..., ).

t β f

N m

i j iji= j

j j j j

t

y β f x t t

f t dt M j m

=

− − ≤ ≥

≤ =

∑ ∑

jdj jθ=∑

Regression, Additive Models

• The functions are splines:

• Then, we get

jf1

( ) ( ).j jj l l

l

f x h xθ=

=∑

0, ,

2 20 2

2

0 2

min ,

subject to ( , ) , 0,

( , ) ( 1,..., ).

t β f

j j

t

W t t

V M j m

β θ

β θ

≤ ≥

≤ =

Page 23: Regression Theory

Regression, Additive Models

http://144.122.137.55/gweber/

Page 24: Regression Theory

MARS: Multivariate Adaptive Regression Splines

• To estimate general functions of high-dimensional arguments.

• An adaptive procedure.

• A nonparametric regression procedure.

• No specific assumption about the underlying functional relationship between the dependent and independent variables.

• Ability to estimate the contributions of the basis functions so that both the additive and the interactive effects of the predictors are allowed to determine the response variable.

• Uses expansions in piecewise linear basis functions of the form

c⁺(x, τ) = [+(x − τ)]₊,   c⁻(x, τ) = [−(x − τ)]₊,   where [q]₊ := max{0, q}.

{ }[ ] : max 0,q q+ =

Page 25: Regression Theory

MARS

[Figure: data points (x, y) and the reflected pair of piecewise linear basis functions c⁺(x, τ) = [+(x − τ)]₊ and c⁻(x, τ) = [−(x − τ)]₊ with knot τ.]

Basic elements in the regression with MARS:

• Let us consider Y = f(X) + ε, with X = (X_1, X_2, ..., X_p)^T.

• The goal is to construct reflected pairs for each input X_j (j = 1, 2, ..., p).

Page 28: Regression Theory

MARS

• Set of basis functions:

℘ := { (X_j − τ)₊, (τ − X_j)₊ | τ ∈ {x_{1,j}, x_{2,j}, ..., x_{N,j}}, j ∈ {1, 2, ..., p} }.

• Thus, f(X) can be represented by

Y = θ_0 + Σ_{m=1}^M θ_m ψ_m(X) + ε.

• The ψ_m (m = 1, 2, ..., M) are basis functions from ℘ or products of two or more such functions; interaction basis functions are created by multiplying an existing basis function with a truncated linear function involving a new variable.

• Provided the observations are represented by the data (x_i, y_i) (i = 1, 2, ..., N):

ψ_m(x) := Π_{j=1}^{K_m} [ s_{κ_j^m} · ( x_{κ_j^m} − τ_{κ_j^m} ) ]₊.
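
A sketch of how such product basis functions and the resulting model can be evaluated; encoding each ψ_m as a list of (variable index, sign, knot) triples is our illustrative convention, not part of the slides:

```python
import numpy as np

def mars_basis(x, terms):
    """Evaluate psi_m(x) = prod_j [ s_j * (x[kappa_j] - tau_j) ]_+ .

    x     : (p,) input vector
    terms : list of (kappa_j, s_j, tau_j) triples, with s_j in {+1, -1}
    """
    value = 1.0
    for kappa, s, tau in terms:
        value *= max(0.0, s * (x[kappa] - tau))
    return value

def mars_predict(x, theta0, basis_list, theta):
    """Y_hat = theta_0 + sum_m theta_m * psi_m(x)."""
    return theta0 + sum(t * mars_basis(x, terms)
                        for t, terms in zip(theta, basis_list))
```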

Page 29: Regression Theory

MARS

• Two subalgorithms:

(i) Forward stepwise algorithm:
• Search for the basis functions.
• Minimization of some "lack of fit" criterion.
• The process stops when a user-specified value M_max is reached.
• Overfitting: so a backward deletion procedure is applied, decreasing the complexity of the model without degrading the fit to the data.

(ii) Backward stepwise algorithm:

Page 30: Regression Theory

MARS

• Remove from the model basis functions that contribute the smallest increase in the residual squared error at each stage, producing an optimally estimated model f̂_α with respect to each number of terms, called α.

• α is related to some complexity of the estimation.
• To estimate the optimal value of α, an alternative is generalized cross-validation:

GCV(α) := (1/N) Σ_{i=1}^N ( y_i − f̂_α(x_i) )² / ( 1 − M(α)/N )²,

where M(α) := u + dK, with

N : number of samples,
u : number of independent basis functions,
K : number of knots selected by the forward stepwise algorithm,
d : cost of optimal basis selection.
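
A direct transcription of the GCV formula, assuming the fitted values and the quantities u, K, d are already available (how d is chosen is not specified on the slide, so it is left as an input):

```python
import numpy as np

def gcv(y, y_hat, u, K, d):
    """GCV(alpha) = (1/N) * sum_i (y_i - f_hat_alpha(x_i))^2 / (1 - M(alpha)/N)^2,
    with M(alpha) = u + d*K."""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    N = y.size
    M_alpha = u + d * K
    return np.mean((y - y_hat) ** 2) / (1.0 - M_alpha / N) ** 2
```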

Page 31: Regression Theory

PRSS for MARS

PRSS := Σ_{i=1}^N ( y_i − f(x_i) )² + Σ_{m=1}^{M_max} λ_m Σ_{|α|=1,2; α=(α_1,α_2)^T} Σ_{r<s; r,s ∈ V(m)} ∫ θ_m² [ D^α_{r,s} ψ_m(t^m) ]² dt^m,

where

V(m) := { κ_j^m | j = 1, 2, ..., K_m },
t^m := ( t_{m_1}, t_{m_2}, ..., t_{m_{K_m}} )^T,
|α| := α_1 + α_2   (α_1, α_2 ∈ {0, 1}),
D^α_{r,s} ψ_m(t^m) := ∂^{|α|} ψ_m / ( ∂^{α_1} t_r^m ∂^{α_2} t_s^m ) (t^m).

• Tradeoff between both accuracy and complexity.
• Penalty parameters λ_m.

Page 32: Regression Theory

Knot Selection

Page 33: Regression Theory

Grid Selection

Page 34: Regression Theory

Grid Selection

Page 35: Regression Theory

CQP and Tikhonov Regularization for MARS

• Collect the basis-function values and parameters in vectors:

ψ(d_i) := ( 1, ψ_1(x_i^1), ..., ψ_M(x_i^M), ψ_{M+1}(x_i^{M+1}), ..., ψ_{M_max}(x_i^{M_max}) )^T,
θ := ( θ_0, θ_1, ..., θ_{M_max} )^T,
d_i := ( x_i^1, x_i^2, ..., x_i^M, x_i^{M+1}, ..., x_i^{M_max} )^T,
ψ(d) := ( ψ(d_1), ψ(d_2), ..., ψ(d_N) )^T.

• The multiple integrals in the penalty terms are approximated by Riemann sums over a grid built from the input values in the coordinates contributing to ψ_m: the grid points are denoted x̂_i^m, with indices σ_{κ_j^m} ∈ {0, 1, 2, ..., N + 1} (j = 1, 2, ..., K_m), and the corresponding grid-cell volumes are

Δx̂_i^m := Π_{j=1}^{K_m} ( x^{κ_j^m}_{σ_{κ_j^m}+1} − x^{κ_j^m}_{σ_{κ_j^m}} ).

• With these, the discretized penalty coefficients are

L_{im} := [ Σ_{|α|=1,2; α=(α_1,α_2)^T} Σ_{r<s; r,s ∈ V(m)} ( D^α_{r,s} ψ_m(x̂_i^m) )² Δx̂_i^m ]^{1/2}.

L is an (M_max + 1) × (M_max + 1) matrix.

Page 36: Regression Theory

CQP and Tikhonov Regularization for MARS

• For a short representation, we can rewrite the approximate relation as

PRSS ≈ ‖ψ(d)θ − y‖₂² + Σ_{m=1}^{M_max} λ_m Σ_{i=1}^{(N+1)^{K_m}} L_{im}² θ_m².

• In case of the same penalty parameter λ (:= λ_m = φ²) for all m, then:

PRSS = ‖ψ(d)θ − y‖₂² + λ ‖Lθ‖₂²

— Tikhonov regularization.
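
For a fixed λ, this Tikhonov-regularized least-squares problem has the usual regularized normal-equations solution θ_λ = (Ψ^T Ψ + λ L^T L)^{−1} Ψ^T y; a minimal sketch (the variable names are ours):

```python
import numpy as np

def tikhonov(Psi, y, L, lam):
    """Minimize ||Psi @ theta - y||_2^2 + lam * ||L @ theta||_2^2."""
    A = Psi.T @ Psi + lam * (L.T @ L)
    return np.linalg.solve(A, Psi.T @ y)
```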

Page 37: Regression Theory

CQP for MARS

• Conic quadratic programming:

min_{t, θ}  t,

subject to   ‖ψ(d)θ − y‖₂ ≤ t,   ‖Lθ‖₂ ≤ √M.

• In general:

min_x  c^T x,   subject to   ‖D_i x − d_i‖₂ ≤ p_i^T x − q_i   (i = 1, 2, ..., k).
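
This conic quadratic program can be handed to any SOCP solver (the slides use MOSEK); a sketch using the cvxpy modelling layer, assuming Psi, y, L and the bound sqrt_M are given:

```python
import cvxpy as cp

def cmars_cqp(Psi, y, L, sqrt_M):
    """min t  s.t.  ||Psi theta - y||_2 <= t,  ||L theta||_2 <= sqrt_M."""
    n = Psi.shape[1]
    theta = cp.Variable(n)
    t = cp.Variable()
    constraints = [cp.norm(Psi @ theta - y, 2) <= t,
                   cp.norm(L @ theta, 2) <= sqrt_M]
    problem = cp.Problem(cp.Minimize(t), constraints)
    problem.solve()   # e.g. problem.solve(solver=cp.MOSEK) if MOSEK is installed
    return theta.value, t.value
```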

Page 39: Regression Theory

CQP for MARS

• Moreover, (t, θ, χ, η, ω_1, ω_2) is a primal-dual optimal solution if and only if

χ := [ ψ(d)  0_N ; 0^T_{M_max+1}  1 ] (θ^T, t)^T + (−y^T, 0)^T,

η := [ L  0_{M_max+1} ; 0^T_{M_max+1}  0 ] (θ^T, t)^T + (0^T_{M_max+1}, √M)^T,

[ ψ(d)^T  0_{M_max+1} ; 0^T_N  1 ] ω_1 + [ L^T  0_{M_max+1} ; 0^T_{M_max+1}  0 ] ω_2 = (0^T_{M_max+1}, 1)^T,

ω_1^T χ = 0,   ω_2^T η = 0,

ω_1, χ ∈ L^{N+1},   ω_2, η ∈ L^{M_max+2},

where L^{N+1} and L^{M_max+2} denote the (N+1)- and (M_max+2)-dimensional ice-cream (second-order, Lorentz) cones.

Page 40: Regression Theory

• CQPs belong to the class of well-structured convex problems.

• Interior Point Methods.

• Better complexity bounds.

CQP for MARS

• Better practical performance.

C-MARS

Page 41: Regression Theory

• We had the following data:

X1 1,5554 1,5326 -0,1823 0,1627 0,5687 0,1706 0,2041 -0,1823 -0,82 -0,7234 0,4446 -0,3291 -1,5583 1,2706 1,7555

X2 0,1849 1,1538 0,7586 -1,5363 1,906 0,3761 1,3323 -0,0064 -1,7275 1,141 0,3761 0,5673 -0,1976 0,7586 0,1849

X3 1,264 1,2023 -1,0995 0,8529 1,3051 -0,3802 -0,7913 0,1336 0,2363 -1,0995 -0,0719 -0,894 -1,0995 0,9557 1,5722

X4 1,2843 1,0175 -0,9676 0,7408 1,0635 -0,506 -0,7937 -0,0564 0,0455 -0,9676 -0,2482 -0,8557 -0,9676 0,8707 1,7339

X5 -0,7109 0,1777 0,1422 0,0355 3,2699 0,3554 -0,1777 1,5283 -0,0711 0,3554 0,8886 0,4621 -0,9241 -0,9241 -0,0711

Y 0,67 0,9047 -0,197 -1,0108 0,1616 0,2984 -0,6039 0,8823 -1,6832 0,9531 -0,3208 0,0507 -0,3916 0,44 0,263

Numerical Experience and Comparison


X1 0,0474 -0,8713 -0,2158 0,2179 1,5426 -1,16 0,9857 0,6752 0,5402 -1,4528 1,9349 -0,8299 -0,681 0,7304 -1,1305

X2 0,9498 -0,1976 -1,7275 -0,9626 1,3323 -0,9626 0,1849 -1,345 1,3323 -0,0064 0,1849 0,3761 -1,345 -0,7713 -0,0064

X3 0,0308 -0,6885 1,0584 0,5446 0,5446 -0,483 0,4419 1,264 0,0308 -1,3051 2,086 -0,5857 -0,2775 1,5722 -1,3051

X4 0,1543 -0,7278 1,0046 0,3752 0,3752 -0,5839 0,2613 1,2843 -0,1543 -1,0635 2,5631 -0,6578 -0,4241 1,7339 -1,0635

X5 1,1018 0,6753 -0,391 -0,2843 1,4217 0,4621 -0,8175 0,7819 0,2488 1,5283 -0,1777 -1,7771 0,4621 -1,0307 0,3554

Y 1,1477 -0,3916 -0,4624 -1,0993 2,8639 -1,0285 0,1923 -0,7631 2,05 1,0238 0,9177 -1,2055 -0,3208 -0,5862 -0,6216

Page 42: Regression Theory

Numerical Experience and Comparison

• We constructed model functions for these data using the Salford MARS software, where we selected the maximum number of basis functions M_max = 5. Then,

Model 1 (ω = 1):
BF1 = max{0, X2 + 1.728};
Y = -1.081 + 0.626 * BF1.

Model 2 (ω = 2):
BF1 = max{0, X2 + 1.728};
BF2 = max{0, X5 - 0.462} * BF1;
Y = -1.073 + 0.499 * BF1 + 0.656 * BF2.

Model 3 (ω = 3, best model):
BF1 = max{0, X2 + 1.728};
BF2 = max{0, X5 - 0.462} * BF1;
BF4 = max{0, X3 + 0.586} * BF1;
Y = -1.176 + 0.422 * BF1 + 0.597 * BF2 + 0.236 * BF4.
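
For instance, the reconstructed Model 3 can be evaluated directly; a tiny sketch (function name is ours):

```python
def model3_predict(X2, X3, X5):
    """Salford MARS Model 3 (omega = 3) as reconstructed above."""
    bf1 = max(0.0, X2 + 1.728)
    bf2 = max(0.0, X5 - 0.462) * bf1
    bf4 = max(0.0, X3 + 0.586) * bf1
    return -1.176 + 0.422 * bf1 + 0.597 * bf2 + 0.236 * bf4
```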

Page 43: Regression Theory

Numerical Experience and Comparison

• and, finally,

Model 4 (ω = 4):
BF1 = max{0, X2 + 1.728};
BF2 = max{0, X5 - 0.462} * BF1;
BF3 = max{0, 0.462 - X5} * BF1;
BF4 = max{0, X3 + 0.586} * BF1;
Y = -1.242 + 0.555 * BF1 + 0.484 * BF2 - 0.093 * BF3 + 0.2246 * BF4.

Model 5 (ω = 5):
BF1 = max{0, X2 + 1.728};
BF2 = max{0, X5 - 0.462} * BF1;
BF3 = max{0, 0.462 - X5} * BF1;
BF4 = max{0, X3 + 0.586} * BF1;
BF5 = max{0, -0.586 - X3} * BF1;
Y = -1.248 + 0.487 * BF1 + 0.486 * BF2 - 0.118 * BF3 + 0.282 * BF4 + 0.263 * BF5.

Page 44: Regression Theory

Numerical Experience and Comparison

• Then, we considered a large model with five basis functions; we found (writing a MATLAB code):

L = diag(0, 1.8419, 0.7514, 0.9373, 2.1996, 0.3905).

• We constructed models using different values of √M in the optimization problem, which was solved by MOSEK (CQP).

• Our algorithm always constructs a model with 5 parameters; in the case of Salford MARS, there are 1, 2, 3, 4 or 5 parameters.

Page 45: Regression Theory

Numerical Experience and Comparison

RESULTS OF SALFORD MARS

ω   z = RSS    t = ‖ψ(d)θ − y‖₂   ‖Lθ‖₂    GCV
1   17.6425    4.2003             1.1531   0.771
2   11.1870    3.3447             1.0430   0.613
3    7.7824    2.7897             1.0368   0.550
4    6.6126    2.5715             1.1967   0.626
5    6.2961    2.5092             1.1600   0.840

Page 46: Regression Theory

Numerical Experience and Comparison

RESULTS OF OUR APPROACH

√M       ω   z = RSS     ‖Lθ‖₂    |   √M      ω   z = RSS     ‖Lθ‖₂
0.05     5   5.16894     0.05     |   0.2940  5   4.2024      0.2940
0.1      5   4.959342    0.1      |   0.2945  5   4.2006      0.2945
0.15     5   4.755559    0.15     |   0.295   5   4.1988      0.2950
0.2      5   4.557617    0.2      |   0.3     5   4.180557    0.3
0.25     5   4.365811    0.25     |   0.35    5   4.002338    0.35
0.265    5   4.3095      0.2650   |   0.4     5   3.831675    0.4
0.275    5   4.2723      0.2750   |   0.45    5   3.669118    0.45
0.285    5   4.2354      0.2850   |   0.5     5   3.515233    0.5
0.2865   5   4.2299      0.2865   |   0.55    5   3.370588    0.55
0.2875   5   4.2262      0.2875   |   0.552   5   3.3650      0.5520
0.2885   5   4.2226      0.2885   |   0.555   5   3.3567      0.5550
0.2895   5   4.2189      0.2895   |   0.558   5   3.3483      0.558
0.28965  5   4.2183      0.2897   |   0.560   5   3.3428      0.5600
0.28975  5   4.2180      0.2897   |   0.561   5   3.3401      0.5610
0.28985  5   4.2176      0.2899   |   0.562   5   3.3373      0.5620
0.28995  5   4.2172      0.2899   |   0.565   5   3.3291      0.5650

Page 47: Regression Theory

Numerical Experience and Comparison

RESULTS OF OUR APPROACH (continued)

√M      ω   z = RSS     ‖Lθ‖₂    |   √M     ω   z = RSS     ‖Lθ‖₂
0.575   5   3.3019      0.5750   |   0.96   5   2.5968      0.96
0.585   5   3.2751      0.5850   |   0.97   5   2.5880      0.97
0.595   5   3.2488      0.5950   |   0.98   5   2.5797      0.98
0.6     5   3.235746    0.6      |   0.99   5   2.5718      0.99
0.65    5   3.111253    0.65     |   1      5   2.564459    1
0.7     5   2.997622    0.7      |   2      5   2.509165    1.16009
0.75    5   2.895324    0.75     |   2.1    5   2.509165    1.16009
0.8     5   2.804764    0.8      |   2.2    5   2.509165    1.16009
0.805   5   2.7964      0.8050   |   2.3    5   2.509165    1.16007
0.810   5   2.7881      0.8100   |   2.4    5   2.509165    1.16008
0.820   5   2.7719      0.8200   |   2.5    5   2.509165    1.16001
0.830   5   2.7562      0.8300   |   2.6    5   2.509165    1.16007
0.840   5   2.7410      0.8400   |   2.7    5   2.509165    1.16007
0.85    5   2.726261    0.85     |   2.8    5   2.509165    1.16009
0.9     5   2.660023    0.9      |   2.9    5   2.509165    1.16009
0.95    5   2.60612     0.95     |   3      5   2.509165    1.16009
                                 |   4      5   2.509165    1.160084

Page 48: Regression Theory

Numerical Experience and Comparison

• We drew L-curves, plotting ‖ψ(d)θ − y‖ against ‖Lθ‖₂ for the fitted models.

[Figure: L-curve panels, ‖ψ(d)θ − y‖ (vertical axis) versus ‖Lθ‖₂ (horizontal axis, 0 to about 1.4), for Salford MARS and for our approach.]

• Conclusion: based on the L-curve criterion and for the given data, our solution is better than the Salford solution for MARS.

Page 49: Regression Theory

• All test data sets are also compared according to performance measures such as MSE, MAE, the correlation coefficient, R², PRESS, Mallows' Cp, etc.

• These measures are based on the average of nine values (one for each fold and each replication).

Numerical Experience and Comparison

C-MARS

Page 50: Regression Theory

Numerical Experience and Comparison

Please find much more numerical experience and comparison in:

Yerlikaya, Fatma, A New Contribution to Nonlinear Robust Regression and Classification with MARS and Its Application to Data Mining for Quality Control in Manufacturing, M.Sc. thesis, Institute of Applied Mathematics, METU, Ankara, 2008.

Page 51: Regression Theory

Piecewise Linear Functions - Stock Market

figures generated by Erik Kropat

Page 52: Regression Theory

Forward Stepwise Algorithm Revisited

high complexity

Page 53: Regression Theory

Forward Stepwise Algorithm Revisited

Page 54: Regression Theory

Forward Stepwise Algorithm Revisited

Page 55: Regression Theory

Forward Stepwise Algorithm Revisited

Page 56: Regression Theory

Forward Stepwise Algorithm Revisited

Page 57: Regression Theory

Regularization & Uncertainty Robust Optimization

Laurent El Ghaoui


Page 58: Regression Theory

Regularization & Uncertainty Robust Optimization

Page 59: Regression Theory

• Aster, A., Borchers, B., and Thurber, C., Parameter Estimation and Inverse Problems, Academic Press, 2004.

• Breiman, L., Friedman, J.H., Olshen, R., and Stone, C., Classification and Regression Trees, Belmont, CA: Wadsworth Int. Group, 1984.

• Craven, P., and Wahba, G., Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation, Numerische Mathematik 31 (1979) 377-403.

• Friedman, J.H., Multivariate adaptive regression splines, The Annals of Statistics 19, 1 (1991) 1-141.

• Hansen, P.C., Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, SIAM, Philadelphia, 1998.

References

• Hastie, T., Tibshirani, R., and Friedman, J.H., The Elements of Statistical Learning, Springer Verlag, NY, 2001.

• MOSEK SOFTWARE, http://www.mosek.com/ .

• Myers, R.H., and Montgomery, D.C., Response Surface Methodology: Process and Product Optimization Using Designed Experiments, New York: Wiley, 2002.

• Nemirovski, A., Lectures on Modern Convex Optimization, Israel Institute of Technology (2002), http://iew3.technion.ac.il/Labs/Opt/LN/Final.pdf.

• Nesterov, Y.E., and Nemirovskii, A.S., Interior Point Methods in Convex Programming, SIAM, 1993.

• Taylan, P., Weber, G.-W., and Beck, A., New approaches to regression by generalized additive models and continuous optimization for modern applications in finance, science and technology, Optimization, 56, 5–6, October–December (2007) 675–698.

• Taylan, P., Weber, G.-W., and Yerlikaya, F., Continuous optimization applied in MARS for modern applications in finance, science and technology, in ISI Proceedings of 20th Mini-EURO Conference "Continuous Optimization and Knowledge-Based Technologies", Neringa, Lithuania, May 20-23, 2008.