Introduction to Smoothing Splines Tongtong Wu Feb 29, 2004.
Introduction to Smoothing Splines
Tongtong Wu
Feb 29, 2004

Outline
- Introduction
  - Linear and polynomial regression, and interpolation
  - Roughness penalties
- Interpolating and smoothing splines
  - Cubic splines
  - Interpolating splines
  - Smoothing splines
  - Natural cubic splines
  - Choosing the smoothing parameter
- Available software

Key Words
- roughness penalty
- penalized sum of squares
- natural cubic splines

Motivation
[Figures: four plots of y18 against Index; the last one is also labeled Spline(y18).]

Introduction
- Linear and polynomial regression:
  - Global influence
  - Increases in the polynomial degree happen in discrete steps and cannot be controlled continuously
- Interpolation:
  - Unsatisfactory as an explanation of the given data

Roughness penalty approach
A method for relaxing the model assumptions in classical linear regression along lines a little different from polynomial regression.

Roughness penalty approach
Aims of curve fitting:
- A good fit to the data
- A curve estimate that does not display too much rapid fluctuation
Basic idea: make a necessary compromise between these two rather different aims in curve estimation.

Roughness penalty approach
Quantifying the roughness of a curve
- An intuitive way (g: a twice-differentiable curve):
  $$\int_a^b \{g''(t)\}^2 \, dt$$
- Motivation from a formalization of a mechanical device: if a thin piece of flexible wood, called a spline, is bent to the shape of the graph of g, then the leading term in the strain energy is proportional to $\int g''^2$.

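
As a small numerical illustration of the roughness measure above (a sketch, not part of the original slides), the following R snippet evaluates the integral of $\{g''(t)\}^2$ on $[0, \pi]$ for three curves whose second derivatives are known in closed form. The helper name `roughness` and the three example curves are assumptions chosen for illustration.

```r
# Roughness of a curve, given its second derivative g2:
# the integral of g2(t)^2 over [a, b].
roughness <- function(g2, a, b) {
  integrate(function(t) g2(t)^2, lower = a, upper = b)$value
}

# Second derivatives of three example curves on [0, pi]
line_g2   <- function(t) 0 * t            # g(t) = 1 + 2t, a straight line
slow_g2   <- function(t) -sin(t)          # g(t) = sin(t)
wiggly_g2 <- function(t) -9 * sin(3 * t)  # g(t) = sin(3t)

r_line   <- roughness(line_g2, 0, pi)
r_slow   <- roughness(slow_g2, 0, pi)
r_wiggly <- roughness(wiggly_g2, 0, pi)
print(c(line = r_line, slow = r_slow, wiggly = r_wiggly))
```

The straight line has zero roughness, and tripling the frequency of the sine curve multiplies its roughness by 81, because the second derivative grows with the square of the frequency.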
Roughness penalty approach
Penalized sum of squares
$$S(g) = \sum_{i=1}^n \{Y_i - g(t_i)\}^2 + \alpha \int_a^b \{g''(t)\}^2 \, dt$$
- g: any twice-differentiable function on [a,b]
- α: smoothing parameter ('rate of exchange' between residual error and local variation)
Penalized least squares estimator
$$\hat g = \arg\min_g S(g)$$

Roughness penalty approach
[Figure: curve for a large value of α; y18 against Index.]
[Figure: curve for a small value of α; y18 against Index.]

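
The trade-off between fit and smoothness can be sketched with R's `smooth.spline`, using the y18 data that appears in the software examples of these slides. This is an illustration under assumptions: the two `spar` values are arbitrary, and `spar` is `smooth.spline`'s own reparameterization of the penalty weight rather than the α of the slides.

```r
# Example data from the slides' Example 1
y18 <- c(1:3, 5, 4, 7:3, 2 * (2:5), rep(10, 4))
x <- seq_along(y18)

# Heavy smoothing (large spar) versus light smoothing (small spar).
fit_smooth <- smooth.spline(x, y18, spar = 1.0)
fit_rough  <- smooth.spline(x, y18, spar = 0.2)

# Residual sum of squares of a fit at the observed points
rss <- function(fit) sum((y18 - predict(fit, x)$y)^2)

print(c(rss_smooth = rss(fit_smooth), df_smooth = fit_smooth$df,
        rss_rough  = rss(fit_rough),  df_rough  = fit_rough$df))
```

The lightly smoothed fit follows the data more closely (smaller residual sum of squares, more equivalent degrees of freedom), while the heavily smoothed fit gives up some fidelity for a flatter curve.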
Interpolating and Smoothing Splines
- Cubic splines
- Interpolating splines
- Smoothing splines
- Choosing the smoothing parameter

Cubic Splines
Given $a < t_1 < t_2 < \dots < t_n < b$, a function g is a cubic spline if
1. On each interval $(a, t_1), (t_1, t_2), \dots, (t_n, b)$, g is a cubic polynomial.
2. The polynomial pieces fit together at the points $t_i$ (called knots) such that g itself and its first and second derivatives are continuous at each $t_i$, and hence on the whole of [a,b].

Cubic Splines
How to specify a cubic spline:
$$g(t) = d_i (t - t_i)^3 + c_i (t - t_i)^2 + b_i (t - t_i) + a_i \quad \text{for } t_i \le t \le t_{i+1},$$
for $i = 0, \dots, n$, with $t_0 = a$ and $t_{n+1} = b$.
A cubic spline is a natural cubic spline (NCS) if its second and third derivatives are zero at a and b, which implies $d_0 = c_0 = d_n = c_n = 0$, so that g is linear on the two extreme intervals $[a, t_1]$ and $[t_n, b]$.

Natural Cubic Splines
Value-second derivative representation
We can specify a NCS by giving its value and second derivative at each knot $t_i$. Define
$$\mathbf{g} = (g_1, \dots, g_n)', \quad \text{where } g_i = g(t_i),$$
$$\boldsymbol{\gamma} = (\gamma_2, \dots, \gamma_{n-1})', \quad \text{where } \gamma_i = g''(t_i),$$
which specify the curve g completely. However, not all possible vectors $\mathbf{g}$ and $\boldsymbol{\gamma}$ represent a natural spline!

Natural Cubic Splines
Value-second derivative representation
Theorem 2.1
The vectors $\mathbf{g}$ and $\boldsymbol{\gamma}$ specify a natural spline g if and only if
$$Q'\mathbf{g} = R\boldsymbol{\gamma}.$$
Then the roughness penalty will satisfy
$$\int_a^b \{g''(t)\}^2 \, dt = \boldsymbol{\gamma}' R \boldsymbol{\gamma} = \mathbf{g}' K \mathbf{g}.$$

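Natural Cubic Splines
Value-second derivative representation
With $h_i = t_{i+1} - t_i$ for $i = 1, \dots, n-1$:
$$Q = \begin{bmatrix}
h_1^{-1} & 0 & \cdots & 0 \\
-h_1^{-1} - h_2^{-1} & h_2^{-1} & \cdots & 0 \\
h_2^{-1} & -h_2^{-1} - h_3^{-1} & \cdots & 0 \\
0 & h_3^{-1} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & h_{n-2}^{-1} \\
0 & 0 & \cdots & -h_{n-2}^{-1} - h_{n-1}^{-1} \\
0 & 0 & \cdots & h_{n-1}^{-1}
\end{bmatrix}_{n \times (n-2)}$$
$$R = \begin{bmatrix}
\frac{1}{3}(h_1 + h_2) & \frac{1}{6} h_2 & 0 & \cdots & 0 \\
\frac{1}{6} h_2 & \frac{1}{3}(h_2 + h_3) & \frac{1}{6} h_3 & \cdots & 0 \\
\vdots & \vdots & \ddots & & \vdots \\
0 & 0 & \cdots & \frac{1}{6} h_{n-2} & \frac{1}{3}(h_{n-2} + h_{n-1})
\end{bmatrix}_{(n-2) \times (n-2)}$$
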
Natural Cubic Splines
Value-second derivative representation
- R is strictly diagonally dominant, i.e.
  $$|r_{ii}| > \sum_{j \ne i} |r_{ij}| \quad \forall i.$$
- R is positive definite, so we can define
  $$K = Q R^{-1} Q'.$$

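
The band matrices can be assembled directly from the knots. The R sketch below is an illustration, not code from the slides: it builds Q, R and K with dense matrices and checks Theorem 2.1 numerically against the natural spline interpolant returned by base R's `splinefun`. The function name `ncs_matrices`, the knots and the data values are assumptions made for the example.

```r
# Build Q (n x (n-2)), R ((n-2) x (n-2)) and K = Q R^{-1} Q' from the knots.
ncs_matrices <- function(knots) {
  n <- length(knots)
  stopifnot(n >= 3, all(diff(knots) > 0))
  h <- diff(knots)                    # h[i] = t[i+1] - t[i], i = 1, ..., n-1
  Q <- matrix(0, n, n - 2)
  R <- matrix(0, n - 2, n - 2)
  for (k in seq_len(n - 2)) {         # column k belongs to interior knot j = k + 1
    j <- k + 1
    Q[j - 1, k] <- 1 / h[j - 1]
    Q[j,     k] <- -1 / h[j - 1] - 1 / h[j]
    Q[j + 1, k] <- 1 / h[j]
    R[k, k] <- (h[j - 1] + h[j]) / 3
    if (k < n - 2) {
      R[k, k + 1] <- h[j] / 6
      R[k + 1, k] <- h[j] / 6
    }
  }
  list(Q = Q, R = R, K = Q %*% solve(R, t(Q)))
}

knots <- c(0, 0.5, 1.2, 2, 3.1, 4)    # illustrative, unequally spaced knots
z     <- c(1, 3, 2, 5, 4, 6)          # illustrative values at the knots
m     <- ncs_matrices(knots)

# Natural cubic spline interpolant from base R; its second derivatives at the
# interior knots play the role of gamma.
f     <- splinefun(knots, z, method = "natural")
gamma <- f(knots, deriv = 2)[2:(length(knots) - 1)]

# Theorem 2.1: Q'g = R gamma, so this residual should be numerically zero.
resid_21 <- max(abs(crossprod(m$Q, z) - m$R %*% gamma))
# Roughness penalty computed two ways: gamma' R gamma and g' K g.
pen_gamma <- drop(crossprod(gamma, m$R %*% gamma))
pen_K     <- drop(crossprod(z, m$K %*% z))
print(c(resid_21 = resid_21, pen_gamma = pen_gamma, pen_K = pen_K))
```

Dense matrices keep the sketch short; for large n one would store only the non-zero diagonals of Q and R.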
Interpolating Splines
To find a smooth curve that interpolates the points $(t_i, z_i)$, i.e. $g(t_i) = z_i$ for all i.
Theorem 2.2
Suppose $n \ge 2$ and $t_1 < \dots < t_n$. Given any values $z_1, \dots, z_n$, there is a unique natural cubic spline g with knots $t_i$ satisfying
$$g(t_i) = z_i \quad \text{for } i = 1, \dots, n.$$

Interpolating Splines
The natural cubic spline interpolant is the unique minimizer of $\int g''^2$ over all functions in $S_2[a,b]$ that interpolate the data.
Theorem 2.3
Suppose g is the natural cubic spline interpolant. Then for any $\tilde g \in S_2[a,b]$ with $\tilde g(t_i) = z_i$ for $i = 1, \dots, n$,
$$\int \tilde g''^2 \ge \int g''^2.$$

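
Theorem 2.3 can be illustrated numerically. The R sketch below compares two cubic spline interpolants of the same points, which differ only in their end conditions, and computes the roughness of each on $[t_1, t_n]$. It assumes the y18 data from the software examples as the values $z_i$ and unit-spaced knots; the helper `roughness` is a hypothetical name.

```r
# Data from the slides' Example 1, with knots at 1, ..., 18
z     <- c(1:3, 5, 4, 7:3, 2 * (2:5), rep(10, 4))
knots <- seq_along(z)

# Two cubic spline interpolants of the same points.
f_nat <- splinefun(knots, z, method = "natural")  # natural cubic spline
f_fmm <- splinefun(knots, z, method = "fmm")      # Forsythe-Malcolm-Moler end conditions

# Integral of g''(t)^2 over [t_1, t_n], summed interval by interval because
# g'' is only piecewise smooth.
roughness <- function(f, knots) {
  lo <- head(knots, -1)
  hi <- tail(knots, -1)
  sum(mapply(function(a, b) {
    integrate(function(u) f(u, deriv = 2)^2, lower = a, upper = b)$value
  }, lo, hi))
}

r_nat <- roughness(f_nat, knots)
r_fmm <- roughness(f_fmm, knots)
print(c(natural = r_nat, fmm = r_fmm))
```

Both curves pass through every data point, but the natural spline should report the smaller roughness, as the theorem predicts.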
Smoothing Splines
Penalized sum of squares
$$S(g) = \sum_{i=1}^n \{Y_i - g(t_i)\}^2 + \alpha \int_a^b \{g''(t)\}^2 \, dt$$
- g: any twice-differentiable function on [a,b]
- α: smoothing parameter ('rate of exchange' between residual error and local variation)
Penalized least squares estimator
$$\hat g = \arg\min_g S(g)$$

Smoothing Splines
1. The curve estimator $\hat g$ is necessarily a natural cubic spline with knots at $t_i$, for $i = 1, \dots, n$.
Proof: for any curve $\tilde g$ in $S_2[a,b]$, suppose g is the NCS interpolating the values $\tilde g(t_i)$. Then
$$\sum_{i=1}^n \{Y_i - g(t_i)\}^2 = \sum_{i=1}^n \{Y_i - \tilde g(t_i)\}^2$$
and, by Theorem 2.3,
$$\int_a^b \{g''(t)\}^2 \, dt \le \int_a^b \{\tilde g''(t)\}^2 \, dt$$
$$\Rightarrow S(g) \le S(\tilde g).$$

Smoothing Splines
2. Existence and uniqueness
Let $Y = (Y_1, \dots, Y_n)'$. Then
$$\sum_{i=1}^n \{Y_i - g(t_i)\}^2 = (Y - \mathbf{g})'(Y - \mathbf{g}),$$
since $\mathbf{g}$ is precisely the vector of values $g(t_i)$. Expressing $\int g''^2 = \mathbf{g}' K \mathbf{g}$,
$$S(g) = (Y - \mathbf{g})'(Y - \mathbf{g}) + \alpha \mathbf{g}' K \mathbf{g} = \mathbf{g}'(I + \alpha K)\mathbf{g} - 2Y'\mathbf{g} + Y'Y.$$
The minimum is achieved by setting $\mathbf{g} = (I + \alpha K)^{-1} Y$.

Smoothing Splines
2. Theorem 2.4
Let $\hat g$ be the natural cubic spline with knots at $t_i$ for which $\mathbf{g} = (I + \alpha K)^{-1} Y$. Then for any g in $S_2[a,b]$,
$$S(\hat g) \le S(g).$$

Smoothing Splines
3. The Reinsch algorithm
$$Y = (I + \alpha K)\mathbf{g} = (I + \alpha Q R^{-1} Q')\mathbf{g}$$
$$\Rightarrow \mathbf{g} = Y - \alpha Q R^{-1} Q'\mathbf{g} = Y - \alpha Q \boldsymbol{\gamma} \quad (Q'\mathbf{g} = R\boldsymbol{\gamma})$$
$$\Rightarrow Q'Y = (R + \alpha Q'Q)\boldsymbol{\gamma}$$
The matrix $(R + \alpha Q'Q)$ has bandwidth 5 and is symmetric and strictly positive-definite; therefore it has a Cholesky decomposition
$$R + \alpha Q'Q = LDL'.$$

Smoothing Splines
3. The Reinsch algorithm for spline smoothing
Step 1: Evaluate the vector $Q'Y$.
Step 2: Find the non-zero diagonals of $R + \alpha Q'Q$, and hence the Cholesky decomposition factors L and D.
Step 3: Solve $LDL'\boldsymbol{\gamma} = Q'Y$ for $\boldsymbol{\gamma}$ by forward and back substitution.
Step 4: Find $\mathbf{g}$ by $\mathbf{g} = Y - \alpha Q \boldsymbol{\gamma}$.

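
The four steps above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not a banded implementation: it uses dense matrices and base R's `chol()` (a factorization U'U rather than the LDL' form on the slide), the function name `reinsch_fit` is hypothetical, and the value of `alpha` is arbitrary. The result is cross-checked against the direct formula $(I + \alpha K)^{-1} Y$.

```r
# Reinsch-style solve for the smoothing spline values g at the knots.
# Dense matrices and chol() are used for brevity; a production version would
# exploit the band structure of R + alpha Q'Q.
reinsch_fit <- function(knots, Y, alpha) {
  n <- length(knots)
  stopifnot(n >= 3, length(Y) == n, all(diff(knots) > 0), alpha > 0)
  h <- diff(knots)
  Q <- matrix(0, n, n - 2)
  R <- matrix(0, n - 2, n - 2)
  for (k in seq_len(n - 2)) {
    j <- k + 1
    Q[(j - 1):(j + 1), k] <- c(1 / h[j - 1], -1 / h[j - 1] - 1 / h[j], 1 / h[j])
    R[k, k] <- (h[j - 1] + h[j]) / 3
    if (k < n - 2) R[k, k + 1] <- R[k + 1, k] <- h[j] / 6
  }
  QtY   <- crossprod(Q, Y)                        # Step 1: Q'Y
  U     <- chol(R + alpha * crossprod(Q))         # Step 2: R + alpha Q'Q = U'U
  gamma <- backsolve(U, forwardsolve(t(U), QtY))  # Step 3: forward, then back substitution
  g     <- drop(Y - alpha * Q %*% gamma)          # Step 4: g = Y - alpha Q gamma
  list(g = g, gamma = drop(gamma), Q = Q, R = R)
}

Y     <- c(1:3, 5, 4, 7:3, 2 * (2:5), rep(10, 4))  # y18 from Example 1
knots <- seq_along(Y)
alpha <- 2                                         # illustrative value only
fit   <- reinsch_fit(knots, Y, alpha)

# Cross-check against the direct formula g = (I + alpha K)^{-1} Y
K        <- fit$Q %*% solve(fit$R, t(fit$Q))
g_direct <- drop(solve(diag(length(Y)) + alpha * K, Y))
print(max(abs(fit$g - g_direct)))
```

Solving the (n-2)-dimensional banded system for γ and then recovering g avoids forming or inverting the full matrix $I + \alpha K$, which is the point of the algorithm.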
Smoothing Splines
4. Some concluding remarks
- The minimizing curve $\hat g$ essentially does not depend on a and b, as long as all the data points lie between a and b.
- If n=2, for any α, setting $\hat g$ to be the straight line through the two points (t1,Y1) and (t2,Y2) will reduce S(g) to zero.
- If n=1, the minimizer is no longer unique, since any straight line through (t1,Y1) will yield a zero value of S(g).

Choosing the Smoothing Parameter
Two different philosophical approaches:
- Subjective choice
- Automatic method (chosen by the data)
  - Cross-validation
  - Generalized cross-validation

Choosing the Smoothing Parameter
Cross-validation: choose α to minimize
$$CV(\alpha) = n^{-1} \sum_{i=1}^n \{Y_i - \hat g^{(-i)}(t_i; \alpha)\}^2 = n^{-1} \sum_{i=1}^n \left( \frac{Y_i - \hat g(t_i)}{1 - A_{ii}(\alpha)} \right)^2$$
if $\hat g$ is the spline smoother with α. Here $\hat g^{(-i)}$ is the fit computed without the i-th observation, and $A(\alpha) = (I + \alpha K)^{-1}$ is the hat matrix.
Generalized cross-validation: choose α to minimize
$$GCV(\alpha) = \frac{n^{-1} \sum_{i=1}^n \{Y_i - \hat g(t_i)\}^2}{\{1 - n^{-1} \operatorname{tr} A(\alpha)\}^2} = n \times \frac{\text{residual sum of squares}}{(\text{equivalent df})^2},$$
with equivalent df $= n - \operatorname{tr} A(\alpha)$.

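
Both criteria can be recomputed from a fitted `smooth.spline` object. The R sketch below is an illustration using the y18 data from the software examples, and assumes distinct x values and unit weights: it reads the diagonal entries $A_{ii}$ from the `lev` component and compares the hand-computed scores with the `cv.crit` value that `smooth.spline` reports. Note that `smooth.spline` scales its penalty parameter differently from the α of the slides.

```r
y18 <- c(1:3, 5, 4, 7:3, 2 * (2:5), rep(10, 4))   # data from Example 1
x   <- seq_along(y18)
n   <- length(y18)

fit_cv  <- smooth.spline(x, y18, cv = TRUE)    # parameter chosen by leave-one-out CV
fit_gcv <- smooth.spline(x, y18, cv = FALSE)   # parameter chosen by GCV (the default)

# Recompute both scores from the fitted values and the leverages A_ii (lev).
cv_score  <- mean(((y18 - fit_cv$y) / (1 - fit_cv$lev))^2)
gcv_score <- mean((y18 - fit_gcv$y)^2) / (1 - sum(fit_gcv$lev) / n)^2

print(c(cv = cv_score, cv_reported = fit_cv$cv.crit,
        gcv = gcv_score, gcv_reported = fit_gcv$cv.crit))
```

The leave-one-out score needs no refitting: the shortcut formula only uses the residuals of the full fit and the diagonal of the hat matrix.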
Available Software
smooth.spline in R
Description: Fits a cubic smoothing spline to the supplied data.
Usage:
    ## speed and dist are columns of the built-in cars data set
    plot(cars$speed, cars$dist)
    cars.spl <- smooth.spline(cars$speed, cars$dist)
    cars.spl2 <- smooth.spline(cars$speed, cars$dist, df = 10)
    lines(cars.spl, col = "blue")
    lines(cars.spl2, lty = 2, col = "red")

Available Software
Example 1
    ## smooth.spline now lives in the stats package (formerly modreg),
    ## so no library() call is needed
    y18 <- c(1:3, 5, 4, 7:3, 2*(2:5), rep(10, 4))
    xx <- seq(1, length(y18), length.out = 201)
    (s2 <- smooth.spline(y18))               # GCV
    (s02 <- smooth.spline(y18, spar = 0.2))
    plot(y18, main = deparse(s2$call), col.main = 2)
    lines(s2, col = "blue")
    lines(s02, col = "orange")
    lines(predict(s2, xx), col = 2)
    lines(predict(s02, xx), col = 3)
    mtext(deparse(s02$call), col = 3)
[Figure: output of Example 1.]

Available Software
Example 2
    data(cars)  ## N=50, n (# of distinct x) = 19
    attach(cars)
    plot(speed, dist, main = "data(cars) & smoothing splines")
    cars.spl <- smooth.spline(speed, dist)
    cars.spl2 <- smooth.spline(speed, dist, df = 10)
    lines(cars.spl, col = "blue")
    lines(cars.spl2, lty = 2, col = "red")
    lines(smooth.spline(cars, spar = 0.1))
    ## spar: smoothing parameter (alpha) in (0,1]
    legend(5, 120,
           c(paste("default [C.V.] => df =", round(cars.spl$df, 1)),
             "s( * , df = 10)"),
           col = c("blue", "red"), lty = 1:2, bg = 'bisque')
    detach()
[Figure: output of Example 2.]

Extensions of the Roughness Penalty Approach
$$Y = g(t) + \varepsilon$$
- Semiparametric modeling: a simple application to multiple regression
  $$Y = g(t) + x'\beta + \varepsilon$$
- Generalized linear models (GLM)
- To allow all the explanatory variables to be nonlinear: additive model approach
  $$Y = \sum_{j=1}^d g_j(t_j) + \varepsilon$$

Reference
P.J. Green and B.W. Silverman (1994). Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. London: Chapman & Hall.