Estimating Term Structure with Penalized Splines
David Ruppert
Operations Research and Industrial Engineering
Cornell University
Joint work with Robert Jarrow and Yan Yu
January 27, 2004
Outline
• Bond prices, forward rates, yields
• Empirical forward rate – noisy
• Modelling the forward rate
• Penalized least-squares
• Inadequacy of cross-validation
• Residual analysis – checking the noise assumptions
• Corporate term structure and credit spreads
• Asymptotics
Discount Function, Forward Rates, and Yields
• D(0, t) = D(t) is the discount function, the value at time 0 (now)
of a zero-coupon bond that pays $1 at time t.
$$\frac{\text{Price}(t)}{\text{PAR}} = D(t)$$
• f(t) is the current forward rate defined by
$$D(t) = \exp\left\{-\int_0^t f(s)\,ds\right\} \quad \text{for all } t$$
• The yield is the average forward rate, i.e.,
$$y(t) = \frac{1}{t}\int_0^t f(s)\,ds = -\frac{1}{t}\log D(t)$$
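The definitions above can be checked numerically. The following is a minimal sketch, assuming a hypothetical linear forward curve and a trapezoidal rule for the integral; the function name `discount_and_yield` and the curve itself are illustrative and do not come from the slides.

```python
import numpy as np

def discount_and_yield(forward, t, n_grid=2001):
    """Return (D(t), y(t)) for a forward-rate function, using the trapezoidal rule."""
    s = np.linspace(0.0, t, n_grid)
    fs = forward(s)
    # Trapezoidal approximation of the integral of f(s) from 0 to t
    integral = float(np.sum((fs[1:] + fs[:-1]) * np.diff(s)) / 2.0)
    discount = np.exp(-integral)   # D(t) = exp{-integral}
    yld = integral / t             # y(t) = integral / t
    return discount, yld

# Hypothetical forward curve: 4% at t = 0, rising by 0.1% per year
forward = lambda s: 0.04 + 0.001 * s

D10, y10 = discount_and_yield(forward, 10.0)
print(D10, y10)
```

For this curve the integral over 10 years is 0.45, so the yield is 0.045 and the identity y(t) = −log D(t)/t can be verified directly from the two returned values.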
Discount Function, Forward Rates, and Yields
[Figure: STRIPS on Dec 31, 1995: price = empirical discount function. Price versus maturity.]
Prices, Forward Rates, and Yields
[Figure: STRIPS on Dec 31, 1995: log prices. log(price) versus maturity.]
Discount Function, Forward Rates, and Yields
[Figure: STRIPS on Dec 31, 1995: empirical yields. −log(price)/t versus maturity.]
Empirical Forward Rate
$$D(t) = \exp\left\{-\int_0^t f(s)\,ds\right\} \quad \text{for all } t$$
$$f(t) = -\frac{d}{dt}\log D(t)$$
$$\text{empirical forward} = -\frac{\log P(t_{i+1}) - \log P(t_i)}{t_{i+1} - t_i}$$
P(t) = observed price at time t
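The finite-difference formula above is short enough to sketch directly. The example below assumes synthetic zero-coupon prices generated from a flat 5% forward rate (not the STRIPS data), and adds a small hypothetical multiplicative price noise to illustrate why the empirical forward rate is noisy.

```python
import numpy as np

def empirical_forward(t, price):
    """Negative slope of log price between consecutive maturities."""
    t = np.asarray(t, dtype=float)
    log_price = np.log(np.asarray(price, dtype=float))
    return -np.diff(log_price) / np.diff(t)

# Synthetic zeros: quarterly maturities out to 30 years, flat 5% forward rate
t = np.arange(0.25, 30.25, 0.25)
price = np.exp(-0.05 * t)
fwd = empirical_forward(t, price)          # recovers 0.05 on every interval

# Hypothetical noise: about 0.1% relative error in each price
rng = np.random.default_rng(0)
noisy_price = price * np.exp(1e-3 * rng.normal(size=t.size))
fwd_noisy = empirical_forward(t, noisy_price)

print(fwd.min(), fwd.max())
print(fwd_noisy.std())
```

Differencing divides the price noise by the spacing between maturities, so a small error in log price becomes a much larger error in the forward rate.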
Empirical Forward Rate
[Figure: STRIPS on Dec 31, 1995: empirical forward rate. Empirical forward versus maturity.]
Modelling Coupon Bonds
• $P_1, \ldots, P_n$ denote observed market prices of n bonds (coupon or zero-coupon)
• Bond i has fixed payments $C_i(t_{i,j})$ due on dates $t_{i,j}$, $j = 1, \ldots, N_i$ ($N_i = 1$ for zero-coupon bonds)
• Model price for the ith coupon bond:
$$P_i(\delta) = \sum_{j=1}^{N_i} C_i(t_{i,j}) \exp\left\{-\int_0^{t_{i,j}} f(s, \delta)\,ds\right\}$$
$f(\cdot, \delta)$ is a model for the forward rate
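As an illustration of the pricing formula, here is a minimal sketch that discounts each payment with the integrated forward rate. The bond (annual coupon of 5 on par 100, two years) and the flat 5% forward curve are hypothetical, and the integral is approximated with the trapezoidal rule rather than a spline.

```python
import numpy as np

def integrated_forward(forward, t, n_grid=2001):
    """Trapezoidal approximation of the integral of the forward rate over [0, t]."""
    s = np.linspace(0.0, t, n_grid)
    fs = forward(s)
    return float(np.sum((fs[1:] + fs[:-1]) * np.diff(s)) / 2.0)

def model_price(payments, dates, forward):
    """Sum of payments, each discounted by exp{-integrated forward rate}."""
    return sum(c * np.exp(-integrated_forward(forward, t))
               for c, t in zip(payments, dates))

# Hypothetical flat forward curve at 5%
flat = lambda s: np.full_like(s, 0.05)

coupon_bond = model_price([5.0, 105.0], [1.0, 2.0], flat)
zero = model_price([1.0], [3.0], flat)     # N_i = 1: a zero-coupon bond
print(coupon_bond, zero)
```

With a single payment of 1 the formula reduces to the discount function D(t), which is the zero-coupon case used later in the slides.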
Spline Model of Forward Rate
• $f(s, \delta) = \delta^T B(s)$
– B(s) is a vector of spline basis functions
– δ is a vector of spline coefficients
• ∴ $F(t, \delta) := \int_0^t f(s, \delta)\,ds = t\,y(t, \delta) = \delta^T B^I(t)$
– $B^I(t) := \int_0^t B(s)\,ds$.
Example: Quadratic Splines
$$B(s) = \left(1, s, s^2, (s - \kappa_1)_+^2, \ldots, (s - \kappa_K)_+^2\right)^T$$
[Figure: plus function with knot at 0.6]
Lidar Data — Carefully Chosen Knots
[Figure: lidar data, log ratio versus range]
There is a better way to get a smooth fit than selecting knots
Modelling the Forward Rate
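From before:
$$B(t) = \left(1, t, \ldots, t^p, (t - \kappa_1)_+^p, \ldots, (t - \kappa_K)_+^p\right)^T$$
Therefore:
$$B^I(t) := \int_0^t B(s)\,ds = \left(t, \ldots, \frac{t^{p+1}}{p+1}, \frac{(t - \kappa_1)_+^{p+1}}{p+1}, \ldots, \frac{(t - \kappa_K)_+^{p+1}}{p+1}\right)^T.$$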
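Both basis vectors can be built in a few lines. The sketch below is one possible implementation, assuming degree p ≥ 1 and NumPy arrays; the function names `power_basis` and `integrated_power_basis` and the example knots are illustrative, not taken from the slides.

```python
import numpy as np

def power_basis(t, knots, p):
    """Rows are B(t_i)^T = (1, t, ..., t^p, (t-k_1)_+^p, ..., (t-k_K)_+^p); assumes p >= 1."""
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]
    knots = np.asarray(knots, dtype=float)[None, :]
    poly = t ** np.arange(p + 1)
    plus = np.clip(t - knots, 0.0, None) ** p
    return np.hstack([poly, plus])

def integrated_power_basis(t, knots, p):
    """Rows are B^I(t_i)^T, the termwise integral of B over [0, t_i]; assumes p >= 1."""
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]
    knots = np.asarray(knots, dtype=float)[None, :]
    powers = np.arange(1, p + 2)
    poly = t ** powers / powers
    plus = np.clip(t - knots, 0.0, None) ** (p + 1) / (p + 1)
    return np.hstack([poly, plus])

# Hypothetical quadratic spline (p = 2) with knots at 2 and 5
knots = [2.0, 5.0]
t = np.array([1.0, 3.0, 6.0])
B = power_basis(t, knots, p=2)
BI = integrated_power_basis(t, knots, p=2)
print(B.shape, BI.shape)
print(BI[1])   # row for t = 3
```

A quick sanity check is that a central finite difference of `integrated_power_basis` in t reproduces `power_basis`, since each column of the former is the antiderivative of the matching column of the latter.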
Penalized Least-Squares
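$$Q_{n,\lambda}(\delta) = \frac{1}{n}\sum_{i=1}^n \left[h(P_i) - h\{P_i(\delta)\}\right]^2 + \lambda\,\delta^T G \delta$$
or equivalently
$$Q_{n,\lambda}(\delta) = \frac{1}{n}\sum_{i=1}^n \left(h(P_i) - h\left[\sum_{j=1}^{N_i} C_i(t_{i,j}) \exp\left\{-\delta^T B^I(t_{i,j})\right\}\right]\right)^2 + \lambda\,\delta^T G \delta.$$
• h is a monotonic transformation: “transform-both-sides” model
• $\lambda\,\delta^T G \delta$ is a “roughness” penalty
– λ ≥ 0
– G is positive semi-definite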
Penalized Least-Squares
From previous slide:
$$Q_{n,\lambda}(\delta) = \frac{1}{n}\sum_{i=1}^n \left(h(P_i) - h\left[\sum_{j=1}^{N_i} C_i(t_{i,j}) \exp\left\{-\delta^T B^I(t_{i,j})\right\}\right]\right)^2 + \lambda\,\delta^T G \delta.$$
Several sensible choices for G
1. G is a diagonal matrix
• last K diagonal elements equal to one
• all others zero.
• penalizes jumps at the knots in the pth derivative of the spline.
2. quadratic penalty on the dth derivative, $\int \{f^{(d)}(s)\}^2\,ds$
• uses $G_{jk} = \int B_j^{(d)}(t)\,B_k^{(d)}(t)\,dt$
– $B_j(t)$ is the jth element of B(t)
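The first choice of G is simple to construct. This is a minimal sketch, assuming the coefficient vector is ordered as in B(t) (p + 1 polynomial coefficients followed by K knot coefficients); the helper name `diagonal_penalty` and the example coefficients are hypothetical.

```python
import numpy as np

def diagonal_penalty(p, K):
    """Diagonal G: zeros for the p + 1 polynomial terms, ones for the K knot terms."""
    return np.diag(np.concatenate([np.zeros(p + 1), np.ones(K)]))

# Hypothetical linear spline (p = 1) with K = 3 knots
G = diagonal_penalty(p=1, K=3)
delta = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

# delta^T G delta is the sum of squares of the knot coefficients only
penalty = float(delta @ G @ delta)
print(penalty)
```

Because only the knot coefficients are penalized, a pure degree-p polynomial has zero penalty, and increasing λ shrinks the fit toward such a polynomial.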
Linear Spline with 24 Knots Fit by Penalized Least Squares
[Figure: lidar data, log ratio versus range]
• Number of knots has little effect on the fit provided it is at least 15
• Choice of λ is crucial
Using Zero Coupon Bonds
• Now assume we are using zeros, e.g., STRIPS
• Pi has a single payment of $1 at time ti
• Therefore,
$$Q_{n,\lambda}(\delta) = \frac{1}{n}\sum_{i=1}^n \left(h(P_i) - h\left[\exp\left\{-\delta^T B^I(t_i)\right\}\right]\right)^2 + \lambda\,\delta^T G \delta$$
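When h = log, the zero-coupon objective is quadratic in δ, because log[exp{−δ^T B^I(t_i)}] = −δ^T B^I(t_i). Setting the gradient to zero then gives (X^T X + nλG)δ = −X^T log P, where row i of X is B^I(t_i)^T. The sketch below solves that linear system under several assumptions that are not from the slides: noise-free synthetic prices, a linear spline (p = 1), 10 knots, the diagonal G, and λ = 0.01.

```python
import numpy as np

def integrated_basis(t, knots, p):
    """Rows are B^I(t_i)^T for the truncated power basis of degree p >= 1."""
    t = np.asarray(t, dtype=float)[:, None]
    powers = np.arange(1, p + 2)
    poly = t ** powers / powers
    plus = np.clip(t - np.asarray(knots, dtype=float), 0.0, None) ** (p + 1) / (p + 1)
    return np.hstack([poly, plus])

def fit_log_pls(t, price, knots, p, lam):
    """Penalized least squares for zeros with h = log (closed-form solution)."""
    X = integrated_basis(t, knots, p)
    y = np.log(np.asarray(price, dtype=float))
    n, K = len(y), len(knots)
    G = np.diag(np.concatenate([np.zeros(p + 1), np.ones(K)]))
    # Normal equations of (1/n)||y + X delta||^2 + lam * delta^T G delta
    return np.linalg.solve(X.T @ X + n * lam * G, -X.T @ y)

# Synthetic zeros from a hypothetical forward curve f(t) = 0.05 + 0.0005 t
t = np.linspace(0.25, 30.0, 120)
price = np.exp(-(0.05 * t + 0.00025 * t ** 2))
knots = np.quantile(t, np.arange(1, 11) / 11)

delta = fit_log_pls(t, price, knots, p=1, lam=1e-2)
print(delta[:2])   # intercept and slope of the fitted forward rate
```

The true curve here is linear and so carries no penalty, which is why the fit should recover it up to rounding. For other choices of h, or for coupon bonds, the objective is no longer quadratic and a nonlinear least-squares solver would be needed.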
Choosing the Knots
• $\kappa_k$ is the $k/(K+1)$th sample quantile of $\{t_i\}_{i=1}^n$
• the $t_i$ are nearly equally spaced so the knots are also
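The quantile rule for the knots is a one-liner with `numpy.quantile`. The sketch below assumes NumPy's default linear-interpolation quantile definition and a hypothetical equally spaced set of maturities.

```python
import numpy as np

def quantile_knots(t, K):
    """Place knot k at the k/(K+1) sample quantile of the maturities, k = 1..K."""
    probs = np.arange(1, K + 1) / (K + 1)
    return np.quantile(np.asarray(t, dtype=float), probs)

# Hypothetical maturities: every quarter-year from 0 to 30
t = np.linspace(0.0, 30.0, 121)
knots = quantile_knots(t, K=5)
print(knots)
```

With equally spaced maturities the knots come out equally spaced as well, matching the remark on the slide.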
Effective Number of Parameters of a Fit
There exists a matrix S(λ) such that
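$$\begin{pmatrix} \hat{P}_1 \\ \vdots \\ \hat{P}_n \end{pmatrix} \approx S(\lambda) \begin{pmatrix} P_1 \\ \vdots \\ P_n \end{pmatrix}$$
• S(λ) is called the smoother matrix or hat matrix
• DF(λ) := trace{S(λ)} is called the degrees of freedom of the fit or the effective number of parameters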
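For a fit that is linear in the responses, the smoother matrix has the ridge-regression form S(λ) = X(X^T X + nλG)^{-1}X^T. The sketch below computes DF(λ) under that assumption, which holds exactly for zeros on the log scale and only approximately for prices; the design matrix, knots, and λ values are illustrative.

```python
import numpy as np

def smoother_matrix(X, G, lam):
    """S(lam) = X (X^T X + n lam G)^{-1} X^T for a penalized linear fit."""
    n = X.shape[0]
    return X @ np.linalg.solve(X.T @ X + n * lam * G, X.T)

def degrees_of_freedom(X, G, lam):
    """DF(lam) = trace of the smoother matrix."""
    return float(np.trace(smoother_matrix(X, G, lam)))

# Hypothetical design: integrated linear-spline basis (p = 1) with 10 knots
t = np.linspace(0.25, 30.0, 120)[:, None]
knots = np.quantile(t, np.arange(1, 11) / 11)
X = np.hstack([t, t ** 2 / 2, np.clip(t - knots, 0.0, None) ** 2 / 2])
G = np.diag(np.concatenate([np.zeros(2), np.ones(10)]))

df_small = degrees_of_freedom(X, G, 1e-6)   # light penalty
df_large = degrees_of_freedom(X, G, 1e4)    # heavy penalty
print(df_small, df_large)
```

DF(λ) decreases as λ grows, from the total number of basis functions (12 here) toward the number of unpenalized polynomial terms (2 here).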
Generalized Cross-Validation
$$\mathrm{GCV}(\lambda) = \frac{n^{-1}\sum_{i=1}^n \left[h(P_i) - h\{P_i(\hat\delta)\}\right]^2}{\left\{1 - n^{-1}\,\theta\,\mathrm{DF}(\lambda)\right\}^2},$$
• one chooses λ to minimize GCV(λ)
• θ is a user-specified tuning parameter
• θ = 1 is ordinary GCV
• Fisher, Nychka, and Zervos used θ = 2
– this causes more smoothing
– Question: why doesn’t ordinary GCV work well here?
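The GCV criterion itself is a short function of the residuals and DF(λ). This is a minimal sketch with made-up residuals and degrees of freedom, meant only to show how θ enters; in practice the residuals and DF would come from the fit at each candidate λ.

```python
import numpy as np

def gcv(residuals, df, theta=1.0):
    """Mean squared residual divided by (1 - theta * DF / n)^2."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    return float(np.mean(r ** 2) / (1.0 - theta * df / n) ** 2)

# Hypothetical residuals on the transformed scale and a hypothetical DF
residuals = [1.0, -1.0, 1.0, -1.0]
ordinary = gcv(residuals, df=1.0, theta=1.0)
inflated = gcv(residuals, df=1.0, theta=2.0)
print(ordinary, inflated)
```

A larger θ charges more for each degree of freedom, so the minimizing λ moves toward smoother fits.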
EBBS
• To estimate MSE add together:
– estimated squared bias
– estimated variance
• Gives $\widehat{\mathrm{MSE}}(\hat f; t, \lambda)$, the estimated MSE of $\hat f$ at t and λ.
– then $\sum_{i=1}^n \widehat{\mathrm{MSE}}(\hat f; t_i, \lambda)$ is minimized over λ
• EBBS estimates bias at any fixed t by
– computing the fit at t for a range of values of the smoothing
parameter
– fitting a curve to model bias
EBBS – Estimating Bias
• to the first order, the bias is γ(t)λ for some γ(t)
• Let $\hat f(t, \lambda)$ be $\hat f$ depending on maturity and λ
• Compute $\{\lambda_\ell, \hat f(t, \lambda_\ell)\}$, $\ell = 1, \ldots, L$
– $\lambda_1 < \ldots < \lambda_L$ is the grid of values of λ
– we used L = 50 values of λ
– $\log_{10}(\lambda_\ell)$ were equally spaced between −7 and 1
– DF(10) = 4.8 and DF($10^{-7}$) = 28.9 for a 40-knot cubic spline fit
EBBS – Estimating Bias
• For any fixed t, fit a straight line to the data $\{(\lambda_\ell, \hat f(t, \lambda_\ell)) : \ell = 1, \ldots, L\}$
• slope of the line is $\hat\gamma(t)$
• estimate of squared bias at t and $\lambda_\ell$ is $(\hat\gamma(t)\,\lambda_\ell)^2$
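The bias step can be sketched as an ordinary least-squares line fit in λ at each maturity. The example below assumes the fits are already available as an L × m array (row ℓ holding the fitted forward rates at m maturities for λ_ℓ) and uses synthetic fits that are exactly linear in λ; the function name and the slopes are hypothetical.

```python
import numpy as np

def ebbs_squared_bias(lams, fits):
    """Fit a line in lambda at each maturity; return slopes and (slope * lambda)^2."""
    lams = np.asarray(lams, dtype=float)
    fits = np.asarray(fits, dtype=float)          # shape (L, m)
    A = np.column_stack([np.ones_like(lams), lams])
    coef, *_ = np.linalg.lstsq(A, fits, rcond=None)
    gamma = coef[1]                               # estimated slope at each maturity
    squared_bias = (gamma[None, :] * lams[:, None]) ** 2
    return gamma, squared_bias

# Grid as on the slide: 50 values with log10(lambda) equally spaced in [-7, 1]
lams = 10.0 ** np.linspace(-7, 1, 50)

# Synthetic fits at three maturities with known slopes in lambda
true_gamma = np.array([0.002, -0.001, 0.0])
fits = 0.05 + lams[:, None] * true_gamma[None, :]

gamma, squared_bias = ebbs_squared_bias(lams, fits)
print(gamma)
```

Adding an estimated variance to `squared_bias` and summing over maturities would give the criterion that is minimized over λ; the variance estimate is not shown here.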
EBBS versus GCV
[Figure: forward rate versus time to maturity. Curves: EBBS, GCV θ=3, GCV θ=1, empirical.]
EBBS Fit: Residual Analysis
[Figure: residuals from fit using h(·) = log(·). Panels: residual (log transform) versus maturity; sample autocorrelation function (ACF) versus lag; normal probability plot; absolute residual versus log(fitted value).]
Geometry of Transformations
[Figure: log(Y) versus Y, with tangent lines at Y=1 and Y=5]
Strength of a Transformation
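• Suppose $y_1 < y_2$
• strength of a transformation h:
$$\text{strength} = \frac{h'(y_2)}{h'(y_1)} - 1$$
• Example:
$$h(y; \alpha) = \begin{cases} \dfrac{y^\alpha - 1}{\alpha} & \text{if } \alpha \neq 0 \\ \log(y) & \text{if } \alpha = 0 \end{cases}$$
• For this family:
$$\text{strength} := \left(\frac{y_2}{y_1}\right)^{\alpha - 1} - 1 \quad \begin{cases} > 0 & \text{if } \alpha > 1 \\ < 0 & \text{if } \alpha < 1 \end{cases}$$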
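The closed form follows because h'(y; α) = y^{α−1} for every α, including α = 0. The sketch below compares that closed form with a finite-difference evaluation of the general definition; the function names and the choice y2/y1 = 2 are illustrative.

```python
import numpy as np

def boxcox(y, alpha):
    """Power transformation family h(y; alpha) from the slide."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if alpha == 0 else (y ** alpha - 1.0) / alpha

def strength(h, y1, y2, eps=1e-6):
    """h'(y2) / h'(y1) - 1, with derivatives from central differences."""
    deriv = lambda y: (h(y + eps) - h(y - eps)) / (2.0 * eps)
    return float(deriv(y2) / deriv(y1) - 1.0)

def boxcox_strength(alpha, y1, y2):
    """Closed form (y2 / y1)^(alpha - 1) - 1."""
    return (y2 / y1) ** (alpha - 1.0) - 1.0

y1, y2 = 1.0, 2.0
for alpha in (-1.0, 0.0, 0.5, 1.0, 2.0):
    numeric = strength(lambda y: boxcox(y, alpha), y1, y2)
    closed = boxcox_strength(alpha, y1, y2)
    print(f"alpha={alpha:4.1f}  numeric={numeric: .4f}  closed form={closed: .4f}")
```

The identity transformation (α = 1) has strength zero, and the log transformation (α = 0) with y2/y1 = 2 has strength −1/2.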
Strength of a Transformation
[Figure: strength versus alpha for y2/y1 = 2]
Transformation and Weighting
• log is the linearizing transformation
– convenient
– induces some heteroscedasticity, but not enough to cause a
problem
• log P(t)/t = −yield
– causes severe heteroscedasticity – avoid
Transformation and weighting should be done primarily to induce the assumed noise distribution, which is:
• normal
• constant variance
Transformation and Weighting
[Figure: absolute residual versus fitted value, no transform]
Transformation and Weighting
[Figure: absolute residual versus fitted value, log transform]
Transformation and Weighting
[Figure: absolute residual versus fitted value, square root transform]
Transformation and Weighting
[Figure: absolute residual versus fitted value, log transform, weight by 1/t]
Modelling the Correlation
• open problem
• probably not stationary
• simulations show that stationary AR and MA processes do not
have the same problem with GCV as seen with actual price data
Modelling Corporate Term Structure
$$f_C(t) = f_{Tr}(t) + \alpha_0 + \alpha_1 t + \alpha_2 t^2$$
• $\alpha_0 + \alpha_1 t + \alpha_2 t^2$ is the credit spread
• $H_0$: $\alpha_1 = \alpha_2 = 0$ is accepted for AT&T data
• $\alpha_0 > 0$ for the AT&T data
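As a worked consequence of this model, integrating the corporate forward rate shows how the spread enters the discount function and the yield. The symbols $D_C$, $D_{Tr}$, $y_C$, and $y_{Tr}$ are introduced here for illustration and do not appear on the slide.

```latex
\begin{align*}
D_C(t) &= \exp\left\{-\int_0^t f_C(s)\,ds\right\}
        = D_{Tr}(t)\,\exp\left\{-\left(\alpha_0 t + \frac{\alpha_1 t^2}{2} + \frac{\alpha_2 t^3}{3}\right)\right\}, \\
y_C(t) - y_{Tr}(t) &= \frac{1}{t}\int_0^t \left(\alpha_0 + \alpha_1 s + \alpha_2 s^2\right)ds
        = \alpha_0 + \frac{\alpha_1 t}{2} + \frac{\alpha_2 t^2}{3}.
\end{align*}
```

Under $\alpha_1 = \alpha_2 = 0$ the yield spread equals the constant $\alpha_0$ at every maturity.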
Modelling Corporate Term Structure
[Figure: AT&T forward rates, Apr 94 − Dec 95, versus years to maturity]
Modelling Corporate Term Structure
Question: Should one smooth over both date and time to maturity?
Asymptotics
The PLS estimator is the solution to
$$\sum_{i=1}^n \psi_i(\delta, \lambda, G) = 0$$
for an appropriate $\psi_i(\cdot, \cdot, \cdot)$
Asymptotics: λ → 0
Theorem 1
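• let $\hat\delta_{n,\lambda_n}$ be a sequence of penalized least squares estimators
• assume typical “regularity” assumptions
• suppose $\lambda_n$ is o(1)
• then $\hat\delta_{n,\lambda_n}$ is (strongly) consistent for $\delta_0$
• if $\lambda_n$ is $o(n^{-1/2})$, then
$$\sqrt{n}\left(\hat\delta_{n,\lambda_n} - \delta_0\right) \xrightarrow{D} N\left\{0, \sigma^2\,\Omega^{-1}(\delta_0)\right\},$$
where
$$\Omega(\delta_0) := \lim_n \Sigma_n, \qquad \Sigma_n = \sigma^{-2} n^{-1} \sum_{i=1}^n E\left\{\psi_i(\delta, \lambda, G)\,\psi_i(\delta, \lambda, G)^T\right\}$$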
Asymptotics: λ fixed
• assume λn ≡ λ
• the bias does not shrink to 0
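– limit of $\hat\delta_{n,\lambda}$ solves
$$\lim_{n\to\infty} E\left\{n^{-1}\sum_{i=1}^n \psi_i(\delta, \lambda, G)\right\} = 0$$
• the large sample variance formula is
$$\mathrm{Var}\{\hat\delta(\lambda)\} = \frac{\sigma^2}{n}\left[\{\Sigma_n + \lambda G\}^{-1}\,\Sigma_n\,\{\Sigma_n + \lambda G\}^{-1}\right].$$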
Summary
• splines are convenient for estimating term structure
• penalization is better, or at least easier, than knot selection
• EBBS provides a reasonable amount of smoothing
• GCV undersmooths because
– noise is correlated
– target function is a derivative
• corporate term structure can be estimated by “borrowing
strength” from treasury bonds
• a constant credit spread fits the data reasonably well
• asymptotics are available for inference
References
Jarrow, R., Ruppert, D., and Yu, Y. (2004) Estimating the interest
rate term structure of corporate debt with a semiparametric
penalized spline model, JASA, to appear.
Available at:
http://www.orie.cornell.edu/~davidr
• see “Recent Papers”
• also see “Recent Talks” for these slides
Ruppert, D. (2004) Statistics and Finance: An Introduction,
Springer, New York – splines, term structure, transformations
Ruppert, D., Wand, M.P., and Carroll, R.J. (2003) Semiparametric
Regression, Cambridge University Press, New York – splines
Carroll, R.J. and Ruppert, D. (1988), Transformation and Weighting
in Regression, Chapman & Hall, New York. – transform-both-sides