Affine Term Structure Models: An Introduction

Jens H. E. Christensen

Federal Reserve Bank of San Francisco

Term Structure Modeling and the Lower Bound Problem

Day 1: Term Structure Modeling in Normal Times

Lecture I.1

European University Institute, Florence, September 7, 2015

The views expressed here are solely the responsibility of the author and should not be interpreted as reflecting the views of the Federal Reserve Bank of San Francisco or the Board of Governors of the Federal Reserve System.

Overview of Course

Day 1: Term Structure Modeling in Normal Times

- Affine term structure models and their estimation
- Arbitrage-free Nelson-Siegel models
- Stochastic volatility
- Finite-sample bias and other practical issues

Day 2: The Lower Bound Problem

- Models that respect lower bounds for yields
- How do they compare to affine models?
- Which challenges are solved? And which remain?

Day 3: Applications to Policy Questions

- How does QE work?
- What are the potential costs or risks of QE?
- How to extract market-based inflation expectations?

Day 1: Term Structure Modeling in Normal Times

1. Lecture: Affine Term Structure Models: An Introduction

2. Lecture: The Affine Arbitrage-Free Class of Nelson-Siegel Term Structure Models

3. Lecture: An Arbitrage-Free Generalized Nelson-Siegel Term Structure Model

4. Lecture: Can Spanned Term Structure Factors Drive Stochastic Volatility?

5. Lecture: How Efficient is the Kalman Filter at Estimating Affine Term Structure Models?

History and Background

The arbitrage-free term structure literature started with founding papers like Vasicek (1977) and Cox, Ingersoll, and Ross (1985).

Affine term structure models were then, and remain, the workhorse model class thanks to their richness and tractability.

After two decades of studying one- and two-factor models, it was clear by the 1990s that more factors are needed: three at least.

Although multi-dimensional affine models come in many varieties, they can still be categorized as shown by Dai and Singleton (2000).

With multi-dimensional models the focus shifts from fit to forecasting and risk premiums. Duffee (2002) and Cheridito, Filipovic, and Kimmel (2007) give us essentially and extended affine risk premiums ⇒ Maximum affine flexibility is achieved!

With flexibility come estimation problems. Attention moves from fit and risk premiums to econometric issues and identification.

Most recently, lower bounds of yields pose yet another challenge.

The Ultimate Question

Can we solve challenges and keep all benefits?

Good fit

Good forecasting

Robust and tractable estimation

Respect of lower bounds

Goal of course is to provide a positive answer: Yes!


Outline of Presentation

Introduction of Affine Models
- Duffie and Kan (1996)

Market Prices of Risk

Characterization of Canonical Affine Models
- Dai and Singleton (2000)

Risk Premiums in Affine Models
- Completely affine: Dai and Singleton (2000)
- Essentially affine: Duffee (2002)
- Extended affine: Cheridito, Filipovic, and Kimmel (2007)

Kalman Filter Estimation of Affine Models

Other Estimation Schemes for Gaussian Affine Models
- Joslin, Singleton, and Zhu (2011)
- Hamilton and Wu (2012)

Conclusion

The Class of Affine Term Structure Models (1)

Define a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t), Q)$, where the filtration $(\mathcal{F}_t) = \{\mathcal{F}_t : t \geq 0\}$ satisfies the usual conditions (Williams, 1997).

The state variables $X_t$ are assumed to be a Markov process defined on a set $M \subset \mathbb{R}^n$ that solves the following stochastic differential equation

$$dX_t = K^Q(t)[\theta^Q(t) - X_t]\,dt + \Sigma(t) D(X_t, t)\,dW_t^Q,$$

where

- $W^Q$ is a Brownian motion in $\mathbb{R}^n$ contained in $(\mathcal{F}_t)$,
- $\theta^Q : [0,T] \to \mathbb{R}^n$ is a bounded, continuous function,
- $K^Q : [0,T] \to \mathbb{R}^{n \times n}$ is a bounded, continuous function,
- $\Sigma : [0,T] \to \mathbb{R}^{n \times n}$ is a bounded, continuous function,

while $D : M \times [0,T] \to \mathbb{R}^{n \times n}$ has a diagonal structure ...

The Class of Affine Term Structure Models (2)

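... the stochastic part of the volatility matrix has a diagonal structure:

$$D(X_t, t) = \begin{pmatrix} \sqrt{\gamma^1(t) + \delta^1(t) X_t} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sqrt{\gamma^n(t) + \delta^n(t) X_t} \end{pmatrix},$$

where

$$\gamma(t) = \begin{pmatrix} \gamma^1(t) \\ \vdots \\ \gamma^n(t) \end{pmatrix} \quad \text{and} \quad \delta(t) = \begin{pmatrix} \delta^1_1(t) & \cdots & \delta^1_n(t) \\ \vdots & \ddots & \vdots \\ \delta^n_1(t) & \cdots & \delta^n_n(t) \end{pmatrix}$$

with

- $\gamma : [0,T] \to \mathbb{R}^n$ is a bounded, continuous function,
- $\delta : [0,T] \to \mathbb{R}^{n \times n}$ is a bounded, continuous function,

and $\delta^j(t)$ will be used to denote the $j$th row of the $\delta(t)$-matrix.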

The Class of Affine Term Structure Models (3)

Given this structure we have an affine diffusion process

$$dX_t = K^Q(t)[\theta^Q(t) - X_t]\,dt + \Sigma(t) D(X_t, t)\,dW_t^Q.$$

For asset pricing, we will also need the instantaneous risk-free rate, which is assumed affine in $X_t$:

$$r_t = \rho_0(t) + \rho_1(t)' X_t,$$

where

- $\rho_0 : [0,T] \to \mathbb{R}$ is a bounded, continuous function,
- $\rho_1 : [0,T] \to \mathbb{R}^n$ is a bounded, continuous function.

Finally, for $X_t$ to be well defined and "admissible," it must be the case that

$$\gamma^j(t) + \delta^j(t) X_t \geq 0, \quad 1 \leq j \leq n,$$

for all $X_t \in M \subset \mathbb{R}^n$.
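The admissibility condition can be checked mechanically for a given state. The snippet below is a minimal sketch, assuming time-constant $\gamma$ and $\delta$; the function name `is_admissible` and the numerical values (chosen to resemble an $A_1(2)$-type volatility structure) are illustrative assumptions, not part of the lecture.

```python
def is_admissible(gamma, delta, x):
    """Return True if gamma^j + delta^j x >= 0 for every row j of delta."""
    for gamma_j, delta_j in zip(gamma, delta):
        if gamma_j + sum(d * x_i for d, x_i in zip(delta_j, x)) < 0.0:
            return False
    return True


# Hypothetical two-factor example: the first factor drives all volatility.
GAMMA = [0.0, 1.0]
DELTA = [[1.0, 0.0],
         [0.5, 0.0]]

OK_STATE = is_admissible(GAMMA, DELTA, [0.04, -0.30])   # first factor positive
BAD_STATE = is_admissible(GAMMA, DELTA, [-0.01, 0.20])  # first factor negative
```

In this example only the sign of the first factor matters, because the second column of `DELTA` is zero.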

The Class of Affine Term Structure Models (4)

For admissible $X_t$, Duffie and Kan (1996) show that arbitrage-free bond prices will be exponential-affine

$$P(t,T) = E_t^Q\left[\exp\left(-\int_t^T r_u\,du\right)\right] = \exp\left(B(t,T)' X_t + A(t,T)\right),$$

where $B(t,T)$ and $A(t,T)$ are solutions to a system of ordinary differential equations (ODEs)

$$\frac{dB(t,T)}{dt} = \rho_1 + (K^Q)' B(t,T) - \frac{1}{2} \sum_{j=1}^{n} \left(\Sigma' B(t,T) B(t,T)' \Sigma\right)_{j,j} (\delta^j)',$$

$$\frac{dA(t,T)}{dt} = \rho_0 - B(t,T)' K^Q \theta^Q - \frac{1}{2} \sum_{j=1}^{n} \left(\Sigma' B(t,T) B(t,T)' \Sigma\right)_{j,j} \gamma^j,$$

with boundary conditions $A(T,T) = 0$ and $B(T,T) = 0$.

Note that the possible time dependence of $\rho_0$, $\rho_1$, $K^Q$, $\theta^Q$, $\Sigma$, $\gamma$, and $\delta$ is suppressed in this notation.
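To make the ODE system concrete, the following sketch integrates it numerically for the simplest possible case: one Gaussian factor ($n = 1$, $\gamma = 1$, $\delta = 0$, $\Sigma = \sigma$), for which $B$ also has a well-known closed form. The Runge-Kutta scheme, the function names, and the parameter values are assumptions made for illustration only.

```python
import math


def solve_affine_odes(rho0, rho1, kappa, theta, sigma, tau, steps=1000):
    """Integrate the one-factor Gaussian A and B ODEs with classical RK4.

    With n = 1, gamma = 1 and delta = 0 the system reduces to
        dB/dt = rho1 + kappa * B
        dA/dt = rho0 - B * kappa * theta - 0.5 * sigma**2 * B**2
    with A(T,T) = B(T,T) = 0. We integrate in time to maturity s = T - t,
    which flips the sign of both right-hand sides.
    """
    def rhs(b):
        db = -(rho1 + kappa * b)
        da = -(rho0 - b * kappa * theta - 0.5 * sigma ** 2 * b ** 2)
        return da, db

    a, b = 0.0, 0.0
    h = tau / steps
    for _ in range(steps):
        k1a, k1b = rhs(b)
        k2a, k2b = rhs(b + 0.5 * h * k1b)
        k3a, k3b = rhs(b + 0.5 * h * k2b)
        k4a, k4b = rhs(b + h * k3b)
        a += h * (k1a + 2.0 * k2a + 2.0 * k3a + k4a) / 6.0
        b += h * (k1b + 2.0 * k2b + 2.0 * k3b + k4b) / 6.0
    return a, b


def closed_form_b(rho1, kappa, tau):
    """Closed-form B(t,T) for the one-factor Gaussian case."""
    return -rho1 * (1.0 - math.exp(-kappa * tau)) / kappa


def zero_yield(a, b, x, tau):
    """Zero-coupon yield y = -(B * x + A) / (T - t)."""
    return -(b * x + a) / tau


# Illustrative (made-up) parameters: the factor is the short rate itself.
A5, B5 = solve_affine_odes(rho0=0.0, rho1=1.0, kappa=0.5, theta=0.04,
                           sigma=0.01, tau=5.0)
Y5 = zero_yield(A5, B5, x=0.03, tau=5.0)
```

With a short rate of 3 percent and a Q-mean of 4 percent, the five-year yield `Y5` lies between the two, as mean reversion suggests.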

The Class of Affine Term Structure Models (5)

This result implies that zero-coupon bond yields are affine:

$$y(t,T) = -\frac{1}{T-t} \ln P(t,T) = -\frac{1}{T-t} B(t,T)' X_t - \frac{1}{T-t} A(t,T).$$

Key point: Affine models only characterize the Q-dynamics.

- $X_t$ is an affine diffusion process.
- $r_t$ is affine in $X_t$.
- Bond yields are affine in $X_t$.
- $A(t,T)$ and $B(t,T)$ are straightforward to solve.

Note: The result is silent about the P-dynamics of $X_t$.

For pricing and calibration exercises, only Q-dynamics are needed. However, for forecasting and term premium decompositions, we will need the market prices of risk.

Market Prices of Risk (1)

Assume that $X_t$ and $r_t$ are well-defined stochastic processes, but now formulated under the objective P probability measure

$$dX_t = K^P[\theta^P - X_t]\,dt + \Sigma D(X_t)\,dW_t^P.$$

A standard assumption in finance is that there exists a stochastic discount factor $M_t$ with dynamics given by:

$$\frac{dM_t}{M_t} = -r_t\,dt - \Lambda_t'\,dW_t^P.$$

$M_t$ is also known as the pricing kernel or state price deflator.

Note that the process $\Lambda_t$ is an $n$-dimensional vector.

Market Prices of Risk (2)

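The pricing of the zero-coupon bond is given by

$$P(t,T) = E_t^P\left[\frac{M_T}{M_t}\right].$$

By Itô's lemma, $Z_t = \ln M_t$ has dynamics

$$dZ_t = -r_t\,dt - \Lambda_t'\,dW_t^P - \frac{1}{2}\Lambda_t'\Lambda_t\,dt.$$

This implies that

$$\ln M_T = \ln M_t - \int_t^T r_s\,ds - \int_t^T \Lambda_s'\,dW_s^P - \frac{1}{2}\int_t^T \Lambda_s'\Lambda_s\,ds$$

or, equivalently,

$$\frac{M_T}{M_t} = \exp\left(-\int_t^T r_s\,ds - \int_t^T \Lambda_s'\,dW_s^P - \frac{1}{2}\int_t^T \Lambda_s'\Lambda_s\,ds\right).$$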

Market Prices of Risk (3)

Now, consider the process

$$V_t = \exp\left(-\int_0^t \Lambda_s'\,dW_s^P - \frac{1}{2}\int_0^t \Lambda_s'\Lambda_s\,ds\right).$$

Provided

$$E^P\left[\exp\left(\frac{1}{2}\int_0^T \Lambda_s'\Lambda_s\,ds\right)\right] < \infty \quad \text{(Novikov's condition)},$$

it is the case that $\{V_t\}_{t \leq T}$ is a P-martingale w.r.t. $\mathcal{F}_t$ and

$$dQ = V_T\,dP \iff \frac{dQ}{dP} = V_T \quad \text{(Radon-Nikodym derivative)}$$

defines a probability measure for which (Girsanov's theorem)

$$dW_t^Q = \Lambda_t\,dt + dW_t^P$$

is a Brownian motion under this new Q-probability measure.

This has two implications ...

Market Prices of Risk (4)

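First, bond prices can be written as

$$\begin{aligned}
P(t,T) = E_t^P\left[\frac{M_T}{M_t}\right] &= \int_\Omega \frac{M_T}{M_t}\,dP(\omega) \\
&= \int_\Omega \exp\left(-\int_t^T r_s\,ds - \int_t^T \Lambda_s'\,dW_s^P - \frac{1}{2}\int_t^T \Lambda_s'\Lambda_s\,ds\right) dP(\omega) \\
&= \int_\Omega \exp\left(-\int_t^T r_s\,ds\right) \times \exp\left(-\int_t^T \Lambda_s'\,dW_s^P - \frac{1}{2}\int_t^T \Lambda_s'\Lambda_s\,ds\right) dP(\omega) \\
&= \int_\Omega \exp\left(-\int_t^T r_s\,ds\right) dQ(\omega) = E_t^Q\left[\exp\left(-\int_t^T r_s\,ds\right)\right]
\end{aligned}$$

as if investors were risk-neutral and just discount with $r_t$.

Second, under this risk-neutral measure, the dynamics are

$$\begin{aligned}
dX_t &= K^P[\theta^P - X_t]\,dt + \Sigma D(X_t)\,dW_t^P \iff \\
dX_t &= K^P[\theta^P - X_t]\,dt + \Sigma D(X_t)[dW_t^Q - \Lambda_t\,dt] \iff \\
dX_t &= \left(K^P[\theta^P - X_t] - \Sigma D(X_t)\Lambda_t\right)dt + \Sigma D(X_t)\,dW_t^Q.
\end{aligned}$$

Thus, the drift of $X_t$ is adjusted with $\Sigma D(X_t)\Lambda_t$: the price of risk!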

Risk Premiums in Affine Models

To establish the connection between the P- and Q-dynamics, we need the functional form for the market prices of risk $\Lambda_t$.

Three specifications dominate the literature:

- Completely affine risk premiums: Dai and Singleton (2000).
- Essentially affine risk premiums: Duffee (2002).
- Extended affine risk premiums: Cheridito, Filipovic, and Kimmel (2007).

Common trait: P- and Q-dynamics remain affine in $X_t$.

Note: With risk premiums specified, it is also possible to characterize so-called canonical representations of affine models as demonstrated by Dai and Singleton (2000).

Completely Affine Risk Premiums (1)

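Early on, completely affine risk premiums were popular

$$\Lambda_t = D(X_t)\lambda^1,$$

where $\lambda^1$ is an $n \times 1$ vector.

The measure change is

$$dW_t^Q = dW_t^P + \Lambda_t\,dt.$$

Thus, the term to deduct from the drift of the P-dynamics is

$$\Sigma D(X_t)\Lambda_t = \Sigma D(X_t)^2 \lambda^1 = \Sigma \begin{pmatrix} \gamma^1 + \delta^1 X_t & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \gamma^n + \delta^n X_t \end{pmatrix} \begin{pmatrix} \lambda^1_1 \\ \vdots \\ \lambda^1_n \end{pmatrix}.$$

Note: The covariance matrix of $\Lambda_t$ is affine and $X_t$ is affine under both P and Q ⇒ Model is completely affine.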

Completely Affine Risk Premiums (2)

As emphasized by Duffee (2002), completely affine risk premiums are proportional to yield volatility.

In Gaussian models, this becomes particularly restrictive:

Constant volatility ⇒ Constant risk premiums ⇒ $K^P = K^Q$.

Thus, factor interactions have to be the same under both probability measures.

Also, the expectations hypothesis must hold!

Canonical Characterization of Affine Models (1)

Dai and Singleton (2000) classify affine term structure models and describe the canonical representation for each model class using completely affine risk premiums

$$\Lambda_t = D(X_t)\lambda^1.$$

They introduce the notation of $A_m(n)$ model classes:

- $n$ is the number of factors.
- Subscript $m$ is the number of square-root processes.

Problem: Invariant transformations imply that the same model (identical implications for the distribution of bond yields) can be formulated in an infinite number of ways.

Approach:

- Find a standard way of representing $A_m(n)$ models.
- Determine the maximally flexible admissible specification that is econometrically identifiable.
- This is the canonical representation of $A_m(n)$ models.

Canonical Characterization of Affine Models (2)

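DS consider admissible affine processes with Q-dynamics

$$dX_t = K^Q[\theta^Q - X_t]\,dt + \Sigma D(X_t)\,dW_t^Q,$$

$$r_t = \rho_0 + \rho_1' X_t.$$

They use completely affine risk premiums

$$\Lambda_t = D(X_t)\lambda^1.$$

This implies that the P-dynamics are

$$\begin{aligned}
dX_t &= K^Q[\theta^Q - X_t]\,dt + \Sigma D(X_t)\Lambda_t\,dt + \Sigma D(X_t)\,dW_t^P \\
&= K^P[\theta^P - X_t]\,dt + \Sigma D(X_t)\,dW_t^P,
\end{aligned}$$

where

$$K^P = K^Q - \Sigma\,\mathrm{diag}(\lambda^1)\,\delta,$$

$$\theta^P = (K^P)^{-1}\left[K^Q\theta^Q + \Sigma\,\mathrm{diag}(\lambda^1)\,\gamma\right].$$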
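The mapping from the Q-parameters to the P-parameters can be verified numerically. The sketch below, written for two factors with plain Python lists, computes $K^P$ and $\theta^P$ from the formulas above and compares the resulting drift $K^P[\theta^P - x]$ with the drift $K^Q[\theta^Q - x] + \Sigma D(x)^2\lambda^1$ computed term by term. All helper names and parameter values are hypothetical and chosen only for illustration.

```python
def matmul(a, b):
    """Product of two small matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(a, v):
    """Matrix-vector product."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def p_from_q(k_q, theta_q, sigma, gamma, delta, lam1):
    """K^P = K^Q - Sigma diag(lam1) delta and the matching theta^P."""
    s_lam = matmul(sigma, [[lam1[0], 0.0], [0.0, lam1[1]]])
    s_lam_delta = matmul(s_lam, delta)
    k_p = [[k_q[i][j] - s_lam_delta[i][j] for j in range(2)] for i in range(2)]
    const = [u + v for u, v in zip(matvec(k_q, theta_q), matvec(s_lam, gamma))]
    return k_p, matvec(inv2(k_p), const)


def p_drift_direct(k_q, theta_q, sigma, gamma, delta, lam1, x):
    """K^Q[theta^Q - x] + Sigma D(x)^2 lam1, computed term by term."""
    d2 = [gamma[j] + sum(delta[j][i] * x[i] for i in range(2)) for j in range(2)]
    risk = matvec(sigma, [d2[j] * lam1[j] for j in range(2)])
    mean_rev = matvec(k_q, [theta_q[i] - x[i] for i in range(2)])
    return [m + r for m, r in zip(mean_rev, risk)]


# Made-up A_1(2)-style inputs.
K_Q = [[0.6, 0.0], [0.2, 0.3]]
THETA_Q = [0.05, 0.01]
SIGMA = [[1.0, 0.0], [0.0, 1.0]]
GAMMA = [0.0, 1.0]
DELTA = [[1.0, 0.0], [0.5, 0.0]]
LAM1 = [-0.1, -0.2]

K_P, THETA_P = p_from_q(K_Q, THETA_Q, SIGMA, GAMMA, DELTA, LAM1)
```

Because $\mathrm{diag}(\lambda^1)\delta$ only has entries where $\delta$ does, the risk premium parameters can change $K$ only in the columns of the volatility-driving factors.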

Canonical Characterization of Affine Models (3)

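The canonical representation of the $A_m(n)$ class is characterized by the following specifications.

$$K^P = \begin{pmatrix} K^{CC}_{m \times m} & 0_{m \times (n-m)} \\ K^{UC}_{(n-m) \times m} & K^{UU}_{(n-m) \times (n-m)} \end{pmatrix} \quad \text{for } m > 0.$$

For the Gaussian case ($m = 0$), DS specify $K^P$ as triangular, while Singleton (2006) and the subsequent literature impose the triangular property on $K^Q$.

$$\theta^P = \begin{pmatrix} (\theta^P)^C_{m \times 1} \\ 0_{(n-m) \times 1} \end{pmatrix}.$$

Means of unrestricted processes are unidentified and fixed at zero.

$$\Sigma = I, \quad \gamma = \begin{pmatrix} 0_{m \times 1} \\ 1_{(n-m) \times 1} \end{pmatrix}, \quad \delta = \begin{pmatrix} I_{m \times m} & 0_{m \times (n-m)} \\ B^{CU}_{(n-m) \times m} & 0_{(n-m) \times (n-m)} \end{pmatrix}.$$

Restrictions on the volatility structure ensure both admissibility and identification.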

Canonical Characterization of Affine Models (3)

DS impose additional parametric restrictions.

- $\rho_1(i) \geq 0$, $m + 1 \leq i \leq n$. Unrestricted processes must have nonnegative loadings in the short rate; otherwise their factor values can switch sign.
- $K^P_{i,\cdot}\theta^P > 0$, $1 \leq i \leq m$. Square-root processes must have positive drift at zero; otherwise zero is an absorbing state.
- $K^P_{i,j} \leq 0$, $1 \leq i \leq m$, $i \neq j$. Square-root processes must be positively correlated.
- $\theta^P_i \geq 0$, $1 \leq i \leq m$. Square-root processes must have nonnegative drift.
- $\delta^i_j \geq 0$, $m + 1 \leq i \leq n$, $1 \leq j \leq m$. Volatility sensitivities must be nonnegative.

Canonical Characterization of Affine Models (4)

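Example: The canonical $A_1(2)$ model has P-dynamics

$$\begin{pmatrix} dX^1_t \\ dX^2_t \end{pmatrix} = \begin{pmatrix} \kappa^P_{11} & 0 \\ \kappa^P_{21} & \kappa^P_{22} \end{pmatrix} \left[ \begin{pmatrix} \theta^P_1 \\ 0 \end{pmatrix} - \begin{pmatrix} X^1_t \\ X^2_t \end{pmatrix} \right] dt + \begin{pmatrix} \sqrt{X^1_t} & 0 \\ 0 & \sqrt{1 + \delta^2_1 X^1_t} \end{pmatrix} \begin{pmatrix} dW^{1,P}_t \\ dW^{2,P}_t \end{pmatrix}.$$

The completely affine market prices of risk are

$$\Sigma D(X_t)\Lambda_t = \begin{pmatrix} X^1_t & 0 \\ 0 & 1 + \delta^2_1 X^1_t \end{pmatrix} \begin{pmatrix} \lambda^1_1 \\ \lambda^1_2 \end{pmatrix}.$$

Thus, the Q-dynamics are

$$dX_t = K^P[\theta^P - X_t]\,dt + \Sigma D(X_t)\,dW_t^Q - \Sigma D(X_t)\Lambda_t\,dt \iff$$

$$\begin{pmatrix} dX^1_t \\ dX^2_t \end{pmatrix} = \left[ \begin{pmatrix} \kappa^P_{11} & 0 \\ \kappa^P_{21} & \kappa^P_{22} \end{pmatrix} \begin{pmatrix} \theta^P_1 \\ -\lambda^1_2 / \kappa^P_{21} \end{pmatrix} - \begin{pmatrix} \kappa^P_{11} + \lambda^1_1 & 0 \\ \kappa^P_{21} + \delta^2_1 \lambda^1_2 & \kappa^P_{22} \end{pmatrix} \begin{pmatrix} X^1_t \\ X^2_t \end{pmatrix} \right] dt + \begin{pmatrix} \sqrt{X^1_t} & 0 \\ 0 & \sqrt{1 + \delta^2_1 X^1_t} \end{pmatrix} \begin{pmatrix} dW^{1,Q}_t \\ dW^{2,Q}_t \end{pmatrix}.$$

Note the limited flexibility between P- and Q-dynamics!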

Essentially Affine Risk Premiums (1)

Essentially affine risk premiums were introduced in Duffee (2002). To describe them, define the diagonal matrix $D^{-1}_{ess}(X_t)$ with diagonal entries

$$\left[D^{-1}_{ess}(X_t)\right]_{jj} = \begin{cases} (\gamma^j + \delta^j X_t)^{-1/2}, & \text{if } \inf\,(\gamma^j(t) + \delta^j(t) X_t) > 0; \\ 0, & \text{otherwise.} \end{cases}$$

Note that the denominators in $D^{-1}_{ess}(X_t)$ already are well-defined processes ⇒ No extra parameter restrictions are required!

Now, Duffee (2002) considers the following risk premiums

$$\Lambda_t = D(X_t, t)\lambda^1 + D^{-1}_{ess}(X_t)\lambda^2 X_t,$$

where $\lambda^2$ is an $n \times n$ matrix.

Note: For the rows with 0 in $D^{-1}_{ess}(X_t)$, we will not be able to identify the corresponding rows in $\lambda^2$ ⇒ They are zero!

Essentially Affine Risk Premiums (2)

The two natural extremes are illustrative.

At one extreme, we have the Gaussian models.

For those classes of models, $D^{-1}_{ess}(X_t)$ is given by

$$D^{-1}_{ess}(X_t) = \begin{pmatrix} (\gamma^1)^{-1/2} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & (\gamma^n)^{-1/2} \end{pmatrix}.$$

This means that $D^{-1}_{ess}(X_t)\lambda^2 X_t$ adds $n \times n$ parameters to $\Lambda_t$ that all load on $X_t$.

By implication, there is complete separation between the specifications of $K^P$ and $K^Q$ in Gaussian models with essentially affine risk premiums.

Essentially Affine Risk Premiums (3)

At the other extreme, we have the $A_n(n)$ models with square-root processes only.

For those classes of models, $D^{-1}_{ess}(X_t)$ is given by

$$D^{-1}_{ess}(X_t) = \begin{pmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix} = 0!$$

Thus, completely and essentially affine risk premiums are identical for $A_n(n)$ models!

Essentially Affine Risk Premiums (4)

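Example: The canonical $A_1(2)$ model has P-dynamics

$$\begin{pmatrix} dX^1_t \\ dX^2_t \end{pmatrix} = \begin{pmatrix} \kappa^P_{11} & 0 \\ \kappa^P_{21} & \kappa^P_{22} \end{pmatrix} \left[ \begin{pmatrix} \theta^P_1 \\ 0 \end{pmatrix} - \begin{pmatrix} X^1_t \\ X^2_t \end{pmatrix} \right] dt + \begin{pmatrix} \sqrt{X^1_t} & 0 \\ 0 & \sqrt{1 + \delta^2_1 X^1_t} \end{pmatrix} \begin{pmatrix} dW^{1,P}_t \\ dW^{2,P}_t \end{pmatrix}.$$

Since

$$D^{-1}_{ess}(X_t) = \begin{pmatrix} 0 & 0 \\ 0 & (1 + \delta^2_1 X^1_t)^{-1/2} \end{pmatrix},$$

the essentially affine market prices of risk are

$$\Sigma D(X_t)\Lambda_t = \begin{pmatrix} X^1_t & 0 \\ 0 & 1 + \delta^2_1 X^1_t \end{pmatrix} \begin{pmatrix} \lambda^1_1 \\ \lambda^1_2 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ \lambda^2_{21} & \lambda^2_{22} \end{pmatrix} \begin{pmatrix} X^1_t \\ X^2_t \end{pmatrix}.$$

Thus, the Q-dynamics are

$$dX_t = K^P[\theta^P - X_t]\,dt + \Sigma D(X_t)\,dW_t^Q - \Sigma D(X_t)\Lambda_t\,dt \iff$$

$$\begin{aligned}
\begin{pmatrix} dX^1_t \\ dX^2_t \end{pmatrix} ={}& \begin{pmatrix} \kappa^P_{11} & 0 \\ \kappa^P_{21} & \kappa^P_{22} \end{pmatrix} \begin{pmatrix} \theta^P_1 \\ -\lambda^1_2 / \kappa^P_{21} \end{pmatrix} dt - \begin{pmatrix} \kappa^P_{11} + \lambda^1_1 & 0 \\ \kappa^P_{21} + \delta^2_1 \lambda^1_2 + \lambda^2_{21} & \kappa^P_{22} + \lambda^2_{22} \end{pmatrix} \begin{pmatrix} X^1_t \\ X^2_t \end{pmatrix} dt \\
&+ \begin{pmatrix} \sqrt{X^1_t} & 0 \\ 0 & \sqrt{1 + \delta^2_1 X^1_t} \end{pmatrix} \begin{pmatrix} dW^{1,Q}_t \\ dW^{2,Q}_t \end{pmatrix}.
\end{aligned}$$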

Extended Affine Risk Premiums (1)

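Cheridito, Filipovic, and Kimmel (2007) introduce the extended affine risk premiums. To describe them, define the following additional diagonal matrix $D^{-1}_{ext}(X_t)$ with diagonal entries

$$\left[D^{-1}_{ext}(X_t)\right]_{jj} = \begin{cases} (X^j_t)^{-1/2}, & \text{if } X^j_t > 0; \\ 0, & \text{otherwise.} \end{cases}$$

Now, CFK consider the following extension of the essentially affine risk premiums

$$\Lambda_t = D(X_t, t)\lambda^1 + D^{-1}_{ess}(X_t)\lambda^2 X_t + D^{-1}_{ext}(X_t)(\lambda^3 + \lambda^4 X_t),$$

where $\lambda^3$ is an $n \times 1$ vector with entries

$$\lambda^3_j \text{ free, if } X^j_t > 0; \quad 0, \text{ otherwise,}$$

while $\lambda^4$ is an $n \times n$ matrix with a zero diagonal and entries

$$\lambda^4_{ij} \leq 0, \text{ if } i \neq j \text{ and } X^i_t, X^j_t > 0; \quad 0, \text{ otherwise.}$$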

Extended Affine Risk Premiums (2)

While for Gaussian $A_0(n)$ models there is no difference between essentially and extended affine risk premiums, the extension is particularly powerful for the $A_n(n)$ models:

$$\lambda^3 = \begin{pmatrix} \lambda^3_1 \\ \vdots \\ \vdots \\ \lambda^3_n \end{pmatrix} \quad \text{and} \quad \lambda^4 = \begin{pmatrix} 0 & \lambda^4_{12} & \cdots & \lambda^4_{1n} \\ \lambda^4_{21} & 0 & \cdots & \lambda^4_{2n} \\ \vdots & & \ddots & \vdots \\ \lambda^4_{n1} & \cdots & \lambda^4_{n(n-1)} & 0 \end{pmatrix}.$$

Thus, there is a total of $n \times n$ extra free parameters in the canonical $A_n(n)$ model with extended affine risk premiums.

In principle, this implies that the P- and Q-dynamics are fully flexible relative to each other

$$dX_t = K^P[\theta^P - X_t]\,dt + \Sigma D(X_t)\,dW_t^P,$$

$$dX_t = K^Q[\theta^Q - X_t]\,dt + \Sigma D(X_t)\,dW_t^Q.$$

However, this is neglecting the sign restrictions in $\lambda^4$.

Extended Affine Risk Premiums (3)

Essentially affine risk premiums truly nest completely affine risk premiums.

CFK claim (p. 129, l. 34) that the extended affine risk premiums always nest both essentially and completely affine risk premiums.

At face value, this is true (more parameters are indeed indisputably allowed!), but it does not come for free!

Under the extended affine risk premiums, the restricted processes have to satisfy Feller conditions under both probability measures in addition to all the usual restrictions for existence and admissibility:

$$(K^P\theta^P)_j > 0.5\,\sigma^2_{jj} \quad \text{for } 1 \leq j \leq m,$$

$$(K^Q\theta^Q)_j > 0.5\,\sigma^2_{jj} \quad \text{for } 1 \leq j \leq m.$$

Extended Affine Risk Premiums (4)

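Problem: We need

$$E^P\left[\exp\left(-\int_0^T \Lambda_s'\,dW_s^P - \frac{1}{2}\int_0^T \Lambda_s'\Lambda_s\,ds\right)\right] = 1.$$

If so, $\exp\left(-\int_0^T \Lambda_s'\,dW_s^P - \frac{1}{2}\int_0^T \Lambda_s'\Lambda_s\,ds\right)$ is a martingale and

$$Q = \exp\left(-\int_0^T \Lambda_s'\,dW_s^P - \frac{1}{2}\int_0^T \Lambda_s'\Lambda_s\,ds\right) P$$

defines an equivalent probability measure and absence of arbitrage is ensured.

Question: If $X_t$ is a square-root process, is

$$E^P\left[\exp\left(-\int_0^T \frac{1}{\sqrt{X_s}}\,dW_s^P - \frac{1}{2}\int_0^T \frac{1}{X_s}\,ds\right)\right] < \infty?$$

This is about properties of stochastic exponentials.

Answer: Yes, provided Feller conditions are satisfied!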

Extended Affine Risk Premiums (5)

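Example: The canonical $A_1(2)$ model has P-dynamics

$$\begin{pmatrix} dX^1_t \\ dX^2_t \end{pmatrix} = \begin{pmatrix} \kappa^P_{11} & 0 \\ \kappa^P_{21} & \kappa^P_{22} \end{pmatrix} \left[ \begin{pmatrix} \theta^P_1 \\ 0 \end{pmatrix} - \begin{pmatrix} X^1_t \\ X^2_t \end{pmatrix} \right] dt + \begin{pmatrix} \sqrt{X^1_t} & 0 \\ 0 & \sqrt{1 + \delta^2_1 X^1_t} \end{pmatrix} \begin{pmatrix} dW^{1,P}_t \\ dW^{2,P}_t \end{pmatrix}.$$

Since

$$D^{-1}_{ess}(X_t) = \begin{pmatrix} 0 & 0 \\ 0 & (1 + \delta^2_1 X^1_t)^{-1/2} \end{pmatrix} \quad \text{and} \quad D^{-1}_{ext}(X_t) = \begin{pmatrix} (X^1_t)^{-1/2} & 0 \\ 0 & 0 \end{pmatrix},$$

the extended affine market prices of risk are

$$\begin{aligned}
\Sigma D(X_t)\Lambda_t ={}& \begin{pmatrix} X^1_t & 0 \\ 0 & 1 + \delta^2_1 X^1_t \end{pmatrix} \begin{pmatrix} \lambda^1_1 \\ \lambda^1_2 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ \lambda^2_{21} & \lambda^2_{22} \end{pmatrix} \begin{pmatrix} X^1_t \\ X^2_t \end{pmatrix} \\
&+ \begin{pmatrix} (X^1_t)^{-1/2} & 0 \\ 0 & 0 \end{pmatrix} \left[ \begin{pmatrix} \lambda^3_1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \right].
\end{aligned}$$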

Extended Affine Risk Premiums (6)

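Thus, the Q-dynamics are

$$dX_t = K^P[\theta^P - X_t]\,dt + \Sigma D(X_t)\,dW_t^Q - \Sigma D(X_t)\Lambda_t\,dt,$$

which is equivalent to

$$\begin{aligned}
\begin{pmatrix} dX^1_t \\ dX^2_t \end{pmatrix} ={}& \begin{pmatrix} \kappa^P_{11} & 0 \\ \kappa^P_{21} & \kappa^P_{22} \end{pmatrix} \begin{pmatrix} \theta^P_1 - \lambda^3_1 / \kappa^P_{11} \\ -\lambda^1_2 / \kappa^P_{21} \end{pmatrix} dt - \begin{pmatrix} \kappa^P_{11} + \lambda^1_1 & 0 \\ \kappa^P_{21} + \delta^2_1 \lambda^1_2 + \lambda^2_{21} & \kappa^P_{22} + \lambda^2_{22} \end{pmatrix} \begin{pmatrix} X^1_t \\ X^2_t \end{pmatrix} dt \\
&+ \begin{pmatrix} \sqrt{X^1_t} & 0 \\ 0 & \sqrt{1 + \delta^2_1 X^1_t} \end{pmatrix} \begin{pmatrix} dW^{1,Q}_t \\ dW^{2,Q}_t \end{pmatrix}.
\end{aligned}$$

Note: The Q-dynamics of $X^1_t$ are now completely detached from its P-dynamics, but there are Feller conditions

$$\kappa^P_{11}\theta^P_1 > \frac{1}{2}\sigma^2_{11} \quad \text{and} \quad \kappa^P_{11}\theta^P_1 - \lambda^3_1 > \frac{1}{2}\sigma^2_{11}.$$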
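The two Feller conditions for the square-root factor are easy to check for a candidate parameter vector, for example inside a likelihood routine that rejects inadmissible draws. The helper below is a small sketch; the function name and the numbers are hypothetical, and the default $\sigma_{11} = 1$ reflects the canonical normalization $\Sigma = I$.

```python
def feller_conditions(kappa11_p, theta1_p, lam3_1, sigma11=1.0):
    """Return (holds under P, holds under Q) for the square-root factor.

    P: kappa11 * theta1          > 0.5 * sigma11**2
    Q: kappa11 * theta1 - lam3_1 > 0.5 * sigma11**2
    """
    bound = 0.5 * sigma11 ** 2
    under_p = kappa11_p * theta1_p > bound
    under_q = kappa11_p * theta1_p - lam3_1 > bound
    return under_p, under_q


# Made-up values: both conditions hold in the first case, while a large
# positive lambda^3_1 breaks the Q-condition in the second case.
BOTH_HOLD = feller_conditions(0.8, 1.0, 0.1)
Q_FAILS = feller_conditions(0.8, 1.0, 0.4)
```

The second case illustrates why the extra flexibility is not free: the admissible range of $\lambda^3_1$ is bounded by the Q-measure Feller condition.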

Estimation of Affine Models

Dai and Singleton (2000) use simulated method of moments.

Duffee (2002) estimates affine models with QML and assumes that n yields are observed without error.

Cheridito, Filipovic, and Kimmel (2007) use approximate maximum likelihood estimation and assume that n yields are observed without error.

More recently, Joslin, Singleton, and Zhu (2011) and Hamilton and Wu (2012) have offered efficient estimation algorithms for Gaussian affine models.

Throughout the course, the Kalman filter is used for model estimation, so details of this estimation procedure are briefly provided in the following.

Kalman Filter Estimation of Affine Models (1)

For affine Gaussian models, in general, the conditional mean vector and the conditional covariance matrix are

$$E^P[X_T | \mathcal{F}_t] = (I - \exp(-K^P \Delta t))\theta^P + \exp(-K^P \Delta t) X_t,$$

$$V^P[X_T | \mathcal{F}_t] = \int_0^{\Delta t} e^{-K^P s} \Sigma\Sigma' e^{-(K^P)' s}\,ds,$$

where $\Delta t = T - t$ is the time between observations.

We compute conditional moments of discrete observations and obtain the state transition equation

$$X_t = (I - \exp(-K^P \Delta t))\theta^P + \exp(-K^P \Delta t) X_{t-1} + \xi_t.$$
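In the one-factor case the matrix exponentials become scalars and the variance integral has the closed form $\sigma^2 (1 - e^{-2\kappa\Delta t}) / (2\kappa)$. The sketch below evaluates both moments and cross-checks the variance against a simple midpoint-rule quadrature of the integral. The function names and the parameter values are illustrative assumptions.

```python
import math


def ou_conditional_moments(kappa, theta, sigma, x_t, dt):
    """Conditional mean and variance of a scalar Gaussian factor over dt."""
    phi = math.exp(-kappa * dt)
    mean = (1.0 - phi) * theta + phi * x_t
    var = sigma ** 2 * (1.0 - math.exp(-2.0 * kappa * dt)) / (2.0 * kappa)
    return mean, var


def variance_by_quadrature(kappa, sigma, dt, steps=2000):
    """Midpoint-rule value of int_0^dt e^{-kappa s} sigma^2 e^{-kappa s} ds."""
    h = dt / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        total += math.exp(-kappa * s) * sigma ** 2 * math.exp(-kappa * s)
    return total * h


# Monthly observations of a factor that starts below its mean (made-up values).
MEAN, VAR = ou_conditional_moments(kappa=0.5, theta=0.04, sigma=0.01,
                                   x_t=0.03, dt=1.0 / 12.0)
VAR_NUMERIC = variance_by_quadrature(kappa=0.5, sigma=0.01, dt=1.0 / 12.0)
```

In the multi-factor case the same integral is usually evaluated with a matrix exponential or an eigendecomposition of $K^P$; that step is not shown here.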

Kalman Filter Estimation of Affine Models (2)

In the standard Kalman filter, the measurement equation is

yt = A + BXt + εt .

The assumed error structure is

$$\begin{pmatrix} \xi_t \\ \varepsilon_t \end{pmatrix} \sim N\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} Q & 0 \\ 0 & H \end{pmatrix} \right],$$

where the matrix $H$ is assumed diagonal, while the matrix $Q$ has the following structure:

$$Q = \int_0^{\Delta t} e^{-K^P s} \Sigma\Sigma' e^{-(K^P)' s}\,ds.$$

In addition, the transition and measurement errors are assumed orthogonal to the initial state.

Now, Kalman filtering is used to evaluate the likelihood function.

Kalman Filter Estimation of Affine Models (3)

Under a stationarity assumption the filter can be initialized at the unconditional mean and covariance matrix:

$$X_0 = \theta^P, \quad \Sigma_0 = \int_0^\infty e^{-K^P s} \Sigma\Sigma' e^{-(K^P)' s}\,ds.$$

Let $Y_t = (y_1, y_2, \ldots, y_t)$ be the information available at time $t$, and denote model parameters by $\psi$.

Consider period $t-1$ and suppose that the state update $X_{t-1}$ and its mean square error matrix $\Sigma_{t-1}$ have been obtained.

The prediction step is

$$X_{t|t-1} = E^P[X_t | Y_{t-1}] = \Phi^{X,0}_t(\psi) + \Phi^{X,1}_t(\psi) X_{t-1},$$

$$\Sigma_{t|t-1} = \Phi^{X,1}_t(\psi) \Sigma_{t-1} \Phi^{X,1}_t(\psi)' + Q_t(\psi),$$

where

$$\Phi^{X,0}_t(\psi) = (I - \exp(-K^P \Delta t))\theta^P, \quad \Phi^{X,1}_t(\psi) = \exp(-K^P \Delta t),$$

$$Q_t(\psi) = \int_0^{\Delta t} e^{-K^P s} \Sigma\Sigma' e^{-(K^P)' s}\,ds.$$

Kalman Filter Estimation of Affine Models (4)

In the time-$t$ update step, $X_{t|t-1}$ is improved by using the additional information contained in $Y_t$:

$$X_t = E[X_t | Y_t] = X_{t|t-1} + \Sigma_{t|t-1} B(\psi)' F_t^{-1} v_t,$$

$$\Sigma_t = \Sigma_{t|t-1} - \Sigma_{t|t-1} B(\psi)' F_t^{-1} B(\psi) \Sigma_{t|t-1},$$

where

$$v_t = y_t - E[y_t | Y_{t-1}] = y_t - A(\psi) - B(\psi) X_{t|t-1},$$

$$F_t = \mathrm{cov}(v_t) = B(\psi) \Sigma_{t|t-1} B(\psi)' + H(\psi),$$

$$H(\psi) = \mathrm{diag}(\sigma^2_\varepsilon(\tau_1), \ldots, \sigma^2_\varepsilon(\tau_N)).$$

At this point, the Kalman filter has delivered all ingredients needed to evaluate the Gaussian log likelihood, the prediction-error decomposition of which is

$$\log l(y_1, \ldots, y_T; \psi) = \sum_{t=1}^{T} \left( -\frac{N}{2}\log(2\pi) - \frac{1}{2}\log|F_t| - \frac{1}{2} v_t' F_t^{-1} v_t \right),$$

where $N$ is the number of observed yields.
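The prediction step, the update step, and the log likelihood fit in a few lines when the state is scalar. The sketch below assumes a one-factor Gaussian model with $N$ yields and a diagonal $H$. Because the state is scalar, $F_t$ is a rank-one update of $H$ and is inverted with the Sherman-Morrison formula; this is an implementation shortcut chosen here to avoid a linear algebra library, not something taken from the lecture. All names and inputs are hypothetical.

```python
import math


def kalman_loglik(yields, a, b, h, kappa, theta, sigma, dt):
    """Log likelihood and filtered states for a one-factor Gaussian model.

    yields : list of observation vectors y_t, each of length N
    a, b   : measurement intercepts A and loadings B (length N)
    h      : measurement error variances, the diagonal of H (length N)
    """
    n_obs = len(a)
    phi1 = math.exp(-kappa * dt)                       # Phi^{X,1}
    phi0 = (1.0 - phi1) * theta                        # Phi^{X,0}
    q = sigma ** 2 * (1.0 - phi1 ** 2) / (2.0 * kappa)
    x = theta                                          # X_0
    p = sigma ** 2 / (2.0 * kappa)                     # Sigma_0
    s = sum(bi * bi / hi for bi, hi in zip(b, h))      # B' H^{-1} B
    log_det_h = sum(math.log(hi) for hi in h)
    loglik = 0.0
    filtered = []
    for y in yields:
        # Prediction step.
        x_pred = phi0 + phi1 * x
        p_pred = phi1 * p * phi1 + q
        # Prediction errors v_t and the scalar B' H^{-1} v_t.
        v = [yi - ai - bi * x_pred for yi, ai, bi in zip(y, a, b)]
        w = sum(bi * vi / hi for bi, vi, hi in zip(b, v, h))
        denom = 1.0 + p_pred * s
        # log|F_t| and v_t' F_t^{-1} v_t via Sherman-Morrison.
        log_det_f = log_det_h + math.log(denom)
        quad = sum(vi * vi / hi for vi, hi in zip(v, h)) - p_pred * w * w / denom
        loglik += (-0.5 * n_obs * math.log(2.0 * math.pi)
                   - 0.5 * log_det_f - 0.5 * quad)
        # Update step.
        x = x_pred + p_pred * w / denom
        p = p_pred / denom
        filtered.append(x)
    return loglik, filtered
```

A call such as `kalman_loglik(data, a, b, h, kappa, theta, sigma, 1.0 / 12.0)` would then be wrapped in a function of $\psi$ and handed to a numerical optimizer.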

Kalman Filter Estimation of Affine Models (5)

I typically maximize the likelihood with respect to $\psi$ using the Nelder-Mead simplex algorithm.

Upon convergence, we obtain standard errors from the estimated covariance matrix

$$\hat{\Omega}(\hat{\psi}) = \frac{1}{T} \left[ \frac{1}{T} \sum_{t=1}^{T} \frac{\partial \log l_t(\hat{\psi})}{\partial \psi} \frac{\partial \log l_t(\hat{\psi})}{\partial \psi}' \right]^{-1},$$

where $\hat{\psi}$ denotes the estimated model parameters.

Note: For non-Gaussian affine models, the only change is that $Q_t$ and $\Sigma_0$ are calculated with

$$\int_t^T \exp(-K^P(T-s))\, \Sigma D(E^P[X_s | X_t]) D(E^P[X_s | X_t])' \Sigma' \exp(-(K^P)'(T-s))\,ds$$

using formulas from Fisher and Gilles (1996).

Problems with Canonical Models

Empirical problems with canonical models:

- Since $X_t$ are latent factors, they may rotate during the estimation.
- This leaves multiple maxima with close to identical likelihood values but different yield decompositions.
- Consequence: Two researchers may come up with very different estimation results despite the fact that they use the same data, the same model, and the same estimation method.

Kim and Orphanides (2012) describe these problems for the specific case of the $A_0(3)$ model. Duffee (2011) also contains an elaborate discussion. What to do?

Joslin, Singleton, and Zhu (2011) (1)

Joslin, Singleton, and Zhu (2011) consider a generic discrete-time representation of Gaussian affine models

$$\Delta X_t = K^P_0 + K^P_1 X_{t-1} + \Sigma_X \varepsilon^P_t,$$

$$\Delta X_t = K^Q_0 + K^Q_1 X_{t-1} + \Sigma_X \varepsilon^Q_t,$$

$$r_t = \rho_0 + \rho_1 X_t.$$

Canonical: maximally flexible subject only to restrictions required for econometric identification.

Let $X_t$ be $n$-dimensional and assume that we observe $J > n$ bond yields denoted $y_t$.

For any full-rank matrix $W \in \mathbb{R}^{n \times J}$,

$$\mathcal{P}_t = W y_t$$

represents $n$ portfolios of yields.

Joslin, Singleton, and Zhu (2011) (2)

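Assuming that the $n$ portfolios are priced perfectly, JSZ use invariant affine transformations to demonstrate that any canonical Gaussian affine model has a unique observationally equivalent representation

$$\Delta \mathcal{P}_t = K^P_{0\mathcal{P}} + K^P_{1\mathcal{P}} \mathcal{P}_{t-1} + \Sigma_{\mathcal{P}} \varepsilon^P_t,$$

$$\Delta \mathcal{P}_t = K^Q_{0\mathcal{P}} + K^Q_{1\mathcal{P}} \mathcal{P}_{t-1} + \Sigma_{\mathcal{P}} \varepsilon^Q_t,$$

$$r_t = \rho_{0\mathcal{P}} + \rho_{1\mathcal{P}} \mathcal{P}_t,$$

where $K^Q_{0\mathcal{P}}$, $K^Q_{1\mathcal{P}}$, $\rho_{0\mathcal{P}}$, and $\rho_{1\mathcal{P}}$ are functions of $(\lambda^Q, k^Q_\infty, \Sigma_{\mathcal{P}})$:

- $\lambda^Q$ are the eigenvalues of $K^Q_1$.
- $\Sigma_{\mathcal{P}}\Sigma_{\mathcal{P}}'$ is the covariance matrix of innovations to the portfolios.
- $k^Q_\infty$ is a long-run mean parameter.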

Joslin, Singleton, and Zhu (2011) (3)

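Key observation: Since $\mathcal{P}_t$ are observed factors, the P-conditional likelihood function of the observed yields is

$$f(y_t | y_{t-1}; \psi) = f(y_t | \mathcal{P}_t; \lambda^Q, k^Q_\infty, \Sigma_{\mathcal{P}}, \Sigma_\varepsilon) \times f(\mathcal{P}_t | \mathcal{P}_{t-1}; K^P_{0\mathcal{P}}, K^P_{1\mathcal{P}}, \Sigma_{\mathcal{P}}).$$

Critical result: $K^P_{0\mathcal{P}}$ and $K^P_{1\mathcal{P}}$ can be estimated by OLS independent of $\Sigma_{\mathcal{P}}$. This delivers $\hat{K}^P_{0\mathcal{P}}$ and $\hat{K}^P_{1\mathcal{P}}$.

In the final step, $(\lambda^Q, k^Q_\infty, \Sigma_{\mathcal{P}}, \Sigma_\varepsilon)$ are estimated by maximizing the likelihood function using $\hat{K}^P_{0\mathcal{P}}$ and $\hat{K}^P_{1\mathcal{P}}$ as input and $\hat{\Sigma}_{\mathcal{P}}\hat{\Sigma}_{\mathcal{P}}'$ as initial guess for $\Sigma_{\mathcal{P}}\Sigma_{\mathcal{P}}'$:

$$\begin{aligned}
f(y^o_t | y_{t-1}; \psi) ={}& \sum_{t=1}^{T} (2\pi)^{-(J-n)} |\Sigma_\varepsilon|^{-1} \\
&\times \exp\left(-\frac{1}{2} \left\| \Sigma_\varepsilon^{-1} \left( y^o_t - A(\lambda^Q, k^Q_\infty, \Sigma_{\mathcal{P}}\Sigma_{\mathcal{P}}') - B(\lambda^Q, k^Q_\infty)' \mathcal{P}_t \right) \right\|^2 \right) \\
&\times f(\mathcal{P}_t | \mathcal{P}_{t-1}; \hat{K}^P_{0\mathcal{P}}, \hat{K}^P_{1\mathcal{P}}, \Sigma_{\mathcal{P}}).
\end{aligned}$$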

Joslin, Singleton, and Zhu (2011) (4)

The pricing factors $\mathcal{P}$ are linear combinations of yields. They focus on the first N principal components:

Their P-dynamics follow an unconstrained VAR.

Therefore, no-arbitrage assumptions do not improve forecast performance by themselves.

Two contributions:

- Computationally efficient estimation of Gaussian affine models.
- Forecast gains from using a Gaussian affine model relative to an unconstrained VAR must come from additional restrictions on the P-dynamics and not from imposing absence of arbitrage.

Note: Matlab code is available on Ken Singleton's website.

Hamilton and Wu (2012) — Overview

Hamilton and Wu (2012, HW) describe another novel way to identify and estimate Gaussian affine models.

Unlike other papers that emphasize how affine models should be represented, HW's focus is on how to estimate any given Gaussian affine model without prejudice.

In the paper, HW only consider the popular class of Gaussian affine models where it can be assumed that M yields are observed without error, and M is identical to the number of pricing factors, i.e., the factors become observable.

Implication: Their so-called reduced-form representation is a restricted vector autoregression that can be estimated with OLS.

Model estimation involves solving multiple equations with as many unknowns, but presents a more manageable problem than brute force full joint ML estimation of all model parameters.

Hamilton and Wu (2012) — Canonical Model

The general Gaussian affine model in discrete time is given by (Eq. (1), (12), (13))

$$X_t = c + \rho X_{t-1} + \Sigma u_t,$$

$$X_t = c^Q + \rho^Q X_{t-1} + \Sigma u^Q_t,$$

$$r_t = \delta_0 + \delta_1' X_t.$$

In the case of $M = 3$, there are 37 free parameters.

HW use the following normalization:

- $\Sigma = I_M$ (9 restrictions);
- $\rho^Q$ is (upper) triangular (3 restrictions);
- $c = 0$ (3 restrictions);
- $\delta_1 \geq 0$.

With these 15 restrictions the model is canonical with 22 parameters that are all econometrically identified.
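The parameter count for $M = 3$ can be reproduced with a short tally. The sketch below counts every element of $c$, $\rho$, $\Sigma$, $c^Q$, $\rho^Q$, $\delta_0$, and $\delta_1$ as free, then subtracts the three sets of equality restrictions. Extending the count to other values of $M$ with the same logic is an assumption made here for illustration; the slide only states the $M = 3$ numbers.

```python
def hw_parameter_count(m):
    """Return (free, restricted, identified) parameter counts for M = m."""
    free = {
        "c": m,
        "rho": m * m,
        "Sigma": m * m,
        "c_Q": m,
        "rho_Q": m * m,
        "delta_0": 1,
        "delta_1": m,
    }
    restrictions = {
        "Sigma = I": m * m,
        "rho_Q triangular": m * (m - 1) // 2,
        "c = 0": m,
    }
    total = sum(free.values())
    restricted = sum(restrictions.values())
    return total, restricted, total - restricted


COUNTS_M3 = hw_parameter_count(3)  # (37, 15, 22)
```

The sign restriction $\delta_1 \geq 0$ is an inequality and therefore does not reduce the count.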

Hamilton and Wu (2012) — HW Base Model

The starting point for HW’s analysis is the model

$$X_t = \rho X_{t-1} + u_t,$$

with a short rate given by

$$r_t = \delta_0 + \delta_1' X_t,$$

while zero-coupon bond yields are given by

$$y_t = B(\rho^Q, \delta_1)' X_t + A(c^Q, \delta_0, \rho^Q, \delta_1),$$

where

- $c^Q$ is an $M \times 1$ vector;
- $\rho^Q$ is an upper triangular $M \times M$ matrix;
- $\delta_1$ is an $M \times 1$ vector;
- $\delta_0$ is a constant.

Hamilton and Wu (2012) — Reduced-Form Model (1)

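Since M linear combinations of yields are priced without error, it is possible to present the model in the following reduced-form representation

$$\begin{pmatrix} Y^1_t \\ Y^2_t \end{pmatrix} = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix} + \begin{pmatrix} B_1 \\ B_2 \end{pmatrix} X_t + \begin{pmatrix} 0 \\ \Sigma_e \end{pmatrix} u^e_t,$$

with $u^e_t \sim \text{i.i.d. } N(0, I_{N-M})$.

No pricing error ⇒ $X_t$ becomes observable:

$$Y^1_t = A_1 + B_1 X_t \iff X_t = B_1^{-1}(Y^1_t - A_1).$$

This defines an invariant affine transformation

$$\begin{aligned}
A_1 + B_1 X_t &= A_1 + B_1 \rho B_1^{-1} B_1 X_{t-1} + B_1 u_t \\
&= A_1 - B_1 \rho B_1^{-1} A_1 + B_1 \rho B_1^{-1} (A_1 + B_1 X_{t-1}) + B_1 u_t.
\end{aligned}$$

This is equivalent to

$$Y^1_t = A^*_1 + \phi^*_{11} Y^1_{t-1} + u^*_{1t}.$$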

Hamilton and Wu (2012) — Reduced-Form Model (2)

The reduced-form model is

$$Y^1_t = A^*_1 + \phi^*_{11} Y^1_{t-1} + u^*_{1t},$$

where

$$A^*_1 = A_1 - B_1 \rho B_1^{-1} A_1,$$

$$\phi^*_{11} = B_1 \rho B_1^{-1},$$

$$u^*_{1t} = B_1 u_t$$

and $u^*_{1t} \sim \text{i.i.d. } N(0, B_1 B_1')$.

Point: Reduced-form model is a VAR(1) estimated with OLS.

This gives us $A^*_1$, $\phi^*_{11}$, and the covariance matrix of $u^*_{1t}$:

$$\Omega^*_1 = \frac{1}{T} \sum_{t=1}^{T} (Y^1_t - A^*_1 - \phi^*_{11} Y^1_{t-1})(Y^1_t - A^*_1 - \phi^*_{11} Y^1_{t-1})' = B_1 B_1'.$$

Hamilton and Wu (2012) — Reduced-Form Model (3)

The equations for the $N - M$ yields with errors take the form

$$Y^2_t = A^*_2 + \phi^*_{21} Y^1_t + u^*_{2t}.$$

This equation can also be estimated with OLS to yield $A^*_2$, $\phi^*_{21}$, and the covariance matrix of $u^*_{2t}$

$$\Omega^*_2 = \frac{1}{T} \sum_{t=1}^{T} (Y^2_t - A^*_2 - \phi^*_{21} Y^1_t)(Y^2_t - A^*_2 - \phi^*_{21} Y^1_t)'.$$

This was the easy part.

The remaining steps are a little more computationally involved to get back to $(\delta_0, \delta_1, \rho, c^Q, \rho^Q)$.

Hamilton and Wu (2012) — B(ρQ, δ1) Parameters

HW consider the following set of equations:

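$$\phi^*_{21} \Omega^*_1 = B_2 B_1^{-1} B_1 B_1' = B_2 B_1',$$

$$\Omega^*_1 = B_1 B_1'.$$

The first line contains $M$ equations, while the second line contains $M(M+1)/2$ equations.

Combined they match the number of unknowns: $(\rho^Q, \delta_1)$.

Guess $(\rho^Q, \delta_1)$, calculate $B(\rho^Q, \delta_1)$, and define

$$\pi = (\mathrm{vec}(\phi^*_{21}\Omega^*_1), \mathrm{vech}(\Omega^*_1)),$$

$$g(\rho^Q, \delta_1) = (\mathrm{vec}(B_2 B_1'), \mathrm{vech}(B_1 B_1')).$$

Keep adjusting $(\rho^Q, \delta_1)$ to solve

$$(\hat{\rho}^Q, \hat{\delta}_1) = \arg\min_{\rho^Q, \delta_1} (\pi - g(\rho^Q, \delta_1))'(\pi - g(\rho^Q, \delta_1)).$$

Note: If $N - M > 1$, there are more equations than unknowns, which raises the question of how to weight deviations. For $N - M = 1$, the problem is exactly identified.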

Hamilton and Wu (2012) — ρ and A(cQ , δ0, ρQ, δ1) Parameters

With $(\rho^Q, \delta_1)$ in hand, we can now recall that

$$\phi^*_{11} = B_1 \rho B_1^{-1} \iff \rho = B_1^{-1} \phi^*_{11} B_1.$$

Finally, we extract the last $M + 1$ parameters, $\delta_0$ and $c^Q$, from:

$$A^*_1 = (I - B_1 \rho B_1^{-1}) A_1,$$

$$A^*_2 = A_2 - B_2 B_1^{-1} A_1.$$

Note: If $N - M > 1$, there are again more equations than unknowns, so weights are needed. For $N - M = 1$, the problem is exactly identified.

Procedure: Guess $(c^Q, \delta_0)$, calculate $A(c^Q, \delta_0, \rho^Q, \delta_1)$, and compute the sum of squared differences in the equations. Stop once minimized! This is minimum-chi-square estimation.
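The link between the structural parameters and the reduced-form VAR can be illustrated with a round trip in the $M = 2$ case: map $(A_1, B_1, \rho)$ into $(A^*_1, \phi^*_{11}, \Omega^*_1)$ and then recover $\rho$ from $\phi^*_{11}$ and $B_1$. The sketch below uses plain Python lists; the helper names and all numerical values are hypothetical, and in an application $\phi^*_{11}$ would come from OLS rather than from a known $\rho$.

```python
def matmul2(a, b):
    """Product of two 2 x 2 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def reduced_form(a1, b1, rho):
    """Return A*_1, phi*_11 and Omega*_1 implied by (A_1, B_1, rho)."""
    phi = matmul2(matmul2(b1, rho), inv2(b1))            # B1 rho B1^{-1}
    phi_a1 = [sum(phi[i][k] * a1[k] for k in range(2)) for i in range(2)]
    a_star = [a1[i] - phi_a1[i] for i in range(2)]       # (I - phi) A1
    b1_t = [[b1[j][i] for j in range(2)] for i in range(2)]
    omega = matmul2(b1, b1_t)                            # B1 B1'
    return a_star, phi, omega


def recover_rho(phi, b1):
    """Invert phi*_11 = B1 rho B1^{-1} for rho."""
    return matmul2(matmul2(inv2(b1), phi), b1)


# Made-up structural inputs.
A1 = [0.01, 0.02]
B1 = [[1.0, 0.5], [0.8, 1.2]]
RHO = [[0.9, 0.1], [0.0, 0.8]]

A_STAR, PHI_STAR, OMEGA_STAR = reduced_form(A1, B1, RHO)
RHO_BACK = recover_rho(PHI_STAR, B1)
```

The round trip returns the original $\rho$ up to floating-point error, which is the sense in which $\rho$ is pinned down once $B_1$ is known.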

Limitations of JSZ and HW

A short list of some limitations:

- JSZ and HW only apply to Gaussian affine models.
- They rely on observable state variables. Hence, missing observations or irregular data frequencies are problematic to handle.
- No easy extension to nonlinear measurement equations.
- No extensions to non-Gaussian dynamics.

Fortunately, there exists an alternative approach that is able to overcome these limitations ...