Nonlinear Filtering and Robust Learning


Lars Peter Hansen, email: lhansen@uchicago

Nick Polson, email: [email protected]

Thomas J. Sargent, email: [email protected]

October 20, 2010

Preliminary and incomplete.

    Abstract

We refine and apply particle filtering algorithms that emphasize portions of the filtered distribution most pertinent to a decision maker. We illustrate our algorithms in an equilibrium model with investors who design robust decision rules by exponentially tilting probabilities of a baseline statistical model.

    1 Introduction

This paper develops nonlinear filtering methods for the purpose of building models in which decision makers simultaneously learn and choose, even though they possibly doubt their hidden state Markov model. Particular filtering methods, for example, Kalman filtering and Wonham filtering, have tractable closed-form recursive representations but apply to only a limited range of problems. That situation leads us to explore numerical alternatives that can potentially expand the class of tractable decision problems.

We devote particular attention to decision problems in which some of the hidden states are really parameters and hence invariant over time. We describe, justify, and apply a modified version of particle filtering designed to make parameter learning tractable. Models with parameter learning require special attention because particle filtering simulations typically recurrently revisit the same parameter value. We suggest a less wasteful way to use particles.

Our approach is to introduce artificial movements in parameters into the simulations to be used for filtering. In particular, in a simulated time path associated with a particular particle, we make parameters evolve probabilistically over time, but in a way that does not distort the densities that are the targets that we seek to approximate. Our approach partly follows those of Whittle (1969), Storvik (2002), and Fearnhead (2002) by exploiting the existence of sufficient statistics conditioned on data on signals and hidden states. Thus, our definition of a particle will include the vector of hidden states as well as a vector of sufficient statistics for the parameter of interest, conditioned on the hidden states. We differ from Storvik (2002) and Fearnhead (2002) in how we make the particles evolve more efficiently. Johannes and Polson (2007) and Carvalho et al. (2010) apply this method in a variety of settings. Here we provide extensions that allow us to explore applications in which individuals learn over time while recognizing their doubts about their statistical specification.

We apply our methods to a model in which investors make decisions that are robust to a set of model misspecifications. Such investors exponentially twist continuation values to form a likelihood ratio that they use to adjust probabilities conservatively. Hansen and Sargent (2007) provide a recursive formulation that explicitly incorporates learning by confronting decision makers with a hidden state Markov process in which the state can include unknown parameters. Hansen and Sargent (2010) and Hansen (2007) apply this approach, but only in models for which there are quasi-analytic formulas for the filtering problem. Extending things beyond those special models requires that the numerical filtering algorithm be adapted to produce accurate approximations in regions of the space of parameters and other hidden states that most concern the decision maker. Details of the decision problem, including whether and how much the decision maker doubts his statistical specification, matter in designing a good numerical approximation algorithm. Concerns about robustness inspire our agents to explore tails of distributions, a fact that frames numerical challenges for filtering algorithms. Thrun et al. (2002) suggest using risk functions in conjunction with particle filtering methods in order to focus numerical accuracy on the most important components of uncertainty. We use a similar idea but modify it to suit the economic examples that most interest us.

DeJong et al. (2010) both provide a comprehensive survey of particle filtering methods and develop their own refinements of such methods. Because we focus on the decision problems of private economic agents engaged in learning rather than the problem of econometricians who seek to maximize a likelihood for a given data set, we want to explore different questions than DeJong et al. (2010).


    2 Hidden state Markov chain

Let $X_t$ denote an underlying Markov state at date $t$ and let $Y_t$ be a vector of current-period signals. We allow a decision maker to observe a component $D_t$ of the state vector. We partition the state vector:
$$X_t = \begin{bmatrix} D_t \\ Z_t \end{bmatrix}.$$
We capture the observability of $D_t$ through a functional relation $D_{t+1} = \varphi(Y_{t+1}, D_t)$. The decision maker does not observe $Z_t$, nor does he observe a parameter vector that governs the dynamic evolution of states and signals.

Let $\mathcal{Z}$ denote a locally compact metric space of admissible hidden states, $\mathcal{B}(\mathcal{Z})$ the Borel sigma algebra of $\mathcal{Z}$, and $\lambda$ a measure on the measurable space of hidden states $(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$. Let $\tau(dz^*|y^*, x, \theta)$ denote the conditional distribution of $Z_{t+1}$ conditioned on $Y_{t+1}$, $X_t$, and $\theta$, where $\theta$ is a parameter that resides in a measurable space $(\Theta, \mathcal{B}(\Theta))$. We will sometimes find it convenient to assume that the conditional probability measure is absolutely continuous with respect to $\lambda$ with density $\tau$:
$$\tau(dz^*|y^*, x, \theta) = \tau(z^*|y^*, x, \theta)\,\lambda(dz^*).$$
Let $\pi$ be a measure on $(\Theta, \mathcal{B}(\Theta))$; in particular, we can think of it as a prior probability measure for $\theta$. While $\theta$ could be viewed as a time invariant hidden component of the state $z$, we will have cause to treat it separately. As we shall see, in designing simulation algorithms to use in approximation, the distinction between a parameter that is invariant over time and an evolving hidden state is important. Finally, let $\mathcal{Y}$ denote a locally compact metric space of admissible signals, $\mathcal{B}(\mathcal{Y})$ the corresponding Borel sigma algebra, and $\eta$ a measure on the measurable space $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}))$ of signals. Let $\kappa(y^*|x, \theta)\,\eta(dy^*)$ denote the distribution of $Y_{t+1}$ conditioned on $X_t$ and $\theta$. Notice that we specify the hidden state Markov chain using a convenient factorization of the joint distribution for $(X_{t+1}, Y_{t+1})$ conditioned on $(X_t, \theta)$. If next period's signals are conditionally independent of the next period's state vector, then $\tau(dz^*|y^*, x, \theta)$ does not depend on $y^*$.

Let $\mathcal{Y}_t$ denote the signal history including initial conditions for the observable state vector. Thus $\mathcal{Y}_t$ is generated by $D_0, Y_1, \ldots, Y_t$. Evidently, $D_t$ can be constructed as a function of $\mathcal{Y}_t$, and $Y_{t+1}$ combined with $\mathcal{Y}_t$ generates the same sigma algebra as $\mathcal{Y}_{t+1}$. Let $\mathcal{X}_t$ denote the combined history of the states and signals; $\mathcal{X}_t$ generates a larger sigma algebra than $\mathcal{Y}_t$. The aim of filtering is to compute a sequence of probability measures:
$$q_t(z, \theta)\,\lambda(dz)\,\pi(d\theta)$$
for the date $t$ hidden state $Z_t$ and the unknown parameter vector $\theta$, conditioned on $\mathcal{Y}_t$. The predictive density for $Y_{t+1}$ given $\mathcal{Y}_t$ is:
$$p_t(y^*|\mathcal{Y}_t) = \int \kappa(y^*|d_t, z, \theta)\,q_t(z, \theta)\,\lambda(dz)\,\pi(d\theta).$$
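To make these objects concrete, here is a minimal computational sketch of the factorization in Python. The linear-Gaussian specification, the parameter values, and the function names are illustrative assumptions, not part of the model as stated:

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical linear-Gaussian instance of the factorization:
# Y_{t+1} = theta + Z_t + g*W1 (the kernel kappa), Z_{t+1} = a*Z_t + b*W2 (the kernel tau).
a, b, g = 0.9, 0.1, 0.2

def draw_signal(z, theta):
    # draw Y_{t+1} from kappa(. | x, theta)
    return theta + z + g * rng.standard_normal()

def draw_hidden(y_next, z, theta):
    # draw Z_{t+1} from tau(. | y*, x, theta); in this specification the new
    # signal carries no extra information about Z_{t+1}, so y_next goes unused
    return a * z + b * rng.standard_normal()

def predictive_density(y_grid, z_particles, theta_particles):
    # p_t(y* | Y_t): average the Gaussian density kappa(y* | d, z, theta)
    # over draws (z, theta) from q_t
    out = np.zeros_like(y_grid, dtype=float)
    for z, th in zip(z_particles, theta_particles):
        out += np.exp(-0.5 * ((y_grid - th - z) / g) ** 2) / (g * np.sqrt(2 * np.pi))
    return out / len(z_particles)
```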

    3 Partial smoothing

Filtering means estimating a hidden state or parameter at time $t$ using an information set $\mathcal{Y}_t$ observed at time $t$. Smoothing means estimating a hidden state or parameter at time $t$ using an information set available at a later date $s > t$. When feasible, it is advantageous at least partially to smooth our estimates of the states prior to simulation, where partially means that the information used is for a date $s$ satisfying $t < s < +\infty$.

For example, we construct a new state vector
$$\bar X_t \doteq \begin{bmatrix} Y_t \\ X_{t-1} \end{bmatrix}.$$
The unobservable component of the new state vector is $Z_{t-1}$, whereas earlier it had been $Z_t$. The signal vector remains as it was before.

Construct the distribution of the signal conditioned on $\bar X_t$:
$$\bar\kappa(y^*|\bar X_t, \theta) = \int \kappa(y^*|D_t, z, \theta)\,\tau(dz|Y_t, X_{t-1}, \theta),$$
where we integrate out $Z_t$ because it is not part of the current value of the newly constructed state $\bar X_t$. In this new construction, $Z_t$ becomes the hidden component of the next period state vector $\bar X_{t+1}$. The new state evolution conditioned on the signal $Y_{t+1}$ is
$$\bar\tau(dz|Y_{t+1}, \bar X_t, \theta) = \frac{\kappa(Y_{t+1}|D_t, z, \theta)\,\tau(dz|Y_t, X_{t-1}, \theta)}{\int \kappa(Y_{t+1}|D_t, z, \theta)\,\tau(dz|Y_t, X_{t-1}, \theta)}$$
in conjunction with the functional relation $D_t = \varphi(Y_t, D_{t-1})$. Since $Y_{t+1}$ is both a signal and a component of the next period state, part of the evolution of the next period state given the next period signal is degenerate by construction.

If feasible, these look-forward calculations are valuable to exploit in simulation.
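Continuing the illustrative sketch from Section 2, the integration that produces $\bar\kappa$ can be approximated by simple Monte Carlo; the helper below reuses the hypothetical kernels defined earlier and is likewise only a sketch:

```python
def smoothed_signal_density(y, z_prev, y_t, theta, n_draws=1000):
    # kappa_bar(y | X_bar_t, theta): integrate the signal density kappa over
    # draws of Z_t from tau(dz | Y_t, X_{t-1}, theta)
    zs = np.array([draw_hidden(y_t, z_prev, theta) for _ in range(n_draws)])
    dens = np.exp(-0.5 * ((y - theta - zs) / g) ** 2) / (g * np.sqrt(2 * np.pi))
    return dens.mean()
```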

    4 A simulator based on sufficient statistics

Following Johannes and Polson (2007) and Carvalho et al. (2010), we explore an alternative way to construct particles in a particle filtering algorithm that uses sufficient statistics. It is convenient to construct these sufficient statistics by using both the hidden and observed states. In the data generation for the actual time series, $\theta$ does not change over time. In standard approaches to particle filtering, this invariance is imposed; but we now alter the implicit time series evolution that underlies the particle filtering algorithm while preserving the filtering outcome. In this section, we construct a new Markov process that uses a vector of sufficient statistics as a state vector. While different from the original process, this new process has the same distribution of the hidden states and parameter vector conditional on the observation history $\mathcal{Y}_t$.

Consider a state vector $S_t$ that, given an initial condition $S_0$, is constructed recursively via:
$$S_{t+1} = \psi(\theta, Y_{t+1}, Z_{t+1}, X_t, S_t).$$
This recursion will be useful in forming simulations provided that

Assumption 4.1. The distribution of $\theta$ conditioned on $S_t$, $Z_t$, and the signal history $\mathcal{Y}_t$ satisfies
$$\pi_t(d\theta|S_t, Z_t, \mathcal{Y}_t) = \bar\pi(d\theta|S_t)$$
for some prespecified $\bar\pi$.


Some special cases are revealing. For example,

Condition 4.2. $S_t = \theta$.

In this case $\bar\pi(d\theta|S_t)$ assigns probability one to the single value $\theta$. This can lead to a very inefficient algorithm.

Alternatively, the following condition describes a situation in which there is a set of sufficient statistics $S_t$ for the distribution of $\theta$, given the state vector and signal history as well as an appropriately restricted prior. When feasible, this makes the use of particles more efficient.

Condition 4.3. There exists a vector $S_t$ such that
$$\pi_t(d\theta|S_t, Z_t, \mathcal{Y}_t) = \bar\pi(d\theta|S_t),$$
where $S$ has a recursive representation
$$S_{t+1} = \psi(Y_{t+1}, Z_{t+1}, X_t, S_t)$$
for $t = 0, 1, \ldots$ for some choice of $S_0$ such that the prior satisfies $\pi(d\theta) = \bar\pi(d\theta|S_0)$.
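As a concrete instance of Condition 4.3, suppose the signal equation is $Y_{t+1} = \theta + Z_t + gW_{t+1}$ with a normal prior on a scalar $\theta$. Then a two-dimensional statistic in natural-parameter form suffices. The sketch below is an assumption-laden illustration (the prior numbers echo the calibration used later in Section 6):

```python
def update_stats(s, y_next, z, g):
    # S_{t+1} = psi(Y_{t+1}, Z_{t+1}, X_t, S_t): with a normal likelihood for
    # theta and known variance g^2, the natural parameters update additively
    s1, s2 = s
    return (s1 + (y_next - z) / g**2, s2 + 1.0 / g**2)

def pi_bar(s):
    # pi_bar(d theta | S_t) is normal; return its mean and variance
    s1, s2 = s
    return s1 / s2, 1.0 / s2

# S_0 chosen so that pi_bar(. | S_0) reproduces the prior N(.005, .002^2)
s0 = (0.005 / 0.002**2, 1.0 / 0.002**2)
```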

Remark 4.4. Evidently, the entire state vector, including its hidden components, can be used to form sufficient statistics. The distribution $\pi_{t+1}$ can always be represented recursively using a fictitious posterior for $\theta$ conditioned on $\mathcal{X}_t$, denoted $\pi_t(d\theta|\mathcal{X}_t)$. In particular, suppose that $\tau(dz^*|y^*, x, \theta) = \tau(z^*|y^*, x, \theta)\lambda(dz^*)$. Then
$$\pi_{t+1}(d\theta|\mathcal{X}_{t+1}) = \frac{\tau(Z_{t+1}|Y_{t+1}, X_t, \theta)\,\kappa(Y_{t+1}|X_t, \theta)\,\pi_t(d\theta|\mathcal{X}_t)}{\int \tau(Z_{t+1}|Y_{t+1}, X_t, \theta)\,\kappa(Y_{t+1}|X_t, \theta)\,\pi_t(d\theta|\mathcal{X}_t)},$$
so that $\pi_t(d\theta|\mathcal{X}_t)$ could be used as $S_t$. Typically, $S_t$ constructed in this way would be infinite dimensional, rendering it impractical for our purposes. What we want are finite-dimensional sufficient statistics.

Sometimes it will be enough that we can partition the parameter vector into two sets of components and then have sufficient statistics for only one set of components but not for the other. It may still be possible to satisfy Assumption 4.1 with a recursive updating equation, albeit possibly one that has degeneracies in the sense that a component of $S_t$ will be the component of $\theta$ for which we do not use a sufficient statistic. This component of $S_t$ will be invariant by construction.

In applications, it is often the case that we are interested in further restricting the parameter space. Suppose that, absent such restrictions, Assumption 4.1 is satisfied. Now impose an additional prior restriction on the parameter of the form $\theta \in \Theta_o$, where the set $\Theta_o$ is prespecified. Recall the restriction:
$$\pi_t(d\theta|S_t, Z_t, \mathcal{Y}_t) = \bar\pi(d\theta|S_t).$$
Form
$$\pi_t^o(d\theta|S_t, Z_t, \mathcal{Y}_t) = \frac{\pi_t(d\theta|S_t, Z_t, \mathcal{Y}_t)\,\mathbf{1}_{\Theta_o}(\theta)}{\int_{\Theta_o}\pi_t(d\theta|S_t, Z_t, \mathcal{Y}_t)} = \frac{\bar\pi(d\theta|S_t)\,\mathbf{1}_{\Theta_o}(\theta)}{\int_{\Theta_o}\bar\pi(d\theta|S_t)} = \bar\pi^o(d\theta|S_t),$$
which asserts the well known result that if $S_t$ is sufficient for $\theta$ without the restriction on the parameter space, it remains sufficient given this restriction.

Let $\ell_t$ be the distribution of $(Z_t, S_t)$ conditioned on $\mathcal{Y}_t$. We now describe a recursive method of constructing $\ell_{t+1}$ conditioned on $\mathcal{Y}_{t+1}$. Note that
$$\bar\ell_{t+1}(dz^*, dz, ds, d\theta) = \frac{\tau(dz^*|Y_{t+1}, D_t, z, \theta)\,\kappa(Y_{t+1}|D_t, z, \theta)\,\pi_t(d\theta|z, s, \mathcal{Y}_t)\,\ell_t(dz, ds)}{\int \kappa(Y_{t+1}|D_t, z, \theta)\,\pi_t(d\theta|z, s, \mathcal{Y}_t)\,\ell_t(dz, ds)} \tag{1}$$
gives the joint distribution for $(Z_{t+1}, Z_t, S_t, \theta)$ conditioned on $\mathcal{Y}_{t+1}$. Since $S_{t+1}$ is a known function of $(Z_{t+1}, Y_{t+1}, X_t, S_t, \theta)$, we can use this distribution to deduce the distribution of $(Z_{t+1}, S_{t+1})$ conditioned on $\mathcal{Y}_{t+1}$. To see this, consider any bounded Borel measurable function $f$ of $(z, s)$. Then
$$E\left[f(Z_{t+1}, S_{t+1})\,|\,\mathcal{Y}_{t+1}\right] = \int f\left[z^*, \psi(\theta, Y_{t+1}, z^*, D_t, z, s)\right]\bar\ell_{t+1}(dz^*, dz, ds, d\theta).$$
Because this formula applies for all such $f$, it implies a distribution $\ell_{t+1}$ for $(Z_{t+1}, S_{t+1})$ conditioned on $\mathcal{Y}_{t+1}$. To obtain the joint distribution of $(\theta, Z_{t+1}, S_{t+1})$ conditioned on $\mathcal{Y}_{t+1}$, we form
$$\pi_{t+1}(d\theta|s, z, \mathcal{Y}_{t+1})\,\ell_{t+1}(dz, ds).$$

For purposes of simulation, we construct a new Markov process with an expanded state vector $(\hat X_t, \hat S_t)$. The hat notation is used because $(\hat X_t, \hat S_t)$ has a different stochastic evolution than does $(X_t, S_t)$. Under this new evolution, the parameter vector ceases to be time invariant, which allows particles to regenerate. This regeneration is helpful in building a simulation algorithm that works well over time. Since the transition distribution for $\hat\theta_{t+1}$ will degenerate eventually as the parameter is learned with more and more precision as $t+1$ grows, the computational advantages of this randomness will eventually disappear for large $t$. Under the new Markov law, the state vector evolves as follows:
$$\hat\theta_{t+1} \sim \bar\pi(\,\cdot\,|\hat S_t), \qquad \hat Y_{t+1} \sim \kappa(\,\cdot\,|\hat X_t, \hat\theta_{t+1}) \tag{2}$$
$$\hat Z_{t+1} \sim \tau(\,\cdot\,|\hat Y_{t+1}, \hat X_t, \hat\theta_{t+1}) \tag{3}$$
$$\hat S_{t+1} = \psi(\hat\theta_{t+1}, \hat Y_{t+1}, \hat Z_{t+1}, \hat X_t, \hat S_t). \tag{4}$$
By using (2), we have altered the joint process for the state and signal relative to that given in Section 2; but we have left unaltered the conditional distributions that are the targets of our simulator. We can hit these targets using the signals recorded in the data in conjunction with simulations based on (2)-(4).

Proposition 4.5. Under Assumption 4.1, the hatted Markov process shares $\ell_t$, the conditional distribution for $(Z_t, S_t)$ conditioned on $\mathcal{Y}_t$, with the original process, constructed from the history of observations on $Y$ from the original Markov process, provided that the same prior over $\theta$ and the same initialization of the sufficient statistic vector are used.


Proof. We prove this by induction. Supposing that the result holds for $t$, we establish the result for $t+1$. Substituting the restriction given in Assumption 4.1 into (1) gives
$$\bar\ell_{t+1}(dz^*, dz, ds, d\theta) = \frac{\tau(dz^*|Y_{t+1}, D_t, z, \theta)\,\kappa(Y_{t+1}|D_t, z, \theta)\,\bar\pi(d\theta|s)\,\ell_t(dz, ds)}{\int \kappa(Y_{t+1}|D_t, z, \theta)\,\bar\pi(d\theta|s)\,\ell_t(dz, ds)}. \tag{5}$$
The numerator of this formula motivates our construction of the hatted process. At date $t$, first generate $\hat\theta_{t+1}$ conditioned on $\hat S_t$ by drawing from $\bar\pi$; next generate $\hat Y_{t+1}$ conditioned on $\hat X_t$ and $\hat\theta_{t+1}$ by drawing from $\kappa$; and finally generate $\hat Z_{t+1}$ conditioned on $\hat Y_{t+1}$, $\hat X_t$, and $\hat\theta_{t+1}$ by drawing from $\tau$. Formula (5) then conditions on $Y_{t+1}$ in addition to $\mathcal{Y}_t$. As before, we impute $\ell_{t+1}$ from
$$E\left[f(Z_{t+1}, S_{t+1})\,|\,\mathcal{Y}_{t+1}\right] = \int f\left[z^*, \psi(\theta, Y_{t+1}, z^*, D_t, z, s)\right]\bar\ell_{t+1}(dz^*, dz, ds, d\theta),$$
which holds for any Borel measurable function $f$. Therefore, $\ell_{t+1}$ is the same for our hatted construction.

    We now show how to use a simulation method to construct the particle filtering solutionfor this alternative Markov process.

Algorithm 4.6. At date $t$ there are $N$ particles. A particle is specified as $(Z_t^{[i]}, S_t^{[i]})$, where $[i]$ indexes a particle.

1. Draw $\theta_{t+1}^{[i]}$ from $\bar\pi\left(\,\cdot\,\middle|\,S_t^{[i]}\right)$.

2. Construct weights
$$w_{t+1}^{[i]} = \frac{\kappa\left(Y_{t+1}\middle|D_t, Z_t^{[i]}, \theta_{t+1}^{[i]}\right)}{\sum_{i=1}^{N}\kappa\left(Y_{t+1}\middle|D_t, Z_t^{[i]}, \theta_{t+1}^{[i]}\right)}.$$
Sample from the weighted empirical distribution: draw $(Z_t^{[i]}, S_t^{[i]}, \theta_{t+1}^{[i]})$ from a multinomial distribution with probabilities $w_{t+1}^{[i]}$.

3. Draw $Z_{t+1}^{[i]}$ from $\tau\left(dz^*\middle|Y_{t+1}, D_t, Z_t^{[i]}, \theta_{t+1}^{[i]}\right)$.

4. Construct
$$S_{t+1}^{[i]} = \psi\left(\theta_{t+1}^{[i]}, Y_{t+1}, Z_{t+1}^{[i]}, X_t^{[i]}, S_t^{[i]}\right), \qquad X_t^{[i]} = \begin{bmatrix} D_t \\ Z_t^{[i]} \end{bmatrix}.$$

5. Replace particle $(Z_t^{[i]}, S_t^{[i]})$ with $(Z_{t+1}^{[i]}, S_{t+1}^{[i]})$.

By including the additional randomness to regenerate parameters $\theta$, this algorithm avoids problems with standard particle filtering methods.
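The following Python sketch implements one date of Algorithm 4.6. The four model-specific callables are assumptions to be supplied by the user for a particular specification (for instance, the conjugate updates sketched after Condition 4.3); only the resampling and regeneration logic is dictated by the algorithm itself:

```python
import numpy as np

def particle_step(z, s, y_next, d_t, rng,
                  draw_theta, signal_density, draw_hidden, update_stats):
    """One pass of Algorithm 4.6 over N particles.

    z : (N,) hidden-state particles Z_t^{[i]}
    s : (N, k) sufficient-statistic particles S_t^{[i]}
    """
    n = len(z)
    # 1. regenerate parameters from pi_bar(. | S_t^{[i]})
    theta = np.array([draw_theta(s[i], rng) for i in range(n)])
    # 2. importance weights from kappa(Y_{t+1} | D_t, Z_t^{[i]}, theta^{[i]})
    w = np.array([signal_density(y_next, d_t, z[i], theta[i]) for i in range(n)])
    w /= w.sum()
    #    multinomial resampling of (Z, S, theta) triples
    idx = rng.choice(n, size=n, p=w)
    z, s, theta = z[idx], s[idx], theta[idx]
    # 3. propagate hidden states from tau(. | Y_{t+1}, D_t, Z_t, theta)
    z_next = np.array([draw_hidden(y_next, d_t, z[i], theta[i], rng)
                       for i in range(n)])
    # 4. update sufficient statistics S_{t+1} = psi(theta, Y, Z, X, S)
    s_next = np.array([update_stats(theta[i], y_next, z_next[i], d_t, z[i], s[i])
                       for i in range(n)])
    # 5. the new particle set
    return z_next, s_next, theta
```

Iterating `particle_step` over the recorded signals produces the particle approximations to $\ell_t$ used below.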

Under the particle approximation, the predictive density for $Y_{t+1}$ is:
$$p_t(y^*|\mathcal{Y}_t) \approx \frac{1}{N}\sum_{i=1}^{N}\kappa\left(y^*\middle|D_t, Z_t^{[i]}, \theta_{t+1}^{[i]}\right),$$


where the $\theta_{t+1}^{[i]}$'s come from the first step of the algorithm. The resulting estimate of the state density is:
$$\int q_{t+1}(z, \theta)\,\pi(d\theta) \approx \sum_{i=1}^{N} w_{t+1}^{[i]}\,\tau\left(z\middle|Y_{t+1}, D_t, Z_t^{[i]}, \theta_{t+1}^{[i]}\right).$$
The density for $\theta$ is approximated by
$$\frac{q_{t+1}(z, \theta)}{\int q_{t+1}(z, \theta)\,\pi(d\theta)} \approx \frac{1}{N}\sum_{i=1}^{N}\bar\pi\left(\theta\middle|S_{t+1}^{[i]}\right).$$

5 Utility-based Simulation

We consider utility-based adjustments for simulation. As argued by Thrun et al. (2002), the objective of a decision problem can be a useful guide on where to focus the accuracy of the approximation. In what follows we show how to use a dynamic objective to alter the evolution in a way that supports the calculation of an implied price of uncertainty.

    5.1 Recursive utility

Following Kreps and Porteus (1978) and Epstein and Zin (1989), form a process of continuation values:
$$V_t = \left[(\zeta C_t)^{1-\rho} + \exp(-\delta)\left[\mathcal{R}_t(V_{t+1})\right]^{1-\rho}\right]^{\frac{1}{1-\rho}}$$
$$\mathcal{R}_t(V_{t+1}) = \left(E\left[(V_{t+1})^{1-\gamma}\,\middle|\,X_t\right]\right)^{\frac{1}{1-\gamma}} \tag{6}$$
where $\rho > 0$ and $\delta > 0$. For now we do the calculation under the more complete information set to motivate a change of probability measure. The parameter $\zeta$ is a scale factor for the continuation value, and we will have cause to adjust this parameter in studying limiting cases. In the special case in which $\rho = 1$, the recursion as given is not well defined. It turns out that by an appropriate scaling, we can represent preferences in this special case by using the Cobb-Douglas form:¹
$$V_t = (\zeta C_t)^{1-\exp(-\delta)}\left[\mathcal{R}_t(V_{t+1})\right]^{\exp(-\delta)}. \tag{7}$$

Notice that recursion (6) maps the next-period continuation value $V_{t+1}$ and the current-period $C_t$ into the current-period continuation value $V_t$. Next we exploit homogeneity and write
$$\left(\frac{V_t}{C_t}\right)^{1-\rho} = (\zeta)^{1-\rho} + \exp(-\delta)\left[\mathcal{R}_t\left(\frac{V_{t+1}}{C_{t+1}}\cdot\frac{C_{t+1}}{C_t}\right)\right]^{1-\rho},$$
where we first divide by $C_t$ and then raise both sides to the power $1-\rho$. Finally we write
$$\exp\left(\delta\,\frac{1-\gamma}{1-\rho}\right)\left[\left(\frac{V_t}{C_t}\right)^{1-\rho} - (\zeta)^{1-\rho}\right]^{\frac{1-\gamma}{1-\rho}} = E\left[\left(\frac{V_{t+1}}{C_{t+1}}\right)^{1-\gamma}\left(\frac{C_{t+1}}{C_t}\right)^{1-\gamma}\,\middle|\,X_t\right]. \tag{8}$$

¹Represent $[1 - \exp(-\delta)](\zeta)^{1-\rho} = 1$ and take limits as $\rho$ tends to unity. This results in the Cobb-Douglas recursion (7).


There are some noteworthy special cases. First suppose that $\gamma = \rho$. In this case we obtain the expected utility recursion. Next suppose that $(\zeta)^{1-\rho}$ converges to zero and that $\exp\left(\delta\frac{1-\gamma}{1-\rho}\right)$ converges to $\exp(\bar\delta)$, defined to be the solution to the equation:
$$\exp(\bar\delta)\left(\frac{V_t}{C_t}\right)^{1-\gamma} = E\left[\left(\frac{V_{t+1}}{C_{t+1}}\right)^{1-\gamma}\left(\frac{C_{t+1}}{C_t}\right)^{1-\gamma}\,\middle|\,X_t\right]. \tag{9}$$
This choice of $\bar\delta$ by design makes the future as important as possible and thus gives an interesting limiting case.

In what follows we construct solutions for $V_t/C_t$ for a given specification of the consumption dynamics. We suppose that $\log C_{t+1} - \log C_t$ is among the components of $Y_{t+1}$, and we suppose initially that parameters and states are observed by the decision maker. This leads us to search for a solution of the form:
$$\log V_t - \log C_t = v(X_t, \theta).$$
Equation (8) and its limiting counterpart (9) now become fixed point equations for $v$. In the case of (9), let
$$e(x, \theta) = \exp\left[(1-\gamma)\,v(x, \theta)\right],$$
and rewrite equation (9) as
$$\exp(\bar\delta)\,e(x, \theta) = E\left[e(X_{t+1}, \theta)\left(\frac{C_{t+1}}{C_t}\right)^{1-\gamma}\,\middle|\,X_t = x, \theta\right],$$
which is an eigenvalue equation with a positive eigenfunction $e$ and an eigenvalue $\exp(\bar\delta)$, where $\bar\delta$ depends on the state and consumption dynamics, including the parameter vector $\theta$.
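When $X_t$ takes finitely many values, as in a discretized approximation, the eigenvalue equation reduces to finding the principal eigenvector of a positive matrix, which power iteration delivers. The sketch below assumes a transition matrix `P` and a matrix `growth` of consumption growth rates between states; both are hypothetical inputs:

```python
import numpy as np

def principal_eigen(P, growth, gamma):
    """Solve exp(delta_bar) e = T e, where
    (T e)(x) = sum_x' P[x, x'] * exp((1 - gamma) * growth[x, x']) * e(x'),
    by power iteration; returns (delta_bar, eigenfunction e)."""
    T = P * np.exp((1.0 - gamma) * growth)  # elementwise twist of the transition matrix
    e = np.ones(P.shape[0])
    lam = 1.0
    for _ in range(10_000):
        e_new = T @ e
        lam = e_new.max()       # converges to the Perron eigenvalue
        e_new /= lam
        if np.max(np.abs(e_new - e)) < 1e-12:
            e = e_new
            break
        e = e_new
    return np.log(lam), e
```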

    5.2 A convenient change of measure

We follow Hansen and Scheinkman (2009) by using this positive eigenfunction to build a change in probability measure that preserves the Markov structure.² Notice in particular that (9) implies
$$\exp(-\bar\delta)\,E\left[\frac{e(X_{t+1}, \theta)}{e(X_t, \theta)}\exp\left[(1-\gamma)(\log C_{t+1} - \log C_t)\right]\,\middle|\,X_t = x, \theta\right] = 1.$$

The positive random variable
$$\frac{M_{t+1}}{M_t} = \exp(-\bar\delta)\,\frac{e(X_{t+1}, \theta)}{e(X_t, \theta)}\exp\left[(1-\gamma)(\log C_{t+1} - \log C_t)\right]$$
is the Radon-Nikodym derivative for a change in the transition density of the Markov process. Moreover, once we initialize $M_0$ at, say, unity, we may build a so-called multiplicative martingale conditioned on the parameter vector $\theta$.

²An analogous change of measure is used in the Markov theory of large deviations.
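A quick numerical check on this construction: by the eigenvalue equation, the conditional expectation of $M_{t+1}/M_t$ should equal one in every state. Continuing the finite-state sketch above:

```python
def check_martingale(P, growth, gamma, delta_bar, e):
    # E[ exp(-delta_bar) * e(x')/e(x) * exp((1-gamma)*dlogc) | x ] should be 1
    ratio = (np.exp(-delta_bar) * np.exp((1.0 - gamma) * growth)
             * e[None, :] / e[:, None])
    return (P * ratio).sum(axis=1)   # a vector of ones up to numerical error
```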


Associated with a multiplicative martingale is a change of probability measure, as is well known from the applied probability literature. Moreover, this change of measure preserves the Markov structure. Under the change of measure, the expectation conditioned on date zero information, including the unknown parameter, is:
$$\widetilde E(R_t\,|\,X_0, \theta) = E\left[\frac{M_t}{M_0}R_t\,\middle|\,X_0, \theta\right]$$
for a random variable $R_t$ that is $\mathcal{X}_t$ measurable. The implied distortion for the filtering distribution is what is important for us. Let $\tilde q_{t+1}$ denote the density for $Z_{t+1}$ and $\theta$ conditioned on $\mathcal{Y}_{t+1}$ under this change of measure. Of course this density will depend on our initial distributions.

Proposition 5.1. If the joint distorted prior $\tilde q_0$ for $(Z_0, \theta)$ satisfies
$$\tilde q_0(z, \theta)\,\lambda(dz)\pi(d\theta) \propto \exp\left[-\eta(\theta)\right]e(D_0, z, \theta)\,q_0(z, \theta)\,\lambda(dz)\pi(d\theta)$$
for some $\eta$, then
$$\tilde q_t(z, \theta) \propto \exp\left[-t\,\bar\delta(\theta) - \eta(\theta)\right]e(D_t, z, \theta)\,q_t(z, \theta)$$
for all $t \geq 0$, where $\bar\delta$ depends on $\theta$ and the constant of proportionality depends only on $\mathcal{Y}_t$.

Proof. Recall that
$$q_{t+1}(z^*, \theta) \propto \int \tau(z^*|Y_{t+1}, D_t, z, \theta)\,\kappa(Y_{t+1}|D_t, z, \theta)\,q_t(z, \theta)\,\lambda(dz),$$
where the proportionality constant can depend on the conditioning vector $\mathcal{Y}_{t+1}$. Thus
$$\exp\left[-(t+1)\bar\delta(\theta) - \eta(\theta)\right]e(D_{t+1}, z^*, \theta)\,q_{t+1}(z^*, \theta) \propto \int \exp\left[-\bar\delta(\theta)\right]\frac{e(D_{t+1}, z^*, \theta)}{e(D_t, z, \theta)}\exp\left[(1-\gamma)(\log C_{t+1} - \log C_t)\right]\tau(z^*|Y_{t+1}, D_t, z, \theta)\,\kappa(Y_{t+1}|D_t, z, \theta)\,e(D_t, z, \theta)\exp\left[-t\,\bar\delta(\theta) - \eta(\theta)\right]q_t(z, \theta)\,\lambda(dz),$$
given that $\exp\left[(1-\gamma)(\log C_{t+1} - \log C_t)\right]$ is in the conditioning information set $\mathcal{Y}_{t+1}$ and so can be absorbed into the constant of proportionality. The first three factors in the integrand constitute the Radon-Nikodym derivative $M_{t+1}/M_t$, so the right side is the distorted-measure update applied to the induction hypothesis. Thus
$$\tilde q_{t+1}(z^*, \theta) \propto \exp\left[-(t+1)\bar\delta(\theta) - \eta(\theta)\right]e(D_{t+1}, z^*, \theta)\,q_{t+1}(z^*, \theta),$$
provided that
$$\tilde q_t(z, \theta) \propto \exp\left[-t\,\bar\delta(\theta) - \eta(\theta)\right]e(D_t, z, \theta)\,q_t(z, \theta).$$
The conclusion follows by induction, since the date zero proportionality holds by assumption.

    This result shows how the change of measure that alters the dynamic evolution affects theimplied stationary distribution. This link will prove valuable in designing filtering algorithmsthat adapt to the concerns that investors might have for model specification.


    5.3 Robustness and risk sensitivity

Since the work of Jacobson (1973) and Whittle (1990), there has been a well known connection between risk sensitivity, robustness, and large-deviation theory. Hansen and Sargent (1995), Maenhout (2004), and Hansen et al. (2006) show how to adapt the Jacobson and Whittle formulation to recursive utility. The risk aversion parameter $\gamma$ is related to an entropy penalization parameter $\xi$ used to discipline a concern about model misspecification via the formula:
$$\xi = \frac{1}{\gamma - 1},$$
which is positive provided that $\gamma > 1$. When states or parameters are unknown, robust valuation and pricing lead to the use of the robust-adjusted density:
$$\check q_t(z, \theta) \propto e(D_t, z, \theta)\,q_t(z, \theta).$$
In the case of known parameters, the stationary density that emerges as a result of the change of measure produces the correct scaling. In other words,
$$\check q_t(z, \theta) \propto \tilde q_t(z, \theta);$$
however, when parameters are unknown we must include an eigenvalue adjustment:
$$\check q_t(z, \theta) \propto \exp\left[t\,\bar\delta(\theta)\right]\tilde q_t(z, \theta)$$
because $\exp\left[t\,\bar\delta(\theta)\right]$ depends on unknown parameters. In what follows we use a recursive approach for this adjustment. Given a date $t$ numerical approximation for $\check q_t$ that includes the $\exp\left[t\,\bar\delta(\theta)\right]$ adjustment, we simulate using the distorted evolution and then make an $\exp\left[\bar\delta(\theta)\right]$ adjustment to construct $\check q_{t+1}$. This repeated adjustment implements the implicit change in the reference model under robust learning implicit in Basar and Bernhard (1989) and delineated in Hansen and Sargent (2005).

    6 Examples

We consider robust utility-based adjustments to filtering methods applied to four example economies.

    6.1 Unknown mean growth rate

    This example gives an illustration of robust Kalman filtering.

    Yt+1 = + Zt + GWt+1

    Zt+1 = AZt + BWt+1

    where X is a scalar hidden state and an unknown parameter but G, A and B are known.This can be viewed as a state space system when written as:

    Yt+1 = + Zt + GWt+1Zt+1

    =

    A 00 1

    Zt

    +

    B

    0

    Wt+1. (10)


In this system $Y_{t+1}$ is the growth rate of consumption. Kalman filtering gives recursive estimators of $Z_t$ and $\beta$. Call these $\bar Z_t$ and $\bar\beta_t$, with joint covariance matrix:
$$\begin{bmatrix} \Sigma_t^{zz} & \Sigma_t^{z\beta} \\ \Sigma_t^{z\beta} & \Sigma_t^{\beta\beta} \end{bmatrix}.$$

To construct a change of measure, we use the eigenfunction:
$$e(z) = \exp\left[-\frac{z}{\xi(1-A)}\right],$$
which is independent of $\beta$. The eigenvalue associated with this eigenfunction is
$$\exp(\bar\delta) \propto \exp\left(-\frac{1}{\xi}\beta\right),$$
where the proportionality constant does not depend on $\beta$. Then the shock $W_{t+1}$ has distorted dynamics with precision $I$ and mean conditioned on $X_t$ and $\theta$:
$$-\frac{1}{\xi}\,(G + \bar wB)'$$
where
$$\bar w = \frac{1}{1-A}.$$

This leads us to the distorted dynamics:
$$Y_{t+1} = \beta + Z_t - \frac{1}{\xi}G(G + \bar wB)' + GW_{t+1}$$
$$Z_{t+1} = AZ_t - \frac{1}{\xi}B(G + \bar wB)' + BW_{t+1},$$
where $\{W_{t+1}\}$ is a multivariate sequence of independent, standard normally distributed random vectors under the distorted measure.

Applying the Kalman filter gives estimates for $\bar\beta_t$ and $\bar Z_t$ with the same covariance matrix as from the original Kalman filter. The recursive updating equations are:
$$\bar Z_{t+1} = A\bar Z_t - \frac{1}{\xi}B(G + \bar wB)' + \frac{A\left(\Sigma_t^{zz} + \Sigma_t^{z\beta}\right) + BG'}{\Sigma_t^{zz} + 2\Sigma_t^{z\beta} + \Sigma_t^{\beta\beta} + GG'}\left[Y_{t+1} - \bar Z_t - \bar\beta_t + \frac{1}{\xi}G(G + \bar wB)'\right]$$
$$\bar\beta_{t+1} = \bar\beta_t + \frac{\Sigma_t^{z\beta} + \Sigma_t^{\beta\beta}}{\Sigma_t^{zz} + 2\Sigma_t^{z\beta} + \Sigma_t^{\beta\beta} + GG'}\left[Y_{t+1} - \bar Z_t - \bar\beta_t + \frac{1}{\xi}G(G + \bar wB)'\right] \tag{11}$$
This distorted system will produce $\bar Z_t$ sequences that are systematically negative, in contrast to the original Kalman filter. On the other hand, $\bar\beta_t$ will be distorted in a positive direction. An eigenvalue adjustment will offset this latter distortion.


We now consider a recursive specification of the eigenfunction adjustment. Let $\check Z_t$ and $\check\beta_t$ be the adjusted conditional estimates of $Z_t$ and $\beta$, with joint covariance matrix:
$$\begin{bmatrix} \check\Sigma_t^{zz} & \check\Sigma_t^{z\beta} \\ \check\Sigma_t^{z\beta} & \check\Sigma_t^{\beta\beta} \end{bmatrix}.$$

These estimates are computed recursively from the distorted dynamics, except that at each date we make an eigenvalue adjustment:
$$\check Z_{t+1} = -\frac{1}{\xi}\check\Sigma_{t+1}^{z\beta} + A\check Z_t - \frac{1}{\xi}B(G + \bar wB)' + \frac{A\left(\check\Sigma_t^{zz} + \check\Sigma_t^{z\beta}\right) + BG'}{\check\Sigma_t^{zz} + 2\check\Sigma_t^{z\beta} + \check\Sigma_t^{\beta\beta} + GG'}\left[Y_{t+1} - \check Z_t - \check\beta_t + \frac{1}{\xi}G(G + \bar wB)'\right]$$
$$\check\beta_{t+1} = -\frac{1}{\xi}\check\Sigma_{t+1}^{\beta\beta} + \check\beta_t + \frac{\check\Sigma_t^{z\beta} + \check\Sigma_t^{\beta\beta}}{\check\Sigma_t^{zz} + 2\check\Sigma_t^{z\beta} + \check\Sigma_t^{\beta\beta} + GG'}\left[Y_{t+1} - \check Z_t - \check\beta_t + \frac{1}{\xi}G(G + \bar wB)'\right]$$
The inclusion of the terms
$$-\frac{1}{\xi}\check\Sigma_{t+1}^{z\beta} \qquad \text{and} \qquad -\frac{1}{\xi}\check\Sigma_{t+1}^{\beta\beta}$$
relative to recursion (11) is an outcome of the eigenvalue adjustment. This is recognized as a special case of the robust counterpart to the Kalman filter.

To initialize the Kalman filter, suppose that the priors for $\beta$ and $Z_0$ are independent, implying that $\Sigma_0^{z\beta} = 0$. We restrict $Z_0$ to be in the stationary distribution, implying that
$$\bar Z_0 = 0, \qquad \Sigma_0^{zz} = \frac{|B|^2}{1 - A^2}.$$
Under the change of measure,
$$\check Z_0 = -\frac{|B|^2}{\xi(1 - A^2)(1 - A)}, \qquad \check\beta_0 = \bar\beta_0 - \frac{1}{\xi}\Sigma_0^{\beta\beta}.$$

We illustrate the outcome of the Kalman filter, in large part to motivate what follows. In this illustration we assume that $B = \begin{bmatrix} 0 & b \end{bmatrix}$ where $b = .0003$, $G = \begin{bmatrix} g & 0 \end{bmatrix}$ where $g = .005$, and $A = .978$. We initialize the prior for $\beta$ by setting:
$$\bar\beta_0 = .005, \qquad \Sigma_0^{\beta\beta} = (.002)^2.$$
The parameter $\xi = .23$ and $\delta = .1$. The Kalman filter iterations for the covariance matrix converge. We report the original conditional distribution for the hidden state $Z_t$ and its distorted counterpart in Figure 1. The concern for robustness shifts the state distribution to the left by reducing its mean conditioned on the history of consumption signals.
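A sketch of the adjusted recursions with the calibration just given. Because several symbols in the source are garbled, the value of `xi` below is an assumption, and the whole block should be read as a reconstruction rather than the authors' code:

```python
import numpy as np

# calibration from the text; xi is an assumed illustrative value
A, b, g = 0.978, 0.0003, 0.005
xi = 0.23
wbar = 1.0 / (1.0 - A)

def robust_kf(y, zc, bc, Sigma0):
    """Distorted Kalman recursions with the per-period eigenvalue adjustment.
    y : array of consumption growth observations
    Sigma0 : 2x2 prior covariance for (Z_0, beta)"""
    S = Sigma0.copy()
    drift_y = (g**2) / xi          # (1/xi) * G (G + wbar B)' with G B' = 0
    drift_z = (wbar * b**2) / xi   # (1/xi) * B (G + wbar B)'
    path = []
    for yt in y:
        denom = S[0, 0] + 2 * S[0, 1] + S[1, 1] + g**2
        gain_z = A * (S[0, 0] + S[0, 1]) / denom   # BG' = 0 here
        gain_b = (S[0, 1] + S[1, 1]) / denom
        innov = yt - zc - bc + drift_y
        # covariance update (identical to the undistorted filter)
        Snew = np.array([[A**2 * S[0, 0] + b**2, A * S[0, 1]],
                         [A * S[0, 1], S[1, 1]]])
        Snew -= denom * np.outer([gain_z, gain_b], [gain_z, gain_b])
        # means: distorted dynamics plus the eigenvalue-adjustment terms
        zc = -Snew[0, 1] / xi + A * zc - drift_z + gain_z * innov
        bc = -Snew[1, 1] / xi + bc + gain_b * innov
        S = Snew
        path.append((zc, bc))
    return np.array(path), S

# initialization per the text: stationary prior for Z_0, beta prior N(.005, .002^2)
Sigma0 = np.array([[b**2 / (1 - A**2), 0.0], [0.0, 0.002**2]])
zc0 = -b**2 / (xi * (1 - A**2) * (1 - A))
bc0 = 0.005 - Sigma0[1, 1] / xi
```

The final means and covariance characterize the distorted conditional distribution for the hidden state of the kind reported in Figure 1.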

Figure 1: Conditional density for the hidden state. The thick line is the density without the robustness adjustment, and the thin line adjusts for robustness.

We report the distortion in the conditional mean for consumption, scaled by the conditional standard deviation $g$, in Figure 2. When there is a unitary elasticity of intertemporal substitution, this can be interpreted as a price of uncertainty. See Hansen and Sargent (2010).


Figure 2: The time series for the one-period price of uncertainty, with the prior parameter set to 23.

Figure 3: The time series for the one-period price of uncertainty, with the prior parameter set to 0.

For the Kalman filtering model this distortion varies smoothly with respect to calendar time. There is an initial phase of active learning, after which the distortion becomes flat over time. Lowering the prior parameter diminishes the initial uncertainty. This is reflected in Figure 3, which presumes that it is set to 0.


    6.2 Conditioning on A and G

We now make the estimation a bit more ambitious by letting the shock coefficient $B = \begin{bmatrix} 0 & b \end{bmatrix}$ be estimated by the decision maker. As a consequence, we can no longer apply the Kalman filter; instead we rely on sufficient statistics in conjunction with the particle filter to compute uncertainty prices numerically. We suppose that $A$ and $G = \begin{bmatrix} g & 0 \end{bmatrix}$ are given. The eigenvalue formula of interest is:
$$\exp(\bar\delta) \propto \exp\left[-\frac{1}{\xi}\beta + \frac{1}{2\xi^2}\left(\frac{1}{1-A}\right)^2 b^2\right].$$

We impose the priors
$$Z_0 \sim N\left(0,\ \frac{b^2}{1-A^2}\right), \qquad \beta \sim N\left(\bar\beta_0,\ \Sigma_0^{\beta\beta}\right),$$
which are assumed to be independent conditioned on $b$. We assume the marginal prior on $b^2$ to be a generalized inverse Gaussian distribution with density of the form:
$$\propto \exp\left(-\omega_1\frac{1}{b^2} - \frac{\omega_2\,b^2}{2}\right)(b^2)^{-3}.$$
Following Geweke (1993), we include a term in the exponential in both $\frac{1}{b^2}$ and $b^2$. As we will see, the latter contribution has further motivation as an adjustment for robustness.³

Next we consider the distorted priors conditioned on $b$:
$$Z_0 \sim N\left(-\frac{b^2}{\xi(1-A^2)(1-A)},\ \frac{b^2}{1-A^2}\right), \qquad \beta \sim N\left(\bar\beta_0 - \frac{1}{\xi}\Sigma_0^{\beta\beta},\ \Sigma_0^{\beta\beta}\right),$$
and the marginal prior on $b^2$:
$$\propto \exp\left[-\omega_1\frac{1}{b^2} - \frac{\omega_2\,b^2}{2} - \frac{1}{2\xi^2}\left(\frac{1}{1-A}\right)^2 b^2\right](b^2)^{-3}.$$

For the purposes of building sufficient statistics we treat the hidden state as data, and we use two contributions to the likelihood:
$$\exp\left[\frac{\beta}{g^2}\sum_{u=0}^{t-1}(Y_{u+1} - Z_u) - \frac{t\,\beta^2}{2g^2}\right]$$
and
$$\exp\left[-\frac{1}{2b^2}\sum_{u=0}^{t-1}(Z_{u+1} - AZ_u)^2\right](b^2)^{-t/2}.$$

³Geweke (1993) does not consider models with hidden states, and the estimation problem he explores is thus distinct from ours.


This leads us to construct the sufficient statistics:
$$S^{[1]}_{t+1} = \sum_{u=0}^{t}\frac{1}{g^2}(Y_{u+1} - Z_u) + S^{[1]}_0 = \frac{1}{g^2}(Y_{t+1} - Z_t) + S^{[1]}_t$$
$$S^{[2]}_{t+1} = (t+1)\,\frac{1}{g^2} + S^{[2]}_0 = \frac{1}{g^2} + S^{[2]}_t$$
$$S^{[3]}_{t+1} = \sum_{u=0}^{t}\frac{1}{2}(Z_{u+1} - AZ_u)^2 + S^{[3]}_0 = \frac{1}{2}(Z_{t+1} - AZ_t)^2 + S^{[3]}_t$$
$$S^{[4]}_{t+1} = t + 1 + S^{[4]}_0 = 1 + S^{[4]}_t,$$
where⁴
$$S^{[1]}_0 = \frac{\bar\beta_0}{\Sigma_0^{\beta\beta}}, \qquad S^{[2]}_0 = \frac{1}{\Sigma_0^{\beta\beta}}, \qquad S^{[3]}_0 = 1, \qquad S^{[4]}_0 = 23.$$

Then the date $t$ posteriors are:
$$\beta \sim N\left(\frac{S^{[1]}_t}{S^{[2]}_t},\ \frac{1}{S^{[2]}_t}\right)$$
and, for $b^2$, the density proportional to
$$\exp\left[-S^{[3]}_t\frac{1}{b^2} - \frac{\omega_2\,b^2}{2}\right](b^2)^{-S^{[4]}_t/2}.$$
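Sampling from these posteriors is what step 1 of Algorithm 4.6 requires for this example. Below is a sketch using scipy's generalized inverse Gaussian sampler; the mapping of $(p, \chi, \psi)$ into `geninvgauss`'s `(p, b, scale)` arguments follows the standard convention and, like the helper names, is an assumption to verify before use:

```python
import numpy as np
from scipy.stats import geninvgauss

def update_suff_stats(S, y_next, z_next, z, A=0.978, g=0.005):
    # the four recursions for S[1]..S[4] given above
    S1, S2, S3, S4 = S
    return (S1 + (y_next - z) / g**2,
            S2 + 1.0 / g**2,
            S3 + 0.5 * (z_next - A * z)**2,
            S4 + 1.0)

def draw_parameters(S, omega2, rng):
    # posterior for b^2 is GIG with density proportional to
    #   exp(-S3 / b2 - omega2 * b2 / 2) * (b2)^(-S4/2),
    # i.e. GIG(p = 1 - S4/2, chi = 2*S3, psi = omega2); beta | S is normal
    S1, S2, S3, S4 = S
    p, chi, psi = 1.0 - S4 / 2.0, 2.0 * S3, omega2
    b2 = geninvgauss.rvs(p, np.sqrt(chi * psi), scale=np.sqrt(chi / psi),
                         random_state=rng)
    beta = rng.normal(S1 / S2, np.sqrt(1.0 / S2))
    return beta, b2
```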

When we use the distorted law of motion we use the likelihood contributions:
$$\exp\left[\frac{\beta}{g^2}\sum_{u=0}^{t-1}\left(Y_{u+1} - Z_u + \frac{g^2}{\xi}\right) - \frac{t\,\beta^2}{2g^2}\right]$$
and
$$\exp\left[-\frac{1}{2b^2}\sum_{u=0}^{t-1}\left(Z_{u+1} - AZ_u + \frac{b^2}{\xi(1-A)}\right)^2\right](b^2)^{-t/2}.$$
Notice that
$$\exp\left[-\frac{1}{2b^2}\sum_{u=0}^{t-1}\left(Z_{u+1} - AZ_u + \frac{b^2}{\xi(1-A)}\right)^2\right](b^2)^{-t/2} \propto \exp\left[-\frac{1}{2b^2}\sum_{u=0}^{t-1}(Z_{u+1} - AZ_u)^2 - \frac{t\,b^2}{2\xi^2}\left(\frac{1}{1-A}\right)^2\right](b^2)^{-t/2},$$
where the omitted factor depends on the $Z$'s and $A$ but not on $(\beta, b^2)$.

We now use the sufficient statistics
$$\tilde S^{[1]}_{t+1} = \frac{1}{g^2}\left(Y_{t+1} - Z_t + \frac{g^2}{\xi}\right) + \tilde S^{[1]}_t$$
$$\tilde S^{[2]}_{t+1} = \frac{1}{g^2} + \tilde S^{[2]}_t$$
$$\tilde S^{[3]}_{t+1} = \frac{1}{2}(Z_{t+1} - AZ_t)^2 + \tilde S^{[3]}_t$$
$$\tilde S^{[4]}_{t+1} = 1 + \tilde S^{[4]}_t$$

⁴It is sometimes more convenient to use $S^{[1]}_{t+1}/S^{[2]}_{t+1}$ in place of $S^{[1]}_{t+1}$ and $S^{[3]}_{t+1}/S^{[4]}_{t+1}$ in place of $S^{[3]}_{t+1}$ as sufficient statistics.


where
$$\tilde S^{[1]}_0 = \frac{\bar\beta_0 - \frac{1}{\xi}\Sigma_0^{\beta\beta}}{\Sigma_0^{\beta\beta}}, \qquad \tilde S^{[2]}_0 = S^{[2]}_0 = \frac{1}{\Sigma_0^{\beta\beta}}, \qquad \tilde S^{[3]}_0 = S^{[3]}_0 = 1, \qquad \tilde S^{[4]}_0 = S^{[4]}_0 = 23.$$
Notice that $\tilde S^{[j]}_t = S^{[j]}_t$ for $j = 2, 3, 4$.

Then the date $t$ posteriors are:
$$\beta \sim N\left(\frac{\tilde S^{[1]}_t}{\tilde S^{[2]}_t},\ \frac{1}{\tilde S^{[2]}_t}\right)$$
and, for $b^2$, the density proportional to
$$\exp\left[-\tilde S^{[3]}_t\frac{1}{b^2} - \frac{\omega_2\,b^2}{2} - \frac{t+1}{2\xi^2}\left(\frac{1}{1-A}\right)^2 b^2\right](b^2)^{-\tilde S^{[4]}_t/2}.$$

The distorted law of motion, however, is not what we use in our analysis. Instead we now must make an eigenvalue adjustment. This leads us to the checked evolution. Suppose the date $t$ density $\check q_t$ has been computed and that there is an associated vector of sufficient statistics $\check S^{[j]}_t$ for $j = 1, 2, 3, 4$. We use $\exp(\bar\delta)$ to alter the one-period evolution and hence the construction of the sufficient statistics. We construct
$$\check S^{[1]}_{t+1} = \check S^{[1]}_t - \frac{1}{\xi} + \frac{1}{g^2}\left(Y_{t+1} - Z_t + \frac{g^2}{\xi}\right) = \check S^{[1]}_t + \frac{1}{g^2}(Y_{t+1} - Z_t)$$
$$\check S^{[2]}_{t+1} = \tilde S^{[2]}_{t+1} = S^{[2]}_{t+1}$$
$$\check S^{[3]}_{t+1} = \tilde S^{[3]}_{t+1} = S^{[3]}_{t+1}$$
$$\check S^{[4]}_{t+1} = \tilde S^{[4]}_{t+1} = S^{[4]}_{t+1}.$$

The sufficient statistic $\check S^{[1]}_{t+1}$ gets updated just as $S^{[1]}_{t+1}$ but starts from a different initial condition. Then the date $t+1$ distributions for the parameters and hidden states are:
$$Z_{t+1} \sim N\left(AZ_t - \frac{b^2}{\xi(1-A)},\ b^2\right),$$
$$\beta \sim N\left(\frac{\check S^{[1]}_{t+1}}{\check S^{[2]}_{t+1}},\ \frac{1}{\check S^{[2]}_{t+1}}\right),$$
and, for $b^2$, the density proportional to
$$\exp\left[-\check S^{[3]}_{t+1}\frac{1}{b^2} - \frac{\omega_2\,b^2}{2} - \frac{1}{2\xi^2}\left(\frac{1}{1-A}\right)^2 b^2\right](b^2)^{-\check S^{[4]}_{t+1}/2},$$

where the first distribution is conditioned on $b^2$.

When we use partial smoothing we must condition on more $Y$ observations than $Z$ observations. For this example, we just use a later date for the sufficient statistics based on $Y$ than for those based on $Z$. Thus we use the likelihood:
$$\exp\left[\frac{\beta}{g^2}\sum_{u=0}^{t-1}(Y_{u+1} - Z_u) - \frac{t\,\beta^2}{2g^2}\right]\exp\left[-\frac{1}{2b^2}\sum_{u=0}^{t-2}(Z_{u+1} - AZ_u)^2\right](b^2)^{-(t-1)/2},$$
and we combine $S^{[1]}_{t+1}$ and $S^{[2]}_{t+1}$ in conjunction with $S^{[3]}_t$ and $S^{[4]}_t$, and similarly for the tilde and check counterparts.


We report the one-period uncertainty prices in Figure 4. While there continues to be an initial phase in which parameter learning is prominent, the time series of uncertainty prices after this phase has some prominent peaks associated with recessions. Again, a reduction in the prior parameter diminishes the impact of the learning phase, as is evident in Figure 5. In this latter case we see a notable increase in the uncertainty prices in recent time periods.

Figure 4: The time series for the one-period price of uncertainty when investors are uncertain about $\beta$, $B$, and the hidden state process $Z$. For this specification the prior parameter is set to 23.


Figure 5: The time series for the one-period price of uncertainty when investors are uncertain about $\beta$, $B$, and the hidden state process $Z$. For this specification the prior parameter is set to 0.

    6.3 Conditioning only on G

Next we include the estimation of the parameter $A$ in the analysis. For conceptual simplicity, suppose that we have a discrete distribution over alternative values of $A$. To infer the distortion for this distribution, notice that the scale factor for the generalized inverse Gaussian prior for $b^2$ depends on the parameter $A$, where we now view this density as a conditional density. The scale factor for the generalized inverse Gaussian is well known.⁵ This scale dependence is offset when constructing the distorted prior distribution for $A$. We consider a uniform distribution for $A$ defined on the interval $[.95, .99]$. Instead of using a discrete grid of points, we constructed a truncated normal density for $A$ that was essentially flat.

Robust learning induces the economic agents to distort priors towards large values of $A$, and this is reflected in the upper two plots of Figure 6. By the end of the sample the distorted distribution is largely concentrated at the endpoint.

⁵It depends on a modified Bessel function of the second kind.


Figure 6: Histograms for the distribution of $A$. The first column gives the prior and slanted prior at the initial date, and the second column gives the end-of-sample posterior and slanted posterior.

The time series for the price of uncertainty is reported in Figure 7.


Figure 7: The time series for the one-period price of uncertainty when investors are uncertain about $\beta$, $A$, $B$, and the hidden state process $Z$. For this specification the prior parameter is set to 23.

We next consider sensitivity to the choice of the prior value of this parameter. As an alternative we set it to 0. Prior and posterior histograms for the autoregressive parameter $A$ are given in Figure 8. The distorted distribution collapses more slowly to the upper bound of .99.


Figure 8: Histograms for the distribution of $A$ when the prior parameter is set to 0. The first column gives the prior and slanted prior at the initial date, and the second column gives the end-of-sample posterior and slanted posterior.

    The important difference in the uncertainty prices is that the initial prices are now moremodest than in our earlier calculations.


Figure 9: The time series for the one-period price of uncertainty when investors are uncertain about $\beta$, $A$, $B$, and the hidden state process $Z$. For this specification the prior parameter is set to 0.

    6.4 Stochastic volatility

    To be added.


    References

Basar, Tamer S. and Pierre Bernhard, eds. 1989. Differential Games and Applications. Lecture Notes in Control and Information Sciences. Springer.

Carvalho, C., M. Johannes, H. Lopes, and N. Polson. 2010. Particle Learning and Smoothing. Statistical Science 25.

DeJong, David N., Hariharan Dharmarajan, Roman Liesenfeld, Guilherme V. Moura, and Jean-Francois Richard. 2010. Efficient Likelihood Evaluation of State-Space Representations. University of Pittsburgh and Universitat Kiel.

Epstein, L. and S. Zin. 1989. Substitution, Risk Aversion and the Temporal Behavior of Consumption and Asset Returns: A Theoretical Framework. Econometrica 57 (4):937-969.

Fearnhead, P. 2002. MCMC, Sufficient Statistics and Particle Filters. Journal of Computational and Graphical Statistics 11:848-862.

Geweke, John. 1993. A note on some limitations of CRRA utility. Economics Letters 71:341-345.

Hansen, Lars Peter. 2007. Beliefs, Doubts and Learning: Valuing Macroeconomic Risk. American Economic Review 97 (2):1-30.

Hansen, Lars Peter and Thomas J. Sargent. 1995. Discounted Linear Exponential Quadratic Gaussian Control. IEEE Transactions on Automatic Control 40.

Hansen, Lars Peter and Thomas J. Sargent. 2005. Robust estimation and control under commitment. Journal of Economic Theory 124 (2):258-301.

Hansen, Lars Peter and Thomas J. Sargent. 2007. Recursive robust estimation and control without commitment. Journal of Economic Theory 136 (1):1-27.

Hansen, Lars Peter and Thomas J. Sargent. 2010. Fragile Beliefs and the Price of Uncertainty. Quantitative Economics 1 (1):129-162.

Hansen, Lars Peter and Jose Alexandre Scheinkman. 2009. Long-term Risk: An Operator Approach. Econometrica 77 (1):177-234.

Hansen, Lars Peter, Thomas J. Sargent, Gauhar A. Turmuhambetova, and Noah Williams. 2006. Robust Control and Model Misspecification. Journal of Economic Theory 128:45-90.

Jacobson, David H. 1973. Optimal Stochastic Linear Systems with Exponential Performance Criteria and Their Relation to Deterministic Differential Games. IEEE Transactions on Automatic Control AC-18:124-131.

Johannes, Michael S. and Nick Polson. 2007. Exact Particle Filtering and Parameter Learning. University of Chicago and Columbia University.

Kreps, David M. and Evan L. Porteus. 1978. Temporal Resolution of Uncertainty and Dynamic Choice. Econometrica 46 (1):185-200.

Maenhout, Pascal J. 2004. Robust Portfolio Rules and Asset Pricing. Review of Financial Studies 17 (4):951-983.

Storvik, G. 2002. Particle Filters for State-Space Models with the Presence of Unknown Static Parameters. IEEE Transactions on Signal Processing 50:281-289.

Thrun, Sebastian, John Langford, and Vandi Verma. 2002. Risk Sensitive Particle Filters. In Advances in Neural Information Processing Systems 14, vol. 2, edited by Thomas Dietterich, Suzanna Becker, and Zoubin Ghahramani. Cambridge, MA: MIT Press.

Whittle, P. 1969. A View of Stochastic Control Theory. Journal of the Royal Statistical Society, Series A (General) 132 (3):320-334.

Whittle, Peter. 1990. Risk Sensitive and Optimal Control. West Sussex, England: John Wiley and Sons.