Particle Filters and Bayesian Inference in Financial...

42
Particle Filters and Bayesian Inference in Financial Econometrics HEDIBERT F. LOPES* AND RUEY S. TSAY University of Chicago Booth School of Business, Chicago, IL, USA ABSTRACT In this paper we review sequential Monte Carlo (SMC) methods, or particle filters (PF), with special emphasis on its potential applications in financial time series analysis and econometrics. We start with the well-known normal dynamic linear model, also known as the normal linear state space model, for which sequential state learning is available in closed form via standard Kalman filter and Kalman smoother recursions. Particle filters are then introduced as a set of Monte Carlo schemes that enable Kalman-type recursions when normality or linearity or both are abandoned. The seminal bootstrap filter (BF) of Gordon, Salmond and Smith (1993) is used to introduce the SMC jargon, potentials and limitations. We also review the literature on parameter learning, an area that started to attract much attention from the particle filter community in recent years. We give particular attention to the Liu–West filter (2001), Storvik filter (2002) and particle learning (PL) of Carvalho, Johannes, Lopes and Polson (2010). We argue that the BF and the auxiliary particle filter (APF) of Pitt and Shephard (1999) define two fundamentally distinct directions within the particle filter lit- erature. We also show that the distinction is more pronounced with parameter learning and argue that PL, which follows the APF direction, is an attractive extension. One of our contributions is to sort out the research from BF to APF (during the 1990s), from APF to now (the 2000s) and from Liu–West filter to Storvik filter to PL. To this end, we provide code in R for all the examples of the paper. Readers are invited to find their own way into this dynamic and active research arena. Copyright © 2010 John Wiley & Sons, Ltd. key words particle learning; sequential Monte Carlo; Markov chain Monte Carlo; stochastic volatility; realized volatility; Nelson–Siegel model INTRODUCTION The Kalman filter (KF) and its many variants and generalizations have played a fundamental role in modern time series analysis by allowing the study and estimation of complex dynamics and by drawing the attention of researchers and practitioners to the rich class of state-space models, also known as dynamic models (Harvey, 1989; West and Harrison, 1997). Well-known and widely used Journal of Forecasting J. Forecast. 30, 168–209 (2011) Published online 30 July 2010 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/for.1195 Copyright © 2010 John Wiley & Sons, Ltd. * Correspondence to: Hedibert F. Lopes, University of Chicago Booth School of Business, 5807 South Woodlawn Avenue, Chicago, IL 60637, USA. E-mail: [email protected]

Transcript of Particle Filters and Bayesian Inference in Financial...

  • Particle Filters and Bayesian Inference in Financial Econometrics

    HEDIBERT F. LOPES* AND RUEY S. TSAYUniversity of Chicago Booth School of Business, Chicago, IL, USA

    ABSTRACTIn this paper we review sequential Monte Carlo (SMC) methods, or particle fi lters (PF), with special emphasis on its potential applications in fi nancial time series analysis and econometrics. We start with the well-known normal dynamic linear model, also known as the normal linear state space model, for which sequential state learning is available in closed form via standard Kalman fi lter and Kalman smoother recursions. Particle fi lters are then introduced as a set of Monte Carlo schemes that enable Kalman-type recursions when normality or linearity or both are abandoned. The seminal bootstrap fi lter (BF) of Gordon, Salmond and Smith (1993) is used to introduce the SMC jargon, potentials and limitations. We also review the literature on parameter learning, an area that started to attract much attention from the particle fi lter community in recent years. We give particular attention to the Liu–West fi lter (2001), Storvik fi lter (2002) and particle learning (PL) of Carvalho, Johannes, Lopes and Polson (2010). We argue that the BF and the auxiliary particle fi lter (APF) of Pitt and Shephard (1999) defi ne two fundamentally distinct directions within the particle fi lter lit-erature. We also show that the distinction is more pronounced with parameter learning and argue that PL, which follows the APF direction, is an attractive extension. One of our contributions is to sort out the research from BF to APF (during the 1990s), from APF to now (the 2000s) and from Liu–West fi lter to Storvik fi lter to PL. To this end, we provide code in R for all the examples of the paper. Readers are invited to fi nd their own way into this dynamic and active research arena. Copyright © 2010 John Wiley & Sons, Ltd.

    key words particle learning; sequential Monte Carlo; Markov chain Monte Carlo; stochastic volatility; realized volatility; Nelson–Siegel model

    INTRODUCTION

    The Kalman fi lter (KF) and its many variants and generalizations have played a fundamental role in modern time series analysis by allowing the study and estimation of complex dynamics and by drawing the attention of researchers and practitioners to the rich class of state-space models, also known as dynamic models (Harvey, 1989; West and Harrison, 1997). Well-known and widely used

    Journal of ForecastingJ. Forecast. 30, 168–209 (2011)Published online 30 July 2010 in Wiley Online Library(wileyonlinelibrary.com) DOI: 10.1002/for.1195

    Copyright © 2010 John Wiley & Sons, Ltd.

    * Correspondence to: Hedibert F. Lopes, University of Chicago Booth School of Business, 5807 South Woodlawn Avenue, Chicago, IL 60637, USA. E-mail: [email protected]

  • Particle Filters and Bayesian Inference 169

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    variants of KF include (i) the extended KF (Jazwinski, 1970; West et al., 1985); (ii) the Gaussian sum fi lter (Alspach and Sorenson, 1972); (iii) the unscented KF (Julier and Uhlmann, 1997; Van der Merwe et al., 2000); and (iv) the Gaussian quadrature KF (Ito and Xiong, 2000).

    Despite their wide applicability, approximations provided by these variants become less effective when substantial nonlinearities and/or extreme non-Gaussianity are present in the data. To overcome the diffi culty, the last two decades have been exposed to an increasing number of Monte Carlo (MC)-based approximations for state-space models. These MC methods are basically divided into two major categories: Markov chain Monte Carlo (MCMC) schemes for offl ine/batch sampling and sequential Monte Carlo (SMC) schemes for online/sequential sampling. For example, Carlin et al. (1992), Carter and Kohn (1994), Frühwirth-Schnatter (1994) and Shephard (1994) propose MCMC methods to esti-mate general state-space models. However, MCMC-based algorithms are prohibitively costly when performing online estimation of states and parameters; see Gamerman and Lopes (2006).

    SMC methods, also known as particle fi lters, are MC schemes that, when used in the state-space context, rebalance draws from the posterior distribution of the states and parameters at a given time (the particles) based on the next observation via its likelihood. In their seminal paper, Gordon et al. (1993) propose one of the most popular fi lters, the bootstrap fi lter (BF), which is based on a sampling importance resampling (SIR) argument (Smith and Gelfand, 1992). Also infl uential from the start are the works on sequential Bayesian imputation by Kong et al. (1994) and Liu and Chen (1995).

    In this paper we review the bootstrap fi lter and its variants. We also introduce the auxiliary particle fi lter (APF) of Pitt and Shephard (1999) (see also the discussion in Liu and Chen, 1998) and argue that both fi lters defi ne two directions within the SMC literature, namely sample–resample and resa-mple–sample methods. This is done in the third section, which ends with a list of additional review papers and books on SMC. The fourth section starts by showing how both BF and APF can be used to approximate the likelihood function of fi xed parameters. We then introduce the Liu and West fi lter (Liu and West, 2001), which generalizes APF to sequentially update the posterior distributions of parameters. The section also introduces the particle learning (PL) of Carvalho et al. (2010). Some illustrative examples appear in the fi fth section. Final remarks and current research directions are presented in the sixth section.

    NORMAL DYNAMIC LINEAR MODEL

    To introduce the ideas of the particle fi lter, let us start with the well-known normal dynamic linear model (NDLM):

    y F x vt t t t= ′ + (1)

    x G x wt t t t= +−1 (2)

    where vt and wt are temporally and mutually independent Gaussian sequences with zero mean and variances σ 2t and τ 2t , respectively. Equation (1) is referred to as the observation equation that relates the observed series yt to the state vector xt. Equation (2) is the state transition equation that governs the time evolution of the state, which might be latent. The local level and the local linear trend models are special cases of the NDLM. In the local level model, yt = xt + vt and xt = xt−1 + wt, so that Ft = Gt = 1, σ 2t = σ2 and τ 2t = τ2 for all t. In the local linear trend model, yt = x1t + vt, x1t = x1,t−1 + x2,t−1 + w1t and x2t = x2,t−1 + w2t, and we have xt = (x1t, x2t)′, Ft = (1, 0)′, G = (g1, g2), g1 = (1, 0)′, g2 = (1, 1)′, σ 2t = σ2 and τ 2t = τ for all t, where τ is a 2 × 2 positive defi nite matrix.

  • 170 H. F. Lopes and R. S. Tsay

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    Conditionally on the quadruple {Ft, Gt, σ 2t, τ 2t}, for t = 1, . . . , T, and on the initial distribution (x0|y0) ~ N (m0, C0), it is straightforward to show that

    x y N a Rt t t t− ( )1 ∼ , (3)

    y y N f Qt t t t− ( )1 ∼ , (4)

    x y N m Ct t t t∼ ,( ) (5)

    for t = 1, . . . , T, where yt = (y1, . . . , yt)′ and N(a, b) denotes the normal distribution with mean a and variance b. The three densities in equations (3)–(5) are referred to as the propagation density, predictive density and fi ltering density, respectively. In fact, the propagation and fi ltering densities are the prior density of xt given yt−1 and the posterior density of xt given yt. The means and variances of the three densities are provided by the Kalman recursions:

    a G m R G C Gt t t t t t t t= = ′ +− −1 1 2and τ (6)

    f F a Q F R Ft t t t t t t t= ′ = ′ +and σ 2 (7)

    m a A e C R A Q At t t t t t t t t= + = − ′and (8)

    where et = yt − ft is the prediction error and At = RtFtQt−1 is the Kalman gain. Two other useful densi-ties are the conditional and marginal smoothed densities:

    x x y N h Ht t t t t+ ( )1, ,∼ (9)

    x y N m Ct T tT tT∼ ,( ) (10)

    where

    h m B x a H C B R Bt t t t t t t t t t= + −( ) = − ′+ + +1 1 1and (11)

    m m B m a C C B R CtT t t tT t tT t t t tT= + −( ) = − −( )+ + + +1 1 2 1 1and (12)

    and Bt = CtG′t+1R −1t+1. See West and Harrison (1997, Ch. 4) for additional details.An important and rich subclass of the NDLM assumes that Ft and Gt are both known, while σ 2t =

    σ2 and τ 2t = τ2 are both unknown variances. In this case, the above Kalman recursions can be used to marginalize out the states based on equation (4), i.e.,

    p y r f y f QT t t tt

    T

    σ 2 21

    , ; ,( ) = ( )=

    ∏ (13)

    where f(x; μ, σ2) is the density of a normal random variable with mean μ and variance σ2 evaluated at x. Note that here ft and Qt are both nonlinear functions of (σ2, τ2). In other words, should the main objective be sampling from p(xT, σ2, τ2|yT), then draws can be obtained in two steps:

    1. Draw (σ2, τ2) from p(σ2, τ2|yT), which is proportional to the prior p(σ2, τ2) times the likelihood from equation (12).

  • Particle Filters and Bayesian Inference 171

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    2. Draw xT from p(xT|σ2, τ2) by fi rst computing forward moments via equations (6)–(8) and (11), and then sampling backward xt conditional on xt+1 and yt via equation (9).

    Sampling (σ2, τ2) from step 1 can be performed by SIR, acceptance–rejection or Metropolis–Hastings-type algorithms or replaced by a Gibbs step that draws (σ2, τ2) conditional on (yT, xT). Reis et al. (2006) compare the performance of these sampling and other MCMC schemes in the context of the local level model. Step 2 is known as the forward fi ltering, backward sampling (FFBS) algo-rithm (Carter and Kohn, 1994; Frühwirth-Schnatter, 1994).

    For the NLDM, all densities needed for making inference are well known and they can easily be carried out in applications. On the other hand, diffi culties arise when the model is nonlinear or non-Gaussian, because no closed-form densities are available. As we discuss below, particle fi lters provide an effective approach to overcoming the diffi culty.

    BASIC PARTICLE FILTERS

    Let us consider a more general dynamic model where the assumptions of normality and/or linearity are relaxed. The observation and state transition equations become

    y x p y xt t t t∼ ( ),

    x x p x x tt t t t− −( ) =1 1 1 2∼ , , , . . .

    Denote the initial probability density of the state by p(x0). All static parameters, such as σ2 and τ2 from the previous section, are assumed to be known throughout this section. Batch and sequential parameter learning are deferred to the next section. The Kalman recursions from equations (3) and (5) are now replaced, respectively, by

    p x y p x x p x y xt t t t t t t− − − − −( ) = ( ) ( )∫1 1 1 1 1d (14)and

    p x yp y x p x y

    p y yt

    t t t tt

    tt

    ( ) = ( ) ( )( )−

    1

    1 (15)

    In most real-world applications, outside the realm of NDLM, the integration with respect to xt−1 in (14) and the implementation of Bayes’ theorem in (15) are both analytically intractable and/or computationally costly. As mentioned in the Introduction, there exist approximations for sequential state estimation and fi ltering based on Kalman-like fi lters, such as the extended Kalman fi lter and the unscented Kalman fi lter. Also, as discussed in the Introduction, there exist several MCMC-type samplers for batch estimation of the whole state vector and parameters similar to the FFBS intro-duced in the previous section.

    Particle fi lters, loosely speaking, combine the sequential estimation nature of Kalman-like fi lters with the fl exibility for modeling of MCMC samplers, while avoiding some of the their shortcomings. On the one hand, like MCMC samplers and unlike Kalman-like fi lters, particle fi lters are designed to allow for more fl exible observational and evolutional dynamics and distributions. On the other

  • 172 H. F. Lopes and R. S. Tsay

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    hand, like Kalman-like fi lters and unlike MCMC samplers, particle fi lters provide online fi ltering and smoothing distributions of states and parameters.

    The goal of most particle fi lters is to draw a set of i.i.d. particles {xt(i)}Ni=1 that approximates p(xt|yt) by starting with a set of i.i.d. particles {x (i)t−1}Ni=1 that approximates p(xt−1|yt−1).

    The most popular fi lters are the bootstrap fi lter (BF), also known as the sequential importance sam-pling with resampling (SISR) fi lter, proposed by Gordon et al. (1993), and the auxiliary particle fi lter (APF), also known as the auxiliary SIR (ASIR) fi lter, proposed by Pitt and Shephard (1999). However, it is worth mentioning that one of the earliest sequential Monte Carlo algorithms was proposed by West (1992). For recent discussion regarding the similarities and differences between BF and APF see, for instance, Carvalho et al. (2010), Doucet and Johansen (2008), and Douc et al. (2009b).

    Propagate–resample fi ltersThe BF of Gordon et al. (1993) is based on sequential SIR steps over time (Smith and Gelfand, 1992). The Kalman recursions from (14) and (15) are combined in

    p x x y y p y x p x x p x yt t t

    tt t t t t

    t, ,.

    −−

    − −−( ) ∝ ( ) ( )1 1

    2

    1 11

    Resample��� �� (( )

    1. Propagate� ���� ���� (16)

    In other words, the BF fi rst propagates particles from the posterior at time t − 1 in order to generate particles from the prior at time t. Then it resamples the propagated particles with weights proportional to their likelihoods. This is Algorithm 1 below, whose recursions are illustrated in Figure 1.

    Resample–propagate fi ltersSimilarly, the APF fi rst resamples particles from the posterior at time t − 1 with weights taking into account the next observed data point, yt. Then it propagates the resampled particles. The identity from equation (16) is rewritten as

    Figure 1. Bootstrap fi lter. A schematic representation of the bootstrap fi lter over two time periods. The squares are yt+1 and yt+2. From top to bottom, the fi rst, second, fourth and fi fth set of dots represent particles, while the third and sixth set of dots represent particle weights

  • Particle Filters and Bayesian Inference 173

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    p x x y y p x x y p y x pt t t

    tt t

    tt t, , ,

    .

    −−

    − −( ) ∝ ( ) ( )1 1 12

    1

    Propagate� ��� ��� xx yt

    t−

    −( )1 11. Resample

    � ���� ���� (17)

    The main diffi culty with the APF is that, in most applications, neither is p(yt|xt−1) available for pointwise evaluation (resampling) nor is p(xt|xt−1, yt) available for sampling (propagation). The APF is fully adapted when these conditions are satisfi ed. The main suggestion in Pitt and Shephard (1999) for general state-space models is as follows:

    (a) use p(yt|g(xt−1)), i.e. the data density p(yt|xt) evaluated at g(xt−1) (usually the expected value, median or mode of the state transition density p(xt|xt−1)) as the proposal weight to resample the old particle xt−1; and

    (b) use q(xt|xt−1, yt) ≡ p(xt|xt−1) as the proposal density to propagate resampled particles to the new set of particles {xt(i)}Ni=1 Note that here q(·) is blind since it does not incorporate the current obser-vations yt. See below for further details on better ways of choosing q(·).

    Since both resampled and propagated particles come from proposal densities, it follows directly from a simple SIR argument that these particles have weights given by

    wp y x p x x p x y

    p y g x p x x p x yt

    t t t t tt

    t t t t tt

    ∝( ) ( ) ( )

    ( )( ) ( )− −

    − − −

    1 11

    1 1 1−−

    ( )

    =( )

    ( )( )

    1

    1

    p y x

    p y g xt t

    t t

    (18)

    This leads to Algorithm 2 below.

    Algorithm 1: Bootstrap fi lter (BF)1. Propagate {x (i)t−1}Ni=1 to {x̃t(i)}Ni=1 via p(xt|xt−1).2. Resample {xt(i)}Ni=1 from {x̃t(i)}Ni=1 with weights wt(i) ∝ p(yt|x̃t(i)).

    Algorithm 2: Auxiliary particle fi lter (APF)1. Resample {x̃ (i)t−1}Ni=1 from {x (i)t−1}Ni=1 with weights wt(i) ∝ p(yt|g(x (i)t−1)).2. Propagate {x̃ (i)t−1}Ni=1 to {x̃t(i)}Ni=1 via p(xt|x̃t−1).3. Resample {xt(i)}Ni=1 from {x̃t(i)}Ni=1 with weights wt(i) ∝ p(yt|x̃t(i))/p(yt|g(x̃(i)t−1)).

    Algorithms 3 and 4 below are the optimal and fully adapted versions of BF and APF when p(yt|xt−1) is analytically tractable and p(xt|xt−1, yt) easy to sample from.

    Algorithm 3: Optimal bootstrap fi lter (OBF)1. Propagate {x(i)t−1}Ni=1 to {x̃t(i)}Ni=1 via p(xt|xt−1, yt).2. Resample {xt(i)}Ni=1 from {x̃t(i)}Ni=1 with weights wt(i) ∝ p(yt|x (i)t−1).

    Algorithm 4: Optimal auxiliary particle fi lter (OAPF)1. Resample {x̃ (i)t−1}Ni=1 from {x (i)t−1}Ni=1 with weights wt(i) ∝ p(yt|x (i)t−1).2. Propagate {x̃ (i)t−1}Ni=1 to {xt(i)}Ni=1 via p(xt|x̃t−1, yt).

    Choosing the proposalImprovements on the basic particle fi lter algorithm include the use of better proposal distributions in the importance sampling stage (Pitt and Shephard, 1999), the use of Markov chain Monte Carlo

  • 174 H. F. Lopes and R. S. Tsay

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    sampling (Gilks and Berzuini, 2001; Fearnhead, 2002; Fearnhead and Clifford, 2003) and the use of resampling (Liu and Chen, 1995; Carpenter et al., 1999), which can be important to avoid having only a small number of particles with non-negligible weight.

    Pitt and Shephard (1999) suggest local linearization of the observation equation (via an extended Kalman fi lter-type approximation) in order to construct a proposal propagation density, say q(xt|xt−1, yt), for the OAPF propagation density p(xt|xt−1, yt), which takes into account the current observation yt and, potentially, outperforms the naïve blind propagation proposal density p(xt|xt−1). See Liu and Chen (1995), Carpenter et al. (1999), Gilks and Berzuini (2001), Doucet et al. (2000), Fearnhead (2002) and Guo et al. (2005), amongst others, for additional discussion on the choice of q(xt|xt−1, yt). See Chen and Lai (2007) for an interesting application of online identifi cation and adaptive control of autoregressive models with exogenous inputs (ARX models) with Markov parameter jumps. More effi cient proposal densities can be obtained in the presence of conditional linearity and/or normality. In other words, when the split of the state vector xt into x1t and x2t leads to, say, x1t|x2t being an NDLM, then part of the sequential learning algorithm can be performed exactly by analytically integrating out x1t. Such fi lters are commonly referred to as the Rao–Blackwellized particle fi lter or mixture Kalman fi lter (Chen and Liu, 2000; Andrieu and Doucet, 2002).

    Resampling or not?It has been argued that the resampling step in the BF and the second resampling step in the APF should only be performed when particle degeneracy is signaled. For instance, Kong et al. (1994) introduced the effective sample size, Neff, which they estimate by

    N̂wt

    i

    i

    Neff = ( )( )=∑1

    2

    1

    (19)

    The particle set that approximates p(xt|yt) is then represented by {(x̃t, wt)(i)}Ni=1, using the notation from Algorithms 1 and 2 above.

    Reducing MC errorRegardless of whether resampling is performed at each time period or not, when the goal is to produce summary statistics based on the posterior p(xt|yt) (for instance, mean, variance, quantiles), it is more effi cient (estimator with lower variance) to perform the computation prior to resampling. For instance, it is more effi cient to estimate E(xt|yt) by ΣNi=1ω t(i)x̃t(i)/ΣNj=1wt(j) than by ΣNi=1x̃t/N.

    Example 1: Local level model. In this example, we use the local level model to compare the per-formance of the four particle fi lter algorithms discussed above. They are the BF, APF, OBF and OAPF. As mentioned in the second section (‘Normal dynamic linear model’ above), the local level model is yt|xt ~ N(xt, σ2) and xt|xt−1 ~ N(xt−1, τ2). For this simple linear model, the traditional Kalman fi lter is available to produce the ‘optimal’ estimate of the fi ltered state vector. We use this estimate in evaluating the performance of particle fi lters.

    Based on results of the second section, it is easy to see that (i) xt|yt ~ N(mt, Atσ2), where mt = (1 − At)mt−1 + Atyt and At = (At−1σ2 + τ2)/(At−1σ2 + τ2 + σ2), (ii) yt|xt−1 ~ N(xt−1, σ2 + τ2), and (iii) xt|xt−1, yt ~ N(ω2xt−1/τ2 + yt/σ2), ω2), where ω−2 = σ−2 + τ−2. Thus the four particle-fi lter algorithms are easy to implement. To compare the fi lters, we employ the criterion of mean square errors, which are com-puted using R runs of each particle fi lter f in {BF, OBF, APF, OAPF}, across M time series of length T. Specifi cally, the MSE is given by MSEf = Σt,m,r(x̂ftmr − x̃tm)2/(TMR), where x̃tm is obtained via the

  • Particle Filters and Bayesian Inference 175

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    standard Kalman fi lter recursions (equations (3)–(8)) for the mth dataset up to time t, and x̂ftmr = ΣNi=1x(i)ftmr/N is the particle approximation for x̃tm based on N particles {x(i)ftmr}Ni=1.

    The relative MSE, relative to the bootstrap fi lter, is defi ned as RMSEf = MSEf /MSEBF for f in {OBF, APF, OAPF}. Results are summarized in Figure 2. From the plots, OAPF outperforms OBF for all four values of τ2 and OBF fares better than BF for all four values of τ2. Also, it seems that BF performs much better than APF when the signal-to-noise ratio, τ/σ, is greater than one.

    Review papersSince Gordon et al. (1993), several review papers have contributed to straightening out the sub-area of sequential Monte Carlo. Here we list a small subset of these papers. The choice is rather subjec-tive and based on our limited and biased views of the fi eld. A few of the early reviews are by Doucet et al. (2000) and Arulampalam et al. (2002), books by Liu (2001), Doucet et al. (2001a) and Ristic et al. (2004) and the 2002 special issue of IEEE Transactions on Signal Processing on sequential Monte Carlo methods. See also the review by Chen (2003).

    More recent studies, along with this paper, are Cappé et al. (2007), Doucet and Johansen (2010) and Prado and West (2010, Ch. 6). They carefully organize and highlight the fast development of the fi eld over the last decade, such as parameter learning, more effi cient particle smoothers, particle fi lters for highly dimensional dynamic systems and, perhaps the most recent one, the interconnections between MCMC and SMC methods.

    PARAMETER LEARNING

    Consider again the general dynamic model. We now address explicitly the unknown vector of static parameters θ of the model:

    y x p y xt t t t, ,θ θ∼ ( ) (20)

    x x p x xt t t t− −( )1 1, ,θ θ∼ (21)

    for t = 1, . . . , T and initial probability density p(x0|θ) and prior p(θ). There are primarily two ways to tackle the problem of learning θ: batch sampling and online sampling.

    Batch samplingThe solution involves obtaining an approximation, say pN(yT|θ), to the joint likelihood p(yT|θ). For the NDLM of the second section (‘Normal dynamic linear model’ above), the predictive density was obtained analytically from equation (13). The approximation pN(yT|θ) can be obtained by any of the previous fi lters as

    p y p y yN

    p y xN T N tt

    t

    T

    T t ti

    i

    N

    t

    T

    θ θ θ( ) = ( ) = ( )−=

    ( )

    ==∏ ∑∏1

    1 11

    1, , (22)

    where xt(i) ~ p(xt|x(i)t−1, θ), for i = 1, . . . , N. Therefore, the components of θ can be sampled iteratively via a standard MCMC sampler, such as a Metropolis–Hastings algorithm, or via an SIR step. Two of the major drawbacks of this solution are that (i) SMC loses its appealing sequential nature and (ii) the overall MCMC or SIR scheme can be highly sensitive to the approximation pN(yT|θ). See, for instance, Chopin (2002) and Del Moral et al. (2006) for more theoretical justifi cations and further

  • 176 H. F. Lopes and R. S. Tsay

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    tau

    RM

    SE

    0.5

    1.0

    1.5

    2.0

    2.5

    0.22 0.71 1 1.41

    N=100

    OBF

    APF

    OAPF

    OBF

    APF

    OAPF

    tau

    RM

    SE

    1.5

    2.0

    2.5

    0.22 0.71 1 1.41

    N=1000

    0.5

    1.0

    tau

    RM

    SE

    0.2

    0.4

    0.6

    0.8

    1.0

    1.2

    1.4

    0.22 0.71 1 1.41

    N=50

    OBF

    APF

    OAPF

    OBF

    APF

    OAPF

    tau

    RM

    SE

    0.2

    0.4

    0.6

    0.8

    1.0

    1.2

    1.4

    0.22 0.71 1 1.41

    N=100

    Figure 2. Comparison between BF, OBF, APF and OAPF via relative mean square error (RMSE). Local level model is used, where yt|xt ~ N(xt, σ2) and xt|xt−1 ~ N(xt−1, τ2), for t = 1, . . . , T, x0 ~ N(m0, C0), σ = 1, τ = 0.22, 0.71, 1.0 or 1.41, m0 = 0 and C0 = 10. The starting value is x0 = 0 and N denotes the number of particles. RMSE is based on M = 10 time series of length T and R = 10 runs of each particle fi lter per time series. Top row: sample size T = 100; bottom row: T = 1000

  • Particle Filters and Bayesian Inference 177

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    details, Doucet and Tadic (2003), Andrieu et al. (2004), Poyiadjis et al. (2005), Andrieu et al. (2005) and Olsson et al. (2008) for expectation-maximization-like schemes, and Fernández-Villaverde and Rubio-Ramírez (2005, 2007) and DeJong et al. (2009) for applications in dynamic stochastic general equilibrium macroeconomic models.

    Particle fi lters and MCMCBefore introducing particle fi lters that learn about parameters in a sequential manner, we should mention that hybrid schemes that combine particle methods and MCMC methods are abundant. Gilks and Berzuini (2001) and Polson et al. (2008), for instance, use MCMC steps to sample and replenish static parameters in dynamic systems. Andrieu et al. (2010) introduce particle MCMC methods to effi ciently construct proposal distributions in high dimension via SMC methods.

    Online samplingThe solution here is to produce sequential MC approximations to p(xt, θ|yt), sometimes p(xt−1, xt, θ|yt) and/or other small dimensional functions (small compared to t) of (xt, θ) conditional on yt. Simply resampling θ over time is bound to fail since, in general, after a few time steps the particle set will contain only one particle. Gordon et al. (1993) suggest incorporating artifi cial evolution noise for θ when tackling the problem of sequentially learning the static parameters of a state-space model. Since parameters are not states, adding noise artifi cially will eventually distort and compromise the validity of the approximated posterior distributions. Their approach imposes a loss of information in time as artifi cial uncertainties added to the parameters eventually result in a very diffuse posterior density for θ. In what follows, we introduce three well-established fi lters for sequentially learning both xt and θ: (i) the Liu and West fi lter; (ii) the Storvik fi lter; and (iii) the particle learning (PL) fi lter of Carvalho et al. (2010) and Lopes et al. (2010).

    Liu and West fi lterLiu and West (2001) combine (i) the APF of Pitt and Shephard (1999), (ii) a kernel smoothing approximation to p(θ|yt−1) via a mixture of multivariate normals, and (iii) a neat shrinkage idea to incorporate artifi cial evolution for θ without the associated loss of information; see West (1993a,b). More specifi cally, let the set of i.i.d. particles {x(i)t−1, θ (i)t−1}Ni=1 approximate p(xt−1, θ|yt−1) such that p(θ|yt−1) can be approximated by

    p yN

    f m VN t jj

    N

    θ θ− ( )=

    ( ) = ( )∑11

    1; , (23)

    where m(j) = aθ (j)t−1 + (1 − a)θ–, θ– = ΣNj=1θ (j)t−1/N, V = h2ΣNj=1(θ (j)t−1 − θ

    –)(θ–(j)t−1 − θ

    –)′/N and h2 = 1 − a2. The

    subscript t of θt is used only to indicate that samples are from p(θ|yt). The APF of Pitt and Shephard (1999) of equation (17) can now be written for the state vector (xt, θt) as

    p x x y y p y x p x y pt t t t tt

    t t t t tt, , , , , ,− − − − − − − −( ) = ( ) ( )1 1 1 1 1 1 1 1θ θ θ θ θ tt t

    t t tt

    t t

    y

    p x x y p

    −−

    ( )

    × ( )

    11

    1

    1

    .

    , ,

    Resample� �������� ��������

    θ θ θ −−( )12

    ,.

    yt

    Propagate� ������ ������

    (24)

    In general and similar to the APF fi lter of the third section, p(yt |xt−1, θ) is not available for point-wise evaluation and/or p(xt |xt−1, θt, yt) is not easy to sample from. Liu and West resample old particles with

  • 178 H. F. Lopes and R. S. Tsay

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    weights proportional to p(yt |g(xt−1), m(θt−1)), where g(·) and m(·) are described above. Then, they propa-gate θt from the proposal propagation density p(θt |θt−1) and propagate xt conditional on θt from the evolution density q(xt|xt−1, θ, yt) ≡ p(xt |xt−1, θt). The propagated particles (xt, θt) have associated weights

    �ωθ

    θtt t t

    t t t

    p y x

    p y g x m∝

    ( )( ) ( )( )− −

    ,,1 1

    which leads to Algorithm 5 below.The performance of the LW fi lter depends on the choice of tuning parameter a, which drives both

    the shrinkage and the smoothness of the normal approximation. It is common practice to set a around 0.98 or higher. The components of θ can be either transformed in order to accommodate the approxi-mate local normality or the multivariate normal approximation could be replaced by a composition of, say, conditionally normal densities for location parameters and inverse-gamma densities for scale/variance parameters. See, for example, Petris et al. (2009, pp. 222–228) for an example based on the local level model and Carvalho and Lopes (2007) for an application to Markov switching stochastic volatility models.

    Example 2: Stochastic volatility model. In its simplest form, asset returns yt are modeled as con-ditionally independent normal random variables with log-variance xt following a fi rst-order autore-gressive model, i.e. yt |xt ~ N (0, exp{xt}) and xt |xt−1 ~ N(α + βxt−1, τ 2); see Jacquier et al. (1994) and Kim et al. (1998). One possible version of the LW fi lter assumes, for example, that θ = (α, β, logτ2) and g(xt−1) = α + βxt−1.

    Algorithm 5: Liu and West fi lter (LWF)1. Resample {(x̃t−1, θ̃ t−1)(i)}Ni=1 from {xt−1, θt−1)(i)}Ni=1 with weights wt(i) ∝ p(yt|g(x(i)t−1), m(i)), where m(i) is

    defi ned in equation (23).2. Propagate

    (a) {(θ̃ (i)t−1}Ni=1 to {(θ̂t(i)}Ni=1 via N(m̃(i), V), then(b) {(x̃ (i)t−1}Ni=1 to {(x̂t(i)}Ni=1 via p(xt |x̃ (i)t−1, θ̃ t(i)).

    3. Resample {(xt, θt)(i)}Ni=1 from {(x̂t, θ̂)(i)}Ni=1 with weights wp y x

    p y g x mti t t

    iti

    t ti i

    ( )( ) ( )

    −( ) ( )∝

    ( )( )( )

    ˆ , ˆ

    ,

    θ� �1

    .

    Storvik fi lterFor the class of state-space models where p(θ|xt, yt) can be rewritten as p(θ|st), where st is a low-dimensional set of conditionally suffi cient statistics and can be recursively computed via st = S(st−1, xt, yt), Storvik (2002) (see also Fearnhead, 2002) proposed Algorithm 6 below. This algorithm can be thought of as an extension of the bootstrap fi lter with the additional steps of sequentially updating the suffi cient statistics and sampling θ.

    Algorithm 6: Storvik fi lter (SF)1. Propagate {(x(i)t−1}Ni=1 to {(x̃t(i)}Ni=1 via q(xt |xt−1, θ, yt).2. Resample {(xt, st−1)(i)}Ni=1 from {(x̃t, st−1)(i)}Ni=1 with weights w

    p y x p x x

    q x x yti t t

    iti

    ti

    ti

    ti t

    ( )( ) ( )

    −( )

    ( )−

    ( )∝( ) ( )

    ( )� ��, ,

    , ,θ θ

    θ1

    1

    ..3. Compute suffi cient statistics: st = S(st−1, xt, yt).4. Sample θ from p(θ|st).

    Example 2 (continued). The stochastic volatility model admits recursive suffi cient statistics for θ = (α, β, τ 2) when p(θ) is conditionally conjugate normal-inverse gamma. More precisely, when

  • Particle Filters and Bayesian Inference 179

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    (α, β|τ 2) ~ N (b0, τ 2B0) and τ 2 ~ IG(c0, d0), it is easy to see that (α, β|τ2, xt) ~ N(bt, τ 2Bt) and (τ 2|xt) ~ IG(ct, dt), where Bt−1 = B−1t−1 + ztz′t, Bt−1bt = B−1t−1bt−1 + xtzt, ct = ct−1 + 1/2, dt = dt−1 + (xt − b′t zt)xt/2 + (bt−1 − bt)′ B−1t−1bt−1/2 and z′t = (1, xt−1). The recursive suffi cient statistics are functions of xt−1, x2t−1, xt−1xt and x2t.

    All simulated exercises in Storvik (2002) are based on a blind propagation rule, i.e. q(xt |xt−1, θ, yt) above is equal to p(xt|xt−1, θ). In this case, resampling is performed with weights wt ∝ p(yt |xt, θ). Like any other PF with blind propagation, such as the bootstrap fi lter, this fi lter is bound to suffer from particle degeneracy, which in turn directly compromises sequential parameter estimation.

    Particle learningCarvalho et al. (2010) present methods for sequential fi ltering, particle learning (PL) and smoothing for rather general state-space models. They extend Chen and Liu’s (2000) mixture Kalman fi lter (MKF) methods by allowing parameter learning and utilize a resample–propagate algorithm together with a particle set that includes state-suffi cient statistics. Recall the simulated exercise from the third section that empirically shows that resample–propagate fi lters tend to outperform propagate–resample ones. They also show via several simulation studies that PL outperforms both the LW and Storvik fi lters and is comparable to MCMC samplers, even when full adaptation is considered. The advantage is even more pronounced for large values of T.

    Let st and sxt denote the parameter and state-suffi cient statistics satisfying deterministic updating rules st = S(st−1, xt, yt), as in the Storvik fi lter from the previous subsection, and sxt = K(sxt−1, θ, yt), for K(·) mimicking the Kalman fi lter recursions (see Example 3 below). Then PL can be described as follows.

    Algorithm 7: Particle learning (PL)1. Resample (θ̃ , s̃xt−1, s̃t−1) from (θ, sxt−1, st−1) with weights wt ∝ p(yt|sxt−1, θ).2. Sample xt from p(xt |s̃xt−1, θ̃ , yt).3. Update parameter-suffi cient statistics: st = S(s̃t−1, xt, yt).4. Sample θ from p(θ|st).5. Update state-suffi cient statistics: sxt = K(s̃xt−1, θ, yt).

    In many cases S will also be a function of xt−1 and possibly other lags of the state variable, such as in the stochastic volatility model of Example 2. In these cases, the above algorithm is slightly changed and particles for such lagged values are also carried over time. Therefore, step 2 is modifi ed to sample (xt−1, xt) from p(xt−1, xt|sxt−1, θ, yt), which implies sampling xt−1 from p(xt−1|sxt−1, θ, yt) and xt from p(xt|xt−1, θ, yt).

    Example 3: Conditional NDLM. Carvalho et al. (2010) derive the PL scheme for the class of conditional NDLM defi ned by the observation and evolution equations that assume the form of a linear system (see the NDLM equations (1) and (2)) conditional on an auxiliary state λt:

    y F x v v Nt t t tt t= ′ + ( )λ λσ, ,∼ 0 2 (25)

    x G x w w Nt t t tt t= + ( )−λ λτ1 20, ,∼ (26)

    with the quadruple {Fλt, Gλt, σ2λt, τ

    2λt} being a function of the static parameter vector θ. The marginal

    distributions of the observation error and state shock distributions are any combination of normal, scale mixture of normals, or discrete mixture of normals depending on the specifi cation of the

  • 180 H. F. Lopes and R. S. Tsay

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    distribution of the auxiliary state variable p(λt+1|θ) (Chen and Liu, 2000). Extensions to hidden Markov specifi cations where λt+1 evolves according to p(λt+1|λt, θ) are discussed in Carvalho et al. (2010). In the case where the auxiliary state variable λt is discrete, such as in stochastic volatility with jumps models (Markovian or not), the state xt−1 can be analytically integrated out, in addition to xt and λt, at the initial resampling step, i.e. p(yt|(λt−1, sxt−1, θ)(i)) = Σλtp(yt|λt, (s

    xt−1, θ)(i))p(λt|(λt−1, θ)(i)), where the con-

    ditional suffi cient statistics for states (sxt ) and parameters (st) satisfy the deterministic updating rules st = S(st−1, xt, λt, yt) and sxt = K(sxt−1, θ, λt, yt), where S(·) denotes, as defi ned previously, the recursive update of the parameter suffi cient statistics and K(·) denotes the Kalman fi lter recursions of the con-ditional NDLM given in equations (3)–(8). The algorithm can be summarized as follows:

    1. resample (λ̃t−1, θ̃ , s̃xt−1, s̃t−1) from (λt−1, θ, sxt−1, st−1) with weights wt ∝ p(yt|λt−1, sxt−1, θ);2. sample λt from p(λt |λ̃t−1, θ̃ , yt);3. sample xt from p(xt |λt, s̃xt−1, θ̃ , yt);4. compute st = S(st−1, xt, λt, yt);5. sample θ from p(θ|st); and6. compute sxt = K(λt, sxt−1, θ, yt).

    In the case where the auxiliary state variable λt is continuous, the authors extend the above scheme by adding to the current particle set a propagated particle λt+1 ~ p(λt+1 |(λt, θ)(i)).

    PL in time series modelsSuccessful implementations of PL (and hybrid versions of PL) have appeared over the last couple of years. Rios and Lopes (2009), for example, propose a hybrid LW–Storvik fi lter for the Markov switching stochastic volatility model that outperforms the Carvalho and Lopes (2007) fi lter. Lund and Lopes (2009) sequentially estimate a regime-switching macro-fi nance model for the postwar US term structure of interest rates, while Prado and Lopes (2009) adapt PL to study state-space autore-gressive models with structured priors. Chen et al. (2009) propose a hybrid PL–LW sequential MC algorithm that fully estimates nonlinear, non-normal dynamic to stochastic general equilibrium models, with a particular application in a neoclassical growth model. Additionally, Dukić et al. (2009) use PL to track infl uenza epidemics using Google trends data, while Lopes and Polson (2010) use PL to estimate volatility and examine volatility dynamics for fi nancial time series, such as the S&P500 and the NDX100 indices, during the early part of the credit crisis.

    Sequential Bayesian computation via PLLopes et al. (2010) develop a simulation-based approach to sequential Bayesian computation for both dynamic and non-dynamic systems. They show through various important applications that PL provides a simple yet powerful framework for effi cient sequential posterior sampling strategies. For example, Carvalho et al. (2009b) adapt PL to a rich and general class of mixture models that include fi nite mixture models and Dirichlet process mixture models, as well as for the less common settings of latent feature selection through an Indian Buffet process and dependent distribution tracking through a probit stick-breaking model. Taddy et al. (2010) show that PL is the best alternative to perform online posterior fi ltering of tree states in dynamic regression tree models, while Gramacy and Polson (2010) use PL for online updating of Gaussian process models for regression and clas-sifi cation. Shi and Dunson (2009) adopt PL for stochastic variable selection and model search in linear regression and probit models, while Mukherjee and West (2009) focus on model comparison for applications in cellular dynamics in systems biology.

  • Particle Filters and Bayesian Inference 181

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    Example 4: Comparison between LW, Storvik and PL. We compare the performance of these particle fi lters using the local level model of Example 1 with parameter learning, where the random walk system equation is replaced by a fi rst-order autoregression. More precisely, yt|xt, θ ~ N (xt, σ2) and xt|xt−1, θ ~ N(α + βxt−1,τ2), where x0 ~ N(m0, C0) and θ = (α, β, τ 2, σ2). The prior distribution of θ is p(θ) = p(σ2)p(τ2)p(α, β|τ 2), where σ2 ~ IG(n0/2, n0σ 20/2), τ2 ~ IG(�0/2, �0τ2/2) and (α, β) ~ N(b0, τ2B0). The recursive suffi cient statistics for θ can easily be derived. It can be shown that β|(τ2, xt) ~ N(bt, τ 2Bt) and τ2|xt ~ IG(�1/2, �1τ 21/2), where �1 = �0 + t, Bt−1 = B0−1 + Z′tZt, Bt−1bt = B0−1b0 + ZtTzt, and �1τ 21 = �0τ 20 + (zt − Xtbt)′zt + (b0 − bt)′B0−1b0, for zt = (x1, . . . , xt)′, Zt = (1t, Z2t), Z2t = (x0, . . . , xt−1)′ and 1t a t-dimensional vector of ones. The quantities (nt, ntτ 2t, bt, Bt) can be rewritten recursively as functions of (�t−1, �t−1τ 2t−1, bt−1, Bt−1), xt−1, xt and yt. A time series of length T = 200 was simulated using θ = (0.0, 0.9, 0.5, 1.0) and x0 = 0. The prior hyperparameters are m0 = 0, C0 = 10, b0 = (0.0, 0.9)′, B0 = I2, n0 = v0 = 10, τ 20 = 0.5 and σ 20 = 1.0, leading to relatively uninformative prior informa-tion. The performance of the fi lters is assessed by running each algorithm for R = 100 times based on N = 1000 particles. A very long PL (N = 100,000) is run to serve as a benchmark for comparison. Let q(γ, α, t) be the 100αth percentile of p(γ|yt), where γ is an element of θ. We defi ne the root mean squared error as the square root of MSE(γ, α, f, t) = Σt,r[q(γ, α, t) − qfr(γ, α, t)]2/R for fi lter f in {LW, STORVIK, PL} and replication r = 1, . . . , R. Finally, a full adaptation is implemented for the three fi lters. In other words, LW differs from PL only through the sequential estimation of θ, Storvik differs from PL only to the extent that Storvik propagates fi rst and then resamples, while PL resa-mples fi rst and then propagates. Results are summarized in Figures 3 and 4. Both the Storvik fi lter and PL are signifi cantly better than the LW fi lter, while PL is moderately better than Storvik, par-ticularly when estimating the pair (τ2, σ2).

    SmoothingIn addition to delivering sequential fi ltering for parameters and states, particle fi lters can also be implemented effectively when the main goal is smoothing the states conditional on the whole vector of observations yT. In this sense, particle smoothers are alternatives to MCMC in state-space models (Kitagawa, 1996). Godsill et al. (2004) introduced an algorithm that relies on (i) forward particle sampling and (ii) backward particle resampling. Carvalho et al. (2010) extend the algorithm to accommodate sequential learning of the parameter vector θ. In this more general case, it can be shown that

    p x y p x x y p x yT T t t tt

    T

    TT, , , ,θ θ θ( ) ( ){ } ( )+

    =

    ∏ 11

    1

    (27)

    whose components, by Bayes’ rule and conditional independence, are

    p x x y p x x p x yt t t t t t t+ +( ) ∝ ( ) ( )1 1, , , ,θ θ θ (28)

    This leads to a backward sampling algorithm that resamples forward particles xt from p(xt |θ, yt) with weights proportional to p(xt+1 |xt, θ). More precisely, for each particle i, for i = 1, . . . , N, start with (x̃T, θ̃ )(i) = (xT, θ)(i), i.e. a draw from p(xT, θ|yT). Then, for t = T − 1, . . . , 1, sample x̃t(i) from {xt(j)}Nj=1 with weights πt(j) ∝ p(x̃t+1|xt(j), θ̃ ). In the end, (x̃1, . . . , x̃T)(i) are draws from p(xT|yT) for i = 1, . . . , N. Note that the algorithm is O(TN2), so the computational time to obtain draws from p(xT|yT) is expected to be much larger than the computational time to obtain draws from p(xt|yt) via standard SMC fi lters for t = 1, . . . , T. See Briers et al. (2010) for an alternative O(TN2) SMC smoother for

  • 182 H. F. Lopes and R. S. Tsay

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    Figure 3. Comparison between LWF, SF and PL. Percentiles of p(θ|yt) (2.5th, 50th and 97.5th) based on 100 replications of each particle fi lter with N = 1000 particles (gray lines). Black lines are based on PL and N = 100,000. Liu and West fi lter (left column), Storvik’s fi lter (center column) and particle learning (right column). The row (from top to bottom) represents the components of θ = (α, β, τ2, σ2)

  • Particle Filters and Bayesian Inference 183

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    Per

    cent

    ile

    Root MSE

    0.000.050.100.150.200.25

    2.5t

    h50

    th97

    .5th

    Per

    cent

    ile

    Root MSE

    0.000.050.100.15

    2.5t

    h50

    th97

    .5th

    beta

    alpha

    Per

    cent

    ile

    Root MSE

    0.00.10.20.30.4

    2.5t

    h50

    th97

    .5th

    tau2

    Per

    cent

    ile

    Root MSE

    0.00.20.40.60.81.01.2

    2.5t

    h50

    th97

    .5th

    sigma2

    LW Sto

    rvik

    PL

    Figu

    re 4

    . C

    ompa

    riso

    n be

    twee

    n LW

    F, S

    F an

    d PL

    . R

    oot

    mea

    n sq

    uare

    d er

    ror

    of R

    = 1

    00 r

    eplic

    atio

    ns f

    or e

    ach

    fi lte

    r. A

    ll fi l

    ters

    are

    bas

    ed o

    n N

    = 1

    000

    part

    icle

    s an

    d th

    e ro

    ot m

    ean

    squa

    red

    erro

    r is

    com

    pute

    d ag

    ains

    t a

    long

    PL

    run

    (N

    = 1

    00,0

    00)

  • 184 H. F. Lopes and R. S. Tsay

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    the case where θ is known. An O(TN) smoothing algorithm has recently been introduced by Fearnhead et al. (2008b) also for the case where θ is known. See also Douc et al. (2009a) for additional FFBS particle approximations.

    Example 4 (continued): comparison between PL and MCMC. In the case of pure fi ltering, i.e. when the parameter vector θ = (α, β, τ2, σ2) is known and fi xed, it is easy to see that both fi ltered and smoothed distributions, p(xt |yt, θ) and p(xt |yT, θ), are available in closed form (see equations (5) and (10), respectively). For example, Figure 5 shows that results of particle fi ltering and a smoothing

    Time

    Sta

    tes

    0 50 100 150 200

    −4

    −2

    02

    4

    TRUEOAPF

    (a)

    Time

    0 50 100 150 200

    −4

    −2

    02

    4

    MCMCPL

    (b)

    Figure 5. Particle smoothing. (a) Comparing the true fi ltered and smoothed distributions, p(xt |yt) and p(xt |yT), respectively, with approximations based on N = 2000 particles from the OAPF. (b) Comparing the MCMC and PL approximations to the fi ltered and smoothed distributions, p(xt |yt) and p(xt |yT). MCMC is based on M = 2000 draws, after M0 = 10,000 as burn-in, while PL is based on N = 2000 particles

  • Particle Filters and Bayesian Inference 185

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    approximation based on the OAPF (N = 2000 particles) virtually match the true distributions. Figure 6 shows that both MCMC approximation (M = 2000 draws, after M0 = 10,000 as burn-in) and PL approximation (N = 2000 particles) to p(α|yT), p(β|yT), p(τ2|yT) and p(σ2|yT) are virtually identical. Computational cost is measured here in CPU seconds with N = 1000 and M0 = M = 1000 points. FFBS is about one order of magnitude faster than PL for smoothing (34 s versus 233 s), but PL is approximately three orders of magnitude faster than FFBS for fi ltering (2 s versus 3500 s).

    Sequential model assessmentOne of the direct benefi ts of particle fi lters is the simple approximation of marginal likelihoods and Bayes factors. These tasks are usually rather involved under MCMC approximations, where comput-ing marginal likelihoods is essentially an independent task in the MCMC paraphernalia. See Kass and Raftery (1995), Han and Carlin (2000), Lopes and West (2004), Gamerman and Lopes (2006, Ch. 7) and references therein for additional details on several MCMC-based algorithms for Bayesian model assessment.

    Even in the simple case of the NDLM in the second section above and equation (13), computing p(y1, . . . , yT) is a non-trivial task. In this case, the simplest Monte Carlo solution is

    p yN

    p y p y p y pN T Ti

    i

    NT T( ) = ( )( ) ≈ ( ) = ( ) ( )( )

    =∑1 2 2

    1

    2 2 2 2 2σ τ σ τ σ τ σ, , , d dττ 2∫ (29)

    where {(σ2, τ 2)(i)}Ni=1 is a random sample from the prior p(σ2, τ2). Despite its simplicity, this approxi-mation is very unstable when the prior and the likelihood for (σ2, τ2) are moderately separated. Moreover, the MC approximation deteriorates quickly for more general state-space models where the dimension of the parameter space is likely to be greater than two. The sequential Monte Carlo solution to this problem is rather straightforward, with equation (13) being approximated by

    p yN

    p y xN TT t t

    i

    i

    N

    t

    T

    ( ) = ( )( )− ( )==∑∏1 1 2 2

    11

    , ,σ τ (30)

    where {(xt−1, σ2, τ2)(i)}Ni=1 is the particle approximation to p(xt, σ2, τ2|yt−1) obtained from the LW fi lter, Storvik’s fi lter or PL. See the stochastic volatility with Student-t errors below for the sequential computation of posterior model probabilities.

    There are contributions that explicitly deal with parameter, state and model uncertainties all together via SMC methods. Fearnhead (2004), MacEachern et al. (1999) and Carvalho et al. (2009b) use particle methods for general mixtures.

    APPLICATIONS

    In this section we apply particle fi lters to four time series problems that are of common interest in many scientifi c areas. The fi rst application revisits the stochastic volatility model of Example 2. The second application is concerned with the Markov switching stochastic volatility model of Carvalho and Lopes (2007). The last two applications illustrate the use of particle fi lters in modeling realized volatilities and in estimating unemployment rates via a dynamic generalized linear model.

  • 186 H. F. Lopes and R. S. Tsay

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    alpha

    MC

    MC

    −0.2 −0.1 0.0 0.1 0.2

    02

    46

    8

    alpha

    PL

    −0.2 −0.1 0.0 0.1 0.2

    02

    46

    8

    beta

    MC

    MC

    0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00

    02

    46

    810

    beta

    PL

    0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00

    02

    46

    810

    tau2

    MC

    MC

    0.2 0.4 0.6 0.8 1.0 1.2

    01

    23

    tau2

    PL

    0.2 0.4 0.6 0.8 1.0 1.2

    01

    23

    sigma2

    MC

    MC

    0.6 0.8 1.0 1.2 1.4 1.6

    0.0

    0.5

    1.0

    1.5

    2.0

    2.5

    sigma2

    PL

    0.6 0.8 1.0 1.2 1.4 1.6

    0.0

    0.5

    1.0

    1.5

    2.0

    2.5

    Figure 6. Parameter learning. MCMC (left column) and PL (right column) approximations to p(θ|yT). Rows are the components of θ, i.e. α, β, τ2 and σ2. MCMC is based on M = 2000 draws, after M0 = 10,000 as burn-in, while PL is based on N = 2000 particles

  • Particle Filters and Bayesian Inference 187

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    Dynamic beta regressionDa Silva et al. (2009) use a dynamic beta regression to analyze (via MCMC) the Brazilian monthly unemployment rate from March 2002 to December 2009. More precisely, they model the unemploy-ment rate at time t, namely yt, by

    yt t t tμ φ φμ φ μ, ,∼ −( )( ) ( )beta beta model1

    μ βt t− = + −{ } ( )1 1 exp link function

    β β βt t tW N W− −( ) ( )1 1, ,∼ transition

    for t = 1, . . . , T, β0 ~ N(m0, C0), φ ~ IG(a0, b0) and W ~ IG(c0, d0). The dynamic beta regression is a special case of the dynamic generalized linear model (DGLM) of West et al. (1985), where the observational distribution belongs to the exponential family. The data were downloaded from the Brazilian Institute for Geography and Statistics (IBGE) site.1

    We illustrate the computation of sequential Bayes factors via particle fi lters by comparing the beta regression model to a simple local level model, i.e. yt |μt, σ2 ~ N(μt, σ2) and μt|μt−1, τ2 ~ N(μt−1, τ2), where μ0 ~ N(m0, C0), σ2 ~ IG(a0, b0) and τ2 ~ IG(c0, d0) with given hyperparameters. The prior hyperparameters were set at m0 = 0.1, C0 = 100, a0 = 2.1, b0 = (a0 + 1)0.00001, c0 = 2.1 and d0 = (c0 + 1)0.00001, for the local level model and at m0 = log(y1/(1 − y1)), C0 = 0.1, a0 = 2.1, b0 = (a0 + 1)15,000, c0 = 2.1 and d0 = (c0 + 1)0.05, for the dynamic beta model. More general dynamics, such as seasonality, could easily be included in both models with only slight modifi cations to the models and particle fi lters. We ignore the seasonality here for simplicity.

    Sequential inference for the local level model was performed by PL whereas that for the dynamic beta regression was performed by the LWF. Results appear in Figure 7, while a Monte Carlo study is presented in Figures 8 and 9. The estimations of μt under both models are relatively similar, with the sequential Bayes factor slightly favoring the dynamic beta regression model. The Monte Carlo error associated with the estimation of parameters and Bayes factors is relatively small. See Carvalho et al. (2009a) for more details about PL for dynamic generalized linear models and dynamic discrete-choice models.

    Stochastic volatility model with Student-t innovationsWe revisit the simple SV model with normal innovations of Example 2 and compute sequential Bayes factors against the alternative SV model with Student-t innovations.2 We use monthly log returns of GE stock from January 1926 to December 1999 for 888 observations. This series was analyzed in Example 12.6 of Tsay (2005, Ch. 12).3 The competing models are

    Observation equation: , , exp ,y x t xt t tθ η( ) { }( )∼ 0

    System equation: , ,x x N xt t t− −( ) +( )1 1 2θ α β τ∼

    where tη (μ, σ2) denotes the Student-t distribution with η degrees of freedom, location μ and scale σ2. The number of degrees of freedom η is treated as known. Sequential posterior inference is based on the Liu and West fi lter with N = 100,000 particles. The shrinkage constant a is set at a = 0.95, whereas prior hyperparameters are m0 = 0, C0 = 10, �0 = 3, τ 20 = 0.01, b0 = (0, 1)′ and B0 = 10I2. 1 http://www.sidra.ibge.gov.br/bda/pesquisas/pme/default.asp#dead.2 For additional early use of APF in SV models see Chib et al. (2002, 2006) and Omori et al. (2007).3 The data are available at http://faculty.chicagobooth.edu/ruey.tsay/teaching/fts2/m-geln.txt.

  • 188 H. F. Lopes and R. S. Tsay

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    (a)

    Months

    0.08

    0.10

    0.12

    0.14

    mar/03 jan/04 jan/06 jan/08 dec/09

    Local levelDynamic Beta

    (b)

    Months

    Bay

    es fa

    ctor

    1.0

    1.1

    1.2

    1.3

    1.4

    mar/03 jan/04 jan/06 jan/08 dec/09

    (c)

    Months

    σσ

    0.00

    20.

    004

    0.00

    60.

    008

    0.01

    0

    mar/03 jan/04 jan/06 jan/08 dec/09

    (d)

    Months

    ττ

    0.00

    20.

    004

    0.00

    60.

    008

    0.01

    0

    mar/03 jan/04 jan/06 jan/08 dec/09

    (e)

    Months

    (( 1((1

    ++φφ ))

    ))(( 12 ))

    0.00

    50.

    010

    0.01

    50.

    020

    0.02

    5

    mar/03 jan/04 jan/06 jan/08 dec/09

    (f)

    Months

    W((1

    2))

    0.0

    0.1

    0.2

    0.3

    0.4

    0.5

    0.6

    0.7

    mar/03 jan/04 jan/06 jan/08 dec/09

    Figure 7. Dynamic beta regression. (a) Sequential Monte Carlo (SMC) approximations for the median and 95% credibility interval based on N = 10,000 particles for both models. (b) Sequential Bayes factor. (c, d) Sequential parameter learning for σ and τ from the local level model. (e, f) Sequential parameter learning for (1 + ø)−1/2 and W1/2 from the dynamic beta model

  • Particle Filters and Bayesian Inference 189

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    σσ

    0.00

    20.

    004

    0.00

    60.

    008

    0.01

    00.

    012

    0.01

    4

    mar/03 jan/04 jan/06 jan/08 dec/09

    ττ

    0.00

    20.

    004

    0.00

    60.

    008

    0.01

    00.

    012

    mar/03 jan/04 jan/06 jan/08 dec/09

    ((1 ((1 ++ φφ))))((1 2))

    0.00

    50.

    010

    0.01

    50.

    020

    0.02

    50.

    030

    0.03

    5

    mar/03 jan/04 jan/06 jan/08 dec/09

    W((1 2))

    0.0

    0.2

    0.4

    0.6

    0.8

    1.0

    1.2

    mar/03 jan/04 jan/06 jan/08 dec/09

    Figure 8. Dynamic beta regression. A total of 20 replications of the SMC algorithm, each one based on N = 10,000 particles. Top row: σ and τ from the local level model. Bottom row: (1 + ø)−1/2 and W1/2 from the dynamic beta model

    Particle approximation to the sequential posterior model probabilities, assuming a uniform prior for η over models {t∞, t2, . . . , t20}, appears in Figure 10, where t∞ denotes the normal distribution. Figure 10(d) shows percentiles of p(σt |yt) when integrating out over all competing models in {t∞, t2, . . . , t20}. One can argue that the data slowly move over time from a more t-like, heavy-tail model

  • 190 H. F. Lopes and R. S. Tsay

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    towards a more Gaussian, thin-tail model. Figures 11 and 12 present posterior summaries for the volatilities and parameters of a few competing models.

    Markov switching stochastic volatility modelJumps have been intensively studied in fi nancial data analysis; see, for example, Eraker et al. (2003). So et al. (1998) suggest a model that allows for occasional discrete shifts in the parameter determin-ing the level of the log-volatility through a Markovian process. They claim that this model not only is a better way to explain volatility persistence but is also a tool to capture changes in economic forces, as well as abrupt changes due to unusual market forces. Carvalho and Lopes (2007) adopt the LWF when sequentially estimating univariate fi nancial time series with an MSSV structure. Let us call this fi lter the CL fi lter. For illustration, we consider an MSSV model with two regimes, i.e.

    y x N xt t t, , exp ,θ ∼ 0 ( )( )

    λ θ α β τt t t s tx s N xt− −+( )1 1 2, , ,∼where Pr(st = j|st−1 = i) = pij, for i, j = 1, 2, and parameters θ = (α1, α2, β, τ2, p11, p22). Rios and Lopes (2009) propose an extension of the CL fi lter, which they named the extended LW (ELW) fi lter, which combines features of the LW fi lter and PL. The simulation exercise from Figure 13 shows that the CL fi lter degenerates after 500 observations, whereas the ELW fi lter never collapses.

    Realized volatilityWe consider the intradaily realized volatilities of Alcoa stock from 2 January 2003 to 7 May 2004 for 340 observations. The daily realized volatilities used are the sums of squares of intraday 5 min, 10 min and 20 min log returns measured in percentages; see Tsay (2005, Ch. 11). In what follows,

    Bay

    es fa

    ctor

    1.0

    1.2

    1.4

    1.6

    1.8

    2.0

    mar/03 jan/04 jan/05 jan/06 jan/07 jan/08 jan/09 dec/09

    Figure 9. Dynamic beta regression. A total of 20 replications of the SMC algorithm, each one based on N = 10,000 particles

  • Particle Filters and Bayesian Inference 191

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    Degrees of freedom

    PM

    P

    0.04

    80.

    049

    0.05

    00.

    051

    0.05

    20.

    053

    0.05

    4

    N 3 4 5 6 7 8 9 11 13 15 17 19

    t=1

    Degrees of freedom

    PM

    P

    0.00

    0.05

    0.10

    0.15

    N 3 4 5 6 7 8 9 11 13 15 17 19

    t=444

    Degrees of freedom

    PM

    P

    0.00

    0.05

    0.10

    0.15

    N 3 4 5 6 7 8 9 11 13 15 17 19

    t=666

    Degrees of freedom

    PM

    P

    0.00

    0.05

    0.10

    0.15

    N 3 4 5 6 7 8 9 11 13 15 17 19

    t=888

    Figure 10. Stochastic volatility model. Sequential posterior model probability for the number of degrees of freedom η

    we use the logarithms of the daily realized volatilities. Figure 14 presents the time series of log realized volatilities. As expected, all three series behave similarly over time and are highly positively correlated, with the 5 and 10 min and 10 and 20 min realized volatilities more correlated than the 5 and 20 min ones. Table I shows summary statistics of the time series.

    Two competing modelsWe entertain two models: (i) the three RV time series are modeled by independent univariate local level models; and (ii) the trivariate vector of RV time series is modeled by a multivariate local level

  • 192 H. F. Lopes and R. S. Tsay

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    (a)

    Months

    Ret

    urns

    0 200 400 600 800

    −40

    −20

    020

    40

    (b)

    Months

    Sta

    ndar

    d de

    viat

    ion

    0 200 400 600 800

    010

    2030

    4050

    60

    (c)

    Months

    Sta

    ndar

    d de

    viat

    ion

    0 200 400 600 800

    010

    2030

    4050

    60

    (d)

    Months

    Sta

    ndar

    d de

    viat

    ion

    0 200 400 600 800

    010

    2030

    4050

    60

    Figure 11. Stochastic volatility model. (a) GE returns. (b, c) 2.5th, 50th and 97.5th percentiles of p(σt |yt, M), where σ 2t = exp{xt}, for M = t12 and M = t18, respectively. (d) 2.5th, 50th and 97.5th percentiles of p(σt |yt) by integrating out over all competing models in {normal, t2, . . . , t20}

    model. In the fi rst model, say model M1, the i-minute log realized volatility yit, for i = 5, 10, 20, is initially modeled by a local level model: (yit|xit, σ 2i ) ~ N(xit, σ 2i ) and (xit|xi,t−1, τ 2i) ~ N(xi,t−1, τ 2i). In the second model, say model M2, the univariate local level model is extended to jointly model the p = 3 time series of realized volatilities: (yt|xt, Σ) ~ N(1pxt, Σ) and (xt|xt−1, τ2)~ N(xt−1, τ2) and 1p is a p-dimensional unity vector. The diagonal elements of the covariance matrix Σ are σ 2i, for i = 1, . . . , p, and the non-diagonal elements are σij, for i < j = 1, . . . , p.

    Parameter learningThe variances σ 2i and τ 2i under M1 are, a priori, independent with σ 2i ~ IG(a0, b0), τ 2i ~ IG(c0, d0) and hyperparameters a0 = c0 = 10, b0 = 1.1 and d0 = 0.55 common across i = 5, 10, 20. In this case the prior mean and mode of σ 2i are 0.122 and 0.1, respectively, while its prior 95% credibility interval is (0.064, 0.229). Similarly, the prior mean and mode of τ 2i are 0.061 and 0.05, respectively, while

  • Particle Filters and Bayesian Inference 193

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    p((αα))

    Density

    −20

    −10

    010

    20

    0.000.020.040.060.080.100.12

    200

    300

    400

    500

    600

    700

    800

    900

    −0.20.00.20.40.60.8

    Norm

    al

    Mon

    ths

    αα

    200

    300

    400

    500

    600

    700

    800

    900

    −0.20.00.20.40.60.8

    t 12

    Mon

    ths

    αα

    200

    300

    400

    500

    600

    700

    800

    900

    −0.20.00.20.40.60.8

    t 18

    Mon

    ths

    αα

    p((ββ))

    Density

    −20

    −10

    010

    20

    0.000.020.040.060.080.10

    200

    300

    400

    500

    600

    700

    800

    900

    0.850.900.951.00

    Norm

    al

    Mon

    ths

    ββ

    200

    300

    400

    500

    600

    700

    800

    900

    0.850.900.951.00

    t 12

    Mon

    ths

    ββ

    200

    300

    400

    500

    600

    700

    800

    900

    0.850.900.951.00

    t 18

    Mon

    ths

    ββ

    p((ττ2

    ))

    Density

    0.00

    0.05

    0.10

    0.15

    0.20

    010203040

    200

    300

    400

    500

    600

    700

    800

    900

    0.000.050.100.15

    Norm

    al

    Mon

    ths

    ττ2

    200

    300

    400

    500

    600

    700

    800

    900

    0.000.050.100.15

    t 12

    Mon

    ths

    ττ2

    200

    300

    400

    500

    600

    700

    800

    900

    0.000.050.100.15

    t 18

    Mon

    ths

    ττ2

    Figu

    re 1

    2. S

    toch

    astic

    vol

    atili

    ty m

    odel

    . C

    olum

    n 1:

    mar

    gina

    l pr

    ior

    dist

    ribu

    tions

    for

    α,

    β an

    d τ2

    . Col

    umns

    2–4

    : se

    quen

    tial

    2.5t

    h, 5

    0th

    and

    97.5

    th

    perc

    entil

    es o

    f p(

    γ|yt ,

    M1)

    , for

    γ i

    n (α

    , β, τ

    2 , M

    ) an

    d m

    odel

    M ∈

    {no

    rmal

    , t12

    , t18

    }

  • 194 H. F. Lopes and R. S. Tsay

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    Figu

    re 1

    3. M

    arko

    v sw

    itchi

    ng s

    toch

    astic

    vol

    atili

    ty. C

    arva

    lho

    and

    Lop

    es’ (

    2007

    ) fi l

    ter

    (las

    t tw

    o ro

    ws)

    and

    Rio

    s an

    d L

    opes

    ’ (20

    09)

    ELW

    fi lt

    er (

    fi rst

    tw

    o ro

    ws)

    . 2.5

    th, 5

    0th

    and

    97.5

    th p

    erce

    ntile

    s of

    the

    mar

    gina

    l di

    stri

    butio

    n of

    eac

    h pa

    ram

    eter

    bas

    ed o

    n N

    = 5

    000

    part

    icle

    s. T

    he d

    otte

    d lin

    es a

    re t

    he

    true

    val

    ues

    α 1 =

    −2.

    5, α

    2 =

    −1.0

    , β =

    0.5

    , τ2

    = 1.

    0, p

    11 =

    0.9

    9 an

    d p 2

    2 =

    0.98

    5

  • Particle Filters and Bayesian Inference 195

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    5−minute

    Day

    0 50 100 150 200 250 300 350

    −1

    01

    23

    4

    −1 0 1 2 3 4

    −1

    01

    23

    4

    5−minute

    10−

    min

    ute

    10−minute

    Day

    0 50 100 150 200 250 300 350

    −1

    01

    23

    4

    −1 0 1 2 3 4

    −1

    01

    23

    4

    5−minute

    20−

    min

    ute

    20−minute

    Day

    0 50 100 150 200 250 300 350

    −1

    01

    23

    4

    −1 0 1 2 3 4

    −1

    01

    23

    4

    10−minute

    20−

    min

    ute

    Figure 14. Realized volatility. Log realized volatility of Alcoa stock based on the sum of squares of intraday 5 min, 10 min and 20 min log returns measured as a percentage

  • 196 H. F. Lopes and R. S. Tsay

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    Table I. Summary statistics. Correlations (below main diagonal) and covariances (main diagonal and above)

    RV Mean Median Skewness Kurtosis Correlations/covariances

    5 min 10 min 20 min

    5 min 0.992 0.977 1.091 5.479 0.314 0.270 0.25810 min 0.913 0.871 0.153 0.769 0.857 0.317 0.30720 min 0.850 0.847 0.034 0.843 0.732 0.865 0.396

    its 95% credibility interval is (0.032, 0.115). Under model M2, τ2 ~ IG(c0, d0) and Σ ~ IW(�0, S0)4. When p = 1, σ2 ~ IG(�0/2, S0/2), so we set �0 = 2a0 = 20 and S0 = 2b0Ip = 2.2Ip for comparison to the univariate models. The prior mean and mode of Σ are 0.138Ip and 0.092Ip, respectively. The parameter θ = (τ2, Σ) can be sampled from p(θ|st) = pIG(τ2; ct, dt)pIW (Σ; �t, St), where the recursive suffi cient statistics are ct = ct−1 + 1/2, dt = dt−1 + (xt − xt−1)2/2, �t = �t−1 + 1 and St = St−1 + (yt − 1p xt)(yt − 1p xt)′.

    State learningLet us start by assuming that (xt−1|yt−1, θ) ~ N(mt−1, Ct−1), with x0 ~ N(m0, C0), θ = (Σ, τ2) and sxt−1 = (mt−1, Ct−1). PL starts by resampling the particles {(θ, st−1, sxt−1)(i)}Ni=1 with weights p(yt|sxt−1, θ) = pN(yt; 1pmt−1, Qt), where Qt = RtDp + Σ, Dp = 1p1′p and Rt = Ct−1 + τ2. The state-suffi cient statistics sxt−1 are then propagated to st = (mt, Ct), where mt = (1 − At1p)mt−1 + Atyt, Ct = Rt −AtQtA′t and At = Rt1′pQt−1. Since both xt−1 and xt are used in dt when sampling τ2 from IG(ct, dt), then (xt−1, xt) need to be sampled from p(xt−1, xt|yt, θ) = pN(xt−1; g(yt), Vt)pN(xt; h(yt, xt−1), C

    –t), where Vt−1 = C−1t−1 + 1′pW−11p, g(yt)

    = Vt(C−1t−1mt−1 + 1′pW−1yt), C–

    t−1 = τ−2 + 1′pΣ−11p and h(yt, xt−1) = C

    –t(τ−2xt−1 + 1′pΣ−1yt), for W = τ2Dp + Σ.

    Finally, marginal posterior inference for xt given (yt, θ) is more effi cient if drawn from p(xt|yt, sxt−1, θ) = pN(xt; m̃t, C̃t), where C̃t−1 = Rt−1 + 1′pΣ−11p and m̃t = C̃t(Rt−1mt−1 + 1′pΣ−1yt).

    ResultsFigures 15–17 summarize the sequential learning of parameters and states for the univariate local level model based on N = 10,000 particles, which is fairly large considering the sample size T = 340. Figures 18–21 summarize the results for the multivariate local model, also based on N = 10,000 particles. Figure 22 compares the sequential posterior medians for the latent states from the three individual fi ts of model M1 against the multivariate fi t of model M2. Note the shrinkage effect of model M2, which provides a smoother point estimate for the latent state.

    CONCLUDING REMARKS

    In this paper we review particle fi lters, which are also known as sequential Monte Carlo (SMC) methods. We argue that, after almost two decades since the seminal paper of Gordon et al. (1993), SMC methods now belong in the toolbox of researchers and practitioners in many areas of modern

    4 Σ is inverse-Wishart with parameters �0 and S0 and density p S Sp

    Σ Σ Σ; , exp .νν

    0 0

    1

    2 01

    0

    0 5( ) ∝ − ( ){ }−+ +

    −tr , for �0 > p − 1, S0 > 0 (positive defi nite) and trΣ = σ 21 + . . . + σ 2p. The mean and the mode of Σ are S0/(�0 − p − 1) and S0/(�0 + p + 1), respectively. Its inverse Σ−1 is Wishart with the same parameters and denoted by Σ−1 ~ W(�0, S0). The mean and the mode of Σ−1 are �0S0−1 and (�0 − p − 1)S0−1(�0 ≥ p + 1), respectively.

  • Particle Filters and Bayesian Inference 197

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    Figure 15. Realized volatility. 2.5th, 50th and 97.5th percentiles of p(σ 2i |yti), p(τ 2i |y ti) and p(xit|y ti), where yi is i-minute log realized volatilities, for i = 5, 10, 20

  • 198 H. F. Lopes and R. S. Tsay

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    5−minute

    σσ2

    0.10 0.15 0.20 0.25 0.30 0.35 0.40

    05

    1015

    2025

    5−minute

    ττ2

    0.02 0.03 0.04 0.05 0.06 0.07 0.08

    020

    4060

    10−minute

    σσ2

    0.10 0.15 0.20 0.25 0.30 0.35 0.40

    05

    1015

    2025

    10−minute

    ττ2

    0.02 0.03 0.04 0.05 0.06 0.07 0.08

    020

    4060

    20−minute

    σσ2

    0.10 0.15 0.20 0.25 0.30 0.35 0.40

    05

    1015

    2025

    20−minute

    ττ2

    0.02 0.03 0.04 0.05 0.06 0.07 0.08

    020

    4060

    Figure 16. Realized volatility. Histogram approximations to p(σ 2i |yT) and p(τ 2i |yT), for i = 5, 10, 20

  • Particle Filters and Bayesian Inference 199

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    5−minute

    Time

    0 50 100 150 200 250 300

    01

    2

    10−minute

    Time

    0 50 100 150 200 250 300

    01

    2

    20−minute

    Time

    0 50 100 150 200 250 300

    01

    2

    Figure 17. Realized volatility. 2.5th, 50th and 97.5th percentiles of p(xit|yti), where yi is i-minute log realized volatilities, for i = 5, 10, 20

  • 200 H. F. Lopes and R. S. Tsay

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    Figure 18. Realized volatility. Sequential 2.5th, 50th and 97.5th percentiles for the unique components of Σ and for the correlations

    science, ranging from signal processing and target tracking to robotics, bioinformatics and fi nancial econometrics. This paper focuses on the latter, with fi ve demonstrations.

    Besides the references of PF in fi nancial econometrics cited above, some additional ones are Johannes et al. (2008) on predictive regressions and optimal portfolio allocation, Raggi and Bordignon (2006), Jasra et al. (2008), Li et al. (2008), Creal (2008) and Li (2009) on Lévy-type SV models, and Johannes et al. (2009) on extracting latent jump diffusions from asset prices. See also Fearnhead and Meligkotsidou (2004) and Fearnhead et al. (2008a) for particle fi lters in

  • Particle Filters and Bayesian Inference 201

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    (a)

    Time

    ττ2

    0 50 100 150 200 250 300 350

    0.02

    0.04

    0.06

    0.08

    0.10

    0.12

    0.14

    0.16

    (b)

    0.03 0.04 0.05 0.06 0.07

    020

    4060

    80

    (c)

    x t

    0 50 100 150 200 250 300 350

    01

    2

    Figure 19. Realized volatility. (a) Sequential 2.5th, 50th and 97.5th percentiles of p(τ2|yt). (b) Histogram approximation to p(τ2|yT). (c) Sequential 2.5th, 50th and 97.5th percentiles of p(xt |yt)

  • 202 H. F. Lopes and R. S. Tsay

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    0.02 0.03 0.04 0.05 0.06 0.07

    020

    4060

    8010

    0

    ττ2

    0.10 0.15 0.20 0.25 0.30 0.35 0.40

    05

    1015

    2025

    30

    σσ2

    5m

    10m

    20m

    Figure 20. Realized volatility. Approximated posterior distributions of τ 2i and σ 2i . Univariate local level models (grey lines) and multivariate local level model (black lines), for i = 5, 10, 20 (solid, dashed and dotted lines)

    partially observed continuous-time models and diffusions. PF for jump Markov systems are studied by Doucet et al. (2001b) and Andrieu et al. (2003).

    ACKNOWLEDGEMENTS

    We would like to express our sincere appreciation of the achievements of Professor Rudolf E. Kalman. His studies have led to many new developments in scientifi c computing, statistical infer-ence, and applications. Particle fi lters are one more example that will have a long-lasting impact in our profession.

  • Particle Filters and Bayesian Inference 203

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    Time

    ρ 12

    0 50 100 150 200 250 300 350

    0.0

    0.2

    0.4

    0.6

    0.8

    1.0

    ρ12

    0.0 0.2 0.4 0.6 0.8 1.0

    05

    1015

    Time

    ρ 13

    ρ 23

    0 50 100 150 200 250 300 350

    0.0

    0.2

    0.4

    0.6

    0.8

    1.0

    ρ13

    0.0 0.2 0.4 0.6 0.8 1.0

    05

    1015

    Time

    0 50 100 150 200 250 300 350

    0.0

    0.2

    0.4

    0.6

    0.8

    1.0

    ρ23

    0.0 0.2 0.4 0.6 0.8 1.0

    05

    1015

    Figure 21. Realized volatility. Sequential learning of correlations. Sequential 2.5th, 50th and 97.5th percentiles of p(ρij |yt) (left column). Histogram approximation to p(ρij |yT) (right column)

  • 204 H. F. Lopes and R. S. Tsay

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    Time (first half)

    x t

    0.0

    0.5

    1.0

    1.5

    2.0

    1 43 85 127 170

    5−minute10−minute20−minutemultivariate

    Time (second half)

    x t

    0.0

    0.5

    1.0

    1.5

    171 213 255 297 340

    Figure 22. Realized volatility. Sequential 50th percentiles of p(xit |yt, M1) for i = 5, 10, 20 and p(xt |y t, M2), where M1 is the univariate local level model and M2 its multivariate version

  • Particle Filters and Bayesian Inference 205

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    REFERENCES

    Alspach DL, Sorenson HW. 1972. Nonlinear Bayesian estimation using Gaussian sum approximation. IEEE Transactions on Automatic Control 17: 439–448.

    Andrieu C, Doucet A. 2002. Particle fi ltering for partially observed Gaussian state space models. Journal of the Royal Statistical Society, Series B 64: 827–836.

    Andrieu C, Davy M, Doucet A. 2003. Effi cient particle fi ltering for jump Markov systems: application to time-varying autoregressions. IEEE Transactions on Signal Processing 51: 1762–1770.

    Andrieu C, Doucet A, Singh SS, Tadić VB. 2004. Particle methods for change detection, system identifi cation, and control. Proceedings of the IEEE 92: 423–438.

    Andrieu C, Doucet A, Tadić VB. 2005. On-line parameter estimation in general state-space models. In Proceed-ings of the 44th Conference on Decision and Control; 332–337.

    Andrieu C, Doucet A, Holenstein R. 2010. Particle Markov chain Monte Carlo (with discussion). Journal of the Royal Statistical Society, Series B 72: 269–342.

    Arulampalam MS, Maskell S, Gordon N, Clapp T. 2002. A tutorial on particle fi lters for on-line nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing 50: 174–188.

    Briers M, Doucet A, Maskell S. 2010. Smoothing algorithms for state-space models. Annals of the Institute Statistical Mathematics 62: 61–89.

    Cappé O, Godsill S, Moulines E. 2007. An overview of existing methods and recent advances in sequential Monte Carlo. IEEE Proceedings in Signal Processing 95: 899–924.

    Carlin BP, Polson NG, Stoffer DS. 1992. A Monte Carlo approach to nonnormal and nonlinear state-space model-ing. Journal of the American Statistical Association 87: 493–500.

    Carpenter J, Clifford P, Fearnhead P. 1999. An improved particle fi lter for non-linear problems. IEE Proceedings on Radar, Sonar and Navigation 146: 2–7.

    Carter CK, Kohn R. 1994. On Gibbs sampling for state space models. Biometrika 81: 541–553.Carvalho CM, Lopes HF. 2007. Simulation-based sequential analysis of Markov switching stochastic volatility

    models. Computational Statistics and Data Analysis 51: 4526–4542.Carvalho CM, Lopes HF, Polson N. 2009a. Particle learning for generalized dynamic conditionally linear models.

    Working paper, University of Chicago Booth School of Business.Carvalho CM, Lopes HF, Polson NG, Taddy M. 2009b. Particle learning for general mixtures. Working paper,

    University of Chicago Booth School of Business.Carvalho CM, Johannes M, Lopes HF, Polson N. 2010. Particle learning and smoothing. Statistical Science (to

    appear).Chen H, Petralia F, Lopes HF. 2009. Sequential Monte Carlo estimation of DSGE models. Working paper,

    University of Chicago Booth School of Business.Chen R, Liu JS. 2000. Mixture Kalman fi lter. Journal of the Royal Statistical Society, Series B 62: 493–

    508.Chen Y, Lai TL. 2007. Identifi cation and adaptive control of change-point ARX models via Rao–Blackwellized

    particle fi lters. IEEE Transactions on Automatic Control 52: 67–72.Chen Z. 2003. Bayesian fi ltering: from Kalman fi lters to particle fi lters, and beyond. Working paper, McMaster

    University, Canada.Chib S, Nardari F, Shephard N. 2002. Markov Chain Monte Carlo Methods for Stochastic Volatility Models.

    Journal of Econometrics 108: 281–316.Chib S, Nardari F, Shephard N. 2006. Analysis of High Dimensional Multivariate Stochastic Volatility Models.

    Journal of Econometrics 134: 341–371.Chopin N. 2002. A sequential particle fi lter method for static models. Biometrika 89: 539–552.Creal D. 2008. Analysis of fi ltering and smoothing algorithms for Lévy-driven stochastic volatility models.

    Computational Statistics and Data Analysis 52: 2863–2876.Da Silva CQ, Migon HS, Correira LT. 2009. Bayesian beta dynamic model and applications. Working paper,

    Department of Statistics, Federal University of Rio de Janeiro.DeJong DN, Dharmarajan H, Liesenfeld R, Moura GV, Richard J-F. 2009. Effi cient likelihood evaluation of state-

    space representations. Working paper, Department of Economics, University of Pittsburgh.Del Moral P, Doucet A, Jasra A. 2006. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society,

    Series B 68: 411–436.

  • 206 H. F. Lopes and R. S. Tsay

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    Douc R, Garivier E, Moulines E, Olsson J. 2009a. On the forward fi ltering backward smoothing particle approximations of the smoothing distribution in general state space models. Working paper, Institut Télécom, Paris.

    Douc R, Moulines E, Olsson J. 2009b. Optimality of the auxiliary particle fi lter. Probability and Mathematical Statistics 29: 1–28.

    Doucet A, Johansen A. 2008. A note on auxiliary particle fi lters. Statistics and Probability Letters 78: 1498–1504.

    Doucet A, Johansen A. 2010. A tutorial on particle fi ltering and smoothing: fi fteen years later. In Handbook of Nonlinear Filtering, Crisan D, Rozovsky B (eds). Oxford University Press: Oxford.

    Doucet A, Tadić VB. 2003. Parameter estimation in general state-space models using particle methods. Annals of the Institute of Statistical Mathematics 55: 409–422.

    Doucet A, Godsill S, Andrieu C. 2000. On sequential Monte-Carlo sampling methods for Bayesian fi ltering. Statistics and Computing 10: 197–208.

    Doucet A, De Freitas N, Gordon N. 2001a. Sequential Monte Carlo Methods in Practice. Springer: New York.

    Doucet A, Gordon NJ, Krishnamurthy V. 2001b. Particle fi lters for state estimation of jump Markov linear systems. IEEE Transactions on Signal Processing 49: 613–624.

    Dukić V, Lopes HF, Polson NG. 2009. Tracking fl u epidemics using Google trends and particle learning. Working paper, University of Chicago Booth School of Business.

    Eraker B, Johannes MS, Polson NG. 2003. The impact of jumps in volatility and returns. Journal of Finance 58: 1269–1300.

    Fearnhead P. 2002. Markov chain Monte Carlo, suffi cient statistics and particle fi lter. Journal of Computational and Graphical Statistics 11: 848–862.

    Fearnhead P. 2004. Particle fi lters for mixture models with an unknown number of components. Statistics and Computing 14: 11–21.

    Fearnhead P, Clifford P. 2003. Online inference for hidden Markov models via particle fi lters. Journal of the Royal Statistical Society, Series B 65: 887–899.

    Fearnhead P, Meligkotsidou L. 2004. Exact fi ltering for partially-observed continuoustime models. Journal of the Royal Statistical Society, Series B 66: 771–789.

    Fearnhead P, Papaspiliopoulos O, Roberts GO. 2008a. Particle fi lters for partially observed diffusions. Journal of the Royal Statistical Society, Series B 70: 755–777.

    Fearnhead P, Wyncoll D, Tawn J. 2008b. A sequential smoothing algorithm with linear computational cost. Working paper, Department of Mathematics and Statistics, Lancaster University.

    Fernández-Villaverde J, Rubio-Ramírez JF. 2005. Estimating dynamic equilibrium economies: linear versus non-linear likelihood. Journal of Applied Econometrics 20: 891–910.

    Fernández-Villaverde J, Rubio-Ramírez JF. 2007. Estimating macroeconomic models: a likelihood approach. Review of Economic Studies 74: 1059–1087.

    Frühwirth-Schnatter S. 1994. Data augmentation and dynamic linear models. Journal of Time Series Analysis 15: 183–202.

    Gamerman D, Lopes HF. 2006. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. Chapman & Hall/CRC: Boca Raton, FL.

    Gilks WR, Berzuini C. 2001. Following a moving target: MonteCarlo inference for dynamic Bayesian models. Journal of the Royal Statistical Society, Series B 63: 127–146.

    Gordon N, Salmond D, Smith AFM. 1993. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F. Radar Signal Process 140: 107–113.

    Godsill SJ, Doucet A, West M. 2004. Monte Carlo smoothing for non-linear time series. Journal of the American Statistical Association 99: 156–168.

    Gramacy R, Polson NG. 2010. Particle learning of Gaussian process models for sequential design and optimiza-tion. Working paper, University of Chicago Booth School of Business.

    Guo D, Wang X, Chen R. 2005. New sequential Monte Carlo methods for nonlinear dynamic systems. Statistics and Computing 15: 135–147.

    Han C, Carlin BP. 2000. MCMC Methods for computing Bayes factors: a comparative review. Journal of the American Statistical Association 96: 1122–1132.

    Harvey AC. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press: Cambridge, CA.

  • Particle Filters and Bayesian Inference 207

    Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011) DOI: 10.1002/for

    Ito K, Xiong K. 2000. Gaussian fi lters for nonlinear fi ltering problems. IEEE Transactions on Automatic Control 45: 910–927.

    Jacquier E, Polson NG, Rossi PE. 1994. Bayesian analysis of stochastic volatility models. Journal of Business and Economic Statistics 20: 69–87.

    Jasra A, Stephens DA, Doucet A, Tsagaris T. 2008. Inference for Lévy-driven stochastic volatility models via adaptive s