Maximum Likelihood Parameter Estimation in State-Space Models

Arnaud Doucet
Department of Statistics, Oxford University

University College London (UCL Masterclass, Oct. 2012)

4th October 2012

State-Space Models

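Let $\{X_t\}_{t\ge 1}$ be a latent (hidden) $\mathcal{X}$-valued Markov process with

$$X_1 \sim \mu(\cdot) \quad\text{and}\quad X_t \mid (X_{t-1} = x) \sim f(\cdot \mid x).$$

Let $\{Y_t\}_{t\ge 1}$ be a $\mathcal{Y}$-valued observation process such that, conditionally on $\{X_t\}_{t\ge 1}$, the observations are independent with

$$Y_t \mid (X_t = x) \sim g(\cdot \mid x).$$

Particle filters estimate $\{p(x_{1:t} \mid y_{1:t})\}_{t\ge 1}$ on-line, but only the estimates of $\{p(x_t \mid y_{1:t})\}_{t\ge 1}$ and $\{p(y_{1:t})\}_{t\ge 1}$ are reliable.

Particle smoothing methods allow us to obtain reliable estimates of $\{p(x_t \mid y_{1:T})\}_{t=1}^{T}$.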

State-Space Models with Unknown Parameters

In most scenarios of interest, the state-space model contains an unknown static parameter $\theta \in \Theta$, so that

$$X_1 \sim \mu_\theta(\cdot) \quad\text{and}\quad X_t \mid (X_{t-1} = x_{t-1}) \sim f_\theta(\cdot \mid x_{t-1}).$$

The observations $\{Y_t\}_{t\ge 1}$ are conditionally independent given $\{X_t\}_{t\ge 1}$ and $\theta$, with

$$Y_t \mid (X_t = x_t) \sim g_\theta(\cdot \mid x_t).$$

Aim: we would like to infer $\theta$ either on-line or off-line.

Examples

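Stochastic Volatility model

$$X_t = \phi X_{t-1} + \sigma V_t, \quad V_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1),$$
$$Y_t = \beta \exp(X_t/2)\, W_t, \quad W_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1),$$

where $\theta = (\phi, \sigma^2, \beta)$.

Biochemical Network model

$$\Pr\left(X^1_{t+dt} = x^1_t + 1,\ X^2_{t+dt} = x^2_t \mid x^1_t, x^2_t\right) = \alpha\, x^1_t\, dt + o(dt),$$
$$\Pr\left(X^1_{t+dt} = x^1_t - 1,\ X^2_{t+dt} = x^2_t + 1 \mid x^1_t, x^2_t\right) = \beta\, x^1_t x^2_t\, dt + o(dt),$$
$$\Pr\left(X^1_{t+dt} = x^1_t,\ X^2_{t+dt} = x^2_t - 1 \mid x^1_t, x^2_t\right) = \gamma\, x^2_t\, dt + o(dt),$$

with

$$Y_k = X^1_{k\Delta T} + W_k, \quad W_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2),$$

where $\theta = (\alpha, \beta, \gamma)$.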
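As an illustration of the stochastic volatility example, the model can be simulated in a few lines. This is a minimal sketch, not part of the original slides: the function name `simulate_sv`, the parameter values, and the choice of drawing $X_1$ from the stationary distribution are all assumptions, since the slide does not specify $\mu_\theta$.

```python
import math
import random


def simulate_sv(T, phi, sigma, beta, seed=0):
    """Simulate T steps of X_t = phi X_{t-1} + sigma V_t, Y_t = beta exp(X_t/2) W_t.

    Assumption: X_1 ~ N(0, sigma^2 / (1 - phi^2)), the stationary law of the
    AR(1) state. The slide leaves the initial distribution unspecified.
    """
    if not abs(phi) < 1.0:
        raise ValueError("stationary initialisation needs |phi| < 1")
    rng = random.Random(seed)
    x = rng.gauss(0.0, sigma / math.sqrt(1.0 - phi * phi))
    xs, ys = [], []
    for _ in range(T):
        xs.append(x)
        ys.append(beta * math.exp(x / 2.0) * rng.gauss(0.0, 1.0))
        x = phi * x + sigma * rng.gauss(0.0, 1.0)
    return xs, ys


# Hypothetical parameter values, chosen only for illustration.
xs, ys = simulate_sv(T=500, phi=0.95, sigma=0.25, beta=0.6)
```

The latent path `xs` would not be available in practice; only `ys` is observed, and $\theta = (\phi, \sigma^2, \beta)$ is what the methods in this lecture try to recover.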

Parameter Inference in State-Space Models

Online Bayesian parameter inference.

Offline Maximum Likelihood parameter inference.

Online Maximum Likelihood parameter inference.

Bayesian Parameter Inference in State-Space Models

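Set a prior $p(\theta)$ on $\theta$, so inference now relies on

$$p(\theta, x_{1:t} \mid y_{1:t}) = \frac{p(\theta, x_{1:t}, y_{1:t})}{p(y_{1:t})}$$

where

$$p(\theta, x_{1:t}, y_{1:t}) = p(\theta)\, p_\theta(x_{1:t}, y_{1:t})$$

with

$$p_\theta(x_{1:t}, y_{1:t}) = \mu_\theta(x_1) \prod_{k=2}^{t} f_\theta(x_k \mid x_{k-1}) \prod_{k=1}^{t} g_\theta(y_k \mid x_k).$$

We have

$$p(\theta, x_{1:t} \mid y_{1:t}) = p(\theta \mid y_{1:t})\, p_\theta(x_{1:t} \mid y_{1:t}).$$

Standard and more sophisticated particle methods to sample from $\{p(\theta, x_{1:t} \mid y_{1:t})\}_{t\ge 1}$ are ALL unreliable.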

Online Bayesian Parameter Inference

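At time $t = 1$, sample $(\theta_1^{(i)}, X_1^{(i)}) \sim p(\theta)\, \mu_\theta(x_1)$, then form the weighted approximation

$$\tilde p(\theta, x_1 \mid y_1) = \sum_{i=1}^{N} W_1^{(i)}\, \delta_{(\theta_1^{(i)}, X_1^{(i)})}(\theta, x_1), \quad W_1^{(i)} \propto g_{\theta_1^{(i)}}(y_1 \mid X_1^{(i)}).$$

Resample $(\theta_1^{(i)}, X_1^{(i)}) \sim \tilde p(\theta, x_1 \mid y_1)$ to obtain

$$\hat p(\theta, x_1 \mid y_1) = \frac{1}{N} \sum_{i=1}^{N} \delta_{(\theta_1^{(i)}, X_1^{(i)})}(\theta, x_1).$$

At time $t \ge 2$

Set $\theta_t^{(i)} = \theta_{t-1}^{(i)}$, sample $X_t^{(i)} \sim f_{\theta_t^{(i)}}(x_t \mid X_{t-1}^{(i)})$ and set $X_{1:t}^{(i)} = (X_{1:t-1}^{(i)}, X_t^{(i)})$, so that

$$\tilde p(\theta, x_{1:t} \mid y_{1:t}) = \sum_{i=1}^{N} W_t^{(i)}\, \delta_{(\theta_t^{(i)}, X_{1:t}^{(i)})}(\theta, x_{1:t}), \quad W_t^{(i)} \propto g_{\theta_t^{(i)}}(y_t \mid X_t^{(i)}).$$

Resample $(\theta_t^{(i)}, X_{1:t}^{(i)}) \sim \tilde p(\theta, x_{1:t} \mid y_{1:t})$ to obtain

$$\hat p(\theta, x_{1:t} \mid y_{1:t}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{(\theta_t^{(i)}, X_{1:t}^{(i)})}(\theta, x_{1:t}).$$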

Online Bayesian Parameter Inference

Provides consistent estimates but is remarkably inefficient (Chopin, 2002). The particles $\{\theta_1^{(i)}\}$ in $\Theta$ space are only sampled at time 1: degeneracy problem!

Consider the extended state $Z_t = (X_t, \theta_t)$; then

$$\nu(z_1) = p(\theta_1)\, \mu_{\theta_1}(x_1),$$
$$f(z_t \mid z_{t-1}) = \delta_{\theta_{t-1}}(\theta_t)\, f_{\theta_t}(x_t \mid x_{t-1}), \quad g(y_t \mid z_t) = g_{\theta_t}(y_t \mid x_t);$$

i.e. $\theta_t = \theta_1$ for any $t$, with $\theta_1$ drawn from the prior. The exponential stability assumption on $\{p(z_t \mid y_{1:t})\}_{t\ge 1}$ cannot be satisfied.

Use MCMC steps on $\theta$ so as to jitter $\{\theta_t^{(i)}\}$; e.g. Andrieu, De Freitas & D. (1999); Fearnhead (2002); Gilks & Berzuini (2001); Carvalho et al. (2010).

When $p(\theta \mid y_{1:t}, x_{1:t}) = p(\theta \mid s_t(x_{1:t}, y_{1:t}))$, where $s_t(x_{1:t}, y_{1:t})$ is a fixed-dimensional vector, this is "elegant" but still implicitly relies on $\hat p(x_{1:t} \mid y_{1:t})$, so degeneracy will creep in.
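The degeneracy of the extended-state approach can be seen numerically by counting how many distinct $\theta$ values survive resampling. The sketch below is an illustration under assumptions that are not in the slides: a toy model $X_t = \theta X_{t-1} + V_t$, $Y_t = X_t + W_t$ with unit noise variances, a $U(-1,1)$ prior, $X_1 \sim \mathcal{N}(0,1)$, multinomial resampling at every step, and the hypothetical function name `count_distinct_thetas`.

```python
import math
import random


def count_distinct_thetas(ys, n_particles, rng):
    """Particle filter on Z_t = (X_t, theta); returns #distinct theta after each step."""
    thetas = [rng.uniform(-1.0, 1.0) for _ in range(n_particles)]  # theta^(i) ~ prior
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]         # X_1 ~ N(0,1), assumed
    distinct = []
    for t, y in enumerate(ys):
        if t > 0:
            # theta is copied unchanged (Dirac transition); only x moves.
            xs = [th * x + rng.gauss(0.0, 1.0) for th, x in zip(thetas, xs)]
        logw = [-0.5 * (y - x) ** 2 for x in xs]   # log g(y | x) up to a constant
        mx = max(logw)
        w = [math.exp(lw - mx) for lw in logw]
        idx = rng.choices(range(n_particles), weights=w, k=n_particles)
        thetas = [thetas[i] for i in idx]
        xs = [xs[i] for i in idx]
        distinct.append(len(set(thetas)))
    return distinct


rng = random.Random(0)
# Synthetic observations generated with theta = 0.5 (illustrative value).
x, ys = rng.gauss(0.0, 1.0), []
for _ in range(200):
    ys.append(x + rng.gauss(0.0, 1.0))
    x = 0.5 * x + rng.gauss(0.0, 1.0)

distinct = count_distinct_thetas(ys, n_particles=200, rng=rng)
print(distinct[0], distinct[-1])
```

Because resampling can only duplicate or drop existing values, the number of distinct $\theta$ particles never increases; this is the collapse that the MCMC jittering step is meant to counteract.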

Online Bayesian Parameter Inference

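At time $t-1$, we have

$$\hat p(\theta, x_{1:t-1} \mid y_{1:t-1}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{(\theta_{t-1}^{(i)}, X_{1:t-1}^{(i)})}(\theta, x_{1:t-1}).$$

Set $\theta_t^{(i)} = \theta_{t-1}^{(i)}$, sample $X_t^{(i)} \sim f_{\theta_t^{(i)}}(\cdot \mid X_{t-1}^{(i)})$, set $X_{1:t}^{(i)} = (X_{1:t-1}^{(i)}, X_t^{(i)})$ and

$$\tilde p(\theta, x_{1:t} \mid y_{1:t}) = \sum_{i=1}^{N} W_t^{(i)}\, \delta_{(\theta_t^{(i)}, X_{1:t}^{(i)})}(\theta, x_{1:t}), \quad W_t^{(i)} \propto g_{\theta_t^{(i)}}(y_t \mid X_t^{(i)}).$$

Resample $(\theta_t'^{(i)}, X_{1:t}^{(i)}) \sim \tilde p(\theta, x_{1:t} \mid y_{1:t})$, then sample $\theta_t^{(i)} \sim p(\theta \mid y_{1:t}, X_{1:t}^{(i)})$ to obtain

$$\hat p(\theta, x_{1:t} \mid y_{1:t}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{(\theta_t^{(i)}, X_{1:t}^{(i)})}(\theta, x_{1:t}).$$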

A Toy Example

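Linear Gaussian state-space model

$$X_t = \theta X_{t-1} + \sigma_V V_t, \quad V_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1),$$
$$Y_t = X_t + \sigma_W W_t, \quad W_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1).$$

We set $p(\theta) \propto 1_{(-1,1)}(\theta)$ so

$$p(\theta \mid y_{1:t}, x_{1:t}) \propto \mathcal{N}(\theta; m_t, \sigma_t^2)\, 1_{(-1,1)}(\theta)$$

where

$$\sigma_t^2 = S_{2,t}^{-1}, \quad m_t = S_{2,t}^{-1} S_{1,t}$$

with

$$S_{1,t} = \sum_{k=2}^{t} x_{k-1} x_k, \quad S_{2,t} = \sum_{k=2}^{t} x_{k-1}^2.$$

(The expression for $\sigma_t^2$ corresponds to $\sigma_V = 1$; for general $\sigma_V$, and assuming the law of $X_1$ does not depend on $\theta$, the variance is $\sigma_V^2 S_{2,t}^{-1}$.)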
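The conditional posterior above depends on the path only through the two sufficient statistics, which makes the Gibbs move on $\theta$ cheap. The following is a minimal sketch of that computation; the function names, the optional `sigma_v` argument (defaulting to 1, which reproduces the slide's formula), and the use of simple rejection for the truncation are assumptions made for illustration.

```python
import math
import random


def posterior_params(xs, sigma_v=1.0):
    """Return (m_t, sigma_t^2) of the Gaussian factor of p(theta | x_{1:t})."""
    s1 = sum(a * b for a, b in zip(xs[:-1], xs[1:]))  # S_{1,t} = sum_k x_{k-1} x_k
    s2 = sum(a * a for a in xs[:-1])                  # S_{2,t} = sum_k x_{k-1}^2
    return s1 / s2, sigma_v ** 2 / s2


def sample_theta(xs, rng, sigma_v=1.0):
    """Draw theta from N(m_t, sigma_t^2) restricted to (-1, 1) by rejection.

    Rejection is only sensible when m_t is not far outside (-1, 1);
    otherwise a dedicated truncated-normal sampler would be preferable.
    """
    m, var = posterior_params(xs, sigma_v)
    while True:
        theta = rng.gauss(m, math.sqrt(var))
        if -1.0 < theta < 1.0:
            return theta


path = [1.0, 0.5, 0.4, 0.1]            # hypothetical state trajectory x_{1:4}
m_t, var_t = posterior_params(path)
theta_draw = sample_theta(path, random.Random(0))
```

In the particle algorithm, each particle would carry its own $(S_{1,t}, S_{2,t})$ and update them recursively, so the full path never needs to be stored.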

Illustration of the Degeneracy Problem

Figure: SMC estimate of $E[\theta \mid y_{1:t}]$ for $t$ from 0 to 5000; as $t$ increases the degeneracy creeps in.

Another Toy Example

Linear Gaussian state-space model

$$X_t = \rho X_{t-1} + V_t, \quad V_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1),$$
$$Y_t = X_t + \sigma W_t, \quad W_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1).$$

We set $\rho \sim \mathcal{U}(-1,1)$ and $\sigma^2 \sim \mathcal{IG}(1,1)$.

We use a particle filter with perfect adaptation and Gibbs moves with $N = 10000$; particle learning (Andrieu, D. & De Freitas, 1999; Carvalho et al., 2010).

50 runs of the particle method vs ground truth obtained using the Kalman filter on the states and a grid on the parameters.

Another Illustration of Degeneracy for Particle Learning

Figure: Estimates of $p(\rho \mid y_{1:t})$ and $p(\sigma^2 \mid y_{1:t})$ over 50 runs (red) vs ground truth (blue) for $t = 10^3, 2\cdot 10^3, \ldots, 5\cdot 10^3$ and $N = 10^4$ (Kantas et al., 2012). Panels show the pdf for $n = 1000, 2000, \ldots, 5000$, with horizontal axes $\sigma_y^2$ (0.8 to 1.2) and $\rho$ (0.4 to 0.9).

Stepping Back

For fixed $\theta$, $\mathbb{V}[\hat p_\theta(y_{1:t}) / p_\theta(y_{1:t})]$ is in $O(t/N)$.

In a Bayesian context, $p(\theta \mid y_{1:t}) \propto p_\theta(y_{1:t})\, p(\theta)$, so we implicitly need to compute $p_\theta(y_{1:t})$ at each particle location $\theta^{(i)}$.

It appears impossible to obtain uniformly-in-time stable estimates of $\{p(\theta \mid y_{1:t})\}_{t\ge 1}$ for a fixed $N$.

However, for a given time horizon $T$, we can use PF to sample efficiently from $p(\theta \mid y_{1:T})$; see Lecture 3.

Likelihood Function Estimation

Given $y_{1:T}$, the log-(marginal) likelihood is

$$\ell(\theta) = \log p_\theta(y_{1:T}).$$

For any $\theta \in \Theta$, one can estimate $\ell(\theta)$ using particle methods, with variance $O(T/N)$.

Direct maximization of $\ell(\theta)$ is difficult, as the estimate $\hat\ell(\theta)$ is not a smooth function of $\theta$ even for a fixed random seed.

For $\dim(X_t) = 1$, we can obtain a smooth estimate of the log-likelihood function by using a smoothed resampling step (e.g. Pitt, 2011); i.e. a piecewise linear approximation of $\Pr(X_t < x \mid y_{1:t})$.

For $\dim(X_t) > 1$, we can obtain estimates of $\ell(\theta)$ that are highly positively correlated for neighbouring values in $\Theta$ (e.g. Lee, 2008).
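To make the particle estimate of $\ell(\theta)$ concrete, here is a minimal sketch of a bootstrap particle filter on a linear Gaussian model, where the exact log-likelihood is available from the Kalman filter for comparison. The model ($X_t = \rho X_{t-1} + V_t$, $Y_t = X_t + \sigma_W W_t$, $X_1 \sim \mathcal{N}(0,1)$), the function names, the parameter values and the use of multinomial resampling at every step are assumptions for illustration, not the lecture's implementation.

```python
import math
import random


def norm_logpdf(y, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def simulate_lg(T, rho, sigma_w, rng):
    """Draw y_{1:T} from the assumed model with X_1 ~ N(0, 1)."""
    x, ys = rng.gauss(0.0, 1.0), []
    for _ in range(T):
        ys.append(x + sigma_w * rng.gauss(0.0, 1.0))
        x = rho * x + rng.gauss(0.0, 1.0)
    return ys


def kalman_loglik(ys, rho, sigma_w):
    """Exact log p_theta(y_{1:T}) via the Kalman filter."""
    m, p, ll = 0.0, 1.0, 0.0               # predictive mean and variance of X_1
    for y in ys:
        s = p + sigma_w ** 2               # predictive variance of Y_t
        ll += norm_logpdf(y, m, s)
        k = p / s                          # Kalman gain
        m, p = m + k * (y - m), (1.0 - k) * p
        m, p = rho * m, rho * rho * p + 1.0  # predict X_{t+1}
    return ll


def particle_loglik(ys, rho, sigma_w, n_particles, rng):
    """Bootstrap particle filter estimate of log p_theta(y_{1:T})."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    ll = 0.0
    for t, y in enumerate(ys):
        if t > 0:
            xs = [rho * x + rng.gauss(0.0, 1.0) for x in xs]
        logw = [norm_logpdf(y, x, sigma_w ** 2) for x in xs]
        mx = max(logw)
        w = [math.exp(lw - mx) for lw in logw]
        ll += mx + math.log(sum(w) / n_particles)   # log of the average weight
        xs = rng.choices(xs, weights=w, k=n_particles)  # multinomial resampling
    return ll


ys = simulate_lg(T=50, rho=0.8, sigma_w=1.0, rng=random.Random(1))
exact = kalman_loglik(ys, 0.8, 1.0)
estimates = [particle_loglik(ys, 0.8, 1.0, 500, random.Random(s)) for s in range(5)]
```

The estimates scatter around the exact value from run to run. The resampling step is also what makes $\hat\ell(\theta)$ discontinuous in $\theta$ for a fixed seed: a small change in $\theta$ can change which particles are selected.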

Gradient Ascent

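To maximise $\ell(\theta)$ w.r.t. $\theta$, use at iteration $k+1$

$$\theta_{k+1} = \theta_k + \gamma_k\, \nabla \ell(\theta)\big|_{\theta=\theta_k}$$

where $\nabla \ell(\theta)\big|_{\theta=\theta_k}$ is the so-called score vector.

$\nabla \ell(\theta)\big|_{\theta=\theta_k}$ can be estimated using finite differences, but more efficiently using Fisher's identity

$$\nabla \ell(\theta) = \int \nabla \log p_\theta(x_{1:T}, y_{1:T})\, p_\theta(x_{1:T} \mid y_{1:T})\, dx_{1:T}$$

where

$$\nabla \log p_\theta(x_{1:T}, y_{1:T}) = \nabla \log \mu_\theta(x_1) + \sum_{t=2}^{T} \nabla \log f_\theta(x_t \mid x_{t-1}) + \sum_{t=1}^{T} \nabla \log g_\theta(y_t \mid x_t).$$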
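Fisher's identity can be checked on a model small enough to solve by hand. The sketch below uses a one-step toy model that is an assumption for illustration ($T = 1$, $X_1 \sim \mathcal{N}(\theta, 1)$, $Y_1 \mid X_1 \sim \mathcal{N}(X_1, 1)$), for which $Y_1 \sim \mathcal{N}(\theta, 2)$ and the exact score is $(y - \theta)/2$. The posterior expectation is approximated by self-normalised importance sampling from the prior, and the resulting estimate drives the gradient ascent recursion with a constant step size (also an illustrative choice).

```python
import math
import random


def score_fisher(y, theta, n_samples, rng):
    """Estimate E[grad_theta log p_theta(X, y) | y] by importance sampling."""
    num = den = 0.0
    for _ in range(n_samples):
        x = rng.gauss(theta, 1.0)            # X ~ mu_theta = N(theta, 1)
        w = math.exp(-0.5 * (y - x) ** 2)    # weight proportional to g(y | x)
        num += w * (x - theta)               # grad_theta log mu_theta(x) = x - theta
        den += w
    return num / den


def score_exact(y, theta):
    """Marginally Y ~ N(theta, 2), so d/dtheta log p_theta(y) = (y - theta) / 2."""
    return 0.5 * (y - theta)


# Gradient ascent theta_{k+1} = theta_k + gamma_k * (estimated score).
rng = random.Random(0)
y_obs, theta = 1.5, -1.0
for k in range(50):
    theta += 0.5 * score_fisher(y_obs, theta, 2000, rng)
```

In this toy model the maximum likelihood estimate is $\hat\theta = y$, so the iterates should settle near `y_obs`, up to Monte Carlo noise in the score estimate.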

Particle Calculation of the Score Vector

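We have

$$\nabla \ell(\theta) = \int \{\nabla \log \mu_\theta(x_1) + \nabla \log g_\theta(y_1 \mid x_1)\}\, p_\theta(x_1 \mid y_{1:T})\, dx_1$$
$$\quad + \sum_{t=2}^{T} \int \{\nabla \log f_\theta(x_t \mid x_{t-1}) + \nabla \log g_\theta(y_t \mid x_t)\}\, p_\theta(x_{t-1}, x_t \mid y_{1:T})\, dx_{t-1}\, dx_t.$$

To approximate $\nabla \ell(\theta)$, we just need particle approximations of $\{p_\theta(x_{t-1}, x_t \mid y_{1:T})\}_{t=2}^{T}$.

All the particle smoothing methods detailed before can be applied.

Similar "smoothed additive functionals" have to be computed when implementing the Expectation-Maximization algorithm.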

Comparison Direct Method vs FB

We want to estimate

$$\varphi_T = \sum_{t=1}^{T} \int \varphi(x_{t-1}, x_t, y_t)\, p(x_{t-1}, x_t \mid y_{1:T})\, dx_{t-1}\, dx_t.$$

Method                Direct       FB
# particles           N            N
Cost                  O(TN)        O(TN^2), O(TN)
Var.                  O(T^2/N)     O(T/N)
Bias                  O(T/N)       O(T/N)
MSE = Bias^2 + Var    O(T^2/N)     O(T^2/N^2)

"Fast" implementations of FB, with computational complexity $O(NT)$, outperform the direct approach, as their MSE is $O(T^2/N^2)$ whereas it is $O(T^2/N)$ for direct SMC.

"Naive" implementations of FB and TF (two-filter) have an MSE of the same order as the direct method for fixed computational complexity: at cost $O(TN^2)$, matching the direct method's budget allows only about $\sqrt{N}$ particles, which gives an MSE of $O(T^2/N)$. However, the MSE is bias dominated for FB/TF whereas it is variance dominated for direct SMC.

Experimental Results

Consider a linear Gaussian model

$$X_t = \phi X_{t-1} + \sigma_v V_t, \quad V_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1),$$
$$Y_t = c X_t + \sigma_w W_t, \quad W_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1).$$

We simulate 10,000 observations and compute particle estimates of

$$\int \varphi_T(x_{1:T})\, p(x_{1:T} \mid y_{1:T})\, dx_{1:T}$$

for 4 different additive functionals $\varphi_t(x_{1:t}) = \varphi_{t-1}(x_{1:t-1}) + \varphi(x_{t-1}, x_t, y_t)$, including $\varphi^1(x_{t-1}, x_t, y_t) = x_{t-1} x_t$ and $\varphi^2(x_{t-1}, x_t, y_t) = x_t^2$. [Ground truth can be computed using the Kalman smoother.]

We use 100 replications on the same dataset to estimate the empirical variance.

Boxplots of Direct vs FB Estimates

Figure: Boxplots of the score estimates for parameters $\sigma_v$ and $\phi$ at time steps 2500, 5000, 7500 and 10000. Direct (left, Algorithm 1) vs FB (right, Algorithm 2).

Empirical Variance for Direct vs FB Estimates

Figure: Empirical variance of the score estimates w.r.t. $\sigma_v$, $\phi$, $\sigma_w$ and $c$ against time steps (0 to 10000). Direct (left) vs FB (right); the vertical scale is different.

Online ML Parameter Inference

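Recursive maximum likelihood (Titterington, 1984; LeGland & Mevel, 1997) proceeds as follows

$$\theta_{t+1} = \theta_t + \gamma_t\, \nabla \log p_{\theta_{1:t}}(y_t \mid y_{1:t-1})$$

where $p_{\theta_{1:t}}(y_t \mid y_{1:t-1})$ is computed using $\theta_k$ at time $k$, and $\sum_t \gamma_t = \infty$, $\sum_t \gamma_t^2 < \infty$. Under regularity conditions, this converges towards a local maximum of the (average) log-likelihood.

Note that

$$\nabla \log p_{\theta_{1:t}}(y_t \mid y_{1:t-1}) = \nabla \log p_{\theta_{1:t}}(y_{1:t}) - \nabla \log p_{\theta_{1:t-1}}(y_{1:t-1})$$

is given by the difference of two pseudo-score vectors, where

$$\nabla \log p_{\theta_{1:t}}(y_{1:t}) := \int \left( \sum_{k=2}^{t} \left[ \nabla \log f_\theta(x_k \mid x_{k-1})\big|_{\theta_k} + \nabla \log g_\theta(y_k \mid x_k)\big|_{\theta_k} \right] \right) p_{\theta_{1:t}}(x_{1:t} \mid y_{1:t})\, dx_{1:t}.$$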

Online ML Particle Parameter Inference

The particle approximation follows

$$\theta_{t+1} = \theta_t + \gamma_t\, \hat\nabla \log p_{\theta_{1:t}}(y_t \mid y_{1:t-1})$$

where

$$\hat\nabla \log p_{\theta_{1:t}}(y_t \mid y_{1:t-1}) = \hat\nabla \log p_{\theta_{1:t}}(y_{1:t}) - \hat\nabla \log p_{\theta_{1:t-1}}(y_{1:t-1})$$

is given by the difference of particle estimates of pseudo-score vectors (Poyadjis, D. & Singh, 2011).

The asymptotic variance of $\hat\nabla \log p_{\theta_{1:t}}(y_t \mid y_{1:t-1})$ is uniformly bounded in $O(1/N)$ for the FB estimate, whereas it is $O(t/N)$ for the direct particle method (Del Moral, D. & Singh, 2011). The bias is $O(1/N)$ in both cases.

Major Problem: If we use FB, this is not an online algorithm anymore, as it requires a backward pass of order $O(t)$ to approximate $\nabla \log p_{\theta_{1:t}}(y_{1:t})$ ...

Variance of the Gradient Estimate for Direct vs FB

Figure: Empirical variance of the gradient estimate for standard versus FB approximations (SV model).

Online Particle ML Inference using Direct Approach

Figure: N = 10,000 particles, online parameter estimates for SV model.

Online Particle ML Inference using FB

Figure: N = 50 particles, online parameter estimates for SV model.

Forward only Smoothing

Dynamic programming allows us to compute in a single forward pass the FB estimates of

$$\varphi_t^\theta = \int \varphi_t(x_{1:t})\, p_\theta(x_{1:t} \mid y_{1:t})\, dx_{1:t}$$

where

$$\varphi_t(x_{1:t}) = \sum_{k=1}^{t} \varphi(x_{k-1}, x_k, y_k).$$

The Forward Backward (FB) decomposition states

$$p_\theta(x_{1:T} \mid y_{1:T}) = p_\theta(x_T \mid y_{1:T}) \prod_{t=1}^{T-1} p_\theta(x_t \mid y_{1:t}, x_{t+1})$$

where

$$p_\theta(x_t \mid y_{1:t}, x_{t+1}) = \frac{f_\theta(x_{t+1} \mid x_t)\, p_\theta(x_t \mid y_{1:t})}{p_\theta(x_{t+1} \mid y_{1:t})}.$$

Conditioned upon $y_{1:T}$, $\{X_t\}_{t=1}^{T}$ is a backward Markov chain with initial distribution $p_\theta(x_T \mid y_{1:T})$ and inhomogeneous Markov transitions $\{p_\theta(x_t \mid y_{1:t}, x_{t+1})\}_{t=1}^{T-1}$ independent of $T$.

Forward only Smoothing

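We have

$$\varphi_t^\theta = \int \varphi_t(x_{1:t})\, p_\theta(x_{1:t} \mid y_{1:t})\, dx_{1:t}$$
$$= \int \underbrace{\left[ \int \varphi_t(x_{1:t})\, p_\theta(x_{1:t-1} \mid y_{1:t-1}, x_t)\, dx_{1:t-1} \right]}_{V_t^\theta(x_t)} p_\theta(x_t \mid y_{1:t})\, dx_t.$$

Forward smoothing recursion

$$V_t^\theta(x_t) = \int \left[ V_{t-1}^\theta(x_{t-1}) + \varphi(x_{t-1:t}, y_t) \right] p_\theta(x_{t-1} \mid y_{1:t-1}, x_t)\, dx_{t-1}.$$

Appears implicitly in Elliott, Aggoun & Moore (1996), Ford (1998) and has been rediscovered a few times... The presentation here follows Del Moral, D. & Singh (2009).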

Forward only Smoothing

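Forward smoothing recursion

$$V_t^\theta(x_t) = \int \left[ V_{t-1}^\theta(x_{t-1}) + \varphi(x_{t-1:t}, y_t) \right] p_\theta(x_{t-1} \mid y_{1:t-1}, x_t)\, dx_{t-1}.$$

Proof is trivial

$$V_t^\theta(x_t) = \int \varphi_t(x_{1:t})\, p_\theta(x_{1:t-1} \mid y_{1:t-1}, x_t)\, dx_{1:t-1}$$
$$= \int \left[ \varphi_{t-1}(x_{1:t-1}) + \varphi(x_{t-1:t}, y_t) \right] p_\theta(x_{1:t-2} \mid y_{1:t-2}, x_{t-1})\, p_\theta(x_{t-1} \mid y_{1:t-1}, x_t)\, dx_{1:t-1}$$
$$= \int \Big\{ \underbrace{\int \varphi_{t-1}(x_{1:t-1})\, p_\theta(x_{1:t-2} \mid y_{1:t-2}, x_{t-1})\, dx_{1:t-2}}_{V_{t-1}^\theta(x_{t-1})} + \varphi(x_{t-1:t}, y_t) \Big\}\, p_\theta(x_{t-1} \mid y_{1:t-1}, x_t)\, dx_{t-1}.$$

Exact implementation is possible for finite state-space and linear Gaussian models.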
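For a finite state-space the recursion reduces to a few sums, and it can be verified against brute-force enumeration of all paths. The following sketch does exactly that; the two-state model, its numerical values, the function names, and the convention that the additive functional starts at $k = 2$ (so that $V_1^\theta \equiv 0$ and no $x_0$ is needed) are assumptions made for illustration.

```python
import itertools


def forward_smoothing(mu, f, g, ys, phi):
    """E[sum_{k=2}^T phi(X_{k-1}, X_k, y_k) | y_{1:T}] in a single forward pass."""
    K = len(mu)
    filt = [mu[j] * g[j][ys[0]] for j in range(K)]
    z = sum(filt)
    filt = [p / z for p in filt]                    # p(x_1 | y_1)
    V = [0.0] * K                                   # V_1 = 0 (empty sum)
    for y in ys[1:]:
        pred = [sum(filt[i] * f[i][j] for i in range(K)) for j in range(K)]
        # Backward kernel: p(x_{t-1}=i | y_{1:t-1}, x_t=j) = filt[i] f[i][j] / pred[j]
        V = [sum(filt[i] * f[i][j] * (V[i] + phi(i, j, y)) for i in range(K)) / pred[j]
             for j in range(K)]
        filt = [pred[j] * g[j][y] for j in range(K)]
        z = sum(filt)
        filt = [p / z for p in filt]                # p(x_t | y_{1:t})
    return sum(V[j] * filt[j] for j in range(K))


def brute_force(mu, f, g, ys, phi):
    """Same expectation by enumerating all K^T state paths."""
    K, T = len(mu), len(ys)
    num = den = 0.0
    for path in itertools.product(range(K), repeat=T):
        w = mu[path[0]] * g[path[0]][ys[0]]
        s = 0.0
        for t in range(1, T):
            w *= f[path[t - 1]][path[t]] * g[path[t]][ys[t]]
            s += phi(path[t - 1], path[t], ys[t])
        num += w * s
        den += w
    return num / den


def switches(i, j, y):
    """Additive functional: counts transitions between different states."""
    return float(i != j)


mu = [0.6, 0.4]                     # initial distribution
f = [[0.9, 0.1], [0.2, 0.8]]        # f[i][j] = P(X_t = j | X_{t-1} = i)
g = [[0.7, 0.3], [0.1, 0.9]]        # g[x][y] = P(Y_t = y | X_t = x)
ys = [0, 1, 1, 0, 1, 1]
forward_value = forward_smoothing(mu, f, g, ys, switches)
brute_value = brute_force(mu, f, g, ys, switches)
```

The forward pass costs $O(TK^2)$ operations, against $O(K^T)$ for the enumeration, and never stores anything beyond the current filter and the current vector $V_t^\theta$.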

Particle Forward only Smoothing

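At time $t-1$, we have $\hat p_\theta(x_{t-1} \mid y_{1:t-1}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_{t-1}^{(i)}}(x_{t-1})$ and $\{\hat V_{t-1}^\theta(X_{t-1}^{(i)})\}_{1 \le i \le N}$.

At time $t$, compute $\hat p_\theta(x_t \mid y_{1:t}) = \sum_{i=1}^{N} W_t^{(i)}\, \delta_{X_t^{(i)}}(x_t)$ and set

$$\hat V_t^\theta(X_t^{(i)}) = \int \left\{ \hat V_{t-1}^\theta(x_{t-1}) + \varphi(x_{t-1}, X_t^{(i)}, y_t) \right\} \hat p_\theta(x_{t-1} \mid y_{1:t-1}, X_t^{(i)})\, dx_{t-1}$$
$$= \frac{\sum_{j=1}^{N} f_\theta(X_t^{(i)} \mid X_{t-1}^{(j)}) \left[ \hat V_{t-1}^\theta(X_{t-1}^{(j)}) + \varphi(X_{t-1}^{(j)}, X_t^{(i)}, y_t) \right]}{\sum_{j=1}^{N} f_\theta(X_t^{(i)} \mid X_{t-1}^{(j)})},$$

$$\hat\varphi_t^\theta = \frac{1}{N} \sum_{i=1}^{N} \hat V_t^\theta(X_t^{(i)}).$$

This estimate is exactly the same as the Particle FB estimate; the computational complexity is $O(N^2)$.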
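The particle recursion above can be sketched as follows for a scalar linear Gaussian model. This is an illustration under stated assumptions rather than the lecture's code: the model is $X_t = \rho X_{t-1} + V_t$, $Y_t = X_t + \sigma W_t$ with $X_1 \sim \mathcal{N}(0,1)$, a bootstrap filter with multinomial resampling at every step is used, the functional starts at $t = 2$ (so $\hat V_1 \equiv 0$), and the final estimate is the weighted average $\sum_i W_t^{(i)} \hat V_t(X_t^{(i)})$ taken before resampling, which plays the role of the $1/N$ average over resampled particles.

```python
import math
import random


def particle_forward_smoothing(ys, rho, sigma, phi, n_particles, rng):
    """O(N^2)-per-step estimate of E[sum_{t>=2} phi(x_{t-1}, x_t, y_t) | y_{1:T}]."""
    N = n_particles
    xs = [rng.gauss(0.0, 1.0) for _ in range(N)]               # X_1 ~ N(0,1), assumed
    V = [0.0] * N                                              # V_1 = 0
    w = [math.exp(-0.5 * ((ys[0] - x) / sigma) ** 2) for x in xs]
    est = 0.0
    for y in ys[1:]:
        idx = rng.choices(range(N), weights=w, k=N)            # resample (X, V) pairs
        prev_x = [xs[i] for i in idx]
        prev_V = [V[i] for i in idx]
        xs = [rho * x + rng.gauss(0.0, 1.0) for x in prev_x]   # X_t ~ f(. | X_{t-1})
        V = []
        for xi in xs:
            # f(xi | X_{t-1}^(j)), up to a constant that cancels in the ratio
            fw = [math.exp(-0.5 * (xi - rho * xj) ** 2) for xj in prev_x]
            num = sum(fj * (vj + phi(xj, xi, y))
                      for fj, vj, xj in zip(fw, prev_V, prev_x))
            V.append(num / sum(fw))
        w = [math.exp(-0.5 * ((y - x) / sigma) ** 2) for x in xs]
        est = sum(wi * vi for wi, vi in zip(w, V)) / sum(w)
    return est


rng = random.Random(0)
x, ys = rng.gauss(0.0, 1.0), []
for _ in range(30):                                            # synthetic data
    ys.append(x + 0.5 * rng.gauss(0.0, 1.0))
    x = 0.7 * x + rng.gauss(0.0, 1.0)

# Smoothed estimate of sum_t x_{t-1} x_t, one of the functionals used earlier.
s1 = particle_forward_smoothing(ys, 0.7, 0.5, lambda a, b, y: a * b, 100, rng)
```

The inner loop over all previous particles for each new particle is where the $O(N^2)$ cost per time step comes from, and it is also why the transition density $f_\theta(x_t \mid x_{t-1})$ must be available pointwise.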

Online Particle ML Inference

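At time $t-1$, we have $\hat p_{\theta_{1:t-1}}(x_{t-1} \mid y_{1:t-1})$, $\{\hat V_{t-1}^{\theta_{1:t-1}}(X_{t-1}^{(i)})\}$,

$$\hat\nabla \log p_{\theta_{1:t-1}}(y_{1:t-1}) = \int \hat V_{t-1}^{\theta_{1:t-1}}(x_{t-1})\, \hat p_{\theta_{1:t-1}}(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}$$

and get $\theta_t$.

At time $t$, use your favourite PF to compute $\hat p_{\theta_{1:t}}(x_t \mid y_{1:t})$ and

$$\hat V_t^{\theta_{1:t}}(X_t^{(i)}) = \int \left\{ \hat V_{t-1}^{\theta_{1:t-1}}(x_{t-1}) + \varphi(x_{t-1}, X_t^{(i)}, y_t) \right\} \hat p_{\theta_{1:t}}(x_{t-1} \mid y_{1:t-1}, X_t^{(i)})\, dx_{t-1},$$

$$\varphi(x_{t-1:t}, y_t) = \nabla \log f_\theta(x_t \mid x_{t-1})\big|_{\theta_t} + \nabla \log g_\theta(y_t \mid x_t)\big|_{\theta_t}$$

and

$$\hat\nabla \log p_{\theta_{1:t}}(y_{1:t}) = \int \hat V_t^{\theta_{1:t}}(x_t)\, \hat p_{\theta_{1:t}}(x_t \mid y_{1:t})\, dx_t.$$

Parameter update

$$\theta_{t+1} = \theta_t + \gamma_t \left( \hat\nabla \log p_{\theta_{1:t}}(y_{1:t}) - \hat\nabla \log p_{\theta_{1:t-1}}(y_{1:t-1}) \right).$$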
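Putting the pieces together, the online update can be sketched for a one-parameter model. Everything specific here is an assumption for illustration: the model $X_t = \theta X_{t-1} + V_t$, $Y_t = X_t + \sigma_W W_t$ with known $\sigma_W$ and $X_1 \sim \mathcal{N}(0,1)$, the bootstrap filter, the step sizes $\gamma_t = a\, t^{-\alpha}$ with $\alpha \in (1/2, 1]$ (which satisfy $\sum_t \gamma_t = \infty$, $\sum_t \gamma_t^2 < \infty$), and the projection of $\theta$ onto $[-0.99, 0.99]$, which is an added safeguard not present in the slides. For this model $\nabla_\theta \log f_\theta(x_t \mid x_{t-1}) = (x_t - \theta x_{t-1})\, x_{t-1}$ and $\nabla_\theta \log g_\theta = 0$.

```python
import math
import random


def online_rml(ys, theta0, sigma_w, n_particles, rng, a=0.05, alpha=0.6):
    """Online particle ML estimate of theta using forward-only smoothing."""
    N, theta = n_particles, theta0
    xs = [rng.gauss(0.0, 1.0) for _ in range(N)]             # X_1 ~ N(0,1), assumed
    V = [0.0] * N                                            # pseudo-score statistic
    w = [math.exp(-0.5 * ((ys[0] - x) / sigma_w) ** 2) for x in xs]
    prev_score = 0.0
    for t, y in enumerate(ys[1:], start=2):
        idx = rng.choices(range(N), weights=w, k=N)
        px, pV = [xs[i] for i in idx], [V[i] for i in idx]
        xs = [theta * x + rng.gauss(0.0, 1.0) for x in px]   # propagate with theta_t
        V = []
        for xi in xs:
            fw = [math.exp(-0.5 * (xi - theta * xj) ** 2) for xj in px]
            # phi = d/dtheta log f_theta(xi | xj) = (xi - theta xj) xj
            num = sum(fj * (vj + (xi - theta * xj) * xj)
                      for fj, vj, xj in zip(fw, pV, px))
            V.append(num / sum(fw))
        w = [math.exp(-0.5 * ((y - x) / sigma_w) ** 2) for x in xs]
        score = sum(wi * vi for wi, vi in zip(w, V)) / sum(w)
        gamma = a * t ** (-alpha)                            # decreasing step size
        theta += gamma * (score - prev_score)                # difference of pseudo-scores
        theta = max(-0.99, min(0.99, theta))                 # projection (added safeguard)
        prev_score = score
    return theta


rng = random.Random(0)
x, ys = rng.gauss(0.0, 1.0), []
for _ in range(300):                                         # data generated with theta = 0.8
    ys.append(x + rng.gauss(0.0, 1.0))
    x = 0.8 * x + rng.gauss(0.0, 1.0)

theta_hat = online_rml(ys, theta0=0.2, sigma_w=1.0, n_particles=50, rng=rng)
```

The step-size constants `a` and `alpha` would need tuning in practice, and a short run such as this one is not expected to have converged; the point is only the structure of the per-step computation, which needs no backward pass.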

Summary

Online Bayesian parameter inference using particle methods is still an unsolved problem.

Particle smoothing techniques can be used to perform off-line and on-line ML parameter estimation.

The observed information matrix can also be evaluated online in a stable manner.

For online inference, the computational complexity is $O(N^2)$ at each time step, and the method requires evaluating $f_\theta(x_t \mid x_{t-1})$.
