Download: Variational Bayesian inference for stochastic processes, Manfred Opper, AI group, TU Berlin (…ensslin/Bayes_Forum/..., 2017/10/13)

Transcript
Page 1

Variational Bayesian inference for stochastic processes

Manfred Opper, AI group, TU Berlin

October 13, 2017

Manfred Opper, AI group, TU Berlin (TU Berlin) Variational approximation October 13, 2017 1 / 28

Page 2

Outline

Probabilistic inference ("inverse problem")

Why it is not trivial ...

Variational Approximation

Path inference for stochastic differential equations

Drift estimation

Outlook


Page 3

Probabilistic inference

Observations y ≡ (y1, . . . , yK) ("data")

Latent, unobserved variables x ≡ (x1, . . . , xN)

Likelihood p(y|x) (forward model)

Prior distribution p(x)

Inverse problem: make predictions on x given the observations using Bayes' rule:

p(x|y) = p(y|x) p(x) / p(y)

Easy?



Page 6

Not quite ...

Often it is easy to write down the posterior over all hidden variables,

p(x1, . . . , xN | data) = p(data | x1, . . . , xN) p(x1, . . . , xN) / p(data)

But what we really need are marginal distributions, e.g.

p(xi | data) = ∫ dx1 . . . dxi−1 dxi+1 . . . dxN  p(data | x1, . . . , xN) p(x1, . . . , xN) / p(data)

and the evidence

p(data) = ∫ dx1 . . . dxN  p(data | x1, . . . , xN) p(x1, . . . , xN)
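The cost of these sums and integrals is exactly what makes the problem hard: for discrete latents the number of terms grows exponentially in N. A minimal brute-force sketch (the model and all numbers are invented for illustration):

```python
import itertools
import math

# Hypothetical toy model: N binary latents with a uniform prior over {0,1}^N,
# and a single Gaussian observation y of the sum of the latents.
N, y = 10, 4.0

def log_joint(x):
    log_prior = N * math.log(0.5)
    log_lik = -0.5 * (y - sum(x)) ** 2 - 0.5 * math.log(2 * math.pi)
    return log_prior + log_lik

# Evidence p(y): an explicit sum over all 2^N configurations.
configs = list(itertools.product([0, 1], repeat=N))
terms = [log_joint(x) for x in configs]
m = max(terms)
log_evidence = m + math.log(sum(math.exp(t - m) for t in terms))

# Marginal p(x_1 = 1 | y): sum the joint over the remaining N - 1 variables.
sub = [t for x, t in zip(configs, terms) if x[0] == 1]
ms = max(sub)
p_marginal = math.exp(ms + math.log(sum(math.exp(t - ms) for t in sub)) - log_evidence)
```

Already at N = 30 this enumeration is hopeless, which is what motivates approximate inference.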



Page 8

Variational approximation

Approximate the intractable posterior

p(x|y) = p(y|x) p(x) / p(y)

by q(x), which belongs to a family of simpler, tractable distributions.

Optimise q by minimising the Kullback–Leibler divergence (relative entropy) to the posterior:

DKL[q ‖ p(·|y)] = Eq[ln (q(x) / p(x|y))] = DKL[q ‖ p] − Eq[ln p(y|x)] + ln p(y)

where DKL[q ‖ p] is the divergence from q to the prior p(x). Equivalently, minimise the variational free energy

F[q] = DKL[q ‖ p] − Eq[ln p(y|x)] ≥ − ln p(y)
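A minimal 1-d numerical sketch of this bound (the model, quadrature grid and parameter ranges are all invented): fit a Gaussian q by brute-force minimisation of F[q], and the optimum still upper-bounds − ln p(y).

```python
import numpy as np

# Toy 1-d model: prior x ~ N(0, 1), likelihood y | x ~ N(sin(x), s2).
y, s2 = 0.5, 0.1
xs, dx = np.linspace(-8.0, 8.0, 4001, retstep=True)     # quadrature grid
log_prior = -0.5 * xs**2 - 0.5 * np.log(2 * np.pi)
log_lik = -0.5 * (y - np.sin(xs)) ** 2 / s2 - 0.5 * np.log(2 * np.pi * s2)

def free_energy(mu, sig):
    """F[q] = KL[q || prior] - E_q[ln p(y|x)] for a Gaussian q = N(mu, sig^2)."""
    q = np.exp(-0.5 * ((xs - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))
    kl = np.sum(q * (np.log(q + 1e-300) - log_prior)) * dx
    return kl - np.sum(q * log_lik) * dx

# Crude grid search over (mu, sigma); any gradient-based optimiser would do.
cands = [(m, s) for m in np.linspace(-2, 2, 41) for s in np.linspace(0.05, 1.5, 30)]
mu, sig = min(cands, key=lambda p: free_energy(*p))

# Exact evidence by quadrature, to check the bound F[q] >= -ln p(y).
log_evidence = np.log(np.sum(np.exp(log_prior + log_lik)) * dx)
```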



Page 11

The variational approximation in statistical physics

(Feynman, Peierls, Bogoliubov, Kleinert, . . . )

Let p(x|y) = (1/Z) e^{−H(x)} and q(x) = (1/Z0) e^{−H0(x)}.

The variational bound on the free energy is

− ln Z ≤ − ln Z0 + Eq[H(x)] − Eq[H0(x)] = F[q]

This is equivalent to first-order perturbation theory around H0.

Well-known approximations: Gaussian, factorising ("mean field").


Page 12

Example: Finite-dimensional Gaussian variational densities

q(x) = (2π)^{−N/2} |Σ|^{−1/2} exp( −(1/2) (x − µ)ᵀ Σ^{−1} (x − µ) )

The variational free energy becomes

F[q] = −(N/2) log 2π − (1/2) log |Σ| − N/2 − Eq[log p(y, x)]

Setting the derivatives with respect to the variational parameters µ and Σ to zero gives

0 = Eq[∇x log p(y, x)]

(Σ^{−1})ij = −Eq[ ∂² log p(y, x) / ∂xi ∂xj ]
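These stationarity conditions can be iterated as a fixed-point scheme. A 1-d sketch with an invented joint density, using quadrature for the expectations and numerical derivatives of log p(y, x):

```python
import numpy as np

# Invented 1-d joint: standard normal prior, likelihood y | x ~ N(tanh(x), s2).
y, s2 = 1.0, 1.0
xs, dx = np.linspace(-8.0, 8.0, 4001, retstep=True)
logp = -0.5 * xs**2 - 0.5 * (y - np.tanh(xs)) ** 2 / s2   # log p(y, x) + const
d1 = np.gradient(logp, dx)          # d/dx   log p(y, x)
d2 = np.gradient(d1, dx)            # d^2/dx^2 log p(y, x)

mu, Sig = 0.0, 1.0
for _ in range(200):
    q = np.exp(-0.5 * (xs - mu) ** 2 / Sig) / np.sqrt(2 * np.pi * Sig)
    g = np.sum(q * d1) * dx         # E_q[d log p / dx]
    H = np.sum(q * d2) * dx         # E_q[d^2 log p / dx^2]
    Sig = -1.0 / H                  # Sigma^{-1} = -E_q[d^2 log p]
    mu = mu + 0.5 * Sig * g         # damped step towards E_q[grad] = 0
```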




Page 15

Stochastic differential equation

dX/dt = fθ(X) + "white noise"

E.g. fθ(x) = −dVθ(x)/dx


Page 16

Prior process: Stochastic differential equations (SDE)

Mathematicians prefer the Itô version

dXt = f(Xt) dt + D^{1/2}(Xt) dWt,   Xt ∈ R^d

with drift f, diffusion matrix D and Wiener process Wt.

This is the limit of the discrete-time process Xk:

Xk+1 − Xk = f(Xk) Δt + D^{1/2}(Xk) √Δt εk,   εk i.i.d. Gaussian.
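The discretisation above is exactly the Euler–Maruyama scheme. A 1-d sketch (drift, diffusion constant and step size are illustrative choices; the double-well drift reappears later in the talk):

```python
import numpy as np

# Euler-Maruyama: X_{k+1} = X_k + f(X_k) dt + sqrt(D dt) * eps_k, eps_k ~ N(0,1).
rng = np.random.default_rng(0)
f = lambda x: x * (1.0 - x**2)        # double-well drift f(x) = x(theta - x^2), theta = 1
D, dt, T = 0.5, 0.01, 20.0
n = int(T / dt)
x = np.empty(n + 1)
x[0] = 1.0
for k in range(n):
    x[k + 1] = x[k] + f(x[k]) * dt + np.sqrt(D * dt) * rng.standard_normal()
```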


Page 17

[Figure] Path with observations.


Page 18

Inference of unobserved path.


Page 19

What we would like to do

State estimation (smoothing): p[Xt | y1, . . . , yN, θ]

Use Bayes' rule for the conditional distribution over paths X0:T (an ∞-dimensional object):

p(X0:T | y1, . . . , yN, θ) = pprior(X0:T | θ) ∏n p(yn | Xtn) / p(y1, . . . , yN | θ)

where pprior encodes the dynamics and p(yn | Xtn) the observation model.

Parameter estimation:
1. Maximum likelihood: maximise p(y1, . . . , yN | θ) with respect to θ.
2. Bayes: use a prior p(θ) over the parameters to compute p(θ | y1, . . . , yN) ∝ p(y1, . . . , yN | θ) p(θ)



Page 22

Example: Process conditioned on endpoint

Wiener process with a single, noise-free observation y = XT = 0.

Page 23

How to represent path measure ?

The conditioned process is also Markovian!

It fulfils the SDE

dXt = g(Xt, t) dt + D^{1/2}(Xt) dWt

with a new time-dependent drift g(Xt, t) but the same diffusion D.

Previous example: g(x, t) = −x/(T − t) for 0 < t < T.
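A quick sketch simulating this bridge drift (step size and seed are arbitrary): every sampled path of dXt = −Xt/(T − t) dt + dWt is pulled to XT = 0.

```python
import numpy as np

rng = np.random.default_rng(1)
T, dt = 1.0, 1e-3
n = int(T / dt)
x = np.zeros(n + 1)
for k in range(n - 1):              # stop one step early: g blows up at t = T
    t = k * dt
    x[k + 1] = x[k] - x[k] / (T - t) * dt + np.sqrt(dt) * rng.standard_normal()
x[n] = 0.0                          # the endpoint is pinned by the conditioning
```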



Page 26

Change of measure theorem and KL divergence for path probabilities

Girsanov theorem:

dQ/dP(X0:T) = exp{ −∫0^T (f − g)ᵀ D^{−1/2} dBt + (1/2) ∫0^T ‖f − g‖D² dt }

where Bt is a Wiener process with respect to Q and ‖f − g‖D² = (f − g)ᵀ D^{−1} (f − g).

Let Q and P be measures over paths for SDEs with drifts g(X, t) and f(X, t) and the same diffusion D(X). Then

DKL[Q ‖ P] = EQ[ln dQ/dP] = (1/2) ∫0^T dt ∫ dx q(x, t) ‖g(x, t) − fθ(x)‖²

where q(x, t) is the marginal density of Xt.



Page 28

The (full) variational problem

Minimise the variational free energy

F(Q) = ∫0^T ∫ q(x, t) { (1/2) ‖g(x, t) − fθ(x)‖² − Σi δ(t − ti) ln p(yi|x) } dx dt

with respect to the posterior drift g(x, t).

The marginal density q(x, t) and the drift g(x, t) are coupled through the Fokker–Planck equation

∂q(x, t)/∂t = { −Σk ∂k gk(x, t) + (1/2) Σkl ∂k ∂l Dkl(x) } q(x, t)

Variation leads to forward–backward PDEs: the KSP equations (Kushner '62, Stratonovich '60 & Pardoux '82).



Page 30

The Variational Gaussian Approximation for SDEs

(Archambeau, Cornford, Opper & Shawe-Taylor, 2007)

Approximate with the (Gaussian) process over paths X0:T induced by a linear SDE:

dXt = {A(t) Xt + b(t)} dt + D^{1/2} dWt

The diffusion D must be independent of X!

The cost function (action) is of the form Fθ[m, S, A, b].

The constraints are the evolution equations for the marginal mean m(t) and covariance S(t):

dm/dt = A m + b

dS/dt = A S + S Aᵀ + D

→ nonlinear ODEs instead of PDEs!
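These moment equations are plain ODEs that can be integrated forward. A 2-d sketch with invented, time-independent A, b, D (simple Euler stepping):

```python
import numpy as np

A = np.array([[-1.0, 0.5],
              [ 0.0, -2.0]])       # stable drift matrix (invented)
b = np.array([0.3, 0.0])
D = 0.1 * np.eye(2)
dt, n = 1e-3, 10_000               # integrate to T = 10
m = np.array([1.0, -1.0])
S = 0.01 * np.eye(2)
for _ in range(n):
    m = m + (A @ m + b) * dt                # dm/dt = A m + b
    S = S + (A @ S + S @ A.T + D) * dt      # dS/dt = A S + S A^T + D
```

For time-dependent A(t), b(t) the loop is identical with A and b evaluated at each step; the full VGA pairs this forward pass with a backward sweep for the Lagrange multipliers that enforce these constraints.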



Page 32

Prediction & comparison with hybrid Monte Carlo

dX = X(θ − X²) dt + σ dW.

T = 20, θ = 1, σ² = 0.8 and N = 40 observations with noise σo² = 0.04.


Page 33

Posterior for θ


Page 34

Breakdown for large observation noise

Double well with observation noise σo = 0.6.

Page 35

Variational inference for higher dimensions: mean field approximation

Action functional (Vrettas, Opper & Cornford, 2015) for the means mi(t) and variances si(t) (compare to Onsager–Machlup):

Fθ[q] = Σi=1..d (1/(2σi²)) ∫0^T Eq[(ṁi − fi(Xt))²] dt
      + Σi=1..d (1/(2σi²)) ∫0^T { (ṡi − σi²)²/(4si) + (σi² − ṡi) Eq[∂fi(Xt)/∂Xt^i] } dt
      − Σj=1..n Eq[ln p(yj | Xtj)]

where ṁi and ṡi denote time derivatives.

Test on the Lorenz 1998 model: x = (x^1, . . . , x^d) with

dx^i/dt = (x^{i+1} − x^{i−2}) x^{i−1} − x^i + θ + ξi(t)
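A sketch of simulating this system by Euler–Maruyama (dimension, forcing θ and noise level are invented; the cyclic index coupling is implemented with array rolls):

```python
import numpy as np

rng = np.random.default_rng(3)
d, theta, sig = 40, 8.0, 1.0          # dimension, forcing, noise std (invented)
dt, n = 1e-3, 5000
x = theta + 0.01 * rng.standard_normal(d)     # start near the unstable fixed point
for _ in range(n):
    # cyclic shifts give the x^{i+1}, x^{i-1}, x^{i-2} neighbours of each x^i
    drift = (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + theta
    x = x + drift * dt + sig * np.sqrt(dt) * rng.standard_normal(d)
```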



Page 37

System of 1000 SDEs with only 350 components observed.


Page 38

Nonparametric drift estimation

Reconsider the SDE dX = f(X) dt + σ dW: infer the function f(·) under smoothness assumptions from observations of the process X.

Idea (see e.g. Papaspiliopoulos, Pokern, Roberts & Stuart, 2012): assume a Gaussian process prior f(·) ∼ GP(0, K) with covariance kernel K(x, x′).



Page 40

Basic idea

Euler discretization of the SDE:

Xt+Δ − Xt = f(Xt) Δ + √Δ εt,   for Δ → 0.

The likelihood (assuming a densely observed path X0:T) is Gaussian in f:

p(X0:T | f) ∝ exp[ −(1/(2Δ)) Σt ‖Xt+Δ − Xt‖² ] × exp[ −(1/2) Σt ‖f(Xt)‖² Δ + Σt f(Xt) · (Xt+Δ − Xt) ]

The posterior process is then also a GP, with an analytical solution.

For sparse observations (Δ not small) one needs to impute the unobserved path X0:T between observations, e.g. within an (approximate) EM algorithm (Ruttor, Batz & Opper, 2013).
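A compact sketch of the densely observed 1-d case (the true drift, the kernel and all settings are invented): the Gaussian likelihood above makes drift estimation equivalent to GP regression of the scaled increments (Xt+Δ − Xt)/Δ on Xt, with noise variance σ²/Δ.

```python
import numpy as np

rng = np.random.default_rng(2)
f_true = lambda x: x - x**3                  # drift used to generate the data
sigma2, dt, n = 0.5, 0.01, 2000
x = np.empty(n + 1)
x[0] = 0.0
for k in range(n):
    x[k + 1] = x[k] + f_true(x[k]) * dt + np.sqrt(sigma2 * dt) * rng.standard_normal()

X, t = x[:-1], np.diff(x) / dt               # regression targets (X_{k+1} - X_k)/dt
kern = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)   # RBF kernel K
K = kern(X, X) + (sigma2 / dt) * np.eye(n)   # noise variance sigma^2 / dt
xs = np.linspace(-1.2, 1.2, 49)
f_hat = kern(xs, X) @ np.linalg.solve(K, t)  # GP posterior mean of f at xs
```

With this little data the estimate is rough; longer trajectories (and imputation, for sparse observations) sharpen it.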



Page 42

A simple pendulum

dX = V dt,

dV = (−γV + mgl sin(X)) / (ml²) dt + d^{1/2} dWt,

N = 4000 data points (x, v) with Δt = 0.3 and known diffusion constant d = 1.

[Figure: estimated drift f(x, v = 0) versus x.]



Page 44

Outlook

Bias of the approximation? Not easy to assess, because DKL is only known up to a constant!

Get rid of the bias by using q as an informative proposal within an MCMC sampler.

More general infinite-dimensional problems (F. Pinski, G. Simpson, A. M. Stuart & H. Weber, 2015).

Inference for SDEs beyond the Gaussian approximation (T. Sutter, A. Ganguly & H. Koeppl, 2016). Allows for state-dependent diffusion.



Page 49

Many thanks to my collaborators:

Dan Cornford & Michail Vrettas (Aston U)
Andreas Ruttor, Florian Stimberg, Philipp Batz (TUB)
