
Lecture 1 - Introduction, dynamical models and strategies for state inference

Thomas Schön, e-mail: [email protected]

Division of Systems and Control
Department of Information Technology
Uppsala University

Hinting at the potential – state estimation (I/II) 2(36)

Fighter aircraft navigation using particle filters, in collaboration with Saab.

The task is to find the aircraft position using information from several sensors:

• Inertial sensors
• Radar
• Terrain elevation database

This sensor fusion problem requires a nonlinear state estimation problem to be solved, where we want to compute p(x_t | y_{1:t}).

Hinting at the potential – state estimation (II/II) 3(36)

Key theory that allowed us to do this:
• Particle filter (Lecture 3 of this course)
• Rao-Blackwellized particle filter

Details of this particular example are provided in:
Thomas Schön, Fredrik Gustafsson, and Per-Johan Nordlund. Marginalized Particle Filters for Mixed Linear/Nonlinear State-Space Models. IEEE Transactions on Signal Processing, 53(7):2279-2289, July 2005.

Show movie

Hinting at the potential – system identification (I/IV) 4(36)

The theory provided in Lectures 3 and 4 allows us to perform inference in state space models (SSMs):

\[
x_{t+1} \mid x_t \sim f_\theta(x_{t+1} \mid x_t, u_t), \qquad
y_t \mid x_t \sim h_\theta(y_t \mid x_t, u_t).
\]

Consider the special case of a Wiener model (a linear Gaussian state space (LGSS) model followed by a static nonlinearity):

[Block diagram: the input u_t and the process noise v_t enter the linear system L; its output z_t passes through the static nonlinearity h(·), and the measurement noise e_t is added at the summation Σ to give the output y_t.]
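As a concrete illustration, a minimal simulation sketch of a Wiener system; the first-order linear block and the tanh nonlinearity below are our own hypothetical choices, not from the slides:

```python
import numpy as np

def simulate_wiener(T, a=0.9, b=1.0, q=0.1, r=0.01, seed=0):
    """Simulate a hypothetical first-order Wiener system:
    z_{t+1} = a z_t + b u_t + v_t,   y_t = tanh(z_t) + e_t."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(T)                   # known input signal
    z = np.zeros(T)                              # output of the linear block L
    for t in range(T - 1):
        z[t + 1] = a * z[t] + b * u[t] + np.sqrt(q) * rng.standard_normal()
    y = np.tanh(z) + np.sqrt(r) * rng.standard_normal(T)  # static nonlinearity + noise
    return u, z, y

u, z, y = simulate_wiener(T=200)
```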

Hinting at the potential – system identification (II/IV) 5(36)

Consider the blind problem:

[Block diagram: the same Wiener system, now to be identified from the output measurements alone.]

The task is to learn the parameters of the linear system L and to find the nonlinearity h(·) (the entire function has to be learned) based only on the output measurements y_{1:T} ≜ {y_1, . . . , y_T}. We do not impose any assumption on the nonlinearity, and we allow for colored noise.

Hinting at the potential – system identification (III/IV) 6(36)

Using a PMCMC method (introduced in Lecture 5) we can compute the posterior distribution p(θ | y_{1:T}), where θ contains the unknown parameters and the unknown measurement function.

[Figure: Left, Bode diagram (magnitude in dB, phase in degrees, versus frequency in rad/s) of the identified linear system; right, the estimated static nonlinearity h(z) over z ∈ [−2, 2]. Both panels show the true quantity, the posterior mean and a 99 % credibility region.]

Show movie

Hinting at the potential – system identification (IV/IV) 7(36)

Key theory that allowed us to do this:

• Particle MCMC (Lecture 5)

• Particle smoothing / backward simulation (Lecture 4)

• Gaussian processes (Not covered in this course)

Details of this particular example are available in:
Fredrik Lindsten, Thomas B. Schön and Michael I. Jordan. Bayesian semiparametric Wiener system identification. Automatica, 49(7): 2053-2063, July 2013.

We also have maximum likelihood solutions to Wiener model identification, where again the particle filter/smoother plays a key role:
Adrian Wills, Thomas B. Schön, Lennart Ljung and Brett Ninness. Identification of Hammerstein-Wiener models. Automatica, 49(1): 70-81, January 2013.

Important message! 8(36)

Given the computational tools that we have today it can be rewarding to resist the linear Gaussian convenience!!

The aim of this course 9(36)

The aim of this course is to provide an introduction to the theory and application of (new) computational methods for inference in dynamical systems.

The key computational methods we refer to are:

• Sequential Monte Carlo (SMC) methods (e.g., particle filters and particle smoothers) for nonlinear state inference problems.

• Expectation maximisation (EM) and Markov chain Monte Carlo (MCMC) methods for nonlinear system identification.

Course home page:

http://user.it.uu.se/~thosc112/CLDS_UTFSM/

Dynamical systems are everywhere 10(36)

[Embedded title slide: "Sensor fusion in dynamical systems - applications and research challenges", Thomas Schön, DREAMS Seminar, Berkeley, CA.]

Model – data – inference algorithm 11(36)

In solving problems we have to make assumptions, and a model will to a large extent capture many of these assumptions.

A model is a compact and interpretable representation of the data that is observed.

Using models to solve problems requires three key ingredients:

1. Data: Measurements from the system we are interested in.

2. Model: We use probabilistic models, allowing us to employ probability theory to represent and systematically work with the uncertainty that is inherent in most data.

3. Inference algorithm: The key inference algorithms of this course are the sequential Monte Carlo methods.

Outline of the course 12(36)

L1 Introduction, dynamical models and strategies for state inference (estimation)
   a) Introduction and modeling dynamical systems using SSMs
   b) Strategies for state inference in nonlinear systems

L2 EM and MCMC introduced by learning LGSS models
   a) Maximum likelihood (ML) learning using EM
   b) The Monte Carlo idea
   c) Bayesian learning using Gibbs sampling (MCMC)

L3 Solving the nonlinear filtering problem using the particle filter
   a) Introduce importance sampling and rejection sampling
   b) Derive the particle filter (the most common SMC sampler)

L4 Particle smoothers and ML nonlinear sys. id.
   a) Derive particle smoothers (PS)
   b) Maximum likelihood nonlinear sys. id. using EM and PS

L5 Bayesian nonlinear system identification
   a) Particle Markov chain Monte Carlo (PMCMC)
   b) Bayesian nonlinear system identification using SMC and MCMC

Outline - Part 1 13(36)

1. Probabilistic modeling of dynamical systems
   a) Nonlinear state space model (SSM)
   b) Linear Gaussian state space (LGSS) model
   c) Conditionally linear Gaussian state space (CLGSS) model

2. Strategies for state inference
   a) Forward computations
   b) Backward computations

1. Representing an SSM using pdfs 14(36)

Definition (State space model (SSM))

A state space model (SSM) consists of a Markov process {x_t}_{t≥1} and a measurement process {y_t}_{t≥1}, related according to

\[
\begin{aligned}
x_{t+1} \mid x_t &\sim f_{\theta,t}(x_{t+1} \mid x_t, u_t),\\
y_t \mid x_t &\sim g_{\theta,t}(y_t \mid x_t, u_t),\\
x_1 &\sim \mu_\theta(x_1),
\end{aligned}
\]

where x_t ∈ R^{n_x} denotes the state, u_t ∈ R^{n_u} denotes a known deterministic input signal, y_t ∈ R^{n_y} denotes the observed measurement and θ ∈ Θ ⊆ R^{n_θ} denotes any unknown (static) parameters.
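As a sketch of how the definition is used generatively, the following draws one trajectory given samplers for f_θ, g_θ and µ_θ; the function names and the concrete densities are our hypothetical placeholders:

```python
import numpy as np

def simulate_ssm(T, f_sample, g_sample, mu_sample, u=None):
    """Draw one trajectory from an SSM, given samplers for the transition
    density f, the observation density g, and the initial density mu."""
    x = [mu_sample()]                      # x_1 ~ mu(x_1)
    y = []
    for t in range(T):
        ut = None if u is None else u[t]
        y.append(g_sample(x[t], ut))       # y_t | x_t ~ g(y_t | x_t, u_t)
        if t < T - 1:
            x.append(f_sample(x[t], ut))   # x_{t+1} | x_t ~ f(x_{t+1} | x_t, u_t)
    return np.array(x), np.array(y)

# Hypothetical scalar example: x_{t+1} ~ N(0.5 x_t, 1), y_t ~ N(x_t^2 / 20, 0.1^2)
rng = np.random.default_rng(0)
xs, ys = simulate_ssm(
    T=100,
    f_sample=lambda x, u: 0.5 * x + rng.standard_normal(),
    g_sample=lambda x, u: x**2 / 20 + 0.1 * rng.standard_normal(),
    mu_sample=lambda: rng.standard_normal(),
)
```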

2. Representing an SSM using difference equations 15(36)

In engineering literature, the SSM is often written in terms of a difference equation and an accompanying measurement equation,

\[
\begin{aligned}
x_{t+1} &= a_{\theta,t}(x_t, u_t) + v_{\theta,t},\\
y_t &= c_{\theta,t}(x_t, u_t) + e_{\theta,t}.
\end{aligned}
\]

3. Representing an SSM using a graphical model (I/II) 16(36)

Figure: Graphical model for the SSM (a chain x_1 → x_2 → · · · → x_T of latent states, each with an observation y_t). Each stochastic variable is encoded using a node, where filled (gray) nodes correspond to variables that are observed and unfilled (white) nodes are latent variables. The arrows pointing to a node encode which variables that node is conditioned upon.

The SSM is an instance of a graphical model called a Bayesian network, or belief network.

3. Representing an SSM using a graphical model (II/II) 17(36)

A Bayesian network directly describes how the joint distribution of all the involved variables (here p(x_{1:T}, y_{1:T})) is decomposed into a product of factors,

\[
p(x_{1:T}, y_{1:T}) = \prod_{t=1}^{T} p(x_t \mid \mathrm{pa}(x_t)) \prod_{t=1}^{T} p(y_t \mid \mathrm{pa}(y_t)),
\]

where pa(x_t) denotes the set of parents to x_t. For the SSM this gives

\[
p(x_{1:T}, y_{1:T}) = \mu(x_1) \prod_{t=1}^{T-1} f_{\theta,t}(x_{t+1} \mid x_t) \prod_{t=1}^{T} g_{\theta,t}(y_t \mid x_t).
\]
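This factorization maps directly to code: one initial factor, T − 1 transition factors and T observation factors. A minimal sketch, with hypothetical Gaussian densities standing in for µ, f_θ and g_θ:

```python
import numpy as np
from scipy.stats import norm

def joint_log_density(x, y, log_mu, log_f, log_g):
    """log p(x_{1:T}, y_{1:T}) assembled exactly as in the factorization:
    one initial factor, T-1 transition factors and T observation factors."""
    val = log_mu(x[0])
    for t in range(len(x) - 1):
        val += log_f(x[t + 1], x[t])       # log f(x_{t+1} | x_t)
    for t in range(len(x)):
        val += log_g(y[t], x[t])           # log g(y_t | x_t)
    return val

# Hypothetical Gaussian densities standing in for mu, f and g
lp = joint_log_density(
    x=np.array([0.1, 0.2, 0.05]),
    y=np.array([0.0, 0.3, -0.1]),
    log_mu=lambda x1: norm.logpdf(x1, 0.0, np.sqrt(10.0)),
    log_f=lambda xn, x: norm.logpdf(xn, 0.9 * x, np.sqrt(0.1)),
    log_g=lambda yt, x: norm.logpdf(yt, x, 1.0),
)
```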

Graphical models offer a powerful framework for modeling, inference and learning:

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.

Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.

The LGSS model 18(36)

Definition (Linear Gaussian State Space (LGSS) model)

The time invariant linear Gaussian state space (LGSS) model is defined by

\[
\begin{aligned}
x_{t+1} &= A x_t + B u_t + v_t,\\
y_t &= C x_t + D u_t + e_t,
\end{aligned}
\]

where x_t ∈ R^{n_x} denotes the state, u_t ∈ R^{n_u} denotes the known input signal and y_t ∈ R^{n_y} denotes the observed measurement. The initial state and the noise are distributed according to

\[
\begin{pmatrix} x_1 \\ v_t \\ e_t \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} P_1 & 0 & 0 \\ 0 & Q & S \\ 0 & S^{T} & R \end{pmatrix} \right).
\]
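A minimal simulation sketch of the LGSS model, assuming for simplicity that the noises are uncorrelated (S = 0); the function name and the numerical values are our own:

```python
import numpy as np

def simulate_lgss(A, B, C, D, Q, R, mu, P1, u, seed=0):
    """Simulate the LGSS model above, assuming uncorrelated noise (S = 0)."""
    rng = np.random.default_rng(seed)
    T, nx, ny = len(u), A.shape[0], C.shape[0]
    x = np.zeros((T, nx))
    y = np.zeros((T, ny))
    x[0] = rng.multivariate_normal(mu, P1)             # x_1 ~ N(mu, P1)
    for t in range(T):
        y[t] = C @ x[t] + D @ u[t] + rng.multivariate_normal(np.zeros(ny), R)
        if t < T - 1:
            x[t + 1] = A @ x[t] + B @ u[t] + rng.multivariate_normal(np.zeros(nx), Q)
    return x, y

# Hypothetical scalar system with a zero input signal
x, y = simulate_lgss(A=np.array([[0.9]]), B=np.array([[1.0]]),
                     C=np.array([[1.0]]), D=np.array([[0.0]]),
                     Q=np.array([[0.1]]), R=np.array([[1.0]]),
                     mu=np.zeros(1), P1=10.0 * np.eye(1),
                     u=np.zeros((50, 1)))
```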

Notation for Gaussian (Normal) variables 19(36)

The pdf of a Gaussian variable is denoted N(x | µ, Σ), i.e.,

\[
\mathcal{N}(x \mid \mu, \Sigma) \triangleq \frac{1}{(2\pi)^{n/2}\sqrt{\det \Sigma}} \exp\left(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right).
\]

See the appendix of the lecture notes for basic theorems needed in manipulating Gaussian variables.
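For numerical work one typically evaluates the logarithm of this density; a small sketch using numpy's slogdet for numerical stability:

```python
import numpy as np

def gauss_logpdf(x, mu, Sigma):
    """Evaluate log N(x | mu, Sigma) for an n-dimensional Gaussian."""
    n = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)           # log det(Sigma), computed stably
    quad = diff @ np.linalg.solve(Sigma, diff)     # (x - mu)^T Sigma^{-1} (x - mu)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)
```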

CLGSS model 20(36)

Definition (Conditionally linear Gaussian state space (CLGSS) model)

Assume that the state x_t of an SSM can be partitioned according to x_t = (s_t^T, z_t^T)^T. The SSM is then a CLGSS model if the conditional process {(z_t, y_t) | s_{1:t}}_{t≥1} is described by an LGSS model.

Conditioned on part of the state vector, the rest of the state behaves like an LGSS model.

This can be exploited in deriving inference algorithms!

The z_t-process is conditionally linear, motivating the name linear state for z_t and nonlinear state for s_t.

SLGSS model (I/II) 21(36)

Definition (Switching linear Gaussian state space (SLGSS))

The SLGSS model is defined according to

\[
\begin{aligned}
z_{t+1} &= A_{s_t} z_t + B_{s_t} u_t + v_{s_t},\\
y_t &= C_{s_t} z_t + D_{s_t} u_t + e_{s_t},\\
s_t &\sim p(s_t \mid s_{t-1}, z_{t-1}),
\end{aligned}
\]

where z_t ∈ R^{n_z} denotes the state, s_t ∈ {1, . . . , S} denotes the switching variable, u_t ∈ R^{n_u} denotes the known input signal and y_t ∈ R^{n_y} denotes the observed measurement. The initial state x_1 and the noise are distributed according to

\[
\begin{pmatrix} x_1 \\ v_{s_t} \\ e_{s_t} \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} P_1 & 0 & 0 \\ 0 & Q_{s_t} & S_{s_t} \\ 0 & S_{s_t}^{T} & R_{s_t} \end{pmatrix} \right).
\]
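A simulation sketch of a hypothetical two-regime SLGSS model; for simplicity the switch below depends only on s_{t−1} through a fixed transition matrix Pi, a special case of p(s_t | s_{t−1}, z_{t−1}), and all numerical values are ours:

```python
import numpy as np

def simulate_slgss(T, A, Q, C, R, Pi, mu, P1, seed=0):
    """Simulate an SLGSS model with S regimes. A, Q, C, R are lists indexed
    by the regime; Pi[i] is the switch transition distribution from regime i."""
    rng = np.random.default_rng(seed)
    S, nz = len(A), len(mu)
    s = np.zeros(T, dtype=int)
    z = np.zeros((T, nz))
    y = np.zeros(T)
    z[0] = rng.multivariate_normal(mu, P1)
    for t in range(T):
        y[t] = C[s[t]] @ z[t] + np.sqrt(R[s[t]]) * rng.standard_normal()
        if t < T - 1:
            z[t + 1] = A[s[t]] @ z[t] + rng.multivariate_normal(np.zeros(nz), Q[s[t]])
            s[t + 1] = rng.choice(S, p=Pi[s[t]])   # switch depends on s_t only (sketch)
    return s, z, y

# Two hypothetical regimes: slow, low-noise vs. fast, high-noise dynamics
s, z, y = simulate_slgss(
    T=100,
    A=[np.array([[0.95]]), np.array([[0.5]])],
    Q=[np.array([[0.01]]), np.array([[0.5]])],
    C=[np.array([1.0]), np.array([1.0])],
    R=[0.1, 0.1],
    Pi=np.array([[0.95, 0.05], [0.10, 0.90]]),
    mu=np.zeros(1), P1=np.eye(1),
)
```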

SLGSS model (II/II) 22(36)

Figure: Graphical model for the switching linear Gaussian state space (SLGSS) model: a chain s_1 → s_2 → · · · → s_T of switching variables, a chain z_1 → z_2 → · · · → z_T of linear states, and observations y_1, . . . , y_T.

Mixed Gaussian state space (MGSS) model 23(36)

Definition (Mixed Gaussian state space (MGSS) model)

The MGSS model is defined according to

\[
\begin{aligned}
x_{t+1} &= f_t(s_t) + A_t(s_t) z_t + v_t(s_t),\\
y_t &= h_t(s_t) + C_t(s_t) z_t + e_t(s_t),
\end{aligned}
\]

where

\[
x_t = \begin{pmatrix} s_t \\ z_t \end{pmatrix}, \quad
f_t(s_t) = \begin{pmatrix} f_t^{s}(s_t) \\ f_t^{z}(s_t) \end{pmatrix}, \quad
A_t(s_t) = \begin{pmatrix} A_t^{s}(s_t) \\ A_t^{z}(s_t) \end{pmatrix}.
\]

The noises are distributed according to

\[
v_t(s_t) \sim \mathcal{N}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} Q^{s}(s_t) & Q^{sz}(s_t) \\ Q^{sz}(s_t)^{T} & Q^{z}(s_t) \end{pmatrix} \right) = \mathcal{N}(0, Q(s_t)),
\]
\[
e_t(s_t) \sim \mathcal{N}(0, R(s_t)).
\]

State inference 24(36)

State inference refers to the problem of finding information about the state(s) x_{k:l} based on the available measurements y_{1:t}.

We will represent this information using PDFs.

Name                        Probability density function
Filtering                   p(x_t | y_{1:t})
Prediction                  p(x_{t+1} | y_{1:t})
k-step prediction           p(x_{t+k} | y_{1:t})
Joint smoothing             p(x_{1:T} | y_{1:T})
Marginal smoothing          p(x_t | y_{1:T}), t ≤ T
Fixed-lag smoothing         p(x_{t−l+1:t} | y_{1:t}), l > 0
Fixed-interval smoothing    p(x_{r:t} | y_{1:T}), r < t ≤ T

Notation: y_{1:t} ≜ {y_1, y_2, . . . , y_t}.

The nonlinear filtering problem 25(36)

State filtering problem: Find x_t based on {u_{1:t}, y_{1:t}} when the model is given by

\[
\begin{aligned}
x_{t+1} \mid x_t &\sim f(x_{t+1} \mid x_t, u_t),\\
y_t \mid x_t &\sim g(y_t \mid x_t, u_t),\\
x_1 &\sim \mu(x_1), \quad (\theta \sim p(\theta)).
\end{aligned}
\]

Strategy: Compute the filter PDF p(x_t | y_{1:t}).

Forward computations 26(36)

Summarizing this development, we have the measurement update

\[
p(x_t \mid y_{1:t}) = \frac{\overbrace{g(y_t \mid x_t)}^{\text{measurement}}\;\overbrace{p(x_t \mid y_{1:t-1})}^{\text{prediction pdf}}}{p(y_t \mid y_{1:t-1})},
\]

and the time update

\[
p(x_t \mid y_{1:t-1}) = \int \underbrace{f(x_t \mid x_{t-1})}_{\text{dynamics}} \underbrace{p(x_{t-1} \mid y_{1:t-1})}_{\text{filtering pdf}} \, dx_{t-1}.
\]
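For a scalar model these two updates can be verified by brute force on a grid, a useful sanity check before turning to particle methods. A minimal sketch (the Gaussian model at the end is a hypothetical example):

```python
import numpy as np
from scipy.stats import norm

def grid_filter(y, grid, log_f, log_g, log_mu):
    """Alternate the time update and the measurement update on a fixed grid."""
    dx = grid[1] - grid[0]
    f_mat = np.exp(log_f(grid[:, None], grid[None, :]))  # f_mat[i, j] = f(grid[i] | grid[j])
    p = np.exp(log_mu(grid))                             # prior p(x_1) on the grid
    filt = []
    for t, yt in enumerate(y):
        if t > 0:
            p = f_mat @ p * dx                           # time update: p(x_t | y_{1:t-1})
        p = p * np.exp(log_g(yt, grid))                  # measurement update (unnormalised)
        p = p / (p.sum() * dx)                           # normalise by p(y_t | y_{1:t-1})
        filt.append(p.copy())
    return np.array(filt)

# Hypothetical scalar LGSS model: x_{t+1} ~ N(0.9 x_t, 0.1), y_t ~ N(x_t, 1)
grid = np.linspace(-10.0, 10.0, 400)
filt = grid_filter(
    y=np.array([0.5, -0.2, 0.1]), grid=grid,
    log_f=lambda xn, x: norm.logpdf(xn, 0.9 * x, np.sqrt(0.1)),
    log_g=lambda yt, x: norm.logpdf(yt, x, 1.0),
    log_mu=lambda x: norm.logpdf(x, 0.0, np.sqrt(10.0)),
)
```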

Backward computations 27(36)

By marginalizing

\[
p(x_{1:T} \mid y_{1:T}) = p(x_T \mid y_{1:T}) \prod_{t=1}^{T-1} \frac{f(x_{t+1} \mid x_t)\, p(x_t \mid y_{1:t})}{p(x_{t+1} \mid y_{1:t})}
\]

w.r.t. x_{1:t−1} and x_{t+1:T}, we obtain the following expression for the marginal smoothing pdf:

\[
p(x_t \mid y_{1:T}) = p(x_t \mid y_{1:t}) \int \frac{f(x_{t+1} \mid x_t)\, p(x_{t+1} \mid y_{1:T})}{p(x_{t+1} \mid y_{1:t})} \, dx_{t+1}.
\]
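Continuing the grid sketch from the previous slide (reusing grid, filt and the same log_f), the marginal smoothing recursion becomes a single backward pass:

```python
import numpy as np
from scipy.stats import norm

def grid_smoother(filt, grid, log_f):
    """Backward pass: p(x_t | y_{1:T}) = p(x_t | y_{1:t}) *
    int f(x_{t+1} | x_t) p(x_{t+1} | y_{1:T}) / p(x_{t+1} | y_{1:t}) dx_{t+1}."""
    dx = grid[1] - grid[0]
    f_mat = np.exp(log_f(grid[:, None], grid[None, :]))  # f_mat[i, j] = f(grid[i] | grid[j])
    T = len(filt)
    smth = np.zeros_like(filt)
    smth[-1] = filt[-1]                                  # at t = T smoothing = filtering
    for t in range(T - 2, -1, -1):
        pred = f_mat @ filt[t] * dx                      # p(x_{t+1} | y_{1:t}) on the grid
        ratio = smth[t + 1] / np.maximum(pred, 1e-300)   # guard against division by zero
        smth[t] = filt[t] * (f_mat.T @ ratio) * dx
    return smth

# Reusing `grid` and `filt` from the filtering sketch above
smth = grid_smoother(filt, grid,
                     log_f=lambda xn, x: norm.logpdf(xn, 0.9 * x, np.sqrt(0.1)))
```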

Forward and backward computations 28(36)

Two key strategies are used:

1. Forward filtering and backward smoothing (FFBSm): provides the smoothing density (exact or approximate).

2. Forward filtering and backward simulation (FFBSi): provides samples distributed (exactly or approximately) according to the smoothing density.

Example – linear smoothing 29(36)

Basic results on Gaussian variables provide the backward kernel

\[
p(x_t \mid x_{t+1}, y_{1:t}) = \mathcal{N}\left(x_t \mid J_t x_{t+1} - J_t x_{t+1|t} + x_{t|t},\; P_{t|t} - J_t A P_{t|t}\right),
\]

where J_t = P_{t|t} A^T (A P_{t|t} A^T + Q)^{−1}. Using this together with p(x_{t+1} | y_{1:T}) = N(x_{t+1} | x_{t+1|T}, P_{t+1|T}) results in

\[
p(x_t \mid y_{1:T}) = \mathcal{N}(x_t \mid x_{t|T}, P_{t|T}),
\]

where

\[
\begin{aligned}
x_{t|T} &= x_{t|t} + J_t\left(x_{t+1|T} - x_{t+1|t}\right),\\
P_{t|T} &= P_{t|t} + J_t\left(P_{t+1|T} - P_{t+1|t}\right) J_t^{T},\\
J_t &= P_{t|t} A^{T} (A P_{t|t} A^{T} + Q)^{-1} = P_{t|t} A^{T} P_{t+1|t}^{-1}.
\end{aligned}
\]
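These recursions form the Rauch-Tung-Striebel smoother; a compact sketch, assuming the filtered and predicted moments have already been stored from a Kalman filter pass:

```python
import numpy as np

def rts_smoother(x_filt, P_filt, x_pred, P_pred, A):
    """Backward recursion using stored filtered moments (x_{t|t}, P_{t|t})
    and predicted moments (x_{t+1|t}, P_{t+1|t}) from a Kalman filter pass."""
    T = len(x_filt)
    x_s, P_s = x_filt.copy(), P_filt.copy()   # at t = T smoothing = filtering
    for t in range(T - 2, -1, -1):
        J = P_filt[t] @ A.T @ np.linalg.inv(P_pred[t + 1])  # J_t = P_{t|t} A^T P_{t+1|t}^{-1}
        x_s[t] = x_filt[t] + J @ (x_s[t + 1] - x_pred[t + 1])
        P_s[t] = P_filt[t] + J @ (P_s[t + 1] - P_pred[t + 1]) @ J.T
    return x_s, P_s
```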

Exact backward simulation in LGSS models 30(36)

Algorithm 1 Backward simulator (LGSS)

1. Initialise: run the Kalman filter and store x_{t|t}, P_{t|t} for t = 1, . . . , T.
2. For j = 1 to M do:
3.   Draw x_T^j ∼ N(x_{T|T}, P_{T|T}).
4.   For t = T − 1 down to 1 do:
     (a) Draw x_t^j ∼ N(µ_t, L_t), where
         \[
         \begin{aligned}
         \mu_t &= x_{t|t} + P_{t|t} A^{T} (A P_{t|t} A^{T} + Q)^{-1} (x_{t+1}^{j} - A x_{t|t}),\\
         L_t &= P_{t|t} - P_{t|t} A^{T} (A P_{t|t} A^{T} + Q)^{-1} A P_{t|t}.
         \end{aligned}
         \]
     (b) Store the sample x_{t:T}^j = (x_t^j, x_{t+1:T}^j).
5. End for
6. End for
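A runnable sketch of Algorithm 1 (the function names are ours; a basic Kalman filter without input signal is included to make it self-contained):

```python
import numpy as np

def kalman_filter(y, A, C, Q, R, mu, P1):
    """Kalman filter for x_{t+1} = A x_t + v_t, y_t = C x_t + e_t (no input);
    returns the filtered moments x_{t|t} and P_{t|t} for t = 1, ..., T."""
    T, nx = len(y), A.shape[0]
    x_f = np.zeros((T, nx))
    P_f = np.zeros((T, nx, nx))
    x_p, P_p = mu, P1
    for t in range(T):
        S = C @ P_p @ C.T + R                    # innovation covariance
        K = P_p @ C.T @ np.linalg.inv(S)         # Kalman gain
        x_f[t] = x_p + K @ (y[t] - C @ x_p)
        P_f[t] = P_p - K @ C @ P_p
        x_p, P_p = A @ x_f[t], A @ P_f[t] @ A.T + Q
    return x_f, P_f

def ffbsi_lgss(y, A, C, Q, R, mu, P1, M, seed=0):
    """Draw M backward trajectories from p(x_{1:T} | y_{1:T}), per Algorithm 1."""
    rng = np.random.default_rng(seed)
    x_f, P_f = kalman_filter(y, A, C, Q, R, mu, P1)
    T, nx = x_f.shape
    traj = np.zeros((M, T, nx))
    for j in range(M):
        traj[j, -1] = rng.multivariate_normal(x_f[-1], P_f[-1])
        for t in range(T - 2, -1, -1):
            G = P_f[t] @ A.T @ np.linalg.inv(A @ P_f[t] @ A.T + Q)
            mu_t = x_f[t] + G @ (traj[j, t + 1] - A @ x_f[t])
            L_t = P_f[t] - G @ A @ P_f[t]
            traj[j, t] = rng.multivariate_normal(mu_t, L_t)
    return traj
```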

Exact backward simulation in LGSS models 31(36)

Consider the LGSS model

\[
\begin{aligned}
x_{t+1} &= 0.9\, x_t + v_t, \quad v_t \sim \mathcal{N}(0, 0.1),\\
y_t &= x_t + e_t, \quad e_t \sim \mathcal{N}(0, 1),
\end{aligned}
\]

with x_1 ∼ N(0, 10), and generate T = 50 samples y_{1:T}.

Use FFBSi to sample M = 5 000 backward trajectories {x_{1:T}^j}_{j=1}^{M} from the JSD (joint smoothing density) p(x_{1:T} | y_{1:T}).
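A sketch of this experiment, reusing the ffbsi_lgss function from the previous slide; the data generation follows the stated model:

```python
import numpy as np

rng = np.random.default_rng(0)
A, C = np.array([[0.9]]), np.array([[1.0]])
Q, R = np.array([[0.1]]), np.array([[1.0]])
mu, P1 = np.zeros(1), 10.0 * np.eye(1)

# Generate T = 50 observations from the model above
T = 50
x = np.zeros((T, 1))
x[0] = rng.multivariate_normal(mu, P1)
for t in range(T - 1):
    x[t + 1] = A @ x[t] + rng.multivariate_normal([0.0], Q)
y = x @ C.T + rng.multivariate_normal([0.0], R, size=T)

# Draw M = 5000 backward trajectories; histograms of traj[:, t, 0] can then
# be compared against the exact marginal smoothing densities
traj = ffbsi_lgss(y, A, C, Q, R, mu, P1, M=5000)
```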

Exact backward simulation in LGSS models 32(36)

[Figure: Histograms of {x_t^j}_{j=1}^{M} for t = 1, t = 25 and t = 50 (from left to right). The true marginal smoothing densities p(x_t | y_{1:T}) are shown using solid black curves.]

As expected they agree!

Why backward simulation? 33(36)

Backward simulation is the key strategy underlying several of the best particle smoothers that are available today.

For a detailed introduction to backward simulation see:
Fredrik Lindsten and Thomas B. Schön. Backward simulation methods for Monte Carlo statistical inference. Foundations and Trends in Machine Learning, 6(1):1-143, 2013.

Summary – Lecture 1 (I/III) 34(36)

A model is a compact and interpretable representation of the data that is observed. In this course, model = PDF!

Introduced the nonlinear state space model (SSM)

\[
\begin{aligned}
x_{t+1} \mid x_t &\sim f_{\theta,t}(x_{t+1} \mid x_t, u_t),\\
y_t \mid x_t &\sim g_{\theta,t}(y_t \mid x_t, u_t),\\
x_1 &\sim \mu_\theta(x_1), \quad (\theta \sim p(\theta)),
\end{aligned}
\]

and several important special cases,

a) LGSS

b) CLGSS

c) MGSS and SLGSS

Summary – Lecture 1 (II/III) 35(36)

State inference refers to the problem of finding information about the state(s) x_{k:l} based on the available measurements y_{1:t}.

We showed the key strategies for state filtering and state smoothing. These strategies form the foundation for everything we do throughout this course.

The state filtering problem: Compute the filtering pdf p(x_t | y_{1:t}).

Solved via the forward computations

\[
p(x_t \mid y_{1:t}) = \frac{\overbrace{g(y_t \mid x_t)}^{\text{measurement}}\;\overbrace{p(x_t \mid y_{1:t-1})}^{\text{prediction pdf}}}{p(y_t \mid y_{1:t-1})},
\]
\[
p(x_t \mid y_{1:t-1}) = \int \underbrace{f(x_t \mid x_{t-1})}_{\text{dynamics}} \underbrace{p(x_{t-1} \mid y_{1:t-1})}_{\text{filtering pdf}} \, dx_{t-1}.
\]

Summary – Lecture 1 (III/III) 36(36)

We also derived the backward computations, where the backward kernel is key,

\[
p(x_t \mid x_{t+1}, y_{1:t}) = \frac{f(x_{t+1} \mid x_t)\, p(x_t \mid y_{1:t})}{p(x_{t+1} \mid y_{1:t})}.
\]

Two strategies for combining forward and backward computations were introduced: forward filtering and backward smoothing (FFBSm), and forward filtering and backward simulation (FFBSi).
