Page 1

Expectation Propagation and Generalized EP Methods for Inference in Switching LDSs

Onno Zoeter & Tom Heskes

Bayesian Time Series Models Seminar, 2012.07.19

Summarized and Presented by Heo, Min-Oh

Page 2

(c)2012, Biointelligence Lab, http://bi.snu.ac.kr

Contents

- Basic Model: SLDS
- Motivation: Complexity for Posteriors
- Methods
  - Assumed Density Filtering (cf. clique tree inference in HMMs)
  - EP in a Nutshell
  - Expectation Propagation for Smoothing in SLDS
  - Generalized EP
- Experiments
- Appendix: Canonical form and the corresponding operations

Page 3

Model: Switching Linear Dynamical System (SLDS)

Also known as:
- conditionally Gaussian state-space model
- switching Kalman filter model
- hybrid model

Graphical notation: ellipse = Gaussian, rectangle = multinomial, shading = observed.

The model combines an observation model, a transition model, and a switch part.

Page 4

Complexity for Posteriors

The posterior distribution for filtering problems must consider all possible sequences of the switch states S_{1:T}. With M switch settings, the filtered posterior P(X_t | S_t, y_{1:t}) is a mixture of M^{t-1} Gaussians.

Page 5

Complexity for Posteriors: Example

- For i = 2, if we consider P(X1, X2), the number of Gaussian components in P(X2) without approximation is 4.
- In general, P(X_i) is a mixture of 2^i Gaussians.

Representing the correct marginal distribution in a hybrid network can require space that is exponential in the size of the network. Exact inference in CLG networks is NP-hard even in polytrees (unlike purely discrete polytree networks); even computing the probability of a single discrete variable is NP-hard.
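This blow-up is what ADF-style projection avoids: instead of letting the mixture grow, the belief state is repeatedly collapsed to a single Gaussian by moment matching. A minimal 1-D sketch (the weights, means, and variances below are made-up illustration values, not from the paper):

```python
# Collapse a 1-D Gaussian mixture to a single Gaussian by moment matching.
# The collapsed Gaussian matches the mixture's overall mean and variance,
# which is the KL(p||q)-optimal single-Gaussian approximation.

def collapse(weights, means, variances):
    """Return (mean, variance) of the moment-matched single Gaussian."""
    total = sum(weights)
    w = [wi / total for wi in weights]                      # normalize weights
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # E[x^2] of the mixture, then subtract the squared mean.
    second = sum(wi * (vi + mi**2) for wi, mi, vi in zip(w, means, variances))
    return mean, second - mean**2

# Example: a 2-component mixture (illustrative numbers).
m, v = collapse([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
# m = 0.0, v = 2.0: the between-component spread inflates the variance.
```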

Page 6

Exact Inference: Filtering as Clique Tree Propagation

The recursive filtering process can be seen as message passing in a clique tree with a belief state. In the HMM case, the forward pass of the sum-product clique tree algorithm computes, in turn:

P(S^{(1)}),  P(S^{(1)} | o^{(1)}),  P(S^{(2)} | o^{(1)}),  P(S^{(2)} | o^{(1)}, o^{(2)}), ...

Page 7

Assumed Density Filtering (ADF)

ADF forces the belief state to live in some restricted family F, e.g., a product of histograms, or a Gaussian. Given a prior \hat{q}^{(t-1)} \in F, do one step of exact Bayesian updating to get \hat{p}^{(t)} (in general outside F). Then do a projection step to find the closest approximation within the family:

\hat{q}^{(t)} = \arg\min_{q \in F} D(\hat{p}^{(t)} \,\|\, q)

If F is an exponential family, the KL minimization can be solved by moment matching.

Page 8

Assumed Density Filtering (ADF): Moment Matching

Minimizing KL(p || q) with respect to an exponential-family q(z):

- Set the gradient with respect to the natural parameters \eta to zero.
- For a general exponential family, the derivative of the log-normalizer gives the expected sufficient statistics under q.
- So the stationarity condition is E_q[u(z)] = E_p[u(z)].

Moment matching: the optimal solution matches the expected sufficient statistics.

Notation in this Chapter:
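The moment-matching argument on this slide can be written out explicitly; here we assume the Bishop-style parameterization q(z) = h(z) g(η) exp(ηᵀu(z)) (the chapter's own symbols may differ):

```latex
% KL divergence from p to an exponential-family q(z) = h(z)\, g(\eta) \exp(\eta^{\top} u(z)):
\mathrm{KL}(p \,\|\, q)
  = -\ln g(\eta) - \eta^{\top} \mathbb{E}_{p}[u(z)] + \text{const.}

% Setting the gradient with respect to \eta to zero:
-\nabla_{\eta} \ln g(\eta) = \mathbb{E}_{p}[u(z)].

% For any exponential family the log-normalizer satisfies
-\nabla_{\eta} \ln g(\eta) = \mathbb{E}_{q}[u(z)],

% so the optimum matches the expected sufficient statistics:
\mathbb{E}_{q}[u(z)] = \mathbb{E}_{p}[u(z)].
```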

Page 9

Potentials in the Sum-Product Algorithm

Forward message:

Approximating the forward-pass message gives filtering only.

Page 10

One Example for DBN


Page 11

EP in a Nutshell

Approximate a function by a simpler one:

p(x) = \prod_a f_a(x)  \approx  q(x) = \prod_a \tilde{f}_a(x)

where each \tilde{f}_a(x) lives in a parametric exponential family (e.g., Gaussian). The factors f_a(x) can be conditional distributions in a Bayesian network.

Page 12

EP Algorithm

Iterate the fixed-point equations:

\tilde{f}_a(x) = \arg\min_{\tilde{f}_a} D\big( f_a(x)\, q^{\backslash a}(x) \,\big\|\, \tilde{f}_a(x)\, q^{\backslash a}(x) \big),
where q^{\backslash a}(x) = \prod_{b \neq a} \tilde{f}_b(x).

The cavity distribution q^{\backslash a}(x) specifies where the approximation needs to be good: the result is a set of coordinated local approximations.
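As a minimal illustration of this fixed-point loop, here is a toy where every factor is already Gaussian, so the projection step is exact and the sweep converges immediately (in the SLDS setting the tilted distribution is a mixture and the projection is a genuine moment-matching approximation):

```python
# EP fixed-point loop for q(x) = prod_a ~f_a(x), everything Gaussian in
# natural parameters (precision lam = 1/var, shift h = mean/var).
# Each step: form the cavity q^{\a} = q / ~f_a, multiply in the exact
# factor f_a, project (a no-op here, since the tilted distribution is
# already Gaussian), then set ~f_a = projection / cavity.

factors = [(1.0, 0.0), (0.5, 0.5)]      # exact f_a as (lam, h): N(0,1), N(1,2)
approx = [(0.0, 0.0) for _ in factors]  # ~f_a, initialized to vacuous factors

for _ in range(3):                      # a few sweeps; converges after one here
    q = (sum(l for l, _ in approx), sum(h for _, h in approx))
    for a, (lam_a, h_a) in enumerate(factors):
        cav = (q[0] - approx[a][0], q[1] - approx[a][1])   # cavity q^{\a}
        tilted = (cav[0] + lam_a, cav[1] + h_a)            # cavity * f_a
        proj = tilted                                      # projection is exact
        approx[a] = (proj[0] - cav[0], proj[1] - cav[1])   # new ~f_a
        q = proj

lam_q, h_q = q
# q matches the exact product of the two Gaussians: precision 1.5, mean 1/3.
```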

Page 13

(Loopy) Belief Propagation

Specialize EP to factorized approximations:

\tilde{f}_a(x) = \prod_i \tilde{f}_{a,i}(x_i),   with the \tilde{f}_{a,i} playing the role of "messages".

Minimizing the KL divergence amounts to matching the marginals of f_a(x)\, q^{\backslash a}(x) (partially factorized) and \tilde{f}_a(x)\, q^{\backslash a}(x) (fully factorized), i.e., "sending messages".

Page 14

EP versus BP

- The EP approximation can live in a restricted family, e.g., Gaussian.
- The EP approximation does not have to be factorized.
- EP applies to many more problems, e.g., mixtures of discrete and continuous variables.

Page 15

Expectation Propagation

EP yields an approximate smoothing algorithm: the smoother is a backward (smoothing) version built on the assumed density filter, considering the forward and backward passes together.

- In the exact case: (backward message)
- In the approximation:

Page 16

Expectation Propagation

Page 17

Convergence in EP for SLDS

The approximation may sometimes fail to converge. Remedies:

- Iteration: repeat steps 1 to 4 of ADF to find local approximations that are as consistent as possible.
- Damped messages: normalizability in step 4 of ADF is guaranteed if the sum of the respective inverse covariance matrices is positive definite. Messages are damped in canonical space (see appendix).
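Damping can be sketched in canonical parameters: rather than jumping to the newly computed message, take a convex combination of the old and new canonical parameters, and accept the step only if the total precision stays positive. A 1-D sketch with illustrative numbers (the paper works with full inverse covariance matrices):

```python
# Damped EP message update in canonical (natural-parameter) space.
# A message is (K, h); damping interpolates K <- (1-a)*K_old + a*K_new,
# and likewise for h. The result is safe to normalize only if the
# combined precision remains positive.

def damped_update(old, new, alpha, other_precision):
    """Return the damped message if the resulting belief is normalizable."""
    K = (1 - alpha) * old[0] + alpha * new[0]
    h = (1 - alpha) * old[1] + alpha * new[1]
    # 1-D stand-in for "sum of inverse covariance matrices is positive definite":
    if K + other_precision <= 0:
        raise ValueError("damped step still not normalizable; reduce alpha")
    return K, h

# A new message with negative precision, tempered by damping:
msg = damped_update(old=(2.0, 1.0), new=(-1.0, 0.0),
                    alpha=0.5, other_precision=0.3)
```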

Page 18

Generalized EP

- A more accurate approximation, similar to Kikuchi's extension of the Bethe free energy.
- Outer clusters: larger than the cliques of a junction tree.
- Overlaps between the outer clusters.

Page 19

K = 1 case

The clusters form the cliques and separators in a junction tree.

- Outer clusters: counting number 1
- Overlaps: counting number -1 (1 - 2 = -1); counting number 0 (1 - (3 - 2) = 0); counting number 0 (1 - (4 - 3 + 0) = 0)

Page 20

Page 21

Alternative Backward Pass (ABP)

An approximation to the smoothed posteriors:
- based on the traditional Kalman smoother form;
- treats the discrete and continuous latent states separately.

Page 22

Experiments – with exact posteriors

- 100 models, generated by drawing parameters from conjugate priors.
- Dataset: a generated sequence of length 8.

Page 23

Experiments – with exact posteriors, Gibbs sampling

Page 24

Experiments – Effect of larger outer clusters

Page 25

APPENDIX

Page 26

Canonical Form

Represents the intermediate result as a log-quadratic form exp(Q(x)).

Page 27

Operations on Canonical Forms (1/4): Multiplication

- Product of two canonical-form factors
- Ex)

Page 28

Operations on Canonical Forms (2/4): Division

Vacuous canonical form:
- has no effect under multiplication and division
- defined as

Page 29

Operations on Canonical Forms (3/4): Marginalization

- The integral is finite iff K_YY is positive definite.

Page 30

Operations on Canonical Forms (4/4): Reduction

- Reduce a canonical form to the context representing the evidence
- If Y = y,
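The four operations can be sketched with the common textbook (K, h, g) parameterization C(x; K, h, g) = exp(-½ xᵀK x + hᵀx + g); this is an assumed parameterization for illustration, and the g-updates and variable-scope bookkeeping are omitted for brevity:

```python
import numpy as np

class Canonical:
    """exp(-0.5 x^T K x + h^T x + g) over a fixed variable ordering."""
    def __init__(self, K, h, g=0.0):
        self.K = np.atleast_2d(np.asarray(K, dtype=float))
        self.h = np.atleast_1d(np.asarray(h, dtype=float))
        self.g = float(g)

    def __mul__(self, other):          # multiplication: parameters add
        return Canonical(self.K + other.K, self.h + other.h, self.g + other.g)

    def __truediv__(self, other):      # division: parameters subtract
        return Canonical(self.K - other.K, self.h - other.h, self.g - other.g)

    def marginalize(self, keep, drop):
        """Integrate out the variables in `drop` (index lists).

        The integral is finite only if the K-block over `drop` is
        positive definite."""
        Kxy = self.K[np.ix_(keep, drop)]
        Kyy = self.K[np.ix_(drop, drop)]
        if np.any(np.linalg.eigvalsh(Kyy) <= 0):
            raise ValueError("K_YY not positive definite: integral diverges")
        inv = np.linalg.inv(Kyy)
        K = self.K[np.ix_(keep, keep)] - Kxy @ inv @ Kxy.T   # Schur complement
        h = self.h[keep] - Kxy @ inv @ self.h[drop]
        return Canonical(K, h, self.g)                       # g-update omitted

    def reduce(self, keep, obs, y):
        """Condition on evidence Y = y for the indices in `obs`."""
        K = self.K[np.ix_(keep, keep)]
        h = self.h[keep] - self.K[np.ix_(keep, obs)] @ np.atleast_1d(y)
        return Canonical(K, h, self.g)                       # g-update omitted

# The vacuous form (all-zero parameters) has no effect under multiplication:
vac = Canonical(np.zeros((1, 1)), np.zeros(1))
f = Canonical([[2.0]], [1.0])
prod = f * vac                          # prod has the same parameters as f
```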

Page 31

Sum-Product Algorithms

Inference in linear Gaussian networks:
- Variable elimination and clique tree algorithms can be adapted using canonical forms.
- The marginalization operation is well-defined for an arbitrary canonical form.

Reduction instantiates continuous variables (cf. the discrete case: simply zero out the entries that are not consistent with Z = z).

Computational complexity:
- linear in the number of cliques
- at most cubic in the size of the largest clique