Introduction to Graphical Models

Transcript of Introduction to Graphical Models

Page 1: Introduction to Graphical Models

Introduction to Graphical Models

Brookes Vision Lab Reading Group

Page 2: Introduction to Graphical Models

Graphical Models

• To build a complex system using simpler parts
• The system should be consistent
• Parts are combined using probability
• Undirected – Markov random fields
• Directed – Bayesian networks

Page 3: Introduction to Graphical Models

Overview

• Representation
• Inference
• Linear Gaussian Models
• Approximate inference
• Learning

Page 4: Introduction to Graphical Models

Representation

• Causality: Sprinkler "causes" wet grass (the water-sprinkler network over Cloudy, Sprinkler, Rain and WetGrass)

Page 5: Introduction to Graphical Models

Conditional Independence

• Each node is independent of its ancestors given its parents
• P(C,S,R,W) = P(C) P(S|C) P(R|C,S) P(W|C,S,R)
             = P(C) P(S|C) P(R|C) P(W|S,R)

• Space required for n binary nodes:
  – O(2^n) without factorization
  – O(n 2^k) with factorization, where k is the maximum fan-in
  – e.g. for the four-node sprinkler network (k = 2), the full joint needs 2^4 − 1 = 15 numbers versus 1 + 2 + 2 + 4 = 9 for the factored form

Page 6: Introduction to Graphical Models

Inference

• Pr(S=1 | W=1) = Pr(S=1, W=1) / Pr(W=1) = 0.2781 / 0.6471 = 0.430
• Pr(R=1 | W=1) = Pr(R=1, W=1) / Pr(W=1) = 0.4581 / 0.6471 = 0.708
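
As a concrete check of these numbers, here is a minimal brute-force inference sketch in Python. The CPT values are assumed (they are the ones commonly used for this water-sprinkler example and reproduce the posteriors quoted above); the joint is evaluated with the factored form from the previous slide, and evidence is handled by enumeration.

```python
import itertools

# CPTs for the sprinkler network (values assumed from the standard
# water-sprinkler example; they reproduce the numbers quoted above).
p_C = {1: 0.5, 0: 0.5}                                    # P(C)
p_S_given_C = {0: {1: 0.5, 0: 0.5}, 1: {1: 0.1, 0: 0.9}}  # P(S | C)
p_R_given_C = {0: {1: 0.2, 0: 0.8}, 1: {1: 0.8, 0: 0.2}}  # P(R | C)
p_W1_given_SR = {(0, 0): 0.0, (1, 0): 0.9,                # P(W=1 | S, R)
                 (0, 1): 0.9, (1, 1): 0.99}

def joint(c, s, r, w):
    """P(C,S,R,W) via the factorization P(C) P(S|C) P(R|C) P(W|S,R)."""
    pw1 = p_W1_given_SR[(s, r)]
    return p_C[c] * p_S_given_C[c][s] * p_R_given_C[c][r] * (pw1 if w else 1 - pw1)

def prob(**evidence):
    """Probability of the given evidence, by brute-force enumeration."""
    total = 0.0
    for c, s, r, w in itertools.product([0, 1], repeat=4):
        assignment = dict(C=c, S=s, R=r, W=w)
        if all(assignment[k] == v for k, v in evidence.items()):
            total += joint(c, s, r, w)
    return total

print(prob(S=1, W=1) / prob(W=1))  # Pr(S=1 | W=1) ≈ 0.430
print(prob(R=1, W=1) / prob(W=1))  # Pr(R=1 | W=1) ≈ 0.708
```

Enumeration is exponential in the number of nodes, which is what motivates the elimination methods on the following slides.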

Page 7: Introduction to Graphical Models

Explaining Away

• S and R "compete" to explain W = 1

• Once W is observed, S and R become dependent (conditionally dependent given W)

• Pr(S=1 | R=1, W=1) = 0.1945, down from Pr(S=1 | W=1) = 0.430: observing rain makes the sprinkler a less likely explanation

Page 8: Introduction to Graphical Models

Inference


Page 9: Introduction to Graphical Models

Inference

• Variable elimination
• Choosing the optimal elimination ordering is NP-hard
• Greedy methods work well
• Computing several marginals: dynamic programming avoids redundant computation
• Sound familiar?
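
A compact way to see variable elimination in code: each CPT is a table, and computing a marginal is a contraction in which the hidden variables are summed out one at a time rather than building the full joint. The sketch below uses the same assumed CPT values as before; numpy's einsum performs the contraction, and its contraction order plays the role of the elimination ordering.

```python
import numpy as np

# Assumed sprinkler CPTs as tables; axis order follows the einsum letters below.
p_c    = np.array([0.5, 0.5])                   # P(C),     index c
p_s_c  = np.array([[0.5, 0.5], [0.9, 0.1]])     # P(S|C),   indices c, s
p_r_c  = np.array([[0.8, 0.2], [0.2, 0.8]])     # P(R|C),   indices c, r
p_w_sr = np.array([[[1.0, 0.0], [0.1, 0.9]],    # P(W|S,R), indices s, r, w
                   [[0.1, 0.9], [0.01, 0.99]]])

# Variable elimination: sum c, s, r out of the product of factors.
p_w = np.einsum('c,cs,cr,srw->w', p_c, p_s_c, p_r_c, p_w_sr)
print(p_w[1])                  # Pr(W=1) = 0.6471

# Keeping s in the output gives the joint marginal over (S, W):
p_sw = np.einsum('c,cs,cr,srw->sw', p_c, p_s_c, p_r_c, p_w_sr)
print(p_sw[1, 1] / p_w[1])     # Pr(S=1 | W=1) ≈ 0.430
```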

Page 10: Introduction to Graphical Models

Bayes Balls for Conditional Independence

Page 11: Introduction to Graphical Models

A Unifying (Re)View

Basic Model: the Linear Gaussian Model (LGM)

• Continuous-State LGM: FA, SPCA, PCA, LDS
• Discrete-State LGM: Mixture of Gaussians, VQ, HMM

Page 12: Introduction to Graphical Models

Basic Model

• The state of the system is a k-vector x (unobserved)
• The output of the system is a p-vector y (observed)
• Often k << p
• Basic model:
  x_{t+1} = A x_t + w,   w ~ N(0, Q)
  y_t     = C x_t + v,   v ~ N(0, R)
• A is the k x k transition matrix
• C is the p x k observation matrix
• The noise processes are essential
• Zero mean w.l.o.g.
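
A minimal simulation sketch of the basic model; the dimensions and the values of A, C, Q and R are illustrative assumptions, not numbers from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

k, p, T = 2, 4, 100                      # assumed state/output dimensions, length
A = np.array([[0.99, -0.10],
              [0.10,  0.99]])            # k x k transition matrix (slow rotation)
C = rng.standard_normal((p, k))          # p x k observation matrix
Q = 0.01 * np.eye(k)                     # state noise covariance, w ~ N(0, Q)
R = 0.10 * np.eye(p)                     # observation noise covariance, v ~ N(0, R)

x = np.zeros((T, k))
y = np.zeros((T, p))
x[0] = rng.multivariate_normal(np.zeros(k), np.eye(k))        # x_1 ~ N(mu_1, Q_1)
for t in range(T):
    y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(p), R)       # y_t = C x_t + v
    if t + 1 < T:
        x[t + 1] = A @ x[t] + rng.multivariate_normal(np.zeros(k), Q)  # x_{t+1} = A x_t + w
```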

Page 13: Introduction to Graphical Models

Degeneracy in Basic Model

• Structure in Q can be moved into A and C; w.l.o.g. Q = I
• R cannot be restricted, since the y_t are observed
• Components of x can be reordered arbitrarily; the ordering is fixed using the norms of the columns of C
• x_1 ~ N(µ_1, Q_1)
• A and C are assumed to have rank k
• Q, R and Q_1 are assumed to be of full rank

Page 14: Introduction to Graphical Models

Probability Computation

• P(x_{t+1} | x_t) = N(A x_t, Q; x_{t+1})

• P(y_t | x_t) = N(C x_t, R; y_t)

• P({x_1,...,x_T}, {y_1,...,y_T}) = P(x_1) Π_{t=1}^{T-1} P(x_{t+1} | x_t) Π_{t=1}^{T} P(y_t | x_t)
• Negative log probability
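
The factorization translates directly into a log-probability computation: one Gaussian term for the initial state, one per transition and one per observation. A sketch, assuming scipy is available and that the sequences are stored as (T, k) and (T, p) arrays:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_prob(x, y, A, C, Q, R, mu1, Q1):
    """log P({x_1..x_T}, {y_1..y_T}) via the factorization above.

    x: (T, k) state sequence, y: (T, p) observation sequence.
    """
    T = x.shape[0]
    lp = multivariate_normal.logpdf(x[0], mean=mu1, cov=Q1)               # P(x_1)
    for t in range(T - 1):
        lp += multivariate_normal.logpdf(x[t + 1], mean=A @ x[t], cov=Q)  # P(x_{t+1} | x_t)
    for t in range(T):
        lp += multivariate_normal.logpdf(y[t], mean=C @ x[t], cov=R)      # P(y_t | x_t)
    return lp
```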

Page 15: Introduction to Graphical Models

Inference

• Given the model parameters {A, C, Q, R, µ_1, Q_1}
• Given observations y
• What can be inferred about the hidden states x?
• Total likelihood
• Filtering: P(x_t | y_1, ..., y_t)
• Smoothing: P(x_t | y_1, ..., y_T)
• Partial smoothing: P(x_t | y_1, ..., y_{t+t'})
• Partial prediction: P(x_t | y_1, ..., y_{t-t'})
• These quantities appear as intermediate values in the recursive methods for computing the total likelihood

Page 16: Introduction to Graphical Models

Learning

• Unknown parameters {A, C, Q, R, µ_1, Q_1}
• Given observations y
• Maximize the log-likelihood
• F(Q, θ) – the free energy, a lower bound on the log-likelihood L

Page 17: Introduction to Graphical Models

EM algorithm

• Alternate between maximizing F(Q, θ) w.r.t. Q (E-step) and w.r.t. θ (M-step)
• F = L at the beginning of the M-step
• The E-step does not change θ
• Therefore, the likelihood does not decrease

Page 18: Introduction to Graphical Models

Continuous-State LGM

Static data modeling:
• No temporal dependence
• Factor analysis, SPCA, PCA

Time-series modeling:
• Time ordering of the data is crucial
• LDS (Kalman filter models)

Page 19: Introduction to Graphical Models

Static Data Modelling

• A = 0, so x = w
• y = C x + v
• x_1 ~ N(0, Q)
• y ~ N(0, C Q C' + R)
• Degeneracy in the model
• Learning: EM
  – R restricted
• Inference

Page 20: Introduction to Graphical Models

Factor Analysis

• Restrict R to be diagonal; Q = I
• x – factors
• C – factor loading matrix
• R – uniqueness
• Learning: EM, quasi-Newton optimization
• Inference

Page 21: Introduction to Graphical Models

SPCA

• R = εI
• ε – a single global noise level
• The columns of C span the principal subspace
• Learning: EM algorithm
• Inference

Page 22: Introduction to Graphical Models

PCA

• R = lim_{ε→0} εI
• Learning:
  – Diagonalize the sample covariance of the data
  – The leading k eigenvalues and eigenvectors define C
  – EM determines the leading eigenvectors without diagonalization
• Inference:
  – The noise becomes infinitesimal
  – The posterior collapses to a single point
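
A sketch of this learning recipe, diagonalizing the sample covariance and keeping the leading k eigenvectors; the data-matrix layout and the toy example are assumptions, not from the slides.

```python
import numpy as np

def pca(Y, k):
    """Leading-k principal subspace by diagonalizing the sample covariance.

    Y: (N, p) data matrix. Returns the (p, k) matrix C of leading eigenvectors
    and the (N, k) projections (the collapsed posterior means).
    """
    Y = Y - Y.mean(axis=0)                  # work with zero-mean data
    S = np.cov(Y, rowvar=False)             # p x p sample covariance
    eigvals, eigvecs = np.linalg.eigh(S)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]   # pick the k largest
    C = eigvecs[:, order]
    return C, Y @ C

# Toy example: 3-D data that mostly lives in a 2-D subspace
rng = np.random.default_rng(0)
Y = rng.standard_normal((500, 2)) @ rng.standard_normal((2, 3)) \
    + 0.05 * rng.standard_normal((500, 3))
C, X = pca(Y, k=2)
```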

Page 23: Introduction to Graphical Models

Linear Dynamical Systems

• Inference (filtering) – the Kalman filter
• Smoothing – RTS (Rauch-Tung-Striebel) recursions
• Learning – EM algorithm
  – C known: Shumway and Stoffer, 1982
  – All parameters unknown: Ghahramani and Hinton, 1995
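
For reference, a minimal Kalman filter sketch computing the filtered means and covariances of P(x_t | y_1..y_t); the predict/update structure is the standard one, while the variable names and array shapes are my own assumptions.

```python
import numpy as np

def kalman_filter(y, A, C, Q, R, mu1, Q1):
    """Filtering for the LGM: returns the means and covariances of P(x_t | y_1..y_t)."""
    T = y.shape[0]
    k = A.shape[0]
    mus, Vs = np.zeros((T, k)), np.zeros((T, k, k))
    mu_pred, V_pred = mu1, Q1                       # prior on x_1
    for t in range(T):
        # Update step: condition the predicted state on y_t
        S = C @ V_pred @ C.T + R                    # innovation covariance
        K = V_pred @ C.T @ np.linalg.inv(S)         # Kalman gain
        mus[t] = mu_pred + K @ (y[t] - C @ mu_pred)
        Vs[t] = V_pred - K @ C @ V_pred
        # Predict step: propagate through the dynamics x_{t+1} = A x_t + w
        mu_pred = A @ mus[t]
        V_pred = A @ Vs[t] @ A.T + Q
    return mus, Vs
```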

Page 24: Introduction to Graphical Models

Discrete-State LGM

• x_{t+1} = WTA[A x_t + w]
• y_t = C x_t + v
• x_1 = WTA[N(µ_1, Q_1)]
• WTA[·] is the winner-take-all nonlinearity: its output is the unit vector e_j with a 1 at the largest component of its argument

Page 25: Introduction to Graphical Models

Discrete-State LGM

Static data modeling:
• Mixture of Gaussians
• VQ

Time-series modeling:
• HMM

Page 26: Introduction to Graphical Models

Static Data Modelling

• A = 0, x = WTA[w]
• w ~ N(µ, Q)
• y = C x + v
• π_j = P(x = e_j)
• Nonzero µ gives non-uniform π_j
• Given x = e_j, y ~ N(C_j, R)
• C_j – the jth column of C

Page 27: Introduction to Graphical Models

Mixture of Gaussians

• Mixing coefficient of cluster j – π_j
• Means – the columns C_j
• Covariance – R
• Learning: EM (corresponds to maximum-likelihood competitive learning)
• Inference
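
A short EM sketch for this model, keeping the single shared covariance R of the LGM view; the initialization and iteration count are arbitrary choices, not from the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_em(Y, k, n_iter=50, seed=0):
    """EM for a mixture of Gaussians with mixing weights pi_j, means C_j, shared R."""
    rng = np.random.default_rng(seed)
    N, p = Y.shape
    pi = np.full(k, 1.0 / k)                       # mixing coefficients pi_j
    C = Y[rng.choice(N, k, replace=False)].T       # means as columns C_j, shape (p, k)
    R = np.cov(Y, rowvar=False)                    # shared covariance
    for _ in range(n_iter):
        # E-step: responsibilities = posterior over the discrete state
        logp = np.column_stack([
            np.log(pi[j]) + multivariate_normal.logpdf(Y, mean=C[:, j], cov=R)
            for j in range(k)])
        logp -= logp.max(axis=1, keepdims=True)    # for numerical stability
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate pi, the columns of C, and the shared R
        Nj = resp.sum(axis=0)
        pi = Nj / N
        C = (Y.T @ resp) / Nj
        diff = Y[:, :, None] - C[None, :, :]       # (N, p, k) residuals
        R = np.einsum('nk,npk,nqk->pq', resp, diff, diff) / N
    return pi, C, R
```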

Page 28: Introduction to Graphical Models

Vector Quantization

• The observation noise becomes infinitesimal
• The inference problem is solved by the 1-NN rule
• Euclidean distance for diagonal R
• Mahalanobis distance for unscaled R
• The posterior collapses to the closest cluster
• Learning with EM = the batch version of k-means
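
In this zero-noise limit the E-step becomes a hard 1-NN assignment and the M-step a cluster-mean update, i.e. batch k-means. A minimal sketch (initializing the centres from random data points is an assumption):

```python
import numpy as np

def kmeans(Y, k, n_iter=50, seed=0):
    """Batch k-means: the zero-noise limit of EM for the mixture model above."""
    rng = np.random.default_rng(seed)
    C = Y[rng.choice(len(Y), k, replace=False)]          # initial cluster centres
    for _ in range(n_iter):
        # "Inference": 1-NN assignment (posterior collapses to the closest cluster)
        d = np.linalg.norm(Y[:, None, :] - C[None, :, :], axis=2)
        z = d.argmin(axis=1)
        # "Learning": each centre moves to the mean of its assigned points
        C = np.array([Y[z == j].mean(axis=0) if np.any(z == j) else C[j]
                      for j in range(k)])
    return C, z
```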

Page 29: Introduction to Graphical Models

Time-series modelling

Page 30: Introduction to Graphical Models

HMM

• Transition matrix T, with T_ij = P(x_{t+1} = e_j | x_t = e_i)
• For every T, there exist corresponding A and Q
• Filtering: forward recursions
• Smoothing: the forward-backward algorithm
• Learning: EM (called Baum-Welch re-estimation)
• MAP state sequences – the Viterbi algorithm
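
A sketch of the forward recursions for filtering, assuming the observation likelihoods P(y_t | x_t = e_j) have already been evaluated into a (T, k) array B; the scaling constants accumulate the total log-likelihood.

```python
import numpy as np

def forward(pi1, T_mat, B):
    """Forward recursions: filtering distributions P(x_t = e_j | y_1..y_t).

    pi1:   (k,)   initial state distribution
    T_mat: (k, k) transition matrix, T_mat[i, j] = P(x_{t+1}=e_j | x_t=e_i)
    B:     (T, k) observation likelihoods, B[t, j] = P(y_t | x_t = e_j)
    """
    T, k = B.shape
    alpha = np.zeros((T, k))
    loglik = 0.0
    a = pi1 * B[0]                                  # unnormalized alpha_1
    for t in range(T):
        s = a.sum()
        alpha[t] = a / s                            # P(x_t | y_1..y_t)
        loglik += np.log(s)                         # accumulates log P(y_1..y_T)
        if t + 1 < T:
            a = (alpha[t] @ T_mat) * B[t + 1]       # predict one step, weight by next likelihood
    return alpha, loglik
```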