Variational Methods for Graphical Models


Page 1: Variational Methods for Graphical Models

Variational Methods for Graphical Models

Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, Lawrence K. Saul

Presented by: Afsaneh Shirazi

Page 2: Variational Methods for Graphical Models

2

Outline

• Motivation
• Inference in graphical models
• Exact inference is intractable
• Variational methodology
  – Sequential approach
  – Block approach

• Conclusions

Page 3: Variational Methods for Graphical Models

3

Motivation (Example: Medical Diagnosis)

[Figure: two-layer (bipartite) network with disease nodes as parents and symptom nodes as children]

What is the most probable disease?

Page 4: Variational Methods for Graphical Models

4

Motivation

• We want to answer some queries about our data
• A graphical model is a way to model data
• Inference in some graphical models is intractable (NP-hard)
• Variational methods simplify inference in graphical models by using approximation

Page 5: Variational Methods for Graphical Models

5

Graphical Models

• Directed (Bayesian network)

• Undirected

[Figure: directed example over S1, ..., S5 with local conditionals P(S1), P(S2), P(S3|S1,S2), P(S4|S3), P(S5|S3,S4), so the joint factorizes as
P(S1,...,S5) = P(S1) P(S2) P(S3|S1,S2) P(S4|S3) P(S5|S3,S4);
undirected example over the same nodes, with potentials defined on the cliques C1, C2, C3]
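As a concrete illustration of the directed factorization above, here is a minimal sketch in Python; the CPT values are made up and the variables are assumed binary:

```python
# Minimal sketch (not from the slides): the directed model's joint probability
# is the product of its local conditionals. CPT values below are made up for
# illustration; all variables are binary (0/1).
from itertools import product

P_S1 = {0: 0.7, 1: 0.3}
P_S2 = {0: 0.6, 1: 0.4}
P_S3 = {(s1, s2): {0: 0.9 - 0.3 * s1 - 0.2 * s2, 1: 0.1 + 0.3 * s1 + 0.2 * s2}
        for s1, s2 in product((0, 1), repeat=2)}
P_S4 = {s3: {0: 0.8 - 0.5 * s3, 1: 0.2 + 0.5 * s3} for s3 in (0, 1)}
P_S5 = {(s3, s4): {0: 0.95 - 0.4 * s3 - 0.3 * s4, 1: 0.05 + 0.4 * s3 + 0.3 * s4}
        for s3, s4 in product((0, 1), repeat=2)}

def joint(s1, s2, s3, s4, s5):
    """P(S1..S5) = P(S1) P(S2) P(S3|S1,S2) P(S4|S3) P(S5|S3,S4)."""
    return (P_S1[s1] * P_S2[s2] * P_S3[(s1, s2)][s3]
            * P_S4[s3][s4] * P_S5[(s3, s4)][s5])

# Sanity check: the joint sums to 1 over all 2^5 configurations.
assert abs(sum(joint(*s) for s in product((0, 1), repeat=5)) - 1.0) < 1e-9
```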

Page 6: Variational Methods for Graphical Models

6

Inference in Graphical Models

Inference: Given a graphical model, the process of computing answers to queries

• How computationally hard is this decision problem?

• Theorem: Computing P(X = x) in a Bayesian network is NP-hard

Page 7: Variational Methods for Graphical Models

7

Why Is Exact Inference Intractable?

[Figure: bipartite network of diseases (parents) and symptoms (children)]

Diagnose the most probable disease

Page 8: Variational Methods for Graphical Models

8

Why Is Exact Inference Intractable?

[Figure: disease–symptom network; f denotes the vector of observed symptoms (findings), d the diseases]

$P(f, d) = P(f \mid d)\, P(d)$

Page 9: Variational Methods for Graphical Models

9

Why Is Exact Inference Intractable?

[Figure: disease–symptom network; the conditional $P(f_i \mid d)$ of each symptom given its parent diseases is a noisy-OR model]

Page 10: Variational Methods for Graphical Models

10

Why Is Exact Inference Intractable?

[Figure: disease–symptom network with the parent diseases of symptom i set to d = (1, 0, 1); noisy-OR model for $P(f_i \mid d)$]

$P(f_i = 0 \mid (1, 0, 1))$

Page 11: Variational Methods for Graphical Models

11

Why Is Exact Inference Intractable?

Noisy-OR model:

$P(f_i = 0 \mid d) = (1 - q_{i0}) \prod_{j \in \mathrm{pa}(i)} (1 - q_{ij})^{d_j} = e^{-\sum_{j \in \mathrm{pa}(i)} \theta_{ij} d_j - \theta_{i0}}$, where $\theta_{ij} = -\ln(1 - q_{ij})$

$P(f_i = 1 \mid d) = 1 - e^{-\sum_{j \in \mathrm{pa}(i)} \theta_{ij} d_j - \theta_{i0}}$
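The two forms of the noisy-OR conditional can be checked against each other numerically. A minimal sketch with illustrative q values (not from the slides):

```python
# Minimal sketch: the noisy-OR conditional for a symptom f_i with binary
# parent diseases d, written both with the "failure" probabilities (1 - q_ij)
# and in the exponential form with theta_ij = -ln(1 - q_ij).
import math

q_i0 = 0.05                    # leak probability
q_ij = [0.8, 0.3, 0.6]         # P(disease j alone turns the symptom on)
theta_i0 = -math.log(1 - q_i0)
theta_ij = [-math.log(1 - q) for q in q_ij]

def p_f0_given_d(d):
    """P(f_i = 0 | d) = (1 - q_i0) * prod_j (1 - q_ij)^{d_j}."""
    p = 1 - q_i0
    for q, dj in zip(q_ij, d):
        p *= (1 - q) ** dj
    return p

def p_f0_exponential(d):
    """Same quantity via exp(-sum_j theta_ij d_j - theta_i0)."""
    return math.exp(-sum(t * dj for t, dj in zip(theta_ij, d)) - theta_i0)

d = (1, 0, 1)
assert abs(p_f0_given_d(d) - p_f0_exponential(d)) < 1e-12
print("P(f_i = 1 | d) =", 1 - p_f0_given_d(d))
```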

Page 12: Variational Methods for Graphical Models

12

Why Is Exact Inference Intractable?

[Figure: disease–symptom network; f denotes the observed symptoms]

$P(f, d) = P(f \mid d)\, P(d) = \prod_i P(f_i \mid d) \prod_j P(d_j)$

For symptoms observed to be absent ($f_i = 0$) the likelihood factors are plain exponentials,

$e^{-\theta_{i0} - \sum_j \theta_{ij} d_j} \cdot e^{-\theta_{k0} - \sum_j \theta_{kj} d_j} \cdots$

which factorize over the diseases $d_j$.

Page 13: Variational Methods for Graphical Models

13

Why Is Exact Inference Intractable?

[Figure: disease–symptom network; f denotes the observed symptoms]

$P(f, d) = P(f \mid d)\, P(d) = \prod_i P(f_i \mid d) \prod_j P(d_j)$

For symptoms observed to be present ($f_i = 1$) the likelihood factors are

$\left(1 - e^{-\theta_{i0} - \sum_j \theta_{ij} d_j}\right) \cdots \left(1 - e^{-\theta_{k0} - \sum_j \theta_{kj} d_j}\right)$

and multiplying them out produces cross terms that couple the diseases, so the cost of exact inference grows exponentially with the number of positive symptoms.
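The blow-up can be seen directly by computing a disease posterior by brute force. A minimal sketch with made-up parameters (two symptoms, three diseases), not taken from the slides:

```python
# Minimal sketch (illustrative parameters): computing a disease posterior
# P(d_0 = 1 | f) exactly requires summing over all 2^N disease
# configurations, which is what the slides mean by "intractable".
import math
from itertools import product

theta0 = [0.1, 0.2]                         # leak terms, one per symptom
theta = [[1.5, 0.0, 0.7], [0.0, 1.2, 0.9]]  # theta[i][j] links symptom i to disease j
prior = [0.1, 0.2, 0.05]                    # P(d_j = 1)
f_obs = [1, 1]                              # both symptoms observed present

def joint(d):
    p = 1.0
    for j, dj in enumerate(d):              # disease priors
        p *= prior[j] if dj else 1 - prior[j]
    for i, fi in enumerate(f_obs):          # noisy-OR likelihoods
        p0 = math.exp(-theta0[i] - sum(theta[i][j] * d[j] for j in range(len(d))))
        p *= (1 - p0) if fi else p0
    return p

# Posterior for disease 0 by exhaustive enumeration (exponential in #diseases).
num = sum(joint(d) for d in product((0, 1), repeat=3) if d[0] == 1)
den = sum(joint(d) for d in product((0, 1), repeat=3))
print("P(d_0 = 1 | f) =", num / den)
```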

Page 14: Variational Methods for Graphical Models

14

Reducing the Computational Complexity

Variational methods:

• Approximate the probability distribution
• Produce a simple graph suitable for exact methods
• Use the role of convexity

Page 15: Variational Methods for Graphical Models

15

Express a Function Variationally

• $\ln(x)$ is a concave function

$\ln(x) = \min_{\lambda} \{\lambda x - H(\lambda)\}$

$H(\lambda) = \min_x \{\lambda x - \ln(x)\}$

Page 16: Variational Methods for Graphical Models

16

Express a Function Variationally

• $\ln(x)$ is a concave function

$\ln(x) = \min_{\lambda} \{\lambda x - \ln(\lambda) - 1\}$
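A quick numerical check of this variational representation (a sketch; the grid over λ is arbitrary):

```python
# Numerical check of the variational representation of ln(x):
# ln(x) <= lambda * x - ln(lambda) - 1 for all lambda > 0, with equality at
# lambda = 1/x, so minimizing the right-hand side over lambda recovers ln(x).
import math

x = 2.5
lambdas = [0.01 * k for k in range(1, 500)]        # crude grid over lambda > 0
bounds = [lam * x - math.log(lam) - 1 for lam in lambdas]
print("ln(x)            =", math.log(x))
print("min over grid    =", min(bounds))
print("bound at lam=1/x =", (1 / x) * x - math.log(1 / x) - 1)  # equals ln(x)
```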

Page 17: Variational Methods for Graphical Models

17

Express a Function Variationally

• If the function is not convex or concave: transform the function to a desired form

• Example: logistic function

$f(x) = \frac{1}{1 + e^{-x}}$

Transformation: $g(x) = \ln(f(x))$ is concave

Approximation: $g(x) = \min_{\lambda} \{\lambda x - H(\lambda)\}$

Transforming back: $f(x) = \min_{\lambda} e^{\lambda x - H(\lambda)}$
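A numerical check of the resulting logistic bound. Here H(λ) is taken to be the binary entropy, the standard conjugate of the log-logistic function; that identification is stated as an assumption rather than read off the slide:

```python
# Check of the logistic-function bound. For f(x) = 1/(1 + exp(-x)), ln f is
# concave and its conjugate is the binary entropy
# H(lam) = -lam*ln(lam) - (1-lam)*ln(1-lam)   (assumed, standard result),
# giving f(x) <= exp(lam*x - H(lam)) for lam in (0, 1),
# with equality at lam = 1 - f(x).
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def H(lam):
    return -lam * math.log(lam) - (1 - lam) * math.log(1 - lam)

x = 1.3
lams = [k / 1000 for k in range(1, 1000)]
upper = min(math.exp(lam * x - H(lam)) for lam in lams)
print("f(x)        =", logistic(x))
print("min_lam UB  =", upper)          # matches f(x) to grid precision
```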

Page 18: Variational Methods for Graphical Models

18

Approaches to Variational Methods

• Sequential approach (on-line): nodes are transformed one at a time, in an order determined during the inference process

• Block approach (off-line): applied when the model has obvious substructures

Page 19: Variational Methods for Graphical Models

19

Sequential Approach (Two Methods)

• Start from the untransformed graph and transform one node at a time, until the graph is simple enough for exact methods

• Start from the completely transformed graph and reintroduce one node at a time, keeping the graph simple enough for exact methods

Page 20: Variational Methods for Graphical Models

20

Sequential Approach (Example)

$P(f_i = 1 \mid d) = 1 - e^{-\sum_{j \in \mathrm{pa}(i)} \theta_{ij} d_j - \theta_{i0}}$

[Figure: disease–symptom network]

This conditional is log concave.

Page 21: Variational Methods for Graphical Models

21

Sequential Approach (Example)

$P(f_i = 1 \mid d) = 1 - e^{-\sum_{j \in \mathrm{pa}(i)} \theta_{ij} d_j - \theta_{i0}}$

[Figure: disease–symptom network]

This conditional is log concave, so it admits a variational upper bound of the form

$1 - e^{-x} \le e^{\lambda x - f^*(\lambda)}$

$P(f_i = 1 \mid d) \le e^{\lambda_i \left(\sum_{j \in \mathrm{pa}(i)} \theta_{ij} d_j + \theta_{i0}\right) - f^*(\lambda_i)} = \prod_{j \in \mathrm{pa}(i)} \left[ e^{\lambda_i \theta_{ij}} \right]^{d_j} e^{\lambda_i \theta_{i0} - f^*(\lambda_i)}$

which factorizes over the diseases $d_j$.
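The upper bound can be checked numerically. The sketch below assumes the conjugate of f(x) = ln(1 − e^{-x}) is f*(λ) = −λ ln λ + (λ + 1) ln(λ + 1), a standard conjugate-duality computation that is not spelled out on the slides:

```python
# Numerical check of the upper bound used above. With the assumed conjugate
# f*(lam) = -lam*ln(lam) + (lam+1)*ln(lam+1) for f(x) = ln(1 - exp(-x)),
#   1 - exp(-x) <= exp(lam*x - f*(lam))   for all lam > 0.
import math

def conjugate(lam):
    return -lam * math.log(lam) + (lam + 1) * math.log(lam + 1)

x = 0.9                                   # plays the role of theta_i0 + sum_j theta_ij d_j
exact = 1 - math.exp(-x)
lams = [k / 1000 for k in range(1, 5000)]
best = min(math.exp(lam * x - conjugate(lam)) for lam in lams)
print("P(f_i=1|d) exact      =", exact)
print("tightest upper bound  =", best)    # matches to grid precision
```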

Page 22: Variational Methods for Graphical Models

22

Sequential Approach (Example)

[Figure: disease–symptom network in which the transformed symptom node is delinked from its parent diseases]

$P(f_i = 1 \mid d) \le \prod_{j \in \mathrm{pa}(i)} \left[ e^{\lambda_i \theta_{ij}} \right]^{d_j} e^{\lambda_i \theta_{i0} - f^*(\lambda_i)}$

Each factor $e^{\lambda_i \theta_{ij} d_j}$ is absorbed into the prior of the corresponding disease, e.g. $P(d_3 = 1)\, e^{\lambda_4 \theta_{43}}$ versus $P(d_3 = 0)$ when symptom node 4 is transformed.

Page 23: Variational Methods for Graphical Models

23

Sequential Approach (Example)

[Figure: disease–symptom network with an additional symptom node transformed and delinked]

$P(f_i = 1 \mid d) \le \prod_{j \in \mathrm{pa}(i)} \left[ e^{\lambda_i \theta_{ij}} \right]^{d_j} e^{\lambda_i \theta_{i0} - f^*(\lambda_i)}$

Page 24: Variational Methods for Graphical Models

24

Sequential Approach (Example)

[Figure: disease–symptom network with further symptom nodes transformed and delinked]

$P(f_i = 1 \mid d) \le \prod_{j \in \mathrm{pa}(i)} \left[ e^{\lambda_i \theta_{ij}} \right]^{d_j} e^{\lambda_i \theta_{i0} - f^*(\lambda_i)}$

Page 25: Variational Methods for Graphical Models

25

Sequential Approach (Upper Bound and Lower Bound)

• We need both a lower bound and an upper bound

$P(d_j \mid f) = \frac{P(f, d_j)}{\sum_{d_j} P(f, d_j)}$

$P(d_j \mid f) \le \frac{\mathrm{UB}\big(P(f, d_j)\big)}{\sum_{d_j} \mathrm{LB}\big(P(f, d_j)\big)}$

so an upper bound on the posterior needs an upper bound on the numerator and a lower bound on the denominator (and vice versa for a lower bound on the posterior).

Page 26: Variational Methods for Graphical Models

26

How to Compute Lower Bound for a Concave Function?

• Lower bound for concave functions:

$f\!\left(\sum_j a_j z_j\right) = f\!\left(\sum_j q_j \frac{a_j z_j}{q_j}\right) \ge \sum_j q_j\, f\!\left(\frac{a_j z_j}{q_j}\right)$

The variational parameter $q_j$ is a probability distribution.
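A quick numerical check of this bound with f = ln and made-up values (any q with positive entries summing to one works):

```python
# Numerical check of the concavity (Jensen) lower bound above, with f = ln
# and made-up positive values a_j, z_j and a variational distribution q.
import math

a = [0.5, 1.2, 2.0]
z = [1.0, 0.3, 0.7]
q = [0.2, 0.3, 0.5]                      # any distribution with q_j > 0

lhs = math.log(sum(ai * zi for ai, zi in zip(a, z)))
rhs = sum(qi * math.log(ai * zi / qi) for ai, zi, qi in zip(a, z, q))
print("f(sum a_j z_j)           =", lhs)
print("sum q_j f(a_j z_j / q_j) =", rhs)   # <= lhs; equal when q_j is proportional to a_j z_j
assert rhs <= lhs + 1e-12
```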

Page 27: Variational Methods for Graphical Models

27

Block Approach (Overview)

• Off-line application of the sequential approach:
  – Identify some structure amenable to exact inference
  – Define a family of probability distributions via the introduction of parameters
  – Choose the best approximation based on the evidence

Page 28: Variational Methods for Graphical Models

28

Block Approach (Details)

• KL divergence:

$D(Q \| P) = \sum_{S} Q(S) \ln \frac{Q(S)}{P(S)}$

Given the true posterior $P(H \mid E)$ and a family of approximations $Q(H \mid E, \mu)$, choose $Q(H \mid E, \mu^*)$ by minimizing the KL divergence.
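A minimal sketch of the KL computation for discrete distributions (the distributions below are made up):

```python
# Minimal sketch: KL divergence between two discrete distributions given as
# dictionaries mapping states to probabilities (values are illustrative).
import math

def kl_divergence(Q, P):
    """D(Q || P) = sum_S Q(S) * ln(Q(S) / P(S)); assumes P(S) > 0 wherever Q(S) > 0."""
    return sum(q * math.log(q / P[s]) for s, q in Q.items() if q > 0)

Q = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
P = {"s1": 0.4, "s2": 0.4, "s3": 0.2}
print("D(Q||P) =", kl_divergence(Q, P))   # >= 0, and 0 only when Q == P
```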

Page 29: Variational Methods for Graphical Models

29

Block Approach (Example – Boltzmann machine)

$P(S \mid \theta) = \frac{1}{Z} \exp\!\left\{ \sum_{i<j} \theta_{ij} S_i S_j + \sum_i \theta_{i0} S_i \right\}$

[Figure: Boltzmann machine; nodes $S_i$ and $S_j$ connected by weight $\theta_{ij}$]

Page 30: Variational Methods for Graphical Models

30

Block Approach (Example – Boltzmann machine)

$P(S \mid \theta) = \frac{1}{Z} \exp\!\left\{ \sum_{i<j} \theta_{ij} S_i S_j + \sum_i \theta_{i0} S_i \right\}$

[Figure: node $S_j = 1$ is observed evidence; its link $\theta_{ij}$ to $S_i$ is absorbed]

Conditioning on the evidence nodes $E$ folds their contribution into the biases:

$\theta_{i0}^{c} = \theta_{i0} + \sum_{j \in E} \theta_{ij} S_j$

Page 31: Variational Methods for Graphical Models

31

Block Approach (Example – Boltzmann machine)

[Figure: Boltzmann machine over the hidden nodes $s_i$, $s_j$ after absorbing the evidence]

$P(H \mid E, \theta) = \frac{1}{Z_c} \exp\!\left\{ \sum_{i<j} \theta_{ij} S_i S_j + \sum_i \theta_{i0}^{c} S_i \right\}$  (sums over hidden nodes)

Fully factorized approximating family:

$Q(H \mid E, \mu) = \prod_{i \in H} \mu_i^{S_i} (1 - \mu_i)^{1 - S_i}$

Page 32: Variational Methods for Graphical Models

32

Block Approach (Example – Boltzmann machine)

[Figure: hidden nodes $s_i$, $s_j$ with variational parameters $\mu_i$, $\mu_j$]

Minimizing the KL divergence gives

$\mu_i = \sigma\!\left( \sum_j \theta_{ij} \mu_j + \theta_{i0} \right)$, where $\sigma(x) = \frac{1}{1 + e^{-x}}$

Page 33: Variational Methods for Graphical Models

33

Block Approach (Example – Boltzmann machine)

[Figure: hidden nodes $s_i$, $s_j$ with variational parameters $\mu_i$, $\mu_j$]

Minimizing the KL divergence gives

$\mu_i = \sigma\!\left( \sum_j \theta_{ij} \mu_j + \theta_{i0} \right)$

Mean field equations: solve for a fixed point.
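A minimal sketch of this fixed-point iteration for a three-node Boltzmann machine with made-up weights and biases (synchronous updates, plain Python):

```python
# Mean-field fixed-point iteration for a small Boltzmann machine over hidden
# binary nodes. Weights and biases are made up; the update is the equation
# above: mu_i <- sigma(sum_j theta_ij * mu_j + theta_i0).
import math

theta = [[0.0, 0.8, -0.5],
         [0.8, 0.0, 1.2],
         [-0.5, 1.2, 0.0]]          # symmetric weights, zero diagonal
theta0 = [0.3, -0.2, 0.1]           # biases (evidence already absorbed)

def sigma(x):
    return 1.0 / (1.0 + math.exp(-x))

mu = [0.5, 0.5, 0.5]                # initialize the variational parameters
for _ in range(100):
    new_mu = [sigma(sum(theta[i][j] * mu[j] for j in range(3)) + theta0[i])
              for i in range(3)]
    if max(abs(a - b) for a, b in zip(new_mu, mu)) < 1e-10:
        break
    mu = new_mu

print("mean-field marginals Q(S_i = 1) ~", [round(m, 4) for m in mu])
```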

Page 34: Variational Methods for Graphical Models

34

Conclusions

• Time or space complexity of exact calculation is unacceptable

• Complex graphs can be probabilistically simple

• Inference in simplified models provides bounds on probabilities in the original model


Page 36: Variational Methods for Graphical Models

36

Extra Slides

Page 37: Variational Methods for Graphical Models

37

Concerns

• Approximation accuracy
• Strong dependencies can be identified
• Not based on convexity transformation
• Not able to assure that the framework will transfer to other examples
• Not straightforward to develop a variational approximation for new architectures

Page 38: Variational Methods for Graphical Models

38

Justification for KL Divergence

• Best lower bound on the probability of the evidence

$\ln P(E) = \ln \sum_{\{H\}} P(H, E) = \ln \sum_{\{H\}} Q(H \mid E)\, \frac{P(H, E)}{Q(H \mid E)} \ge \sum_{\{H\}} Q(H \mid E) \ln \frac{P(H, E)}{Q(H \mid E)}$

Maximizing this lower bound on $\ln P(E)$ over $Q$ is equivalent to minimizing the KL divergence between $Q$ and the true posterior.

Page 39: Variational Methods for Graphical Models

39

EM

• Maximum likelihood parameter estimation: maximize $P(E \mid \theta)$

• The following function is a lower bound on the log likelihood:

$L(Q, \theta) = \sum_{\{H\}} Q(H \mid E) \ln P(H, E \mid \theta) - \sum_{\{H\}} Q(H \mid E) \ln Q(H \mid E)$

$\ln P(E \mid \theta) - L(Q, \theta) =$ KL divergence between $Q(H \mid E)$ and $P(H \mid E, \theta)$

Page 40: Variational Methods for Graphical Models

40

EM

1. Maximize the bound with respect to Q (E step): $Q^{(k+1)} = \arg\max_Q L(Q, \theta^{(k)})$

2. Fix Q, maximize with respect to $\theta$ (M step): $\theta^{(k+1)} = \arg\max_\theta L(Q^{(k+1)}, \theta)$

If the E step sets $Q^{(k+1)}(H \mid E) = P(H \mid E, \theta^{(k)})$ exactly, this is traditional EM; restricting Q to a variational family gives an approximation to the EM algorithm.

Page 41: Variational Methods for Graphical Models

41

Principle of Inference

DAG → Junction Tree → (Initialization) → Inconsistent Junction Tree → (Propagation) → Consistent Junction Tree → (Marginalization) → P(V = v | E = e)

Page 42: Variational Methods for Graphical Models

42

Example: Create Join Tree

HMM with 2 time steps:

[Figure: DAG with X1 → X2, X1 → Y1, X2 → Y2]

Junction tree:

(X1,Y1) —[X1]— (X1,X2) —[X2]— (X2,Y2)

Page 43: Variational Methods for Graphical Models

43

Example: Initialization

Variable   Associated cluster   Potential function
X1         (X1,Y1)              ψ(X1,Y1) = P(X1)
Y1         (X1,Y1)              ψ(X1,Y1) = P(X1) P(Y1 | X1)
X2         (X1,X2)              ψ(X1,X2) = P(X2 | X1)
Y2         (X2,Y2)              ψ(X2,Y2) = P(Y2 | X2)

Sepset potentials φ(X1), φ(X2) are initialized to 1.

Junction tree: (X1,Y1) —[X1]— (X1,X2) —[X2]— (X2,Y2)

Page 44: Variational Methods for Graphical Models

44

Example: Collect Evidence

• Choose arbitrary clique, e.g. X1,X2, where all potential functions will be collected.

• Call recursively neighboring cliques for messages:

• 1. Call (X1,Y1):
  – 1. Projection: φ(X1) = Σ_{Y1} ψ(X1,Y1) = Σ_{Y1} P(X1,Y1) = P(X1)
  – 2. Absorption: ψ(X1,X2) ← ψ(X1,X2) · φ(X1)/φ_old(X1) = P(X2 | X1) P(X1) = P(X1,X2)

Page 45: Variational Methods for Graphical Models

45

Example: Collect Evidence (cont.)

• 2. Call (X2,Y2):
  – 1. Projection: φ(X2) = Σ_{Y2} ψ(X2,Y2) = Σ_{Y2} P(Y2 | X2) = 1
  – 2. Absorption: ψ(X1,X2) ← ψ(X1,X2) · φ(X2)/φ_old(X2) = P(X1,X2) · 1 = P(X1,X2)

Junction tree: (X1,Y1) —[X1]— (X1,X2) —[X2]— (X2,Y2)

Page 46: Variational Methods for Graphical Models

46

Example: Distribute Evidence

• Pass messages recursively to neighboring nodes

• Pass message from (X1,X2) to (X1,Y1):
  – 1. Projection: φ(X1) = Σ_{X2} ψ(X1,X2) = Σ_{X2} P(X1,X2) = P(X1)
  – 2. Absorption: ψ(X1,Y1) ← ψ(X1,Y1) · φ(X1)/φ_old(X1) = P(X1,Y1) · P(X1)/P(X1) = P(X1,Y1)

Page 47: Variational Methods for Graphical Models

47

Example: Distribute Evidence (cont.)

• Pass message from (X1,X2) to (X2,Y2):
  – 1. Projection: φ(X2) = Σ_{X1} ψ(X1,X2) = Σ_{X1} P(X1,X2) = P(X2)
  – 2. Absorption: ψ(X2,Y2) ← ψ(X2,Y2) · φ(X2)/φ_old(X2) = P(Y2 | X2) · P(X2)/1 = P(Y2,X2)

Junction tree: (X1,Y1) —[X1]— (X1,X2) —[X2]— (X2,Y2)

Page 48: Variational Methods for Graphical Models

48

Example: Inference with evidence

• Assume we want to compute: P(X2|Y1=0,Y2=1) (state estimation)

• Assign likelihoods to the potential functions during initialization:

ψ(X1,Y1) = 0 if Y1 = 1;  P(X1, Y1 = 0) if Y1 = 0
ψ(X2,Y2) = 0 if Y2 = 0;  P(Y2 = 1 | X2) if Y2 = 1

Page 49: Variational Methods for Graphical Models

49

Example: Inference with evidence (cont.)

• Repeating the same steps as in the previous case, we obtain:

ψ(X1,Y1) = 0 if Y1 = 1;  P(X1, Y1 = 0, Y2 = 1) if Y1 = 0
φ(X1) = P(X1, Y1 = 0, Y2 = 1)
ψ(X1,X2) = P(X1, Y1 = 0, X2, Y2 = 1)
φ(X2) = P(Y1 = 0, X2, Y2 = 1)
ψ(X2,Y2) = 0 if Y2 = 0;  P(Y1 = 0, X2, Y2 = 1) if Y2 = 1

Normalizing φ(X2) over X2 then gives P(X2 | Y1 = 0, Y2 = 1).
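The whole example can be reproduced numerically. A minimal sketch with made-up CPTs for the two-step HMM, following the collect/distribute steps above and checking the answer by brute force:

```python
# Sketch of the computation above for a 2-step HMM with binary states; the
# CPT values are made up. The final lines normalize the (X2, Y2=1) potential
# to get P(X2 | Y1=0, Y2=1) and confirm it by brute-force enumeration.
import numpy as np

pX1 = np.array([0.6, 0.4])                      # P(X1)
pX2_X1 = np.array([[0.7, 0.3], [0.2, 0.8]])     # P(X2 | X1), rows indexed by X1
pY_X = np.array([[0.9, 0.1], [0.3, 0.7]])       # P(Y | X), rows indexed by X

# Initialization with evidence Y1=0, Y2=1 entered as likelihoods.
psi_X1Y1 = pX1 * pY_X[:, 0]                     # over X1: P(X1, Y1=0)
psi_X1X2 = pX2_X1.copy()                        # over (X1, X2): P(X2 | X1)
psi_X2Y2 = pY_X[:, 1].copy()                    # over X2: P(Y2=1 | X2)

# Collect to (X1,X2): project onto the sepsets, then absorb.
phi_X1 = psi_X1Y1                               # Y1 already collapsed by the evidence
phi_X2 = psi_X2Y2
psi_X1X2 = psi_X1X2 * phi_X1[:, None] * phi_X2[None, :]   # P(X1, Y1=0, X2, Y2=1)

# Distribute to (X2,Y2) and read off the query.
phi_X2_new = psi_X1X2.sum(axis=0)               # P(Y1=0, X2, Y2=1)
posterior = phi_X2_new / phi_X2_new.sum()       # P(X2 | Y1=0, Y2=1)

# Brute-force check over X1 with the evidence fixed.
brute = np.array([sum(pX1[x1] * pY_X[x1, 0] * pX2_X1[x1, x2] * pY_X[x2, 1]
                      for x1 in (0, 1)) for x2 in (0, 1)])
assert np.allclose(posterior, brute / brute.sum())
print("P(X2 | Y1=0, Y2=1) =", posterior)
```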

Page 50: Variational Methods for Graphical Models

50

Variable Elimination

General idea:
• Write the query in the form

$P(X_n, e) = \sum_{x \setminus \{x_n\}} \prod_i P(x_i \mid \mathrm{pa}_i)$

• Iteratively:
  – Move all irrelevant terms outside of the innermost sum
  – Perform the innermost sum, getting a new term
  – Insert the new term into the product

Page 51: Variational Methods for Graphical Models

51

Complexity of Variable Elimination

• Suppose in one elimination step we compute

$f'_x(y_1, \ldots, y_k) = \sum_x f_x(x, y_1, \ldots, y_k)$, where $f_x(x, y_1, \ldots, y_k) = \prod_{i=1}^{m} f_i(x, y_{i,1}, \ldots, y_{i,l_i})$

This requires
• $m \cdot |\mathrm{Val}(X)| \cdot \prod_i |\mathrm{Val}(Y_i)|$ multiplications
• $|\mathrm{Val}(X)| \cdot \prod_i |\mathrm{Val}(Y_i)|$ additions

Complexity is exponential in the number of variables in the intermediate factor.
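A minimal sketch of variable elimination on a three-variable chain with made-up CPTs (not the example from the slides), eliminating one variable at a time:

```python
# Variable elimination on a small chain A -> B -> C with binary variables and
# made-up CPTs, computing P(C) by summing out A and then B.
import numpy as np

pA = np.array([0.3, 0.7])                      # P(A)
pB_A = np.array([[0.9, 0.1], [0.4, 0.6]])      # P(B | A), rows indexed by A
pC_B = np.array([[0.2, 0.8], [0.5, 0.5]])      # P(C | B), rows indexed by B

# Eliminate A: new factor over B is sum_A P(A) P(B | A).
tau_B = (pA[:, None] * pB_A).sum(axis=0)       # shape (2,), equals P(B)

# Eliminate B: new factor over C is sum_B tau(B) P(C | B).
tau_C = (tau_B[:, None] * pC_B).sum(axis=0)    # shape (2,), equals P(C)

# Check against brute-force enumeration of the full joint.
joint = pA[:, None, None] * pB_A[:, :, None] * pC_B[None, :, :]
assert np.allclose(tau_C, joint.sum(axis=(0, 1)))
print("P(C) =", tau_C)
```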

Page 52: Variational Methods for Graphical Models

52

Chordal Graphs

• An elimination ordering induces an undirected chordal graph

Graph:
• Maximal cliques are factors in elimination
• Factors in elimination are cliques in the graph
• Complexity is exponential in the size of the largest clique in the graph

[Figure: example DAG over the nodes V, S, T, L, A, B, X, D and the corresponding chordal graph]

Page 53: Variational Methods for Graphical Models

53

Induced Width

• The size of the largest clique in the induced graph is thus an indicator for the complexity of variable elimination

• This quantity is called the induced width of a graph according to the specified ordering

• Finding a good ordering for a graph is equivalent to finding the minimal induced width of the graph

Page 54: Variational Methods for Graphical Models

54

Properties of Junction Trees

• In every consistent junction tree:
  – For each cluster (or sepset) X, the potential satisfies φ(X) = P(X)
  – The probability distribution of any variable X can be computed using any cluster (or sepset) V that contains X:  P(X) = Σ_{V \ {X}} φ(V)

Page 55: Variational Methods for Graphical Models

55

Exact Inference Using Junction Trees

• Undirected tree
• Each node is a cluster
• Running intersection property:
  – Given two clusters X and Y, all clusters on the path between X and Y contain X ∩ Y
• Separator sets (sepsets):
  – Intersection of adjacent clusters

[Figure: clusters ABD, ADE, DEF joined through sepsets AD and DE; e.g. cluster ABD, sepset DE]

Page 56: Variational Methods for Graphical Models

56

Constructing Junction Trees

Marrying parents:

[Figure: DAG over X1, ..., X6; the parents of each node are connected ("married")]

Page 57: Variational Methods for Graphical Models

57

Moral Graph

[Figure: the resulting moral graph over X1, ..., X6 (parents connected, directions dropped)]

Page 58: Variational Methods for Graphical Models

58

Triangulation

[Figure: the moral graph over X1, ..., X6 with chords added so that every cycle of length four or more has a chord]

Page 59: Variational Methods for Graphical Models

59

Identify Cliques

[Figure: triangulated graph over X1, ..., X6 with its maximal cliques highlighted]

Cliques: {X1,X2,X3}, {X2,X3,X5}, {X2,X5,X6}, {X2,X4}

Page 60: Variational Methods for Graphical Models

60

Junction Tree

• A junction tree is a subgraph of the clique graph satisfying the running intersection property

[Figure: the clique graph over the cliques found above, and the junction tree selected from it]

Junction tree: {X1,X2,X3} —[X2,X3]— {X2,X3,X5} —[X2,X5]— {X2,X5,X6}, with {X2,X4} attached to {X2,X3,X5} through sepset [X2]

Page 61: Variational Methods for Graphical Models

61

Constructing Junction Trees

DAG → Moral Graph → Triangulated Graph → Identify Cliques → Junction Tree
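A minimal sketch of the first step of this pipeline (moralization), using the directed S1–S5 example from the earlier slide; representing the DAG as a child-to-parents dictionary is an implementation choice, not something from the slides:

```python
# Moralization: connect the parents of every node ("marry" them), then drop
# edge directions. The DAG is the S1..S5 example: S3 has parents S1, S2;
# S4 has parent S3; S5 has parents S3, S4.
from itertools import combinations

parents = {            # child -> list of parents
    "S3": ["S1", "S2"],
    "S4": ["S3"],
    "S5": ["S3", "S4"],
}

def moralize(parents):
    edges = set()
    for child, pas in parents.items():
        for p in pas:                            # keep child-parent edges (undirected)
            edges.add(frozenset((child, p)))
        for p, q in combinations(pas, 2):        # connect co-parents
            edges.add(frozenset((p, q)))
    return edges

for e in sorted(moralize(parents), key=sorted):
    print(tuple(sorted(e)))
```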

Page 62: Variational Methods for Graphical Models

62

Sequential Approach (Example)

• Lower bound for the medical diagnosis example, using the concavity bound $f\!\left(\sum_j a_j z_j\right) \ge \sum_j q_j f(a_j z_j / q_j)$ with $f(x) = \ln(1 - e^{-x})$:

$P(f_i = 1 \mid d) = 1 - e^{-\theta_{i0} - \sum_j \theta_{ij} d_j} \ge \prod_j e^{\, q_{j|i} \left[ d_j\, f\!\left(\theta_{i0} + \theta_{ij} / q_{j|i}\right) + (1 - d_j)\, f(\theta_{i0}) \right]}$

The bound factorizes over the diseases $d_j$; the variational parameters $q_{j|i}$ form a probability distribution over the parents of finding $i$.