Transcript of: Inference in Graphical Models (lecture slides, 55 pages)

Page 1:

Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012

Le Song

Lecture 19, Nov 1, 2012

Reading: Chap 8, C. Bishop Book

Inference in Graphical Models

Page 2:

Conditional Independence Assumptions

Global Markov Assumption

$A \perp B \mid C$ if $\operatorname{sep}_G(A, B; C)$

Local Markov Assumption

$X \perp \mathrm{NonDescendants}_X \mid \mathrm{Pa}_X$

[Figure: $C$ separating $A$ from $B$ in an undirected graph; a node $X$ with its parents $\mathrm{Pa}_X$ and its non-descendants $\mathrm{NonDescendants}_X$]

๐ต๐‘ ๐‘€๐‘

๐‘ƒ

๐ต๐‘ ๐‘€๐‘

Moralize

Triangulate

Undirected Tree Undirected Chordal Graph

Page 3:

Distribution Factorization

Bayesian Networks (Directed Graphical Models)

$I$-map: $I_\ell(G) \subseteq I(P) \iff P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Pa}_{X_i})$

Markov Networks (Undirected Graphical Models)

For strictly positive $P$, $I$-map: $I(G) \subseteq I(P) \iff P(X_1, \ldots, X_n) = \frac{1}{Z} \prod_{i=1}^{m} \Psi_i(D_i)$, where $Z = \sum_{x_1, x_2, \ldots, x_n} \prod_{i=1}^{m} \Psi_i(D_i)$

Annotations on the formulas above: the $P(X_i \mid \mathrm{Pa}_{X_i})$ are conditional probability tables (CPTs); the $\Psi_i(D_i)$ are clique potentials over maximal cliques $D_i$; $Z$ is the normalization constant (partition function).
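As a concrete contrast between the two factorizations, here is a minimal Python sketch for two binary variables; all numbers are hypothetical, chosen only for illustration:

```python
from itertools import product

# Bayesian network X1 -> X2: the factors are CPTs, and their product is
# already normalized, so no partition function is needed.
P_x1 = {0: 0.4, 1: 0.6}
P_x2_given_x1 = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}  # keyed (x1, x2)

bn_joint = {(x1, x2): P_x1[x1] * P_x2_given_x1[(x1, x2)]
            for x1, x2 in product([0, 1], repeat=2)}
assert abs(sum(bn_joint.values()) - 1.0) < 1e-12   # sums to 1 by construction

# Markov network on the same pair: the factor is an arbitrary positive clique
# potential, so we must divide by the partition function Z.
Psi = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 5.0}
Z = sum(Psi.values())                              # sum over all assignments
mn_joint = {xs: v / Z for xs, v in Psi.items()}
assert abs(sum(mn_joint.values()) - 1.0) < 1e-12   # normalized only thanks to Z
```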

Page 4:

Inference in Graphical Models

Graphical models give compact representations of probability distributions $P(X_1, \ldots, X_n)$ (n-way tables reduced to much smaller tables)

How do we answer queries about ๐‘ƒ?

Compute likelihood

Compute conditionals

Compute maximum a posteriori assignment

We use inference as a name for the process of computing answers to such queries


Page 5:

Most queries involve evidence

Evidence ๐‘’ is an assignment of values to a set ๐ธ variables

Evidence are observations on some variables

Without loss of generality ๐ธ = ๐‘‹๐‘˜+1, โ€ฆ , ๐‘‹๐‘›

Simplest query: compute probability of evidence

๐‘ƒ ๐‘’ = โ€ฆ ๐‘ƒ(๐‘ฅ1, โ€ฆ , ๐‘ฅ๐‘˜ , ๐‘’)๐‘ฅ๐‘˜๐‘ฅ1

This is often referred to as computing the likelihood of ๐‘’

Query Type 1: Likelihood

[Figure: network with the evidence set $E$ highlighted; sum over the remaining variables]
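A tiny worked example of this query, borrowing the joint table $P(X, Y)$ that appears on the MAP slide (page 10) and treating $Y = 1$ as the evidence $e$:

```python
# P(e) = sum over the unobserved variable X of P(X, e), with e being Y = 1.
P_xy = {(0, 0): 0.35, (0, 1): 0.05, (1, 0): 0.30, (1, 1): 0.30}

P_e = sum(p for (x, y), p in P_xy.items() if y == 1)
print(P_e)   # 0.35 = P(Y = 1), the likelihood of the evidence
```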

Page 6:

Query Type 2: Conditional Probability

Often we are interested in the conditional probability distribution of a variable given the evidence

๐‘ƒ ๐‘‹ ๐‘’ =๐‘ƒ ๐‘‹, ๐‘’

๐‘ƒ ๐‘’=๐‘ƒ(๐‘‹, ๐‘’)

๐‘ƒ(๐‘‹ = ๐‘ฅ, ๐‘’)๐‘ฅ

It is also called a posteriori belief in ๐‘‹ given evidence ๐‘’

We usually query a subset ๐‘Œ of all variables ๐’ณ = {๐‘Œ, ๐‘, ๐‘’} and โ€œdonโ€™t careโ€ about the remaining ๐‘

๐‘ƒ ๐‘Œ ๐‘’ = ๐‘ƒ(๐‘Œ, ๐‘ = ๐‘ง|๐‘’)

๐‘ง

Take all possible configuration of ๐‘ into account

The processes of summing out the unwanted variable Z is called marginalization
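Continuing the small example from the likelihood slide, the a posteriori belief is obtained by renormalizing the joint entries consistent with the evidence:

```python
# P(X | Y = 1) = P(X, Y = 1) / P(Y = 1), using the same table as before.
P_xy = {(0, 0): 0.35, (0, 1): 0.05, (1, 0): 0.30, (1, 1): 0.30}

P_e = sum(p for (x, y), p in P_xy.items() if y == 1)     # P(Y = 1) = 0.35
posterior = {x: P_xy[(x, 1)] / P_e for x in (0, 1)}
print(posterior)   # {0: 0.1428..., 1: 0.8571...}
```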


Page 7:

Query Type 2: Conditional Probability Example

[Figure: two example networks; in each, the evidence set $E$ is summed over and the marked query variables are the ones whose conditionals we want]

Page 8:

Prediction: what is the probability of an outcome given the starting condition?

The query node is a descendant of the evidence

Diagnosis: what is the probability of a disease/fault given symptoms?

The query node is an ancestor of the evidence

Learning under partial observations (Fill in the unobserved)

Information can flow in either direction

Inference can combine evidence from all parts of the network

Application of a posteriori Belief

[Figure: three-node chains over $A$, $B$, $C$ illustrating information flow in both directions]

Page 9:

Query Type 3: Most Probable Assignment

Want to find the most probable joint assignment for some variables of interest

Such reasoning is usually performed under some given evidence $e$, ignoring (the values of) the other variables $Z$

This is also called the maximum a posteriori (MAP) assignment for $Y$:

$\mathrm{MAP}(Y \mid e) = \arg\max_y P(y \mid e) = \arg\max_y \sum_z P(y, Z = z \mid e)$

[Figure: network with evidence set $E$; sum over the ignored variables, find the most probable values for the query variables]

Page 10:

Application of MAP assignment

Classification

Find most likely label, given the evidence

Explanation

What is the most likely scenario, given the evidence

Cautionary note:

The MAP assignment of a variable depends on its context: the set of variables being jointly queried

Example:

MAP of $(X, Y)$? $(0, 0)$

MAP of $X$? $1$

X  Y  P(X,Y)
0  0  0.35
0  1  0.05
1  0  0.30
1  1  0.30

X  P(X)
0  0.4
1  0.6
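A short check of the cautionary note, using only the numbers in the two tables above:

```python
# The joint MAP of (X, Y) is (0, 0), but the MAP of X alone, after
# marginalizing out Y, is 1: the MAP depends on what is jointly queried.
P_xy = {(0, 0): 0.35, (0, 1): 0.05, (1, 0): 0.30, (1, 1): 0.30}

map_xy = max(P_xy, key=P_xy.get)                          # (0, 0), prob 0.35
P_x = {x: sum(p for (xi, _), p in P_xy.items() if xi == x) for x in (0, 1)}
map_x = max(P_x, key=P_x.get)                             # 1, since P(X=1) = 0.6
print(map_xy, map_x, P_x)
```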

Page 11:

Computing the a posteriori belief $P(X \mid e)$ in a GM is NP-hard in general

Hardness implies we cannot find a general procedure that works efficiently for arbitrary GMs

For particular families of GMs (e.g., trees), we can have provably efficient procedures

For some families of GMs (e.g., grids), we need to design efficient approximate inference algorithms

Complexity of Inference

Page 12:

Approaches to inference

Exact inference algorithms

Variable elimination algorithm

Message-passing algorithm (sum-product, belief propagation algorithm)

The junction tree algorithm

Approximate inference algorithms

Sampling methods/Stochastic simulation

Variational algorithms


Page 13:

Marginalization and Elimination

A metabolic pathway: what is the likelihood that protein E is produced?

Query: $P(E)$

$P(E) = \sum_a \sum_b \sum_c \sum_d P(a, b, c, d, E)$

Using the graphical model, we get

$P(E) = \sum_a \sum_b \sum_c \sum_d P(a)\, P(b \mid a)\, P(c \mid b)\, P(d \mid c)\, P(E \mid d)$

[Figure: chain $A \to B \to C \to D \to E$]

Naïve summation needs to enumerate over an exponential number of terms

Page 14:

Rearranging terms and the summations

๐‘ƒ ๐ธ

= ๐‘ƒ ๐‘Ž)๐‘ƒ ๐‘ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘ ๐‘ƒ(๐ธ|๐‘‘

๐‘Ž๐‘๐‘๐‘‘

= ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘ ๐‘ƒ ๐ธ ๐‘‘ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘Ž

๐‘Ž๐‘๐‘๐‘‘

Elimination in Chains

[Figure: chain $A \to B \to C \to D \to E$]

Page 15:

Elimination in Chains (cont.)

Now we can perform the innermost summation efficiently:

$P(E) = \sum_d \sum_c \sum_b P(c \mid b)\, P(d \mid c)\, P(E \mid d) \sum_a P(a)\, P(b \mid a)$

$= \sum_d \sum_c \sum_b P(c \mid b)\, P(d \mid c)\, P(E \mid d)\, P(b)$

The innermost summation eliminates one variable from our summation argument at a local cost.

[Figure: chain $A \to B \to C \to D \to E$ with $A$ eliminated, leaving the new factor $P(b)$]

Equivalent to a matrix-vector multiplication; cost $|\mathrm{Val}(A)| \times |\mathrm{Val}(B)|$

Page 16:

Elimination in Chains (cont.)

Rearranging and then summing again, we get

๐‘ƒ ๐ธ

= ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘ ๐‘ƒ ๐‘’ ๐‘‘ ๐‘ƒ(๐‘)

๐‘๐‘๐‘‘

= ๐‘ƒ ๐‘‘ ๐‘ ๐‘ƒ ๐ธ ๐‘‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘

๐‘๐‘๐‘‘

= ๐‘ƒ ๐‘‘ ๐‘ ๐‘ƒ ๐ธ ๐‘‘ ๐‘ƒ(๐‘)

๐‘๐‘‘

[Figure: chain $A \to B \to C \to D \to E$ with $A$ and $B$ eliminated, leaving the factors $P(b)$, then $P(c)$]

Equivalent to a matrix-vector multiplication; cost $|\mathrm{Val}(B)| \times |\mathrm{Val}(C)|$

P(C | B):   B=0    B=1
  C=0       0.15   0.35
  C=1       0.85   0.65

P(B):
  B=0  0.25
  B=1  0.75

Page 17:

Elimination in Chains (cont.)

Eliminate nodes one by one all the way to the end

๐‘ƒ ๐ธ = ๐‘ƒ ๐ธ ๐‘‘ ๐‘ƒ(๐‘‘)

๐‘‘

Computational complexity for a chain of length $k$:

Each step costs $O(|\mathrm{Val}(X_i)| \times |\mathrm{Val}(X_{i+1})|)$ operations, for $O(k n^2)$ overall:

$\Psi(X_i) = \sum_{x_{i-1}} P(X_i \mid x_{i-1})\, P(x_{i-1})$

Compare to naïve summation, $O(n^k)$:

$\sum_{x_1} \cdots \sum_{x_{k-1}} P(x_1, \ldots, X_k)$

(A short numpy sketch of the whole chain follows the figure note below.)

[Figure: chain $A \to B \to C \to D \to E$ with intermediate factors $P(b)$, $P(c)$]

Page 18:

Undirected Chains

[Figure: undirected chain $A - B - C - D - E$]

Rearrange terms, perform local summation โ€ฆ

๐‘ƒ ๐ธ

= 1

๐‘ฮจ ๐‘, ๐‘Ž ฮจ ๐‘, ๐‘ ฮจ ๐‘‘, ๐‘ ฮจ(๐ธ, ๐‘‘)

๐‘Ž๐‘๐‘๐‘‘

=1

๐‘ ฮจ ๐‘, ๐‘ ฮจ ๐‘‘, ๐‘ ฮจ ๐ธ, ๐‘‘ ฮจ ๐‘, ๐‘Ž

๐‘Ž๐‘๐‘๐‘‘

=1

๐‘ ฮจ ๐‘, ๐‘ ฮจ ๐‘‘, ๐‘ ฮจ ๐ธ, ๐‘‘ ฮจ ๐‘

๐‘๐‘๐‘‘

Page 19:

The Sum-Product Operation

During inference, we try to compute an expression

Sum-product form: $\sum_{z} \prod_{\Psi \in \mathcal{F}} \Psi$

$\mathcal{X} = \{X_1, \ldots, X_n\}$: the set of variables

$\mathcal{F}$: a set of factors such that for each $\Psi \in \mathcal{F}$, $\mathrm{Scope}(\Psi) \subseteq \mathcal{X}$

$\mathcal{Y} \subset \mathcal{X}$: a set of query variables

$\mathcal{Z} = \mathcal{X} - \mathcal{Y}$: the variables to eliminate

The result of eliminating the variables in $\mathcal{Z}$ is a factor

$\tau(\mathcal{Y}) = \sum_{z} \prod_{\Psi \in \mathcal{F}} \Psi$

This factor does not necessarily correspond to any probability or conditional probability in the network.

$P(\mathcal{Y}) = \dfrac{\tau(\mathcal{Y})}{\sum_{\mathcal{Y}} \tau(\mathcal{Y})}$

Page 20:

Inference via Variable Elimination

General Idea

Write query in the form

๐‘ƒ ๐‘‹1, ๐‘’ = โ€ฆ ๐‘ƒ ๐‘ฅ๐‘– ๐‘ƒ๐‘Ž๐‘‹๐‘–๐‘–๐‘ฅ2๐‘ฅ3๐‘ฅ๐‘›

The sum is ordered to suggest an elimination order

Then iteratively

Move all irrelevant terms outside of innermost sum

Perform innermost sum, getting a new term

Insert the new term into the product

Finally renormalize

๐‘ƒ ๐‘‹1 ๐‘’ = ๐œ ๐‘‹1, ๐‘’

๐œ(๐‘‹1, ๐‘’)๐‘ฅ1

20
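A minimal sketch of this procedure for table factors over binary variables; the factor representation and helper names are illustrative, not from the slides:

```python
from itertools import product

# A factor is (vars, table): vars is a list of names, table maps a tuple of
# 0/1 values (in vars order) to a float.

def multiply(f, g):
    fv, ft = f
    gv, gt = g
    vs = list(dict.fromkeys(fv + gv))              # union of the two scopes
    table = {}
    for assign in product([0, 1], repeat=len(vs)):
        a = dict(zip(vs, assign))
        table[assign] = (ft[tuple(a[v] for v in fv)]
                         * gt[tuple(a[v] for v in gv)])
    return (vs, table)

def sum_out(f, var):
    fv, ft = f
    keep = [v for v in fv if v != var]
    table = {}
    for assign, val in ft.items():
        key = tuple(x for v, x in zip(fv, assign) if v != var)
        table[key] = table.get(key, 0.0) + val
    return (keep, table)

def eliminate(factors, order):
    for z in order:
        touching = [f for f in factors if z in f[0]]   # terms involving z
        rest = [f for f in factors if z not in f[0]]   # irrelevant terms stay outside
        prod = touching[0]
        for f in touching[1:]:
            prod = multiply(prod, f)
        factors = rest + [sum_out(prod, z)]            # insert the new term back
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Tiny demo: P(B) on the chain A -> B with hypothetical CPTs.
fA = (['A'], {(0,): 0.4, (1,): 0.6})
fBA = (['A', 'B'], {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8})
print(eliminate([fA, fBA], ['A']))     # (['B'], {(0,): 0.4, (1,): 0.6})
```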

Page 21:

A more complex network

A food web

What is the probability $P(A \mid H)$ that hawks are leaving given that the grass condition is poor?

[Figure: food web DAG over $A$-$H$ with edges $A \to D$, $A \to F$, $B \to C$, $C \to E$, $D \to E$, $E \to G$, $E \to H$, $F \to H$]

Page 22:

Query: ๐‘ƒ(๐ด|โ„Ž), need to eliminate ๐ต, ๐ถ, ๐ท, ๐ธ, ๐น, ๐บ, ๐ป

Initial factors ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘ƒ ๐‘” ๐‘’ ๐‘ƒ โ„Ž ๐‘’, ๐‘“

Choose an elimination order: ๐ป, ๐บ, ๐น, ๐ธ, ๐ท, ๐ถ, ๐ต (<)

Step 1: Eliminate G

Conditioning (fix the evidence node on its observed value)

๐‘šโ„Ž ๐‘’, ๐‘“ = ๐‘ƒ(๐ป = โ„Ž|๐‘’, ๐‘“)

Example: Variable Elimination

[Figure: the food web before and after conditioning on $H$]

Page 23:

Query: ๐‘ƒ(๐ด|โ„Ž), need to eliminate ๐ต, ๐ถ, ๐ท, ๐ธ, ๐น, ๐บ

Initial factors ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘ƒ ๐‘” ๐‘’ ๐‘ƒ โ„Ž ๐‘’, ๐‘“

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘ƒ ๐‘” ๐‘’ ๐‘šโ„Ž(๐‘’, ๐‘“)

Step 2: Eliminate ๐บ

Compute ๐‘š๐‘” ๐‘’ = ๐‘ƒ ๐‘” ๐‘’ ๐‘” = 1

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘š๐‘” ๐‘’ ๐‘šโ„Ž(๐‘’, ๐‘“)

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘šโ„Ž(๐‘’, ๐‘“)

Example: Variable Elimination

[Figure: the food web before and after eliminating $G$]

Page 24:

Query: ๐‘ƒ(๐ด|โ„Ž), need to eliminate ๐ต, ๐ถ, ๐ท, ๐ธ, ๐น

Initial factors ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘ƒ ๐‘” ๐‘’ ๐‘ƒ โ„Ž ๐‘’, ๐‘“

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘ƒ ๐‘” ๐‘’ ๐‘šโ„Ž ๐‘’, ๐‘“

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘šโ„Ž ๐‘’, ๐‘“

Step 3: Eliminate ๐น

Compute ๐‘š๐‘“ ๐‘’, ๐‘Ž = ๐‘ƒ ๐‘“ ๐‘Ž ๐‘šโ„Ž(๐‘’, ๐‘“) ๐‘“

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘š๐‘“(๐‘’, ๐‘Ž)

Example: Variable Elimination

[Figure: the food web before and after eliminating $F$]

Page 25:

Query: ๐‘ƒ(๐ด|โ„Ž), need to eliminate ๐ต, ๐ถ, ๐ท, ๐ธ

Initial factors ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘ƒ ๐‘” ๐‘’ ๐‘ƒ โ„Ž ๐‘’, ๐‘“

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘ƒ ๐‘” ๐‘’ ๐‘šโ„Ž ๐‘’, ๐‘“

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘šโ„Ž ๐‘’, ๐‘“

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘š๐‘“ ๐‘Ž, ๐‘’

Step 3: Eliminate ๐ธ

Compute ๐‘š๐‘’ ๐‘Ž, ๐‘, ๐‘‘ = ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘š๐‘“(๐‘Ž, ๐‘’) ๐‘’

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘š๐‘’(๐‘Ž, ๐‘, ๐‘‘)

Example: Variable Elimination

[Figure: the food web before and after eliminating $E$]

Page 26:

Query: ๐‘ƒ(๐ด|โ„Ž), need to eliminate ๐ต, ๐ถ, ๐ท

Initial factors ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘ƒ ๐‘” ๐‘’ ๐‘ƒ โ„Ž ๐‘’, ๐‘“

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘ƒ ๐‘” ๐‘’ ๐‘šโ„Ž ๐‘’, ๐‘“

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘šโ„Ž ๐‘’, ๐‘“

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘š๐‘“ ๐‘Ž, ๐‘’

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘š๐‘’ ๐‘Ž, ๐‘, ๐‘‘

Step 3: Eliminate ๐ท

Compute ๐‘š๐‘‘ ๐‘Ž, ๐‘ = ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘š๐‘’(๐‘Ž, ๐‘, ๐‘‘) ๐‘‘

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘š๐‘‘(๐‘Ž, ๐‘)

Example: Variable Elimination

[Figure: the food web before and after eliminating $D$]

Page 27:

Query: ๐‘ƒ(๐ด|โ„Ž), need to eliminate ๐ต, ๐ถ

Initial factors ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘ƒ ๐‘” ๐‘’ ๐‘ƒ โ„Ž ๐‘’, ๐‘“

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘ƒ ๐‘” ๐‘’ ๐‘šโ„Ž ๐‘’, ๐‘“

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘šโ„Ž ๐‘’, ๐‘“

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘š๐‘“ ๐‘Ž, ๐‘’

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘š๐‘’ ๐‘Ž, ๐‘, ๐‘‘

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘š๐‘‘ ๐‘Ž, ๐‘

Step 3: Eliminate ๐ถ

Compute ๐‘š๐‘ ๐‘Ž, ๐‘ = ๐‘ƒ ๐‘ ๐‘ ๐‘š๐‘‘(๐‘Ž, ๐‘) ๐‘

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘š๐‘(๐‘Ž, ๐‘)

Example: Variable Elimination

[Figure: the food web before and after eliminating $C$]

Page 28:

Query: ๐‘ƒ(๐ด|โ„Ž), need to eliminate ๐ต

Initial factors ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘ƒ ๐‘” ๐‘’ ๐‘ƒ โ„Ž ๐‘’, ๐‘“

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘ƒ ๐‘” ๐‘’ ๐‘šโ„Ž ๐‘’, ๐‘“

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘šโ„Ž ๐‘’, ๐‘“

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘š๐‘“ ๐‘Ž, ๐‘’

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘š๐‘’ ๐‘Ž, ๐‘, ๐‘‘

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘š๐‘‘ ๐‘Ž, ๐‘

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘š๐‘ ๐‘Ž, ๐‘

Step 3: Eliminate ๐ถ

Compute ๐‘š๐‘ ๐‘Ž = ๐‘ƒ(๐‘)๐‘š๐‘(๐‘Ž, ๐‘) ๐‘

โ‡’ ๐‘ƒ ๐‘Ž ๐‘š๐‘(๐‘Ž)

Example: Variable Elimination

[Figure: the food web before and after eliminating $B$]

Page 29:

Query: ๐‘ƒ(๐ด|โ„Ž), need to renormalize over ๐ด

Initial factors ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘ƒ ๐‘” ๐‘’ ๐‘ƒ โ„Ž ๐‘’, ๐‘“

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘ƒ ๐‘” ๐‘’ ๐‘šโ„Ž ๐‘’, ๐‘“

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘ƒ ๐‘“ ๐‘Ž ๐‘šโ„Ž ๐‘’, ๐‘“

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ๐‘š๐‘“ ๐‘Ž, ๐‘’

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘Ž ๐‘š๐‘’ ๐‘Ž, ๐‘, ๐‘‘

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ ๐‘š๐‘‘ ๐‘Ž, ๐‘

โ‡’ ๐‘ƒ ๐‘Ž ๐‘ƒ ๐‘ ๐‘š๐‘ ๐‘Ž, ๐‘

โ‡’ ๐‘ƒ ๐‘Ž ๐‘š๐‘ ๐‘Ž

Step 3: renormalize

๐‘ƒ ๐‘Ž, โ„Ž = ๐‘ƒ ๐‘Ž ๐‘š๐‘ ๐‘Ž , compute ๐‘ƒ(โ„Ž) = ๐‘ƒ ๐‘Ž ๐‘š๐‘(๐‘Ž)๐‘Ž

โ‡’ ๐‘ƒ ๐‘Ž โ„Ž = ๐‘ƒ ๐‘Ž ๐‘š๐‘(๐‘Ž)

๐‘ƒ ๐‘Ž ๐‘š๐‘(๐ด)๐‘Ž

Example: Variable Elimination

[Figure: the food web and the final reduced graph over $A$]

Page 30:

Complexity of variable elimination

Suppose in one elimination step we compute

๐‘š๐‘ฅ ๐‘ฆ1, โ€ฆ , ๐‘ฆ๐‘˜ = ๐‘š๐‘ฅโ€ฒ (๐‘ฅ, ๐‘ฆ1, โ€ฆ , ๐‘ฆ๐‘˜)๐‘ฅ

๐‘š๐‘ฅโ€ฒ ๐‘ฅ, ๐‘ฆ1, โ€ฆ , ๐‘ฆ๐‘˜ = ๐‘š๐‘– ๐‘ฅ, ๐‘ฆ๐‘๐‘–

๐‘˜๐‘–=1

This requires

๐‘˜ โˆ— ๐‘‰๐‘Ž๐‘™ ๐‘‹ โˆ— ๐‘‰๐‘Ž๐‘™ ๐‘Œ๐‘๐‘–๐‘– multiplications

For each value of ๐‘ฅ, ๐‘ฆ1, โ€ฆ , ๐‘ฆ๐‘˜, we do k multiplications

๐‘‰๐‘Ž๐‘™ ๐‘‹ โˆ— ๐‘‰๐‘Ž๐‘™ ๐‘Œ๐‘๐‘–๐‘– additions

For each value of ๐‘ฆ1, โ€ฆ , ๐‘ฆ๐‘˜, we do ๐‘‰๐‘Ž๐‘™ ๐‘‹ additions

Complexity is exponential in the number of variables in the intermediate factor

[Figure: the eliminated variable $X$ connected to $y_1, \ldots, y_k$]

Page 31:

Inference in Graphical Models

General form of the inference problem

๐‘ƒ ๐‘‹1, โ€ฆ , ๐‘‹๐‘› โˆ ฮจ(๐ท๐‘–)๐‘–

Want to query ๐‘Œ variable given evidence ๐‘’, and โ€œdonโ€™t careโ€ a set of ๐‘ variables

Compute ๐œ ๐‘Œ, ๐‘’ = ฮจ(๐ท๐‘–)๐‘–๐‘ using variable elimination

Renormalize to obtain the conditionals ๐‘ƒ ๐‘Œ|๐‘’ =๐œ(๐‘Œ,๐‘’)

๐œ(๐‘Œ,๐‘’)๐‘Œ

Two examples: use graph structure

to order computation

Chain: [Figure: $A \to B \to C \to D \to E$]

DAG: [Figure: the food web over $A$-$H$]

Page 32:

From Variable Elimination to Message Passing

Recall that induced dependency during marginalization is captured in elimination cliques

Summation ↔ Elimination

Intermediate term ↔ Elimination cliques

Can this lead to a generic inference algorithm?

Page 33:

Nice localization in computation

๐‘ƒ ๐ธ = ๐‘ƒ ๐‘Ž)๐‘ƒ ๐‘ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘ ๐‘ƒ(๐ธ|๐‘‘๐‘Ž๐‘๐‘๐‘‘

๐‘ƒ ๐ธ = ๐‘ƒ ๐ธ ๐‘‘ ๐‘ƒ ๐‘‘ ๐‘ ( ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘ ๐‘Ž ๐‘ƒ ๐‘Ž)๐‘Ž๐‘๐‘๐‘‘

Chain: Query E

[Figure: chain $A \to B \to C \to D \to E$ with messages $m_{AB}(b)$, $m_{BC}(c)$, $m_{CD}(d)$, $m_{DE}(E)$ passed left to right]

$P(E) = m_{DE}(E)$

Page 34:

Start elimination away from the query variable

๐‘ƒ(๐ถ) = ๐‘ƒ ๐‘Ž)๐‘ƒ ๐‘ ๐‘Ž ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘‘ ๐‘ ๐‘ƒ(๐‘’|๐‘‘๐‘Ž๐‘๐‘’๐‘‘

๐‘ƒ(๐ถ) = ( ๐‘ƒ ๐‘‘ ๐ถ ( ๐‘ƒ(๐‘’|๐‘‘))) ( ๐‘ƒ ๐ถ ๐‘ ( ๐‘ƒ ๐‘ ๐‘Ž ๐‘ƒ ๐‘Ž๐‘Ž๐‘ )๐‘’๐‘‘ )

Chain: Query C

[Figure: chain with messages $m_{AB}(b)$, $m_{BC}(C)$ coming from the left and $m_{ED}(d)$, $m_{DC}(C)$ coming from the right]

$P(C) = m_{DC}(C)\, m_{BC}(C)$

Page 35:

Chain: what if I want to query everybody?

$P(B) = \left( \sum_c P(c \mid B) \left( \sum_d P(d \mid c) \left( \sum_e P(e \mid d) \right) \right) \right) \left( \sum_a P(B \mid a)\, P(a) \right)$

Query $P(A), P(B), P(C), P(D), P(E)$

Computational cost:

Each message: $O(K^2)$

Chain length is $L$

Cost for each query is about $O(L K^2)$

For $L$ queries, cost is about $O(L^2 K^2)$

[Figure: chain with messages $m_{AB}(B)$, $m_{CB}(B)$, $m_{DC}(c)$, $m_{ED}(d)$ converging on $B$]

Page 36:

What is shared in these queries?

๐‘ƒ ๐ต = ( ๐‘ƒ ๐‘ ๐ต ( ๐‘ƒ ๐‘‘ ๐‘๐‘‘๐‘ ( ๐‘ƒ ๐‘’ ๐‘‘ )))๐‘’ ๐‘ƒ ๐ต ๐‘Ž ๐‘ƒ ๐‘Ž๐‘Ž

๐‘ƒ ๐ธ = ๐‘ƒ ๐ธ ๐‘‘ ๐‘ƒ ๐‘‘ ๐‘ ( ๐‘ƒ ๐‘ ๐‘ ๐‘ƒ ๐‘ ๐‘Ž ๐‘ƒ ๐‘Ž)๐‘Ž๐‘๐‘๐‘‘

๐‘ƒ ๐ถ = ( ๐‘ƒ ๐‘‘ ๐ถ ( ๐‘ƒ(๐‘’|๐‘‘))) ( ๐‘ƒ ๐ถ ๐‘ ( ๐‘ƒ ๐‘ ๐‘Ž ๐‘ƒ ๐‘Ž๐‘Ž๐‘ )๐‘’๐‘‘ )

[Figure: the three chains with their messages: for $P(E)$: $m_{AB}(b)$, $m_{BC}(c)$, $m_{CD}(d)$, $m_{DE}(E)$; for $P(C)$: $m_{AB}(b)$, $m_{BC}(C)$, $m_{DC}(C)$, $m_{ED}(d)$; for $P(B)$: $m_{AB}(B)$, $m_{CB}(B)$, $m_{DC}(c)$, $m_{ED}(d)$]

The number of unique messages is $2(L - 1)$

Page 37:

Forward-backward algorithm

Compute and cache the $2(L - 1)$ unique messages

At query time, just multiply together the messages from the neighbors (a numpy sketch follows below)

e.g. $P(D) = m_{CD}(D)\, m_{ED}(D)$

Forward pass: [Figure: chain with messages $m_{AB}(b)$, $m_{BC}(c)$, $m_{CD}(d)$, $m_{DE}(e)$ left to right]

Backward pass: [Figure: chain with messages $m_{BA}(a)$, $m_{CB}(b)$, $m_{DC}(c)$, $m_{ED}(d)$ right to left]

[Figure: query at $D$ combines $m_{CD}(D)$ and $m_{ED}(D)$] For all queries: $O(2 L K^2)$

Page 38:

DAG: Variable elimination

Elimination order H, G, F, E, B, C, D

๐‘ƒ ๐ด =

๐‘ƒ ๐ด ๐‘ƒ ๐‘‘ ๐ด ( ( ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘ )( ๐‘ƒ ๐‘’ ๐‘, ๐‘‘ ( ๐‘ƒ ๐‘” ๐‘’ )( ๐‘ƒ ๐‘“ ๐ด ๐‘ƒ โ„Ž ๐‘’, ๐‘“ ))) โ„Ž ๐‘“๐‘” ๐‘’๐‘๐‘๐‘‘

[Figure: food web annotated with the intermediate messages $m_H(e, f)$, $m_F(A, e)$, $m_G(e)$, $m_E(A, c, d)$, $m_B(c)$, $m_C(A, d)$, $m_D(A)$]

4-way tables created!

Page 39:

DAG: Cliques of size 4 are generated

[Figure: the food web redrawn after each elimination step, showing $m_H(e, f)$, then $m_G(e)$, $m_F(A, e)$, $m_E(A, c, d)$, $m_B(c)$, $m_C(A, d)$, $m_D(A)$; eliminating $E$ creates a 4-way table]

Page 40:

DAG: A different elimination order

Elimination order G, H, F, B, C, D, E

๐‘ƒ ๐ด

= ( ๐‘ƒ(๐‘‘|๐ด)๐‘‘ ๐‘ƒ(๐‘’|๐‘, ๐‘‘)๐‘ ๐‘ƒ ๐‘ ๐‘ƒ ๐‘ ๐‘๐‘ ๐‘ƒ ๐‘“ ๐ด ๐‘ƒ โ„Ž ๐‘’, ๐‘“โ„Ž๐‘“ ๐‘ƒ ๐‘” ๐‘’๐‘” )๐‘’

[Figure: food web annotated with $m_G(e)$, $m_F(A, e)$, $m_H(e, f)$, $m_C(e, d)$, $m_B(c)$, $m_D(A, e)$, $m_E(A)$]

NO 4-way tables!

Page 41:

DAG: No cliques of size 4

[Figure: the food web redrawn after each elimination step for the order $G, H, F, B, C, D, E$, showing $m_G(e)$, $m_H(e, f)$, $m_F(A, e)$, $m_B(c)$, $m_C(d, e)$, $m_D(A, e)$, $m_E(A)$; no cliques of size 4 are generated]

Page 42:

Any thoughts?

Chains have nice properties:

the forward-backward algorithm works

intermediate results (messages) live along edges

Can we generalize to other graphs? (trees, loopy graphs?)

How about undirected trees? Is there a forward-backward algorithm?

Loopy graphs are more complicated: different elimination orders result in different computational costs

Can we somehow make loopy graphs behave like trees?


Page 43:

Tree Graphical Models


Undirected tree: a unique path between any pair of nodes

Directed tree: all nodes except the root have exactly one parent

Page 44:

Equivalence of directed and undirected trees

Any undirected tree can be converted to a directed tree by choosing a root node and directing all edges away from it

A directed tree and the corresponding undirected tree make the same conditional independence assertions

Parameterizations are essentially the same

Undirected tree: $P(X) = \frac{1}{Z} \prod_{i \in V} \Psi(X_i) \prod_{(i,j) \in E} \Psi(X_i, X_j)$

Directed tree: $P(X) = P(X_r) \prod_{(i,j) \in E} P(X_j \mid X_i)$

Equivalence: $\Psi(X_r) = P(X_r)$, $\Psi(X_i, X_j) = P(X_j \mid X_i)$, $Z = 1$, and $\Psi(X_i) = 1$ for all other nodes

44

Page 45:

Message passing on trees

Messages passed along tree edges

๐‘ƒ ๐‘‹๐‘–, ๐‘‹๐‘— , ๐‘‹๐‘˜ , ๐‘‹๐‘™, ๐‘‹๐‘“ โˆ

ฮจ ๐‘‹๐‘– ฮจ ๐‘‹๐‘— ฮจ ๐‘‹๐‘˜ ฮจ ๐‘‹๐‘™ ฮจ ๐‘‹๐‘“ ฮจ ๐‘‹๐‘– , ๐‘‹๐‘— ฮจ ๐‘‹๐‘˜ , ๐‘‹๐‘— ฮจ ๐‘‹๐‘™ , ๐‘‹๐‘— ฮจ(๐‘‹๐‘– , ๐‘‹๐‘“)

๐‘ƒ ๐‘“ = ฮจ(๐‘‹๐‘“) (ฮจ ๐‘‹๐‘– ฮจ ๐‘‹๐‘– , ๐‘‹๐‘“ ฮจ ๐‘‹๐‘— ฮจ ๐‘‹๐‘– , ๐‘‹๐‘— ( ฮจ ๐‘‹๐‘˜ ฮจ ๐‘‹๐‘˜ , ๐‘‹๐‘—๐‘ฅ๐‘˜ )( ฮจ ๐‘‹๐‘™ ฮจ ๐‘‹๐‘™ , ๐‘‹๐‘—๐‘ฅ๐‘™ )๐‘ฅ๐‘— )๐‘ฅ๐‘–

[Figure: tree $f - i - j$ with leaves $k$, $l$ attached to $j$; messages $m_{kj}(X_j)$, $m_{lj}(X_j)$, $m_{ji}(X_i)$, $m_{if}(X_f)$ flow toward $f$]

Page 46:

Sharing messages on trees

Query f

Query j

[Figure: query $f$ uses messages $m_{kj}(X_j)$, $m_{lj}(X_j)$, $m_{ji}(X_i)$, $m_{if}(X_f)$; query $j$ uses $m_{kj}(X_j)$, $m_{lj}(X_j)$, $m_{fi}(X_i)$, $m_{ij}(X_j)$; the leaf messages are shared]

Page 47:

Computational cost for all queries

Query ๐‘ƒ ๐‘‹๐‘˜ , ๐‘ƒ ๐‘‹๐‘™ , ๐‘ƒ ๐‘‹๐‘— , ๐‘ƒ ๐‘‹๐‘– , ๐‘ƒ ๐‘‹๐‘“

Doing things separately

Each message ๐‘‚ ๐พ2

Number of edges is ๐ฟ

Cost for each query is about ๐‘‚ ๐ฟ๐พ2

For ๐ฟ queries, cost is about ๐‘‚ ๐ฟ2๐พ2

[Figure: the same tree with the messages used by a single query]

Page 48:

Forward-backward algorithm in trees

Forward: pick one leaf as root, compute all messages toward it, cache them

Backward: pick another root, compute all messages, cache them

E.g. query $j$

[Figure: three passes over the tree for different roots; messages such as $m_{kj}(X_j)$, $m_{lj}(X_j)$, $m_{ij}(X_j)$ are computed once and reused across queries]

Page 49:

Computational savings for trees

Compute forward and backward messages for each edge, save them

Sharing messages this way:

Each message: $O(K^2)$

Number of edges is $L$

$2L$ unique messages

Cost for all queries is about $O(2 L K^2)$

[Figure: the tree with both message directions on every edge: $m_{kj}$, $m_{lj}$, $m_{ji}$, $m_{if}$ and $m_{fi}$, $m_{ij}$, $m_{jk}$, $m_{jl}$]

Page 50:

Message passing algorithm

๐‘š๐‘—๐‘– ๐‘‹๐‘– โˆ ฮจ ๐‘‹๐‘– , ๐‘‹๐‘—๐‘‹๐‘—ฮจ ๐‘‹๐‘— ๐‘š๐‘ ๐‘— ๐‘‹๐‘—๐‘ โˆˆN ๐‘— \i

[Figure: node $j$ with neighbors $N(j) \setminus i = \{k, l\}$ sending $m_{kj}(X_j)$ and $m_{lj}(X_j)$; $j$ sends $m_{ji}(X_i)$ to $i$]

Take the product of incoming messages, multiply by the local potentials, and sum out $X_j$. $X_j$ can send its message once the incoming messages from $N(j) \setminus i$ have arrived.
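A minimal recursive sketch of this update on the slides' five-node tree, with hypothetical symmetric pairwise potentials and uniform node potentials:

```python
import numpy as np

nodes = ['f', 'i', 'j', 'k', 'l']
edges = [('f', 'i'), ('i', 'j'), ('j', 'k'), ('j', 'l')]
nbrs = {v: {u for e in edges for u in e if v in e and u != v} for v in nodes}
n = 2
pair = {tuple(sorted(e)): np.array([[2.0, 1.0], [1.0, 3.0]]) for e in edges}
node = {v: np.ones(n) for v in nodes}          # node potentials Psi(X_v)

def message(j, i):
    # m_{j->i}(X_i): product of incoming messages from N(j) minus i, times
    # Psi(X_j), times Psi(X_i, X_j), with X_j summed out.
    inc = node[j].copy()
    for s in nbrs[j] - {i}:
        inc *= message(s, j)                   # recursion bottoms out at leaves
    return pair[tuple(sorted((i, j)))] @ inc   # symmetric potential, so @ is fine

def marginal(i):
    b = node[i].copy()
    for j in nbrs[i]:
        b *= message(j, i)
    return b / b.sum()

print(marginal('j'))                           # exact on trees
```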

Page 51:

From Variable Elimination to Message Passing

Recall Variable Elimination Algorithm

Choose an ordering in which the query node $f$ is the final node

Eliminate node $i$ by removing all potentials containing $i$, taking the sum/product over $x_i$

Place the resultant factor back

For a tree graphical model:

Choose the query node $f$ as the root of the tree

View the tree as a directed tree with edges pointing towards $f$

Elimination of each node can be considered as message passing directly along tree branches, rather than on some transformed graphs

Thus, we can use the tree itself as a data structure for inference


Page 52:

How about general graphs?

Trees are nice:

Can just compute two messages for each edge

Order computation along the graph

Associate intermediate results with edges

General graphs are not so clear:

Different elimination orders generate different cliques and factor sizes

Computation and intermediate results are not associated with edges

The local computation view is not so clear

[Figure: the tree with messages in both directions; two copies of the loopy food-web graph over $A$-$H$]

Can we make them tree-like, or treat them as trees?

Page 53:

Message passing for loopy graphs

Local message passing for trees guarantees the consistency of local marginals:

the computed $P(X_i)$ is the correct one

the computed $P(X_i, X_j)$ is the correct one

…

For loopy graphs, there are no consistency guarantees for local message passing

[Figure: tree with messages $m_{kj}(X_j)$, $m_{lj}(X_j)$, $m_{ji}(X_i)$]

Page 54:

Inference for loopy graphical models is NP-hard in general

Treat loopy graphs locally as if they were trees

Iteratively estimate the marginals (see the sketch after this list):

Read in messages

Process messages

Send updated outgoing messages

Repeat for all variables until convergence

Loopy belief propagation

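A minimal sketch of the loop above on a hypothetical 3-cycle, using the synchronous schedule described on the next slide; the potentials and iteration count are illustrative, and (per the slide) convergence is not guaranteed:

```python
import numpy as np

edges = [('a', 'b'), ('b', 'c'), ('c', 'a')]
nbrs = {'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b']}
n = 2
psi = np.array([[2.0, 1.0], [1.0, 2.0]])       # shared symmetric pairwise potential
msg = {(u, v): np.ones(n) / n
       for u, v in edges + [(v, u) for u, v in edges]}

for _ in range(50):                            # repeat until (hopefully) converged
    new = {}
    for (j, i) in msg:
        inc = np.ones(n)
        for s in nbrs[j]:
            if s != i:
                inc *= msg[(s, j)]             # read in current messages
        m = psi @ inc                          # process: multiply, sum out X_j
        new[(j, i)] = m / m.sum()              # send updated outgoing message
    msg = new

belief = np.ones(n)
for j in nbrs['a']:
    belief *= msg[(j, 'a')]
print(belief / belief.sum())                   # estimated marginal for node a
```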

Page 55:

Message update schedule

Synchronous update:

$X_j$ can send its message when the incoming messages from $N(j) \setminus i$ arrive

Slow

Provably correct for trees; may converge for loopy graphs

Asynchronous update:

$X_j$ can send a message when there is a change in any incoming message from $N(j) \setminus i$

Fast

Not easy to prove convergence, but empirically it often works
