Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012
Le Song
Lecture 19, Nov 1, 2012
Reading: Chapter 8 of C. Bishop's book
Inference in Graphical Models
Conditional Independence Assumptions
- Global Markov Assumption: A ⊥ B | C, if sep_G(A, B; C)
- Local Markov Assumption: X ⊥ NonDescendants(X) | Pa(X)
[Figures: a Bayesian network illustrating each assumption; a directed graph is moralized and then triangulated, yielding an undirected tree or an undirected chordal graph]
Distribution Factorization
Bayesian Networks (Directed Graphical Models): I_ℓ(G) ⊆ I(P)
  P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X_{pa_i})
  (the factors are Conditional Probability Tables, CPTs)
Markov Networks (Undirected Graphical Models): for strictly positive P, I(G) ⊆ I(P)
  P(X1, …, Xn) = (1/Z) ∏_{i=1}^{m} Ψ_i(D_i)
  (the factors are clique potentials Ψ_i over maximal cliques D_i)
  Z = Σ_{x1, …, xn} ∏_{i=1}^{m} Ψ_i(D_i)
  (the normalization constant, also called the partition function)
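The undirected factorization can be checked numerically. A minimal sketch, assuming a made-up 3-node binary chain MRF, computing the partition function Z by brute-force enumeration:

```python
import itertools

# Hypothetical pairwise MRF on a chain X1 - X2 - X3, each variable binary.
# psi[(i, j)][xi][xj] is the clique potential on edge (i, j).
psi = {
    (0, 1): [[4.0, 1.0], [1.0, 4.0]],   # favors X1 == X2
    (1, 2): [[4.0, 1.0], [1.0, 4.0]],   # favors X2 == X3
}

def unnormalized(x):
    """Product of clique potentials for a full assignment x."""
    p = 1.0
    for (i, j), table in psi.items():
        p *= table[x[i]][x[j]]
    return p

# Partition function Z: sum of the unnormalized measure over all assignments.
Z = sum(unnormalized(x) for x in itertools.product([0, 1], repeat=3))

def prob(x):
    return unnormalized(x) / Z

print(Z)                # 50.0
print(prob((0, 0, 0)))  # 0.32
```

Brute-force enumeration is exponential in the number of variables, which is exactly the cost that the inference algorithms in the rest of this lecture avoid.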
Inference in Graphical Models
Graphical models give compact representations of probability distributions P(X1, …, Xn) (n-way tables reduce to much smaller tables)
How do we answer queries about P?
- Compute the likelihood
- Compute conditionals
- Compute the maximum a posteriori assignment
We use "inference" as a name for the process of computing answers to such queries
Query Type 1: Likelihood
Most queries involve evidence
- Evidence e is an assignment of values to a set E of variables
- Evidence is an observation of some variables
- Without loss of generality, E = {X_{k+1}, …, X_n}
Simplest query: compute the probability of the evidence
  P(e) = Σ_{x1} … Σ_{xk} P(x1, …, xk, e)
This is often referred to as computing the likelihood of e
[Figure: network with the evidence set E shaded; the remaining variables are summed over]
Query Type 2: Conditional Probability
Often we are interested in the conditional probability distribution of a variable given the evidence
  P(X | e) = P(X, e) / P(e) = P(X, e) / Σ_x P(X = x, e)
This is also called the a posteriori belief in X given evidence e
We usually query a subset Y of all variables X = {Y, Z, E} and "don't care" about the remaining Z:
  P(Y | e) = Σ_z P(Y, Z = z | e)
- this takes all possible configurations of Z into account
The process of summing out the unwanted variables Z is called marginalization
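Both query types can be read off directly from the definitions above. A minimal sketch, assuming a made-up binary chain network A → B → C, computing the likelihood of the evidence C = 1 and the a posteriori belief P(A | C = 1):

```python
# A hypothetical chain BN A -> B -> C with binary variables (CPTs made up).
pA = [0.6, 0.4]
pB_given_A = [[0.7, 0.3], [0.2, 0.8]]   # row a, column b
pC_given_B = [[0.9, 0.1], [0.4, 0.6]]   # row b, column c

def joint(a, b, c):
    return pA[a] * pB_given_A[a][b] * pC_given_B[b][c]

# Query P(A | C = 1): first marginalize out the "don't care" variable B ...
pA_and_e = [sum(joint(a, b, 1) for b in (0, 1)) for a in (0, 1)]
# ... the normalizer is the likelihood of the evidence (Query Type 1) ...
p_e = sum(pA_and_e)
# ... and renormalizing gives the a posteriori belief (Query Type 2).
posterior = [v / p_e for v in pA_and_e]

print(p_e)        # 0.35
print(posterior)  # [0.4285..., 0.5714...]
```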
Query Type 2: Conditional Probability Example
[Figures: two networks with the evidence set E shaded ("sum over this set of variables") and the query nodes highlighted ("interested in the conditionals for these variables")]

Application of a posteriori Belief
- Prediction: what is the probability of an outcome given the starting condition?
  (the query node is a descendant of the evidence)
- Diagnosis: what is the probability of a disease/fault given the symptoms?
  (the query node is an ancestor of the evidence)
- Learning under partial observations (fill in the unobserved values)
Information can flow in either direction
Inference can combine evidence from all parts of the network
[Figure: chains A → B → C and A ← B ← C illustrating the two directions]
Query Type 3: Most Probable Assignment
Want to find the most probable joint assignment for some variables of interest
Such reasoning is usually performed under some given evidence e, ignoring (the values of) the other variables Z
Also called the maximum a posteriori (MAP) assignment for Y:
  MAP(Y | e) = argmax_y P(y | e) = argmax_y Σ_z P(y, z | e)
[Figure: evidence set E shaded ("sum over this set of variables"); query nodes highlighted ("interested in the most probable values for these variables")]

Application of MAP assignment
- Classification: find the most likely label, given the evidence
- Explanation: what is the most likely scenario, given the evidence?
Cautionary note:
- The MAP assignment of a variable depends on its context, i.e. the set of variables being jointly queried
- Example: given the tables below, the MAP of (X, Y) is (0, 0), but the MAP of X alone is 1

  X  Y  P(X,Y)
  0  0  0.35
  0  1  0.05
  1  0  0.30
  1  1  0.30

  X  P(X)
  0  0.4
  1  0.6
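This cautionary example is easy to verify in code, using exactly the tables above:

```python
# Joint distribution P(X, Y) exactly as in the table above.
pXY = {(0, 0): 0.35, (0, 1): 0.05, (1, 0): 0.30, (1, 1): 0.30}

map_xy = max(pXY, key=pXY.get)                        # joint MAP over (X, Y)

pX = {x: sum(pXY[(x, y)] for y in (0, 1)) for x in (0, 1)}
map_x = max(pX, key=pX.get)                           # MAP of X alone

print(map_xy)  # (0, 0)
print(map_x)   # 1 -- differs from the X-component of the joint MAP
```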
Complexity of Inference
Computing the a posteriori belief P(X | e) in a GM is NP-hard in general
- Hardness implies we cannot find a general procedure that works efficiently for arbitrary GMs
- For particular families of GMs (e.g. trees), we can have provably efficient procedures
- For some families of GMs (e.g. grids), we need to design efficient approximate inference algorithms
Approaches to inference
Exact inference algorithms
Variable elimination algorithm
Message-passing algorithm (sum-product, belief propagation algorithm)
The junction tree algorithm
Approximate inference algorithms
Sampling methods/Stochastic simulation
Variational algorithms
Marginalization and Elimination
A metabolic pathway: what is the likelihood that protein E is produced?
Query: P(E), in the chain A → B → C → D → E
  P(E) = Σ_d Σ_c Σ_b Σ_a P(a, b, c, d, E)
Using the graphical model factorization, we get
  P(E) = Σ_d Σ_c Σ_b Σ_a P(a) P(b|a) P(c|b) P(d|c) P(E|d)
Naïve summation needs to enumerate over an exponential number of terms

Elimination in Chains
Rearranging the terms and the summations:
  P(E) = Σ_d Σ_c Σ_b Σ_a P(a) P(b|a) P(c|b) P(d|c) P(E|d)
       = Σ_d Σ_c Σ_b P(c|b) P(d|c) P(E|d) Σ_a P(a) P(b|a)
Elimination in Chains (cont.)
Now we can perform the innermost summation efficiently:
  P(E) = Σ_d Σ_c Σ_b P(c|b) P(d|c) P(E|d) Σ_a P(a) P(b|a)
       = Σ_d Σ_c Σ_b P(c|b) P(d|c) P(E|d) p(b)
The innermost summation eliminates one variable (A) from the summation at a local cost: it is equivalent to a matrix-vector multiplication, costing |Val(A)| × |Val(B)| operations, and produces the new term p(b)
Elimination in Chains (cont.)
Rearranging and then summing again, we get
  P(E) = Σ_d Σ_c Σ_b P(c|b) P(d|c) P(E|d) p(b)
       = Σ_d Σ_c P(d|c) P(E|d) Σ_b P(c|b) p(b)
       = Σ_d Σ_c P(d|c) P(E|d) p(c)
Again equivalent to a matrix-vector multiplication, |Val(B)| × |Val(C)| operations, e.g.

  P(C|B):   B=0    B=1
    C=0     0.15   0.35
    C=1     0.85   0.65

  p(B):
    B=0     0.25
    B=1     0.75
Elimination in Chains (cont.)
Eliminate nodes one by one all the way to the end:
  P(E) = Σ_d P(E|d) p(d)
Computational complexity for a chain of length n:
- Each step Ψ(x_i) = Σ_{x_{i-1}} P(x_i | x_{i-1}) p(x_{i-1}) costs O(|Val(X_i)| × |Val(X_{i+1})|) operations, so the total is O(n k²)
- Compare to naïve summation Σ_{x_1} … Σ_{x_{n-1}} P(x_1, …, x_n): O(kⁿ)
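The chain elimination above is literally a sequence of vector-matrix products. A sketch in Python (the chain length, state count, and randomly generated CPTs are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(k):
    """A hypothetical k x k CPT: row a gives P(next = b | current = a)."""
    t = rng.random((k, k))
    return t / t.sum(axis=1, keepdims=True)

k, n = 3, 5                       # chain X1 -> ... -> Xn, k states each
p1 = np.full(k, 1.0 / k)          # prior on X1
cpts = [random_cpt(k) for _ in range(n - 1)]

# Variable elimination on a chain: each step is one vector-matrix product,
# costing O(k^2); the whole marginal P(Xn) costs O(n k^2),
# versus O(k^n) for naive enumeration.
p = p1
for cpt in cpts:
    p = p @ cpt                   # p(b) = sum_a p(a) P(b | a)

print(p)                          # marginal P(Xn); sums to 1
```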
Undirected Chains
For an undirected chain A – B – C – D – E, rearrange terms and perform local summation in the same way:
  P(E) = (1/Z) Σ_d Σ_c Σ_b Σ_a Ψ(a,b) Ψ(b,c) Ψ(c,d) Ψ(E,d)
       = (1/Z) Σ_d Σ_c Σ_b Ψ(b,c) Ψ(c,d) Ψ(E,d) Σ_a Ψ(a,b)
       = (1/Z) Σ_d Σ_c Σ_b Ψ(b,c) Ψ(c,d) Ψ(E,d) Ψ(b)
The Sum-Product Operation
During inference, we try to compute an expression in sum-product form: Σ_Z ∏_{Ψ∈F} Ψ
- X = {X1, …, Xn} is the set of variables
- F is a set of factors such that for each Ψ ∈ F, Scope(Ψ) ⊆ X
- Y ⊆ X is the set of query variables
- Z = X − Y is the set of variables to eliminate
The result of eliminating the variables in Z is a factor
  τ(Y) = Σ_Z ∏_{Ψ∈F} Ψ
This factor does not necessarily correspond to any probability or conditional probability in the network; to recover a distribution, renormalize:
  P(Y) = τ(Y) / Σ_y τ(y)
Inference via Variable Elimination
General idea:
- Write the query in the form
    P(X1, e) = Σ_{x_n} … Σ_{x_3} Σ_{x_2} ∏_i P(x_i | Pa_i)
  (the sums are ordered to suggest an elimination order)
- Then iteratively:
  - move all irrelevant terms outside of the innermost sum
  - perform the innermost sum, getting a new term
  - insert the new term into the product
- Finally renormalize:
    P(X1 | e) = P(X1, e) / Σ_{x_1} P(x_1, e)
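The iteration above can be sketched as a small generic variable-elimination routine. Everything below (the factor representation and the example chain network) is illustrative, not from the slides:

```python
import numpy as np
from functools import reduce

class Factor:
    """A table over a tuple of named discrete variables."""
    def __init__(self, vars, table):
        self.vars = tuple(vars)
        self.table = np.asarray(table, dtype=float)

def _align(f, vars):
    """Reshape f.table so its axes follow `vars`, with size-1 axes for absent variables."""
    order = [f.vars.index(v) for v in vars if v in f.vars]
    table = np.transpose(f.table, order)
    shape = [f.table.shape[f.vars.index(v)] if v in f.vars else 1 for v in vars]
    return table.reshape(shape)

def multiply(f, g):
    """Factor product: pointwise multiplication over the union of the scopes."""
    vars = f.vars + tuple(v for v in g.vars if v not in f.vars)
    return Factor(vars, _align(f, vars) * _align(g, vars))

def eliminate(factors, var):
    """One elimination step: multiply the factors touching `var`, then sum it out."""
    touching = [f for f in factors if var in f.vars]
    rest = [f for f in factors if var not in f.vars]
    prod = reduce(multiply, touching)
    axis = prod.vars.index(var)
    summed = Factor(prod.vars[:axis] + prod.vars[axis + 1:], prod.table.sum(axis=axis))
    return rest + [summed]

# Hypothetical chain BN A -> B -> C (numbers made up): compute P(C).
factors = [
    Factor(('A',), [0.6, 0.4]),
    Factor(('A', 'B'), [[0.7, 0.3], [0.2, 0.8]]),   # P(B|A)
    Factor(('B', 'C'), [[0.9, 0.1], [0.4, 0.6]]),   # P(C|B)
]
for var in ('A', 'B'):                               # elimination order
    factors = eliminate(factors, var)
pC = reduce(multiply, factors)
print(pC.vars, pC.table)    # ('C',) [0.65 0.35]
```

The same routine runs unchanged on any discrete network: only the factor list and the elimination order change.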
A more complex network
A food web: what is the probability P(A | H = h) that hawks are leaving, given that the grass condition is poor?
[Figure: DAG with edges B → C, A → D, C → E, D → E, A → F, E → G, E → H, F → H]
Example: Variable Elimination
Query: P(A | h); need to eliminate B, C, D, E, F, G, H
Initial factors:
  P(a) P(b) P(c|b) P(d|a) P(e|c,d) P(f|a) P(g|e) P(h|e,f)
Choose an elimination order: H, G, F, E, D, C, B
[Figure: the graph shrinks as nodes are eliminated, step by step below]
Step 1: Eliminate H by conditioning (fix the evidence node to its observed value):
  m_h(e, f) = P(H = h | e, f)
  ⇒ P(a) P(b) P(c|b) P(d|a) P(e|c,d) P(f|a) P(g|e) m_h(e, f)
Step 2: Eliminate G:
  m_g(e) = Σ_g P(g|e) = 1
  ⇒ P(a) P(b) P(c|b) P(d|a) P(e|c,d) P(f|a) m_h(e, f)
Step 3: Eliminate F:
  m_f(a, e) = Σ_f P(f|a) m_h(e, f)
  ⇒ P(a) P(b) P(c|b) P(d|a) P(e|c,d) m_f(a, e)
Step 4: Eliminate E:
  m_e(a, c, d) = Σ_e P(e|c,d) m_f(a, e)
  ⇒ P(a) P(b) P(c|b) P(d|a) m_e(a, c, d)
Step 5: Eliminate D:
  m_d(a, c) = Σ_d P(d|a) m_e(a, c, d)
  ⇒ P(a) P(b) P(c|b) m_d(a, c)
Step 6: Eliminate C:
  m_c(a, b) = Σ_c P(c|b) m_d(a, c)
  ⇒ P(a) P(b) m_c(a, b)
Step 7: Eliminate B:
  m_b(a) = Σ_b P(b) m_c(a, b)
  ⇒ P(a) m_b(a)
Step 8: Renormalize over A:
  P(A, h) = P(A) m_b(A); compute P(h) = Σ_a P(a) m_b(a)
  ⇒ P(A | h) = P(A) m_b(A) / Σ_a P(a) m_b(a)
Complexity of variable elimination
Suppose in one elimination step we compute
  m_x(y1, …, yk) = Σ_x m'_x(x, y1, …, yk)
  m'_x(x, y1, …, yk) = ∏_{i=1}^{k} m_i(x, y_{c_i})
This requires
- k × |Val(X)| × ∏_i |Val(Y_{c_i})| multiplications
  (for each value of x, y1, …, yk, we do k multiplications)
- |Val(X)| × ∏_i |Val(Y_{c_i})| additions
  (for each value of y1, …, yk, we do |Val(X)| additions)
The complexity is exponential in the number of variables in the intermediate factor
[Figure: node x connected to y1, …, yi, …, yk]
Inference in Graphical Models
General form of the inference problem:
- The distribution factorizes as P(X1, …, Xn) ∝ ∏_i Ψ(D_i)
- Want to query a set of variables Y given evidence e, and "don't care" about a set of variables Z
- Compute P(Y, e) = Σ_z ∏_i Ψ(D_i) using variable elimination
- Renormalize to obtain the conditional P(Y | e) = P(Y, e) / Σ_y P(y, e)
Two examples follow, using the graph structure to order the computation: a chain A – B – C – D – E, and the food web DAG from before
From Variable Elimination to Message Passing
Recall that the dependency induced during marginalization is captured in elimination cliques
- Summation ↔ elimination
- Intermediate term ↔ elimination clique
Can this lead to a generic inference algorithm?
There is a nice localization in the computation:
  P(E) = Σ_d Σ_c Σ_b Σ_a P(a) P(b|a) P(c|b) P(d|c) P(E|d)
  P(E) = Σ_d P(E|d) Σ_c P(d|c) Σ_b P(c|b) Σ_a P(b|a) P(a)
Chain: Query E
Start elimination away from the query variable:
  P(E) = Σ_d P(E|d) Σ_c P(d|c) Σ_b P(c|b) Σ_a P(b|a) P(a)
Each local summation produces a message passed toward E along the chain:
  m_{AB}(B), m_{BC}(C), m_{CD}(D), and finally P(E) = m_{DE}(E)

Chain: Query C
  P(C) = Σ_e Σ_d Σ_b Σ_a P(a) P(b|a) P(C|b) P(d|C) P(e|d)
       = (Σ_d P(d|C) (Σ_e P(e|d))) × (Σ_b P(C|b) (Σ_a P(b|a) P(a)))
Messages now flow from both ends of the chain toward C:
  m_{AB}(B), m_{BC}(C) from the left; m_{ED}(D), m_{DC}(C) from the right; P(C) = m_{DC}(C) m_{BC}(C)
Chain: What if I want to query everybody?
Query P(A), P(B), P(C), P(D), P(E), e.g.
  P(B) = (Σ_c P(c|B) (Σ_d P(d|c) (Σ_e P(e|d)))) × (Σ_a P(B|a) P(a))
using messages m_{AB}(B), m_{CB}(B), m_{DC}(C), m_{ED}(D)
Computational cost:
- Each message costs O(K²)
- The chain length is L
- The cost for each query is about O(L K²)
- For L queries, the cost is about O(L² K²)
What is shared in these queries?
  P(B) = (Σ_c P(c|B) (Σ_d P(d|c) (Σ_e P(e|d)))) × (Σ_a P(B|a) P(a))
  P(E) = Σ_d P(E|d) Σ_c P(d|c) Σ_b P(c|b) Σ_a P(b|a) P(a)
  P(C) = (Σ_d P(d|C) (Σ_e P(e|d))) × (Σ_b P(C|b) (Σ_a P(b|a) P(a)))
Across all queries, the number of unique messages is only 2(L − 1): one in each direction per edge

Forward-backward algorithm
- Compute and cache the 2(L − 1) unique messages
  - Forward pass: m_{AB}(B), m_{BC}(C), m_{CD}(D), m_{DE}(E)
  - Backward pass: m_{BA}(A), m_{CB}(B), m_{DC}(C), m_{ED}(D)
- At query time, just multiply together the cached messages from the neighbors, e.g.
    P(D) = m_{CD}(D) m_{ED}(D)
For all queries together, the cost is O(2 L K²)
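The forward-backward idea can be sketched as follows, for an undirected chain with made-up pairwise potentials: the two passes cache all 2(L − 1) messages, after which any marginal is just a product of two cached vectors.

```python
import numpy as np

# Hypothetical pairwise potentials on an undirected chain X0 - X1 - ... - X4,
# each variable with k states. psis[i] couples X_i and X_{i+1}.
k, L = 3, 5
rng = np.random.default_rng(1)
psis = [rng.random((k, k)) + 0.1 for _ in range(L - 1)]

# Forward pass: fwd[i] is the message arriving at node i from the left.
fwd = [np.ones(k)]
for psi in psis:
    fwd.append(fwd[-1] @ psi)          # m(x') = sum_x m(x) psi(x, x')

# Backward pass: bwd[i] is the message arriving at node i from the right.
bwd = [np.ones(k)]
for psi in reversed(psis):
    bwd.append(psi @ bwd[-1])
bwd.reverse()

# Any marginal is the product of the two cached messages, renormalized.
def marginal(i):
    p = fwd[i] * bwd[i]
    return p / p.sum()

print(marginal(2))   # P(X2), computed without any fresh summations
```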
DAG: Variable elimination
Elimination order H, G, F, E, B, C, D:
  P(A) = P(A) Σ_d P(d|A) Σ_c (Σ_b P(b) P(c|b)) Σ_e P(e|c,d) (Σ_g P(g|e)) (Σ_f P(f|A) P(h|e,f))
Intermediate messages:
  m_h(e, f), m_g(e), m_f(A, e), m_e(A, c, d), m_b(c), m_c(A, d), m_d(A)
4-way tables are created!
DAG: Cliques of size 4 are generated
[Figure: the graph after each elimination step under the order H, G, F, E, B, C, D, annotated with the messages m_h(e,f), m_g(e), m_f(A,e), m_e(A,c,d), m_b(c), m_c(A,d), m_d(A); eliminating E involves a 4-way table over A, C, D, E]
DAG: A different elimination order
Elimination order G, H, F, B, C, D, E:
  P(A) = P(A) Σ_e (Σ_f P(f|A) P(h|e,f) (Σ_g P(g|e))) (Σ_d P(d|A) Σ_c (Σ_b P(b) P(c|b)) P(e|c,d))
Intermediate messages:
  m_g(e), m_h(e, f), m_f(A, e), m_b(c), m_c(d, e), m_d(A, e), m_e(A)
NO 4-way tables are created!
DAG: No cliques of size 4
[Figure: the graph after each elimination step under the order G, H, F, B, C, D, E, annotated with the messages m_g(e), m_h(e,f), m_f(A,e), m_b(c), m_c(d,e), m_d(A,e), m_e(A); no intermediate factor involves more than three variables]
Any thoughts?
Chains have nice properties:
- The forward-backward algorithm works
- Intermediate results (messages) live along the edges
Can we generalize to other graphs (trees, loopy graphs)?
- How about undirected trees? Is there a forward-backward algorithm?
- Loopy graphs are more complicated: different elimination orders result in different computational costs
- Can we somehow make loopy graphs behave like trees?
Tree Graphical Models
- Undirected tree: a unique path between any pair of nodes
- Directed tree: all nodes except the root have exactly one parent
Equivalence of directed and undirected trees:
- Any undirected tree can be converted to a directed tree by choosing a root node and directing all edges away from it
- A directed tree and the corresponding undirected tree make the same conditional independence assertions
- The parameterizations are essentially the same:
  - Undirected tree: P(x) = (1/Z) ∏_{i∈V} Ψ(x_i) ∏_{(i,j)∈E} Ψ(x_i, x_j)
  - Directed tree: P(x) = P(x_r) ∏_{(i,j)∈E} P(x_j | x_i)
  - Equivalence: Ψ(x_r) = P(x_r), Ψ(x_i, x_j) = P(x_j | x_i), Z = 1, and Ψ(x_i) = 1 for all other nodes
Message passing on trees
Messages are passed along the tree edges. For the five-node example tree in the figure, the joint distribution is
  P(x_e, x_f, x_g, x_h, x_j) ∝ Ψ(x_e) Ψ(x_f) Ψ(x_g) Ψ(x_h) Ψ(x_j) · Ψ(x_e, x_f) Ψ(x_f, x_g) Ψ(x_e, x_h) Ψ(x_e, x_j)
Pushing each sum as far inward as possible, every parenthesized local summation is a message m_{ji}(x_i) sent along the edge from j to i, e.g. for the query node f:
  P(x_f) ∝ Ψ(x_f) (Σ_{x_g} Ψ(x_g) Ψ(x_f, x_g)) (Σ_{x_e} Ψ(x_e) Ψ(x_e, x_f) (Σ_{x_h} Ψ(x_h) Ψ(x_e, x_h)) (Σ_{x_j} Ψ(x_j) Ψ(x_e, x_j)))
[Figure: the tree, with the messages flowing toward f]
Sharing messages on trees
[Figure: the same tree, shown twice, with the messages needed to query f and to query j]
Querying f and querying j reuse most of the same messages; only the messages along the path between the two query nodes change direction
Computational cost for all queries
Query P(x_e), P(x_f), P(x_g), P(x_h), P(x_j)
Doing things separately:
- Each message costs O(K²)
- The number of edges is L
- The cost for each query is about O(L K²)
- For L queries, the cost is about O(L² K²)
Forward-backward algorithm in trees
- Forward: pick one leaf as the root, compute all messages toward it, and cache them
- Backward: pick another root, compute all messages, and cache them
- E.g., querying j can then reuse the cached messages
[Figure: the two passes on the example tree, with the reused messages highlighted]
Computational saving for trees
Compute the forward and backward messages for each edge, and save them:
- Each message costs O(K²)
- The number of edges is L, so there are 2L unique messages
- The cost for all queries is about O(2 L K²)
Message passing algorithm
  m_{ji}(x_i) = Σ_{x_j} Ψ(x_j) Ψ(x_i, x_j) ∏_{k ∈ N(j)\i} m_{kj}(x_j)
- Take the product of the incoming messages, multiply by the local potentials, then sum out x_j
- Node x_j can send its message to x_i once the incoming messages from all of its other neighbors N(j)\i have arrived
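The update rule can be implemented directly by recursion on the tree. The tree structure below reuses the earlier example's labels, and all potentials are made up for illustration:

```python
import numpy as np

k = 2
# A small tree (structure assumed for illustration): f - e, f - g, e - h, e - j.
edges = [('e', 'f'), ('f', 'g'), ('e', 'h'), ('e', 'j')]
nbrs = {}
for a, b in edges:
    nbrs.setdefault(a, []).append(b)
    nbrs.setdefault(b, []).append(a)

rng = np.random.default_rng(2)
node_pot = {v: rng.random(k) + 0.1 for v in nbrs}
edge_pot = {e: rng.random((k, k)) + 0.1 for e in edges}

def psi(a, b):
    """Pairwise potential, oriented as a (|Val(a)|, |Val(b)|) table."""
    return edge_pot[(a, b)] if (a, b) in edge_pot else edge_pot[(b, a)].T

def message(j, i):
    """m_{ji}(x_i) = sum_{x_j} psi(x_j) psi(x_i, x_j) prod_{k in N(j)\\i} m_{kj}(x_j)."""
    prod = node_pot[j].copy()
    for n in nbrs[j]:
        if n != i:
            prod = prod * message(n, j)   # recurse into the subtree behind n
    return psi(j, i).T @ prod             # sum out x_j

def marginal(i):
    """P(x_i): local potential times all incoming messages, renormalized."""
    p = node_pot[i].copy()
    for n in nbrs[i]:
        p = p * message(n, i)
    return p / p.sum()

print(marginal('f'))
```

The recursion terminates because a tree has no cycles; caching the 2L unique messages instead of recomputing them gives exactly the forward-backward saving described above.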
From Variable Elimination to Message Passing
Recall the variable elimination algorithm:
- Choose an ordering in which the query node f is the final node
- Eliminate node i by removing all potentials containing i, taking the sum/product over x_i
- Place the resultant factor back into the pool
For a tree graphical model:
- Choose the query node f as the root of the tree
- View the tree as a directed tree with edges pointing towards f
- Elimination of each node can then be viewed as message passing directly along the tree branches, rather than on some transformed graph
- Thus, we can use the tree itself as a data structure for inference
How about general graphs?
Trees are nice:
- We can just compute two messages for each edge
- The graph orders the computation
- Intermediate results are associated with edges
General graphs are not so clear:
- Different elimination orders generate different cliques and factor sizes
- Computation and intermediate results are not associated with edges
- The local-computation view is not so clear
Can we make general graphs tree-like, or treat them as trees?
Message passing for loopy graphs
Local message passing for trees guarantees the consistency of local marginals:
- the computed P(x_i) is the correct one
- the computed P(x_i, x_j) is the correct one
- …
For loopy graphs, there are no consistency guarantees for local message passing

Loopy belief propagation
Inference for loopy graphical models is NP-hard in general; loopy belief propagation instead treats loopy graphs locally as if they were trees and iteratively estimates the marginals:
- Read in the incoming messages
- Process the messages
- Send updated outgoing messages
- Repeat for all variables until convergence
Message update schedule
Synchronous update:
- x_j can send a message once the incoming messages from all of N(j)\i have arrived
- Slow
- Provably correct for trees; may converge for loopy graphs
Asynchronous update:
- x_j can send a message whenever any of its incoming messages from N(j)\i changes
- Fast
- Convergence is not easy to prove, but empirically it often works
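A minimal synchronous loopy BP sketch, on a made-up graph containing one cycle (A – B – C – A, plus a dangling node D). The potentials are weakly coupled so the iteration converges; on loopy graphs, the resulting beliefs are only approximations of the true marginals.

```python
import numpy as np

k = 2
edges = [('A', 'B'), ('B', 'C'), ('C', 'A'), ('A', 'D')]
nbrs = {}
for a, b in edges:
    nbrs.setdefault(a, []).append(b)
    nbrs.setdefault(b, []).append(a)

rng = np.random.default_rng(3)
node_pot = {v: 1.0 + 0.2 * rng.random(k) for v in nbrs}
edge_pot = {e: 1.0 + 0.2 * rng.random((k, k)) for e in edges}

def psi(a, b):
    return edge_pot[(a, b)] if (a, b) in edge_pot else edge_pot[(b, a)].T

# One message per directed edge, initialized uniformly.
directed = [(a, b) for a, b in edges] + [(b, a) for a, b in edges]
msgs = {e: np.ones(k) / k for e in directed}

# Synchronous schedule: every message is recomputed from the previous iteration.
delta = 1.0
for _ in range(200):
    new = {}
    for (j, i) in msgs:
        prod = node_pot[j].copy()
        for n in nbrs[j]:
            if n != i:
                prod = prod * msgs[(n, j)]
        m = psi(j, i).T @ prod             # sum out x_j
        new[(j, i)] = m / m.sum()          # normalize for numerical stability
    delta = max(np.abs(new[e] - msgs[e]).max() for e in msgs)
    msgs = new
    if delta < 1e-12:                      # converged
        break

def belief(i):
    """Approximate marginal: local potential times all incoming messages."""
    b = node_pot[i].copy()
    for n in nbrs[i]:
        b = b * msgs[(n, i)]
    return b / b.sum()

print(belief('A'))   # approximate P(x_A); not exact on loopy graphs
```

Replacing the synchronous sweep with updates triggered by changed incoming messages gives the asynchronous schedule described above.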