Page 1:

Loopy Belief Propagation
Generalized Belief Propagation
Unifying Variational and GBP
Learning Parameters of MNs

Graphical Models – 10708

Carlos Guestrin

Carnegie Mellon University

November 10th, 2006

Readings: K&F 11.3, 11.5; Yedidia et al. paper from the class website; Jordan, Chapter 9

Page 2:

More details on Loopy BP

Numerical problem: messages < 1 get multiplied together as we go around the loops, so the numbers can underflow to zero. Fix: normalize each message to sum to one:

$$m_{i\to j}(X_j) \;=\; \frac{1}{Z_{i\to j}} \sum_{x_i} \phi_i(x_i)\,\phi_{ij}(x_i, X_j) \prod_{k \in N(i)\setminus\{j\}} m_{k\to i}(x_i)$$

$Z_{i\to j}$ doesn't depend on $X_j$, so normalizing doesn't change the answer.

Computing node "beliefs" (estimates of the marginal probabilities):

$$b_i(X_i) \;\propto\; \phi_i(X_i) \prod_{k \in N(i)} m_{k\to i}(X_i)$$

(a code sketch follows the figure)

[Figure: the student network (Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Job, Happy)]
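A minimal Python sketch of the normalized message update and belief computation for a pairwise MN; the data structures (`node_pot`, `edge_pot`, `messages`, `neighbors`) are assumptions for this illustration, not notation from the lecture.

```python
import numpy as np

def update_message(i, j, node_pot, edge_pot, messages, neighbors):
    """One normalized loopy-BP message from node i to node j.

    node_pot[i]     : array over the values of X_i
    edge_pot[(i,j)] : array with rows indexing X_i, columns indexing X_j
    messages[(k,i)] : current message from k to i (array over X_i)
    neighbors[i]    : list of neighbors of node i
    """
    # product of the node potential and all incoming messages except j's
    prod = node_pot[i].copy()
    for k in neighbors[i]:
        if k != j:
            prod *= messages[(k, i)]
    # sum out X_i against the edge potential, leaving a function of X_j
    msg = edge_pot[(i, j)].T @ prod
    # normalize: Z_{i->j} does not depend on X_j, so beliefs are unchanged
    return msg / msg.sum()

def belief(i, node_pot, messages, neighbors):
    """Estimated marginal ("belief") at node i."""
    b = node_pot[i].copy()
    for k in neighbors[i]:
        b *= messages[(k, i)]
    return b / b.sum()
```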

Page 3:

An example of running loopy BP

Page 4:

(Non-)Convergence of Loopy BP

Loopy BP can oscillate!!! The oscillations can be small, or they can be really bad!

Typically, if the factors are closer to uniform, loopy BP does well (converges); if the factors are closer to deterministic, loopy BP doesn't behave well.

One approach to help: damping messages. The new message is a weighted average of the old message and the newly computed one:

$$m^{(t+1)}_{i\to j} \;=\; \lambda\, \tilde m^{(t+1)}_{i\to j} \;+\; (1-\lambda)\, m^{(t)}_{i\to j}$$

Damping often gives better convergence, but when damping is required to get convergence, the result is often bad (a one-line sketch follows the figure).

[Convergence and oscillation plots from Murphy et al. '99]
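Damping is a one-line change to the update above; a sketch, where `lam` is an illustrative damping weight (`lam = 1` recovers the undamped update):

```python
def damped_update(old_msg, new_msg, lam=0.5):
    """Damped message: a convex combination of the freshly computed
    message and the previous one, renormalized to sum to one."""
    msg = lam * new_msg + (1.0 - lam) * old_msg
    return msg / msg.sum()
```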

Page 5:

Loopy BP in Factor graphs

What if we don’t have pairwise Markov nets?

1. Transform to a pairwise MN

2. Use Loopy BP on a factor graph

Message example, for instance from node A to factor ABC (A's only other factor is ABD):
$$m_{A\to ABC}(A) \;=\; m_{ABD\to A}(A)$$
and from factor ABC to node A:
$$m_{ABC\to A}(A) \;=\; \sum_{b,c} f_{ABC}(A,b,c)\, m_{B\to ABC}(b)\, m_{C\to ABC}(c)$$

[Figure: factor graph with variable nodes A, B, C, D, E and factor nodes ABC, ABD, BDE, CDE]

Page 6:

Loopy BP in Factor graphs

From node $i$ to factor $j$, where $F(i)$ = the factors whose scope includes $X_i$:
$$m_{i\to j}(X_i) \;=\; \prod_{k \in F(i)\setminus\{j\}} m_{k\to i}(X_i)$$

From factor $j$ to node $i$, where $\mathrm{Scope}[j] = \mathbf{Y} \cup \{X_i\}$:
$$m_{j\to i}(X_i) \;=\; \sum_{\mathbf{Y}} f_j(\mathbf{Y}, X_i) \prod_{k:\,X_k \in \mathbf{Y}} m_{k\to j}(x_k)$$

(a code sketch follows the figure)

[Figure: the same factor graph: variables A, B, C, D, E; factors ABC, ABD, BDE, CDE]
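A Python sketch of both message types on a factor graph; `factors_of`, `card`, the `scope` lists, and the message dictionary are illustrative assumptions.

```python
import numpy as np

def msg_node_to_factor(i, j, messages, factors_of, card):
    """m_{i->j}(X_i): product of messages into X_i from all factors
    in F(i) except factor j."""
    msg = np.ones(card[i])
    for k in factors_of[i]:          # F(i): factors whose scope includes X_i
        if k != j:
            msg *= messages[(k, i)]
    return msg / msg.sum()

def msg_factor_to_node(j, i, factor, scope, messages):
    """m_{j->i}(X_i): multiply incoming node messages into f_j, then
    sum out Y = Scope[j] without X_i.  `factor` is an ndarray whose
    axes follow the variable order in `scope`."""
    table = factor.copy()
    for axis, k in enumerate(scope):
        if k != i:
            # broadcast the message m_{k->j}(X_k) along the matching axis
            shape = [1] * table.ndim
            shape[axis] = table.shape[axis]
            table = table * messages[(k, j)].reshape(shape)
    keep = scope.index(i)
    msg = table.sum(axis=tuple(a for a in range(table.ndim) if a != keep))
    return msg / msg.sum()
```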

Page 7:

What you need to know about loopy BP

Application of belief propagation in loopy graphs

Doesn't always converge; damping can help, and good message schedules can help (see book)

If it converges, it is often to incorrect, but still useful, results

Generalizes from pairwise Markov networks by using factor graphs

Page 8:

Announcements

Monday's special recitation: Pradeep Ravikumar on exciting new approximate inference algorithms

Page 9:

Loopy BP v. Clique trees: Two ends of a spectrum

[Figure: the student network beside its clique tree, with cliques CD, DIG, GSI, GJSL, HGJ]

Page 10:

Generalize cluster graph

Generalized cluster graph: for a set of factors $F$, an undirected graph where each node $i$ is associated with a cluster $C_i$.

Family preserving: for each factor $f_j \in F$, there exists a node $i$ such that $\mathrm{Scope}[f_j] \subseteq C_i$.

Each edge $i$–$j$ is associated with a set of variables $S_{ij} \subseteq C_i \cap C_j$.

Page 11:

Running intersection property

(Generalized) running intersection property (RIP): a cluster graph satisfies RIP if, whenever $X \in C_i$ and $X \in C_j$, there exists one and only one path from $C_i$ to $C_j$ such that $X \in S_{uv}$ for every edge $(u,v)$ on the path.

Page 12:

Examples of cluster graphs

Page 13:

Two cluster graphs satisfying RIP, with different edge sets

Page 14:

Generalized BP on cluster graphs satisfying RIP

Initialization: assign each factor $\phi$ to a cluster $\alpha(\phi)$ with $\mathrm{Scope}[\phi] \subseteq C_{\alpha(\phi)}$.

Initialize clusters:
$$\pi_i^{(0)}(C_i) \;\propto\; \prod_{\phi:\,\alpha(\phi)=i} \phi$$

Initialize messages:
$$\delta_{i\to j}(S_{ij}) \;=\; 1$$

While not converged, send messages:
$$\delta_{i\to j}(S_{ij}) \;\propto\; \sum_{C_i \setminus S_{ij}} \pi_i^{(0)} \prod_{k \in N(i)\setminus\{j\}} \delta_{k\to i}$$

Belief:
$$\beta_i(C_i) \;\propto\; \pi_i^{(0)} \prod_{k \in N(i)} \delta_{k\to i}$$

(a code sketch follows)
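A compact Python sketch of one GBP message and the cluster belief, using `np.einsum` to multiply incoming messages into the cluster potential and marginalize onto the sepset; `pi0`, `scopes`, `sepsets`, and the message dictionary are illustrative data structures, not from the slides.

```python
import numpy as np
from string import ascii_letters

def gbp_message(i, j, pi0, scopes, sepsets, messages, neighbors):
    """delta_{i->j}(S_ij): marginalize cluster i's initial potential,
    times all incoming messages except delta_{j->i}, onto S_ij.

    pi0[i]          : ndarray over scopes[i] (product of assigned factors)
    sepsets[(i,j)]  : ordered list of the variables in S_ij
    messages[(k,i)] : ndarray whose axes follow sepsets[(k,i)]
    """
    # assign an einsum letter to every variable in cluster i's scope
    letter = {v: ascii_letters[n] for n, v in enumerate(scopes[i])}
    operands = [pi0[i]]
    subs = [''.join(letter[v] for v in scopes[i])]
    for k in neighbors[i]:
        if k != j:
            operands.append(messages[(k, i)])
            subs.append(''.join(letter[v] for v in sepsets[(k, i)]))
    out = ''.join(letter[v] for v in sepsets[(i, j)])
    # einsum multiplies everything and sums out variables not in `out`
    msg = np.einsum(','.join(subs) + '->' + out, *operands)
    return msg / msg.sum()

def gbp_belief(i, pi0, scopes, sepsets, messages, neighbors):
    """beta_i: cluster potential times all incoming messages."""
    letter = {v: ascii_letters[n] for n, v in enumerate(scopes[i])}
    operands = [pi0[i]]
    subs = [''.join(letter[v] for v in scopes[i])]
    for k in neighbors[i]:
        operands.append(messages[(k, i)])
        subs.append(''.join(letter[v] for v in sepsets[(k, i)]))
    out = ''.join(letter[v] for v in scopes[i])
    b = np.einsum(','.join(subs) + '->' + out, *operands)
    return b / b.sum()
```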

Page 15:

Cluster graph for Loopy BP

[Figure: the student network and its cluster graph for loopy BP]

Page 16:

What if the cluster graph doesn't satisfy RIP?

Page 17:

Region graphs to the rescue

Generalized cluster graphs that don't satisfy RIP can be handled using region graphs: Yedidia et al., from the class website.

Example in your homework!

Hint – from Yedidia et al.:
Section 7 – defines region graphs
Section 9 – message passing on region graphs
Section 10 – an example that will help you a lot!!!

Page 18:

Revisiting Mean-Fields

Choice of Q: fully factorized (mean field),
$$Q(\mathbf{X}) \;=\; \prod_i Q_i(X_i)$$
Optimization problem: maximize the energy functional over this family,
$$\max_{Q}\; F[\tilde P, Q] \;=\; \mathbb{E}_Q\big[\log \tilde P(\mathbf{X})\big] + H_Q(\mathbf{X})$$
equivalently, minimize $KL(Q\,\|\,P)$.

Page 19:

Interpretation of energy functional

Energy functional:
$$F[\tilde P, Q] \;=\; \mathbb{E}_Q\big[\log \tilde P(\mathbf{X})\big] + H_Q(\mathbf{X}) \;=\; \sum_{\phi} \mathbb{E}_Q[\log \phi] + H_Q(\mathbf{X})$$

Exact if $P = Q$:
$$F[\tilde P, P] \;=\; \log Z$$

View the problem as an approximation of the entropy term $H_Q(\mathbf{X})$.

Page 20:

Entropy of a tree distribution

Entropy term $H_Q(\mathbf{X})$: write the joint distribution of a tree in terms of its edge and node marginals; the entropy then decomposes over edges and nodes. More generally, $d_i$ = the number of neighbors of $X_i$ (see the block below).
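The decomposition reconstructed in LaTeX from the standard tree factorization:

```latex
% Tree-structured joint written in terms of edge and node marginals
% (d_i = number of neighbors of X_i):
P(\mathbf{X}) \;=\; \frac{\prod_{(i,j)\in T}\pi_{ij}(X_i,X_j)}
                         {\prod_i \pi_i(X_i)^{\,d_i-1}}
% Taking -E[log P] term by term gives the entropy decomposition:
H(\mathbf{X}) \;=\; \sum_{(i,j)\in T} H(X_i,X_j) \;-\; \sum_i (d_i-1)\,H(X_i)
```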

[Figure: the student network]

Page 21:

Loopy BP & Bethe approximation

Energy functional:
$$F[\tilde P, Q] \;=\; \sum_{\phi} \mathbb{E}_Q[\log \phi] \;+\; H_Q(\mathbf{X})$$

Bethe approximation of the free energy: use the tree entropy decomposition, but on loopy graphs:
$$H_{\text{Bethe}} \;=\; \sum_{(i,j)} H(\pi_{ij}) \;-\; \sum_i (d_i - 1)\, H(\pi_i)$$

Theorem: if loopy BP converges, the resulting $\pi_{ij}$ and $\pi_i$ are a stationary point (usually a local maximum) of the Bethe free energy!

[Figure: the student network]

Page 22:

GBP & Kikuchi approximation

Exact free energy: junction tree (exact entropy).

Bethe free energy: loopy BP's pairwise clusters, tree-style entropy.

Kikuchi approximation: a generalized cluster graph gives a spectrum from Bethe to exact, with entropy terms weighted by counting numbers (see Yedidia et al. and the block below).
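The region-weighted entropy from Yedidia et al., with counting numbers $c_r$; the Bethe approximation is the special case whose large regions are the factors and whose small regions are single variables:

```latex
% Region-based (Kikuchi) entropy: region entropies weighted by counting numbers
H_{\text{Kikuchi}} \;=\; \sum_{r\in\mathcal{R}} c_r\,H(\beta_r),
\qquad
c_r \;=\; 1 - \sum_{s\in\mathrm{Ancestors}(r)} c_s
% Bethe special case: factor regions get c = 1, and each variable
% region gets c_i = 1 - d_i, recovering the Bethe entropy above.
```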

Theorem: if GBP converges, the resulting cluster beliefs $\beta_{C_i}$ are a stationary point (usually a local maximum) of the Kikuchi free energy!

[Figure: the student network and a generalized cluster graph with clusters CD, DIG, GSI, GJSL, HGJ]

Page 23:

What you need to know about GBP

Spectrum between Loopy BP & Junction Trees: More computation, but typically better answers

If the cluster graph satisfies RIP, the equations are very simple.

In the general setting the equations are slightly trickier, but not hard.

Relates to variational methods: corresponds to local optima of an approximate version of the energy functional.

Page 24:

Learning Parameters of a BN

Log likelihood decomposes:
$$\log P(\mathcal{D} \mid \theta) \;=\; \sum_m \sum_i \log P\big(x_i[m] \mid \mathbf{pa}_i[m]\big)$$

Learn each CPT independently; use counts:
$$\hat\theta_{x \mid \mathbf{pa}} \;=\; \frac{\mathrm{Count}(x, \mathbf{pa})}{\mathrm{Count}(\mathbf{pa})}$$

(a counting sketch follows the figure)

[Figure: the student network with abbreviated node names C, D, I, G, S, L, J, H]
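A minimal sketch of count-based CPT estimation for one node; the data layout (a list of dicts) and the function name are assumptions for illustration.

```python
from collections import Counter

def learn_cpt(data, child, parents):
    """MLE for one CPT P(child | parents) from fully observed data.

    data : list of dicts mapping variable name -> observed value
    Returns a dict mapping (child_value, parent_assignment) -> probability.
    """
    joint = Counter((m[child], tuple(m[p] for p in parents)) for m in data)
    parent_counts = Counter(tuple(m[p] for p in parents) for m in data)
    # theta_{x|pa} = Count(x, pa) / Count(pa)
    return {(x, pa): n / parent_counts[pa] for (x, pa), n in joint.items()}
```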

Page 25:

Log Likelihood for MN

Log likelihood of the data:
$$\ell(\theta : \mathcal{D}) \;=\; \sum_m \sum_c \log \phi_c\big(\mathbf{x}_c[m]\big) \;-\; M \log Z(\theta)$$

[Figure: the student network]

Page 26:

Log likelihood doesn't decompose for MNs. Log likelihood:
$$\ell(\theta : \mathcal{D}) \;=\; \sum_m \sum_c \log \phi_c\big(\mathbf{x}_c[m]\big) \;-\; M \log Z(\theta)$$

A convex problem: can find the global optimum!!

But the $\log Z(\theta)$ term couples all the parameters, so it doesn't decompose!!

[Figure: the student network]

Page 27:

Derivative of Log Likelihood for MNs

[Figure: the student network]

Page 28:

Derivative of Log Likelihood for MNs

[Figure: the student network]

Derivative (log-linear parameterization with features $f_i$ and weights $\theta_i$):
$$\frac{\partial\, \ell(\theta : \mathcal{D})}{\partial \theta_i} \;=\; \sum_m f_i(\mathbf{x}[m]) \;-\; M\,\mathbb{E}_\theta\big[f_i(\mathbf{X})\big]$$

Setting the derivative to zero: the empirical feature counts must equal the expected counts under the model. Can optimize using gradient ascent (a sketch follows).

Let's look at a simpler solution.
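A sketch of one gradient-ascent step for a log-linear MN; `expected_counts` is an assumed callable standing in for whatever inference routine computes the model's expected feature counts.

```python
import numpy as np

def gradient_step(theta, empirical_counts, expected_counts, lr=0.1):
    """One gradient-ascent step on the MN log-likelihood (log-linear form).

    d l / d theta_i = sum_m f_i(x[m]) - M * E_theta[f_i]

    empirical_counts : per-feature counts over the data (fixed)
    expected_counts  : callable that runs inference under the current
                       theta and returns M * E_theta[f] per feature;
                       this inference call is why MN learning is hard.
    """
    grad = empirical_counts - expected_counts(theta)
    return theta + lr * grad
```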

Page 29:

Iterative Proportional Fitting (IPF)

[Figure: the student network]

Setting the derivative to zero gives a fixed-point equation:
$$\phi_c^{(t+1)}(\mathbf{x}_c) \;=\; \phi_c^{(t)}(\mathbf{x}_c)\,\frac{\hat P(\mathbf{x}_c)}{P^{(t)}(\mathbf{x}_c)}$$

Iterate and converge to the optimal parameters. Each iteration must compute the current model marginals $P^{(t)}(\mathbf{x}_c)$, which requires inference (a sketch follows).
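A sketch of the IPF update for a single clique; all arrays are assumed to range over the clique's joint assignments.

```python
def ipf_update(phi_c, empirical_marginal, model_marginal):
    """One IPF fixed-point update for the potential of clique c:

        phi_c  <-  phi_c * P_hat(x_c) / P_theta(x_c)

    After the update, the model's marginal over clique c matches the
    empirical marginal exactly.  `model_marginal` must be recomputed
    by inference (e.g. a calibrated clique tree) before each update,
    which is the expensive step.
    """
    return phi_c * empirical_marginal / model_marginal
```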

Page 30:

What you need to know about learning MN parameters:

BN parameter learning is easy; MN parameter learning doesn't decompose!

Learning requires inference!

Apply gradient ascent or IPF iterations to obtain the optimal parameters.