
School of Computer Science

© Eric Xing @ CMU, 2005-2009

Probabilistic Graphical Models

Variational Inference II:

Mean Field and Generalized MF

Eric Xing

Lecture 17, November 5, 2009

Reading:

[Title-slide figure: graphical models over nodes X1, X2, X3, X4]


Inference Problems

Compute the likelihood of observed data
Compute the marginal distribution over a particular subset of nodes
Compute the conditional distribution for disjoint subsets A and B
Compute a mode of the density

Methods we have:
Brute force (individual computations independent)
Elimination (sharing intermediate terms)
Message passing: forward-backward, max-product/BP, junction tree (sharing intermediate terms)


Exponential Family GMs

Canonical Parameterization

Effective canonical parameters

Regular family: the set of valid canonical parameters Ω = {θ : A(θ) < ∞} is open

Minimal representation: there does not exist a nonzero vector a such that ⟨a, φ(x)⟩ is a constant in x

Key ingredients: canonical parameters θ, sufficient statistics φ(x), and the log-normalization function A(θ)
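The equations on this slide were lost in extraction; in the standard notation used throughout the lecture, the canonical parameterization of an exponential family GM is (a reconstruction):

```latex
p(x;\theta) = \exp\{\langle \theta, \phi(x)\rangle - A(\theta)\},
\qquad
A(\theta) = \log \int \exp\{\langle \theta, \phi(x)\rangle\}\,\nu(dx)
```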


Mean Parameterization

The mean parameter μ_α associated with a sufficient statistic φ_α is defined as μ_α = E_p[φ_α(X)]

Realizable mean parameter set M:
a convex subset of R^d
the convex hull of the sufficient statistics in the discrete case
a convex polytope when the state space is finite
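In symbols, the realizable mean parameter set described above is (a standard reconstruction of the slide's missing equation):

```latex
\mathcal{M} = \{\mu \in \mathbb{R}^d : \exists\, p \text{ s.t. } \mathbb{E}_p[\phi(X)] = \mu\},
\qquad
\mathcal{M} = \operatorname{conv}\{\phi(x)\} \text{ in the discrete case}
```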


Convex Polytope

Convex hull representation

Half-plane based representation. Minkowski-Weyl theorem:

any polytope can be characterized by a finite collection of linear inequality constraints
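The two representations can be written compactly (a reconstruction of the slide's missing equations):

```latex
\mathcal{M} = \operatorname{conv}\{\phi(x) : x \in \mathcal{X}^m\}
= \{\mu \in \mathbb{R}^d : \langle a_j, \mu\rangle \ge b_j \;\;\forall j \in \mathcal{J}\},
\qquad |\mathcal{J}| < \infty
```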


Conjugate Duality

Duality between MLE and Max-Ent:

For all μ in the interior of M, there is a unique canonical parameter θ(μ) satisfying the moment-matching condition E_{θ(μ)}[φ(X)] = μ

The log-partition function has the variational form (*)

For all θ ∈ Ω, the supremum in (*) is attained uniquely at the mean parameter μ specified by the moment-matching condition μ = E_θ[φ(X)]

Bijection for minimal exponential family
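The variational form (*) referred to above reads, in standard notation (a reconstruction):

```latex
A(\theta) = \sup_{\mu \in \mathcal{M}} \big\{ \langle \theta, \mu \rangle - A^*(\mu) \big\} \quad (*)
```

Here the conjugate dual A*(μ) equals the negative entropy of p_{θ(μ)} for μ in the interior of M, and the gradient map ∇A(θ) = E_θ[φ(X)] gives the bijection between Ω and the interior of M in the minimal case.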


Variational Inference In General

An umbrella term that refers to various mathematical tools for optimization-based formulations of problems, as well as associated techniques for their solution

General idea: Express a quantity of interest as the solution of an optimization problem

The optimization problem can be relaxed in various ways:
approximate the function to be optimized
approximate the set over which the optimization takes place

Goes in parallel with MCMC


Bethe Variational Problem (BVP)

We already have: a convex (polyhedral) outer bound

the Bethe approximate entropy

Combining the two ingredients, we have

a simple structured problem (differentiable & constraint set is a simple polytope)
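Combining the two ingredients, the BVP takes the standard form (a reconstruction of the slide's missing equation; τ denotes pseudo-marginals):

```latex
\max_{\tau \in \mathbb{L}(G)} \big\{ \langle \theta, \tau \rangle + H_{\mathrm{Bethe}}(\tau) \big\},
\qquad
H_{\mathrm{Bethe}}(\tau) = \sum_{s \in V} H_s(\tau_s) - \sum_{(s,t) \in E} I_{st}(\tau_{st})
```

where L(G) is the set of locally consistent pseudo-marginals: normalized singleton marginals whose pairwise marginals marginalize consistently.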

Sum-product (loopy belief propagation) is the solver!


Connection to Sum-Product Alg.

Lagrangian method for BVP:

Sum-product and Bethe variational (Yedidia et al., 2002): for any graph G, any fixed point of the sum-product updates specifies a pair of pseudo-marginals and Lagrange multipliers at which the Lagrangian of the BVP is stationary

For a tree-structured MRF, the solution is unique: the pseudo-marginals correspond to the exact singleton and pairwise marginal distributions of the MRF, and the optimal value of the BVP is equal to the log-partition function


Inexactness of Bethe and Sum-Product

From the Bethe entropy approximation

From the pseudo-marginal outer bound: the marginal polytope is strictly included in the set of locally consistent pseudo-marginals for graphs with cycles

[Figure: example graph on nodes 1, 2, 3, 4 containing a cycle]


Kikuchi Approximation

Recall: the Bethe variational method uses a tree-based (Bethe) approximation to entropy, and a tree-based outer bound on the marginal polytope

Kikuchi method extends these tree-based approximations to more general hyper-trees

Generalized pseudomarginal set

(defined by normalization and marginalization constraints)

Hyper-tree based approximate entropy

Hyper-tree based generalization of BVP
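By analogy with the Bethe case, the hyper-tree based generalization of the BVP plausibly reads as follows (the symbols H_app and L_t stand in for the slide's missing notation):

```latex
\max_{\tau \in \mathbb{L}_t(G)} \big\{ \langle \theta, \tau \rangle + H_{\mathrm{app}}(\tau) \big\}
```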


Summary So Far

Formulate the inference problem as a variational optimization problem

The Bethe and Kikuchi free energies are based on approximations to the negative entropy


Next Step …

We will develop a set of lower-bound methods


Tractable Subgraph

Given a GM with a graph G, a subgraph F is tractable if we can perform exact inference on it

Example: a fully disconnected subgraph, or an embedded tree


Mean Parameterization

For an exponential family GM defined with graph G and sufficient statistics φ, the realizable mean parameter set is M(G; φ)

For a given tractable subgraph F, the subset of mean parameters realizable on F, M(F; φ) ⊆ M(G; φ), is of interest

This gives an inner approximation to M(G; φ)


Optimizing a Lower Bound

Any mean parameter μ ∈ M yields a lower bound on the log-partition function

Moreover, equality holds iff θ and μ are dually coupled, i.e., μ = E_θ[φ(X)]

Proof idea: Jensen's inequality

Optimizing the lower bound over μ recovers the exact mean parameters μ(θ) = E_θ[φ(X)]. This is an inference!
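In symbols, the bound on this slide is the conjugate-duality inequality (a reconstruction):

```latex
A(\theta) \;\ge\; \langle \theta, \mu \rangle - A^*(\mu) \qquad \forall\, \mu \in \mathcal{M}
```

with equality iff μ = E_θ[φ(X)]; taking the supremum over μ ∈ M recovers the variational representation of A(θ).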


Mean Field Methods In General

However, the lower bound can't be explicitly evaluated in general, because the dual function A* typically lacks an explicit form

Mean field methods:
approximate the lower bound
approximate the realizable mean parameter set by a tractable subset M(F)

The MF optimization problem

Still a lower bound? Yes: every μ ∈ M(F) is still a realizable mean parameter, and the dual function can be evaluated exactly on M(F), so the bound remains valid
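The MF optimization problem plausibly reads (a reconstruction; A*_F, our notation, denotes the dual function restricted to the tractable sub-family):

```latex
\max_{\mu \in \mathcal{M}(F)} \big\{ \langle \theta, \mu \rangle - A^*_F(\mu) \big\}
```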


KL-divergence

Kullback-Leibler Divergence

For two exponential family distributions with the same sufficient statistics:


Primal Form

Mixed Form

Dual Form
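The three forms lost their equations in extraction; for distributions p_{θ¹}, p_{θ²} with mean parameters μ¹ = E_{θ¹}[φ(X)], μ² = E_{θ²}[φ(X)], the standard expressions are (a reconstruction):

```latex
\text{Primal: } D(\theta^1 \,\|\, \theta^2) = A(\theta^2) - A(\theta^1) - \langle \mu^1, \theta^2 - \theta^1\rangle
```
```latex
\text{Mixed: } D(\mu^1 \,\|\, \theta^2) = A(\theta^2) + A^*(\mu^1) - \langle \mu^1, \theta^2\rangle
```
```latex
\text{Dual: } D(\mu^1 \,\|\, \mu^2) = A^*(\mu^1) - A^*(\mu^2) - \langle \theta^2, \mu^1 - \mu^2\rangle
```

All three are the same KL-divergence D(p_{θ¹} ∥ p_{θ²}), rewritten as Bregman divergences of A and A*.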


Mean Field and KL-divergence

Optimizing a lower bound

Equivalent to minimizing a KL-divergence

Therefore, we are doing KL-divergence minimization between the tractable approximation and the target distribution
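Concretely, using the mixed form of the KL-divergence and the fact that A(θ) does not depend on μ, the equivalence is (a reconstruction):

```latex
\max_{\mu \in \mathcal{M}(F)} \big\{ \langle \theta, \mu \rangle - A^*(\mu) \big\}
\;\Longleftrightarrow\;
\min_{\mu \in \mathcal{M}(F)} D(\mu \,\|\, \theta),
\quad
D(\mu \,\|\, \theta) = A(\theta) + A^*(\mu) - \langle \mu, \theta \rangle
```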


Naïve Mean Field

Fully factorized variational distribution
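In the usual notation, the fully factorized family on this slide is (a reconstruction):

```latex
q(x) = \prod_{s \in V} q_s(x_s)
```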


Naïve Mean Field for Ising Model

Sufficient statistics and Mean Parameters

Naïve mean field realizable mean parameter subset

Entropy

Optimization Problem
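With x_s ∈ {0,1}, singleton parameters θ_s, couplings θ_st, and mean parameters μ_s = p(x_s = 1), the naïve MF problem for the Ising model plausibly reads (a reconstruction of the slide's missing equations):

```latex
\max_{\mu \in [0,1]^{|V|}} \Big\{ \sum_{s \in V} \theta_s \mu_s
 + \sum_{(s,t) \in E} \theta_{st}\, \mu_s \mu_t
 + \sum_{s \in V} H_b(\mu_s) \Big\},
\quad
H_b(\mu) = -\mu \log \mu - (1-\mu)\log(1-\mu)
```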


Naïve Mean Field for Ising Model

Optimization Problem

Update rule: μ_s ← σ(θ_s + Σ_{t ∈ N(s)} θ_st μ_t), where σ is the logistic function

θ_st μ_t resembles a "message" sent from node t to node s

{θ_st μ_t : t ∈ N(s)} forms the "mean field" applied to node s from its neighborhood
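The coordinate-ascent update above is straightforward to implement. Below is a minimal sketch of naïve mean field for an Ising model with x_s ∈ {0,1} (the function name, iteration cap, and tolerance are our own choices, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def naive_mean_field_ising(theta_s, theta_st, n_iters=200, tol=1e-8):
    """Naive mean field for p(x) proportional to
    exp(sum_s theta_s x_s + sum_{(s,t)} theta_st x_s x_t), x_s in {0,1}.
    theta_s : (n,) singleton parameters
    theta_st: (n,n) symmetric coupling matrix with zero diagonal
    Returns mu, where mu[s] approximates q(x_s = 1)."""
    n = len(theta_s)
    mu = np.full(n, 0.5)              # uninformative initialization
    for _ in range(n_iters):
        mu_old = mu.copy()
        for s in range(n):
            # "mean field" applied to s from its neighborhood:
            # theta_s plus the sum of messages theta_st * mu_t
            mu[s] = sigmoid(theta_s[s] + theta_st[s] @ mu)
        if np.max(np.abs(mu - mu_old)) < tol:
            break                      # converged to a (local) optimum
    return mu
```

With zero couplings the fixed point is exactly σ(θ_s); with nonzero couplings the sweeps converge to a local optimum of the non-convex MF objective, consistent with the non-convexity discussed on the next slide.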


Non-Convexity of Mean Field

Mean field optimization is always non-convex for any exponential family in which the state space is finite

The set of realizable mean parameters is a finite convex hull, and the tractable subset M(F) contains all of its extreme points

If M(F) were a convex set, it would therefore be the entire convex hull, so any proper tractable subset M(F) must be non-convex

Mean field has nonetheless been used successfully in practice


Structured Mean Field

Mean field theory applies to any tractable subgraph; naïve mean field is based on the fully disconnected subgraph

Variants based on structured subgraphs can be derived


Other Notations

Mean Parameterization Form

Distribution Form

where

Naïve Mean Field for Ising Model:
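The equations on this slide did not survive extraction; in distribution form, the mean field lower bound is commonly written as (a reconstruction, with p̃ the unnormalized model):

```latex
\log Z(\theta) \;\ge\; \mathbb{E}_q[\log \tilde p(x;\theta)] + H(q),
\qquad
q(x) = \prod_i q_i(x_i)
```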


Examples to add

GMF for Ising Models

Factorial HMM

Bayesian Gaussian Model

Latent Dirichlet Allocation


Summary

Message-passing algorithms (e.g., belief propagation, mean field) are solving approximate versions of an exact variational principle in exponential families

There are two distinct components to the approximations:
use either inner or outer bounds to the marginal polytope
various approximations to the entropy function

BP: polyhedral outer bound and non-convex Bethe approximation
MF: non-convex inner bound and exact form of entropy
Kikuchi: tighter polyhedral outer bound and better entropy approximation
