
School of Computer Science


Probabilistic Graphical Models

Variational Inference II: Mean Field and Generalized MF

Eric Xing

Lecture 17, November 5, 2009

Reading:

[Figure: example graphical models and subgraphs over nodes X1, X2, X3, X4]

Inference Problems

Compute the likelihood of observed data

Compute the marginal distribution over a particular subset of nodes

Compute the conditional distribution for disjoint subsets A and B

Compute a mode of the density

Methods we have:

Brute force (individual computations independent)

Elimination (sharing intermediate terms)

Message passing: forward-backward, max-product/BP, junction tree (sharing intermediate terms)

Exponential Family GMs

Canonical parameterization, built from canonical parameters, sufficient statistics, and a log-normalization function (see the form written out below)

Effective canonical parameters

Regular family: the domain Ω = {θ : A(θ) < ∞} is an open set

Minimal representation: there does not exist a nonzero vector a such that ⟨a, φ(x)⟩ is a constant
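For reference, the canonical form these definitions attach to, in the notation of Wainwright and Jordan that this lecture follows:

```latex
% Exponential family in canonical form
p(x; \theta) \;=\; \exp\bigl\{ \langle \theta, \phi(x) \rangle - A(\theta) \bigr\},
\qquad
A(\theta) \;=\; \log \sum_x \exp\bigl\{ \langle \theta, \phi(x) \rangle \bigr\}
% theta: canonical parameters;  phi(x): sufficient statistics;
% A(theta): log-normalization (log-partition) function
```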

Mean Parameterization

The mean parameter μ_α associated with a sufficient statistic φ_α is defined as μ_α = E_p[φ_α(X)]

Realizable mean parameter set: M = {μ ∈ R^d : ∃ p such that E_p[φ(X)] = μ}

A convex subset of R^d; a convex hull in the discrete case

A convex polytope when the state space is finite
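As a concrete example (the Ising model reappears later in the lecture), for binary variables with sufficient statistics x_s for each node and x_s x_t for each edge, the mean parameters are the node and edge moments:

```latex
\mu_s = \mathbb{E}_p[X_s] = p(X_s = 1),
\qquad
\mu_{st} = \mathbb{E}_p[X_s X_t] = p(X_s = 1,\, X_t = 1)
```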

Convex Polytope

Convex hull representation: M = conv{φ(x) : x ∈ X}

Half-plane based representation: M = {μ : ⟨a_j, μ⟩ ≥ b_j, j ∈ J}

Minkowski-Weyl Theorem: any polytope can be characterized by a finite collection of linear inequality constraints

Conjugate Duality

Duality between MLE and Max-Ent: the conjugate dual of A is the negative entropy, A*(μ) = −H(p_{θ(μ)})

For all μ in the interior of M, there is a unique canonical parameter θ(μ) satisfying the moment-matching condition E_{θ(μ)}[φ(X)] = μ

The log-partition function has the variational form

A(θ) = sup_{μ ∈ M} {⟨θ, μ⟩ − A*(μ)} (*)

For all θ ∈ Ω, the supremum in (*) is attained uniquely at the μ specified by the moment-matching condition μ = E_θ[φ(X)]

Bijection for a minimal exponential family: ∇A maps Ω one-to-one onto the interior of M
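A quick numerical sanity check of (*) for a single Bernoulli variable, where everything is available in closed form (a minimal sketch; the grid resolution and the test value of θ are arbitrary choices):

```python
import numpy as np

# Bernoulli in canonical form: p(x; theta) ∝ exp(theta * x), x in {0, 1}
# Log-partition: A(theta) = log(1 + e^theta)
# Dual (negative entropy): A*(mu) = mu log mu + (1 - mu) log(1 - mu)

theta = 1.3                               # arbitrary test value
A = np.log1p(np.exp(theta))               # exact log-partition

mu = np.linspace(1e-6, 1 - 1e-6, 100001)  # grid over M = (0, 1)
A_star = mu * np.log(mu) + (1 - mu) * np.log(1 - mu)
objective = theta * mu - A_star           # <theta, mu> - A*(mu)

print(A, objective.max())                 # the two values agree up to grid error
print(mu[objective.argmax()],             # the maximizer of (*)
      1 / (1 + np.exp(-theta)))           # equals the mean parameter E[X]
```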

Variational Inference In General

An umbrella term that refers to various mathematical tools for optimization-based formulations of problems, as well as associated techniques for their solution

General idea: Express a quantity of interest as the solution of an optimization problem

The optimization problem can be relaxed in various ways:

Approximate the functions to be optimized

Approximate the set over which the optimization takes place

Goes in parallel with MCMC

Bethe Variational Problem (BVP)

We already have:

a convex (polyhedral) outer bound on the marginal polytope

the Bethe approximate entropy

Combining the two ingredients, we have the Bethe variational problem (written out below):

a simple structured problem (differentiable objective over a constraint set that is a simple polytope)

Sum-product is the solver!
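Written out, the two ingredients and the resulting problem are:

```latex
% Local-consistency (pseudomarginal) polytope: a polyhedral outer bound on M
L(G) = \Bigl\{ \tau \ge 0 \;:\; \textstyle\sum_{x_s} \tau_s(x_s) = 1,\;\;
  \textstyle\sum_{x_t} \tau_{st}(x_s, x_t) = \tau_s(x_s),\;\;
  \textstyle\sum_{x_s} \tau_{st}(x_s, x_t) = \tau_t(x_t) \Bigr\}

% Bethe (tree-based) entropy: node entropies minus edge mutual informations
H_{\mathrm{Bethe}}(\tau) = \sum_{s \in V} H_s(\tau_s) - \sum_{(s,t) \in E} I_{st}(\tau_{st})

% Bethe variational problem
\max_{\tau \in L(G)} \;\; \langle \theta, \tau \rangle + H_{\mathrm{Bethe}}(\tau)
```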

Connection to Sum-Product Alg.

Lagrangian method for BVP: attach Lagrange multipliers to the normalization and marginalization constraints; the sum-product messages arise as exponentiated multipliers

Sum-product and Bethe variational (Yedidia et al., 2002): for any graph G, any fixed point of the sum-product updates specifies a pair (τ*, λ*) that is a stationary point of the Lagrangian

For a tree-structured MRF, the solution is unique, the τ* correspond to the exact singleton and pairwise marginal distributions of the MRF, and the optimal value of the BVP is equal to A(θ)
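For concreteness, the sum-product updates whose fixed points are meant here, in pairwise-MRF form:

```latex
% Message from node t to node s
M_{ts}(x_s) \;\propto\; \sum_{x_t} \exp\bigl\{ \theta_{st}(x_s, x_t) + \theta_t(x_t) \bigr\}
  \prod_{u \in N(t) \setminus \{s\}} M_{ut}(x_t)

% Pseudomarginals at a fixed point
\tau_s(x_s) \;\propto\; \exp\{\theta_s(x_s)\} \prod_{t \in N(s)} M_{ts}(x_s)
```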

Inexactness of Bethe and Sum-Product

From the Bethe entropy approximation: H_Bethe is only an approximation to the true entropy, so even the optimal value can be wrong (example below)

From the pseudomarginal outer bound: the inclusion M(G) ⊂ L(G) is strict for graphs with cycles, so the optimizing pseudomarginals need not be globally consistent

[Figure: a single cycle on four nodes (1, 2, 3, 4) illustrating the example]

Kikuchi Approximation

Recall: the Bethe variational method uses a tree-based (Bethe) approximation to the entropy, and a tree-based outer bound on the marginal polytope

The Kikuchi method extends these tree-based approximations to more general hyper-trees

Generalized pseudomarginal set: local marginals on the hyperedges, subject to normalization and marginalization (local consistency) constraints

Hyper-tree based approximate entropy

Hyper-tree based generalization of BVP
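In the standard form of these generalizations, the hyper-tree entropy weights hyperedge entropies by overcounting (Möbius) numbers c(g), and the problem keeps the same shape as the BVP:

```latex
% Hyper-tree based (Kikuchi) entropy with overcounting numbers c(g)
H_{\mathrm{Kikuchi}}(\tau) \;=\; \sum_{g \in \mathcal{E}} c(g)\, H_g(\tau_g)

% Hyper-tree based generalization of the BVP
\max_{\tau \in L_t(G)} \;\; \langle \theta, \tau \rangle + H_{\mathrm{Kikuchi}}(\tau)
```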

Summary So Far

Formulate the inference problem as a variational optimization problem

The Bethe and Kikuchi free energies are approximations to the negative entropy


Next Step …

We will develop a set of lower-bound methods


Tractable Subgraph

Given a GM with a graph G, a subgraph F is tractable if we can perform exact inference on it

Example:


Mean Parameterization

For an exponential family GM defined with graph G and sufficient statistics φ, the realizable mean parameter set is M(G; φ) = {μ : ∃ p such that E_p[φ(X)] = μ}

For a given tractable subgraph F, the subset of mean parameters realizable by distributions that respect the structure of F is of interest: M(F; φ)

Inner approximation: M(F)° ⊆ M°


Optimizing a Lower Bound

Any mean parameter μ yields a lower bound on the log-partition function:

A(θ) ≥ ⟨θ, μ⟩ − A*(μ)

Moreover, equality holds iff θ and μ are dually coupled, i.e., μ = E_θ[φ(X)]

Proof idea: Jensen's inequality (see the derivation below)

Optimizing the lower bound gives max_{μ ∈ M(F)°} {⟨θ, μ⟩ − A*(μ)}. This is an inference!
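The one-line derivation behind the proof idea: for any distribution q with mean parameter μ = E_q[φ(X)], Jensen's inequality gives

```latex
A(\theta)
 \;=\; \log \sum_x \exp\{\langle \theta, \phi(x) \rangle\}
 \;=\; \log \mathbb{E}_q\!\left[ \frac{\exp\{\langle \theta, \phi(X) \rangle\}}{q(X)} \right]
 \;\ge\; \mathbb{E}_q\bigl[\langle \theta, \phi(X) \rangle\bigr] + H(q)
 \;=\; \langle \theta, \mu \rangle + H(q)
```

Choosing q to be the maximum-entropy distribution with mean μ gives H(q) = −A*(μ), which yields the stated bound and its equality condition.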


Mean Field Methods In General

However, the lower bound can’t explicitly evaluated in general Because the dual function typically lacks an explicit form

Mean Field Methods Approximate the lower bound

Approximate the realizable mean parameter set

The MF optimization problem

Still a lower bound?
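In symbols, the mean field problem and why it remains a bound:

```latex
\max_{\tau \in \mathcal{M}(F)^{\circ}} \; \langle \theta, \tau \rangle - A_F^{*}(\tau)
\;\le\; A(\theta),
\qquad
A_F^{*}(\tau) = A^{*}(\tau) \ \ \text{for } \tau \in \mathcal{M}(F)^{\circ}
```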


KL-divergence

Kullback-Leibler Divergence: D(q || p) = Σ_x q(x) log (q(x) / p(x))

For two exponential family distributions with the same sufficient statistics, the KL divergence admits three equivalent forms:


Primal form (both arguments in canonical parameters): D(θ1 || θ2) = A(θ2) − A(θ1) − ⟨μ1, θ2 − θ1⟩, where μ1 = E_{θ1}[φ(X)]

Mixed form (one mean, one canonical parameter): D(μ1 || θ2) = A(θ2) + A*(μ1) − ⟨μ1, θ2⟩

Dual form (both arguments in mean parameters): D(μ1 || μ2) = A*(μ1) − A*(μ2) − ⟨θ2, μ1 − μ2⟩, where θ2 = θ(μ2)
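A small numerical check that the three forms agree with the direct KL computation, again on Bernoulli distributions where everything is in closed form (the parameter values are arbitrary):

```python
import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

A      = lambda t: np.log1p(np.exp(t))                      # log-partition
A_star = lambda m: m * np.log(m) + (1 - m) * np.log(1 - m)  # conjugate dual

theta1, theta2 = 0.4, -1.1                   # arbitrary canonical parameters
mu1, mu2 = sigmoid(theta1), sigmoid(theta2)  # dually coupled mean parameters

# Direct KL(p1 || p2) for two Bernoullis
kl = mu1 * np.log(mu1 / mu2) + (1 - mu1) * np.log((1 - mu1) / (1 - mu2))

primal = A(theta2) - A(theta1) - mu1 * (theta2 - theta1)
mixed  = A(theta2) + A_star(mu1) - mu1 * theta2
dual   = A_star(mu1) - A_star(mu2) - theta2 * (mu1 - mu2)

print(kl, primal, mixed, dual)               # all four values agree
```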

Mean Field and KL-divergence

Optimizing the lower bound is equivalent to minimizing a KL divergence: the gap in the bound is exactly the mixed-form KL,

A(θ) − (⟨θ, τ⟩ − A*(τ)) = D(τ || θ)

Therefore, we are doing KL(q || p) minimization over the tractable family


Naïve Mean Field

Fully factorized variational distribution: q(x) = Π_{s ∈ V} q_s(x_s)


Naïve Mean Field for Ising Model

Sufficient statistics and mean parameters: φ = {x_s : s ∈ V} ∪ {x_s x_t : (s,t) ∈ E}, with μ_s = p(X_s = 1) and μ_st = p(X_s = 1, X_t = 1)

Naïve mean field realizable mean parameter subset (F fully disconnected, so edge moments factorize): M(F) = {τ : 0 ≤ τ_s ≤ 1, τ_st = τ_s τ_t}

Entropy (of the product distribution): H(τ) = −Σ_s [τ_s log τ_s + (1 − τ_s) log(1 − τ_s)]

Optimization problem: max_{τ ∈ [0,1]^{|V|}} Σ_s θ_s τ_s + Σ_{(s,t) ∈ E} θ_st τ_s τ_t + H(τ)


Naïve Mean Field for Ising Model

Optimization problem: as above; coordinate ascent, optimizing one τ_s at a time with the rest held fixed, gives the update rule

τ_s ← σ(θ_s + Σ_{t ∈ N(s)} θ_st τ_t), where σ(z) = 1 / (1 + e^{-z})

θ_st τ_t resembles a "message" sent from node t to node s

θ_s + Σ_{t ∈ N(s)} θ_st τ_t forms the "mean field" applied to s from its neighborhood
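A minimal sketch of this coordinate-ascent loop (the couplings, node parameters, and sweep count below are arbitrary illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def naive_mean_field(theta_s, theta_st, n_sweeps=100):
    """Naive mean field for an Ising model with x_s in {0, 1}.

    theta_s  : (n,) node parameters
    theta_st : (n, n) symmetric edge parameters (0 where there is no edge)
    Returns tau, with tau[s] approximating p(X_s = 1).
    """
    tau = np.full(len(theta_s), 0.5)   # uniform initialization
    for _ in range(n_sweeps):
        for s in range(len(theta_s)):  # coordinate ascent, one node at a time
            # the "mean field" applied to s from its neighborhood
            field = theta_s[s] + theta_st[s] @ tau
            tau[s] = sigmoid(field)    # the update rule above
    return tau

# Hypothetical example: a 3-node chain with attractive couplings
theta_s = np.array([0.5, -0.2, 0.1])
theta_st = np.zeros((3, 3))
theta_st[0, 1] = theta_st[1, 0] = 1.0
theta_st[1, 2] = theta_st[2, 1] = 1.0
print(naive_mean_field(theta_s, theta_st))
```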


Non-Convexity of Mean Field

Mean field optimization is always non-convex for any exponential family in which the state space is finite

M is a finite convex hull (a polytope)

M(F) contains all of its extreme points: the extreme points are degenerate distributions, which are fully factorized and hence realizable in F

If M(F) were a convex set, it would contain the convex hull of those extreme points, i.e., M(F) = M; since M(F) is in general a strict subset of M, it cannot be convex

Despite the non-convexity, mean field has been used successfully in practice


Structured Mean Field

Mean field theory applies to any tractable subgraph; naïve mean field is based on the fully disconnected subgraph

Variants based on structured subgraphs (e.g., chains or trees embedded in G) can be derived


Other Notations

Mean parameterization form: max_{τ ∈ M(F)°} ⟨θ, τ⟩ − A*_F(τ)

Distribution form: max_q E_q[⟨θ, φ(X)⟩] + H(q), where q ranges over distributions that factor according to the tractable subgraph F

Naïve mean field for the Ising model: q(x) = Π_s q_s(x_s), with q_s(X_s = 1) = τ_s


Examples to add

GMF for Ising Models

Factorial HMM

Bayesian Gaussian Model

Latent Dirichlet Allocation


Summary

Message-passing algorithms (e.g., belief propagation, mean field) solve approximate versions of an exact variational principle in exponential families

There are two distinct components to the approximations: inner or outer bounds on the marginal polytope M, and various approximations to the entropy function

BP: polyhedral outer bound and non-convex Bethe approximation

MF: non-convex inner bound and exact form of entropy

Kikuchi: tighter polyhedral outer bound and better entropy approximation
