Transcript of Approximate Inference: Variational Inference (CMSC 678, UMBC)

Page 1:

Approximate Inference: Variational Inference

CMSC 678, UMBC

Page 2:

Outline

Recap of graphical models & belief propagation

Posterior inference (Bayesian perspective)

Math: exponential family distributions

Variational Inference
  Basic Technique
  Example: Topic Models

Page 3:

Recap from last time…

Page 4:

Graphical Models

𝑝𝑝 π‘₯π‘₯1, π‘₯π‘₯2, π‘₯π‘₯3, … , π‘₯π‘₯𝑁𝑁 = �𝑖𝑖

𝑝𝑝 π‘₯π‘₯𝑖𝑖 πœ‹πœ‹(π‘₯π‘₯𝑖𝑖))

Directed Models (Bayesian networks)

Undirected Models (Markov random fields)

𝑝𝑝 π‘₯π‘₯1, π‘₯π‘₯2, π‘₯π‘₯3, … , π‘₯π‘₯𝑁𝑁 =1𝑍𝑍�𝐢𝐢

πœ“πœ“πΆπΆ π‘₯π‘₯𝑐𝑐

Page 5:

Markov Blanket

The Markov blanket of a node x is its parents, its children, and its children's other parents.

𝑝𝑝 π‘₯π‘₯𝑖𝑖 π‘₯π‘₯𝑗𝑗≠𝑖𝑖 =𝑝𝑝(π‘₯π‘₯1, … , π‘₯π‘₯𝑁𝑁)

∫ 𝑝𝑝 π‘₯π‘₯1, … , π‘₯π‘₯𝑁𝑁 𝑑𝑑π‘₯π‘₯𝑖𝑖

=βˆπ‘˜π‘˜ 𝑝𝑝(π‘₯π‘₯π‘˜π‘˜|πœ‹πœ‹ π‘₯π‘₯π‘˜π‘˜ )

∫ βˆπ‘˜π‘˜ 𝑝𝑝 π‘₯π‘₯π‘˜π‘˜ πœ‹πœ‹ π‘₯π‘₯π‘˜π‘˜ )𝑑𝑑π‘₯π‘₯𝑖𝑖factor out terms not dependent on xi

factorization of graph

=βˆπ‘˜π‘˜:π‘˜π‘˜=𝑖𝑖 or π‘–π‘–βˆˆπœ‹πœ‹ π‘₯π‘₯π‘˜π‘˜ 𝑝𝑝(π‘₯π‘₯π‘˜π‘˜|πœ‹πœ‹ π‘₯π‘₯π‘˜π‘˜ )

∫ βˆπ‘˜π‘˜:π‘˜π‘˜=𝑖𝑖 or π‘–π‘–βˆˆπœ‹πœ‹ π‘₯π‘₯π‘˜π‘˜ 𝑝𝑝 π‘₯π‘₯π‘˜π‘˜ πœ‹πœ‹ π‘₯π‘₯π‘˜π‘˜ )𝑑𝑑π‘₯π‘₯𝑖𝑖

the set of nodes needed to form the complete conditional for a variable xi
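As a small illustration (a minimal sketch with a made-up three-node chain A -> B -> C, not from the slides): the complete conditional of B only needs the factors in B's Markov blanket, p(B | A) and p(C | B), and it matches what you get from the full joint.

```python
import numpy as np

# Hypothetical binary chain A -> B -> C; B's Markov blanket is {A, C}.
pA = np.array([0.6, 0.4])
pB_given_A = np.array([[0.7, 0.3],    # rows: value of A, cols: value of B
                       [0.2, 0.8]])
pC_given_B = np.array([[0.9, 0.1],    # rows: value of B, cols: value of C
                       [0.5, 0.5]])

def complete_conditional_B(a, c):
    # only the factors that mention B survive the cancellation
    unnorm = pB_given_A[a, :] * pC_given_B[:, c]
    return unnorm / unnorm.sum()

def brute_force_B(a, c):
    # normalize the full joint p(A=a, B, C=c) over B
    joint = pA[a] * pB_given_A[a, :] * pC_given_B[:, c]
    return joint / joint.sum()

print(complete_conditional_B(0, 1))   # identical to the brute-force answer
print(brute_force_B(0, 1))
```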

Page 6:

Markov Random Fields with Factor Graph Notation

x: original pixel/state
y: observed (noisy) pixel/state

factor nodes are added according to maximal cliques

factor graphs are bipartite (diagram shows variable nodes connected to unary and binary factors)

Page 7:

Two Problems for Undirected Models

Finding the normalizer

$$Z = \sum_x \prod_c \psi_c(x_c)$$

Computing the marginals

$$Z_n(v) = \sum_{x:\, x_n = v} \prod_c \psi_c(x_c)$$

Q: Why are these difficult?

A: Many different combinations

Sum over all variable combinations, with the xn

coordinate fixed

$$Z_2(v) = \sum_{x_1} \sum_{x_3} \prod_c \psi_c\big(x = (x_1, v, x_3)\big)$$

Example: 3 variables, fix the 2nd dimension

Belief propagation algorithms:

β€’ sum-product (forward-backward in HMMs)

β€’ max-product/max-sum (Viterbi)

Page 8:

Sum-Product

From variables to factors:

π‘žπ‘žπ‘›π‘›β†’π‘šπ‘š π‘₯π‘₯𝑛𝑛 = οΏ½π‘šπ‘šβ€²βˆˆπ‘€π‘€(𝑛𝑛)\π‘šπ‘š

π‘Ÿπ‘Ÿπ‘šπ‘šβ€²β†’π‘›π‘› π‘₯π‘₯𝑛𝑛

From factors to variables

π‘Ÿπ‘Ÿπ‘šπ‘šβ†’π‘›π‘› π‘₯π‘₯𝑛𝑛= οΏ½

π’˜π’˜π‘šπ‘š\𝑛𝑛

π‘“π‘“π‘šπ‘š π’˜π’˜π‘šπ‘š οΏ½π‘›π‘›β€²βˆˆπ‘π‘(π‘šπ‘š)\𝑛𝑛

π‘žπ‘žπ‘›π‘›β€²β†’π‘šπ‘š(π‘₯π‘₯𝑛𝑛𝑛)

(diagram: messages passed between variable node n and factor node m)

set of variables that the mth factor depends on

set of factors in which variable n participates

sum over configuration of variables for the mth factor,

with variable n fixed

default value of 1 if empty product
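A minimal sketch of these updates (not the lecture's code) on a three-variable chain with made-up unary and pairwise factor tables; each variable's marginal is the normalized product of its incoming messages:

```python
import numpy as np

K = 2                                              # states per variable
u = [np.array([0.7, 0.3]),                         # unary factors (made up)
     np.array([0.4, 0.6]),
     np.array([0.5, 0.5])]
psi = [np.array([[0.9, 0.1], [0.1, 0.9]]),         # pairwise factor on (x1, x2)
       np.array([[0.8, 0.2], [0.2, 0.8]])]         # pairwise factor on (x2, x3)

# factor-to-variable messages, passed left-to-right and right-to-left
fwd = [np.ones(K)]                                 # empty product = 1 at x1
for i in range(2):
    q = u[i] * fwd[-1]                             # variable -> factor message
    fwd.append(psi[i].T @ q)                       # factor -> variable: sum out the left variable

bwd = [np.ones(K)]                                 # empty product = 1 at x3
for i in reversed(range(2)):
    q = u[i + 1] * bwd[0]
    bwd.insert(0, psi[i] @ q)                      # sum out the right variable

for i in range(3):                                 # belief = product of incoming messages
    b = u[i] * fwd[i] * bwd[i]
    print(f"p(x{i + 1}) =", b / b.sum())
```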

Page 9:

Outline

Recap of graphical models & belief propagation

Posterior inference (Bayesian perspective)

Math: exponential family distributions

Variational Inference
  Basic Technique
  Example: Topic Models

Page 10:

Goal: Posterior Inference

Hyperparameters Ξ±; unknown parameters Θ; data

Likelihood model: p(data | Θ)

Posterior: pΞ±(Θ | data)

we’re going to be Bayesian (perform Bayesian inference)

Page 11:

Posterior Classification vs. Posterior Inference

β€œFrequentist” methods

prior over labels (maybe), not weights

Bayesian methods

Θ includes weight parameters

pΞ±(Θ | data) (Bayesian) vs. pΞ±,w(y | data) (frequentist)

Page 12:

(Some) Learning Techniques

MAP/MLE: Point estimation, basic EM (what we've already covered)

Variational Inference: Functional Optimization (today)

Sampling/Monte Carlo (next class)

Page 13:

Outline

Recap of graphical models & belief propagation

Posterior inference (Bayesian perspective)

Math: exponential family distributions

Variational Inference
  Basic Technique
  Example: Topic Models

Page 14:

Exponential Family Form
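The slide's formula is not preserved in the transcript; for reference, the standard exponential-family form that the next several slides annotate component by component is

$$p(x \mid \eta) \;=\; h(x)\, \exp\!\big\{\eta^\top T(x) - A(\eta)\big\},$$

with support function (base measure) $h(x)$, natural parameters $\eta$, sufficient statistics $T(x)$, and log-normalizer $A(\eta)$.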

Page 15:

Exponential Family Form

Support function
β€’ Formally necessary, in practice irrelevant

Page 16:

Exponential Family Form

Distribution Parameters
β€’ Natural parameters
β€’ Feature weights

Page 17:

Exponential Family Form

Feature function(s)
β€’ Sufficient statistics

Page 18:

Exponential Family Form

Log-normalizer

Page 19:

Exponential Family Form

Log-normalizer

Page 20:

Why? Capture Common Distributions

Discrete (Finite distributions)

Page 21:

Why? Capture Common Distributions

β€’ Gaussian

https://kanbanize.com/blog/wp-content/uploads/2014/07/Standard_deviation_diagram.png

Page 22:

Why? Capture Common Distributions

Dirichlet (Distributions over (finite) distributions)

Page 23:

Why? Capture Common Distributions

Discrete (Finite distributions)

Dirichlet (Distributions over (finite) distributions)

Gaussian

Gamma, Exponential, Poisson, Negative-Binomial, Laplace, log-Normal,…

Page 24:

Why? β€œEasy” Gradients

Observed feature counts: count w.r.t. the empirical distribution

Expected feature counts: count w.r.t. the current model parameters

(we’ve already seen this with maxent models)
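Concretely (a standard identity, stated here since the slide's equation is not preserved in the transcript): for observations $x_1, \dots, x_N$ from an exponential family, the log-likelihood gradient is the observed minus the expected sufficient statistics,

$$\nabla_\eta \sum_{i=1}^N \log p(x_i \mid \eta) \;=\; \underbrace{\sum_{i=1}^N T(x_i)}_{\text{observed counts}} \;-\; \underbrace{N\, \nabla_\eta A(\eta)}_{\text{expected counts}}.$$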

Page 25:

Why? β€œEasy” Expectations

The expectation of the sufficient statistics is the gradient of the log-normalizer.
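That is, $\mathbb{E}_{p(x \mid \eta)}[T(x)] = \nabla_\eta A(\eta)$. A quick numerical sanity check for the Bernoulli case (a sketch; the natural-parameter value is made up):

```python
import numpy as np

# Bernoulli in exponential-family form: T(x) = x, A(eta) = log(1 + e^eta).
# The identity says E[T(x)] = dA/deta, i.e. P(x = 1) = sigmoid(eta).
eta = 0.3                                            # arbitrary natural parameter
A = lambda e: np.log1p(np.exp(e))                    # log-normalizer
eps = 1e-6
grad_A = (A(eta + eps) - A(eta - eps)) / (2 * eps)   # numerical gradient of A
p_x1 = 1.0 / (1.0 + np.exp(-eta))                    # E[T(x)] = P(x = 1)
print(grad_A, p_x1)                                  # both approximately 0.5744
```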

Page 26:

Why? β€œEasy” Posterior Inference

Page 27:

Why? β€œEasy” Posterior Inference

p is the conjugate prior for q

Page 28:

Why? β€œEasy” Posterior Inference

p is the conjugate prior for q

Posterior p has same form as prior p

Page 29:

Why? β€œEasy” Posterior Inference

p is the conjugate prior for q

Posterior p has same form as prior p

All exponential family models have a conjugate prior (in theory)

Page 30:

Why? β€œEasy” Posterior Inference

p is the conjugate prior for q

Posterior p has same form as prior p

Posterior          | Likelihood           | Prior
Dirichlet (Beta)   | Discrete (Bernoulli) | Dirichlet (Beta)
Normal             | Normal (fixed var.)  | Normal
Gamma              | Exponential          | Gamma
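A concrete instance of the first row, as a sketch (made-up prior and data): a Beta prior on a Bernoulli parameter stays Beta after conditioning, with hyperparameters shifted by the observed counts.

```python
import numpy as np

alpha, beta = 2.0, 2.0                   # hypothetical Beta(2, 2) prior
x = np.array([1, 0, 1, 1, 0, 1])         # made-up Bernoulli observations

alpha_post = alpha + x.sum()             # add the count of ones
beta_post = beta + len(x) - x.sum()      # add the count of zeros
print(alpha_post, beta_post)             # posterior is Beta(6.0, 4.0)
```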

Page 31:

Outline

Recap of graphical models & belief propagation

Posterior inference (Bayesian perspective)

Math: exponential family distributions

Variational Inference
  Basic Technique
  Example: Topic Models

Page 32:

Goal: Posterior Inference

Hyperparameters Ξ±; unknown parameters Θ; data

Likelihood model: p(data | Θ)

Posterior: pΞ±(Θ | data)

Page 33:

(Some) Learning Techniques

MAP/MLE: Point estimation, basic EM (what we've already covered)

Variational Inference: Functional Optimization (today)

Sampling/Monte Carlo (next class)

Page 34:

Variational Inference

Difficult to compute

Page 35:

Variational Inference

Difficult to compute

Minimize the β€œdifference”

by changing Ξ»

Easy(ier) to compute

q(ΞΈ): controlled by parameters Ξ»

Page 36:

Variational Inference

Difficult to compute

Easy(ier) to compute

Minimize the β€œdifference”

by changing Ξ»

Page 37:

Variational Inference: A Gradient-Based Optimization Technique

Set t = 0
Pick a starting value Ξ»_t

Until converged:
1. Get value y_t = F(q(β€’; Ξ»_t))
2. Get gradient g_t = F'(q(β€’; Ξ»_t))
3. Get scaling factor ρ_t
4. Set Ξ»_{t+1} = Ξ»_t + ρ_t * g_t
5. Set t += 1
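As a Python sketch, the loop above might look like the following; F and grad_F stand in for the objective and its gradient, and the step size rho is held fixed for simplicity (none of these are specified on this slide).

```python
def optimize(F, grad_F, lam, rho=0.01, tol=1e-6, max_iter=1000):
    """Generic version of the slide's loop: step lambda along the gradient of F."""
    for t in range(max_iter):
        y = F(lam)                      # 1. get value
        g = grad_F(lam)                 # 2. get gradient
        lam_new = lam + rho * g         # 3-4. scaled gradient step
        if abs(F(lam_new) - y) < tol:   # crude convergence check
            return lam_new
        lam = lam_new                   # 5. t += 1 happens via the loop
    return lam
```

Note the update adds the gradient; whether F should be maximized or minimized (it is later set to a KL divergence) is exactly the question raised a few slides ahead.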

Page 38:

Variational Inference: A Gradient-Based Optimization Technique

Set t = 0
Pick a starting value Ξ»_t

Until converged:
1. Get value y_t = F(q(β€’; Ξ»_t))
2. Get gradient g_t = F'(q(β€’; Ξ»_t))
3. Get scaling factor ρ_t
4. Set Ξ»_{t+1} = Ξ»_t + ρ_t * g_t
5. Set t += 1

Page 39:

Variational Inference: The Function to Optimize

Posterior of desired model

Any easy-to-compute distribution

Page 40:

Variational Inference: The Function to Optimize

Posterior of desired model

Any easy-to-compute distribution

Find the best distribution (calculus of variations)

Page 41:

Variational Inference: The Function to Optimize

Find the best distribution

Parameters for desired model

Page 42:

Variational Inference: The Function to Optimize

Find the best distribution

Variational parameters for ΞΈ

Parameters for desired model

Page 43:

Variational Inference: The Function to Optimize

Find the best distribution

Variational parameters for ΞΈ

Parameters for desired model

KL-Divergence (expectation)

$$D_{\mathrm{KL}}\big(q(\theta)\,\|\,p(\theta \mid x)\big) = \mathbb{E}_{q(\theta)}\!\left[\log \frac{q(\theta)}{p(\theta \mid x)}\right]$$

Page 44:

Variational Inference

Find the best distribution

Variational parameters for ΞΈ

Parameters for desired model

Page 45:

Exponential Family Recap: β€œEasy” Expectations

Exponential Family Recap: β€œEasy” Posterior Inference

p is the conjugate prior for Ο€

Page 46:

Variational Inference

Find the best distribution

When p and q are the same exponential family form, the variational update q(ΞΈ) is (often) computable (in closed form)

Page 47:

Variational Inference: A Gradient-Based Optimization Technique

Set t = 0
Pick a starting value Ξ»_t
Let F(q(β€’; Ξ»_t)) = KL[q(β€’; Ξ»_t) || p(β€’)]

Until converged:
1. Get value y_t = F(q(β€’; Ξ»_t))
2. Get gradient g_t = F'(q(β€’; Ξ»_t))
3. Get scaling factor ρ_t
4. Set Ξ»_{t+1} = Ξ»_t + ρ_t * g_t
5. Set t += 1

Page 48:

Variational Inference: Maximization or Minimization?

Page 49:

Evidence Lower Bound (ELBO)

$$\log p(x) = \log \int p(x, \theta)\, d\theta$$

Page 50:

Evidence Lower Bound (ELBO)

$$\log p(x) = \log \int p(x, \theta)\, d\theta$$

$$= \log \int p(x, \theta)\, \frac{q(\theta)}{q(\theta)}\, d\theta$$

Page 51:

Evidence Lower Bound (ELBO)

$$\log p(x) = \log \int p(x, \theta)\, d\theta$$

$$= \log \int p(x, \theta)\, \frac{q(\theta)}{q(\theta)}\, d\theta$$

$$= \log \mathbb{E}_{q(\theta)}\!\left[\frac{p(x, \theta)}{q(\theta)}\right]$$

Page 52:

Evidence Lower Bound (ELBO)

$$\log p(x) = \log \int p(x, \theta)\, d\theta$$

$$= \log \int p(x, \theta)\, \frac{q(\theta)}{q(\theta)}\, d\theta$$

$$= \log \mathbb{E}_{q(\theta)}\!\left[\frac{p(x, \theta)}{q(\theta)}\right]$$

$$\ge \mathbb{E}_{q(\theta)}\big[\log p(x, \theta)\big] - \mathbb{E}_{q(\theta)}\big[\log q(\theta)\big] = \mathcal{L}(q)$$
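The inequality is Jensen's inequality applied to the concave $\log$. Equivalently (a standard identity, added here for completeness), the gap between the evidence and the ELBO is exactly the KL divergence from the earlier slides:

$$\log p(x) \;=\; \mathcal{L}(q) \;+\; D_{\mathrm{KL}}\big(q(\theta)\,\|\,p(\theta \mid x)\big),$$

so maximizing the ELBO over q is the same as minimizing $D_{\mathrm{KL}}(q \,\|\, p)$.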

Page 53:

Outline

Recap of graphical models & belief propagation

Posterior inference (Bayesian perspective)

Math: exponential family distributions

Variational Inference
  Basic Technique
  Example: Topic Models

Page 54:

Bag-of-Items Models

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region. …

p(doc) = unigram counts: Three: 1, people: 2, attack: 2, …

Page 55:

Bag-of-Items Models

Three people have been fatally shot, and five people, including a mayor, were seriously wounded as a result of a Shining Path attack today against a community in Junin department, central Peruvian mountain region. …

p_{Ο†,Ο‰}(doc) = unigram counts: Three: 1, people: 2, attack: 2, …

Global (corpus-level) parameters interact with local (document-level) parameters

Page 56:

Latent Dirichlet Allocation (Blei et al., 2003)

Per-document (unigram) word counts

Page 57:

Latent Dirichlet Allocation (Blei et al., 2003)

Per-document (unigram) word counts

Count of word j in document i


Page 58:

Latent Dirichlet Allocation (Blei et al., 2003)

Per-document (latent) topic usage

Per-document (unigram) word counts

Per-topic word usage

Count of word j in document i


K topics

Page 59:

Latent Dirichlet Allocation (Blei et al., 2003)

Per-document (latent) topic usage

Per-document (unigram) word counts

Per-topic word usage

word counts ~ Multinomial; per-document topic usage ~ Dirichlet; per-topic word usage ~ Dirichlet (regularize / place priors)

Count of word j in document i


K topics

Page 60:

Latent Dirichlet Allocation (Blei et al., 2003)

Per-document

(latent) topic usage

Per-document (unigram) word counts

Per-topic word usage

Page 61:

Latent Dirichlet Allocation (Blei et al., 2003)

Per-document

(latent) topic usage

Per-document (unigram) word counts

Per-topic word usage


Page 62:

Latent Dirichlet Allocation (Blei et al., 2003)

Per-document

(latent) topic usage

Per-document (unigram) word counts

Per-topic word usage


Page 63:

Latent Dirichlet Allocation (Blei et al., 2003)

Per-document

(latent) topic usage

Per-document (unigram) word counts

Per-topic word usage


Page 64:

Latent Dirichlet Allocation (Blei et al., 2003)

Per-document

(latent) topic usage

Per-document (unigram) word counts

Per-topic word usage


Page 65:

Latent Dirichlet Allocation (Blei et al., 2003)

Per-document

(latent) topic usage

Per-document (unigram) word counts

Per-topic word usage


Page 66:

Variational Inference: LDirA

Topic usage

Per-document (unigram) word counts

Topic words

p: True model

πœ™πœ™π‘˜π‘˜ ∼ Dirichlet(𝜷𝜷)𝑀𝑀(𝑑𝑑,𝑛𝑛) ∼ Discrete(πœ™πœ™π‘§π‘§ 𝑑𝑑,𝑛𝑛 )

πœƒπœƒ(𝑑𝑑) ∼ Dirichlet(𝜢𝜢)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(πœƒπœƒ(𝑑𝑑))

Page 67:

Variational Inference: LDirA

Topic usage

Per-document (unigram) word counts

Topic words

p: True model q: Mean-field approximation

πœ™πœ™π‘˜π‘˜ ∼ Dirichlet(𝜷𝜷)𝑀𝑀(𝑑𝑑,𝑛𝑛) ∼ Discrete(πœ™πœ™π‘§π‘§ 𝑑𝑑,𝑛𝑛 )

πœƒπœƒ(𝑑𝑑) ∼ Dirichlet(𝜢𝜢)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(πœƒπœƒ(𝑑𝑑))

πœ™πœ™π‘˜π‘˜ ∼ Dirichlet(π€π€π’Œπ’Œ)

πœƒπœƒ(𝑑𝑑) ∼ Dirichlet(πœΈπœΈπ’…π’…)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(πœ“πœ“(𝑑𝑑,𝑛𝑛))

Page 68:

Variational Inference: A Gradient-Based Optimization Technique

Set t = 0
Pick a starting value Ξ»_t
Let F(q(β€’; Ξ»_t)) = KL[q(β€’; Ξ»_t) || p(β€’)]

Until converged:
1. Get value y_t = F(q(β€’; Ξ»_t))
2. Get gradient g_t = F'(q(β€’; Ξ»_t))
3. Get scaling factor ρ_t
4. Set Ξ»_{t+1} = Ξ»_t + ρ_t * g_t
5. Set t += 1

Page 69:

Variational Inference: LDirA

πœƒπœƒ(𝑑𝑑) ∼ Dirichlet(𝜢𝜢)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(πœƒπœƒ(𝑑𝑑))

πœƒπœƒ(𝑑𝑑) ∼ Dirichlet(πœΈπœΈπ’…π’…)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(πœ“πœ“(𝑑𝑑,𝑛𝑛))

p: True model q: Mean-field approximation

π”Όπ”Όπ‘žπ‘ž(πœƒπœƒ(𝑑𝑑)) log𝑝𝑝 πœƒπœƒ(𝑑𝑑) | 𝛼𝛼

Page 70:

Variational Inference: LDirA

πœƒπœƒ(𝑑𝑑) ∼ Dirichlet(𝜢𝜢)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(πœƒπœƒ(𝑑𝑑))

πœƒπœƒ(𝑑𝑑) ∼ Dirichlet(πœΈπœΈπ’…π’…)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(πœ“πœ“(𝑑𝑑,𝑛𝑛))

p: True model q: Mean-field approximation

π”Όπ”Όπ‘žπ‘ž(πœƒπœƒ(𝑑𝑑)) log𝑝𝑝 πœƒπœƒ(𝑑𝑑) | 𝛼𝛼 =

π”Όπ”Όπ‘žπ‘ž(πœƒπœƒ(𝑑𝑑)) 𝛼𝛼 βˆ’ 1 𝑇𝑇 logπœƒπœƒ(𝑑𝑑) + 𝐢𝐢

exponential family form of Dirichlet

𝑝𝑝 πœƒπœƒ =Ξ“(βˆ‘π‘˜π‘˜ π›Όπ›Όπ‘˜π‘˜)βˆπ‘˜π‘˜ Ξ“ π›Όπ›Όπ‘˜π‘˜

οΏ½π‘˜π‘˜

πœƒπœƒπ‘˜π‘˜π›Όπ›Όπ‘˜π‘˜βˆ’1

params = π›Όπ›Όπ‘˜π‘˜ βˆ’ 1 π‘˜π‘˜suff. stats.= logπœƒπœƒπ‘˜π‘˜ π‘˜π‘˜
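Because $q(\theta^{(d)})$ is Dirichlet($\gamma_d$), the expected sufficient statistics have a closed form through the digamma function (the gradient of the Dirichlet's log-normalizer): $\mathbb{E}_q[\log \theta_k] = \operatorname{digamma}(\gamma_k) - \operatorname{digamma}\!\big(\sum_j \gamma_j\big)$. A quick check with made-up Ξ³ values:

```python
import numpy as np
from scipy.special import digamma

gamma = np.array([2.0, 1.0, 0.5])                    # hypothetical variational parameters
E_log_theta = digamma(gamma) - digamma(gamma.sum())  # E_q[log theta_k] for each k
print(E_log_theta)
```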

Page 71:

Variational Inference: LDirA

πœƒπœƒ(𝑑𝑑) ∼ Dirichlet(𝜢𝜢)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(πœƒπœƒ(𝑑𝑑))

πœƒπœƒ(𝑑𝑑) ∼ Dirichlet(πœΈπœΈπ’…π’…)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(πœ“πœ“(𝑑𝑑,𝑛𝑛))

p: True model q: Mean-field approximation

π”Όπ”Όπ‘žπ‘ž(πœƒπœƒ(𝑑𝑑)) log𝑝𝑝 πœƒπœƒ(𝑑𝑑) | 𝛼𝛼 =

π”Όπ”Όπ‘žπ‘ž(πœƒπœƒ(𝑑𝑑)) 𝛼𝛼 βˆ’ 1 𝑇𝑇 logπœƒπœƒ(𝑑𝑑) + 𝐢𝐢

expectation of sufficient statistics of q distribution

params = π›Ύπ›Ύπ‘˜π‘˜ βˆ’ 1 π‘˜π‘˜

suff. stats. = logπœƒπœƒπ‘˜π‘˜ π‘˜π‘˜

Page 72:

Variational Inference: LDirA

πœƒπœƒ(𝑑𝑑) ∼ Dirichlet(𝜢𝜢)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(πœƒπœƒ(𝑑𝑑))

πœƒπœƒ(𝑑𝑑) ∼ Dirichlet(πœΈπœΈπ’…π’…)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(πœ“πœ“(𝑑𝑑,𝑛𝑛))

p: True model q: Mean-field approximation

π”Όπ”Όπ‘žπ‘ž(πœƒπœƒ(𝑑𝑑)) log𝑝𝑝 πœƒπœƒ(𝑑𝑑) | 𝛼𝛼 =

π”Όπ”Όπ‘žπ‘ž(πœƒπœƒ(𝑑𝑑)) 𝛼𝛼 βˆ’ 1 𝑇𝑇 logπœƒπœƒ(𝑑𝑑) + 𝐢𝐢 =expectation of the

sufficient statistics is the gradient of the

log normalizer

𝛼𝛼 βˆ’ 1 π‘‡π‘‡π”Όπ”Όπ‘žπ‘ž(πœƒπœƒ(𝑑𝑑)) logπœƒπœƒ(𝑑𝑑) + 𝐢𝐢

Page 73:

Variational Inference: LDirA

πœƒπœƒ(𝑑𝑑) ∼ Dirichlet(𝜢𝜢)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(πœƒπœƒ(𝑑𝑑))

πœƒπœƒ(𝑑𝑑) ∼ Dirichlet(πœΈπœΈπ’…π’…)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(πœ“πœ“(𝑑𝑑,𝑛𝑛))

p: True model q: Mean-field approximation

π”Όπ”Όπ‘žπ‘ž(πœƒπœƒ(𝑑𝑑)) log𝑝𝑝 πœƒπœƒ(𝑑𝑑) | 𝛼𝛼 =

π”Όπ”Όπ‘žπ‘ž(πœƒπœƒ(𝑑𝑑)) 𝛼𝛼 βˆ’ 1 𝑇𝑇 logπœƒπœƒ(𝑑𝑑) + 𝐢𝐢 =expectation of the

sufficient statistics is the gradient of the

log normalizer

𝛼𝛼 βˆ’ 1 𝑇𝑇𝛻𝛻𝛾𝛾𝑑𝑑𝐴𝐴 𝛾𝛾𝑑𝑑 βˆ’ 1 + 𝐢𝐢

Page 74:

Variational Inference: LDirA

πœƒπœƒ(𝑑𝑑) ∼ Dirichlet(𝜢𝜢)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(πœƒπœƒ(𝑑𝑑))

πœƒπœƒ(𝑑𝑑) ∼ Dirichlet(πœΈπœΈπ’…π’…)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(πœ“πœ“(𝑑𝑑,𝑛𝑛))

p: True model q: Mean-field approximation

π”Όπ”Όπ‘žπ‘ž(πœƒπœƒ(𝑑𝑑)) log𝑝𝑝 πœƒπœƒ(𝑑𝑑) | 𝛼𝛼 = 𝛼𝛼 βˆ’ 1 𝑇𝑇𝛻𝛻𝛾𝛾𝑑𝑑𝐴𝐴 𝛾𝛾𝑑𝑑 βˆ’ 1 + 𝐢𝐢

β„’ �𝛾𝛾𝑑𝑑

= 𝛼𝛼 βˆ’ 1 𝑇𝑇𝛻𝛻𝛾𝛾𝑑𝑑𝐴𝐴 𝛾𝛾𝑑𝑑 βˆ’ 1 + 𝑀𝑀 𝛾𝛾𝑑𝑑there’s more math

to do!

Page 75:

Variational Inference: A Gradient-Based Optimization Technique

Set t = 0
Pick a starting value Ξ»_t
Let F(q(β€’; Ξ»_t)) = KL[q(β€’; Ξ»_t) || p(β€’)]

Until converged:
1. Get value y_t = F(q(β€’; Ξ»_t))
2. Get gradient g_t = F'(q(β€’; Ξ»_t))
3. Get scaling factor ρ_t
4. Set Ξ»_{t+1} = Ξ»_t + ρ_t * g_t
5. Set t += 1

Page 76:

Variational Inference: LDirA

πœƒπœƒ(𝑑𝑑) ∼ Dirichlet(𝜢𝜢)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(πœƒπœƒ(𝑑𝑑))

πœƒπœƒ(𝑑𝑑) ∼ Dirichlet(πœΈπœΈπ’…π’…)𝑧𝑧(𝑑𝑑,𝑛𝑛) ∼ Discrete(πœ“πœ“(𝑑𝑑,𝑛𝑛))

p: True model q: Mean-field approximation

$$\mathcal{L}(\gamma_d) = (\alpha - 1)^\top \nabla_{\gamma_d} A(\gamma_d - 1) + M(\gamma_d)$$

$$\nabla_{\gamma_d} \mathcal{L}(\gamma_d) = (\alpha - 1)^\top \nabla^2_{\gamma_d} A(\gamma_d - 1) + \nabla_{\gamma_d} M(\gamma_d)$$
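For reference (not derived on these slides): carrying the remaining algebra through and setting the gradients to zero yields the familiar closed-form coordinate-ascent updates of Blei et al. (2003), in the slide's notation,

$$\psi^{(d,n)}_k \;\propto\; \exp\!\big\{\mathbb{E}_q[\log \theta^{(d)}_k] + \mathbb{E}_q[\log \phi_{k,\, w_{d,n}}]\big\}, \qquad \gamma_d \;=\; \alpha + \sum_n \psi^{(d,n)},$$

with an analogous count-based update for the topic parameters $\lambda_k$.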