Stochastic Variational Inference - UC3Mjesusfbes/MLG_SVI.pdfThe natural gradient of the ELBO...


Stochastic Variational Inference

Jesus Fernandez Bes

Machine Learning Group

March 27, 2014

Jesus Fernandez Bes Stochastic Variational Inference


http://arxiv.org/abs/1206.7051


1 Introduction

2 Stochastic Variational Inference
    Models with local and global hidden variables
    Mean-field variational inference
    The natural gradient of the ELBO
    Stochastic Variational Inference

3 Stochastic Variational Inference in Topic Models
    Topic Models
    Latent Dirichlet Allocation
    Hierarchical Dirichlet Process

4 Some Bibliography


Challenges of modern data analysis

Massive

Complex

High-dimensional

Probability models (and graphical models) deal with complexity. Scale is the problem.

“Traditional” Variational Inference

1 Inference ⟹ high-dimensional optimization.

2 Solved using coordinate ascent algorithms.

Analyze ALL the data. Re-estimate hidden structure. Analyze ALL the data. ...

DO NOT SCALE WITH BIG DATA


How to build a general variational method that scales:

Use stochastic optimization. Follow cheap, noisy estimates of the gradient.

Use natural gradients. With them, Stochastic Variational Inference has an attractive form.

Structure of SVI

1 Subsample one or more data points from the data.

2 Analyze the subsample using current variational parameters.

3 Implement a closed-form update of the parameters.

4 Repeat.


$$p(x, z, \beta \mid \alpha) = p(\beta \mid \alpha) \prod_{n=1}^{N} p(x_n, z_n \mid \beta)$$

N observations $x = x_{1:N}$.

Vector of global hidden variables $\beta$.

N local hidden variables $z = z_{1:N}$; each is a collection of J variables $z_n = z_{n,1:J}$.

Vector of fixed parameters $\alpha$.
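To make the factorization concrete, here is a minimal sketch that samples from one model of this kind. The model is an assumption chosen for illustration (a unit-variance Gaussian mixture with uniform mixing weights); the name `sample_mixture` and its parameters are hypothetical and do not come from the slides. The global variable β is the list of component means, and each local variable z_n is the component that generated x_n.

```python
import random

def sample_mixture(num_points, num_components=3, prior_std=5.0, seed=0):
    """Draw (beta, z, x) from p(beta | alpha) * prod_n p(x_n, z_n | beta)."""
    rng = random.Random(seed)
    # Global hidden variables beta: one mean per component, drawn once
    # from the prior p(beta | alpha); here alpha plays the role of prior_std.
    beta = [rng.gauss(0.0, prior_std) for _ in range(num_components)]
    z, x = [], []
    for _ in range(num_points):
        # Local hidden variable z_n: the component that generates point n.
        z_n = rng.randrange(num_components)
        # Given beta, the pair (x_n, z_n) is independent of all other pairs.
        x_n = rng.gauss(beta[z_n], 1.0)
        z.append(z_n)
        x.append(x_n)
    return beta, z, x

beta, z, x = sample_mixture(num_points=1000)
```

The loop body touches only `beta` and the current data point, which is the conditional independence the product over n expresses.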


Complete Conditional assumption

Complete conditionals are in the exponential family

$$p(\beta \mid x, z, \alpha) = h(\beta) \exp\{\eta_g(x, z, \alpha)^\top t(\beta) - a_g(\eta_g(x, z, \alpha))\}$$

$$p(z_{nj} \mid x_n, z_{n,-j}, \beta) = h(z_{nj}) \exp\{\eta_l(x_n, z_{n,-j}, \beta)^\top t(z_{nj}) - a_l(\eta_l(x_n, z_{n,-j}, \beta))\}$$

h(·) is the base measure.

a(·) is the log normalizer.

η(·) is the natural parameter vector.

t(·) is the vector of sufficient statistics.

Several distributions in the exponential family

Bernoulli, Gaussian, Multinomial, Dirichlet, Gamma, Poisson, Beta, ...
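As a worked instance of the form above, the Bernoulli distribution with mean $\pi$ can be rewritten as follows. The symbol $\pi$ is introduced here only for this example.

```latex
\begin{aligned}
p(x \mid \pi) &= \pi^{x} (1 - \pi)^{1 - x}, \qquad x \in \{0, 1\} \\
              &= \exp\Big\{ x \log\frac{\pi}{1 - \pi} + \log(1 - \pi) \Big\} \\
              &= h(x) \exp\{ \eta\, t(x) - a(\eta) \},
\end{aligned}
\qquad
h(x) = 1, \quad t(x) = x, \quad \eta = \log\frac{\pi}{1 - \pi}, \quad a(\eta) = \log(1 + e^{\eta}).
```

The last identity uses $1 - \pi = 1 / (1 + e^{\eta})$, so $-\log(1 - \pi) = \log(1 + e^{\eta})$.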


Examples of this kind of model

Bayesian Mixture Models.

Latent Dirichlet Allocation.

Hidden Markov Models (+ many variants).

Kalman filters (+ many variants).

Hierarchical linear regression models.

Hierarchical probit classification models.

Probabilistic factor analysis/matrix factorization models.

Certain Bayesian nonparametric mixture models.


GOAL

Approximate the posterior distribution of the hidden variables given the observations.

$$p(z, \beta \mid x) = \frac{p(x, z, \beta)}{\int p(x, z, \beta)\, dz\, d\beta}$$

The problem is the denominator: it is intractable to compute.


The evidence lower bound (ELBO)

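$$\begin{aligned}
\log p(x) &= \log \int p(x, z, \beta)\, dz\, d\beta \\
          &= \log \int \frac{p(x, z, \beta)}{q(z, \beta)}\, q(z, \beta)\, dz\, d\beta \\
          &= \log \left( E_q\left[ \frac{p(x, z, \beta)}{q(z, \beta)} \right] \right) \\
          &\geq E_q[\log p(x, z, \beta)] - E_q[\log q(z, \beta)] \\
          &\triangleq \mathcal{L}(q).
\end{aligned}$$

$$\mathrm{KL}(q(z, \beta) \,\|\, p(z, \beta \mid x)) = -\mathcal{L}(q) + \text{const.}$$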
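The bound and its relation to the KL divergence can be checked numerically on a toy problem. The sketch below is an illustration under assumptions: the hidden variables are collapsed into a single discrete variable with three states, and the joint values in `joint` are made up for the example. It checks that the gap between log p(x) and the ELBO is exactly the KL divergence from q to the posterior, so the constant in the last equation is log p(x).

```python
import math

# Joint p(x, h) evaluated at the observed x, for a hidden variable h
# with three states (illustrative numbers, not from the slides).
joint = [0.10, 0.25, 0.05]
log_evidence = math.log(sum(joint))            # log p(x)
posterior = [p / sum(joint) for p in joint]    # p(h | x)

def elbo(q):
    """E_q[log p(x, h)] - E_q[log q(h)] for a discrete q."""
    return sum(q_h * (math.log(p_h) - math.log(q_h)) for q_h, p_h in zip(q, joint))

def kl(q, p):
    """KL(q || p) for discrete distributions with full support."""
    return sum(q_h * math.log(q_h / p_h) for q_h, p_h in zip(q, p))

q = [0.5, 0.3, 0.2]
gap = log_evidence - elbo(q)   # equals KL(q || posterior)
```

Setting q to the exact posterior closes the gap, which is why maximizing the ELBO is equivalent to minimizing the KL divergence.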


Mean-field Approximation

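Assumption on $q(z, \beta)$:

$$q(z, \beta) = q(\beta \mid \lambda) \prod_{n=1}^{N} \prod_{j=1}^{J} q(z_{nj} \mid \phi_{nj})$$

with $q(\beta \mid \lambda)$ and $q(z_{nj} \mid \phi_{nj})$ in the same exponential family as the complete conditionals.

$$q(\beta \mid \lambda) = h(\beta) \exp\{\lambda^\top t(\beta) - a_g(\lambda)\}$$

$$q(z_{nj} \mid \phi_{nj}) = h(z_{nj}) \exp\{\phi_{nj}^\top t(z_{nj}) - a_l(\phi_{nj})\}$$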

Easy coordinate ascent algorithm.


Gradient of the ELBO and Coordinate Ascent Inference

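$$\nabla_\lambda \mathcal{L} = \nabla^2_\lambda a_g(\lambda)\, \big(E_q[\eta_g(x, z, \alpha)] - \lambda\big)$$

$$\nabla_{\phi_{nj}} \mathcal{L} = \nabla^2_{\phi_{nj}} a_l(\phi_{nj})\, \big(E_q[\eta_l(x_n, z_{n,-j}, \beta)] - \phi_{nj}\big)$$

Both gradients are zero when

$$\lambda = E_q[\eta_g(x, z, \alpha)]$$

$$\phi_{nj} = E_q[\eta_l(x_n, z_{n,-j}, \beta)]$$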
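The two fixed-point equations give a coordinate ascent algorithm: alternate between the local and the global update. The following is a minimal sketch for one concrete model, chosen here as an assumption: a Gaussian mixture with unit variance, uniform mixing weights, and a Normal(0, `prior_var`) prior on each mean. The function name `cavi_gmm`, the initialization, and all parameter names are hypothetical.

```python
import math
import random

def cavi_gmm(x, num_components=2, prior_var=100.0, num_iters=50):
    """Coordinate ascent for a unit-variance Gaussian mixture with unknown means."""
    # q(mu_k) = Normal(m[k], s2[k]); spread the initial means over the data range.
    lo, hi = min(x), max(x)
    m = [lo + (hi - lo) * k / (num_components - 1) for k in range(num_components)]
    s2 = [1.0] * num_components
    for _ in range(num_iters):
        # Local step: phi[n][k] is proportional to exp(E[mu_k] x_n - E[mu_k^2] / 2).
        phi = []
        for x_n in x:
            logits = [m[k] * x_n - 0.5 * (s2[k] + m[k] ** 2) for k in range(num_components)]
            top = max(logits)
            weights = [math.exp(v - top) for v in logits]
            total = sum(weights)
            phi.append([w / total for w in weights])
        # Global step: set q(mu_k) to the expected complete conditional.
        for k in range(num_components):
            count = sum(p[k] for p in phi)
            weighted_sum = sum(p[k] * x_n for p, x_n in zip(phi, x))
            s2[k] = 1.0 / (1.0 / prior_var + count)
            m[k] = weighted_sum * s2[k]
    return m, s2

rng = random.Random(0)
data = [rng.gauss(-3.0, 1.0) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(200)]
means, variances = cavi_gmm(data)
```

Note that every iteration loops over the whole data set before the global parameters change, which is the scaling problem the rest of the talk addresses.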


The gradient, if it exists, points in the direction of steepest ascent,

$$\arg\max_{d\lambda} f(\lambda + d\lambda) \quad \text{subject to } \|d\lambda\|^2 < \epsilon$$

for small $\epsilon$. The gradient depends on the Euclidean distance metric in the parameter space.

For probability distributions, the Euclidean metric on the parameters can be a bad measure of dissimilarity.


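The natural gradient accounts for the information geometry of its parameter space.

Symmetrized KL divergence

Natural measure of dissimilarity between probability distributions

$$D^{\text{sym}}_{KL}(\lambda, \lambda') = E_{\lambda}\left[\log \frac{q(\beta \mid \lambda)}{q(\beta \mid \lambda')}\right] + E_{\lambda'}\left[\log \frac{q(\beta \mid \lambda')}{q(\beta \mid \lambda)}\right]$$

Using this distance, the direction of steepest ascent is

$$\arg\max_{d\lambda} f(\lambda + d\lambda) \quad \text{subject to } D^{\text{sym}}_{KL}(\lambda, \lambda + d\lambda) < \epsilon$$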
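A small numerical comparison shows why the Euclidean metric can mislead. The sketch below uses univariate Gaussians parameterized by (mean, variance); the two pairs of distributions and the helper names are illustrative assumptions. The pair that is far apart in parameter space is nearly indistinguishable as distributions, while the pair that is close in parameter space barely overlaps.

```python
import math

def kl_gauss(mu1, var1, mu2, var2):
    """KL(N(mu1, var1) || N(mu2, var2)) for univariate Gaussians."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def sym_kl(p, q):
    """Symmetrized KL between two (mean, variance) pairs."""
    return kl_gauss(*p, *q) + kl_gauss(*q, *p)

def euclid(p, q):
    """Euclidean distance between the two parameter vectors."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Two very wide Gaussians whose means differ by 10.
wide = sym_kl((0.0, 10000.0), (10.0, 10000.0))
euclid_wide = euclid((0.0, 10000.0), (10.0, 10000.0))

# Two very narrow Gaussians whose means differ by 0.1.
narrow = sym_kl((0.0, 0.01), (0.1, 0.01))
euclid_narrow = euclid((0.0, 0.01), (0.1, 0.01))
```

Here the Euclidean distance is 100 times larger for the wide pair, while the symmetrized KL divergence is 100 times larger for the narrow pair.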


Natural Gradient

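The natural gradient points in the direction of steepest ascent in the Riemannian space.

$$\hat{\nabla}_\lambda f(\lambda) = G(\lambda)^{-1} \nabla_\lambda f(\lambda)$$

where $G(\lambda) = E_\lambda\left[(\nabla_\lambda \log q(\beta \mid \lambda)) (\nabla_\lambda \log q(\beta \mid \lambda))^\top\right]$ is the Fisher information matrix of $q(\beta \mid \lambda)$.

For the exponential family: $G(\lambda) = \nabla^2_\lambda a_g(\lambda)$

For our mean-field model:

$$\hat{\nabla}_\lambda \mathcal{L} = E_\phi[\eta_g(x, z, \alpha)] - \lambda$$

$$\hat{\nabla}_{\phi_{nj}} \mathcal{L} = E_{\lambda, \phi_{n,-j}}[\eta_l(x_n, z_{n,-j}, \beta)] - \phi_{nj}$$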
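The identity between the Fisher information and the Hessian of the log normalizer can be checked numerically in the simplest case. This sketch assumes a Bernoulli distribution in its natural parameterization, $q(\beta \mid \lambda) = \exp\{\lambda \beta - a(\lambda)\}$ with $a(\lambda) = \log(1 + e^{\lambda})$; the function names are hypothetical. The Fisher information is computed from its definition as the expected squared score, and the second derivative of the log normalizer by central finite differences.

```python
import math

def log_normalizer(lam):
    """a(lam) = log(1 + exp(lam)) for the Bernoulli in natural form."""
    return math.log1p(math.exp(lam))

def fisher_information(lam):
    """E[(d/dlam log q(beta | lam))^2], with the expectation over beta in {0, 1}."""
    mean = 1.0 / (1.0 + math.exp(-lam))   # E[beta] = sigmoid(lam)
    # The score is beta - mean; average its square under q.
    return (1.0 - mean) * (0.0 - mean) ** 2 + mean * (1.0 - mean) ** 2

def second_derivative(f, lam, step=1e-4):
    """Central finite-difference estimate of f''(lam)."""
    return (f(lam + step) - 2.0 * f(lam) + f(lam - step)) / step ** 2

lam = 0.7
g_from_score = fisher_information(lam)
g_from_hessian = second_derivative(log_normalizer, lam)
```

Because G equals the Hessian that premultiplies the ordinary gradient of the ELBO, multiplying by its inverse cancels that factor, which is where the simple form of the natural gradient comes from.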


Why Natural Gradients?

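Traditional Gradients

$$\nabla_\lambda \mathcal{L} = \nabla^2_\lambda a_g(\lambda)\, \big(E_q[\eta_g(x, z, \alpha)] - \lambda\big)$$

$$\nabla_{\phi_{nj}} \mathcal{L} = \nabla^2_{\phi_{nj}} a_l(\phi_{nj})\, \big(E_q[\eta_l(x_n, z_{n,-j}, \beta)] - \phi_{nj}\big)$$

Natural Gradients

$$\hat{\nabla}_\lambda \mathcal{L} = E_\phi[\eta_g(x, z, \alpha)] - \lambda$$

$$\hat{\nabla}_{\phi_{nj}} \mathcal{L} = E_{\lambda, \phi_{n,-j}}[\eta_l(x_n, z_{n,-j}, \beta)] - \phi_{nj}$$

A coordinate ascent update is equal to taking a natural gradient step of length one.

Natural gradients are easier to compute. Use them to develop scalable variational inference algorithms.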
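The step-of-length-one claim follows in one line from the natural gradient of the global parameter. The step size $\rho$ is notation introduced for this derivation.

```latex
\lambda^{\text{new}}
  = \lambda + \rho\, \hat{\nabla}_\lambda \mathcal{L}
  = \lambda + \rho \left( E_\phi[\eta_g(x, z, \alpha)] - \lambda \right)
  = (1 - \rho)\, \lambda + \rho\, E_\phi[\eta_g(x, z, \alpha)],
\qquad
\rho = 1 \;\Rightarrow\; \lambda^{\text{new}} = E_\phi[\eta_g(x, z, \alpha)].
```

With $\rho = 1$ the result is exactly the coordinate ascent update for $\lambda$; smaller steps blend the old value with that update.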


Stochastic Optimization

We have a random function $B(\lambda)$ with $E_q[B(\lambda)] = \nabla_\lambda f(\lambda)$. We can optimize $f(\lambda)$ iteratively as

$$\lambda^{(t)} = \lambda^{(t-1)} + \rho_t\, b_t(\lambda^{(t-1)})$$

where $b_t$ is an independent draw from $B$. The sequence of step sizes $\rho_t$ must satisfy the Robbins-Monro conditions, $\sum_t \rho_t = \infty$ and $\sum_t \rho_t^2 < \infty$.

Follow noisy estimates of the gradient with a decreasing step size.

If the gradient can be written as a sum of terms (one per data point), a fast noisy approximation can be computed by subsampling the data.

$\lambda^{(t)}$ will converge to the optimal $\lambda^*$ (if $f$ is convex) or to a local optimum of $f$ (if $f$ is not convex).
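The iteration above can be sketched on a toy objective. Everything specific here is an assumption made for illustration: the objective is $f(\lambda) = -\frac{1}{2N}\sum_n (x_n - \lambda)^2$, whose maximizer is the data mean, and the step size schedule $\rho_t = (t + \tau)^{-\kappa}$ with $\kappa \in (0.5, 1]$ is one common choice that satisfies the Robbins-Monro conditions. The gradient is an average of per-point terms, so a single sampled point gives an unbiased noisy estimate.

```python
import random

def step_size(t, tau=1.0, kappa=0.7):
    """rho_t = (t + tau)^(-kappa); kappa in (0.5, 1] satisfies Robbins-Monro."""
    return (t + tau) ** (-kappa)

def stochastic_ascent(data, num_steps=20000, seed=0):
    """Maximize f(lam) = -(1 / 2N) * sum_n (x_n - lam)^2 from one point per step."""
    rng = random.Random(seed)
    lam = 0.0
    for t in range(1, num_steps + 1):
        x_i = rng.choice(data)
        # Noisy gradient: E[x_i - lam] = mean(data) - lam, the exact gradient of f.
        noisy_gradient = x_i - lam
        lam += step_size(t) * noisy_gradient
    return lam

rng = random.Random(1)
data = [rng.gauss(5.0, 2.0) for _ in range(10000)]
estimate = stochastic_ascent(data)
exact = sum(data) / len(data)
```

Each step costs one data point regardless of the size of `data`, and the decreasing step size averages the noise away over time.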


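$$\mathcal{L}(\lambda) = \overbrace{E_q[\log p(\beta)] - E_q[\log q(\beta)]}^{\text{global}} + \underbrace{\sum_{n=1}^{N} \max_{\phi_n} \big(E_q[\log p(x_n, z_n \mid \beta)] - E_q[\log q(z_n)]\big)}_{\text{sum of local}}$$

We choose $I \sim \mathrm{Unif}(1, \dots, N)$ and define $\mathcal{L}_I(\lambda)$ as the random function

$$\mathcal{L}_I(\lambda) = E_q[\log p(\beta)] - E_q[\log q(\beta)] + N \max_{\phi_I} \big(E_q[\log p(x_I, z_I \mid \beta)] - E_q[\log q(z_I)]\big)$$

The expectation of $\mathcal{L}_I$ is equal to the objective, and consequently $\hat{\nabla}_\lambda \mathcal{L}_I$ is a noisy but unbiased estimate of the natural gradient of the objective.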
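The unbiasedness claim can be verified directly by averaging over the uniform index $I$. The shorthand $\ell_n(\lambda)$ for the $n$-th local term is notation introduced here.

```latex
\ell_n(\lambda) \triangleq \max_{\phi_n} \big( E_q[\log p(x_n, z_n \mid \beta)] - E_q[\log q(z_n)] \big),
\qquad
E_I[\mathcal{L}_I(\lambda)]
  = E_q[\log p(\beta)] - E_q[\log q(\beta)] + \sum_{n=1}^{N} \frac{1}{N} \cdot N\, \ell_n(\lambda)
  = \mathcal{L}(\lambda).
```

The factor $N$ in $\mathcal{L}_I$ exactly compensates for the probability $1/N$ of selecting each data point.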


Stochastic Optimization for global parameters

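$$\hat{\nabla}_\lambda \mathcal{L}_i = E_q\left[\eta_g(x_i^{(N)}, z_i^{(N)}, \alpha)\right] - \lambda$$

Here $x_i^{(N)}$ and $z_i^{(N)}$ denote a data set formed by $N$ replicates of $x_i$ and $z_i$.

$$\eta_g(x_i^{(N)}, z_i^{(N)}, \alpha) = \alpha + N \cdot (t(x_i, z_i), 1)$$

$$\hat{\nabla}_\lambda \mathcal{L}_i = \alpha + N \cdot \big(E_q[t(x_i, z_i)], 1\big) - \lambda$$

Using stochastic optimization

$$\hat{\lambda}_t \triangleq \alpha + N\, E_{\phi(\lambda)}\big[(t(x_i, z_i), 1)\big]$$

$$\lambda^{(t)} = \lambda^{(t-1)} + \rho_t \big(\hat{\lambda}_t - \lambda^{(t-1)}\big) = (1 - \rho_t)\, \lambda^{(t-1)} + \rho_t\, \hat{\lambda}_t$$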
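The global update can be sketched end to end on a deliberately simple conjugate model. The model is an assumption made to keep the example short: a Beta prior on a Bernoulli mean with no local hidden variables, so the expectation over local variables disappears and the sufficient statistic of a point is the count pair (x, 1 - x). The function name, the step size schedule, and the defaults are hypothetical. Each step samples one point, forms the intermediate parameter as if that point were replicated N times, and blends it with the current value.

```python
import random

def svi_beta_bernoulli(data, alpha=(1.0, 1.0), num_steps=20000,
                       tau=1.0, kappa=0.7, seed=0):
    """Stochastic variational inference for beta ~ Beta(alpha), x_n ~ Bernoulli(beta)."""
    rng = random.Random(seed)
    n_total = len(data)
    lam = list(alpha)
    for t in range(1, num_steps + 1):
        x_i = rng.choice(data)
        # Intermediate parameter: the posterior if x_i were replicated N times.
        lam_hat = (alpha[0] + n_total * x_i, alpha[1] + n_total * (1 - x_i))
        rho = (t + tau) ** (-kappa)
        # lam <- (1 - rho) * lam + rho * lam_hat
        lam = [(1.0 - rho) * old + rho * new for old, new in zip(lam, lam_hat)]
    return lam

rng = random.Random(1)
data = [1 if rng.random() < 0.3 else 0 for _ in range(1000)]
lam = svi_beta_bernoulli(data)
exact = (1.0 + sum(data), 1.0 + len(data) - sum(data))  # exact conjugate posterior
```

In this model the exact posterior is available in closed form, so `exact` serves as a reference: the posterior mean implied by `lam` should end up close to the one implied by `exact`, up to the noise left by the final step sizes.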


Stochastic Variational Inference


Extensions

Minibatches

Pick more than one data point each time,

$$\lambda^{(t)} = (1 - \rho_t)\, \lambda^{(t-1)} + \frac{\rho_t}{S} \sum_{s} \hat{\lambda}_s.$$
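A minimal sketch of the minibatch update, under the same kind of simplifying assumption as a Beta prior on a Bernoulli mean with no local hidden variables (an illustrative model, not one from the slides). The helper names `intermediate` and `minibatch_step` are hypothetical. Each sampled point contributes its own intermediate parameter, and the update uses their average.

```python
def intermediate(x_s, alpha, n_total):
    """lam_hat_s: the posterior parameter if x_s were replicated n_total times."""
    return (alpha[0] + n_total * x_s, alpha[1] + n_total * (1 - x_s))

def minibatch_step(lam, batch, alpha, n_total, rho):
    """lam <- (1 - rho) * lam + (rho / S) * sum_s lam_hat_s."""
    hats = [intermediate(x_s, alpha, n_total) for x_s in batch]
    avg = [sum(h[i] for h in hats) / len(hats) for i in range(len(lam))]
    return [(1.0 - rho) * old + rho * new for old, new in zip(lam, avg)]

# With the whole data set as the batch and rho = 1, one step recovers
# the exact conjugate posterior Beta(1 + 2, 1 + 2).
full_batch = minibatch_step([1.0, 1.0], [1, 0, 0, 1], alpha=(1.0, 1.0), n_total=4, rho=1.0)
```

Averaging over S points lowers the variance of the noisy estimate at the cost of S times more work per step.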

Empirical Bayes estimation of hyperparameters

Get a point estimate of the value of the hyperparameters $\alpha$:

$$\alpha^{(t)} = \alpha^{(t-1)} + \rho_t\, \nabla_\alpha \mathcal{L}_t(\lambda^{(t-1)}, \phi, \alpha^{(t-1)}).$$


Topic Models

Observations:

Words: $w_{dn}$ is the nth word in the dth document, an element of a fixed vocabulary of V terms.

Latent Variables:

A topic $\beta_k$ is a distribution over the vocabulary, a point in the (V − 1)-simplex.

Topic proportions $\theta_d$ are associated with each document; each is a distribution over topics.

Each word in each document comes from a single topic. Topic assignments $z_{dn}$ are topic indices.

Consider two models: Latent Dirichlet Allocation (LDA) has a fixed number K of topics. The Hierarchical Dirichlet Process (HDP) has an infinite number of topics.


Analyzing the documents

Posterior inference of p(β, θ, z|w)


Generative model

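1 Draw topics $\beta_k \sim \mathrm{Dirichlet}(\eta, \dots, \eta)$ for $k \in \{1, \dots, K\}$.

2 For each document $d \in \{1, \dots, D\}$:

    1 Draw topic proportions $\theta_d \sim \mathrm{Dirichlet}(\alpha, \dots, \alpha)$.

    2 For each word $n \in \{1, \dots, N\}$:

        1 Draw topic assignment $z_{dn} \sim \mathrm{Multinomial}(\theta_d)$.

        2 Draw word $w_{dn} \sim \mathrm{Multinomial}(\beta_{z_{dn}})$.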
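The generative process translates almost line by line into a sampler. The sketch below uses only the Python standard library; the function names, the default hyperparameter values, and the choice to draw a Dirichlet sample by normalizing Gamma draws are assumptions made for this illustration.

```python
import random

def sample_dirichlet(rng, concentration, dim):
    """Symmetric Dirichlet sample obtained by normalizing Gamma(concentration, 1) draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [g / total for g in draws]

def sample_lda(num_docs, words_per_doc, num_topics, vocab_size,
               eta=0.5, alpha=0.5, seed=0):
    """Sample topics and documents following the LDA generative process."""
    rng = random.Random(seed)
    # 1. Draw each topic beta_k, a distribution over the vocabulary.
    topics = [sample_dirichlet(rng, eta, vocab_size) for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        # 2.1 Draw the topic proportions theta_d of this document.
        theta = sample_dirichlet(rng, alpha, num_topics)
        words = []
        for _ in range(words_per_doc):
            # 2.2.1 Draw the topic assignment z_dn, then 2.2.2 the word w_dn.
            z = rng.choices(range(num_topics), weights=theta)[0]
            w = rng.choices(range(vocab_size), weights=topics[z])[0]
            words.append(w)
        docs.append(words)
    return topics, docs

topics, docs = sample_lda(num_docs=5, words_per_doc=50, num_topics=3, vocab_size=20)
```

The topics are the global hidden variables shared by all documents; the proportions and assignments are local to each document.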


Variational Inference in LDA

Mean-field for LDA

q(zdn) = Multinomial(φdn)

q(θd) = Dirichlet(γd)

q(βk) = Dirichlet(λk)

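1 Update the local variational parameters of each document d:

$$\phi^{k}_{dn} \propto \exp\Big\{\Psi(\gamma_{dk}) + \Psi(\lambda_{k, w_{dn}}) - \Psi\Big(\sum_{v} \lambda_{kv}\Big)\Big\} \quad \text{for } n \in \{1, \dots, N\}$$

$$\gamma_d = \alpha + \sum_{n=1}^{N} \phi_{dn}$$

2 Update the global parameters:

$$\lambda_k = \eta + \sum_{d=1}^{D} \sum_{n=1}^{N} \phi^{k}_{dn}\, w_{dn}$$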
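The local step for a single document can be sketched as follows. This is an illustration under assumptions: documents are lists of integer word ids, `lam` is a K x V list of lists, the digamma function Ψ is implemented by hand (a recurrence plus a truncated asymptotic series) to keep the example dependency-free, and the names `digamma` and `local_step` are hypothetical. The loop alternates the two local updates for a fixed number of iterations.

```python
import math

def digamma(x):
    """Digamma function for x > 0: shift x above 6, then use an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x      # Psi(x) = Psi(x + 1) - 1 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series

def local_step(doc, lam, alpha, num_iters=50):
    """Update phi (one row per word) and gamma for one document, holding lam fixed."""
    num_topics = len(lam)
    # Psi(lam_kv) - Psi(sum_v lam_kv), computed once per topic row.
    log_beta = []
    for row in lam:
        row_total = digamma(sum(row))
        log_beta.append([digamma(v) - row_total for v in row])
    gamma = [alpha + len(doc) / num_topics] * num_topics
    phi = []
    for _ in range(num_iters):
        phi = []
        for w in doc:
            logits = [digamma(gamma[k]) + log_beta[k][w] for k in range(num_topics)]
            top = max(logits)
            weights = [math.exp(v - top) for v in logits]
            total = sum(weights)
            phi.append([v / total for v in weights])
        # gamma_d = alpha + sum_n phi_dn
        gamma = [alpha + sum(p[k] for p in phi) for k in range(num_topics)]
    return phi, gamma

lam = [[10.0, 10.0, 1.0, 1.0], [1.0, 1.0, 10.0, 10.0]]
phi, gamma = local_step(doc=[0, 1, 0, 1, 1], lam=lam, alpha=0.5)
```

In the batch algorithm this step runs for every document before the global update; the global update then adds each word's `phi` row to the columns of `lam` indexed by that word.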


Stochastic Variational Inference in LDA


Results LDA

DATA

Nature: 350k docs, 58M words, 4200 terms.

New York Times: 1.8M docs, 461M words, 8000 terms.

Wikipedia: 3.8M docs, 482M words, 7700 terms.

* Batch Variational uses a subset of 100k docs.


Results HDP


Some Bibliography

Main Paper

Hoffman, M. D., Blei, D. M., Wang, C., and Paisley, J. (2013). “Stochastic variational inference”. The Journal of Machine Learning Research, 14(1), 1303-1347.

Other References

Blei, D. M. “Variational Inference”. Lecture Notes of COS597C: Advanced Methods in Probabilistic Modeling, Princeton University, fall 2011, www.cs.princeton.edu/courses/archive/fall11/cos597C/lectures/variational-inference-i.pdf.

Blei, D. M. “Exponential Families”. Lecture Notes of COS597C: Advanced Methods in Probabilistic Modeling, Princeton University, fall 2011, www.cs.princeton.edu/courses/archive/fall11/cos597C/lectures/exponential-families.pdf.
