Stochastic Gradient VB and the Variational Auto-Encoder

Transcript
Page 1:

Stochastic Gradient VB and the Variational Auto-Encoder

Durk Kingma, Ph.D. candidate (2nd year), advised by Max Welling

Kingma, Diederik P., and Max Welling. "Stochastic Gradient VB and the variational auto-encoder." (arXiv)

Quite similar: Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. "Stochastic Backpropagation and Variational Inference in Deep Latent Gaussian Models." (arXiv)

Page 2:

Contents

● Stochastic Variational Inference and learning
  – SGVB algorithm
● Variational auto-encoder
  – Experiments
● Reparameterizations
  – Effect on posterior correlations

Page 3:

General setup

● Setup:

– x : observed variables

– z : unobserved/latent variables

– θ : model parameters

– pθ(x,z): joint PDF

● Factorized, differentiable
● Factors can be anything, e.g. neural nets

● We want:

– Fast approximate posterior inference p(z|x)

– Learn the parameters θ (e.g. MAP estimate)

Example:

Page 4:

Example

Page 5:

Learning

– Regular EM requires tractable pθ(z|x)

– Monte Carlo EM (MCEM) requires sampling from the posterior (slow....)

– Mean-field VB requires closed-form solutions to certain expectations of the joint PDF

Page 6:

Naive pure MAP optimization approach

Overfits when the latent space is high-dimensional

Page 7:

Novel approach: Stochastic Gradient VB

● Optimizes a lower bound of the marginal likelihood of the data
● Scales to very large datasets
● Scales to high-dimensional latent spaces
● Simple
● Fast!
● Applies to almost any normalized model with continuous latent variables

Page 8:

The Variational Bound

● We introduce the variational approximation:

– Distribution can be almost anything (we use Gaussian)

– Will approximate the true (but intractable) posterior

Marginal likelihood can be written as:

This bound is exactly what we want to optimize (w.r.t. φ and θ)!
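As a hedged reconstruction of the decomposition the slide refers to (the standard variational identity, written in the notation above):

    log pθ(x) = D_KL( qφ(z|x) || pθ(z|x) ) + L(θ, φ; x)
    L(θ, φ; x) = E_qφ(z|x)[ log pθ(x, z) − log qφ(z|x) ]

Since the KL term is non-negative, L(θ, φ; x) is a lower bound on the marginal log-likelihood, and tightening the bound amounts to making qφ(z|x) close to the true posterior.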

Page 9:

“Naive” Monte Carlo estimator of the bound

Problem: not appropriate for differentiation w.r.t. φ! (Cannot differentiate through the sampling process.)
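A hedged sketch of the naive estimator in question: draw z^(l) ~ qφ(z|x) and average,

    L(θ, φ; x) ≈ (1/L) Σ_{l=1..L} [ log pθ(x, z^(l)) − log qφ(z^(l)|x) ]

The samples z^(l) themselves depend on φ through the sampling procedure, so the gradient w.r.t. φ cannot simply be pushed inside the average; score-function (REINFORCE-style) workarounds exist but can suffer from high variance, as noted below.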

Recently proposed solutions (2013)

– Michael Jordan / David Blei (very high variance)

– Tim Salimans (2013) (only applies to exponential-family q)

– Rajesh Ranganath et al,“Black Box Variational Inference”, arXiv 2014

– Andriy Mnih & Karol Gregor, “Neural Variational Inference and Learning”, arXiv 2014

Page 10:

Key “reparameterization trick”

Example | qφ(z) | p(ε) | g(φ, ε) | Also...

Normal dist. | z ~ N(μ,σ) | ε ~ N(0,1) | z = μ + σ * ε | Location-scale family: Laplace, Elliptical, Student's t, Logistic, Uniform, Triangular, ...

Exponential | z ~ Exp(λ) | ε ~ U(0,1) | z = -log(1 − ε)/λ | Invertible CDF: Cauchy, Logistic, Rayleigh, Pareto, Weibull, Reciprocal, Gompertz, Gumbel, Erlang, ...

Other | z ~ logN(μ,σ) | ε ~ N(0,1) | z = exp(μ + σ * ε) | Gamma, Dirichlet, Beta, Chi-Squared, and F distributions

Alternative way of sampling from qφ(z):

1. Choose some ε ~ p(ε) (independent of φ!)

2. Choose some z = g(φ, ε)

Such that z ~ qφ(z) (the correct distribution)
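A minimal numpy sketch of the mappings in the table above (the function names are illustrative, not from the paper's code):

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_normal(mu, sigma):
        # z ~ N(mu, sigma) via z = mu + sigma * eps, with eps ~ N(0, 1)
        eps = rng.standard_normal(np.shape(mu))
        return mu + sigma * eps

    def sample_exponential(lam):
        # z ~ Exp(lam) via the inverse CDF: z = -log(1 - eps) / lam, eps ~ U(0, 1)
        eps = rng.uniform(size=np.shape(lam))
        return -np.log(1.0 - eps) / lam

    def sample_lognormal(mu, sigma):
        # z ~ logN(mu, sigma) via z = exp(mu + sigma * eps), eps ~ N(0, 1)
        eps = rng.standard_normal(np.shape(mu))
        return np.exp(mu + sigma * eps)

In every case ε is drawn independently of the parameters, so z depends on (μ, σ) or λ only through the deterministic map g, and gradients w.r.t. those parameters are well defined.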

Page 11:

SGVB estimator

Really simple and appropriate for differentiation w.r.t. φ and θ!
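A hedged reconstruction of the estimator (one data point, L samples): with ε^(l) ~ p(ε) and z^(l) = g(φ, ε^(l)),

    L~(θ, φ; x) = (1/L) Σ_{l=1..L} [ log pθ(x, z^(l)) − log qφ(z^(l)|x) ]

The randomness now enters only through ε, which does not depend on φ, so the expression is a deterministic, differentiable function of θ and φ and its gradients can be computed by plain backpropagation.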

Page 12:

Basic SGVB Algorithm (L=1)

repeat

until convergence

Torch7 / Theano
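A hedged numpy-style sketch of this loop for a single data point; grad_bound is a hypothetical stand-in for whatever the autodiff framework (Torch7 or Theano above) provides:

    import numpy as np

    rng = np.random.default_rng(0)

    def sgvb_step(x, theta, phi, grad_bound, lr=1e-3):
        # phi holds the (local) variational parameters for this data point,
        # e.g. phi = {"mu": ..., "sigma": ...} for a Gaussian q.
        eps = rng.standard_normal(phi["mu"].shape)   # eps ~ p(eps), independent of phi
        # grad_bound is assumed to return the gradients of
        #   log p_theta(x, g(phi, eps)) - log q_phi(g(phi, eps) | x)
        # w.r.t. theta and phi (e.g. via Theano's automatic differentiation).
        g_theta, g_phi = grad_bound(x, eps, theta, phi)
        # Stochastic gradient *ascent* on the lower bound.
        theta = {k: v + lr * g_theta[k] for k, v in theta.items()}
        phi = {k: v + lr * g_phi[k] for k, v in phi.items()}
        return theta, phi

    # repeat sgvb_step over data points until convergence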

Page 13:

“Auto-Encoding” VB: efficient on-line version of SGVB

● Special case of SGVB: large i.i.d. dataset (large N) => many variational parameters to learn

● Solution:

– Use a conditional q(z|x), parameterized by a neural network

– Doubly stochastic optimization (see the sketch after the algorithm below)

Avoid local (per-datapoint) parameters!

Page 14:

“Auto-encoding” Stochastic VB (L=1)

repeat

until convergence

Scales to very large datasets!
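A hedged sketch of the on-line, doubly stochastic version: the per-datapoint variational parameters are replaced by one shared encoder network that outputs them, and each update is stochastic both in the choice of minibatch and in ε (the encoder and grad_elbo functions are illustrative placeholders):

    import numpy as np

    rng = np.random.default_rng(0)

    def aevb_step(X, theta, phi, encoder, grad_elbo, batch_size=100, lr=1e-3):
        # Random minibatch: first source of stochasticity.
        idx = rng.choice(len(X), size=batch_size, replace=False)
        x = X[idx]
        # Amortized inference: the encoder (weights phi) maps x to q(z|x);
        # no local, per-datapoint variational parameters are stored.
        mu, sigma = encoder(x, phi)
        # Reparameterized sample: second source of stochasticity.
        eps = rng.standard_normal(mu.shape)
        z = mu + sigma * eps
        # grad_elbo is assumed to return gradients of the minibatch bound
        # estimate w.r.t. theta and phi (e.g. computed with Theano).
        g_theta, g_phi = grad_elbo(x, z, mu, sigma, theta, phi)
        theta = {k: v + lr * g_theta[k] for k, v in theta.items()}
        phi = {k: v + lr * g_phi[k] for k, v in phi.items()}
        return theta, phi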

Page 15:

Experiments with “variational auto-encoder”

[Figure: generative net p(x|z) and recognition net q(z|x) with hidden layers h1, h2; the latent code is computed as z = g(φ, ε, x) with injected noise ε ~ p(ε)]

Generative model p(x|z) (neural net)

Posterior approximation q(z|x) (neural net)

Objective:

(noisy) negative reconstruction error + regularization term
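A hedged numpy sketch of this objective for one data point, assuming (as in the MNIST experiments) a Gaussian q(z|x), a standard-normal prior on z, and a Bernoulli decoder; decode is a placeholder for the generative neural net:

    import numpy as np

    def vae_bound(x, mu, log_sigma2, decode, rng=None):
        # mu, log_sigma2: encoder outputs defining q(z|x) = N(mu, exp(log_sigma2))
        rng = rng if rng is not None else np.random.default_rng(0)
        sigma = np.exp(0.5 * log_sigma2)
        eps = rng.standard_normal(mu.shape)
        z = mu + sigma * eps                       # reparameterized sample

        # (noisy) negative reconstruction error: log p(x|z) for Bernoulli pixels
        y = decode(z)                              # Bernoulli means in (0, 1)
        log_px_given_z = np.sum(x * np.log(y) + (1 - x) * np.log(1 - y))

        # regularization term: -KL( q(z|x) || N(0, I) ), available in closed form
        neg_kl = 0.5 * np.sum(1.0 + log_sigma2 - mu**2 - np.exp(log_sigma2))

        return log_px_given_z + neg_kl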

Page 16:

Experiments

Page 17:

Results: Marginal likelihood lower bound

Page 18:

Results: Marginal log-likelihood

MCEM does not scale well to large datasets

Page 19:

Robustness to high-dimensional latent space

Page 20:

Page 21:

Learned 2D manifolds

Page 22:

Learned 3D manifold

Page 23:

Samples from MNIST

Page 24:

Reparameterizations of latent variables

Page 25:

Reparameterization of continuous latent variables

● Alternative parameterization of latent variables.

Choose some:

– ε ~ p(ε)

– z = g(φ, ε) (invertible)

– z | pa ~ p(z | pa) (the correct distribution; pa denotes the parents of z)

● z's become deterministic given ε's

● ε's are a priori independent

Large difference in posterior dependencies and efficiency

Centered form vs. non-centered form (a neural net with injected noise)
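A minimal numpy illustration of the two forms for a single Gaussian latent variable z with parent pa, where z | pa ~ N(f(pa), σ²); the model and function names are hypothetical, not the one from the slides:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_centered(pa, f, sigma):
        # Centered form: draw z directly from its conditional p(z | pa).
        return rng.normal(loc=f(pa), scale=sigma)

    def sample_noncentered(pa, f, sigma):
        # Non-centered form: z is a deterministic function of its parent and
        # a-priori independent injected noise ("neural net with injected noise").
        eps = rng.standard_normal()
        return f(pa) + sigma * eps

Both give z the same conditional distribution, but in the non-centered form all randomness lives in the independent ε's, which changes the posterior dependency structure and hence the efficiency of sampling or gradient-based inference.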

Page 26:

Experiment: MCMC sampling in a DBN

[Figure: MCMC samples and their autocorrelation for the centered vs. the non-centered form; the non-centered form mixes fast, the centered form terribly slowly]

Page 27:

For more information and analysis, see:

Diederik P. Kingma and Max Welling, “Efficient Gradient-Based Inference through Transformations between Bayes Nets and Neural Nets”

Page 28:

Conclusion

● SGVB: efficient stochastic variational algorithm for inference and learning with continuous latent variables.

Theano and pure numpy implementations:

https://github.com/y0ast/Variational-Autoencoder.git

(includes scikit-learn wrappers)

Thanks!

Page 29:

Appendix: The regular SVB gradient estimator