Auto-encoding variational Bayes
Diederik P. Kingma, Max Welling
Presented by: Mehdi Cherti (LAL/CNRS)
9th May 2015
What is a generative model?
A model of how the data X was generated
Typically, the purpose is to find a model for p(x) or p(x, y)
y can be a set of latent (hidden) variables, or a set of output variables for discriminative problems
Training generative models
Typically, we assume a parametric form of the probability density:
p(x|θ)
Given an i.i.d. dataset X = (x^(1), x^(2), ..., x^(N)), we typically do one of:
Maximum likelihood (ML): argmax_θ p(X|θ)
Maximum a posteriori (MAP): argmax_θ p(X|θ) p(θ)
Bayesian inference: p(θ|X) = p(X|θ) p(θ) / ∫ p(X|θ) p(θ) dθ
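To make these objectives concrete, here is a minimal Python sketch (example mine, not from the slides) comparing the ML and MAP estimates of the mean of a Gaussian with known variance; the N(0, τ²) prior and all numbers are illustrative.

# Minimal sketch (illustrative, not from the slides): ML vs. MAP
# estimation of the mean theta of a Gaussian with known variance.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=1.0, size=50)  # i.i.d. data, true mean = 2

# ML: argmax_theta p(X|theta) -> the sample mean (closed form)
theta_ml = X.mean()

# MAP with a N(0, tau^2) prior on theta: argmax_theta p(X|theta) p(theta)
# -> the posterior mode, which shrinks the estimate toward the prior mean 0
sigma2, tau2, n = 1.0, 1.0, len(X)
theta_map = X.sum() / (n + sigma2 / tau2)

print(theta_ml, theta_map)  # MAP is pulled slightly toward 0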
The problem
Let x be the observed variables.
We assume a latent representation z.
We define p(z) and p(x|z).
We want to design a generative model where:
p(x) = ∫ p(x|z) p(z) dz is intractable,
p(z|x) = p(x|z) p(z) / p(x) is intractable,
and we have large datasets, so we want to avoid sampling-based training procedures (e.g. MCMC).
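A toy illustration (model and numbers mine) of why the marginal is the hard part: the naive Monte Carlo estimator p(x) ≈ (1/S) Σ_s p(x|z_s) with z_s ~ p(z) is unbiased, but in a high-dimensional latent space almost no prior samples explain a given x, so its variance blows up.

# Toy sketch (assumptions mine): naive Monte Carlo for p(x) = ∫ p(x|z) p(z) dz
# with z ~ N(0, 1) and x|z ~ N(z, 1). Works in 1-D, but the variance of this
# estimator grows quickly with the latent dimension, which is why sampling
# from the prior is not a practical training procedure.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = 1.5
z = rng.normal(size=10_000)                   # z_s ~ p(z)
p_x = norm.pdf(x, loc=z, scale=1.0).mean()    # (1/S) * sum_s p(x|z_s)

# Exact answer for this toy model: x = z + noise  =>  p(x) = N(0, 2)
print(p_x, norm.pdf(x, loc=0.0, scale=np.sqrt(2.0)))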
The proposed solution
They propose:
a fast training procedure that estimates the parameters θ: for data generation
an approximation of the posterior p(z|x): for data representation
an approximation of the marginal p(x): for model evaluation and as a prior for other tasks
Formulation of the problem
The generative process consists of sampling z from p(z), then x from p(x|z). Let's define:
a prior over the latent representation, p(z),
a decoder, p(x|z).
We want to maximize the log-likelihood of the data (x^(1), x^(2), ..., x^(N)):
log p(x^(1), x^(2), ..., x^(N)) = Σ_i log p(x^(i))
and be able to do inference: p(z|x).
The variational lower bound
We will learn an approximation q_φ(z|x) of the true posterior p(z|x) by maximizing a lower bound on the log-likelihood of the data.
We can write: log p(x) = D_KL(q_φ(z|x) || p(z|x)) + L(θ, φ; x), where:
L(θ, φ; x) = E_{q_φ(z|x)}[log p_θ(x, z) − log q_φ(z|x)]
L(θ, φ; x) is called the variational lower bound, and the goal is to maximize it w.r.t. all the parameters (θ, φ).
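This decomposition can be checked numerically on a toy model where everything is Gaussian and exact (the model and the choice of q are mine): with z ~ N(0, 1) and x|z ~ N(z, 1), the marginal is p(x) = N(0, 2) and the true posterior is p(z|x) = N(x/2, 1/2).

# Toy sketch (illustrative): check log p(x) = D_KL(q || p(z|x)) + L(theta, phi; x)
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = 1.5
m, s = 0.3, 0.9                            # an arbitrary Gaussian q(z|x) = N(m, s^2)

z = rng.normal(m, s, size=200_000)         # z ~ q(z|x)
log_joint = norm.logpdf(z) + norm.logpdf(x, loc=z)   # log p(x, z)
log_q = norm.logpdf(z, loc=m, scale=s)               # log q(z|x)
elbo = (log_joint - log_q).mean()          # Monte Carlo estimate of L

# Closed-form KL between q(z|x) and the true posterior p(z|x) = N(x/2, 1/2)
mp, sp = x / 2.0, np.sqrt(0.5)
kl = np.log(sp / s) + (s**2 + (m - mp) ** 2) / (2 * sp**2) - 0.5

log_px = norm.logpdf(x, scale=np.sqrt(2.0))
print(log_px, kl + elbo)                   # the two sides agree up to MC noise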
Estimating the lower bound gradients
We need to compute the gradients ∇_θ L(θ, φ; x) and ∇_φ L(θ, φ; x) to apply gradient-based optimization.
For that, we use the reparametrization trick: we sample from a noise variable ε ~ p(ε) and apply a deterministic function to it, so that we obtain correct samples from q_φ(z|x). That is:
if ε ~ p(ε), we find g so that if z = g(x, ε, φ) then z ~ q_φ(z|x); for example, g can be the inverse CDF of q_φ(z|x) when ε is uniform.
With the reparametrization trick we can rewrite L:
L(θ, φ; x) = E_{ε~p(ε)}[log p_θ(x, g(x, ε, φ)) − log q_φ(g(x, ε, φ)|x)]
We then estimate the gradients of this expectation with Monte Carlo.
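A minimal sketch of the trick (example mine): estimating the gradient w.r.t. μ of E_{z~N(μ,σ²)}[z²], whose exact value is d(μ² + σ²)/dμ = 2μ. Writing z = μ + σε moves the parameters out of the sampling distribution, so the derivative can pass through g.

# Reparametrization trick on a toy objective f(z) = z^2 (illustrative).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5
eps = rng.normal(size=100_000)        # eps ~ p(eps) = N(0, 1)
z = mu + sigma * eps                  # z = g(eps), so z ~ N(mu, sigma^2)

# d f(z)/d mu = f'(z) * dz/dmu = 2*z * 1; average over the noise samples
grad_mu = (2.0 * z).mean()
print(grad_mu, 2.0 * mu)              # Monte Carlo estimate vs. exact gradient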
A connection with auto-encoders
Note that L can also be written in this form:
L(θ, φ; x) = −D_KL(q_φ(z|x) || p(z)) + E_{q_φ(z|x)}[log p_θ(x|z)]
We can interpret the first term as a regularizer: it forces q_φ(z|x) to stay close to the prior p(z). We can interpret the negative of the second term as the reconstruction error.
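When q_φ(z|x) is a diagonal Gaussian and p(z) = N(0, I) (the choices made by the VAE below), this regularizer has a closed form, derived in the paper's appendix; a small sketch:

# Closed-form KL for q(z|x) = N(mu, diag(sigma^2)) and p(z) = N(0, I):
# D_KL = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

print(kl_to_standard_normal(np.zeros(2), np.zeros(2)))  # 0.0: q equals the prior
print(kl_to_standard_normal(np.ones(2), np.zeros(2)))   # > 0 once q moves away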
The algorithm
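The slide showed the paper's Algorithm 1 (minibatch AEVB) as an image. Below is a runnable toy rendition of that loop; the linear model, finite-difference gradients, and learning rate are my simplifications, not the paper's. Data: x = z + noise with z ~ N(0, 1); decoder p(x|z) = N(w·z, 1); encoder q(z|x) = N(a·x, exp(c)²).

# Toy AEVB loop (illustrative simplification of Algorithm 1).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=1000) * np.sqrt(2.0)     # true marginal is N(0, 2)

def elbo(params, x, eps):
    w, a, c = params
    z = a * x + np.exp(c) * eps                          # reparametrization
    log_pz = -0.5 * (z**2 + np.log(2 * np.pi))           # log p(z)
    log_px_z = -0.5 * ((x - w * z) ** 2 + np.log(2 * np.pi))  # log p(x|z)
    log_qz = -0.5 * (eps**2 + np.log(2 * np.pi)) - c     # log q(z|x)
    return np.mean(log_pz + log_px_z - log_qz)

params = np.array([0.1, 0.1, 0.0])           # (theta, phi) = (w, (a, c))
for step in range(2000):
    x = X[rng.choice(len(X), size=100)]      # 1. random minibatch of datapoints
    eps = rng.normal(size=100)               # 2. sample noise eps ~ N(0, I)
    g = np.zeros(3)                          # 3. gradients of the MC estimate of L
    for i in range(3):                       #    (finite differences for brevity)
        d = np.eye(3)[i] * 1e-5
        g[i] = (elbo(params + d, x, eps) - elbo(params - d, x, eps)) / 2e-5
    params += 0.01 * g                       # 4. gradient ascent step on L
print(params)  # w tends toward ~1; q(z|x) approaches the true posterior N(x/2, 1/2)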
Variational auto-encoders
It is an example of a model that uses the procedure described above to maximize the lower bound.
In VAEs, we choose:
p(z) = N(0, I)
p_θ(x|z):
a normal distribution for real-valued data, where a neural network decoder computes the μ and σ of this distribution from z;
a multivariate Bernoulli for binary data, where a neural network decoder computes the probability of 1 from z
q_φ(z|x) = N(μ(x), σ²(x)I): a neural network encoder computes the μ and σ of q_φ(z|x) from x
ε ~ N(0, I) and z = g(x, ε, φ) = μ(x) + σ(x)ε
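Putting the pieces together, here is a minimal numpy sketch (the weights, layer sizes, and the single fake datapoint are illustrative) of one forward pass computing the lower bound for binary data with a Bernoulli decoder:

# Minimal VAE forward pass for binary data (illustrative weights/sizes):
# encoder -> (mu, log sigma^2), reparametrize, Bernoulli decoder, lower bound.
import numpy as np

rng = np.random.default_rng(0)
D, H, Z = 784, 200, 20                         # input, hidden, latent sizes
x = (rng.random(D) < 0.5).astype(float)        # a fake binary datapoint

# Encoder q(z|x) = N(mu(x), diag(sigma(x)^2)), one tanh hidden layer
W1 = rng.normal(0.0, 0.01, (H, D))
Wm = rng.normal(0.0, 0.01, (Z, H))
Wv = rng.normal(0.0, 0.01, (Z, H))
h_enc = np.tanh(W1 @ x)
mu, log_var = Wm @ h_enc, Wv @ h_enc

# Reparametrize: z = mu(x) + sigma(x) * eps with eps ~ N(0, I)
eps = rng.normal(size=Z)
z = mu + np.exp(0.5 * log_var) * eps

# Decoder: probability that each pixel is 1, p(x_d = 1 | z)
V1 = rng.normal(0.0, 0.01, (H, Z))
V2 = rng.normal(0.0, 0.01, (D, H))
p = 1.0 / (1.0 + np.exp(-(V2 @ np.tanh(V1 @ z))))  # sigmoid output

# Lower bound = -KL(q(z|x) || p(z)) + E_q[log p(x|z)] (one-sample estimate)
kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))
recon = np.sum(x * np.log(p) + (1.0 - x) * np.log(1.0 - p))
print(-kl + recon)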
Experiments (1)
Samples from MNIST:
Experiments (2)
2D latent-space manifolds learned on the MNIST and Frey Face datasets
Experiments (3)
Comparison of the lower bound with the Wake-Sleep algorithm:
Experiments (4)
Comparison of the marginal log-likelihood with Wake-Sleep and Monte Carlo EM (MCEM):
Implementation: https://github.com/mehdidc/lasagnekit