Auto-encoding Variational Bayes


  • Auto-encoding variational Bayes

    Diederik P. Kingma, Max Welling

    Presented by : Mehdi Cherti (LAL/CNRS)

    9th May 2015


  • What is a generative model ?

    A model of how the data X was generated

    Typically, the purpose is to find a model for : p(x) or p(x, y)

    y can be a set of latent (hidden) variables or a set of output variables, for discriminative problems


  • Training generative models

    Typically, we assume a parametric form of the probability density :

    p(x|θ)

    Given an i.i.d. dataset : X = (x1, x2, ..., xN), we typically do :

    Maximum likelihood (ML) : argmax_θ p(X|θ)
    Maximum a posteriori (MAP) : argmax_θ p(X|θ) p(θ)
    Bayesian inference : p(θ|X) = p(X|θ) p(θ) / ∫ p(X|θ) p(θ) dθ

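To make the first of these concrete: a minimal sketch (an illustrative example, not from the talk) of maximum-likelihood estimation for a one-dimensional Gaussian p(x|θ) with θ = (μ, σ), where the argmax of the log-likelihood has a closed form.

```python
# Illustrative sketch (not from the talk): maximum likelihood for a 1-D
# Gaussian p(x | theta) with theta = (mu, sigma), fit to an i.i.d. dataset.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=0.5, size=1000)   # i.i.d. dataset x1, ..., xN

# argmax_theta sum_i log p(x_i | theta) has a closed-form solution here:
mu_ml = X.mean()        # ML estimate of mu
sigma_ml = X.std()      # ML estimate of sigma (the biased, 1/N version)

print(mu_ml, sigma_ml)  # close to the true values (2.0, 0.5)
```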

  • The problem

    let x be the observed variables

    we assume a latent representation z

    we define p(z) and p(x|z)

    We want to design a generative model where:

    p(x) = ∫ p(x|z) p(z) dz is intractable

    p(z|x) = p(x|z) p(z) / p(x) is intractable

    we have large datasets : we want to avoid sampling-based training procedures (e.g. MCMC)


  • The proposed solution

    They propose:

    a fast training procedure that estimates the parameters : for data generation

    an approximation of the posterior p(z|x) : for data representation

    an approximation of the marginal p(x) : for model evaluation and as a prior for other tasks


  • Formulation of the problem

    The process of generation consists of sampling z from p(z), then x from p(x|z). Let's define :

    a prior over the latent representation p(z),

    a decoder p(x|z)

    We want to maximize the log-likelihood of the data (x(1), x(2), ..., x(N)) :

    log p(x(1), x(2), ..., x(N)) = Σ_i log p(x(i))

    and be able to do inference : p(z|x)


  • The variational lower bound

    We will learn an approximation q(z|x) of p(z|x) by maximizing a lower bound of the log-likelihood of the data

    We can write : log p(x) = DKL(q(z|x) || p(z|x)) + L(θ, φ, x), where:

    L(θ, φ, x) = E_q(z|x)[log p(x, z) − log q(z|x)]

    L(θ, φ, x) is called the variational lower bound, and the goal is to maximize it w.r.t. all the parameters (θ, φ); a short derivation of this decomposition is sketched below

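A short derivation of this decomposition (standard, but not spelled out on the slide): since p(x) = p(x, z)/p(z|x), we can expand log p(x) under the expectation over q(z|x).

```latex
\begin{aligned}
\log p(x) &= \mathbb{E}_{q(z|x)}[\log p(x)] \\
          &= \mathbb{E}_{q(z|x)}\!\left[\log \frac{p(x,z)}{q(z|x)}\right]
           + \mathbb{E}_{q(z|x)}\!\left[\log \frac{q(z|x)}{p(z|x)}\right] \\
          &= \mathcal{L}(\theta, \phi, x) + D_{KL}\big(q(z|x)\,\|\,p(z|x)\big)
\end{aligned}
```

Since the KL term is nonnegative, L(θ, φ, x) ≤ log p(x), which is why L is a lower bound on the log-likelihood.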

  • Estimating the lower bound gradients

    We need to compute ∇_θ L(θ, φ, x) and ∇_φ L(θ, φ, x) to apply gradient descent

    For that, we use the reparametrisation trick : we sample from a noise variable ε ~ p(ε) and apply a deterministic function to it so that we obtain correct samples from q(z|x), meaning:

    if ε ~ p(ε), we find g so that if z = g(x, φ, ε) then z ~ q(z|x)

    g can be the inverse CDF of q(z|x) if ε is uniform

    With the reparametrisation trick we can rewrite L:

    L(θ, φ, x) = E_p(ε)[log p(x, g(x, φ, ε)) − log q(g(x, φ, ε)|x)]

    We then estimate the gradients with Monte Carlo sampling (a small numerical sketch of the trick follows below)

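A minimal numerical sketch of the trick for a Gaussian q(z|x) = N(μ, σ²); the constants here are illustrative, not from the talk. Sampling ε from a fixed noise distribution and pushing it through the deterministic map g(ε) = μ + σε gives exact samples from q, while keeping μ and σ inside a differentiable expression.

```python
# Sketch of the reparametrisation trick for a Gaussian q(z|x) = N(mu, sigma^2).
# mu and sigma are illustrative constants; in a VAE they come from the encoder.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.3

eps = rng.standard_normal(100_000)  # eps ~ p(eps) = N(0, 1)
z = mu + sigma * eps                # z = g(x, phi, eps); g is deterministic

print(z.mean(), z.std())            # ~1.5 and ~0.3, i.e. z ~ N(mu, sigma^2)
```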

  • A connection with auto-encoders

    Note that L can also be written in this form:

    L(θ, φ, x) = −DKL(q(z|x) || p(z)) + E_q(z|x)[log p(x|z)]

    We can interpret the first term as a regularizer : it forces q(z|x) to not be too divergent from the prior p(z)

    We can interpret the (−second term) as the reconstruction error; the algebra connecting this form of L with the previous one is sketched below

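The algebra relating this form to the earlier one is a single step, filled in here for completeness: expand p(x, z) = p(x|z) p(z) inside the expectation.

```latex
\begin{aligned}
\mathcal{L}(\theta, \phi, x)
  &= \mathbb{E}_{q(z|x)}\big[\log p(x|z) + \log p(z) - \log q(z|x)\big] \\
  &= -\,D_{KL}\big(q(z|x)\,\|\,p(z)\big) + \mathbb{E}_{q(z|x)}\big[\log p(x|z)\big]
\end{aligned}
```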

  • The algorithm

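The body of this slide is a figure, presumably the paper's minibatch training procedure (Algorithm 1, AEVB). Below is a minimal sketch of that loop, assuming PyTorch for automatic differentiation and a model object exposing an elbo(x) method that samples the noise ε internally; these names and the choice of Adam are illustrative assumptions, not taken from the paper or the talk.

```python
# Hedged sketch of an AEVB-style minibatch loop; `model.elbo(x)` is an assumed
# interface (see the VAE sketch further below), not an API from the talk.
import torch

def train_aevb(model, data_loader, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # any SGD-type optimizer works
    for _ in range(epochs):
        for x in data_loader:          # random minibatch of datapoints
            loss = -model.elbo(x)      # Monte Carlo estimate of -L(theta, phi, x)
            opt.zero_grad()
            loss.backward()            # gradients w.r.t. theta (decoder) and phi (encoder)
            opt.step()                 # joint update of both parameter sets
    return model
```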

  • Variational auto-encoders

    It is an example of a model that uses the procedure described above to maximize the lower bound

    In variational auto-encoders, we choose (a code sketch of these choices follows the list below):

    p(z) = N(0, I)

    p(x|z) :

    is a normal distribution for real-valued data ; we have a neural network decoder that computes μ and σ of this distribution from z

    is a multivariate Bernoulli for boolean data ; we have a neural network decoder that computes the probability of 1 from z

    q(z|x) = N(μ(x), σ(x)²I) : we have a neural network encoder that computes μ and σ of q(z|x) from x

    ε ~ N(0, I) and z = g(x, φ, ε) = μ(x) + σ(x) ⊙ ε

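A compact sketch of these choices for binary data such as MNIST, written in PyTorch rather than the Theano/Lasagne code linked at the end of the talk; the layer sizes and names are illustrative assumptions. The encoder outputs μ(x) and log σ(x)², the decoder outputs Bernoulli probabilities, and the KL term against N(0, I) is computed in closed form.

```python
# Minimal VAE sketch (assumed architecture, not the talk's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=400, z_dim=20):
        super().__init__()
        # encoder q(z|x) = N(mu(x), sigma(x)^2 I)
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.Tanh())
        self.enc_mu = nn.Linear(h_dim, z_dim)
        self.enc_logvar = nn.Linear(h_dim, z_dim)
        # decoder p(x|z) = Bernoulli(pi(z)) for binary data
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.Tanh(),
                                 nn.Linear(h_dim, x_dim))

    def elbo(self, x):
        h = self.enc(x)
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        eps = torch.randn_like(mu)              # eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * eps  # z = mu(x) + sigma(x) * eps
        logits = self.dec(z)
        # E_q[log p(x|z)] for a Bernoulli decoder (single-sample estimate)
        rec = -F.binary_cross_entropy_with_logits(logits, x, reduction='sum')
        # D_KL(q(z|x) || N(0, I)) in closed form for Gaussians
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return (rec - kl) / x.shape[0]          # average lower bound per datapoint
```

This elbo method is exactly the quantity maximized by the training-loop sketch given after "The algorithm" slide above.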

  • Experiments (1)

    Samples from MNIST:


  • Experiments (2)

    2D latent-space manifolds learned from the MNIST and Frey Face datasets


  • Experiments (3)

    Comparison of the lower bound with the Wake-sleep algorithm :


  • Experiments (4)

    Comparison of the marginal log-likelihood with Wake-Sleep and Monte Carlo EM (MCEM):


  • Implementation : https://github.com/mehdidc/lasagnekit
