
Variational Autoencoders - An Introduction

Devon Graham

University of British Columbia
[email protected]

Oct 31st, 2017


Table of contents

Introduction

Deep Learning Perspective

Probabilistic Model Perspective

Applications

Conclusion


Introduction

- Auto-Encoding Variational Bayes, Diederik P. Kingma and Max Welling, ICLR 2014
- Generative model
- Running example: want to generate realistic-looking MNIST digits (or celebrity faces, video game plants, cat pictures, etc.)
- https://jaan.io/what-is-variational-autoencoder-vae-tutorial/
- Deep Learning perspective and Probabilistic Model perspective


Introduction - Autoencoders

- Attempt to learn the identity function (see the code sketch below)
- Constrained in some way (e.g., a small latent vector representation)
- Can generate new images by giving different latent vectors to the trained network
- Variational: use a probabilistic latent encoding
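The bullets above describe an ordinary autoencoder. As a concrete illustration, here is a minimal sketch; PyTorch, the flattened 28 × 28 input, and the 32-dimensional bottleneck are assumptions for illustration, not details from the slides.

```python
# Minimal (non-variational) autoencoder sketch.
# Assumptions: PyTorch, flattened 28x28 MNIST input, 32-d bottleneck.
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: compress x to a small latent code z (the constraint).
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        # Decoder: reconstruct x from z, so the whole network approximates identity.
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, input_dim), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)      # low-dimensional representation
        return self.decoder(z)   # reconstruction of x

x = torch.rand(8, 784)           # dummy batch standing in for MNIST digits
x_tilde = Autoencoder()(x)       # reconstructions, same shape as x
```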


Deep Learning Perspective


- Goal: build a neural network that generates MNIST digits from random (Gaussian) noise
- Define two sub-networks: Encoder and Decoder
- Define a loss function


Encoder

- A neural network q_θ(z|x) (see the sketch below)
- Input: datapoint x (e.g., a 28 × 28-pixel MNIST digit)
- Output: encoding z, drawn from a Gaussian density with parameters θ
- |z| ≪ |x|
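One way to realize q_θ(z|x) is as a small feed-forward network that maps a flattened digit to the mean and log-variance of a diagonal Gaussian over z. A hedged sketch follows; PyTorch and the layer sizes (784 → 400 → 20) are illustrative assumptions.

```python
# Encoder q_theta(z|x): outputs the parameters (mu, log sigma^2) of a
# diagonal Gaussian over z. Layer sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=400, latent_dim=20):
        super().__init__()
        self.fc = nn.Linear(input_dim, hidden_dim)
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)      # mean of q_theta(z|x)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)  # log-variance of q_theta(z|x)

    def forward(self, x):
        h = F.relu(self.fc(x))
        return self.fc_mu(h), self.fc_logvar(h)              # |z| = 20 << |x| = 784

mu, logvar = Encoder()(torch.rand(8, 784))                   # each of shape (8, 20)
```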


Decoder

- A neural network p_φ(x|z), parameterized by φ (see the sketch below)
- Input: encoding z, the output of the encoder
- Output: reconstruction x̃, drawn from the distribution of the data
- E.g., output parameters for 28 × 28 Bernoulli variables
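A matching sketch of p_φ(x|z) for the Bernoulli case: the network maps z to one probability per pixel. Again, PyTorch and the layer sizes are assumptions.

```python
# Decoder p_phi(x|z): maps a latent code z to the parameters of 784
# independent Bernoulli variables (one probability per pixel).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    def __init__(self, latent_dim=20, hidden_dim=400, output_dim=784):
        super().__init__()
        self.fc1 = nn.Linear(latent_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, z):
        h = F.relu(self.fc1(z))
        return torch.sigmoid(self.fc2(h))    # Bernoulli probabilities in (0, 1)

probs = Decoder()(torch.randn(8, 20))         # (8, 784) pixel probabilities
x_tilde = torch.bernoulli(probs)              # one sampled reconstruction x~
```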


Loss Function

- x̃ is reconstructed from z, where |z| ≪ |x̃|
- How much information is lost when we go from x to z to x̃?
- Measure this with the reconstruction log-likelihood: log p_φ(x|z)
- Measures how effectively the decoder has learned to reconstruct x given the latent representation z


Loss Function

- The loss function is the negative reconstruction log-likelihood plus a regularizer
- The loss decomposes into a term for each datapoint:

  L(θ, φ) = Σ_{i=1}^N l_i(θ, φ)

- Loss for datapoint x_i (a code sketch follows below):

  l_i(θ, φ) = −E_{z∼q_θ(z|x_i)}[ log p_φ(x_i|z) ] + KL( q_θ(z|x_i) || p(z) )
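In practice the expectation is approximated with a single Monte Carlo sample z ∼ q_θ(z|x_i), the reconstruction term becomes a Bernoulli log-likelihood (binary cross-entropy) for the decoder sketched above, and the KL term has a closed form because both q_θ(z|x_i) and p(z) are Gaussian. A hedged sketch, not the paper's reference implementation:

```python
# Per-datapoint VAE loss l_i(theta, phi): negative reconstruction
# log-likelihood plus the KL regularizer, using one Monte Carlo sample of z.
# Assumes the Bernoulli decoder and diagonal-Gaussian encoder sketched above.
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar):
    # -E_{z~q}[log p_phi(x|z)] for Bernoulli pixels = binary cross-entropy.
    recon_nll = F.binary_cross_entropy(x_recon, x, reduction='sum')
    # KL( N(mu, sigma^2) || N(0, I) ), in closed form (formula given further below).
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_nll + kl
```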


Loss Function

- Negative reconstruction log-likelihood:

  −E_{z∼q_θ(z|x_i)}[ log p_φ(x_i|z) ]

- Encourages decoder to learn to reconstruct the data
- Expectation taken over distribution of latent representations


Loss Function

- KL divergence as regularizer:

  KL( q_θ(z|x_i) || p(z) ) = E_{z∼q_θ(z|x_i)}[ log q_θ(z|x_i) − log p(z) ]

- Measures the information lost when using q_θ to represent p
- We will use p(z) = N(0, I) (the closed-form KL for this choice is given below)
- Encourages the encoder to produce z's that are close to a standard normal distribution
- The encoder learns a meaningful representation of MNIST digits
- Representations of images of the same digit are close together in latent space
- Otherwise it could "memorize" the data and map each observed datapoint to a distinct region of space
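For this choice of prior and a diagonal-Gaussian encoder q_θ(z|x_i) = N(µ, diag(σ²)), the KL term has a standard closed form (not shown on the slide), which is what the loss sketch above evaluates:

```latex
% Closed-form KL between a diagonal Gaussian and the standard normal prior
% (standard result; J denotes the latent dimensionality).
\mathrm{KL}\big(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\|\, \mathcal{N}(0, I)\big)
  = \frac{1}{2} \sum_{j=1}^{J} \left( \mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1 \right)
```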


MNIST latent variable space


Reparameterization trick

- We want to use gradient descent to learn the model's parameters
- Given z drawn from q_θ(z|x), how do we take derivatives of (a function of) z w.r.t. θ?
- We can reparameterize: z = µ + σ ⊙ ε (see the sketch below)
- ε ∼ N(0, I), and ⊙ is the element-wise product
- Can take derivatives of (functions of) z w.r.t. µ and σ
- The output of q_θ(z|x) is a vector of µ's and a vector of σ's
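As a minimal sketch (PyTorch assumed, working with the log-variance output by the encoder sketched earlier):

```python
# Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
# so gradients can flow through z to mu and sigma (and hence to theta).
import torch

def reparameterize(mu, logvar):
    std = torch.exp(0.5 * logvar)   # sigma recovered from log-variance
    eps = torch.randn_like(std)     # eps ~ N(0, I); the randomness stays outside the graph
    return mu + std * eps           # element-wise product, differentiable in mu and std

z = reparameterize(torch.zeros(8, 20), torch.zeros(8, 20))  # draws from N(0, I)
```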


Summary

- The Deep Learning objective is to minimize the loss function (a full training-step sketch follows below):

  L(θ, φ) = Σ_{i=1}^N ( −E_{z∼q_θ(z|x_i)}[ log p_φ(x_i|z) ] + KL( q_θ(z|x_i) || p(z) ) )
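Putting the pieces together, one (hypothetical) training step minimizes exactly this loss by stochastic gradient descent. The sketch reuses the Encoder, Decoder, reparameterize, and vae_loss sketches defined above; the optimizer choice (Adam) and the random batch standing in for MNIST are assumptions.

```python
# One VAE training step, assembling the earlier sketches
# (Encoder, Decoder, reparameterize, vae_loss).
import torch

encoder, decoder = Encoder(), Decoder()
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

x = torch.rand(128, 784)                 # stand-in for a batch of binarized MNIST digits
mu, logvar = encoder(x)                  # q_theta(z|x): Gaussian parameters
z = reparameterize(mu, logvar)           # differentiable sample of z
x_recon = decoder(z)                     # p_phi(x|z): Bernoulli pixel probabilities
loss = vae_loss(x, x_recon, mu, logvar)  # negative ELBO, summed over the batch

optimizer.zero_grad()
loss.backward()                          # gradients w.r.t. both theta and phi
optimizer.step()
```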


Probabilistic Model Perspective


- Data x and latent variables z
- Joint pdf of the model: p(x, z) = p(x|z) p(z)
- Decomposes into the likelihood p(x|z) and the prior p(z)
- Generative process (a sampling sketch follows below):
  - Draw latent variables z_i ∼ p(z)
  - Draw datapoint x_i ∼ p(x|z)
- Graphical model: each x_i is generated from its own z_i
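The generative process translates directly into ancestral sampling: draw z from the prior, then draw x from the model's output distribution. A hedged sketch, reusing the (assumed) Bernoulli Decoder from the deep learning section:

```python
# Ancestral sampling from the generative model: z_i ~ p(z), then x_i ~ p(x|z).
# Reuses the assumed Decoder sketch; latent size 20 and batch size 16 are assumptions.
import torch

decoder = Decoder()
z = torch.randn(16, 20)        # z_i ~ p(z) = N(0, I)
probs = decoder(z)             # parameters of p(x|z): Bernoulli pixel probabilities
x = torch.bernoulli(probs)     # x_i ~ p(x|z)
images = x.view(16, 28, 28)    # reshape into 16 candidate "digits"
```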


Probabilistic Model Perspective

- Suppose we want to do inference in this model
- We would like to infer good values of z, given observed data
- Then we could use them to generate real-looking MNIST digits
- We want to calculate the posterior:

  p(z|x) = p(x|z) p(z) / p(x)

- Need to calculate the evidence: p(x) = ∫ p(x|z) p(z) dz
- Integral over all configurations of latent variables
- Intractable


Probabilistic Model Perspective

- Variational inference to the rescue!
- Let's approximate the true posterior p(z|x) with the 'best' distribution from some family q_λ(z|x)
- Which choice of λ gives the 'best' q_λ(z|x)?
- KL divergence measures the information lost when using q_λ to approximate p
- Choose λ to minimize KL( q_λ(z|x) || p(z|x) ) = KL( q_λ || p )


Probabilistic Model Perspective

- Expanding the definition (using p(z|x) = p(x, z) / p(x)):

  KL( q_λ || p ) := E_{z∼q_λ}[ log q_λ(z|x) − log p(z|x) ]
                  = E_{z∼q_λ}[ log q_λ(z|x) ] − E_{z∼q_λ}[ log p(x, z) ] + log p(x)

- Still contains the p(x) term! So we cannot compute it directly
- But p(x) does not depend on λ, so there is still hope


Probabilistic Model Perspective

- Define the Evidence Lower BOund:

  ELBO(λ) := E_{z∼q_λ}[ log p(x, z) ] − E_{z∼q_λ}[ log q_λ(z|x) ]

- Then

  KL( q_λ || p ) = E_{z∼q_λ}[ log q_λ(z|x) ] − E_{z∼q_λ}[ log p(x, z) ] + log p(x)
                 = −ELBO(λ) + log p(x)

- So minimizing KL( q_λ || p ) w.r.t. λ is equivalent to maximizing ELBO(λ)


Probabilistic Model Perspective

- Since no two datapoints share latent variables, we can write:

  ELBO(λ) = Σ_{i=1}^N ELBO_i(λ)

- Where

  ELBO_i(λ) = E_{z∼q_λ(z|x_i)}[ log p(x_i, z) ] − E_{z∼q_λ(z|x_i)}[ log q_λ(z|x_i) ]


Probabilistic Model Perspective

- We can rewrite the term ELBO_i(λ):

  ELBO_i(λ) = E_{z∼q_λ(z|x_i)}[ log p(x_i, z) ] − E_{z∼q_λ(z|x_i)}[ log q_λ(z|x_i) ]
            = E_{z∼q_λ(z|x_i)}[ log p(x_i|z) + log p(z) ] − E_{z∼q_λ(z|x_i)}[ log q_λ(z|x_i) ]
            = E_{z∼q_λ(z|x_i)}[ log p(x_i|z) ] − E_{z∼q_λ(z|x_i)}[ log q_λ(z|x_i) − log p(z) ]
            = E_{z∼q_λ(z|x_i)}[ log p(x_i|z) ] − KL( q_λ(z|x_i) || p(z) )


Probabilistic Model Perspective

- How do we relate λ to φ and θ seen earlier?
- We can parameterize the approximate posterior q_θ(z|x, λ) by a network that takes data x and outputs parameters λ
- Parameterize the likelihood p(x|z) with a network that takes latent variables and outputs parameters of the data distribution, p_φ(x|z)
- So we can re-write

  ELBO_i(θ, φ) = E_{z∼q_θ(z|x_i)}[ log p_φ(x_i|z) ] − KL( q_θ(z|x_i) || p(z) )


Probabilistic Model Objective

- Recall the Deep Learning objective derived earlier. We want to minimize:

  L(θ, φ) = Σ_{i=1}^N ( −E_{z∼q_θ(z|x_i)}[ log p_φ(x_i|z) ] + KL( q_θ(z|x_i) || p(z) ) )

- The objective just derived for the Probabilistic Model was to maximize:

  ELBO(θ, φ) = Σ_{i=1}^N ( E_{z∼q_θ(z|x_i)}[ log p_φ(x_i|z) ] − KL( q_θ(z|x_i) || p(z) ) )

- They are equivalent!


Applications - Image generation

A. Dosovitskiy and T. Brox. Generating images with perceptual similarity metrics based on deep networks. arXiv preprint arXiv:1602.02644, 2016.


Applications - Caption generation

Y. Pu, Z. Gan, R. Henao, X. Yuan, C. Li, A. Stevens, and L. Carin. Variational autoencoder for deep learning of images, labels and captions. In NIPS, 2016.


Applications - Semi-/Un-supervised document classification

Z. Yang, Z. Hu, R. Salakhutdinov, and T. Berg-Kirkpatrick. Improved variational autoencoders for text modeling using dilated convolutions. In Proceedings of the 34th International Conference on Machine Learning, 2017.


Applications - Pixel art videogame characters

https://mlexplained.wordpress.com/category/generative-models/vae/.


Conclusion

- We derived the same objective from
  1) A deep learning point of view, and
  2) A probabilistic models point of view
- Showed they are equivalent
- Saw some applications
- Thank you. Questions?
