Download - CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Transcript
Page 1: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

CS 4803 / 7643: Deep Learning

Dhruv Batra Georgia Tech

Topics: – Variational Auto-Encoders (VAEs)

– Reparameterization trick

Page 2: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Administrativia• HW4 Grades Released

– Regrade requests close: 12/03, 11:55pm– Please check solutions first!

• Grade histogram: 7643– Max possible: 100 (regular credit) + 40 (extra credit)

(C) Dhruv Batra 2

Page 3: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Administrativia• HW4 Grades Released

– Regrade requests close: 12/03, 11:55pm– Please check solutions first!

• Grade histogram: 4803– Max possible: 100 (regular credit) + 40 (extra credit)

(C) Dhruv Batra 3

Page 4: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Recap from last time

(C) Dhruv Batra 4

Page 5: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Variational Autoencoders (VAE)

Page 6: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

So far...

VAEs define intractable density function with latent z:

PixelCNNs define tractable density function, optimize likelihood of training data:

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 7: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Variational Auto EncodersVAEs are a combination of the following ideas:

1. Auto Encoders

2. Variational Approximation• Variational Lower Bound / ELBO

3. Amortized Inference Neural Networks

4. “Reparameterization” Trick

(C) Dhruv Batra 7

Page 8: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Encoder

Input data

Features

Decoder

Reconstructed input data

Reconstructed data

Input data

Encoder: 4-layer convDecoder: 4-layer upconv

L2 Loss function: Train such that features can be used to reconstruct original data

Doesn’t use labels!

Autoencoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 9: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Encoder

Input data

Features

Decoder

Reconstructed input data

Autoencoders can reconstruct data, and can learn features to initialize a supervised model

Features capture factors of variation in training data. Can we generate new images from an autoencoder?

Autoencoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 10: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Variational Autoencoders

Probabilistic spin on autoencoders - will let us sample from the model to generate data!

Image Credit: https://jaan.io/what-is-variational-autoencoder-vae-tutorial/

q𝜙 𝑧 𝑥 p𝜃 𝑥 𝑧

Page 11: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Variational Auto EncodersVAEs are a combination of the following ideas:

1. Auto Encoders

2. Variational Approximation• Variational Lower Bound / ELBO

3. Amortized Inference Neural Networks

4. “Reparameterization” Trick

(C) Dhruv Batra 11

Page 12: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Key problem• P(z|x)

(C) Dhruv Batra 12

Page 13: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

What is Variational Inference?• Key idea

– Reality is complex– Can we approximate it with something “simple”?– Just make sure simple thing is “close” to the complex thing.

(C) Dhruv Batra 13

Page 14: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Intuition

(C) Dhruv Batra 14

Page 15: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

• Marginal likelihood – x is observed, z is missing:

The general learning problem with missing data

15(C) Dhruv Batra

ll(� : D) = logNY

i=1

P (xi | �)

=NX

i=1

logP (xi | �)

=NX

i=1

logX

z

P (xi, z | �)

Page 16: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Jensen’s inequality

• Use: log åz P(z) g(z) ≥ åz P(z) log g(z)

16(C) Dhruv Batra

Page 17: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Applying Jensen’s inequality

• Use: log åz P(z) g(z) ≥ åz P(z) log g(z)

17(C) Dhruv Batra

Page 18: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Evidence Lower Bound

• Define potential function F(q,Q):

18(C) Dhruv Batra

ll(� : D) � F (�, Qi) =NX

i=1

X

z

Qi(z) logP (xi, z | �)

Qi(z)

Page 19: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

ELBO: Factorization #1 (GMMs)

19(C) Dhruv Batra

ll(� : D) � F (�, Qi) =NX

i=1

X

z

Qi(z) logP (xi, z | �)

Qi(z)

Page 20: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

ELBO: Factorization #2 (VAEs)

20(C) Dhruv Batra

ll(� : D) � F (�, Qi) =NX

i=1

X

z

Qi(z) logP (xi, z | �)

Qi(z)

Page 21: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Variational Auto EncodersVAEs are a combination of the following ideas:

1. Auto Encoders

2. Variational Approximation• Variational Lower Bound / ELBO

3. Amortized Inference Neural Networks

4. “Reparameterization” Trick

(C) Dhruv Batra 21

Page 22: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Amortized Inference Neural Networks

(C) Dhruv Batra 22

Page 23: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

VAEs

(C) Dhruv Batra 23Image Credit: https://www.kaggle.com/rvislaywade/visualizing-mnist-using-a-variational-autoencoder

Page 24: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Variational Autoencoders

Probabilistic spin on autoencoders - will let us sample from the model to generate data!

Image Credit: https://jaan.io/what-is-variational-autoencoder-vae-tutorial/

q𝜙 𝑧 𝑥 p𝜃 𝑥 𝑧

Page 25: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Encoder network

Input Data

Putting it all together: maximizing the likelihood lower bound

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 26: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Encoder network

Input Data

Putting it all together: maximizing the likelihood lower bound

Make approximate posterior distribution close to prior

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 27: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Encoder network

Sample z from

Input Data

Putting it all together: maximizing the likelihood lower bound

Make approximate posterior distribution close to prior

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 28: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Encoder network

Decoder network

Sample z from

Input Data

Putting it all together: maximizing the likelihood lower bound

Make approximate posterior distribution close to prior

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 29: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Encoder network

Decoder network

Sample z from

Sample x|z from

Input Data

Putting it all together: maximizing the likelihood lower bound

Make approximate posterior distribution close to prior

Maximize likelihood of original input being reconstructed

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 30: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Decoder network

Sample z from

Sample x|z from

Use decoder network. Now sample z from prior! Data manifold for 2-d z

Vary z1

Vary z2

Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014

Variational Auto Encoders: Generating Data

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 31: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Vary z1

Vary z2

Degree of smile

Head pose

Diagonal prior on z=> independent latent variables

Different dimensions of z encode interpretable factors of variation

Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014

Variational Auto Encoders: Generating Data

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 32: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Plan for Today• VAEs

– Reparameterization trick

(C) Dhruv Batra 32

Page 33: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Variational Auto EncodersVAEs are a combination of the following ideas:

1. Auto Encoders

2. Variational Approximation• Variational Lower Bound / ELBO

3. Amortized Inference Neural Networks

4. “Reparameterization” Trick

(C) Dhruv Batra 33

Page 34: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Encoder network

Decoder network

Sample z from

Sample x|z from

Input Data

Putting it all together: maximizing the likelihood lower bound

Make approximate posterior distribution close to prior

Maximize likelihood of original input being reconstructed

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 35: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Putting it all together: maximizing the likelihood lower bound

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 36: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Basic Problem

(C) Dhruv Batra 36

min✓

Ez⇠p✓(z)[f(z)]

Page 37: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Basic Problem

• Goal

(C) Dhruv Batra 37

min✓

Ez⇠p✓(z)[f(z)]

Page 38: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Basic Problem

• Goal

• Need to compute:

(C) Dhruv Batra 38

min✓

Ez⇠p✓(z)[f(z)]

r✓ Ez⇠p✓(z)[f(z)]

Page 39: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Basic Problem

• Need to compute:

(C) Dhruv Batra 39

r✓ Ez⇠p✓(z)[f(z)]

Page 40: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Example

(C) Dhruv Batra 40

Page 41: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Does this happen in supervised learning?

• Goal

(C) Dhruv Batra 41

min✓

Ez⇠p✓(z)[f(z)]

Page 42: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

But what about other kinds of learning?

• Goal

(C) Dhruv Batra 42

min✓

Ez⇠p✓(z)[f(z)]

Page 43: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Two Options• Score Function based Gradient Estimator

aka REINFORCE (and variants)

• Path Derivative Gradient Estimator aka “reparameterization trick”

(C) Dhruv Batra 43

Page 44: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Option 1• Score Function based Gradient Estimator

aka REINFORCE (and variants)

(C) Dhruv Batra 44

Page 45: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

45

= r✓

Z⇡✓(⌧)R(⌧)d⌧

<latexit sha1_base64="4gyGx/NjBUPBtNX1H/STwXdkv6o=">AAAD/nicjVNLixNBEJ6d8bGOj82qNy+NIZBoCDOroJfAogjiYVllX5CeDD2dTtJsz4PuGtkwDvhXvHhQxKu/w5v/xu7MxM1DxIIZqr/6quqrojvKBFfgeb+2bOfK1WvXt2+4N2/dvrPT2L17otJcUnZMU5HKs4goJnjCjoGDYGeZZCSOBDuNzl+a+Ol7JhVPkyOYZSyIySThY04JaCjcte+3UB/hmMA0iopXZViQELDiMc54G9NRCh9UCJ0uUmEBj/3ShFB2GelqeqfEgo1hgFUea1bfK4dHVUlKRPGubC94WPLJFAKEsdtCb9oYpgxIZ60/BpJXXcKK0DbInx5LZef4oqauiEmWyfQC4bEktPDL4qCsJPG+Xw4PFvL8TXkF70LZRaRyOm6r6jx8ZLQROdHsC6PMgOWl2NWxTd1/TG0EZryeyRwSEglSn1e3sRL5n91sriVwNwvxBJYk1OtbTxyZf9hoej1vbmjT8WunadV2GDZ+4lFK85glQAVRauB7GQQFkcCpYKWLc8UyQs/JhA20m5CYqaCYX98StTQyQuNU6k9LnKPLGQWJlZrFkWYatWo9ZsC/xQY5jJ8HBU+yHFhCq0bjXCBIkXkLaMQloyBm2iFUcq0V0SnRlwf0i3H1Evz1kTedk72e/6S39/Zpc/9FvY5t64H10GpbvvXM2rdeW4fWsUXtwv5kf7G/Oh+dz84353tFtbfqnHvWijk/fgNGClAV</latexit>

Recall: Policy Gradients

r✓J(✓) = r✓E⌧⇠p✓(⌧)[R(⌧)]<latexit sha1_base64="Ak6iFp2ndaPkRSt0gfcdrY+Q7cQ=">AAAEBXicjVNNb9NAEN3afBTzlcKxlxVRJAeiyi5IcIlUgZAQh6qgpq2UTaz1ZpOs6i/tjlEjsxcu/BUuHECIK/+BG/+Gdey0TYJQR7I0fvP2zZvRbphFQoHn/dmw7GvXb9zcvOXcvnP33v3G1oMjleaS8R5Lo1SehFTxSCS8BwIifpJJTuMw4sfh6auyfvyBSyXS5BBmGR/EdJKIsWAUDBRsWdst3MUkpjANw+K1DgoaAFEiJplwCRul8FEF0O5gFRTwxNdlCWcXlY6htzWJ+Bj6ROWxYXU9PTysJBmNivfaXfCIFJMpDDAhTgu/dQlMOdD2Sn8CNK+6BBXBLZHzHpdk5/hC0ygSmmUyPcNkLCkrfF3s68qS6Pp6uL+w56/bK0QHdAfTKmk7rarz8HHpjcqJYZ+VzkpQX5hdHrvU/c/UpcFM1DM5JKFhROu/5V0sVa6ymfWllM2uuNVzjrum0g4aTW/HmwdeT/w6aaI6DoLGbzJKWR7zBFhEler7XgaDgkoQLOLaIbniGWWndML7Jk1ozNWgmN9ijVsGGeFxKs2XAJ6jl08UNFZqFoeGWTpVq7US/Fetn8P4xaAQSZYDT1jVaJxHGFJcPgk8EpIziGYmoUwK4xWzKTV3CMzDccwS/NWR15Oj3R3/6c7uu2fNvZf1OjbRNnqEXOSj52gPvUEHqIeY9cn6Yn2zvtuf7a/2D/tnRbU26jMP0VLYv/4CxxJTew==</latexit>

=

Zr✓⇡✓(⌧)R(⌧)d⌧

<latexit sha1_base64="NIR6vyQDDXONwIZJVMqpd8Xs+Xk=">AAAD/nicjVNdixMxFJ2d8WMdv7rqmy/BUmi1lJlV0JfCogjiw7LKdnehmQ6ZNG3DzheTO7IlBvwrvvigiK/+Dt/8Nyad6e62FTEww825J+eee0miPOYCPO/3lu1cuXrt+vYN9+at23fuNnbuHYmsLCgb0CzOipOICBbzlA2AQ8xO8oKRJIrZcXT6yuSPP7BC8Cw9hHnOgoRMUz7hlICGwh37QQv1EU4IzKJIvlahJCFgwROc8zam4ww+ihA6XSRCCU98ZVIov8h0Nb2jcMwmMMSiTDSr76nRYSVJSSzfq/aShws+nUGAMHZb6G0bw4wB6azVx0DKqkpYEdoGOa9xSXaBLzW1IiZ5XmRnCE8KQqWv5L6qLPG+r0b7S3v+pj3Ju6C6iFRBx21VlUePjTdSTDX7zDgzoLowu9q20f1H18ZgzuuezCYlUUzq/eo0VjL/M5vNsQSuEeIprKudW6jHt35wbP5ho+n1vMVCm4FfB02rXgdh4xceZ7RMWAo0JkIMfS+HQJICOI2ZcnEpWE7oKZmyoQ5TkjARyMX1VailkTGaZIX+tOEFevmEJIkQ8yTSTONWrOcM+LfcsITJi0DyNC+BpbQqNCljBBkybwGNecEoxHMdEFpw7RXRGdGXB/SLcfUQ/PWWN4Oj3Z7/tLf77llz72U9jm3rofXIalu+9dzas95YB9bAora0P9tf7W/OJ+eL8935UVHtrfrMfWtlOT//AER1UBU=</latexit>

=

Zr✓⇡✓(⌧) ·

⇡✓(⌧)

⇡✓(⌧)· R(⌧)d⌧

<latexit sha1_base64="6Cf4ltUuB2RixiA9bfabPClb2TA=">AAAEMXicjVPLbtNAFHVtHsU8msKSzYgoUgJRZRck2ESqQAjEoiqoaStlHGs8GSej+iXPNWpk5pfY8CeITRcgxJafYCZ22sZBiJFs3Tn3zL3nHnuCLOICHOd8w7SuXb9xc/OWffvO3Xtbre37RyItcsqGNI3S/CQggkU8YUPgELGTLGckDiJ2HJy+0vnjjywXPE0OYZ4xLybThIecElCQv22+6aABwjGBWRCUr6VfEh+w4DHOeBfTSQqfhA+9PhJ+CU9cqVMou8z0Fb0nccRCGGFRxIo1cOT4sCpJSVR+kN0lD+d8OgMPYWx30LsuhhkD0mv0x0CKqotfEboauehxpewCX9ZUFTHJsjw9QzjMCS1dWe7LShIfuHK8v5TnrssreR9kH5Eq6NmdqvP4sdZG8qlin2llGpSXYlfH1nX/MbUWmPF6Jr1JSBCRer/qxkrmf7xZt8WzdSGeQLPahYTaPv0hK8OaKbmOVOxmr4l++622s+MsFloP3DpoG/U68Ftf8SSlRcwSoBERYuQ6GXglyYHTiEkbF4JlhJ6SKRupMCExE165+OMl6ihkgsI0V4+acYFePVGSWIh5HCimViuaOQ3+LTcqIHzhlTzJCmAJrRqFRYQgRfr6oAnPGYVorgJCc660Ijojyj5Ql8xWJrjNkdeDo90d9+nO7vtn7b2XtR2bxkPjkdE1XOO5sWe8NQ6MoUHNz+Y387v5w/pinVs/rV8V1dyozzwwVpb1+w8M52YB</latexit>

=

Z⇡✓(⌧)r✓ log ⇡✓(⌧)R(⌧)d⌧

<latexit sha1_base64="81H4Szz9KP1lTiTaRTFADwQ1hG8=">AAAEgnicjVNdb9MwFM3aACN8dfDIi0VVqR1Vl2xI8EClCYSEeJgGWrdJdRc5rtNacz4U36BVwf+D38UbvwbsJt3WdEJYSnR97vE9517LQSq4BNf9vdVo2vfuP9h+6Dx6/OTps9bO81OZ5BllI5qIJDsPiGSCx2wEHAQ7TzNGokCws+Dyo8mffWeZ5El8AouUTSIyi3nIKQEN+TuNnx00RDgiMA+C4pPyC+IDljzCKe9iOk3gh/Sh10fSL+C1p0wKpTeZvqb3FBYshDGWeaRZQ1ddnJQlKRHFN9Vd8XDGZ3OYIIydDvrSxTBnQHo1fQwkL1X8ktA1yLXGrbJLfFVTV8QkTbPkCuEwI7TwVHGkSkt86KmLo5U9b9Newfug+oiUQc/plMoXu8YbyWaafWWcGVDdmF1v29T9R9fGYMqrnswmJoEg1X59GmuZ/5nN5li0nC7EY6hXu7ZQjc9cZDmwekptIiW7rjU1f2clVztT1xfJbMPDnQX9VtsduMuFNgOvCtpWtY791i88TWgesRioIFKOPTeFSUEy4FQw5eBcspTQSzJjYx3GJGJyUiyfkEIdjUxRmGT6010s0dsnChJJuYgCzTRuZT1nwLty4xzCd5OCx2kOLKalUJgLBAky7xFNecYoiIUOCM249oronOj7AP1qHT0Er97yZnC6P/AOBvtf37QPP1Tj2LZeWq+sruVZb61D67N1bI0s2vjT7DQHzT3btndtzz4oqY2t6swLa23Z7/8CAmB+6g==</latexit>

= E⌧⇠p✓(⌧)[r✓ log ⇡✓(⌧)R(⌧)]<latexit sha1_base64="68w1wxgh2qI5Wm1k19LrzwOBZpQ=">AAAE33icjVTPb9MwFM7WAqP86uDIxaKq1EJVJQMJLpUmEBLiMA20bpPqNnJcp7Xm/FD8glYFX7hwACGu/Fvc+EO4YzfptiYTqqVEz997/r7PL3a8WHAJtv1na7tWv3Hz1s7txp279+4/aO4+PJZRmlA2pJGIklOPSCZ4yIbAQbDTOGEk8AQ78c7emPzJJ5ZIHoVHsIjZOCCzkPucEtCQu7v9t40GCAcE5p6XvVVuRlzAkgc45h1MpxF8li50e0i6GTxzlEmh+DLT0+VdhQXzYYRlGuiqga0mRzklJSL7qDqrOpzw2RzGCONGG73vYJgzIN2SPgaS5ipuXtAxyIXGFdolvuLUjJjEcRKdI+wnhGaOyg5UbokPHDU5WNlzqvYy3gPVQyQPuo12rjx5aryRZKarz40zA6pLs+vbNrz/2bUxGPNiT2YSEk+QYr7ejbXMJr2ptkXLaSIeQpntwkLRPvMh84aVU6qK5NVlral5N5bnaClYWlV2IKJZxcX1lJsdjNEa/0b0Y7fZsvv2cqBq4BRByyrGodv8jacRTQMWAhVEypFjxzDOSAKcCqYaOJUsJvSMzNhIhyEJmBxny/upUFsjU+RHiX50g5bo1RUZCaRcBJ6uNE5lOWfA63KjFPxX44yHcQospLmQnwoEETKXHU15wiiIhQ4ITbj2iuic6I8N+pfQ0E1wyluuBsd7fed5f+/Di9b+66IdO9Zj64nVsRzrpbVvvbMOraFFa7j2pfat9r1O6l/rP+o/89LtrWLNI2tt1H/9A7FZpC8=</latexit>

Expand expectation

Exchange integration and expectation

r✓ log ⇡(⌧) =r✓⇡(⌧)

⇡(⌧)<latexit sha1_base64="PFBv8gP4ATPO1SJnvIETFRlOHmU=">AAAFOnicjVTLbtNAFHWbAMW8WliyGVFFSiCq4oIEm0gVCAmxqALqS8qk1ngySUYdPzRzjVKZ+S42fAU7FmxYgBBbPoAZ220Tu6o6kq3rc+/cc+7x2EEiuIJe7/vKaqN54+attdvunbv37j9Y33h4oOJUUrZPYxHLo4AoJnjE9oGDYEeJZCQMBDsMTt7Y/OEnJhWPoz04TdgoJNOITzglYCB/ozFooT7CIYFZEGRvtZ8RH7DiIU54G9NxDJ+VD50uUn4GzzxtUyi5yHRNeUdjwSYwxCoNTVW/p4/3ipaUiOyjbp/VYcmnMxghjN0Wet/GMGNAOhV+DCQtWPyioG2Rc46Ftjl+1tN0xCRJZDxHeCIJzTyd7epCEu97+nj3TJ5Xl5fxLuguIkXQcVsF8/FTq43IqameW2UW1Bdil8e2fa+Y2gpMeDmTfYhIIEj5vOzGUuY63tRtMXSmEY+g2u1cQmmffZGFYdWUriNFdZVrbO9ufo5ywsquqgIRT2sqrmh5rfEXGa5FMHIxsDnkH1Am2dgMe5nKcoJ+eaZqXi44VYbaX9/sbfXyheqBVwabTrkG/vo3PI5pGrIIqCBKDb1eAqOMSOBUMO3iVLGE0BMyZUMTRiRkapTlyjVqGWSMJrE0l/E+Rxd3ZCRU6jQMTKW1QFVzFrwsN0xh8mqU8ShJgUW0IJqkAkGM7H8EjblkFMSpCQiV3GhFdEaMSWD+Nq4xwauOXA8Otre851vbH15s7rwu7VhzHjtPnLbjOS+dHeedM3D2Hdr40vjR+NX43fza/Nn80/xblK6ulHseOUur+e8/uA7IlQ==</latexit>

Page 46: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Example

(C) Dhruv Batra 46

Page 47: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Mental Break!• VAE Demo

– https://www.siarez.com/projects/variational-autoencoder

(C) Dhruv Batra 47

Page 48: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Two Options• Score Function based Gradient Estimator

aka REINFORCE (and variants)

(C) Dhruv Batra 48

• Path Derivative Gradient Estimator aka “reparameterization trick”

Page 49: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Option 2

(C) Dhruv Batra 49

• Path Derivative Gradient Estimator aka “reparameterization trick”

Page 50: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Option 2

(C) Dhruv Batra 50

• Path Derivative Gradient Estimator aka “reparameterization trick”

Page 51: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Reparameterization Intuition

(C) Dhruv Batra 51

z = µ+ �2✏i

✏i ⇠ p(✏)

�2

Figure Credit: http://blog.shakirm.com/2015/10/machine-learning-trick-of-the-day-4-reparameterisation-tricks/

Page 52: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Reparameterization Intuition

(C) Dhruv Batra 52Image Credit: https://www.kaggle.com/rvislaywade/visualizing-mnist-using-a-variational-autoencoder

Page 53: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Example

(C) Dhruv Batra 53

Page 54: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Two Options• Score Function based Gradient Estimator

aka REINFORCE (and variants)

(C) Dhruv Batra 54

• Path Derivative Gradient Estimator aka “reparameterization trick”

Page 55: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Example

(C) Dhruv Batra 55Figure Credit: http://gokererdogan.github.io/2016/07/01/reparameterization-trick/

Page 56: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Example

(C) Dhruv Batra 56Figure Credit: http://gokererdogan.github.io/2016/07/01/reparameterization-trick/

Page 57: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Variational Auto EncodersVAEs are a combination of the following ideas:

1. Auto Encoders

2. Variational Approximation• Variational Lower Bound / ELBO

3. Amortized Inference Neural Networks

4. “Reparameterization” Trick

(C) Dhruv Batra 57

Page 58: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Encoder network

Input Data

Putting it all together: maximizing the likelihood lower bound

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 59: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Encoder network

Input Data

Putting it all together: maximizing the likelihood lower bound

Make approximate posterior distribution close to prior

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 60: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Encoder network

Sample z from

Input Data

Putting it all together: maximizing the likelihood lower bound

Make approximate posterior distribution close to prior

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Page 61: CS 4803 / 7643: Deep Learning · Kingma and Welling, “Auto-Encoding Variational Bayes”, ICLR 2014 VariationalAuto Encoders: Generating Data Slide Credit: Fei-FeiLi, Justin Johnson,

Encoder network

Decoder network

Sample z from

Input Data

Putting it all together: maximizing the likelihood lower bound

Make approximate posterior distribution close to prior

Variational Auto Encoders

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n