CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a...
Transcript of CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a...
![Page 1: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/1.jpg)
CSC412: Variational Autoencoders
David Duvenaud
![Page 2: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/2.jpg)
Class Projects
• Focus is on research skills, providing context for results, evidence for claims
• Excuse to start larger project
![Page 3: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/3.jpg)
Variational Inference• Directly optimize the parameters phi of an
approximate distribution q(z|x, phi) to match p(z|x, theta)
• What if there is a local latent variable per-datapoint, and some global parameters? e.g. Bayesian PCA, generative image models, topic models
• Directly optimize the parameters phi_i of each approximate distribution q(z_i|x_i, phi_i) to match p(z_i|x_i, theta)
![Page 4: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/4.jpg)
SVI Algorithm:
• 1. Sample z from q(z|x, phi)
• 2. Return log p(z, x | theta) - log q(z | x, phi)
• In this setting, only one set of latent params z, as in a Bayesian neural net
![Page 5: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/5.jpg)
SVI w/ per-datapoint latents:
1. Sample x_i from dataset
2. Optimize phi_i to minimize KL(q(z_i | x_i, phi_i)|| p(z_i | x_i, theta))
3. Sample z from q(z_i|x_i, phi_i)
4. Return log p(z_i, x_i | theta) - log q(z_i | x_i, phi_i)
![Page 6: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/6.jpg)
Variational Autoencoder:
1. Sample x_i from dataset
2. Compute phi_i as a function of x, and recognition params phi_r: phi_i = f(x, phi_r)
3. Sample z from q(z_i|x_i, phi_i)
4. Return log p(z_i, x_i | theta) - log q(z_i | x_i, phi_i)
![Page 7: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/7.jpg)
Consequences of using a recognition network
• Don’t need to re-optimize phi_i each time theta changes - much faster
• Recognition net won’t necessary give optimal phi_i
• Can have fast test-time inference (vision)
• Can train recognition net with gradient descent
• Could also differentiate through optimization
![Page 8: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/8.jpg)
Simple but not obvious
• It took a long time get here!
• Independently developed as denoising autoencoders (Bengio et al.) and amortized inference (many others)
• Helmholtz machine - same idea in 1995 but used discrete latent variables
![Page 9: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/9.jpg)
![Page 10: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/10.jpg)
Autoencoder Motivation
• Want compact representation of data
• x = dec(enc(x))
• Need to prevent enc = dec = identity
• So, add noise to encoding
• Gives VAE bound but with a free parameter
![Page 11: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/11.jpg)
Benefits of compact latent code
• http://www.dpkingma.com/sgvb_mnist_demo/demo.html
• Nearby z’s give similar x
• Recent work on ‘disentangling’ latent rep
![Page 12: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/12.jpg)
Code examples
• Show VAE code
![Page 13: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/13.jpg)
Variations: Decoder
• Orginally, p(x|z) = N(x | dec(z, theta), sigma I)
• Final step has independence assumption, causes blurry samples
• p(x|z) can be anything: rnn, pixelRNN, real NVP, de_convolutional net
![Page 14: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/14.jpg)
![Page 15: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/15.jpg)
Variations• Decoder often looks like inverse of encoder
• Encoders can come from supervised learning
Learning Deconvolution Network for Semantic Segmentation http://arxiv.org/abs/1505.04366.
![Page 16: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/16.jpg)
Text autoencoders
• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio
![Page 17: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/17.jpg)
TextVAE-Interpola*on
![Page 18: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/18.jpg)
What is a molecule?Graph SMILES string
![Page 19: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/19.jpg)
Repurposing text autoencoders
c1ccccc1 c1ccccc1
ENCODERNeural Network
DECODERNeural Network
CONTINUOUS MOLECULARREPRESENTATION
Latent Space
Discrete Structure SMILES
Discrete Structure SMILES
Can be trained on unlabeled data
![Page 20: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/20.jpg)
Map of 220,000 Drugs
![Page 21: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/21.jpg)
Map of 100,000 OLEDs
![Page 22: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/22.jpg)
Random Organic LEDs
F S S F O OO S
N N N
HS S S S SH
HO SHH2N NH2
O
FHNH2N
NH3
F F
HO F
N NH2
OHSHSO
F S FH2S
OHHO OH
FHS
NH2
OH
O
N SH
O
NH2H2N FHF
SHS
S
F
H2O
SHSHS F
CH4
H2N OHN
F F
HO S S S S S S S S S S S SH
N
S N
NO
O
N
N
N N
N
O O
OO
S O
N
N N
NN N
N
NN
N
NN N
NN
NS
N
N N
N
N
N N
N
N
N
O
O
NN
S N
NNN
N
N NN
N
O
NN N
N
N
NNO
NN
NO
NN
N
NNN
N
Standard autoencoderVariational autoencoder
![Page 23: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/23.jpg)
Molecules near
O
NHN+
O
-O
Cl
Cl
NH
NH
N+
O
O-Cl
Cl
Br
N
N+
O
-OCl BrCl
O
NH
N+
O
O-Cl
Cl
SHCl
ON+
O
-O
NH
Cl
Cl
O
NH
N
N+
O
-OCl
F
O
ON+
O
-OCl
Cl
HN
O
NH
N+
O
O-Br
Cl O
ON+
O
O-Cl
Br O
NH
NH
N+
O
O-Cl
Cl
O
O
HN
N+
O
-O
HN Cl
Br
O
NH
N+
O
O-HO
Cl
Cl
ON+
O
O-Cl
Cl Br
ON+
O
O-Br
Cl
Cl
HN
N+
O
O-
Cl
Cl
O
O
NH
N
N+
O
O-Cl
Cl
Br
O
NH
N+
O
O-F
NH
Cl
O
O
HN
N+
O
O-HO
F
F
NHN+
O
-OF
NH
Cl
BrO
NH
N+
O
O-Br
Cl
Cl
O
O
N+
O
O-Cl
Cl Br
O
NH
N+
O
O-Cl
Br
O
ON+
O
O-Cl
Cl O
O
NH
N+
O
O-Cl
Cl
O
ON+
O
O-Cl
F
Cl
O
NH
N+
O
O-
Cl
Cl
Cl
O
NH
N+
O
O-Cl
Cl O
ON+
O
O-Cl
Cl
O
O
NH
N+
O
O-Cl
Cl
Br
O
NH
N
N+
O
O-Cl
Cl
Cl
O
NN+
O
O-
Br
Cl
Cl
O
NH
N+
O
O-Cl
O Cl
ON+
O
O-Cl
Cl
Cl
O
NH
N+
O
O-Cl
Cl Cl
O NH
N+O O-
Cl
N
O NH
N+O O-
Br
O
NH
N+
O
O-Br
Cl Cl
O
NH
N+
O
O-
NH
Cl
Br
O
N+
O
-O
N NH
Cl
O
O
NH
N+
O
O-Cl
NCl
Cl
O
O
N+
O
O-
NH
Cl Cl
O
NH
N+
O
O-Cl
Cl Br
HN
N+
O
O-
Cl
Cl
Br
O
NH
N+
O
O-Br
Cl
Br
O
NH
N+
O
O-
Cl
Cl
O
NH
N+
O
O-Cl
OCl
ON+
O
O-Cl
Cl
Br
O
NH
N
N+
O
O-Cl
Cl
Cl
O
NN+
O
O-
Cl
Cl Cl
O
NH
N+
O
O-Cl
Cl
OH
![Page 24: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/24.jpg)
NH2 O
N ONH
O
O
HN
HN O
O
O
O
N
O
O
O
HN
O
O
O
NH NO
O
O
ONH2
NHO
HN
HN
O
O
HN N
NHO
O
O
O
NH
O
O
O
O
NO
O
N
HN O
O
O O
NH
HNN
O
O
O
N
N
OO
O
NH2
N
HNO
O
OO
N
O
NHO
S
O
N
O
O
OO
OH
NH
O
OO
O
H2NS
HNO
O
O
OHS
NH
O
O
O
HO
SHN
HN O
O
O
O
NH2
NH
N
O
OH
OO
N
HN
O
O
O
F
N
N
O
O
O N
HN
NHOO
O
N
OO
HN
N
O
N
HNN
HNO
O
N
NO
OH
OO
NH
OHN
O
OO
H2N NN
HNH2NO
O
O
NH2
N
NHO
HN
Cl
O
O
N NH
OO
NHS
OH
F
N N
O
O
HN
HN
NHO
O
O
O
NN
NO
BrO
ONH
NN
OO
NH
O
OO
HO
HN NOO
O
N
NHO
O
N
Cl
Cl
O
O
ONH
HN
O
OO
O
HN
NHO
O
O
O
O
N
HO
HN
N O
O
HN
HN
NH
O
O
NH2
O
NH
N
HN
O
O
O
O
N
OHN
O
O
F
O
N
O
OO
NN
NH2
OO
NH2
NH
N
Cl
O
O
F
N
O
O
O
O
Molecules near
![Page 25: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/25.jpg)
![Page 26: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/26.jpg)
![Page 27: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/27.jpg)
![Page 28: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/28.jpg)
![Page 29: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/29.jpg)
No chemistry-specific design!
![Page 30: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/30.jpg)
Gradient-based optimization
![Page 31: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/31.jpg)
Gradient-based optimization
• Can’t necessarily start from given molecule, need to encode/decode
• Can’t go too far from start, wander into ‘holes’ or empty regions
![Page 32: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/32.jpg)
Encoder can look at decoder
• https://www.youtube.com/watch?v=Zt-7MI9eKEo
![Page 33: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/33.jpg)
Recent Extensions• Importance-Weighted Autoencoders (IWAE) Burda,
Grosse, Salakhudtinov
• Mixture distributions in posterior
• GAN-style ideas to avoid evaluating q(z|x), p(x|z), even p(z)
• Normalizing flows: Produce arbitrarily-complicated q(z|x)
• Incorporate HMC or local optimization to define q(z|x)
![Page 34: CSC412: Variational Autoencodersduvenaud/courses/csc412/...• Generating Sentences from a Continuous Space. Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz,](https://reader033.fdocuments.in/reader033/viewer/2022050204/5f57d0348a7a022f942324a0/html5/thumbnails/34.jpg)
Generative Adversarial Networks
• Also a latent-variable model: x = f(z), z from N(0,I)
• Trained adversarially, can also optimize p(x)
• Recent work on adversarially-trained VAEs:
• Match p(z, x) to q(z, x)
• Sample z from p(z), x from p(x|z)
• Sample x from data, z from q(z|x)