Generative Models - Hasso-Plattner-Institut · 2018-06-19 · Deep Learning for Computer Vision...
Transcript of Generative Models - Hasso-Plattner-Institut · 2018-06-19 · Deep Learning for Computer Vision...
Deep Learning for Computer Vision
Generative Models
Mina Rezaei, Goncalo Mordido
Deep Learning for Computer Vision Slide #2
Content
1. Why unsupervised learning, and why generative models? (Selected
slides from Stanford University-SS2017 Generative Model)
2. What is a variational autoencoder? (Jaan Altosaar’s blog & OpenAI
blog & Victoria University-Generative Model )
Generative Models
Deep Learning for Computer Vision Slide #3
Supervised Learning
Supervised Learning
Data: (x, y) where x is data, y is label
Goal: Learn a function to map x→ y
Examples: Classification, Object Detection,
Semantic segmentation, Image captioning
Generative Models
Deep Learning for Computer Vision Slide #4
Supervised Learning
Supervised Learning
Data: (x, y) where x is data, y is label
Goal: Learn a function to map x→ y
Examples: Classification, Object Detection,
Semantic segmentation, Image captioning
Generative Models
0.85
Deep Learning for Computer Vision Slide #5
Supervised Learning
Supervised Learning
Data: (x, y) where x is data, y is label
Goal: Learn a function to map x→ y
Examples: Classification, Object Detection,
Semantic segmentation, Image captioning
Generative Models
Deep Learning for Computer Vision Slide #6
Unsupervised Learning
Unspervised Learning
Data: x, NO labels!!
Goal: Learn some underlying hidden structure of the data
Examples: Clustering, Dimensionality reduction,
Feature learning, Density estimation
Generative Models
K-means clustering
Deep Learning for Computer Vision Slide #7
Unsupervised Learning
Unspervised Learning
Data: x, NO labels!!
Goal:Learn some underlying hidden structure of
the data
Examples: Clustering, Dimensionality reduction,
Feature learning, Density estimation
Generative Models
Principal Component Analysis
Deep Learning for Computer Vision Slide #8
Unsupervised Learning
Unspervised Learning
Data: x, NO labels!!
Goal:Learn some underlying hidden structure of the data
Examples: Clustering, Dimensionality reduction,
Feature learning, Density estimation
Generative Models
Density estimation
Deep Learning for Computer Vision Slide #9
Supervised vs Unsupervised Learning
Unsupervised Learning
Data: xJust data, no labels!
Goal: Learn some underlying hidden structure of the data
Examples: Clustering, dimensionality reduction, feature learning, density estimation, etc.
Supervised Learning
Data: (x, y)x is data, y is label
Goal: Learn a function to map x -> y
Examples: Classification, Object detection , Semantic segmentation, Image captioning, etc.
Solve unsupervised learning =>
understand structure of visual world
Training data is cheap
Generative Models
Deep Learning for Computer Vision Lecture 13 -
Generative Models
Given training data, generate new samples from same distribution
Generative Models
Training data ~ pdata(x) Generated samples ~ pmodel(x)
Want to: learn pmodel(x) similar to pdata(x)
Addresses density estimation which is a core problem in unsupervised learning
Slide #10
Deep Learning for Computer Vision
Generative Models
Given training data, generate new samples from same distribution
Generative Models
Training data ~ pdata(x) Generated samples ~ pmodel(x)
Want to: learn pmodel(x) similar to pdata(x)
Addresses density estimation which is a core problem in unsupervised learning
Slide #11
• Explicit density estimation: explicitly define and solve for pmodel(x)
• Implicit density estimation: learn model that can sample from pmodel(x) without explicitly defining it
Deep Learning for Computer Vision
Why Generative Model?
Generative Models Slide #12
Deep Learning for Computer Vision
Why Generative Model?
• Increasing dataset, realistic samples for artwork, super-resolution, colorization, etc.
• Generative models of time-series data can be used for simulation and planning.
Generative Models Slide #13
Deep Learning for Computer Vision Slide #14
Taxonomy of Generative Models
Generative ModelsFigure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
Fully Visible Belief Nets • PixelRNN• PixelCNN
✓ Change of variables models
Variational Autoencoder
Generative Model
Explicit Density Implicit Density
DirectTractable Density Approximate Density Markov chain
Variational Markov chain
Boltzmann Machine
GAN GSN
Deep Learning for Computer Vision
Fully visible belief network
Explicit density model
Use chain rule to decompose likelihood of an image x into product of 1-d distributions:
Generative Models Slide #15
Likelihood of
image x
Probability of i’th pixel value
given all previous pixels
Then maximize likelihood of training data
Deep Learning for Computer Vision
Fully visible belief network
Explicit density model
Use chain rule to decompose likelihood of an image x into product of 1-d distributions:
Generative Models Slide #16
Likelihood of
image x
Probability of i’th pixel value
given all previous pixels
Then maximize likelihood of training data
Will need to define ordering of “previous pixels”
Complex distribution over pixel values => Express using a neural network!
Deep Learning for Computer Vision
PixelRNN [van der oord et al.2016]
Dependency on previous pixels modeled using an RNN (LSTM)
Generate image pixels starting from corner
Generative Models Slide #17
Deep Learning for Computer Vision
PixelRNN [van der oord et al.2016]
Dependency on previous pixels modeled using an RNN (LSTM)
Generate image pixels starting from corner
Generative Models Slide #18
Deep Learning for Computer Vision
PixelRNN [van der oord et al.2016]
Dependency on previous pixels modeled using an RNN (LSTM)
Generate image pixels starting from corner
Generative Models Slide #19
Deep Learning for Computer Vision
PixelRNN [van der oord et al.2016]
Dependency on previous pixels modeled using an RNN (LSTM)
Generate image pixels starting from corner
Generative Models Slide #20
Drawback: sequential generation is slow!
Deep Learning for Computer Vision
PixelCNN
Generative Models Slide #21
Still generate image pixels starting from corner
Dependency on previous pixels now modeled using a CNN over context region
Training: maximize likelihood of training images
Generation must still proceed sequentially=> still slow
Deep Learning for Computer Vision
PixelCNN vs PixelRNN
Generative Models Slide #22
Improving PixelCNN performance
• Gated convolutional layers• Short-cut connections• Discretized logistic loss• Multi-scale• Training tricks• Etc…
See• Van der Oord et al. NIPS 2016• Salimans et al. 2017 :PixelCNN++
Pros:
• Can explicitly compute likelihood p(x)• Explicit likelihood of training data gives
good evaluation metric• Good samples
Con:• Sequential generation => slow
Deep Learning for Computer Vision Slide #23
Taxonomy of Generative Models
Generative ModelsFigure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
Fully Visible Belief Nets • PixelRNN• PixelCNN
✓ Change of variables models
Variational Autoencoder
Generative Model
Explicit Density Implicit Density
DirectTractable Density Approximate Density Markov chain
Variational Markov chain
Boltzmann Machine
GAN GSN
Deep Learning for Computer Vision
Autoencoders
Generative Models Slide #24
Deep Learning for Computer Vision
Autoencoders
Generative Models Slide #25
Encoder DecoderLatent
Deep Learning for Computer Vision
Denoised Autoencoder
Generative Models Slide #26
Deep Learning for Computer Vision
Autoencoder Application
Generative Models Slide #27
Neural Inpainting
Semantic Segmentation
Deep Learning for Computer Vision
Variational Autoencoders (VAE)
Generative Models Slide #28
Reconstruction loss
Stay close to normal(0,1)
Deep Learning for Computer Vision
Variational Autoencoders (VAE)
Generative Models Slide #29
Z=µ+σΘε
Where ε ~ normal(0,1)
Deep Learning for Computer Vision
• Model: Latent-variable model p(x|z, theta) usually specified by a neural network
• Inference: Recognition network for q(z|x, theta) usually specified by a neural network
• Training objective: Simple Monte Carlo for unbiased estimate of Variational lower bound
• Optimization method: Stochastic gradient ascent,
with automatic differentiation for gradients
Generative Models Slide #31
Variational Autoencoders (VAE)
Deep Learning for Computer Vision
Pros
• Flexible generative model
• End-to-end gradient training
• Measurable objective (and lower bound - model is at
• least this good)
• Fast test-time inference
Cons:
• sub-optimal variational factors
• limited approximation to true posterior (will revisit)
• Can have high-variance gradients
Generative Models Slide #32
Variational Autoencoders (VAE)
Deep Learning for Computer Vision Slide #32
Taxonomy of Generative Models
Generative ModelsFigure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017.
Fully Visible Belief Nets • PixelRNN• PixelCNN
✓ Change of variables models
Variational Autoencoder
Generative Model
Explicit Density Implicit Density
DirectTractable Density Approximate Density Markov chain
Variational Markov chain
Boltzmann Machine
GAN GSN
Deep Learning for Computer Vision
Generative Adversarial Networks (Goodfellow et al., 2014)
Generative Models Slide #35
▪ GANs or GAN for short
▪ Active research topic
▪ Have shown great improvements in image generation
https://github.com/hindupuravinash/the-gan-zoo
Radford et al., 2016
Deep Learning for Computer Vision
Generative Adversarial Networks (Goodfellow et al., 2014)
▪ Generator (G) that learns the real data distribution to generate fake samples
▪ Discriminator (D) that attributes a probability p of confidence of a sample being real (i.e. coming from the training data)
GeneratorG
DiscriminatorD
Training data
Noise Fake sample
Real sample
pIs sample real?
Generative Models Slide #36
Deep Learning for Computer Vision
Generative Adversarial Networks (Goodfellow et al., 2014)
▪ Both models are trained together (minimax game):▪ G: Increase the probability of D making mistakes▪ D: Classify real samples with greater confidence
▪ G slightly changes the generated data based on D’s feedback
▪
▪ Ideal scenario (equilibrium): G will eventually produce such realistic samples that D attributes p = 0.5 (i.e. cannot distinguish real and fake samples)
Generative Models Slide #37
Deep Learning for Computer Vision
Generative Adversarial Networks (Goodfellow et al., 2014)
Generative Models Slide #38
Deep Learning for Computer Vision
Conditional GANs (CGAN) (Mirza et al., 2014)
▪ G and D can be conditioned by additional information y▪ Adding y as an input of both networks will condition their outputs▪ y can be external information or data from the training set
GeneratorG
DiscriminatorD
Training data
Noise
Fake sample
Real sample
pIs sample real, given y?
y
y
Generative Models Slide #39
Deep Learning for Computer Vision
Conditional GANs (CGAN) (Mirza et al., 2014)
Gauthier, 2015
y = Senior
y = Mouth open
Generative Models Slide #40
Deep Learning for Computer Vision
Conditional GANs (CGAN) (Mirza et al., 2014)
Gauthier, 2015
y = Senior
y = Mouth open
Generative Models Slide #41
Deep Learning for Computer Vision
Conditional GANs (CGAN) (Mirza et al., 2014)
Gauthier, 2015
y = Senior
y = Mouth open
Generative Models Slide #42
Deep Learning for Computer Vision
Limitations of GANs
Chart 41
1. Training instability
○ Good sample generation requires reaching Nash Equilibrium in the game, which might not always happen
2. Mode collapse
○ When G is able to fool D by generating similarly looking samples from the same data mode
3. GANs were original made to work only with real-valued, continuous data (e.g. images)
○ Slight changes in discrete data (e.g. text) are impractical
Chart 41
Generative Models
Deep Learning for Computer Vision
Evaluation metrics
▪ What makes a good generative model?
■ Each generated sample is indistinguishable from a real sample
■ Generated samples should have variety
Images from Karras et al., 2017Generative Models Slide #44
Deep Learning for Computer Vision
Evaluation metrics
▪ How to evaluate the generated samples?
■ Cannot rely on the models’ loss :-(
■ Human evaluation :-/
■ Use a pre-trained model :-)
Generative Models Slide #45
Deep Learning for Computer Vision
Evaluation metrics
▪ Inception Score (IS) [Salimans et al., 2016]
■ Inception model (Szegedy et al., 2015) trained on ImageNet
■ Given generated image x, assigned the label y by model p:
low entropy (one class)
■ The distribution over all generated images should be spread (evaluating mode collapse)
high entropy (many classes)
■ Combining the above, we get the final metric:
https://github.com/Kulbear/deep-learning-nano-foundation/wiki/ReLU-and-Softmax-Activation-Functions
Generative Models Slide #46
Deep Learning for Computer Vision
Evaluation metrics
▪ Fréchet Inception Distance (FID) [Heusel et al., 2017]
■ Calculates the distance between real and fake data (lower the better)
■ Uses the embeddings of the real and fake data from the last pooling layer of Inception v3.
■ Converts the embeddings into continuous distributions and uses the mean and covariance of each to calculate their distance.
Generative Models Slide #47
Deep Learning for Computer Vision
Evaluation metrics
▪ IS vs FID
✓ FID considers the real dataset
✓ FID requires less sampling (faster)(~10k instead of 50k in IS)
✓ FID more robust to noise and human judgement
✓ FID also sensitive to mode collapse
FID (lower is better) IS (higher is better)
Images from Lucic et al., 2017 and Heusel et al., 2017Generative Models
Deep Learning for Computer Vision
▪ MNIST (handwritten dataset)
▪ Condition the number generation per row
https://github.com/gftm/Class_Generative_Networks
Practical scenario
GAN CGAN
Generative Models Slide #49
Deep Learning for Computer Vision
Practical scenario
Generative Models Slide #50
▪ Task 1 - Add label as input to both models (plus the combined model)
▪ Task 2 - Get labels (y) from dataset
▪ Task 3 - Add labels to the models’ losses
▪ Task 4 - Generate specific numbers for each row
https://github.com/gftm/Class_Generative_Networks
Deep Learning for Computer Vision
Practical scenario
Generative Models Slide #51
▪ Task 1 - Add label as input to both models (plus the combined model)
■ def __init__(self):
Deep Learning for Computer Vision
Practical scenario
Generative Models Slide #52
▪ Task 1 - Add label as input to both models (plus the combined model)
■ def build_generator(self):
■ def build_discriminator(self):
Deep Learning for Computer Vision
Practical scenario
Generative Models Slide #53
▪ Task 2 - Get labels (y) from dataset
■ def train(self, epochs, batch_size=128, sample_interval=50):
Deep Learning for Computer Vision
Practical scenario
Generative Models Slide #54
▪ Task 3 - Add labels to the models’ losses
■ def train(self, epochs, batch_size=128, sample_interval=50):
Deep Learning for Computer Vision
Practical scenario
Generative Models Slide #55
▪ Task 4 - Generate specific numbers for each row
■ def sample_images(self, epoch):
Deep Learning for Computer Vision
References
■ A. Radford, L. Mety, and S. Chintala. 2016. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.
■ I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. 2014. Generative Adversarial Nets.
■ M. Mirza, and S. Osindero. 2014. Conditional Generative Adversarial Nets.
■ J. Gauthier. 2015. Conditional Generative Adversarial Nets for Convolutional Face Generation.
■ T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford and X. Chen. 2016. Improved Techniques for Training GANs.
■ M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter. 2017. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
■ T. Karras, T. Aila, S. Laine, J. Lehtinen. 2017. Progressive Growing of GANs for Improved Quality, Stability, and Variation.
■ C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna. 2015. Rethinking the Inception Architecture for Computer Vision.
Generative Models Slide #56