Unsupervised Representation Learning with DCGAN


Transcript of Unsupervised Representation Learning with DCGAN

Page 1: Unsupervised Representation Learning with DCGAN

UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS

Alec Radford, Luke Metz, and Soumith Chintala

(indico Research, Facebook AI Research)

Accepted paper of ICLR 2016

HY587 Paper Presentation

Shyam Krishna Khadka, George Simantiris

Page 2: Unsupervised Representation Learning with DCGAN

UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS

Introduced by Ian Goodfellow in 2014: Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y. (2014). Generative Adversarial Nets. Advances in Neural Information Processing Systems, 2672–2680.

GANs are built around the optimization of two competing criteria:

"We simultaneously train two models: a generative model G and a discriminative model D."

E.g.:
G: a forger that produces counterfeit money.
D: the police, who identify whether money is real or counterfeit.
End goal: G produces money that D can no longer distinguish from real money.
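For reference, this game is formalized by the minimax objective from Goodfellow et al. (2014):

min_G max_D V(D, G) = E_{x~p_data(x)}[ log D(x) ] + E_{z~p_z(z)}[ log(1 - D(G(z))) ]

where p_data is the distribution of real images and p_z is the noise distribution that z is drawn from.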

Page 3: Unsupervised Representation Learning with DCGAN

UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS

Unsupervised learning that actually works well, both to generate and to discriminate!

The generated results are almost too good to believe, but the qualitative experiments are convincing.

Page 4: Unsupervised Representation Learning with DCGAN

UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS

Main contribution: Extensive model exploration to identify a family of architectures that resulted in stable training across a range of datasets and allowed for training higher resolution and deeper generative models.

Other contributions:
• Use the trained discriminator for image classification.
• The generators have vector arithmetic properties.

Page 6: Unsupervised Representation Learning with DCGAN

GENERATED - IMAGENET

Page 7: Unsupervised Representation Learning with DCGAN

GENERATED - FACES

Page 8: Unsupervised Representation Learning with DCGAN

Overview of the Deep Convolutional Generative Adversarial Network (DCGAN)

Can be thought of as two separate networks

Page 9: Unsupervised Representation Learning with DCGAN

Generator G(.): input = random numbers, output = generated image.

Generated image G(z): z is a uniform noise vector (random numbers; a 100-dimensional vector drawn from a uniform distribution). Sampling from the distribution of z is what creates new images!

Page 10: Unsupervised Representation Learning with DCGAN

Discriminator D(.): input = a real or generated image, output = the predicted probability that the image is real.

Page 11: Unsupervised Representation Learning with DCGAN

Generator G(.) Discriminator D(.)

Generator goal: fool the discriminator, i.e., generate an image G(z) for which D's prediction is wrong: D(G(z)) = 1.

Discriminator goal: discriminate between real and generated images, i.e., D(x) = 1, where x is a real image, and D(G(z)) = 0, where G(z) is a generated image.

Conflicting goals; both are unsupervised. The game is optimal when D(.) = 0.5 (i.e., the discriminator cannot tell the difference between real and generated images) and G has learned the distribution of the training images.
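A minimal PyTorch sketch of these two objectives, with toy stand-ins for G and D (the real DCGAN modules appear on the next slides; nothing here is the authors' code):

```python
import torch
import torch.nn as nn

# Toy stand-ins so the sketch runs end to end.
G = nn.Sequential(nn.Linear(100, 64 * 64), nn.Tanh())     # noise -> flat "image"
D = nn.Sequential(nn.Linear(64 * 64, 1), nn.Sigmoid())    # image -> P(real)

real = torch.rand(128, 64 * 64) * 2 - 1   # pretend batch of real images in [-1, 1]
z = torch.rand(128, 100) * 2 - 1          # uniform noise vector z
fake = G(z)

# Discriminator objective (maximize): log D(x) + log(1 - D(G(z)))
d_obj = torch.log(D(real)).mean() + torch.log(1 - D(fake.detach())).mean()

# Generator objective (minimize): log(1 - D(G(z)))
g_obj = torch.log(1 - D(fake)).mean()
```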

Example Architecture:

Page 12: Unsupervised Representation Learning with DCGAN

DCGAN Generator:

• Fully-connected layer (composed of weights), reshaped to have width, height and feature maps.
• Uses ReLU activation functions.
• Fractionally-strided convolutions: an 8x8 input with a 5x5 conv window gives a 16x16 output.
• Batch Normalization: normalize responses to have zero mean and unit variance over the entire mini-batch, but not in the last layer (to prevent sample oscillation and model instability).
• Uses Tanh to scale the generated image output between -1 and 1.
• No max pooling! Spatial dimensionality is increased through fractionally-strided convolutions.
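A minimal PyTorch sketch of this generator (channel sizes follow the paper's 64x64 configuration; the padding/output_padding values are assumptions chosen so each 5x5 fractionally-strided convolution exactly doubles the spatial size; this is not the authors' released code):

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100):
        super().__init__()
        self.fc = nn.Linear(z_dim, 1024 * 4 * 4)    # project, then reshape to 4x4x1024
        def up(c_in, c_out):                         # fractionally-strided conv: doubles H and W
            return nn.Sequential(
                nn.ConvTranspose2d(c_in, c_out, kernel_size=5, stride=2,
                                   padding=2, output_padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
            )
        self.net = nn.Sequential(
            up(1024, 512),    # 4x4   -> 8x8
            up(512, 256),     # 8x8   -> 16x16
            up(256, 128),     # 16x16 -> 32x32
            # last layer: no batchnorm, Tanh output in [-1, 1]
            nn.ConvTranspose2d(128, 3, kernel_size=5, stride=2,
                               padding=2, output_padding=1),   # 32x32 -> 64x64
            nn.Tanh(),
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 1024, 4, 4)
        return self.net(x)

G = Generator()
z = torch.rand(16, 100) * 2 - 1    # uniform noise in [-1, 1]
print(G(z).shape)                  # torch.Size([16, 3, 64, 64])
```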

Page 13: Unsupervised Representation Learning with DCGAN

Fractionally-strided convolution vs. regular convolution (filter size = 3x3; clear dashed squares = zero-padded inputs):

Regular convolution: input = 5x5, with zero-padding at the border = 6x6, stride = 2; output = 3x3.

Fractionally-strided convolution: input = 3x3, interlacing zero-padding with the inputs = 7x7, stride = 1; output = 5x5.
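The same shape arithmetic can be checked in PyTorch (a sketch; the padding values are assumptions chosen to reproduce the slide's shapes):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 5)
conv = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)
print(conv(x).shape)       # torch.Size([1, 1, 3, 3]) - regular conv: 5x5 -> 3x3

y = torch.randn(1, 1, 3, 3)
deconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1)
print(deconv(y).shape)     # torch.Size([1, 1, 5, 5]) - fractionally-strided: 3x3 -> 5x5
```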

Page 14: Unsupervised Representation Learning with DCGAN

DCGAN Discriminator:

Input: a real or generated image.

• Uses LeakyReLU activation functions.
• Batch Normalization.
• No max pooling! Spatial dimensionality is reduced through strided convolutions (stride 2, padding 2).
• Sigmoid output (between 0 and 1).
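A matching PyTorch sketch of the discriminator (an illustration, not the authors' code; the paper applies no batchnorm to the discriminator's input layer, which the sketch mirrors):

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        def down(c_in, c_out, bn=True):   # strided conv: halves H and W
            layers = [nn.Conv2d(c_in, c_out, kernel_size=5, stride=2, padding=2)]
            if bn:                         # no batchnorm on the input layer
                layers.append(nn.BatchNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return nn.Sequential(*layers)
        self.net = nn.Sequential(
            down(3, 128, bn=False),   # 64x64 -> 32x32
            down(128, 256),           # 32x32 -> 16x16
            down(256, 512),           # 16x16 -> 8x8
            down(512, 1024),          # 8x8   -> 4x4
        )
        self.fc = nn.Linear(1024 * 4 * 4, 1)

    def forward(self, x):
        h = self.net(x).flatten(1)
        return torch.sigmoid(self.fc(h))   # P(image is real), in (0, 1)

D = Discriminator()
print(D(torch.randn(16, 3, 64, 64)).shape)   # torch.Size([16, 1])
```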

Page 15: Unsupervised Representation Learning with DCGAN

ARCHITECTURE GUIDELINES FOR STABLE DEEP CONVOLUTIONAL GANS

Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).

Use batchnorm in both the generator and the discriminator.

Remove fully connected hidden layers for deeper architectures.

Use ReLU activation in the generator for all layers except the output, which uses Tanh.

Use LeakyReLU activation in the discriminator.

Page 16: Unsupervised Representation Learning with DCGAN

DETAILS OF ADVERSARIAL TRAINING

Pre-processing: scale images between -1 and 1 (tanh range).

Minibatch SGD (m = 128).

Weight init.: zero-centered normal distribution (std. dev. = 0.02).

Leaky ReLU slope = 0.2.

Adam optimizer with tuned hyperparameters to accelerate training.

Learning rate = 0.0002.

Momentum term β1 = 0.5 to stabilize training.

DCGANs were trained on three datasets: Large-scale Scene Understanding (LSUN), Imagenet-1k, Faces (newly assembled).
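Putting the hyperparameters above together, a sketch of one training step (assumes the Generator and Discriminator sketches from the earlier slides; the data pipeline is replaced by a stand-in batch, and the generator loss uses the common BCE form that pushes D(G(z)) toward 1, rather than the appendix's log(1 - D(G(z)))):

```python
import torch

# Hyperparameters from the slide
lr, beta1, batch = 0.0002, 0.5, 128

G, D = Generator(), Discriminator()    # sketches from the earlier slides
for m in list(G.modules()) + list(D.modules()):
    if isinstance(m, (torch.nn.Conv2d, torch.nn.ConvTranspose2d, torch.nn.Linear)):
        torch.nn.init.normal_(m.weight, mean=0.0, std=0.02)  # zero-centered normal init

opt_g = torch.optim.Adam(G.parameters(), lr=lr, betas=(beta1, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=lr, betas=(beta1, 0.999))
bce = torch.nn.BCELoss()

real = torch.rand(batch, 3, 64, 64) * 2 - 1   # stand-in batch, pre-scaled to [-1, 1]
z = torch.rand(batch, 100) * 2 - 1

# Discriminator step: push D(real) -> 1 and D(G(z)) -> 0
fake = G(z)
d_loss = bce(D(real), torch.ones(batch, 1)) + bce(D(fake.detach()), torch.zeros(batch, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: push D(G(z)) -> 1 (fool the discriminator)
g_loss = bce(D(fake), torch.ones(batch, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```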

Page 17: Unsupervised Representation Learning with DCGAN

GENERATED IMAGES AND SANITY CHECKS THAT IT'S NOT JUST MEMORIZING EXAMPLES…

Generated LSUN bedrooms after one (left) and five (right) epochs of training.

Page 18: Unsupervised Representation Learning with DCGAN

SMOOTH TRANSITION OF SCENES PRODUCED BY INTERPOLATION BETWEEN A SERIES OF RANDOM POINTS IN Z
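A sketch of such a walk in latent space, assuming the Generator sketch above and plain linear interpolation (the slide does not specify the interpolation scheme):

```python
import torch

# Interpolate between two random points z0, z1 and generate an image at each step.
z0 = torch.rand(1, 100) * 2 - 1
z1 = torch.rand(1, 100) * 2 - 1
with torch.no_grad():
    for t in torch.linspace(0, 1, 9):
        z = (1 - t) * z0 + t * z1   # linear interpolation in Z
        img = G(z)                  # smooth scene transitions => no memorization
```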

Page 19: Unsupervised Representation Learning with DCGAN

Average 4 vectors from exemplar faces looking left and 4 looking right.

Interpolating between the left and right average vectors creates a "turn vector".
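A sketch of the idea (the exemplar latents here are random placeholders; in the paper they come from inspecting generated samples):

```python
import torch

# Hypothetical exemplar latent vectors whose samples face left / right.
z_left = torch.rand(4, 100) * 2 - 1
z_right = torch.rand(4, 100) * 2 - 1

avg_left, avg_right = z_left.mean(0), z_right.mean(0)
with torch.no_grad():
    for t in torch.linspace(0, 1, 7):
        z = (1 - t) * avg_left + t * avg_right   # slide along the "turn vector"
        frame = G(z.unsqueeze(0))                # faces turn from left to right
```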

Page 20: Unsupervised Representation Learning with DCGAN

MANIPULATING THE GENERATOR REPRESENTATION (FORGETTING TO DRAW CERTAIN OBJECTS)

(Top) Unmodified generated samples.

(Bottom) Samples generated after dropping out the "window" concept. Some windows are removed or transformed.

The overall scene stays the same, indicating the generator has separated objects (windows) from the scene.
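A heavily simplified sketch of the mechanism: zero out the feature maps that encode a concept in an intermediate generator layer. The channel indices and the hooked layer are hypothetical; in the paper the window-related feature maps are found by fitting a logistic regression on activations over window locations.

```python
import torch

window_maps = [7, 42, 101]   # hypothetical channel indices encoding "window"

def drop_concept(module, inputs, output):
    output[:, window_maps] = 0.0   # zero those channels at all spatial locations
    return output

# Attach to one of the generator's intermediate up-sampling blocks (index assumed).
handle = G.net[1].register_forward_hook(drop_concept)
with torch.no_grad():
    no_window = G(torch.rand(1, 100) * 2 - 1)   # same scene, windows removed/transformed
handle.remove()
```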

Page 21: Unsupervised Representation Learning with DCGAN

VECTOR ARITHMETIC ON FACE SAMPLES

Find 3 exemplar images (e.g., 3 smiling women).

Average their Z vectors.

Do simple vector arithmetic operations on the averaged vectors (e.g., smiling woman - neutral woman + neutral man).

Generate an image based on this new vector!!!

Other images are produced by adding small uniform noise to the new vector!

(Compare: the same arithmetic in pixel space does not produce meaningful results.)
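A sketch of the recipe (the exemplar latents are random placeholders here; in the paper each concept's vector is the average of the Z vectors of three exemplar generated images, and the noise scale below is an assumption):

```python
import torch

# Hypothetical averaged exemplar latents for each concept.
z_smiling_woman = torch.rand(3, 100).mean(0) * 2 - 1
z_neutral_woman = torch.rand(3, 100).mean(0) * 2 - 1
z_neutral_man   = torch.rand(3, 100).mean(0) * 2 - 1

# smiling woman - neutral woman + neutral man = smiling man
z_new = z_smiling_woman - z_neutral_woman + z_neutral_man
with torch.no_grad():
    center = G(z_new.unsqueeze(0))

    # Nearby samples: add small uniform noise to the result vector
    noise = (torch.rand(8, 100) - 0.5) * 0.25
    variants = G(z_new + noise)
```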

Page 22: Unsupervised Representation Learning with DCGAN

GANS AS A FEATURE EXTRACTOR

CIFAR-10

1) Train on ImageNet.
2) Get all the responses from the Discriminator's convolutional layers.
3) Max-pool each layer's representation to get a 4x4 spatial grid.
4) Flatten and concatenate these grids to form the feature vector.
5) Train a regularized linear L2-SVM classifier for CIFAR-10.
(Note: while other approaches achieve higher performance, this network was not trained on CIFAR-10!)
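A sketch of steps 2-5, assuming the Discriminator sketch from earlier (the images and labels are random stand-ins; the paper's feature dimensionality depends on its exact channel counts, and the SVM regularization constant C is an assumption):

```python
import torch
import numpy as np
from sklearn.svm import LinearSVC

feats = []
def grab(module, inputs, output):
    pooled = torch.nn.functional.adaptive_max_pool2d(output, 4)   # -> (N, C, 4, 4)
    feats.append(pooled.flatten(1))

for block in D.net:                       # hook every conv block of the discriminator
    block.register_forward_hook(grab)

images = torch.randn(256, 3, 64, 64)      # stand-in for CIFAR-10 images resized to 64x64
labels = np.random.randint(0, 10, 256)    # stand-in labels

with torch.no_grad():
    D(images)
X = torch.cat(feats, dim=1).numpy()       # concatenated feature vector per image

svm = LinearSVC(C=0.1)                    # regularized linear L2-SVM
svm.fit(X, labels)
```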

Page 23: Unsupervised Representation Learning with DCGAN

SUMMARY

Unsupervised learning that really seems to work.

Visualizations indicate that the Generator is learning something close to the true distribution of real images.

Classification performance using the Discriminator's features indicates that the learned features are discriminative of the underlying classes.

Page 24: Unsupervised Representation Learning with DCGAN
Page 25: Unsupervised Representation Learning with DCGAN

APPENDIX:

OPTIMIZING A GENERATIVE ADVERSARIAL NETWORK (GAN)

Loss function to maximize for the Discriminator:

log D(x) + log(1 - D(G(z)))

Gradient w.r.t. the parameters of the Discriminator (gradient ascent, maximize):

∇_θd (1/m) Σ_{i=1..m} [ log D(x^(i)) + log(1 - D(G(z^(i)))) ]

Loss function to minimize for the Generator:

log(1 - D(G(z)))

Gradient w.r.t. the parameters of the Generator (gradient descent, minimize):

∇_θg (1/m) Σ_{i=1..m} log(1 - D(G(z^(i))))

Interpretation: compute the gradient of the loss function, and then update the parameters to min/max the loss function (gradient descent/ascent).

Page 26: Unsupervised Representation Learning with DCGAN

EXAMPLE 1:

Discriminator loss (to maximize): log D(x) + log(1 - D(G(z))), with x a real image and z a uniform noise vector (random numbers).

Imagine that for a real image, D(x) scores 0.8, i.e., it is judged a real image (correct):

D(x) = 0.8 → log(0.8) ≈ -0.2

Then for a generated image, D(G(z)) scores 0.2, i.e., it is judged a generated image (correct):

D(G(z)) = 0.2 → log(1 - 0.2) = log(0.8) ≈ -0.2

We add them together, and this gives a fairly high loss of -0.4 (we perform gradient ascent, so we want to maximize it). Note that we are adding two negative numbers, so 0 is the upper bound.

Page 27: Unsupervised Representation Learning with DCGAN

EXAMPLE 1 (continued):

Generator loss (to minimize): log(1 - D(G(z))).

D(G(z)) scores 0.2, i.e., the generated image is judged a generated image: bad for G, since D(.) wasn't fooled. The assigned loss is -0.2. Note that we want to minimize this loss function.

D(G(z)) = 0.2 → log(1 - 0.2) = log(0.8) ≈ -0.2

Page 28: Unsupervised Representation Learning with DCGAN

EXAMPLE 2:

Discriminator loss (to maximize): log D(x) + log(1 - D(G(z))).

For a real image, D(x) scores 0.2, i.e., it is judged a generated image (wrong):

D(x) = 0.2 → log(0.2) ≈ -1.6

Then for a generated image, D(G(z)) scores 0.8, i.e., it is judged a real image (wrong):

D(G(z)) = 0.8 → log(1 - 0.8) = log(0.2) ≈ -1.6

These bad predictions combined give a loss of -3.2, a lower value than the loss when we had good predictions (Ex. 1). Remember, the goal is to maximize!

Page 29: Unsupervised Representation Learning with DCGAN

EXAMPLE 2 (continued):

Generator loss (to minimize): log(1 - D(G(z))).

D(G(z)) scores 0.8, i.e., the generated image is judged a real image: good for G, since D(.) was fooled. The assigned loss is -1.6. Compare with the previous loss, and remember that we want to minimize this loss function!

D(G(z)) = 0.8 → log(1 - 0.8) = log(0.2) ≈ -1.6
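The numbers in both examples can be checked directly (natural logs, rounded to one decimal on the slides):

```python
import math

def d_loss(dx, dgz):              # discriminator objective (to maximize)
    return math.log(dx) + math.log(1 - dgz)

def g_loss(dgz):                  # generator objective (to minimize)
    return math.log(1 - dgz)

# Example 1: correct predictions
print(d_loss(0.8, 0.2))   # -0.446... ~ -0.4 (high: near the upper bound of 0)
print(g_loss(0.2))        # -0.223... ~ -0.2

# Example 2: wrong predictions (D fooled)
print(d_loss(0.2, 0.8))   # -3.218... ~ -3.2 (low: bad for D, who maximizes)
print(g_loss(0.8))        # -1.609... ~ -1.6 (low: good for G, who minimizes)
```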