Unsupervised Representation Learning with DCGAN


Transcript of Unsupervised Representation Learning with DCGAN

Page 1: Unsupervised Representation Learning with DCGAN

UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS

Alec Radford, Luke Metz, and Soumith Chintala

(indico Research, Facebook AI Research)

Accepted paper of ICLR 2016

HY587 Paper Presentation

Shyam Krishna Khadka, George Simantiris

Page 2: Unsupervised Representation Learning with DCGAN

UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS

Introduced by Ian Goodfellow in 2014: Goodfellow I., Pouget-Abadie J., Mirza M., Xu B., Warde-Farley D., Ozair S., Courville A., Bengio Y. (2014). Generative Adversarial Nets. Advances in Neural Information Processing Systems, 2672–2680.

GANs are built around the optimization of two competing criteria:

"We simultaneously train two models: a generative model G and a discriminative model D."

E.g.:
G: a forger that produces counterfeit money.
D: the police, who identify whether money is real or counterfeit.
End goal: G produces money that D can no longer distinguish from real money.
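For reference, this game is formalized by the minimax objective from Goodfellow et al. (2014):

min_G max_D V(D, G) = E_{x~p_data(x)}[ log D(x) ] + E_{z~p_z(z)}[ log(1 - D(G(z))) ]

where p_data is the distribution of real images and p_z is the noise distribution that z is drawn from.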

Page 3: Unsupervised Representation Learning with DCGAN

UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS

Unsupervised learning that actually works well, both to generate and to discriminate!

The generated results are almost too good to believe, but the qualitative experiments are convincing.

Page 4: Unsupervised Representation Learning with DCGAN

UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS

Main contribution: Extensive model exploration to identify a family of architectures that resulted in stable training across a range of datasets and allowed for training higher resolution and deeper generative models.

Other contributions:
• Use the trained discriminator for image classification.
• The generators have vector arithmetic properties.

Page 6: Unsupervised Representation Learning with DCGAN

GENERATED - IMAGENET

Page 7: Unsupervised Representation Learning with DCGAN

GENERATED - FACES

Page 8: Unsupervised Representation Learning with DCGAN

Overview of the Deep Convolutional Generative Adversarial Network (DCGAN)

Can be thought of as two separate networks

Page 9: Unsupervised Representation Learning with DCGAN

Generator G(.): input = random numbers, output = generated image.

Generated image G(z): z is a uniform noise vector (random numbers; a 100-dimensional vector drawn from a uniform distribution). Sampling from the distribution of z is what creates new images!

Page 10: Unsupervised Representation Learning with DCGAN

Discriminator D(.): input = a real or generated image, output = the predicted probability that the image is real.

Page 11: Unsupervised Representation Learning with DCGAN

Generator G(.) Discriminator D(.)

Generator goal: fool the discriminator, i.e., generate an image G(z) for which D's prediction is wrong: D(G(z)) = 1.

Discriminator goal: discriminate between real and generated images, i.e., D(x) = 1, where x is a real image, and D(G(z)) = 0, where G(z) is a generated image.

Conflicting goals; both are unsupervised. The game is optimal when D(.) = 0.5 (i.e., the discriminator cannot tell the difference between real and generated images) and G has learned the distribution of the training images.
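A minimal PyTorch sketch of these two objectives, with toy stand-ins for G and D (the real DCGAN modules appear on the next slides; nothing here is the authors' code):

```python
import torch
import torch.nn as nn

# Toy stand-ins so the sketch runs end to end.
G = nn.Sequential(nn.Linear(100, 64 * 64), nn.Tanh())     # noise -> flat "image"
D = nn.Sequential(nn.Linear(64 * 64, 1), nn.Sigmoid())    # image -> P(real)

real = torch.rand(128, 64 * 64) * 2 - 1   # pretend batch of real images in [-1, 1]
z = torch.rand(128, 100) * 2 - 1          # uniform noise vector z
fake = G(z)

# Discriminator objective (maximize): log D(x) + log(1 - D(G(z)))
d_obj = torch.log(D(real)).mean() + torch.log(1 - D(fake.detach())).mean()

# Generator objective (minimize): log(1 - D(G(z)))
g_obj = torch.log(1 - D(fake)).mean()
```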

Example Architecture:

Page 12: Unsupervised Representation Learning with DCGAN

DCGAN Generator:

• Fully-connected layer (composed of weights), reshaped to have width, height and feature maps.
• Uses ReLU activation functions.
• Fractionally-strided convolutions: an 8x8 input with a 5x5 conv window gives a 16x16 output.
• Batch Normalization: normalize responses to have zero mean and unit variance over the entire mini-batch, but not in the last layer (to prevent sample oscillation and model instability).
• Uses Tanh to scale the generated image output between -1 and 1.
• No max pooling! Spatial dimensionality is increased through fractionally-strided convolutions.
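A minimal PyTorch sketch of this generator (channel sizes follow the paper's 64x64 configuration; the padding/output_padding values are assumptions chosen so each 5x5 fractionally-strided convolution exactly doubles the spatial size; this is not the authors' released code):

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100):
        super().__init__()
        self.fc = nn.Linear(z_dim, 1024 * 4 * 4)    # project, then reshape to 4x4x1024
        def up(c_in, c_out):                         # fractionally-strided conv: doubles H and W
            return nn.Sequential(
                nn.ConvTranspose2d(c_in, c_out, kernel_size=5, stride=2,
                                   padding=2, output_padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
            )
        self.net = nn.Sequential(
            up(1024, 512),    # 4x4   -> 8x8
            up(512, 256),     # 8x8   -> 16x16
            up(256, 128),     # 16x16 -> 32x32
            # last layer: no batchnorm, Tanh output in [-1, 1]
            nn.ConvTranspose2d(128, 3, kernel_size=5, stride=2,
                               padding=2, output_padding=1),   # 32x32 -> 64x64
            nn.Tanh(),
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 1024, 4, 4)
        return self.net(x)

G = Generator()
z = torch.rand(16, 100) * 2 - 1    # uniform noise in [-1, 1]
print(G(z).shape)                  # torch.Size([16, 3, 64, 64])
```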

Page 13: Unsupervised Representation Learning with DCGAN

Fractionally-strided convolution vs. regular convolution (filter size = 3x3; clear dashed squares = zero-padded inputs):

Regular convolution: input = 5x5, with zero-padding at the border = 6x6, stride = 2; output = 3x3.

Fractionally-strided convolution: input = 3x3, interlacing zero-padding with the inputs = 7x7, stride = 1; output = 5x5.
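The same shape arithmetic can be checked in PyTorch (a sketch; the padding values are assumptions chosen to reproduce the slide's shapes):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 5, 5)
conv = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)
print(conv(x).shape)       # torch.Size([1, 1, 3, 3]) - regular conv: 5x5 -> 3x3

y = torch.randn(1, 1, 3, 3)
deconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1)
print(deconv(y).shape)     # torch.Size([1, 1, 5, 5]) - fractionally-strided: 3x3 -> 5x5
```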

Page 14: Unsupervised Representation Learning with DCGAN

DCGAN Discriminator:

Input: a real or generated image.

• Uses LeakyReLU activation functions.
• Batch Normalization.
• No max pooling! Spatial dimensionality is reduced through strided convolutions (stride 2, padding 2).
• Sigmoid output (between 0 and 1).
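A matching PyTorch sketch of the discriminator (an illustration, not the authors' code; the paper applies no batchnorm to the discriminator's input layer, which the sketch mirrors):

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        def down(c_in, c_out, bn=True):   # strided conv: halves H and W
            layers = [nn.Conv2d(c_in, c_out, kernel_size=5, stride=2, padding=2)]
            if bn:                         # no batchnorm on the input layer
                layers.append(nn.BatchNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return nn.Sequential(*layers)
        self.net = nn.Sequential(
            down(3, 128, bn=False),   # 64x64 -> 32x32
            down(128, 256),           # 32x32 -> 16x16
            down(256, 512),           # 16x16 -> 8x8
            down(512, 1024),          # 8x8   -> 4x4
        )
        self.fc = nn.Linear(1024 * 4 * 4, 1)

    def forward(self, x):
        h = self.net(x).flatten(1)
        return torch.sigmoid(self.fc(h))   # P(image is real), in (0, 1)

D = Discriminator()
print(D(torch.randn(16, 3, 64, 64)).shape)   # torch.Size([16, 1])
```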

Page 15: Unsupervised Representation Learning with DCGAN

ARCHITECTURE GUIDELINES FOR STABLE DEEP CONVOLUTIONAL GANS

Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).

Use batchnorm in both the generator and the discriminator.

Remove fully connected hidden layers for deeper architectures.

Use ReLU activation in the generator for all layers except the output, which uses Tanh.

Use LeakyReLU activation in the discriminator.

Page 16: Unsupervised Representation Learning with DCGAN

DETAILS OF ADVERSARIAL TRAINING

Pre-processing: scale images between -1 and 1 (tanh range).

Minibatch SGD (m = 128).

Weight init.: zero-centered normal distribution (std. dev. = 0.02).

Leaky ReLU slope = 0.2.

Adam optimizer with tuned hyperparameters to accelerate training.

Learning rate = 0.0002.

Momentum term β1 = 0.5 to stabilize training.

DCGANs were trained on three datasets: Large-scale Scene Understanding (LSUN), Imagenet-1k, Faces (newly assembled).
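Putting the hyperparameters above together, a sketch of one training step (assumes the Generator and Discriminator sketches from the earlier slides; the data pipeline is replaced by a stand-in batch, and the generator loss uses the common BCE form that pushes D(G(z)) toward 1, rather than the appendix's log(1 - D(G(z)))):

```python
import torch

# Hyperparameters from the slide
lr, beta1, batch = 0.0002, 0.5, 128

G, D = Generator(), Discriminator()    # sketches from the earlier slides
for m in list(G.modules()) + list(D.modules()):
    if isinstance(m, (torch.nn.Conv2d, torch.nn.ConvTranspose2d, torch.nn.Linear)):
        torch.nn.init.normal_(m.weight, mean=0.0, std=0.02)  # zero-centered normal init

opt_g = torch.optim.Adam(G.parameters(), lr=lr, betas=(beta1, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=lr, betas=(beta1, 0.999))
bce = torch.nn.BCELoss()

real = torch.rand(batch, 3, 64, 64) * 2 - 1   # stand-in batch, pre-scaled to [-1, 1]
z = torch.rand(batch, 100) * 2 - 1

# Discriminator step: push D(real) -> 1 and D(G(z)) -> 0
fake = G(z)
d_loss = bce(D(real), torch.ones(batch, 1)) + bce(D(fake.detach()), torch.zeros(batch, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: push D(G(z)) -> 1 (fool the discriminator)
g_loss = bce(D(fake), torch.ones(batch, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```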

Page 17: Unsupervised Representation Learning with DCGAN

GENERATED IMAGES AND SANITY CHECKS THAT IT'S NOT JUST MEMORIZING EXAMPLES…

Generated LSUN bedrooms after one (left) and five (right) epochs of training.

Page 18: Unsupervised Representation Learning with DCGAN

SMOOTH TRANSITION OF SCENES PRODUCED BY INTERPOLATION BETWEEN A SERIES OF RANDOM POINTS IN Z
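A sketch of such a walk in latent space, assuming the Generator sketch above and plain linear interpolation (the slide does not specify the interpolation scheme):

```python
import torch

# Interpolate between two random points z0, z1 and generate an image at each step.
z0 = torch.rand(1, 100) * 2 - 1
z1 = torch.rand(1, 100) * 2 - 1
with torch.no_grad():
    for t in torch.linspace(0, 1, 9):
        z = (1 - t) * z0 + t * z1   # linear interpolation in Z
        img = G(z)                  # smooth scene transitions => no memorization
```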

Page 19: Unsupervised Representation Learning with DCGAN

Average 4 vectors from exemplar faces looking left and 4 looking right.

Interpolating between the left and right average vectors creates a "turn vector".
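A sketch of the idea (the exemplar latents here are random placeholders; in the paper they come from inspecting generated samples):

```python
import torch

# Hypothetical exemplar latent vectors whose samples face left / right.
z_left = torch.rand(4, 100) * 2 - 1
z_right = torch.rand(4, 100) * 2 - 1

avg_left, avg_right = z_left.mean(0), z_right.mean(0)
with torch.no_grad():
    for t in torch.linspace(0, 1, 7):
        z = (1 - t) * avg_left + t * avg_right   # slide along the "turn vector"
        frame = G(z.unsqueeze(0))                # faces turn from left to right
```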

Page 20: Unsupervised Representation Learning with DCGAN

MANIPULATING THE GENERATOR REPRESENTATION (FORGETTING TO DRAW CERTAIN OBJECTS)

(Top) Unmodified generated samples.

(Bottom) Samples generated after dropping out the "window" concept. Some windows are removed or transformed.

The overall scene stays the same, indicating the generator has separated objects (windows) from the scene.
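A heavily simplified sketch of the mechanism: zero out the feature maps that encode a concept in an intermediate generator layer. The channel indices and the hooked layer are hypothetical; in the paper the window-related feature maps are found by fitting a logistic regression on activations over window locations.

```python
import torch

window_maps = [7, 42, 101]   # hypothetical channel indices encoding "window"

def drop_concept(module, inputs, output):
    output[:, window_maps] = 0.0   # zero those channels at all spatial locations
    return output

# Attach to one of the generator's intermediate up-sampling blocks (index assumed).
handle = G.net[1].register_forward_hook(drop_concept)
with torch.no_grad():
    no_window = G(torch.rand(1, 100) * 2 - 1)   # same scene, windows removed/transformed
handle.remove()
```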

Page 21: Unsupervised Representation Learning with DCGAN

VECTOR ARITHMETIC ON FACE SAMPLES

Find 3 exemplar images (e.g., 3 smiling women).

Average their Z vectors.

Do simple vector arithmetic operations on the averaged vectors (e.g., smiling woman - neutral woman + neutral man).

Generate an image based on this new vector!!!

Other images are produced by adding small uniform noise to the new vector!

(Compare: the same arithmetic in pixel space does not produce meaningful results.)
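A sketch of the recipe (the exemplar latents are random placeholders here; in the paper each concept's vector is the average of the Z vectors of three exemplar generated images, and the noise scale below is an assumption):

```python
import torch

# Hypothetical averaged exemplar latents for each concept.
z_smiling_woman = torch.rand(3, 100).mean(0) * 2 - 1
z_neutral_woman = torch.rand(3, 100).mean(0) * 2 - 1
z_neutral_man   = torch.rand(3, 100).mean(0) * 2 - 1

# smiling woman - neutral woman + neutral man = smiling man
z_new = z_smiling_woman - z_neutral_woman + z_neutral_man
with torch.no_grad():
    center = G(z_new.unsqueeze(0))

    # Nearby samples: add small uniform noise to the result vector
    noise = (torch.rand(8, 100) - 0.5) * 0.25
    variants = G(z_new + noise)
```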

Page 22: Unsupervised Representation Learning with DCGAN

GANS AS A FEATURE EXTRACTOR

CIFAR-10

1) Train on ImageNet.
2) Get all the responses from the Discriminator's convolutional layers.
3) Max-pool each layer's representation to get a 4x4 spatial grid.
4) Flatten and concatenate these grids to form the feature vector.
5) Train a regularized linear L2-SVM classifier for CIFAR-10.
(Note: while other approaches achieve higher performance, this network was not trained on CIFAR-10!)
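A sketch of steps 2-5, assuming the Discriminator sketch from earlier (the images and labels are random stand-ins; the paper's feature dimensionality depends on its exact channel counts, and the SVM regularization constant C is an assumption):

```python
import torch
import numpy as np
from sklearn.svm import LinearSVC

feats = []
def grab(module, inputs, output):
    pooled = torch.nn.functional.adaptive_max_pool2d(output, 4)   # -> (N, C, 4, 4)
    feats.append(pooled.flatten(1))

for block in D.net:                       # hook every conv block of the discriminator
    block.register_forward_hook(grab)

images = torch.randn(256, 3, 64, 64)      # stand-in for CIFAR-10 images resized to 64x64
labels = np.random.randint(0, 10, 256)    # stand-in labels

with torch.no_grad():
    D(images)
X = torch.cat(feats, dim=1).numpy()       # concatenated feature vector per image

svm = LinearSVC(C=0.1)                    # regularized linear L2-SVM
svm.fit(X, labels)
```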

Page 23: Unsupervised Representation Learning with DCGAN

SUMMARY

Unsupervised learning that really seems to work.

Visualizations indicate that the Generator is learning something close to the true distribution of real images.

Classification performance using the Discriminator's features indicates that the learned features are discriminative of the underlying classes.

Page 24: Unsupervised Representation Learning with DCGAN
Page 25: Unsupervised Representation Learning with DCGAN

APPENDIX:

OPTIMIZING A GENERATIVE ADVERSARIAL NETWORK (GAN)

Loss function to maximize for the Discriminator:

log D(x) + log(1 - D(G(z)))

Gradient w.r.t. the parameters of the Discriminator (gradient ascent, maximize):

∇_θd (1/m) Σ_{i=1..m} [ log D(x^(i)) + log(1 - D(G(z^(i)))) ]

Loss function to minimize for the Generator:

log(1 - D(G(z)))

Gradient w.r.t. the parameters of the Generator (gradient descent, minimize):

∇_θg (1/m) Σ_{i=1..m} log(1 - D(G(z^(i))))

Interpretation: compute the gradient of the loss function, and then update the parameters to min/max the loss function (gradient descent/ascent).

Page 26: Unsupervised Representation Learning with DCGAN

EXAMPLE 1:

Discriminator loss (to maximize): log D(x) + log(1 - D(G(z))), with x a real image and z a uniform noise vector (random numbers).

Imagine that for a real image, D(x) scores 0.8, i.e., it is judged a real image (correct):

D(x) = 0.8 → log(0.8) ≈ -0.2

Then for a generated image, D(G(z)) scores 0.2, i.e., it is judged a generated image (correct):

D(G(z)) = 0.2 → log(1 - 0.2) = log(0.8) ≈ -0.2

We add them together, and this gives a fairly high loss of -0.4 (we perform gradient ascent, so we want to maximize it). Note that we are adding two negative numbers, so 0 is the upper bound.

Page 27: Unsupervised Representation Learning with DCGAN

EXAMPLE 1 (continued):

Generator loss (to minimize): log(1 - D(G(z))).

D(G(z)) scores 0.2, i.e., the generated image is judged a generated image: bad for G, since D(.) wasn't fooled. The assigned loss is -0.2. Note that we want to minimize this loss function.

D(G(z)) = 0.2 → log(1 - 0.2) = log(0.8) ≈ -0.2

Page 28: Unsupervised Representation Learning with DCGAN

EXAMPLE 2:

Discriminator loss (to maximize): log D(x) + log(1 - D(G(z))).

For a real image, D(x) scores 0.2, i.e., it is judged a generated image (wrong):

D(x) = 0.2 → log(0.2) ≈ -1.6

Then for a generated image, D(G(z)) scores 0.8, i.e., it is judged a real image (wrong):

D(G(z)) = 0.8 → log(1 - 0.8) = log(0.2) ≈ -1.6

These bad predictions combined give a loss of -3.2, a lower value than the loss when we had good predictions (Ex. 1). Remember, the goal is to maximize!

Page 29: Unsupervised Representation Learning with DCGAN

EXAMPLE 2 (continued):

Generator loss (to minimize): log(1 - D(G(z))).

D(G(z)) scores 0.8, i.e., the generated image is judged a real image: good for G, since D(.) was fooled. The assigned loss is -1.6. Compare with the previous loss, and remember that we want to minimize this loss function!

D(G(z)) = 0.8 → log(1 - 0.8) = log(0.2) ≈ -1.6
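The numbers in both examples can be checked directly (natural logs, rounded to one decimal on the slides):

```python
import math

def d_loss(dx, dgz):              # discriminator objective (to maximize)
    return math.log(dx) + math.log(1 - dgz)

def g_loss(dgz):                  # generator objective (to minimize)
    return math.log(1 - dgz)

# Example 1: correct predictions
print(d_loss(0.8, 0.2))   # -0.446... ~ -0.4 (high: near the upper bound of 0)
print(g_loss(0.2))        # -0.223... ~ -0.2

# Example 2: wrong predictions (D fooled)
print(d_loss(0.2, 0.8))   # -3.218... ~ -3.2 (low: bad for D, who maximizes)
print(g_loss(0.8))        # -1.609... ~ -1.6 (low: good for G, who minimizes)
```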