Generative Adversarial Networks

Nikhil Sardana and Justin Zhang

May 2018

1 Introduction

At this point in your AI class, you’re probably learning about Neural Networks. We’re here to talk to you about Generative Adversarial Networks (GANs), which are a much more complex type of neural network. As you’ll soon find out, GANs are used for an entirely different purpose than standard networks. We won’t be covering the GAN architecture in too much detail, as it’s a fairly complex topic which you could spend hours and hours learning. Instead, we’ll present a high-level overview of how GANs work before getting to the fun examples.

2 Neural Network Review

Before we begin, we’ll give you a high-level refresher on neural networks. Most networks you’ve seen so far probably look something like this:

[Figure: a small fully connected network with input nodes x0, x1, x2, connected through weight layers W1 and W2 to a hidden layer and an output layer.]

We have some input nodes, a hidden layer, and output nodes. Each node is connected to every node in the next layer by a weight. In addition, there is a bias at each node.

A neural network learns when it is given training data and labels. The data (inputs) can be in the form of text, images, numbers, etc. The label is the ground truth, the correct answer for the given input. Given enough data-label pairs, a network can learn to generalize the relationship between the data and the label. Initially, the network spits out random guesses; the values of the output nodes are nowhere near the labels of the dataset. However, over time, a neural network “learns” by adjusting the weights and biases to minimize the error during training. The algorithm used to adjust the weights and biases is called backpropagation.
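To make this concrete, here is a minimal sketch of such a network and a single backpropagation update. It uses PyTorch purely as one convenient choice of library; the data is random and only illustrative.

```python
import torch
import torch.nn as nn

# A tiny network: 3 inputs -> 4 hidden nodes -> 1 output.
# Each Linear layer holds the weights and biases described above.
net = nn.Sequential(
    nn.Linear(3, 4),
    nn.ReLU(),
    nn.Linear(4, 1),
)

loss_fn = nn.MSELoss()                                 # measures error vs. the label
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

# One (data, label) pair -- random numbers, purely illustrative.
x = torch.randn(1, 3)        # the input (three features)
y = torch.tensor([[1.0]])    # the ground-truth label

# One training step: forward pass, compute error, backpropagate, update.
prediction = net(x)
loss = loss_fn(prediction, y)
optimizer.zero_grad()
loss.backward()              # backpropagation: compute gradients of the error
optimizer.step()             # adjust weights and biases to reduce the error
```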

After training, a neural network is tested or validated on a set of data it has never seen before (i.e., data not part of the training set). This validation accuracy shows just how well a network has learned to generalize through training.

Neural networks are classifiers. They serve to categorize inputs. For example, let’s say I wanted to use a neural network to predict a certain college’s acceptances. I might use a student’s SAT score, GPA, and number of AP exams as inputs, and whether they were accepted or not (0 or 1) as the output. Given the past decade of TJ students’ results, I could train this network and then use it to help next year’s seniors.
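A toy version of that acceptance predictor might look like the sketch below. The students, scores, and scaling here are entirely made up; the point is just the shape of the problem: three inputs, one 0-or-1 label, and a held-out example to "validate" on.

```python
import torch
import torch.nn as nn

# Toy classifier for the (hypothetical) acceptance example:
# inputs are [SAT score, GPA, AP exam count], output is P(accepted).
model = nn.Sequential(
    nn.Linear(3, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid(),            # squashes the output into (0, 1)
)
loss_fn = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Fabricated students: [SAT/1600, GPA/4.0, AP count/15], scaled to roughly [0, 1].
train_x = torch.tensor([[0.95, 1.00, 0.8],
                        [0.60, 0.75, 0.1],
                        [0.90, 0.95, 0.6],
                        [0.70, 0.80, 0.2]])
train_y = torch.tensor([[1.0], [0.0], [1.0], [0.0]])   # accepted or not

for epoch in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(train_x), train_y)
    loss.backward()
    optimizer.step()

# "Validation": a student the network has never seen before.
new_student = torch.tensor([[0.88, 0.92, 0.5]])
print(model(new_student).item())   # predicted probability of acceptance
```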

But there are plenty of problems out there that aren’t classification at all! Let’s say, using my previous example, that this is a very selective college, and there aren’t many students accepted. So few, in fact, that


my neural network doesn’t have enough data to learn from. Well, what if I could just create a fake TJ student? If I took all the students that were historically accepted, and somehow used them to generate new students that would also be accepted, then I could train my original network with no issues. Generative Adversarial Networks tackle this problem of generating information, whether it’s student scores or images or text.

3 GAN Overview

I like cat pictures. The Internet is full of cat pictures, but I decide there aren’t enough cat pictures! Only problem is, I don’t own a cat. So, I take the only other option and write a GAN to generate some brand new cat pictures.

Generative Adversarial Networks have two parts: a generator and a discriminator. The generator takes in a sample from a random distribution (a noise vector) and outputs a fake image. The discriminator tries to tell the difference between the generator’s output (the fake images) and real images.
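In code, the two parts might look like the following sketch. The layer sizes, the 100-dimensional noise vector, and the flattened 64x64 image are all arbitrary choices for illustration, not the one true architecture.

```python
import torch
import torch.nn as nn

NOISE_DIM = 100          # size of the random noise vector (a common choice)
IMG_DIM = 64 * 64 * 3    # a 64x64 RGB cat image, flattened (hypothetical size)

# Generator: noise vector -> fake image.
generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, IMG_DIM),
    nn.Tanh(),           # pixel values in [-1, 1]
)

# Discriminator: image -> probability that the image is real.
discriminator = nn.Sequential(
    nn.Linear(IMG_DIM, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
    nn.Sigmoid(),        # 1 = "real", 0 = "fake"
)

z = torch.randn(16, NOISE_DIM)        # a batch of 16 noise vectors
fake_images = generator(z)            # 16 fake (flattened) images
verdict = discriminator(fake_images)  # 16 real/fake probabilities
```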

In the beginning, my generator will spit out images that look nothing like cats. However, my discriminator will also be extremely stupid, so it will have a difficult time telling apart these random images from real samples of cats. Every time my discriminator makes a mistake and misclassifies an image (says a fake cat is a real one or vice versa), it will be penalized, and my generator will be rewarded. If the opposite happens, and the discriminator correctly identifies a fictitious feline, then my generator will be penalized, as it couldn’t create a convincing enough cat. Because whatever the discriminator is penalized, the generator is rewarded (and vice versa), Generative Adversarial Networks are a two-player zero-sum game.

We train the generator and the discriminator together, so they both get better over time. As long as the generator and discriminator get better at roughly the same rate, we will eventually end up with a generator that can create quite realistic cat images and a discriminator that can accurately tell fake images from real ones.
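One round of this joint training might look like the sketch below. It reuses the generator, discriminator, NOISE_DIM, and IMG_DIM names from the previous sketch, and real_images is just a placeholder standing in for a batch from the cat dataset.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)

real_images = torch.randn(16, IMG_DIM)   # placeholder for a real batch of cats

# --- Discriminator step: penalized for misclassifying real vs. fake ---
z = torch.randn(16, NOISE_DIM)
fake_images = generator(z).detach()      # don't backprop into G on this step
d_loss = bce(discriminator(real_images), torch.ones(16, 1)) + \
         bce(discriminator(fake_images), torch.zeros(16, 1))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# --- Generator step: rewarded when the discriminator calls its fakes "real" ---
# (Pushing D(G(z)) toward 1 is the common practical stand-in for directly
#  minimizing the generator's own penalty term.)
z = torch.randn(16, NOISE_DIM)
g_loss = bce(discriminator(generator(z)), torch.ones(16, 1))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```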

3.1 A More In-Depth Look

The objective of a GAN is to generate realistic images. Given some data distribution, we want our generator’s output distribution to match this distribution exactly. When we talk about our true distribution, we refer to the set of real images. If we have a set of a thousand cat pictures, each picture will be slightly


different, but at each pixel our dataset will have some mean, median, and standard deviation. This distribution of cat data is very high-dimensional and hopelessly complex. The generator of a GAN uses a neural network to map a much simpler distribution that we can easily sample from (for example, a lower-dimensional Gaussian distribution) to this complex cat distribution.

Hopefully you’ve seen minimax before when working with Othello, and it’s the same idea here:

\[
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
\]

Let’s break down this fairly complex objective function. Consider the first term from the discriminator’s perspective:

\[
\mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)]
\]

Here, p_data is the dataset’s distribution (the distribution of real cat images), and x is a sample from this distribution: a real cat image. D(x) is the output of the discriminator given some real image as input. Remember that D(x) outputs a 1 when the discriminator thinks the input image x is real, and a 0 when it thinks the input is fake. Thus, we wish to maximize D(x), to get it closer to 1. Maximizing D(x) also maximizes log(D(x)). E_{x∼p_data(x)} simply means the expected value (E) over samples x drawn from the p_data distribution.

Now, let’s consider the second term from the generator’s perspective:

\[
\mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
\]

Here, z represents a sample of random noise drawn from the noise distribution p_z(z). This random noise is fed into the generator, G(z). The output of the generator (a fake image) is then fed through the discriminator, D(G(z)). Again, D(G(z)) will output 0 when the discriminator thinks the image is fake, and 1 when it thinks it’s real. Since the generator is creating fake data and wants to trick the discriminator, it wants D(G(z)) = 1. Since the generator wishes to maximize D(G(z)) (get it as close to 1 as possible), and thus wants to maximize log(D(G(z))), it wants to minimize log(1 − D(G(z))). We can think of log(1 − D(G(z))) as the log probability that the discriminator thinks the generated output is fake. E_{z∼p_z(z)} simply means the expected value E over noise samples z drawn from the distribution p_z(z).

To sum these terms up: the discriminator wishes to maximize the entire objective (both terms), while the generator only affects the second term and wishes to minimize it.
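If it helps, here is how the objective might be estimated in code: each expectation simply becomes an average over a batch. D, G, x, and z are stand-ins for the networks and batches from the earlier sketches; this is a sketch of the value function itself, not a training routine.

```python
import torch

def gan_value(D, G, x, z):
    """Monte-Carlo estimate of V(D, G): each expectation becomes a batch mean.

    D: discriminator (outputs probabilities in (0, 1))
    G: generator
    x: a batch of real images, z: a batch of noise vectors
    """
    first_term = torch.log(D(x)).mean()            # E_x[log D(x)]
    second_term = torch.log(1 - D(G(z))).mean()    # E_z[log(1 - D(G(z)))]
    return first_term + second_term

# The discriminator takes gradient steps to *maximize* this value,
# while the generator takes steps to *minimize* the second term.
```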

3.2 Some Issues

Note that GANs are notoriously difficult to train. This is because GANs are highly unstable; in order to train correctly, we need the generator and discriminator to stay roughly evenly matched throughout the training process. If the discriminator overpowers the generator, there will be little gradient for the generator to learn from; if the generator overpowers the discriminator, we run into mode collapse, where the generator produces outputs with extremely low variety.


4 Applications

GANs have a variety of novel applications; that’s what makes them so exciting. Besides standard image generation, GANs can also be used for image generation from captions, image-to-image mapping, super-resolution, and style transfer.

5 CycleGAN

The goal of the CycleGAN is to learn a mapping G that translates a source domain X to a target domain Y given unpaired images.

However, the problem with learning G such that G(X) → Y is that the mapping G is very under-constrained (in other words, there are many ways for G to minimize the loss over the dataset as a whole while producing qualitatively not-very-good individual images). The basic approach of the CycleGAN, thus, is to ensure cyclic consistency by introducing an inverse mapping F and a cycle-consistency loss that enforces F(G(X)) ≈ X and G(F(Y)) ≈ Y.

As usual, the loss for G is

\[
\min_G \max_{D_Y} \; \mathbb{E}_{y \sim p_{\text{data}}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log(1 - D_Y(G(x)))]
\]

An analogous loss is used for F. However, a new term is introduced for cyclic consistency:

\[
\lambda \left( \lVert F(G(x)) - x \rVert_1 + \lVert G(F(y)) - y \rVert_1 \right)
\]

The L1 norm was selected based on empirical data. The final loss, then, is the sum of these three individual losses.

The CycleGAN’s architecture is based on pix2pix’s PatchGAN, which essentially uses a discriminator that classifies N×N patches. This helps to preserve smaller details, such as texture and style.
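Putting the pieces together, a sketch of the combined objective might look like the following. G, F, D_X, and D_Y are assumed to be the two generators and two discriminators; the λ = 10 weight and the least-squares adversarial loss (which the CycleGAN paper uses in place of the log loss in practice) are typical choices rather than requirements. The last block shows the PatchGAN idea: a convolutional discriminator whose output is a grid of per-patch real/fake scores rather than a single number.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()   # least-squares adversarial loss, as used by CycleGAN in practice
l1 = nn.L1Loss()     # the L1 norm used for cycle consistency
LAMBDA = 10.0        # weight on the cycle-consistency term (a typical choice)

def generator_losses(G, F, D_X, D_Y, real_x, real_y):
    """Sketch of the combined CycleGAN generator objective.

    G, F, D_X, D_Y are hypothetical networks: G maps X -> Y, F maps Y -> X,
    and D_X, D_Y are the discriminators for the two domains.
    """
    fake_y = G(real_x)          # X -> Y
    fake_x = F(real_y)          # Y -> X

    # Adversarial losses: each generator tries to make its discriminator
    # output "real" (encoded here as 1) on generated images.
    adv_g = mse(D_Y(fake_y), torch.ones_like(D_Y(fake_y)))
    adv_f = mse(D_X(fake_x), torch.ones_like(D_X(fake_x)))

    # Cycle consistency: F(G(x)) should reconstruct x, and G(F(y)) should
    # reconstruct y, measured with the L1 norm.
    cycle = l1(F(fake_y), real_x) + l1(G(fake_x), real_y)

    # The final loss is the sum of the three pieces.
    return adv_g + adv_f + LAMBDA * cycle

# A PatchGAN-style discriminator: convolutional layers whose output is an
# N×N grid of real/fake scores, one per image patch, instead of one scalar.
patch_discriminator = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, kernel_size=4, stride=1, padding=1),  # -> N×N patch map
)
```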
