SeqGAN - Lantao Yu

SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. Feb. 7, 2017 at AAAI. Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu. Shanghai Jiao Tong University, University College London. Email: [email protected]


Page 1: SeqGAN - Lantao Yu

SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient

Feb. 7, 2017 at AAAI

Lantao Yu†, Weinan Zhang†, Jun Wang‡, Yong Yu†

†Shanghai Jiao Tong University, ‡University College London

Email: [email protected]

Page 2: SeqGAN - Lantao Yu

Problem Definition

• Given a dataset of real-world structured sequences, train a $\theta$-parameterized generative model $G_\theta$ to produce sequences that mimic the real ones.
• In other words, we want $G_\theta$ to fit the unknown true data distribution $p_{\text{true}}(y_t \mid Y_{1:t-1})$, which is only revealed by the given dataset $D = \{Y_{1:T}\}$.

Page 3: SeqGAN - Lantao Yu

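• Traditional objective: maximum likelihood estimation (MLE)

$$\max_\theta \frac{1}{|D|} \sum_{Y_{1:T} \in D} \sum_t \log\left[G_\theta(y_t \mid Y_{1:t-1})\right]$$

• MLE checks whether true data has high probability mass (density) under the learned model.
• It suffers from so-called exposure bias in the inference stage: training conditions on real prefixes, while inference conditions on the model's own guesses.

Training. Update the model as follows, where $Y_{1:t-1}$ is the real prefix:

$$\max_\theta \mathbb{E}_{Y \sim p_{\text{true}}} \sum_t \log G_\theta(y_t \mid Y_{1:t-1})$$

Inference. When generating the next token $y_t$, sample from the following, where $Y_{1:t-1}$ is the guessed prefix:

$$G_\theta(y_t \mid Y_{1:t-1})$$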
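The gap between the training and inference columns can be sketched with a toy model. This is a minimal illustration, not the model from the talk: `TOY_MODEL` is a hypothetical bigram table that stands in for $G_\theta$ and conditions only on the previous token, and all probabilities are made up.

```python
import math
import random

# Hypothetical toy model G_theta(y_t | previous token): a bigram table.
# "<s>" is a start symbol; all probabilities are made up for illustration.
TOY_MODEL = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"a": 0.1, "b": 0.9},
    "b": {"a": 0.7, "b": 0.3},
}

def teacher_forced_log_likelihood(model, sequence):
    """Training-time quantity: sum_t log G(y_t | real prefix)."""
    total, prev = 0.0, "<s>"
    for token in sequence:
        total += math.log(model[prev][token])
        prev = token  # the prefix always comes from the real data
    return total

def free_running_sample(model, length, rng):
    """Inference-time behaviour: each step conditions on the model's own guesses."""
    out, prev = [], "<s>"
    for _ in range(length):
        tokens = list(model[prev])
        weights = [model[prev][tok] for tok in tokens]
        prev = rng.choices(tokens, weights=weights, k=1)[0]
        out.append(prev)  # the guessed token becomes part of the next prefix
    return out
```

In `teacher_forced_log_likelihood` the variable `prev` is always a real token, whereas in `free_running_sample` it is whatever the model just produced. The model is therefore never scored on prefixes it generated itself.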

Page 4: SeqGAN - Lantao Yu

A promising method: Generative Adversarial Nets (GANs)

• The discriminator tries to correctly distinguish the true data from the fake model-generated data.
• The generator tries to generate high-quality data to fool the discriminator.
• Ideally, when D cannot distinguish the true and generated data, G nicely fits the true underlying data distribution.

[Figure: Real World data, Generator (G), Discriminator (D)]

[Goodfellow I, Pouget-Abadie J, Mirza M, et al. 2014. Generative adversarial nets. In NIPS 2014.]

Page 5: SeqGAN - Lantao Yu

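Generator Network in GANs

• Must be differentiable
• Popular implementation: multi-layer perceptron
• Linked with the discriminator and gets guidance from it

$$x = G(z; \theta^{(G)})$$

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

[Figure: generator G, discriminator D, output label P(true)]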

Page 6: SeqGAN - Lantao Yu

Problem for Discrete Data

• On continuous data, there is a direct gradient:

$$\nabla_{\theta^{(G)}} \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D(G(z^{(i)}))\right)$$

  • It guides the generator to (slightly) modify the output.
• No direct gradient on discrete data
• Text generation example
  • “I caught a penguin in the park”
  • From Ian Goodfellow: “If you output the word ‘penguin’, you can't change that to "penguin + .001" on the next step, because there is no such word as "penguin + .001". You have to go all the way from "penguin" to "ostrich".”

[Figure: generator G, discriminator D, output label P(true)]

[https://www.reddit.com/r/MachineLearning/comments/40ldq6/generative_adversarial_networks_for_text/]
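The "penguin + .001" remark can be checked numerically with a finite difference. The sketch below is only an illustration under stated assumptions: `continuous_output`, `discrete_output`, the three-word `VOCAB`, and the logit values are all hypothetical, and taking the highest-scoring token stands in for any discrete choice.

```python
def continuous_output(theta, z):
    """A continuous generator: a small change in theta moves the output a little."""
    return theta * z

def discrete_output(logits):
    """A discrete generator step: emit the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

VOCAB = ["penguin", "ostrich", "park"]  # hypothetical three-word vocabulary
eps = 1e-3

# Continuous case: the output shifts by eps * z, so a finite difference is informative.
delta_continuous = continuous_output(2.0 + eps, 0.5) - continuous_output(2.0, 0.5)

# Discrete case: nudging the logits by eps leaves the emitted word unchanged,
# so the same finite difference carries no signal about which way to move.
logits = [1.2, 0.7, 0.1]
nudged = [logits[0] - eps, logits[1] + eps, logits[2]]
word_before = VOCAB[discrete_output(logits)]
word_after = VOCAB[discrete_output(nudged)]
```

The emitted word changes only when a logit crosses another one, and then it jumps to a different word outright. That jump is the "all the way from penguin to ostrich" behaviour in the quote.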

Page 7: SeqGAN - Lantao Yu

SeqGAN

• The generator $G_\theta(y_t \mid Y_{1:t-1})$ is a reinforcement learning policy for generating a sequence
  • It decides the next word to generate (the action) given the previous ones as the state
• The discriminator $D_\phi(Y^n_{1:T})$ provides the reward (i.e. the probability of being true data) for the sequence

Page 8: SeqGAN - Lantao Yu

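Sequence Generator

• Objective: to maximize the expected reward

$$J(\theta) = \mathbb{E}[R_T \mid s_0, \theta] = \sum_{y_1 \in \mathcal{Y}} G_\theta(y_1 \mid s_0) \cdot Q^{G_\theta}_{D_\phi}(s_0, y_1)$$

• The state-action value function $Q^{G_\theta}_{D_\phi}(s, a)$ is the expected accumulative reward obtained by
  • starting from state $s$
  • taking action $a$
  • and following policy $G$ until the end
• Reward is only on the completed sequence (no immediate reward)

$$Q^{G_\theta}_{D_\phi}(s = Y_{1:T-1}, a = y_T) = D_\phi(Y_{1:T})$$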

Page 9: SeqGAN - Lantao Yu

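State-Action Value Setting

• Reward is only on the completed sequence
• No immediate reward
• Then the last-step state-action value is

$$Q^{G_\theta}_{D_\phi}(s = Y_{1:T-1}, a = y_T) = D_\phi(Y_{1:T})$$

• For the intermediate state-action value
  • Use Monte Carlo search to estimate it

$$\{Y^1_{1:T}, \ldots, Y^N_{1:T}\} = \mathrm{MC}^{G_\beta}(Y_{1:t}; N)$$

  • Following a roll-out policy $G_\beta$

$$Q^{G_\theta}_{D_\phi}(s = Y_{1:t-1}, a = y_t) = \begin{cases} \frac{1}{N} \sum_{n=1}^{N} D_\phi(Y^n_{1:T}), \; Y^n_{1:T} \in \mathrm{MC}^{G_\beta}(Y_{1:t}; N) & \text{for } t < T \\ D_\phi(Y_{1:t}) & \text{for } t = T \end{cases}$$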
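The two cases of the state-action value can be sketched in a few lines of Python. This is a toy under stated assumptions: `uniform_policy` and `toy_discriminator` are hypothetical stand-ins for the roll-out policy $G_\beta$ and the discriminator $D_\phi$, and tokens are just 0 or 1.

```python
import random

def mc_rollouts(prefix, total_len, n, rollout_policy, rng):
    """MC^{G_beta}(Y_{1:t}; N): complete the prefix N times with the roll-out policy."""
    completions = []
    for _ in range(n):
        seq = list(prefix)
        while len(seq) < total_len:
            seq.append(rollout_policy(seq, rng))
        completions.append(seq)
    return completions

def q_value(prefix, total_len, n, rollout_policy, discriminator, rng):
    """Q(s = Y_{1:t-1}, a = y_t), where `prefix` is Y_{1:t} (state plus action)."""
    if len(prefix) == total_len:
        # t = T: the sequence is complete, so the reward comes straight from D.
        return discriminator(prefix)
    # t < T: average the discriminator's reward over N sampled completions.
    samples = mc_rollouts(prefix, total_len, n, rollout_policy, rng)
    return sum(discriminator(s) for s in samples) / n

# Hypothetical stand-ins, not the models from the slides.
def uniform_policy(seq, rng):
    return rng.choice([0, 1])

def toy_discriminator(seq):
    return sum(seq) / len(seq)  # "probability of being real" = fraction of 1s
```

A call such as `q_value([1], 4, 200, uniform_policy, toy_discriminator, random.Random(0))` scores the partial sequence `[1]` by how well its sampled completions fare. A larger `n` lowers the variance of the estimate but costs more discriminator calls.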

Page 10: SeqGAN - Lantao Yu

Training Sequence Discriminator

• Objective: standard binary classification

$$\min_\phi \; -\mathbb{E}_{Y \sim p_{\text{data}}}[\log D_\phi(Y)] - \mathbb{E}_{Y \sim G_\theta}[\log(1 - D_\phi(Y))]$$
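The discriminator objective is the usual cross-entropy loss. The sketch below estimates it from samples. The function name `discriminator_loss` and its inputs are assumptions for illustration: each list holds discriminator outputs in (0, 1), one per sequence.

```python
import math

def discriminator_loss(d_on_real, d_on_generated):
    """Sample estimate of -E_{Y~p_data}[log D(Y)] - E_{Y~G}[log(1 - D(Y))].

    d_on_real: D(Y) for real sequences, each value strictly between 0 and 1.
    d_on_generated: D(Y) for generated sequences, same range.
    """
    real_term = sum(math.log(p) for p in d_on_real) / len(d_on_real)
    fake_term = sum(math.log(1.0 - p) for p in d_on_generated) / len(d_on_generated)
    return -real_term - fake_term

# A discriminator that outputs 0.5 everywhere cannot tell the two sets apart.
loss_confused = discriminator_loss([0.5, 0.5], [0.5, 0.5])
# One that scores real data high and generated data low gets a smaller loss.
loss_sharp = discriminator_loss([0.9], [0.1])
```

The confused discriminator's loss equals log 4, and any discriminator that separates the two sets better than chance falls below it.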

Page 11: SeqGAN - Lantao Yu

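Training Sequence Generator

• Policy gradient (REINFORCE)

[Richard Sutton et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation. NIPS 1999.]

$$\begin{aligned}
\nabla_\theta J(\theta) &= \mathbb{E}_{Y_{1:t-1} \sim G_\theta}\Big[\sum_{y_t \in \mathcal{Y}} \nabla_\theta G_\theta(y_t \mid Y_{1:t-1}) \cdot Q^{G_\theta}_{D_\phi}(Y_{1:t-1}, y_t)\Big] \\
&\simeq \frac{1}{T} \sum_{t=1}^{T} \sum_{y_t \in \mathcal{Y}} \nabla_\theta G_\theta(y_t \mid Y_{1:t-1}) \cdot Q^{G_\theta}_{D_\phi}(Y_{1:t-1}, y_t) \\
&= \frac{1}{T} \sum_{t=1}^{T} \sum_{y_t \in \mathcal{Y}} G_\theta(y_t \mid Y_{1:t-1}) \nabla_\theta \log G_\theta(y_t \mid Y_{1:t-1}) \cdot Q^{G_\theta}_{D_\phi}(Y_{1:t-1}, y_t) \\
&= \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}_{y_t \sim G_\theta(y_t \mid Y_{1:t-1})}\big[\nabla_\theta \log G_\theta(y_t \mid Y_{1:t-1}) \cdot Q^{G_\theta}_{D_\phi}(Y_{1:t-1}, y_t)\big],
\end{aligned}$$

$$\theta \leftarrow \theta + \alpha_h \nabla_\theta J(\theta)$$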
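The last line of the derivation and the update rule can be tried on the smallest possible case. The sketch below assumes a single time step (T = 1) and a policy that is a softmax over one logit per token, neither of which comes from the talk. The names `policy_gradient`, `expected_reward`, and `ascent_step` are hypothetical, and the Q values are supplied by hand rather than by a discriminator.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient(logits, q_values):
    """Exact E_{y ~ G}[grad log G(y) * Q(y)] for a one-step softmax policy.

    For softmax logits, d log G(y) / d logit_k = 1[k == y] - G(k).
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for y, (p_y, q_y) in enumerate(zip(probs, q_values)):
        for k in range(len(logits)):
            indicator = 1.0 if k == y else 0.0
            grad[k] += p_y * (indicator - probs[k]) * q_y
    return grad

def expected_reward(logits, q_values):
    """J(theta) = sum_y G(y) * Q(y) for the one-step case."""
    return sum(p * q for p, q in zip(softmax(logits), q_values))

def ascent_step(logits, q_values, lr):
    """theta <- theta + lr * grad J(theta)."""
    grad = policy_gradient(logits, q_values)
    return [x + lr * g for x, g in zip(logits, grad)]
```

Starting from uniform logits with rewards `[0.1, 0.9, 0.5]`, one call to `ascent_step` raises the logit of the best-rewarded token and lowers that of the worst, so `expected_reward` goes up. For longer sequences the expectation would typically be estimated from sampled tokens rather than summed exactly.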

Page 12: SeqGAN - Lantao Yu

Overall Algorithm
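A minimal sketch of how the generator and discriminator updates from the earlier slides could be alternated. It assumes the pre-train-then-alternate schedule described in the published SeqGAN paper rather than anything stated in this transcript. The function `seqgan_schedule`, its parameters `rounds`, `g_steps`, and `d_steps`, and the placeholder `no_op` are hypothetical names. The sketch only fixes the order of calls, and the actual updates are passed in as callables.

```python
def seqgan_schedule(pretrain_generator, pretrain_discriminator,
                    generator_step, discriminator_step,
                    rounds, g_steps, d_steps):
    """Call the supplied training callables in a pre-train-then-alternate order.

    Returns the list of phases executed, which makes the schedule easy to inspect.
    """
    phases = []
    pretrain_generator()          # e.g. MLE on the real sequences
    phases.append("pretrain_G")
    pretrain_discriminator()      # real sequences vs. samples from the generator
    phases.append("pretrain_D")
    for _ in range(rounds):
        for _ in range(g_steps):
            generator_step()      # sample, estimate Q by roll-outs, policy-gradient update
            phases.append("G")
        for _ in range(d_steps):
            discriminator_step()  # retrain D on freshly generated negatives
            phases.append("D")
    return phases

def no_op():
    """Placeholder callable; a real run would update model parameters here."""

schedule = seqgan_schedule(no_op, no_op, no_op, no_op, rounds=2, g_steps=1, d_steps=2)
```

Keeping the schedule separate from the update functions makes it easy to vary the ratio of generator to discriminator steps.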

Page 13: SeqGAN - Lantao Yu

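Sequence Generator Model

• RNN with LSTM cells

[Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural computation 9(8):1735–1780.]

[Figure: the input "Shanghai is incredibly" is fed to the network, which outputs "is incredibly ?"; the next word is chosen by softmax sampling over the vocabulary.]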
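The "softmax sampling over vocabulary" step in the figure can be sketched on its own, leaving the LSTM that produces the scores out of scope. The three-word `VOCAB` and the numbers in `SCORES` are made up for illustration, and `sample_next_token` is a hypothetical helper name.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, scores, rng):
    """Draw the next token with probability proportional to softmax(scores)."""
    probs = softmax(scores)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical vocabulary and output scores for the prefix "Shanghai is incredibly".
VOCAB = ["beautiful", "busy", "large"]
SCORES = [2.0, 1.0, 0.5]
next_word = sample_next_token(VOCAB, SCORES, random.Random(0))
```

Sampling, rather than always taking the top-scoring word, lets the same prefix lead to different continuations.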

Page 14: SeqGAN - Lantao Yu

Sequence Discriminator Model

[Kim, Y. 2014. Convolutional neural networks for sentence classification. EMNLP 2014.]

Page 15: SeqGAN - Lantao Yu

Inconsistency of Evaluation and Use

• Given a generator $G_\theta$ with a certain generalization ability

Evaluation: $\mathbb{E}_{x \sim p_{\text{true}}(x)}[\log G_\theta(x)]$

• Checks whether true data has high probability mass (density) under the learned model
• Approximated by

$$\max_\theta \frac{1}{|D|} \sum_{x \in D} [\log G_\theta(x)]$$

Use: $\mathbb{E}_{x \sim G_\theta(x)}[\log p_{\text{true}}(x)]$

• Checks whether model-generated data is considered as real as possible
• More straightforward, but it is hard or impossible to directly calculate $p_{\text{true}}(x)$
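The two expectations can disagree, which a toy example with finite distributions makes concrete. The sketch assumes that `p_true` is known, which is exactly what is unavailable in practice. The distributions `p_true` and `g_theta` are hypothetical, with numbers chosen for illustration.

```python
import math

def expected_log_prob(sampling_dist, scoring_dist):
    """E_{x ~ sampling_dist}[log scoring_dist(x)] for finite distributions."""
    return sum(p * math.log(scoring_dist[x]) for x, p in sampling_dist.items())

# Hypothetical distributions over three outcomes (numbers chosen for illustration).
p_true = {"a": 0.5, "b": 0.499, "c": 0.001}
g_theta = {"a": 0.4, "b": 0.4, "c": 0.2}  # puts 20% of its mass on the rare "c"

evaluation_score = expected_log_prob(p_true, g_theta)  # E_{x~p_true}[log G(x)]
use_score = expected_log_prob(g_theta, p_true)         # E_{x~G}[log p_true(x)]
```

Here `evaluation_score` barely notices the mass that `g_theta` places on "c", because real data almost never contains it. `use_score` is penalized heavily, because one in five generated samples is something the true distribution considers very unlikely.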

Page 16: SeqGAN - Lantao Yu

Experiments on Synthetic Data

• Evaluation measure with Oracle
  • An oracle model (e.g. the randomly initialized LSTM)
  • Firstly, the oracle model produces some sequences as training data for the generative model
  • Secondly, the oracle model can be considered as the human observer to accurately evaluate the perceptual quality of the generative model

$$\mathrm{NLL}_{\text{oracle}} = -\mathbb{E}_{Y_{1:T} \sim G_\theta}\Big[\sum_{t=1}^{T} \log G_{\text{oracle}}(y_t \mid Y_{1:t-1})\Big]$$
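The oracle metric is a sample average, so it takes only a few lines to sketch. The example is a toy under stated assumptions: `toy_oracle` is a hypothetical hand-written rule over tokens 0 and 1, standing in for the randomly initialized LSTM, and the "generated" sequences are typed in by hand.

```python
import math

def nll_oracle(generated_sequences, oracle_prob):
    """Sample estimate of -E_{Y ~ G}[ sum_t log G_oracle(y_t | Y_{1:t-1}) ]."""
    total = 0.0
    for seq in generated_sequences:
        for t, token in enumerate(seq):
            total -= math.log(oracle_prob(token, seq[:t]))
    return total / len(generated_sequences)

# Hypothetical oracle over tokens {0, 1}: it prefers repeating the previous token.
def toy_oracle(token, prefix):
    if not prefix:
        return 0.5
    return 0.8 if token == prefix[-1] else 0.2

# Sequences that follow the oracle's preference get a lower (better) score.
nll_matching = nll_oracle([[0, 0, 0], [1, 1, 1]], toy_oracle)
nll_mismatched = nll_oracle([[0, 1, 0], [1, 0, 1]], toy_oracle)
```

Because the oracle's probabilities are known exactly, the score can be computed for any generated sequence, which is what makes the synthetic setting convenient.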

Page 17: SeqGAN - Lantao Yu

Experiments on Synthetic Data

• Evaluation measure with Oracle

$$\mathrm{NLL}_{\text{oracle}} = -\mathbb{E}_{Y_{1:T} \sim G_\theta}\Big[\sum_{t=1}^{T} \log G_{\text{oracle}}(y_t \mid Y_{1:t-1})\Big]$$

Page 18: SeqGAN - Lantao Yu

Experiments on Synthetic Data

• The training strategy really matters.

Page 19: SeqGAN - Lantao Yu

Experiments on Real-World Data

• Chinese poem generation
• Obama political speech text generation
• MIDI music generation

Page 20: SeqGAN - Lantao Yu

Experiments on Real-World Data

• Chinese poem generation

南陌春风早,东邻去日斜。

紫陌追随日,青门相见时。

胡风不开花,四气多作雪。

山夜有雪寒,桂里逢客时。

此时人且饮,酒愁一节梦。

四面客归路,桂花开青竹。

Human Machine

Page 21: SeqGAN - Lantao Yu

Obama Speech Text Generation

• i stood here today i have one and most important thing that not on violence throughout the horizon is OTHERS american fire and OTHERS but we need you are a strong source
• for this business leadership will remember now i can’t afford to start with just the way our european support for the right thing to protect those american story from the world and
• i want to acknowledge you were going to be an outstanding job times for student medical education and warm the republicans who like my times if he said is that brought the
• when he was told of this extraordinary honor that he was the most trusted man in america
• but we also remember and celebrate the journalism that walter practiced a standard of honesty and integrity and responsibility to which so many of you have committed your careers. it's a standard that's a little bit harder to find today
• i am honored to be here to pay tribute to the life and times of the man who chronicled our time.

Human Machine

Page 22: SeqGAN - Lantao Yu

Summary

• We proposed a sequence generation method, called SeqGAN, to effectively train Generative Adversarial Nets for discrete structured sequence generation via policy gradient.
• We designed an experiment framework with an oracle evaluation metric to accurately evaluate the “perceptual quality” of model-generated sequences.

Page 23: SeqGAN - Lantao Yu

Thank You

Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu. SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. AAAI 2017.
