Welcome to Yu Cai Elementary School. Location Teachers Center Yu Cai.
SeqGAN - Lantao Yu
Transcript of SeqGAN - Lantao Yu
![Page 1: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/1.jpg)
SeqGANSequence Generative Adversarial
Nets with Policy Gradient
Feb.72017 atAAAI
Lantao Yu
†, Weinan Zhang
†, Jun Wang
‡, Yong Yu
††Shanghai Jiao Tong University,
‡University College London
Email: [email protected]
![Page 2: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/2.jpg)
ProblemDefinition• Givenadataset of real-world structured sequences,train a -parameterized generative model toproduce sequences that mimic the real ones.
• In other words, we want to fit the unknown truedata distribution , which is onlyrevealed by the given dataset .
✓ G✓
G✓ptrue(yt|Y1:t�1)
D = {Y1:T }
![Page 3: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/3.jpg)
• Traditionalobjective:maximumlikelihoodestimation(MLE)
• Checkwhetheratruedataiswithahighmassdensityofthelearnedmodel
• Suffer from so-called in the inference stag:exposure bias
Training InferenceWhen generating the nexttoken , sample from:yt
max
✓EY⇠ptrue
X
t
logG✓(yt|Y1:t�1) G✓(yt|Y1:t�1)
Update the model as follows:
The real prefix The guessed prefix
max
✓
1
|D|X
Y1:T2D
X
t
log[G✓(yt|Y1:t�1)]
![Page 4: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/4.jpg)
A promising method:GenerativeAdversarialNets(GANs)
• Discriminatortriestocorrectlydistinguishthetruedataandthefakemodel-generateddata• Generatortriestogeneratehigh-qualitydatatofooldiscriminator• Ideally,whenDcannotdistinguishthetrueandgenerateddata,Gnicelyfitsthetrueunderlyingdatadistribution
GD
RealWorld
Generator
Discriminator
Data
[Goodfellow I,Pouget-Abadie J,MirzaM,etal. 2014.Generativeadversarialnets.InNIPS2014.]
![Page 5: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/5.jpg)
GeneratorNetworkinGANs
• Mustbedifferentiable• Popularimplementation:multi-layerperceptron• Linkedwiththediscriminatorandgetguidancefromit
P (t rue)P (t rue)
G
D
x = G(z; ✓(G))
min
Gmax
DEx⇠pdata(x)[logD(x)] + E
z⇠pz(z)[log(1�D(G(z))]
![Page 6: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/6.jpg)
ProblemforDiscreteData• Oncontinuousdata,thereisdirectgradient
• Guidethegeneratorto(slightly)modifytheoutput
• Nodirectgradientondiscretedata• Textgenerationexample
• “Icaughtapenguininthepark”• FromIanGoodfellow:“Ifyououtputtheword‘penguin’,youcan'tchangethatto"penguin+.001"onthenextstep,becausethereisnosuchwordas"penguin+.001".Youhavetogoallthewayfrom"penguin"to"ostrich".”
P (t rue)P (t rue)
G
D
[https://www.reddit.com/r/MachineLearning/comments/40ldq6/generative_adversarial_networks_for_text/]
r✓(G)
1
m
mX
i=1
log(1�D(G(z(i))))
![Page 7: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/7.jpg)
SeqGAN
• Generatorisareinforcementlearningpolicyofgeneratingasequence• decidethenextword togenerate (action)giventhepreviousonesas the state
• Discriminatorprovidesthereward(i.e.theprobabilityofbeingtruedata) forthe sequence
G✓(yt|Y1:t�1)
D�(Yn1:T )
![Page 8: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/8.jpg)
SequenceGenerator• Objective:tomaximizetheexpectedreward
• State-actionvaluefunction istheexpectedaccumulativerewardthat• Startfromstates• Takingaction a• Andfollowingpolicy Guntiltheend
• Rewardisonlyoncompletedsequence(noimmediatereward)
J(✓) = E[RT |s0, ✓] =X
y12YG✓(y1|s0) ·QG✓
D�(s0, y1)
QG✓D�
(s, a)
QG✓D�
(s = Y1:T�1, a = yT ) = D�(Y1:T )
![Page 9: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/9.jpg)
State-ActionValueSetting• Rewardisonlyoncompletedsequence• Noimmediatereward• Thenthelast-stepstate-actionvalue
• Forintermediatestate-actionvalue• UseMonteCarlosearchtoestimate
• Followingaroll-outpolicy GG
QG✓D�
(s = Y1:T�1, a = yT ) = D�(Y1:T )
QG✓D�
(s = Y1:t�1, a = yt) =⇢
1N
PNn=1 D�(Y n
1:T ), Y n1:T 2 MC
G�(Y1:t;N) for t < T
D�(Y1:t) for t = T ,
�Y 11:T , . . . , Y
N1:T
= MCG� (Y1:t;N)
![Page 10: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/10.jpg)
TrainingSequenceDiscriminator• Objective:standardbi-classification
min
��EY⇠pdata [logD�(Y )]� EY⇠G✓ [log(1�D�(Y ))]
![Page 11: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/11.jpg)
TrainingSequenceGenerator• Policygradient(REINFORCE)
[RichardSuttonetal.PolicyGradientMethodsforReinforcementLearningwithFunctionApproximation.NIPS1999.]
r✓J(✓) = EY1:t�1⇠G✓ [
X
yt2Yr✓G✓(yt|Y1:t�1) ·QG✓
D�(Y1:t�1, yt)]
' 1
T
TX
t=1
X
yt2Yr✓G✓(yt|Y1:t�1) ·QG✓
D�(Y1:t�1, yt)
=
1
T
TX
t=1
X
yt2YG✓(yt|Y1:t�1)r✓ logG✓(yt|Y1:t�1) ·QG✓
D�(Y1:t�1, yt)
=
1
T
TX
t=1
Eyt⇠G✓(yt|Y1:t�1)[r✓ logG✓(yt|Y1:t�1) ·QG✓D�
(Y1:t�1, yt)],
✓ ✓ + ↵hr✓J(✓)
![Page 12: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/12.jpg)
OverallAlgorithm
![Page 13: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/13.jpg)
SequenceGeneratorModel
• RNNwithLSTMcells
[Hochreiter,S.,andSchmidhuber,J.1997.Longshort-termmemory.Neuralcomputation9(8):1735–1780.]
Shanghai is incredibly
is incrediblySoftmax samplingovervocabulary
?
![Page 14: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/14.jpg)
SequenceDiscriminatorModel
[Kim,Y.2014.Convolutionalneuralnetworksforsentenceclassification.EMNLP2014.]
![Page 15: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/15.jpg)
InconsistencyofEvaluationandUse
• Checkwhetheratruedataiswithahighmassdensityofthelearnedmodel• Approximatedby
Evaluation Use
• Checkwhetheramodel-generateddataisconsideredasrealaspossible• Morestraightforwardbutitishardorimpossibletodirectlycalculate
• Givenagenerator withacertaingeneralizationabilityG✓
max
✓
1
|D|X
x2D
[logG
✓
(x)]
Ex⇠ptrue(x)[logG✓
(x)] Ex⇠G✓(x)[log ptrue(x)]
ptrue(x)
![Page 16: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/16.jpg)
ExperimentsonSyntheticData• EvaluationmeasurewithOracle
• An oracle model (e.g. the randomly initialized LSTM)• Firstly, the oracle model produces some sequences astraining data for the generative model• Secondly the oracle model can be considered as thehuman observer to accurately evaluate the perceptualquality of the generative model
NLL
oracle
= �EY1:T⇠G✓
h TX
t=1
logGoracle
(yt|Y1:t�1
)
i
![Page 17: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/17.jpg)
ExperimentsonSyntheticData• EvaluationmeasurewithOracle
NLL
oracle
= �EY1:T⇠G✓
h TX
t=1
logGoracle
(yt|Y1:t�1
)
i
![Page 18: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/18.jpg)
ExperimentsonSyntheticData• The training strategy really matters.
![Page 19: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/19.jpg)
ExperimentsonReal-WorldData• Chinesepoemgeneration
• Obamapoliticalspeechtextgeneration
• Midimusicgeneration
![Page 20: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/20.jpg)
ExperimentsonReal-WorldData• Chinesepoemgeneration
南陌春风早,东邻去日斜。
紫陌追随日,青门相见时。
胡风不开花,四气多作雪。
山夜有雪寒,桂里逢客时。
此时人且饮,酒愁一节梦。
四面客归路,桂花开青竹。
Human Machine
![Page 21: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/21.jpg)
ObamaSpeechTextGeneration• i stoodheretodayi haveoneandmostimportantthingthatnotonviolencethroughoutthehorizonisOTHERSamericanfireandOTHERSbutweneedyouareastrongsource
• forthisbusinessleadershipwillremembernowi can’taffordtostartwithjustthewayoureuropean supportfortherightthingtoprotectthoseamericanstoryfromtheworldand
• i wanttoacknowledgeyouweregoingtobeanoutstandingjobtimesforstudentmedicaleducationandwarmtherepublicanswholikemytimesifhesaidisthatbroughtthe
• whenhewastoldofthisextraordinaryhonorthathewasthemosttrustedmaninamerica
• butwealsorememberandcelebratethejournalismthatwalter practicedastandardofhonestyandintegrityandresponsibilitytowhichsomanyofyouhavecommittedyourcareers.it'sastandardthat'salittlebithardertofindtoday
• i amhonoredtobeheretopaytributetothelifeandtimesofthemanwhochronicledourtime.
Human Machine
![Page 22: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/22.jpg)
Summary
• We proposed a sequence generation method, calledSeqGAN, toeffectivelytrainGenerativeAdversarialNetsfor discrete structured sequencesgenerationviapolicygradient.
• Design an experiment framework with oracle evaluationmetric to accurately evaluate the “perceptual quality”of model-generated sequences.
![Page 23: SeqGAN - Lantao Yu](https://reader034.fdocuments.in/reader034/viewer/2022051513/627eca26f44f073cef25039e/html5/thumbnails/23.jpg)
ThankYou
Lantao Yu,WeinanZhang,JunWang,YongYu.SeqGAN:SequenceGenerativeAdversarialNetswithPolicyGradient.AAAI2017.
Lantao Yu
†, Weinan Zhang
†, Jun Wang
‡, Yong Yu
††Shanghai Jiao Tong University,
‡University College London
Email: [email protected]