No Metrics Are Perfect: Adversarial REwardLearning for Visual Storytelling · 2020-02-24 · Visual...
Transcript of No Metrics Are Perfect: Adversarial REwardLearning for Visual Storytelling · 2020-02-24 · Visual...
NoMetricsArePerfect:AdversarialREward Learning
forVisualStorytelling
XinWang*,Wenhu Chen*,Yuan-FangWang,WilliamWang
1
2
Caption:Twoyoungkidswithbackpackssittingontheporch.
Image Captioning
VisualStorytelling
3
Story:Thebrotherdidnotwanttotalktohissister.Thesiblingsmadeup.Theystartedtotalkandsmile.Theirparentsshowedup.Theywerehappytoseethem.
Story:Thebrother didnotwanttotalktohissister.Thesiblings madeup.Theystartedtotalkandsmile.Theirparents showedup.Theywerehappy toseethem.
Imagination
Story:Thebrother didnotwanttotalktohissister.Thesiblings madeup.Theystartedtotalkandsmile.Theirparents showedup.Theywerehappytoseethem.
Emotion Subjectiveness
4
Story#2:Thebrotherandsisterwerereadyforthefirstdayofschool.Theywereexcitedtogototheirfirstdayandmeetnewfriends.Theytoldtheirmomhowhappytheywere.Theysaidtheyweregoingtomakealotofnewfriends.Thentheygotupandgotreadytogetinthecar.
VisualStorytelling
5
Behavioralcloningmethods(e.g.MLE)arenotgoodenoughforvisualstorytelling
ReinforcementLearning
6
o Directlyoptimizetheexistingmetrics• BLEU,METEOR,ROUGE,CIDEr• Reduceexposurebias
ReinforcementLearning(RL)
Environment
RewardFunction
OptimalPolicy
Rennie 2017, “Self-critical Sequence Training for Image Captioning”
7
We had a great time to have a lot of the. They were to be a of the. They were to be in the. The and it were to be the. The, and it were to be the.
AverageMETEORscore:40.2(SOTAmodel:35.0)
8
I had a great time at the restaurant today. The food was delicious. I had a lot of food. I had a great time.
BLEU-4score:0
9
NoMetricsArePerfect!
Inverse Reinforcement Learning
10
InverseReinforcementLearning(IRL)
Environment
RewardFunction
OptimalPolicy
ReinforcementLearning(RL)
Environment
RewardFunction
OptimalPolicy
Reinforcement Learning
AdversarialREward Learning(AREL)
11
Environment
AdversarialObjective PolicyModelRewardModel
Reward
GeneratedStory
ReferenceStory
RL
InverseRL
PolicyModel𝜋"
12
CNN
Mybrotherrecentlygraduatedcollege.
Itwasaformalcapandgownevent.
Mymomanddadattended.
Later,myauntandgrandmashowedup.
Whentheeventwasoverheevengotcongratulatedbythemascot.
Encoder Decoder
RewardModel𝑅$
13
Story Convolution FClayerPooling
CNN
mymomanddad
attended.
<EOS>
Reward
+
Kim 2014, “Convolutional Neural Networks for Sentence Classification”
AssociatingReward withStory
Approximatedatadistribution Partitionfunction
Story
Optimalrewardfunction𝑅$∗(𝑊) isachievedwhen
LeCun et al. 2006, “A tutorial on energy-based learning”
Energy-basedmodels associateanenergyvalue𝐸$(𝑥) withasample𝑥,modelingthedataasaBoltzmanndistribution
𝑝$ 𝑥 =exp(−𝐸$(𝑥))
𝑍
RewardFunctionRewardBoltzmannDistribution
ARELObjective
15
• TheobjectiveofRewardModel𝑅$:
Empiricaldistribution Policydistribution
RewardBoltzmanndistribution
• TheobjectiveofPolicyModel𝜋":
Therefore,wedefineanadversarialobjectivewithKL-divergence
RewardVisualization
16
Method BLEU-1 BLEU-2 BLEU-3 BLEU-4 METEOR ROUGE CIDErSeq2seq (Huangetal.) - - - - 31.4 - -HierAttRNN (Yuetal.) - - 21.0 - 34.1 29.5 7.5AREL(ours) 63.7 39.0 23.1 14.0 35.0 29.6 9.5
AutomaticEvaluation
17
Method BLEU-1 BLEU-2 BLEU-3 BLEU-4 METEOR ROUGE CIDErSeq2seq (Huangetal.) - - - - 31.4 - -HierAttRNN (Yuetal.) - - 21.0 - 34.1 29.5 7.5XE 62.3 38.2 22.5 13.7 34.8 29.7 8.7AREL(ours) 63.7 39.0 23.1 14.0 35.0 29.6 9.5
Method BLEU-1 BLEU-2 BLEU-3 BLEU-4 METEOR ROUGE CIDErSeq2seq (Huangetal.) - - - - 31.4 - -HierAttRNN (Yuetal.) - - 21.0 - 34.1 29.5 7.5XE 62.3 38.2 22.5 13.7 34.8 29.7 8.7BLEU-RL 62.1 38.0 22.6 13.9 34.6 29.0 8.9METEOR-RL 68.1 35.0 15.4 6.8 40.2 30.0 1.2ROUGE-RL 58.1 18.5 1.6 0.0 27.0 33.8 0.0CIDEr-RL 61.9 37.8 22.5 13.8 34.9 29.7 8.1AREL(ours) 63.7 39.0 23.1 14.0 35.0 29.6 9.5
Method BLEU-1 BLEU-2 BLEU-3 BLEU-4 METEOR ROUGE CIDErSeq2seq (Huangetal.) - - - - 31.4 - -HierAttRNN (Yuetal.) - - 21.0 - 34.1 29.5 7.5XE 62.3 38.2 22.5 13.7 34.8 29.7 8.7BLEU-RL 62.1 38.0 22.6 13.9 34.6 29.0 8.9METEOR-RL 68.1 35.0 15.4 6.8 40.2 30.0 1.2ROUGE-RL 58.1 18.5 1.6 0.0 27.0 33.8 0.0CIDEr-RL 61.9 37.8 22.5 13.8 34.9 29.7 8.1AREL(ours) 63.7 39.0 23.1 14.0 35.0 29.6 9.5
Method BLEU-1 BLEU-2 BLEU-3 BLEU-4 METEOR ROUGE CIDErSeq2seq (Huangetal.) - - - - 31.4 - -HierAttRNN (Yuetal.) - - 21.0 - 34.1 29.5 7.5XE 62.3 38.2 22.5 13.7 34.8 29.7 8.7BLEU-RL 62.1 38.0 22.6 13.9 34.6 29.0 8.9METEOR-RL 68.1 35.0 15.4 6.8 40.2 30.0 1.2ROUGE-RL 58.1 18.5 1.6 0.0 27.0 33.8 0.0CIDEr-RL 61.9 37.8 22.5 13.8 34.9 29.7 8.1GAN 62.8 38.8 23.0 14.0 35.0 29.5 9.0AREL(ours) 63.7 39.0 23.1 14.0 35.0 29.6 9.5
Huang et al. 2016, “Visual Storytelling” Yu et al. 2017, “Hierarchically-Attentive RNN for Album Summarization and Storytelling”
HumanEvaluation
18
0% 10% 20% 30% 40% 50%
XE BLEU-RL CIDEr-RL GAN AREL
TuringTest
Win Unsure
-17.5 -13.7-26.1
-6.3
HumanEvaluation
19
Relevance:thestoryaccuratelydescribeswhatishappeninginthephotostreamandcoversthemainobjects.Expressiveness:coherence,grammaticallyandsemanticallycorrect,norepetition,expressivelanguagestyle.Concreteness:thestoryshouldnarrateconcretelywhatisintheimagesratherthangivingverygeneraldescriptions.
PairwiseComparison
20
XE-ss Wetookatriptothemountains.
Thereweremanydifferentkindsofdifferentkinds.
Wehadagreattime.
Hewasagreattime.
Itwasabeautifulday.XE-ss Wetookatripto
themountains.
Thereweremanydifferentkindsofdifferentkinds.
Wehadagreattime.
Hewasagreattime.
Itwasabeautifulday.
ARELThefamilydecidedtotakeatriptothecountryside.
Thereweresomanydifferentkindsofthingstosee.
Thefamilydecidedtogoonahike. Ihadagreattime.
Attheendoftheday,wewereabletotakeapictureofthebeautifulscenery.
Human-createdStory
Wewentonahikeyesterday.
Therewerealotofstrangeplantsthere.
Ihadagreattime.Wedrankalotofwaterwhilewewerehiking.
Theviewwasspectacular.
XE-ss Wetookatriptothemountains.
Thereweremanydifferentkindsofdifferentkinds.
Wehadagreattime.
Hewasagreattime.
Itwasabeautifulday.
ARELThefamilydecidedtotakeatriptothecountryside.
Thereweresomanydifferentkindsofthingstosee.
Thefamilydecidedtogoonahike. Ihadagreattime.
Attheendoftheday,wewereabletotakeapictureofthebeautifulscenery.
Takeaway
21
o Generatingandevaluatingstoriesarebothchallengingduetothecomplicatednatureofstories
o Noexistingmetricsareperfectforeithertrainingortestingo ARELisabetterlearningframeworkforvisualstorytelling
§ Canbeappliedtoothergenerationtaskso Ourapproachismodel-agnostic
§ Advancedmodelsà betterperformance
Thanks!
22
Paper:https://arxiv.org/abs/1804.09160Code: https://github.com/littlekobe/AREL