CS839: Probabilistic Graphical Models
Lecture 22: The Attention Mechanism
Theo Rekatsinas
Why Attention?

• Consider machine translation: we need to pay attention to the word we are currently translating. Is the entire sequence needed as context?
• The cat is black -> Le chat est noir
Why Attention?

• Consider machine translation: we need to pay attention to the word we are currently translating. Is the entire sequence needed as context?
• The cat is black -> Le chat est noir
• RNNs are the de-facto standard for machine translation.
• Problem: translation relies on reading a complete sentence and compressing all of its information into a fixed-length vector. A sentence with hundreds of words represented by a single fixed-length vector will surely lead to information loss, inadequate translation, etc.
• Long-range dependencies are tricky.
Basic Encoder-Decoder
Soft Attention for Translation

“I love coffee” -> “Me gusta el café”
Distribution over input words
Bahdanau et al., “Neural Machine Translation by Jointly Learning to Align and Translate”, ICLR 2015
Soft Attention

From Y. Bengio, CVPR 2015 Tutorial
[Diagram: bidirectional encoder RNN, decoder RNN, attention model]
Soft Attention

Context vector (input to decoder): c_i = Σ_j a_ij h_j
Mixture weights: a_ij = exp(e_ij) / Σ_k exp(e_ik)
Alignment score (how well do input words near j match output words at position i): e_ij = f_att(s_{i-1}, h_j)
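These three formulas can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions, not the lecture's implementation: the alignment function f_att is taken to be the additive (tanh) scorer from Bahdanau et al., and the parameter names `Wa`, `Ua`, `v` are invented for the sketch.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def soft_attention_step(s_prev, H, Wa, Ua, v):
    """One decoder step of soft attention (Bahdanau-style, additive scoring).

    s_prev: previous decoder state, shape (d,)
    H:      encoder hidden states, shape (T, d), one row per input word
    Wa, Ua, v: assumed learned alignment parameters
    Returns the context vector c_i and the mixture weights a_i.
    """
    # Alignment scores: e_ij = v^T tanh(Wa s_{i-1} + Ua h_j)
    e = np.tanh(s_prev @ Wa.T + H @ Ua.T) @ v   # shape (T,)
    a = softmax(e)                              # mixture weights over input words
    c = a @ H                                   # context vector, shape (d,)
    return c, a

# Toy example: 4 input words, state dimension 3
rng = np.random.default_rng(0)
H = rng.normal(size=(4, 3))
s = rng.normal(size=3)
Wa, Ua = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
v = rng.normal(size=3)
c, a = soft_attention_step(s, H, Wa, Ua, v)
print(a.sum())  # the mixture weights form a distribution over input words
```

Because the context vector is a convex combination of all encoder states, the decoder can reach any input position directly instead of squeezing the whole sentence through one fixed-length vector.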
Soft Attention

Luong, Pham, and Manning’s translation system (2015)
[Figure: translation error rate vs. human; Luong and Manning, IWSLT 2015]
Hard Attention
Monotonic Attention
Global Attention

• Blue = encoder, red = decoder.
• Attend to a context vector.
• The decoder captures global information, not only the information from one hidden state.
• The context vector takes all cells’ outputs as input and computes a probability distribution for each token the decoder wants to generate.
Local Attention

• Compute a best-aligned position first.
• Then compute a context vector centered at that position.
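The two steps above can be sketched as follows. This is a simplified illustration in the spirit of Luong et al.'s local attention: the predicted best-aligned position `p_t` is taken as given, attention is restricted to a window of half-width `D` around it, and a Gaussian favors positions near `p_t`. Renormalizing after the Gaussian re-weighting is a simplification of this sketch, not necessarily what the original system does.

```python
import numpy as np

def local_attention(scores, p_t, D):
    """Local attention sketch: attend only near a predicted position.

    scores: raw alignment scores for all T source positions, shape (T,)
    p_t:    predicted best-aligned source position (a float in [0, T-1])
    D:      half-width of the attention window
    """
    T = len(scores)
    j = np.arange(T)
    window = np.abs(j - p_t) <= D
    # Softmax restricted to the window (positions outside get zero weight)
    e = np.where(window, scores, -np.inf)
    e = e - e.max()
    a = np.exp(e)
    a /= a.sum()
    # Gaussian re-weighting centered at p_t, with sigma = D / 2
    a *= np.exp(-((j - p_t) ** 2) / (2 * (D / 2) ** 2))
    return a / a.sum()

a = local_attention(np.zeros(10), p_t=4.0, D=2)
print(a.round(3))  # mass concentrated around position 4
```

Compared with global attention, each decoder step only touches 2D + 1 source positions, which is cheaper for long inputs.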
RNN for Captioning

[Diagram: a CNN maps the image (H x W x 3) to features (D); hidden states h0 (hidden state: H) -> h1 -> h2; outputs y1 (first word) and y2 (second word); d1 and d2 are distributions over the vocabulary]
The RNN only looks at the whole image, once.
What if the RNN looks at different parts of the image at each time step?
Soft Attention for Captioning

[Diagram: a CNN maps the image (H x W x 3) to a grid of features (L x D). At each step, the hidden state produces an attention distribution a_i over the L locations; a weighted combination of features gives weighted features z_i (dimension D). Together with the previous word y_i, z_i feeds the next hidden state h_i, which produces a distribution d_i over the vocabulary and the next attention distribution a_{i+1}.]
Xu et al., “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015
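One captioning step can be sketched like this. It is an illustration, not the Show, Attend and Tell model itself: a simple bilinear score between the hidden state and each location's feature stands in for the paper's attention MLP, and the name `W` for the scoring matrix is an assumption.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def captioning_attention_step(h_prev, features, W):
    """One step of soft attention over image regions (sketch).

    h_prev:   previous hidden state, shape (H,)
    features: CNN feature grid flattened to L locations, shape (L, D)
    W:        assumed learned scoring matrix; score of location l is
              h_prev^T W f_l (a stand-in for the paper's attention MLP)
    Returns z (weighted features, shape (D,)) and a (distribution over L).
    """
    scores = features @ (W.T @ h_prev)   # one score per location, shape (L,)
    a = softmax(scores)                  # distribution over L locations
    z = a @ features                     # weighted combination of features
    return z, a

rng = np.random.default_rng(0)
L, D, Hdim = 6, 4, 5
feats = rng.normal(size=(L, D))
h0 = rng.normal(size=Hdim)
W = rng.normal(size=(Hdim, D))
z1, a1 = captioning_attention_step(h0, feats, W)
print(z1.shape, a1.sum())
```

At each time step the RNN re-runs this with its new hidden state, so the model looks at a different weighted part of the image for every generated word.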
Soft vs Hard Attention

[Diagram: a CNN maps the image (H x W x 3) to a grid of features a, b, c, d (each D-dimensional). From the RNN: a distribution over grid locations pa, pb, pc, pd with pa + pb + pc + pd = 1, used to form a context vector z (D-dimensional).]
Xu et al., “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML 2015

Soft attention: summarize ALL locations: z = pa·a + pb·b + pc·c + pd·d
The derivative dz/dp is nice! Train with gradient descent.

Hard attention: sample ONE location according to p; z = that location’s feature vector.
With argmax, dz/dp is zero almost everywhere…
Can’t use gradient descent; need reinforcement learning.
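The contrast between the two context vectors is easy to see in code. A minimal sketch with the four grid features from the slide:

```python
import numpy as np

rng = np.random.default_rng(0)

# Four D-dimensional grid features and a distribution over them (from the RNN)
D = 3
a, b, c, d = rng.normal(size=(4, D))
p = np.array([0.5, 0.2, 0.2, 0.1])   # pa + pb + pc + pd = 1

# Soft attention: summarize ALL locations; z is differentiable in p
z_soft = p[0] * a + p[1] * b + p[2] * c + p[3] * d

# Hard attention: sample ONE location according to p;
# the sampling step is not differentiable in p
idx = rng.choice(4, p=p)
z_hard = [a, b, c, d][idx]

print(z_soft.shape, z_hard.shape)
```

The soft version flows gradients from z back into p through the weighted sum; the hard version's sampling breaks that path, which is why it needs reinforcement-learning-style training.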
Multi-headed Attention
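A minimal sketch of multi-headed scaled dot-product self-attention, in the style later popularized by the Transformer: each head attends with its own slice of the query/key/value projections, and the heads' outputs are concatenated and projected back. The weight names and the slicing scheme are assumptions of this sketch.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Multi-headed scaled dot-product self-attention (sketch).

    X: input sequence, shape (T, d_model)
    Wq, Wk, Wv, Wo: assumed learned projections, each (d_model, d_model)
    """
    T, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    outs = []
    for h in range(n_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        q, k, v = Q[:, s], K[:, s], V[:, s]
        scores = q @ k.T / np.sqrt(d_head)   # scaled dot-product scores
        outs.append(softmax(scores) @ v)     # (T, d_head) per head
    return np.concatenate(outs, axis=1) @ Wo # concatenate heads, project back

rng = np.random.default_rng(0)
T, d_model, n_heads = 5, 8, 2
X = rng.normal(size=(T, d_model))
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))
Y = multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads)
print(Y.shape)  # (5, 8)
```

Each head can learn a different attention pattern (e.g. one head tracking adjacent words, another tracking long-range agreement), which a single softmax distribution cannot express.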
Attention Is All You Need
Attention Tricks
Attention Takeaways

Performance:
• Attention models can improve accuracy and reduce computation at the same time.

Complexity:
• There are many design choices.
• Those choices have a big effect on performance.
• Ensembling has unusually large benefits.
• Simplify where possible!
Attention Takeaways

Explainability:
• Attention models encode explanations.
• Both locus and trajectory help understand what’s going on.

Hard vs. Soft:
• Soft models are easier to train; hard models require reinforcement learning.
• They can be combined, as in Luong et al.