Attention
DLAI – MARTA R. COSTA-JUSSÀ
SLIDES ADAPTED FROM GRAHAM NEUBIG'S LECTURES
What advancements excite you most in the field? "I am very excited by the recently introduced attention models, due to their simplicity and due to the fact that they work so well. Although these models are new, I have no doubt that they are here to stay, and that they will play a very important role in the future of deep learning."
ILYA SUTSKEVER, RESEARCH DIRECTOR AND CO-FOUNDER OF OPENAI
Outline
1. Sequence modeling & sequence-to-sequence models [WRAP-UP FROM PREVIOUS RNN SESSION]
2. Attention-based mechanism
3. Attention varieties
4. Attention improvements
5. Applications
6. "Attention is all you need"
7. Summary
Sequence modeling
Model the probability of sequences of words. From the previous lecture… we model sequences with RNNs:
<s> I'm fine .  →  p(I'm) p(fine|I'm) p(.|fine) EOS
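The chain-rule factorization above can be sketched in code. This is a hypothetical toy model with made-up one-dimensional weights (not the lecture's code): an RNN state is updated word by word, and the probability of the sentence is the product of per-step conditionals.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Toy vocabulary matching the slide's example sentence.
VOCAB = ["<s>", "I'm", "fine", ".", "EOS"]

def rnn_step(h, x_id):
    # Hypothetical 1-d "RNN cell": new state mixes old state and input id.
    return math.tanh(0.5 * h + 0.1 * x_id)

def step_logits(h):
    # Hypothetical output layer: one logit per vocabulary word.
    return [h * (i + 1) for i in range(len(VOCAB))]

def sentence_logprob(tokens):
    """log p(w1..wn) = sum_t log p(w_t | w_<t): feed each word, score the next."""
    h, logp, prev = 0.0, 0.0, "<s>"
    for tok in tokens + ["EOS"]:
        h = rnn_step(h, VOCAB.index(prev))
        probs = softmax(step_logits(h))
        logp += math.log(probs[VOCAB.index(tok)])
        prev = tok
    return logp

print(sentence_logprob(["I'm", "fine", "."]))  # a single (negative) log-probability
```

The point is only the factorization: each step conditions on the running hidden state, never on the raw history directly.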
Sequence-to-sequence models
encoder: how are you ?  →  THOUGHT/CONTEXT VECTOR  →  decoder: <s> ¿ Cómo estás ?  →  ¿ Cómo estás ? EOS
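A minimal sketch of the bottleneck in this architecture, with hypothetical toy weights: the encoder compresses the entire source sentence into one fixed-size thought/context vector, and the decoder must generate every target word from that single vector.

```python
import math

def encode(src_ids, dim=4):
    """Fold the whole source sequence into ONE fixed-size vector."""
    h = [0.0] * dim
    for x in src_ids:
        h = [math.tanh(0.3 * hi + 0.05 * x * (i + 1)) for i, hi in enumerate(h)]
    return h  # the thought/context vector

def decode_step(h_dec, context):
    # Hypothetical decoder step conditioned only on the single context vector.
    return [math.tanh(0.5 * hd + 0.5 * c) for hd, c in zip(h_dec, context)]

context = encode([4, 8, 15, 16])          # ids for "how are you ?"
state = decode_step([0.0] * 4, context)
print(len(context), len(state))           # fixed size regardless of source length
```

However long the input, `context` stays the same size: that fixed capacity is exactly the problem the next slides raise.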
Any problem with these models?
2. Attention-based mechanism
Motivation in the case of MT
Attention
encoder → attention (weighted +) → decoder
Attention allows the model to use multiple vectors, based on the length of the input.
Attention Key Ideas
• Encode each word in the input and output sentence into a vector
• When decoding, perform a linear combination of these vectors, weighted by "attention weights"
• Use this combination in picking the next word
Attention Computation I
• Use a "query" vector (decoder state) and "key" vectors (all encoder states)
• For each query-key pair, calculate a weight
• Normalize to add to one using softmax
Example: scores a1=2.1, a2=-0.1, a3=0.3, a4=-1.0 → softmax → weights a1=0.5, a2=0.3, a3=0.1, a4=0.1
Attention Computation II
• Combine together value vectors (usually the encoder states, like the key vectors) by taking the weighted sum
Value vectors, each multiplied by its weight a1=0.5, a2=0.3, a3=0.1, a4=0.1, then summed.
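The two computation steps can be sketched end to end. This is a minimal pure-Python illustration with hypothetical example vectors, using the dot product as the score (other score functions are on the next slide): score each key against the query, softmax-normalize, then take the weighted sum of the values.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(query, keys, values):
    """Step I: score and normalize. Step II: weighted sum of value vectors."""
    scores = [dot(query, k) for k in keys]
    weights = softmax(scores)
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, context

# Hypothetical example: 4 encoder states (keys == values), one decoder query.
keys = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [-0.5, 0.5]]
query = [1.0, 0.0]
weights, context = attend(query, keys, keys)
print([round(w, 2) for w in weights], [round(c, 2) for c in context])
```

The weights sum to one, and the key most similar to the query receives the largest weight.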
Attention Score Functions
q is the query and k is the key.

Multi-layer Perceptron (Bahdanau et al., 2015): a(q, k) = w2^T tanh(W1 [q; k]). Flexible, often very good with large data.
Bilinear (Luong et al., 2015): a(q, k) = q^T W k.
Dot Product (Luong et al., 2015): a(q, k) = q^T k. No parameters! But requires the sizes to be the same.
Scaled Dot Product (Vaswani et al., 2017): a(q, k) = q^T k / sqrt(|k|). Scale by the size of the vector.
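The four score functions translate directly to code. A sketch with small hand-picked parameter matrices (the real W1, w2, W are learned):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mlp_score(q, k, W1, w2):
    """Multi-layer perceptron: a(q,k) = w2^T tanh(W1 [q;k])."""
    qk = q + k                                   # concatenate [q; k]
    hidden = [math.tanh(dot(row, qk)) for row in W1]
    return dot(w2, hidden)

def bilinear_score(q, k, W):
    """Bilinear: a(q,k) = q^T W k."""
    return dot(q, [dot(row, k) for row in W])

def dot_score(q, k):
    """Dot product: a(q,k) = q^T k (q and k must have the same size)."""
    return dot(q, k)

def scaled_dot_score(q, k):
    """Scaled dot product: a(q,k) = q^T k / sqrt(|k|)."""
    return dot(q, k) / math.sqrt(len(k))

q, k = [1.0, 2.0], [0.5, -1.0]
print(dot_score(q, k), scaled_dot_score(q, k))
```

With an identity W, the bilinear score reduces to the plain dot product, which is one way to see it as the parameterized generalization.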
Attention Integration
3. Attention Varieties
Hard Attention
Instead of a soft interpolation, make a zero-one decision about where to attend (Xu et al., 2015).
Monotonic Attention
This approach "softly" prevents the model from assigning attention probability before where it attended at a previous time step, by taking into account the attention at the previous time step.
Intra-Attention / Self-Attention
Each element in the sentence attends to other elements from the SAME sentence → context-sensitive encodings!
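A minimal self-attention sketch over hypothetical 2-d word vectors: every position serves as query, key and value over the same sentence, producing one context-sensitive encoding per position.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def self_attention(states):
    """Each state attends over ALL states of the same sequence (itself included)
    and is replaced by the resulting weighted sum."""
    out = []
    for q in states:
        weights = softmax([dot(q, k) for k in states])
        out.append([sum(w * v[i] for w, v in zip(weights, states))
                    for i in range(len(q))])
    return out

states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoded = self_attention(states)
print(len(encoded), len(encoded[0]))  # same shape as the input: 3 vectors of dim 2
```

The output has the same shape as the input, but each vector now mixes in information from the rest of the sentence.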
Multiple Sources
Attend to multiple sentences (Zoph et al., 2015).
Attend to a sentence and an image (Huang et al., 2016).
Multi-headed Attention
Multiple attention "heads" focus on different parts of the sentence, e.g. multiple independently learned heads (Vaswani et al., 2017), each scoring with a(q, k) = q^T k / sqrt(|k|).
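A sketch of the idea with two hypothetical heads, each with its own (made-up) projection matrix: every head runs scaled dot-product attention in its own projected space, and the head outputs are concatenated.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def head(query, keys, values, proj):
    """One attention head: project q/k/v with this head's matrix, score with
    the scaled dot product, return the weighted sum of values."""
    p = lambda v: [dot(row, v) for row in proj]
    q, ks, vs = p(query), [p(k) for k in keys], [p(v) for v in values]
    w = softmax([dot(q, k) / math.sqrt(len(k)) for k in ks])
    return [sum(wi * v[i] for wi, v in zip(w, vs)) for i in range(len(q))]

def multi_head(query, keys, values, projs):
    """Run independently parameterized heads and concatenate their outputs."""
    out = []
    for proj in projs:
        out.extend(head(query, keys, values, proj))
    return out

# Two hypothetical heads with different projections over 2-d states.
projs = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
keys = [[1.0, 0.0], [0.0, 1.0]]
result = multi_head([1.0, 0.0], keys, keys, projs)
print(len(result))  # 2 heads x dim 2 = 4
```

Because each head projects differently, the heads can specialize in attending to different parts of the sentence.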
4. Improvements in Attention
IN THE CONTEXT OF MT
Coverage
Problem: neural models tend to drop or repeat content. In MT:
1. Over-translation: some words are unnecessarily translated multiple times.
2. Under-translation: some words are mistakenly left untranslated.
SRC: Señor Presidente, abre la sesión.
TRG: Mr President Mr President Mr President.
Solution: model how many times words have been covered, e.g. by maintaining a coverage vector to keep track of the attention history (Tu et al., 2016).
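One simple variant of the coverage idea can be sketched as follows (an assumption of the simplest accumulation scheme, not Tu et al.'s exact model): sum attention weights per source word across decoder steps, and penalize the scores of already-covered words.

```python
def update_coverage(coverage, attn_weights):
    """Accumulate attention mass per source word across decoding steps."""
    return [c + a for c, a in zip(coverage, attn_weights)]

def penalized_scores(scores, coverage, strength=1.0):
    """Subtract a coverage penalty so heavily-attended words score lower,
    discouraging over-translation like 'Mr President Mr President ...'."""
    return [s - strength * c for s, c in zip(scores, coverage)]

coverage = [0.0, 0.0, 0.0]
coverage = update_coverage(coverage, [0.9, 0.05, 0.05])  # step 1 attended word 0
scores = penalized_scores([2.0, 1.0, 1.0], coverage)
print(scores)  # word 0's score is pushed down for the next step
```

After the first step, the heavily-attended word's score drops, so the next attention distribution shifts toward the still-uncovered words.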
Incorporating Markov Properties
Intuition: attention from the last time step tends to be correlated with attention at this time step.
Approach: add information about the last attention when making the next decision.
Bidirectional Training
- Background: it is established that for latent-variable translation models, the alignments improve if both directional models are combined (Koehn et al., 2005).
- Approach: joint training of two directional models.
Supervised Training
Sometimes we can get "gold standard" alignments a priori:
◦ Manual alignments
◦ Pre-trained with a strong alignment model
Train the model to match these strong alignments.
5. Applications
Chatbots
A computer program that conducts a conversation.
Human: what is your job?  Enc-dec: i'm a lawyer
Human: what do you do?  Enc-dec: i'm a doctor.
Encoder-decoder with attention: input "what is your job" → decoder <s> I'm a lawyer → output I'm a lawyer EOS
Natural Language Inference
Other NLP Tasks
Text summarization: the process of shortening a text document with software to create a summary with the major points of the original document.
Question answering: automatically producing an answer to a question given a corresponding document.
Semantic parsing: mapping natural language into a logical form that can be executed on a knowledge base and return an answer.
Syntactic parsing: the process of analysing a string of symbols, either in natural language or in computer languages, conforming to the rules of a formal grammar.
Image Captioning I
encoder (image) → decoder: <s> a cat on the mat → "A cat on the mat"
Image Captioning II
Other Computer Vision Tasks with Attention
Visual question answering: given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
Video caption generation: attempts to generate a complete and natural sentence, enriching the single label as in video classification, to capture the most informative dynamics in videos.
Speech recognition / translation
6. "Attention is all you need"
SLIDES BASED ON HTTPS://RESEARCH.GOOGLEBLOG.COM/2017/08/TRANSFORMER-NOVEL-NEURAL-NETWORK.HTML
Motivation
The sequential nature of RNNs makes it difficult to take advantage of modern computing devices such as TPUs (Tensor Processing Units).
Transformer
I arrived at the bank after crossing the river
Transformer I
Encoder / Decoder
Transformer II
Transformer results
Attention weights
7. Summary
RNNs and Attention
RNNs are used to model sequences.
Attention is used to enhance the modeling of long sequences.
The versatility of these models allows applying them to a wide range of applications.
Implementations of Encoder-Decoder
LSTM, CNN
Attention-based mechanisms
Soft vs Hard: soft attention weights all pixels; hard attention crops the image and forces attention only on the kept part.
Global vs Local: a global approach always attends to all source words; a local one only looks at a subset of source words at a time.
Intra vs External: intra-attention is within the encoder's input sentence; external attention is across sentences.
One large encoder-decoder
• Text, speech, image… is it all converging to a signal paradigm?
• If you know how to build a neural MT system, you may easily learn how to build a speech-to-text recognition system…
• Or you may train them together to achieve zero-shot AI.
*And other references on this research direction…
Quiz
1. Mark all statements that are true:
A. Sequence modeling only refers to language applications
B. The attention mechanism can be applied to an encoder-decoder architecture
C. Neural machine translation systems require recurrent neural networks
D. If we want to have a fixed representation (thought vector), we cannot apply attention-based mechanisms

2. Given the query vector q=[], the key vector 1 k1=[] and the key vector 2 k2=[]:
A. What are the attention weights 1 & 2 when computing the dot product?
B. And when computing the scaled dot product?
C. To which key vector are we giving more attention?
D. What is the advantage of computing the scaled dot product?
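The quiz's vectors are left blank above, so as a worked sketch only, here is how question 2 can be computed with hypothetical vectors q, k1, k2 of my own choosing (not the quiz's):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical vectors (the quiz's actual vectors are not given here).
q, k1, k2 = [1.0, 2.0], [1.0, 0.0], [0.0, 1.0]

dot_weights = softmax([dot(q, k1), dot(q, k2)])
scaled_weights = softmax([dot(q, k1) / math.sqrt(2), dot(q, k2) / math.sqrt(2)])
print(dot_weights, scaled_weights)  # k2 gets more attention in both cases
```

With these vectors, k2 receives more attention under both scores, and the scaled version yields a less peaked distribution, which illustrates the advantage asked about in D: scaling keeps the softmax from saturating as vector dimensionality grows.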