Lecture 13: Machine Translation II - GitHub...

Lecture 13: Machine Translation II, Alan Ritter (many slides from Greg Durrett)


Page 1: Lecture 13: Machine Translation II - GitHub Pages aritter.github.io/courses/5525_slides_v2/lec13-mt2.pdf · Results: WMT English-French Classic phrase-based system: ~33 BLEU, uses additional



Syntactic MT


Levels of Transfer: Vauquois Triangle

Slide credit: Dan Klein

‣ Is syntax a “better” abstraction than phrases?


Syntactic MT

‣ Rather than use phrases, use a synchronous context-free grammar: constructs “parallel” trees in two languages simultaneously

NP → [DT1 JJ2 NN3 ; DT1 NN3 JJ2]

DT → [the, la]

DT → [the, le]

NN → [car, voiture]

JJ → [yellow, jaune]

[Figure: paired NP trees for “the yellow car” (DT1 JJ2 NN3) and “la voiture jaune” (DT1 NN3 JJ2)]

‣ Assumes parallel syntax up to reordering

‣ Translation = parse the input with “half” the grammar, read off the other half
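The “parse with one half, read off the other half” idea can be sketched on the toy grammar above. This is a minimal illustration under simplifying assumptions: only the single NP rule and the four lexical rules, each preterminal covering exactly one word, and no model for scoring derivations. The names `SOURCE_CHILDREN`, `TARGET_ORDER`, `LEXICON`, and `translate_np` are hypothetical.

```python
from itertools import product

# Toy synchronous CFG from the slide (hypothetical encoding).
# NP -> [DT1 JJ2 NN3 ; DT1 NN3 JJ2]: the source children, plus the target
# order given as indices into the source children (the co-indexing).
SOURCE_CHILDREN = ["DT", "JJ", "NN"]
TARGET_ORDER = [0, 2, 1]

# Lexical rules: preterminal -> list of (source word, target word) pairs.
LEXICON = {
    "DT": [("the", "la"), ("the", "le")],
    "NN": [("car", "voiture")],
    "JJ": [("yellow", "jaune")],
}


def translate_np(words):
    """Return every target string the toy SCFG derives for a 3-word NP."""
    if len(words) != len(SOURCE_CHILDREN):
        return []
    # "Parse" each source word with the source half of a lexical rule.
    options = [
        [tgt for src, tgt in LEXICON[tag] if src == word]
        for tag, word in zip(SOURCE_CHILDREN, words)
    ]
    # Read off the target half: reorder the linked children per the NP rule.
    return [
        " ".join(choice[i] for i in TARGET_ORDER)
        for choice in product(*options)
    ]


print(translate_np(["the", "yellow", "car"]))  # ['la voiture jaune', 'le voiture jaune']
```

Both DT rules match “the”, so the grammar alone also produces “le voiture jaune”; picking “la” needs a model that scores derivations (here, agreement with the feminine noun “voiture”).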


Syntactic MT

Slide credit: Dan Klein

‣ Relax the parallel-syntax assumption by using lexicalized rules, like “syntactic phrases”

‣ Leads to HUGE grammars, parsing is slow


Neural MT Details


Encoder-Decoder MT

Sutskever et al. (2014)

‣ Sutskever seq2seq paper: first major application of LSTMs to NLP

‣ Basic encoder-decoder with beam search

‣ SOTA = 37.0, so not all that competitive…
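Beam search, the decoding step named above, can be sketched independently of any network. In this minimal sketch a hypothetical `step_log_probs(prefix)` function stands in for a trained decoder and returns log-probabilities of each next token; the probability table `TOY_PROBS` is invented for illustration, and finished hypotheses simply leave the beam (one simple variant, not necessarily the exact procedure of Sutskever et al.).

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"
Hyp = Tuple[Tuple[str, ...], float]  # (token prefix, cumulative log-probability)


def beam_search(
    step_log_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_size: int,
    max_len: int = 20,
) -> List[Hyp]:
    """Left-to-right beam search; returns hypotheses sorted best-first."""
    beam: List[Hyp] = [((), 0.0)]
    finished: List[Hyp] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = [
            (prefix + (tok,), score + lp)
            for prefix, score in beam
            for tok, lp in step_log_probs(prefix).items()
        ]
        # Prune to the top `beam_size` expansions; finished ones leave the beam.
        candidates.sort(key=lambda h: h[1], reverse=True)
        beam = []
        for prefix, score in candidates[:beam_size]:
            (finished if prefix[-1] == EOS else beam).append((prefix, score))
        if not beam:
            break
    # If max_len was reached, keep any unfinished hypotheses as well.
    return sorted(finished + beam, key=lambda h: h[1], reverse=True)


# Hypothetical next-token probabilities standing in for a trained decoder.
TOY_PROBS = {
    (): {"le": 0.6, "la": 0.4},
    ("le",): {"chat": 0.4, "chien": 0.35, "voiture": 0.25},
    ("la",): {"voiture": 0.9, "chat": 0.1},
}


def toy_model(prefix):
    # Any prefix not in the table (here, the 2-word ones) must end the sentence.
    probs = TOY_PROBS.get(prefix, {EOS: 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


print(beam_search(toy_model, beam_size=1)[0][0])  # ('le', 'chat', '</s>'), i.e. greedy
print(beam_search(toy_model, beam_size=2)[0][0])  # ('la', 'voiture', '</s>')
```

With beam size 1 the search commits to the locally best “le” and ends at probability 0.24; a beam of 2 keeps “la” alive and finds the 0.36 hypothesis.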


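Encoder-Decoder MT

‣ Better model from seq2seq lectures: encoder-decoder with attention and copying for rare words

[Figure: encoder states h1 h2 h3 h4 over “the movie was great”; decoder state h̄1 (input <s>) with context vector c1 produces a distribution over vocab + copying, emitting “le”]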
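The figure's “distribution over vocab + copying” can be sketched with dot-product attention plus a generate/copy mixture. This is a minimal sketch assuming one common formulation (a pointer-generator-style mixture): p(w) = p_gen · p_vocab(w) + (1 − p_gen) · Σ of attention weights on source positions holding w. The lecture's copy mechanism may differ; `p_gen` is a constant here rather than predicted from h̄ and c, and every source word is assumed to have an id in the (hypothetical) 10-word vocabulary.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()


def attend(h_bar, H):
    """Dot-product attention of decoder state h_bar over encoder states H (rows h_1..h_n)."""
    alpha = softmax(H @ h_bar)   # one weight per source position
    context = alpha @ H          # c = sum_j alpha_j h_j
    return alpha, context


def copy_mixture(p_vocab, alpha, source_ids, p_gen):
    """p(w) = p_gen * p_vocab(w) + (1 - p_gen) * sum of alpha_j over positions j with x_j = w."""
    p = p_gen * p_vocab
    # np.add.at accumulates correctly when a source word appears more than once.
    np.add.at(p, source_ids, (1.0 - p_gen) * alpha)
    return p


rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))             # h1..h4 for "the movie was great"
h_bar = rng.normal(size=8)              # decoder state after reading <s>
alpha, c = attend(h_bar, H)
p_vocab = softmax(rng.normal(size=10))  # toy 10-word target vocabulary
p = copy_mixture(p_vocab, alpha, np.array([1, 5, 6, 7]), p_gen=0.7)
print(p.sum())
```

Because both p_vocab and the attention weights sum to one, the mixture is again a probability distribution; copying lets a rare source word receive probability mass even when p_vocab assigns it almost none.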


Results: WMT English-French

‣ 12M sentence pairs

Classic phrase-based system: ~33 BLEU, uses additional target-language data

Rerank with LSTMs: 36.5 BLEU (long line of work here; Devlin+ 2014)

Sutskever+ (2014) seq2seq single: 30.6 BLEU

Sutskever+ (2014) seq2seq ensemble: 34.8 BLEU

Luong+ (2015) seq2seq ensemble with attention and rare word handling: 37.5 BLEU

‣ But English-French is a really easy language pair and there’s tons of data for it! Does this approach work for anything harder?
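To make the BLEU numbers above concrete, here is a minimal corpus-level BLEU sketch: modified (clipped) n-gram precisions for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. It assumes pre-tokenized input, one reference per sentence, and no smoothing; published scores depend on tokenization and reference details, so this sketch will not reproduce them exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU (0-100) with one reference per sentence, uniform weights, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Modified precision: clip each n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # some n-gram order has no matches, so the geometric mean is 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes output that is shorter than the references overall.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


ref = "the cat sat on the mat".split()
print(corpus_bleu([ref], [ref]))                            # 100.0
print(corpus_bleu(["the cat sat on the".split()], [ref]))   # every n-gram matches, only the brevity penalty applies
```

Because the score counts exact n-gram matches, morphologically rich targets (many surface forms per word) tend to get lower absolute BLEU, one reason scores are not comparable across language pairs.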


Results: WMT English-German

‣ 4.5M sentence pairs

Classic phrase-based system: 20.7 BLEU

Luong+ (2014) seq2seq: 14 BLEU

Luong+ (2015) seq2seq ensemble with rare word handling: 23.0 BLEU

‣ Not nearly as good in absolute BLEU, but not really comparable across languages

‣ French, Spanish = easiest; German, Czech = harder; Japanese, Russian = hard (grammatically different, lots of morphology…)


MT Examples

Luong et al. (2015)

‣ NMT systems can hallucinate words, especially when not using attention; phrase-based doesn’t do this

‣ best = with attention, base = no attention


MT Examples

Zhang et al. (2017)

‣ NMT can repeat itself if it gets confused (pH or pH)

‣ Phrase-based MT often gets chunks right, may have more subtle ungrammaticalities


Rare Words: Word Piece Models

Wu et al. (2016)

‣ Use Huffman encoding on a corpus, keep most common k (~10,000) character sequences for source and target

‣ Captures common words and parts of rare words

Input: _the_ecotax_portico_in_Pont-de-Buis…

Output: _le_portique_écotaxe_de_Pont-de-Buis

‣ Subword structure may make it easier to translate

‣ Model balances translating and transliterating without explicit switching
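Once a word piece inventory exists, text still has to be split into pieces. The sketch below uses greedy longest-match-first segmentation, the procedure used by BERT's WordPiece tokenizer; it is an assumption that GNMT segments the same way, and the sketch does not show how the inventory itself is chosen. The `_` word-start marker follows the slide; `VOCAB` is a hypothetical inventory, not GNMT's.

```python
def wordpiece_segment(text, vocab, unk="<unk>"):
    """Greedy longest-match-first segmentation of whitespace-separated words.

    Each word gets a leading "_" word-boundary marker, as on the slide.
    A word that cannot be fully covered by vocab pieces becomes `unk`.
    """
    pieces = []
    for word in text.split():
        word = "_" + word
        start, word_pieces = 0, []
        while start < len(word):
            end = len(word)
            # Try the longest remaining substring first, shrinking until it is in the vocab.
            while end > start and word[start:end] not in vocab:
                end -= 1
            if end == start:  # no piece matches: give up on this word
                word_pieces = [unk]
                break
            word_pieces.append(word[start:end])
            start = end
        pieces.extend(word_pieces)
    return pieces


# Hypothetical piece inventory (not GNMT's actual vocabulary).
VOCAB = {"_the", "_eco", "tax", "_port", "ico", "_in"}
print(wordpiece_segment("the ecotax portico", VOCAB))  # ['_the', '_eco', 'tax', '_port', 'ico']
```

Frequent words like “the” survive as single pieces, while rarer words such as “ecotax” fall apart into smaller, reusable units.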


Rare Words: Byte Pair Encoding

Sennrich et al. (2016)

‣ Simpler procedure, based only on the dictionary

‣ Input: a dictionary of words represented as characters

‣ Count co-occurrences of adjacent symbol pairs (initially character bigrams)

‣ Merge the most frequent adjacent pair into a new symbol; repeat

‣ Final size = initial vocab + num merges. Often do 10k-30k merges

‣ Most SOTA NMT systems use this on both source + target
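The merge loop above can be sketched in a few lines, in the spirit of the short Python listing in Sennrich et al. (2016). Words are stored as space-separated symbols with a `</w>` end-of-word marker; the toy dictionary (low, lower, newest, widest with made-up counts) is illustrative, and ties between equally frequent pairs are broken by whichever pair was counted first.

```python
import collections
import re


def pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[a, b] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of the symbol pair with one merged symbol."""
    merged = "".join(pair)
    # Lookarounds ensure we only match whole symbols, not parts of longer ones.
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub(lambda m: merged, word): freq for word, freq in vocab.items()}


def learn_bpe(vocab, num_merges):
    merges = []
    for _ in range(num_merges):
        pairs = pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair; ties go to the first counted
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


# Dictionary of words as space-separated characters; "</w>" marks the word end.
toy_vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}
merges, segmented = learn_bpe(toy_vocab, num_merges=3)
print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>')]
print(segmented)
```

Each merge adds exactly one new symbol, which is why the final size is the initial character vocabulary plus the number of merges.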


Google’s NMT System

Wu et al. (2016)

‣ 8-layer LSTM encoder-decoder with attention, word piece vocabulary of 8k-32k


English-French:

Luong+ (2015) seq2seq ensemble with rare word handling: 37.5 BLEU

Google’s 32k word pieces: 38.95 BLEU

Google’s phrase-based system: 37.0 BLEU

English-German:

Luong+ (2015) seq2seq ensemble with rare word handling: 23.0 BLEU

Google’s 32k word pieces: 24.2 BLEU

Google’s phrase-based system: 20.7 BLEU


Human Evaluation (En-Es)

Wu et al. (2016)

‣ Similar to human-level performance on English-Spanish


Google’s NMT System

Wu et al. (2016)

Gender is correct in GNMT but not in PBMT

[Figure annotations: “sled”, “walker”]


Backtranslation

Sennrich et al. (2015)

‣ Classical MT methods used a bilingual corpus of sentences B = (S, T) and a large monolingual corpus T’ to train a language model. Can neural MT do the same?

‣ Approach 1: force the system to generate T’ as targets from null inputs

s1, t1

s2, t2

…

[null], t’1

[null], t’2

…

‣ Approach 2: generate synthetic sources with a T->S machine translation system (backtranslation)

s1, t1

s2, t2

…

MT(t’1), t’1

MT(t’2), t’2

…
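Both data-construction approaches above can be sketched as one augmentation step. This is a minimal sketch assuming S = English and T = French; `toy_reverse_mt` is a hypothetical word-by-word stand-in for a trained T->S model, and `"[null]"` is a placeholder string for the empty source. The target side of every synthetic pair is real text; only the source is null or machine-generated.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def augment_with_monolingual(
    parallel: List[Pair],
    mono_targets: List[str],
    reverse_mt: Callable[[str], str],
    approach: int = 2,
) -> List[Pair]:
    """Add monolingual target sentences t' to the real (s, t) training pairs.

    approach=1: pair each t' with a placeholder "[null]" source.
    approach=2: pair each t' with a synthetic source MT(t') from a T->S system.
    """
    if approach == 1:
        synthetic = [("[null]", t) for t in mono_targets]
    elif approach == 2:
        synthetic = [(reverse_mt(t), t) for t in mono_targets]
    else:
        raise ValueError("approach must be 1 or 2")
    return parallel + synthetic


# Hypothetical word-by-word French->English "system" standing in for a trained T->S model.
FR_EN = {"la": "the", "voiture": "car", "jaune": "yellow"}


def toy_reverse_mt(sentence: str) -> str:
    return " ".join(FR_EN.get(w, w) for w in sentence.split())


data = augment_with_monolingual([("the car", "la voiture")], ["la voiture jaune"], toy_reverse_mt)
print(data)  # [('the car', 'la voiture'), ('the car yellow', 'la voiture jaune')]
```

The synthetic source “the car yellow” is noisy, but the decoder is still trained to produce fluent target text from it, which is how the monolingual data plays the role of a language model.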


Backtranslation

Sennrich et al. (2015)

‣ parallel synth: backtranslate the training data; makes additional noisy source sentences which could be useful

‣ Gigaword: large monolingual English corpus


Dilated CNNs for MT


Dilated Convolutions

Strubell et al. (2017)

‣ Standard convolution: looks at every token under the filter

‣ Dilated convolution with gap d: looks at every dth token

w=2, d=2: gap in the filter

‣ Can chain successive dilated convolutions together to get a wide receptive field (see a lot of the sentence)

[Figure: stacked dilated convolutions with w=3, d=1; w=3, d=2; w=3, d=4]

‣ Top nodes see lots of the sentence, but with different processing
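A dilated convolution and the receptive field of a stack of them can be sketched directly. This minimal NumPy sketch computes a “valid” 1-D convolution (implemented as cross-correlation, without flipping the filter, as deep-learning libraries usually do) with filter width w = `len(weights)` and gap d. Each layer adds (w − 1)·d tokens to the receptive field, so doubling d per layer makes it grow exponentially with depth.

```python
import numpy as np


def dilated_conv1d(x, weights, d):
    """'Valid' 1-D dilated convolution: out[t] = sum_k weights[k] * x[t + k*d].

    d = 1 is a standard convolution; larger d leaves gaps of d - 1 tokens
    between the positions the filter reads.
    """
    w = len(weights)
    span = (w - 1) * d + 1  # input positions one output depends on
    out_len = max(len(x) - span + 1, 0)
    return np.array([
        sum(weights[k] * x[t + k * d] for k in range(w))
        for t in range(out_len)
    ])


def receptive_field(width, dilations):
    """Number of input tokens visible to one node after stacking the given layers."""
    return 1 + sum((width - 1) * d for d in dilations)


x = np.arange(10.0)
print(dilated_conv1d(x, [1.0, 1.0], d=2))  # w=2, d=2: adds x[t] and x[t+2]
print(receptive_field(3, [1, 2, 4]))       # 15
print(receptive_field(3, [1, 1, 1]))       # 7, without dilation
```

The three-layer stack from the figure (w=3 with d = 1, 2, 4) lets each top node see 15 tokens, versus 7 for three undilated layers of the same width.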


CNNs for Machine Translation

Kalchbrenner et al. (2016)

‣ “ByteNet”: operates over characters (bytes)

‣ Encode source sequence w/ dilated convolutions

‣ Predict nth target character by looking at the nth position in the source and a dilated convolution over the n-1 target tokens so far

‣ To deal with divergent lengths, $t_n$ actually looks at $s_{n\alpha}$ where $\alpha$ is a heuristically-chosen parameter

‣ Assumes mostly monotonic translation


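Compare: CNNs vs. LSTMs

Kalchbrenner et al. (2016)

[Figure: LSTM decoder step from <s> with hidden state h̄1 and context vector c1]

‣ LSTM: looks at previous word + hidden state, attention over input

‣ CNN: source encoding at this position gives us “attention”, target encoding gives us decoder context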


Attention from CNN

Kalchbrenner et al. (2016)

‣ Model is character-level; this visualization shows which words’ characters impact the convolutional encoding the most

‣ Largely monotonic but does consult other information


Advantages of CNNs

Kalchbrenner et al. (2016)

‣ LSTM with attention is quadratic: compute attention over the whole input for each decoded token, i.e. O(|s| · |t|) attention scores

‣ CNN is linear in the sequence lengths!

‣ CNN is shallower too in principle, but the conv layers are very sophisticated (3 layers each)


English-German MT Results

Kalchbrenner et al. (2016)


Transformers for MT

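Self-Attention

Vaswani et al. (2017)

[Figure: self-attention over the example sentence "the movie was great"; the vector $x_4$ attends over every word to produce $x'_4$]

‣ Each word forms a "query" which then computes attention over each word

$\alpha_{i,j} = \mathrm{softmax}(x_i^\top x_j)$ (a scalar; the softmax normalizes over $j$)

$x'_i = \sum_{j=1}^{n} \alpha_{i,j} x_j$ (a vector: a sum of scalar * vector)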
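The two equations above translate almost line for line into NumPy. This is a minimal sketch of the slide's simplified form only: the full model in Vaswani et al. (2017) also projects each word into separate query, key, and value vectors and divides the scores by $\sqrt{d_k}$, both of which are omitted here. The random embeddings are stand-ins for real word vectors.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(X: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Unscaled, parameter-free self-attention as written on the slide.

    X has shape (n, d): one row x_j per word.
    Returns (X_new, alpha) where alpha[i, j] = softmax_j(x_i^T x_j)
    and X_new[i] = sum_j alpha[i, j] * x_j.
    """
    scores = X @ X.T                 # scores[i, j] = x_i^T x_j
    alpha = softmax(scores, axis=1)  # each row (one query word) sums to 1
    X_new = alpha @ X                # row i: weighted sum of all word vectors
    return X_new, alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    words = ["the", "movie", "was", "great"]
    X = rng.normal(size=(len(words), 8))   # random stand-in embeddings
    X_new, alpha = self_attention(X)
    print("attention of 'great' (x_4) over each word:", np.round(alpha[3], 3))
```

Because every row of `alpha` sums to 1, each new vector $x'_i$ is a convex combination of the original word vectors.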

Self-Attention

Vaswani et al. (2017)

‣ Multiple "heads" analogous to different convolutional filters. Use parameters $W_k$ and $V_k$ to get different attention values + transform vectors

$\alpha_{k,i,j} = \mathrm{softmax}(x_i^\top W_k x_j)$

$x'_{k,i} = \sum_{j=1}^{n} \alpha_{k,i,j} V_k x_j$
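The multi-head equations above can be sketched the same way, with one $W_k$ and one $V_k$ per head. This is a minimal sketch assuming $W_k$ is $d \times d$ and $V_k$ maps $d$-dimensional vectors to $d_v$ dimensions. The slide does not say how the heads are combined, so the sketch simply concatenates them per word (Vaswani et al. concatenate and then apply one more learned linear projection, omitted here). All parameters are random placeholders.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(X: np.ndarray, W: np.ndarray,
                              V: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Multi-head self-attention following the slide's notation.

    X: (n, d) word vectors x_j
    W: (K, d, d) one bilinear score matrix W_k per head
    V: (K, d_v, d) one value transform V_k per head
    Returns (out, alpha): out has shape (n, K * d_v), the per-head outputs
    x'_{k,i} concatenated; alpha has shape (K, n, n).
    """
    # scores[k, i, j] = x_i^T W_k x_j
    scores = np.einsum("id,kde,je->kij", X, W, X)
    alpha = softmax(scores, axis=-1)               # normalize over j
    values = np.einsum("kvd,jd->kjv", V, X)        # values[k, j] = V_k x_j
    heads = alpha @ values                         # (K, n, d_v): sum_j alpha[k,i,j] V_k x_j
    n = X.shape[0]
    out = heads.transpose(1, 0, 2).reshape(n, -1)  # concatenate heads per word
    return out, alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, d_v, K = 4, 8, 4, 2                      # "the movie was great", 2 heads
    X = rng.normal(size=(n, d))
    W = rng.normal(size=(K, d, d)) / np.sqrt(d)    # hypothetical random parameters
    V = rng.normal(size=(K, d_v, d)) / np.sqrt(d)
    out, alpha = multi_head_self_attention(X, W, V)
    print(out.shape)
```

Setting $W_k = V_k = I$ for a head recovers the single-head formula $x'_i = \sum_j \alpha_{i,j} x_j$; different $W_k$ let each head pick out a different pattern, much as different convolutional filters do.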

Transformers

Vaswani et al. (2017)

[Figure: example sentence "the movie was great"]

‣ Positional encoding: augment each word embedding with a position embedding, where each dimension is a sine wave of a different frequency. Closer positions = higher dot products
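A minimal sketch of the sinusoidal encoding, using the sin/cos formula from Vaswani et al. (2017): dimension $2i$ holds $\sin(pos / 10000^{2i/d})$ and dimension $2i+1$ the matching cosine. Because $\sin a \sin b + \cos a \cos b = \cos(a - b)$, the dot product of two encodings depends only on their offset $\Delta$, namely $\sum_i \cos(\omega_i \Delta)$, which is largest at $\Delta = 0$ and falls off for nearby offsets. The sizes (50 positions, 64 dimensions) and the random embeddings are illustrative assumptions.

```python
import numpy as np


def sinusoidal_positions(n_positions: int, d_model: int) -> np.ndarray:
    """Sinusoidal position encodings (one row per position).

    Dimension 2i holds sin(pos / 10000^(2i/d_model)) and dimension 2i+1 holds
    the matching cos, so each sin/cos pair oscillates at its own frequency.
    """
    if d_model % 2:
        raise ValueError("d_model must be even")
    pos = np.arange(n_positions)[:, None]                # (n, 1)
    two_i = np.arange(0, d_model, 2)[None, :]            # (1, d_model/2)
    angles = pos / np.power(10000.0, two_i / d_model)    # (n, d_model/2)
    pe = np.empty((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


if __name__ == "__main__":
    pe = sinusoidal_positions(50, 64)
    word_embeddings = np.random.default_rng(0).normal(size=(50, 64))
    inputs = word_embeddings + pe   # the paper adds the encodings to the embeddings
    # Dot product with position 10 shrinks as the offset grows
    for offset in (0, 1, 2, 4):
        print(offset, round(float(pe[10] @ pe[10 + offset]), 2))
```

The same offset gives the same dot product anywhere in the sentence, so the model can learn attention patterns based on relative position.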

Transformers

Vaswani et al. (2017)

‣ Encoder and decoder are both transformers

‣ Decoder consumes the previously generated token (and attends to the input), but has no recurrent state
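A sketch of why the decoder needs no recurrent state: at each step it re-reads the prefix generated so far through self-attention that is masked so no position can see later ones, then attends to the encoder outputs. This is a minimal sketch under stated assumptions: a single head, and no learned projections, residual connections, layer normalization, or feed-forward sublayer; `decoder_layer` is a hypothetical name and the vectors are random stand-ins.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax; entries equal to -inf get zero weight."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def decoder_layer(Y: np.ndarray, H: np.ndarray) -> np.ndarray:
    """One simplified decoder layer with no recurrent state.

    Y: (t, d) embeddings of the target tokens generated so far
    H: (s, d) encoder outputs for the source sentence
    """
    t = Y.shape[0]
    # Masked self-attention: position i may only look at positions j <= i
    scores = Y @ Y.T
    future = np.triu(np.ones((t, t), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    Z = softmax(scores, axis=1) @ Y
    # Encoder-decoder attention: every target position attends to the whole input
    return softmax(Z @ H.T, axis=1) @ H


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = rng.normal(size=(5, 8))   # encoded source sentence
    Y = rng.normal(size=(3, 8))   # three target tokens generated so far
    print(decoder_layer(Y, H).shape)
```

Because position $i$ never sees later positions, running the layer on a longer prefix leaves the earlier outputs unchanged; this is also what lets training process all target positions in parallel.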

Transformers

Vaswani et al. (2017)

‣ Big = 6 layers, 1024 dimensions for each token, 16 heads; base = 6 layers, with the other parameters halved (512 dimensions, 8 heads)
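The two sizes can be written down as a small configuration object. The numbers are the base and big settings reported by Vaswani et al. (2017) (layers, model dimension, heads, feed-forward size); the `TransformerConfig` class itself is a hypothetical illustration, not code from the paper. Halving the model dimension and the head count together keeps the per-head dimension at 64 in both models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerConfig:
    n_layers: int   # layers in the encoder (and, separately, in the decoder)
    d_model: int    # dimension of each token's vector
    n_heads: int    # attention heads per layer
    d_ff: int       # inner size of the position-wise feed-forward sublayer

    @property
    def d_head(self) -> int:
        """Per-head dimension when d_model is split evenly across heads."""
        return self.d_model // self.n_heads


BIG = TransformerConfig(n_layers=6, d_model=1024, n_heads=16, d_ff=4096)
BASE = TransformerConfig(n_layers=6, d_model=512, n_heads=8, d_ff=2048)

if __name__ == "__main__":
    for name, cfg in (("base", BASE), ("big", BIG)):
        print(name, cfg, "d_head =", cfg.d_head)
```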

Visualization

Vaswani et al. (2017)

Takeaways

‣ Can build MT systems with LSTM encoder-decoders, CNNs, or transformers

‣ Wordpiece / byte pair models are really effective and easy to use

‣ State-of-the-art systems are getting pretty good, but lots of challenges remain, especially for low-resource settings