On Learning Form and Meaning in Neural Machine Translation Models Yonatan Belinkov May 2017 With: Nadir Durrani, Hassan Sajjad, Fahim Dalvi, Lluis Marques, James Glass


Page 1: On Learning Form and Meaning in Neural Machine Translation ...people.csail.mit.edu › mitra › meetings › 2017-May09-Yonatan.pdf · Brief History of Machine Translation •1947:


Motivation

• Neural machine translation (NMT) obtains state-of-the-art results
• Elegant and simple end-to-end architecture
• However, NMT models are difficult to interpret; what do they learn about the source and target languages?
• Recent interest in the community (e.g. Shi+16 on syntax)
• This work: analyzing morphology (and semantics) in NMT

Translation as Decoding

• Warren Weaver to Norbert Wiener, March 4, 1947:

Also knowing nothing official about, but having guessed and inferred considerable about, powerful new mechanized methods in cryptography - methods which I believe succeed even when one does not know what language has been coded - one naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say “This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.”

Brief History of Machine Translation

• 1947: Initial ideas of MT (Weaver)
• 1950s: First MT systems
• 1960s: High-quality MT fails, cut in government funding
• 1970s-1980s: Rule-based systems, interlingua ideas
• 1990s: Statistical MT, IBM alignment models
• 2000s: Phrase-based MT, open-source toolkits
• 2014-2015: Neural MT: seq2seq + attention


Statistical Machine Translation

• Translate a source sentence F into a target sentence E
  – Translation model
  – Language model

[Figure: word alignment between "Maria no dió una bofetada a la bruja verde" and "Mary did not slap the green witch". From: Jurafsky & Martin 2009]
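
The formulas that accompanied the two model labels did not survive in this transcript. As a hedged reconstruction, the standard noisy-channel formulation of statistical MT is shown below; it is an assumption about what the slide displayed, not a transcription of it.

```latex
\hat{E} = \arg\max_{E} P(E \mid F)
        = \arg\max_{E}\;
          \underbrace{P(F \mid E)}_{\text{translation model}}\;
          \underbrace{P(E)}_{\text{language model}}
```

The second equality follows from Bayes' rule; the denominator P(F) is dropped because it does not depend on E.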

Neural Machine Translation

[Figure: Input text → Encoder → Decoder → Translated text]

• Encoder: [equation not preserved]
• Decoder: [equation not preserved]
• Loss: [equation not preserved]

Equation annotations: source hidden state, target hidden state, summary vector
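
The encoder, decoder, and loss equations are missing from the transcript. One common way to write a sequence-to-sequence model without attention is sketched below. The symbols x_i, y_j, h_i, s_j, c, f, g, and W are assumed notation, chosen to match the three annotations (source hidden state, target hidden state, summary vector); the original slide may have used a different form.

```latex
\begin{align*}
\text{Encoder:}\quad & h_i = f(x_i, h_{i-1}), \qquad c = h_n \\
\text{Decoder:}\quad & s_j = g(y_{j-1}, s_{j-1}, c), \qquad
                       P(y_j \mid y_{<j}, x) = \mathrm{softmax}(W s_j) \\
\text{Loss:}\quad    & \mathcal{L} = -\sum_{j=1}^{m} \log P(y_j \mid y_{<j}, x)
\end{align*}
```

Here h_i is the source hidden state, s_j the target hidden state, and c the summary vector passed from encoder to decoder.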

Encoder-Decoder

[Figure: encoder-decoder translating "Maria no dió una bofetada a la bruja verde" into "Mary did not slap the green witch <STOP>"]

The Problem with the Encoder-Decoder

• Raymond Mooney, June 26, 2016:

“You can’t cram the meaning of a whole %&!$# sentence into a single $&!#* vector!”


Attention Mechanism

[Figure: attention over the source "Maria no dió una bofetada a la bruja verde" while decoding "Mary did not slap the green witch <STOP>"]

Attention as soft alignment

[Figure: two alignments between "Maria no dió una bofetada a la bruja verde" and "Mary did not slap the green witch", labeled Phrase-based MT and Neural MT]
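
To make the "soft alignment" idea concrete, here is a minimal sketch in plain Python. It assumes dot-product scoring between a decoder state and each encoder state, which is only one of several attention variants and is not stated on the slides; the function name `soft_alignment` and the toy vectors are hypothetical.

```python
import math

def soft_alignment(decoder_state, encoder_states):
    """Return attention weights over source positions and the context vector.

    Assumption: dot-product scoring. The slides do not say which scoring
    function the analyzed models use.
    """
    # One score per source word: dot product with the decoder state.
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    # Softmax, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(decoder_state)
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context

# Toy example: three source words with 2-dimensional states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = soft_alignment(decoder_state, encoder_states)
```

Unlike a hard phrase-based alignment, every source word receives a nonzero weight, and the weights sum to one.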

Research Questions

• Which parts of the NMT architecture capture word structure? Which capture meaning?
• What is the division of labor between different components?
• How do different word representations help learn better morphology?
• How does the target language affect the learning of word structure?


Methodology

• Three-step procedure:
  1. Train a neural MT system
  2. Extract feature representations using the trained model
  3. Train a classifier using the extracted features and evaluate it on an extrinsic task
• Assumption: performance of the classifier reflects quality of the NMT representations for the given task
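
The three-step procedure can be sketched as follows. This is a toy illustration under loud assumptions: `TOY_ENCODER` is a hypothetical lookup table standing in for a trained, frozen NMT encoder (steps 1 and 2), and a nearest-centroid classifier stands in for whatever classifier the authors trained in step 3.

```python
import math

# Hypothetical stand-in for steps 1-2: a lookup table playing the role of a
# trained, frozen NMT encoder. Real features would come from the NMT model.
TOY_ENCODER = {
    "running": [1.0, 0.0], "walking": [0.9, 0.1], "jumping": [0.8, 0.2],
    "cats": [0.0, 1.0], "dogs": [0.1, 0.9], "birds": [0.2, 0.8],
}

def extract_features(word):
    """Step 2: look up the frozen representation of a word."""
    return TOY_ENCODER[word]

def train_classifier(examples):
    """Step 3: fit one centroid per tag on the extracted features."""
    grouped = {}
    for word, tag in examples:
        grouped.setdefault(tag, []).append(extract_features(word))
    return {tag: [sum(col) / len(vecs) for col in zip(*vecs)]
            for tag, vecs in grouped.items()}

def predict(centroids, word):
    """Assign the tag whose centroid is closest to the word's features."""
    x = extract_features(word)
    return min(centroids, key=lambda tag: math.dist(x, centroids[tag]))

def accuracy(centroids, examples):
    correct = sum(predict(centroids, w) == t for w, t in examples)
    return correct / len(examples)

train = [("running", "VERB"), ("walking", "VERB"), ("cats", "NOUN"), ("dogs", "NOUN")]
test = [("jumping", "VERB"), ("birds", "NOUN")]
centroids = train_classifier(train)
probe_accuracy = accuracy(centroids, test)
```

The encoder is never updated while the classifier is trained, so `probe_accuracy` measures only what the frozen features already encode, which is the assumption stated on the slide.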


Part A: Morphology

Experimental Setup

• Tasks
  – Part-of-speech tagging
  – Morphological tagging
• Languages
  – Arabic-, German-, French-, and Czech-English
  – Arabic-Hebrew (rich and similar)
  – Arabic-German (rich but different)
• MT data: TED talks
• Annotated data
  – Gold tags
  – Predicted tags

Encoder

Effect of Word Representation

[Figure: two representations of the word "running": word embedding vs. character CNN]

Effect of Word Representation

         POS Accuracy       BLEU
         Word    Char       Word   Char
Ar-En    89.62   95.35      24.7   28.4
Ar-He    88.33   94.66       9.9   10.7
De-En    93.54   94.63      29.6   30.4
Fr-En    94.61   95.55      37.8   38.8
Cz-En    75.71   79.10      23.2   25.4

• Character-based models generate better representations for POS tagging
• Especially with richer morphological systems
• Character-based models improve translation quality
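
The size of the character-based advantage can be read off the table with a few lines of Python. The numbers are copied from the table above; the names `RESULTS` and `char_gain` are illustrative only.

```python
# POS accuracy and BLEU from the table, as (word-based, char-based) pairs.
RESULTS = {
    "Ar-En": {"pos": (89.62, 95.35), "bleu": (24.7, 28.4)},
    "Ar-He": {"pos": (88.33, 94.66), "bleu": (9.9, 10.7)},
    "De-En": {"pos": (93.54, 94.63), "bleu": (29.6, 30.4)},
    "Fr-En": {"pos": (94.61, 95.55), "bleu": (37.8, 38.8)},
    "Cz-En": {"pos": (75.71, 79.10), "bleu": (23.2, 25.4)},
}

def char_gain(metric):
    """Char-based score minus word-based score, per language pair."""
    return {pair: round(scores[metric][1] - scores[metric][0], 2)
            for pair, scores in RESULTS.items()}

pos_gain = char_gain("pos")
bleu_gain = char_gain("bleu")
largest_pos_gain = max(pos_gain, key=pos_gain.get)
```

Every gain is positive, and the largest POS gains are for the two Arabic-source pairs, in line with the bullet about richer morphological systems.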

Impact of Word Frequency

[Figure]

Impact of Tag Frequency

[Figure]

Comparing Specific Tags

[Figure: word-based vs. char-based comparison; labeled tags include NN, NNP, and Det]


Effect of Encoder Depth

• NMT models can be very deep
  – Google Translate: 8 encoder/decoder layers
  – Zhou+ 2016: 16 layers
• What kind of information is learned at each layer?
  – We analyzed a 2-layer encoder
  – Extract representations from different layers for training the classifier

[Figure]

• Layer 1 > Layer 2 > Layer 0
• But deeper models translate better
• Is layer 2 learning more about semantics? More on that later…

Effect of Target Language

• How does the target language affect the learned source language representations?
• Experiment:
  – Fix source side and train NMT models on different target languages
  – Compare learned representations on POS/morphological tagging
• Source language: Arabic
• Target languages: English, German, Hebrew, Arabic

[Figure]

• Poorer morphology on target side, better source side representations for morphology
• Higher BLEU ≠ better representations

Decoder

Encoder vs Decoder

                     POS Accuracy
                     Encoder   Decoder
Arabic ↔ English     89.6      43.9
German ↔ English     93.5      53.6
Czech ↔ English      75.7      36.3

• The decoder learns very little about target language morphology
• Why?

Effect of Attention

[Figure: attention between "Maria no dió una bofetada a la bruja verde" and "Mary did not slap the green witch <STOP>"]

                    With        Without     With most        Only most
                    attention   attention   attended word    attended word
English → German    44.55       50.26       60.34            43.43
English → Czech     36.35       42.09       48.64            36.36

• Removing attention improves decoder representations
• Attention takes some of the burden off the decoder
• The decoder does not need to learn as much about target words
• Concatenating most attended word improves performance
• Encoder representations helpful for target morphology
• But using only encoder side is not as good
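
The "with most attended word" setting can be sketched as a feature-building step: pick the source position with the highest attention weight and append its encoder state to the decoder state before classification. The function name and the toy vectors below are hypothetical; how the authors actually built these features is not given in the slides.

```python
def with_most_attended(decoder_state, encoder_states, attention_weights):
    """Concatenate the decoder state with the most attended encoder state."""
    # Index of the source word that receives the highest attention weight.
    best = max(range(len(attention_weights)), key=attention_weights.__getitem__)
    # List concatenation: the classifier input grows by the encoder state size.
    return decoder_state + encoder_states[best]

decoder_state = [0.5, -0.2]
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.3, 0.3]]
attention_weights = [0.1, 0.7, 0.2]
features = with_most_attended(decoder_state, encoder_states, attention_weights)
```

The "only most attended word" column would correspond to returning `encoder_states[best]` alone, without the decoder state.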

Summary

• NMT encoder learns good representations for morphology
  – Character-based representations much better than word-based
  – Target language impacts source side representations
  – Layer 1 > Layer 2 > Layer 0
• Decoder learns poor target side representations
  – Attention model helps decoder exploit source representations

Part B: Semantics

Recap

• We saw
  – NMT representations from layer 1 better than layer 2 (and layer 0) for POS and morphological tagging
  – Deeper networks lead to better translation performance
• Questions
  – What is captured in higher layers?
  – How is semantic information represented?
• Let’s apply a similar methodology to a semantic task

Semantic tagging

• Lexical semantics
• Abstraction over POS tagging
• Language-neutral, aimed for multi-lingual semantic parsing
• Some examples
  – Determiners: every, no, some
  – Comma as conjunction, disjunction, apposition
  – Role nouns, entity nouns
  – Comparison adjectives: comparative, superlative, equative

Experimental Setup

• Semantic tagging data
  – 66 fine-grained tags, 13 coarse categories

             Train     Dev       Test
Sentences    42.5K     6.1K      12.2K
Tokens       937.1K    132.3K    265.5K

• MT data – UN corpus
  – Multi-parallel
  – 11M sentences
  – Arabic, Chinese, English, French, Spanish, Russian

Baselines

System                           Accuracy
Most frequent tag                82.0
Unsupervised embeddings          81.1
Word2Tag encoder-decoder         91.4
State-of-the-art (Bjerva+16)     95.5
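
For readers unfamiliar with the first baseline, a most-frequent-tag tagger can be sketched as below. The fallback to the globally most frequent tag for unseen words is an assumption (the slides do not say how unknown words were handled), and the toy data and tag names are made up for illustration.

```python
from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_tokens):
    """Record each word's most frequent tag, plus a global fallback tag."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for word, tag in tagged_tokens:
        per_word[word][tag] += 1
        overall[tag] += 1
    lookup = {word: counts.most_common(1)[0][0] for word, counts in per_word.items()}
    # Assumption: unseen words receive the most frequent tag overall.
    fallback = overall.most_common(1)[0][0]
    return lookup, fallback

def predict(word, lookup, fallback):
    return lookup.get(word, fallback)

# Toy training data with hypothetical tags.
train = [
    ("the", "DET"), ("dog", "NOUN"), ("the", "DET"), ("cat", "NOUN"),
    ("run", "VERB"), ("run", "NOUN"), ("run", "VERB"), ("dog", "NOUN"),
]
lookup, fallback = train_most_frequent_tag(train)
```

Because the baseline ignores context entirely, it is a useful floor for judging how much the NMT representations add.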

Effect of Network Depth

[Figure: includes a "Most frequent tag" baseline]

• Layer 0 below baseline
• Layer 1 >> layer 0
• Layer 4 > layer 1
• Similar trends for coarse tags

Effect of Target Language

[Figure: includes a "Most frequent tag" baseline]

• No impact on semantic tagging
• But large impact on translation:

         BLEU
En-Ar    32.7
En-Es    49.1
En-Fr    38.5
En-Ru    34.2
En-Zh    32.1

Analyzing Specific Tags

• Layer 4 vs layer 1
  – Blue: distinguishing among coarse tags
  – Red: distinguishing among fine-grained tags within a coarse category

[Figure]

• Layer 4 > layer 1
• Especially with:
  – Discourse relations (DIS)
  – Properties of nouns (ENT)
  – Events, tenses (EVE, TNS)
  – Logic relations and quantifiers (LOG)
  – Comparative constructions (COM)
• Negative examples
  – Modality (MOD): closed-class (“no”, “not”, “should”, “must”, etc.)
  – Named entities (NAM): OOVs? Neural MT limitation?

Semantic tags vs. POS tags

Layer        0      1      2      3      4
Uni   POS    87.9   92.0   91.7   91.8   91.9
      Sem    81.8   87.8   87.4   87.6   88.2
Bi    POS    87.9   93.3   92.9   93.2   92.8
      Sem    81.9   91.3   90.8   91.9   91.9

• Higher layers improve semantic tagging but not POS tagging
• Layer 1 best for POS; layer 4 best for semantic tagging
• Similar trends with bidirectional encoder
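
The "best layer" claims can be checked directly against the table. The short sketch below copies the accuracies and reports every layer that attains the maximum for each encoder type and task; the names `ACCURACY` and `best_layers` are illustrative.

```python
# Accuracy per layer (index 0-4), copied from the table.
ACCURACY = {
    ("uni", "POS"): [87.9, 92.0, 91.7, 91.8, 91.9],
    ("uni", "Sem"): [81.8, 87.8, 87.4, 87.6, 88.2],
    ("bi", "POS"): [87.9, 93.3, 92.9, 93.2, 92.8],
    ("bi", "Sem"): [81.9, 91.3, 90.8, 91.9, 91.9],
}

def best_layers(scores):
    """Return all layer indices that reach the maximum accuracy."""
    top = max(scores)
    return [layer for layer, score in enumerate(scores) if score == top]

best = {key: best_layers(scores) for key, scores in ACCURACY.items()}
```

For the unidirectional encoder this gives layer 1 for POS and layer 4 for semantic tagging. For the bidirectional encoder, POS again peaks at layer 1, while layers 3 and 4 tie on semantic tagging.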

Summary

• Neural MT representations contain useful information about word form and meaning
• Lower layers focus on POS/morphology
• Higher layers focus on (lexical) semantics
• Target language does not affect semantic tagging quality

Future Work

• Other neural MT architectures
  – Word representations; multi-lingual models
• Other linguistic properties
  – Syntactic and semantic relations, complex structures
• Improving neural MT
  – Multi-task learning
• Analyzing representations in other neural models
  – End-to-end speech recognition