Machine Translation: Challenges and Approaches
![Page 1: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/1.jpg)
Machine Translation: Challenges and Approaches
![Page 2: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/2.jpg)
Announcements
• Final exam, Dec. 21st, 1:10-4 PM
• Dan Jurafsky, Stanford Univ., "Does This Vehicle Belong to You?" Processing the Language of Policing for Improving Police-Community Relations, Dec. 5th, 5 pm, Davis Auditorium
• Rupal Patel, Northwestern Univ., Speech recordings – a life altering form of biological donation, Dec. 4th, 11:30 AM, Davis Auditorium
![Page 3: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/3.jpg)
Multilingual Users
• Content languages for websites
• Percentage of Internet users by language
http://en.wikipedia.org/wiki/Global_Internet_usage
Slide from Radev
![Page 4: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/4.jpg)
Afrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bosnian, Bulgarian, Catalan, Cebuano, Chichewa, Chinese, Corsican, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Frisian, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Myanmar, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scots Gaelic, Serbian, Sesotho, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Xhosa, Yiddish, Yoruba, Zulu
![Page 5: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/5.jpg)
Thank you for your attention! Questions?
![Page 6: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/6.jpg)
• Romance languages handled well
• Similar language pairs handled well (e.g., Spanish, Portuguese)
• Formal genres handled better
Still many problems!
![Page 7: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/7.jpg)
Today
• Multilingual Challenges for MT
• MT Approaches
  • Statistical
  • Neural net (Thursday)
• MT Evaluation
![Page 8: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/8.jpg)
Today
• Multilingual Challenges for MT
• MT Approaches
  • Statistical
  • Neural net
• MT Evaluation
![Page 9: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/9.jpg)
Multilingual Challenges
• Orthographic Variations
  • Ambiguous spelling
    • كتب الاولاد اشعارا / كَتَبَ الأوْلادُ اشعَاراً
  • Ambiguous word boundaries
• Lexical Ambiguity
  • Bank → بنك (financial) vs. ضفة (river)
  • Eat → essen (human) vs. fressen (animal)
Slide from Nizar Habash
![Page 10: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/10.jpg)
Multilingual Challenges: Morphological Variations
• Affixation vs. Root+Pattern
  • write → written / كتب → مكتوب
  • kill → killed / قتل → مقتول
  • do → done / فعل → مفعول
• Tokenization
  • And the cars → and the cars
  • والسيارات → w Al SyArAt (diagram labels: conj, noun, plural, article)
  • Et les voitures → et le voitures
Slide from Nizar Habash
![Page 11: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/11.jpg)
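Translation Divergences: conflation
• Arabic: لست هنا (gloss: I-am-not here)
• English: I am not here
• French: Je ne suis pas ici (gloss: I not am not here)
[Figure: dependency trees. Arabic: لست with dependent هنا. English: am with dependents I, not, here. French: suis with dependents Je, ne, pas, ici.]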
Slide from Nizar Habash
![Page 12: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/12.jpg)
Translation Divergences
English: John swam across the river quickly
Spanish: Juan cruzó rápidamente el río nadando
Gloss: John crossed fast the river swimming
Arabic: اسرع جون عبور النهر سباحة
Gloss: sped john crossing the-river swimming
Chinese: 约翰 快速 地 游 过 这 条 河
Gloss: John quickly (DE) swam cross the (Quantifier) river
Russian: Джон быстро переплыл реку
Gloss: John quickly cross-swam river
Slide from Nizar Habash
![Page 13: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/13.jpg)
Language Differences - Vocabulary
[Example from Jurafsky and Martin]
![Page 14: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/14.jpg)
Language Differences - Syntax
• Word order
  • SVO: English, Mandarin
  • VSO: Irish, Classical Arabic
  • SOV: Hindi, Japanese
• Word order in phrases (Fr.)
  • la maison bleue, the blue house
• Word order in sentences (Jap.)
  • I like to drink coffee
  • watashi wa kohii o nomu no ga suki desu
  • I-subj coffee-obj drink-dat-rheme like
• Prepositions (Jap.)
  • to Mariko, Mariko-ni
Slide adapted from Radev
![Page 15: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/15.jpg)
Today
• Multilingual Challenges for MT
• MT Approaches
  • Statistical
  • Neural net
• MT Evaluation
![Page 16: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/16.jpg)
MT Approaches: MT Pyramid
[Figure: MT pyramid. S (Source) and T (Target) at the base, I (Interlingua) at the apex; intermediate levels on each side: phrases, syntax, semantics.]
![Page 17: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/17.jpg)
String-to-String Translation
[Figure: MT pyramid (S, T, I; levels: phrases, syntax, semantics).]
Slide from Radev
![Page 18: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/18.jpg)
MT Approaches: Gisting Example
Sobre la base de dichas experiencias se estableció en 1988 una metodología.
Envelope her basis out speak experiences them settle at 1988 one methodology.
On the basis of these experiences, a methodology was arrived at in 1988.
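The gisting example above can be reproduced with a word-for-word dictionary lookup. The following is a minimal sketch, not the system behind the slide: the `GLOSS` table is a hypothetical dictionary built by pairing the Spanish words with the slide's gloss position by position, and `gist` is an illustrative helper name.

```python
# Hypothetical word-for-word dictionary, read off the slide's gloss by position.
GLOSS = {
    "sobre": "envelope", "la": "her", "base": "basis", "de": "out",
    "dichas": "speak", "experiencias": "experiences", "se": "them",
    "estableció": "settle", "en": "at", "una": "one",
    "metodología": "methodology",
}

def gist(sentence):
    """Translate each token independently; unknown tokens pass through unchanged."""
    tokens = sentence.rstrip(".").split()
    words = [GLOSS.get(tok.lower(), tok) for tok in tokens]
    text = " ".join(words)
    return text[0].upper() + text[1:] + "."

SOURCE = "Sobre la base de dichas experiencias se estableció en 1988 una metodología."
print(gist(SOURCE))
```

Because each word is looked up in isolation, the lookup cannot choose between senses ("sobre" as "on" vs. "envelope") or reorder anything, which is why the gist is so far from the reference translation.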
![Page 19: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/19.jpg)
Phrase-Based Translation
[Figure: MT pyramid (S, T, I; levels: phrases, syntax, semantics).]
Slide from Radev
![Page 20: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/20.jpg)
Tree-to-Tree Translation
[Figure: MT pyramid (S, T, I; levels: phrases, syntax, semantics).]
Slide from Radev
![Page 21: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/21.jpg)
MT Approaches: Transfer Example
• Transfer Lexicon
  • Map SL structure to TL structure
[Figure: transfer rule between dependency trees. Spanish: poner with :subj X, :obj mantequilla, :mod en (:obj Y) → English: butter with :subj X, :obj Y.]
X puso mantequilla en Y → X buttered Y
Slide from Nizar Habash
![Page 22: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/22.jpg)
Tree-to-String Translation
[Figure: MT pyramid (S, T, I; levels: phrases, syntax, semantics).]
Slide from Radev
![Page 23: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/23.jpg)
MT Approaches: MT Pyramid
[Figure: MT pyramid. S (Source) and T (Target) at the base, I (Interlingua) at the apex; intermediate levels on each side: phrases, syntax, semantics.]
![Page 24: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/24.jpg)
MT Approaches: Interlingua Example: Lexical Conceptual Structure
(Dorr, 1993)
![Page 25: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/25.jpg)
MT Approaches: MT Pyramid
[Figure: MT pyramid (S, T, I; levels: phrases, syntax, semantics) annotated with the resources it relies on: Dictionaries/Parallel Corpora, Transfer Lexicons (shown twice), and Interlingual Lexicons.]
![Page 26: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/26.jpg)
Today
• Multilingual Challenges for MT
• MT Approaches
  • Statistical
  • Neural net
• MT Evaluation
![Page 27: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/27.jpg)
Translation as Decoding
• “One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.'”
  • Warren Weaver, “Translation” (1955)
Slide from Radev
![Page 28: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/28.jpg)
The First Parallel Corpus: The Rosetta Stone
http://www.ancientegypt.co.uk/writing/rosetta.html
Carved in 196 BC in Egypt
Deciphered by Champollion in 1822
Mixture of Egyptian (hieroglyphs and Demotic) and Greek
Slide from Radev
![Page 29: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/29.jpg)
Europarl: A Parallel Corpus for Statistical Machine Translation
• Proceedings of the European Parliament
• 21 European languages
  • Romanic (French, Italian, Spanish, Portuguese, Romanian), Germanic (English, Dutch, German, Danish, Swedish), Slavic (Bulgarian, Czech, Polish, Slovak, Slovene), Finno-Ugric (Finnish, Hungarian, Estonian), Baltic (Latvian, Lithuanian), and Greek
• 60 million words/language
• Must be aligned first
Koehn, MT Summit, 2005 http://homepages.inf.ed.ac.uk/pkoehn/publications/europarl-mtsummit05.pdf
![Page 30: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/30.jpg)
Koehn, MT Summit, 2005 http://homepages.inf.ed.ac.uk/pkoehn/publications/europarl-mtsummit05.pdf
![Page 31: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/31.jpg)
![Page 32: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/32.jpg)
Statistical MT: Noisy Channel Model
Portions from http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf
![Page 33: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/33.jpg)
Statistical MT
Translate from French: “une fleur rouge”?
Candidates, to be scored by p(e), p(f|e), and p(e)*p(f|e):
1. a flower red
2. red flower a
3. flower red a
4. a red dog
5. dog cat mouse
6. a red flower
Slide from Radev
![Page 34: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/34.jpg)
![Page 35: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/35.jpg)
![Page 36: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/36.jpg)
![Page 37: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/37.jpg)
![Page 38: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/38.jpg)
Statistical MT
Translate from French: “une fleur rouge”?
Slide from Radev
Candidate          p(e)   p(f|e)   p(e)*p(f|e)
1. a flower red    Low    High     Low
2. red flower a    Low    High     Low
3. flower red a    Low    High     Low
4. a red dog       High   Low      Low
5. dog cat mouse   Low    Low      Low
6. a red flower    High   High     High
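The table above amounts to picking the candidate e that maximizes p(e)*p(f|e). A minimal sketch follows, assuming made-up numeric stand-ins for the slide's qualitative Low/High labels; the values `LOW` and `HIGH` and the helper names are illustrative only and do not come from any trained model.

```python
# Illustrative numbers only: the slide gives Low/High labels, not probabilities.
LOW, HIGH = 0.01, 0.5

# candidate English string -> (language model p(e), translation model p(f|e))
CANDIDATES = {
    "a flower red":  (LOW, HIGH),
    "red flower a":  (LOW, HIGH),
    "flower red a":  (LOW, HIGH),
    "a red dog":     (HIGH, LOW),
    "dog cat mouse": (LOW, LOW),
    "a red flower":  (HIGH, HIGH),
}

def noisy_channel_score(p_e, p_f_given_e):
    """Score a candidate by the product p(e) * p(f|e)."""
    return p_e * p_f_given_e

def decode(candidates):
    """Return the candidate with the highest noisy-channel score."""
    return max(candidates, key=lambda e: noisy_channel_score(*candidates[e]))

print(decode(CANDIDATES))
```

Only the candidate that is both fluent English (high p(e)) and faithful to the French (high p(f|e)) gets a high product, which is the point of the slide.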
![Page 39: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/39.jpg)
Statistical MT: Automatic Word Alignment
• GIZA++
  • A statistical machine translation toolkit used to train word alignments.
  • Uses Expectation-Maximization with various constraints to bootstrap alignments
Slide based on Kevin Knight’s http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
[Figure: word-alignment grid between "Mary did not slap the green witch" and "Maria no dio una bofetada a la bruja verde".]
Slide from Nizar Habash
![Page 40: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/40.jpg)
![Page 41: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/41.jpg)
Statistical MT: IBM Model (Word-based Model)
http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf
![Page 42: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/42.jpg)
IBM’s EM-trained models (1-5)
• Word translation
• Local alignment
• Fertilities
• Class-based alignment
• Re-ordering
All are separate models to train!
Model 1:
$$p(f, a \mid e) = p(a \mid e) \cdot p(f \mid a, e) = \frac{c}{(n+1)^m} \prod_{j=1}^{m} p(f_j \mid e_{a_j})$$
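In Model 1 the only trained parameters are the word-translation probabilities p(f_j | e_{a_j}), and they can be estimated with EM from sentence pairs alone. The sketch below is a simplified illustration under stated assumptions: it omits the NULL word (so the n+1 in the formula becomes n), uses a three-sentence toy corpus invented for this example, and the name `train_ibm_model1` is hypothetical rather than part of GIZA++ or any toolkit.

```python
from collections import defaultdict

def train_ibm_model1(corpus, iterations=10):
    """EM for word-translation probabilities t[(f, e)] = p(f | e).

    corpus: list of (foreign_tokens, english_tokens) pairs.
    Simplification: no NULL word is added to the English side.
    """
    f_vocab = {f for fs, _ in corpus for f in fs}
    # Uniform initialisation over the foreign vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (f, e) links
        total = defaultdict(float)   # expected count of links from e
        # E-step: distribute each foreign word over the English words.
        for fs, es in corpus:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise expected counts per English word.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# Toy corpus (an assumption made for illustration).
CORPUS = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
    ("une fleur".split(), "a flower".split()),
]
t = train_ibm_model1(CORPUS)
print(round(t[("maison", "house")], 3), round(t[("la", "house")], 3))
```

Even though "la" and "maison" co-occur with "house" equally often, the evidence from the other sentences pushes "la" toward "the", so EM gradually shifts probability mass for "house" onto "maison".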
![Page 43: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/43.jpg)
Phrase-Based Statistical MT
• Foreign input segmented into phrases
  – “phrase” is any sequence of words
• Each phrase is probabilistically translated into English
  – P(to the conference | zur Konferenz)
  – P(into the meeting | zur Konferenz)
• Phrases are probabilistically re-ordered
See [Koehn et al, 2003] for an intro. This was state-of-the-art before neural MT
Morgen fliege ich nach Kanada zur Konferenz
Tomorrow I will fly to the conference in Canada
Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
![Page 44: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/44.jpg)
Word Alignment Induced Phrases
Mary did not slap the green witch
Maria no dió una bofetada a la bruja verde
(Maria, Mary) (no, did not) (dió una bofetada, slap) (la, the) (bruja, witch) (verde, green)
Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
![Page 45: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/45.jpg)
![Page 46: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/46.jpg)
![Page 47: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/47.jpg)
![Page 48: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/48.jpg)
Word Alignment Induced Phrases
Mary did not slap the green witch
Maria no dió una bofetada a la bruja verde
(Maria, Mary) (no, did not) (dió una bofetada, slap) (la, the) (bruja, witch) (verde, green)
(a la, the) (dió una bofetada a, slap the)
(Maria no, Mary did not) (no dió una bofetada, did not slap) (dió una bofetada a la, slap the) (bruja verde, green witch)
(Maria no dió una bofetada, Mary did not slap) (a la bruja verde, the green witch) …
(Maria no dió una bofetada a la bruja verde, Mary did not slap the green witch)
Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
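Phrase pairs like these are usually read off a word alignment with a consistency check: a pair is kept when at least one alignment link falls inside it and no link connects a word inside the pair to a word outside it. The sketch below is a brute-force illustration of that check. The `ALIGNMENT` links are an assumption made for this example (with "a" left unaligned), not the alignment drawn on the slide, so the output need not match the slide's list exactly.

```python
def extract_phrases(src, tgt, alignment):
    """Return all (source phrase, target phrase) pairs consistent with alignment.

    alignment: set of (source_index, target_index) links.
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, len(src)):
            for j1 in range(len(tgt)):
                for j2 in range(j1, len(tgt)):
                    has_link = False
                    consistent = True
                    for i, j in alignment:
                        in_src = i1 <= i <= i2
                        in_tgt = j1 <= j <= j2
                        if in_src and in_tgt:
                            has_link = True
                        elif in_src != in_tgt:
                            # Link crosses the phrase boundary.
                            consistent = False
                            break
                    if has_link and consistent:
                        pairs.add((" ".join(src[i1:i2 + 1]),
                                   " ".join(tgt[j1:j2 + 1])))
    return pairs

SPANISH = "Maria no dió una bofetada a la bruja verde".split()
ENGLISH = "Mary did not slap the green witch".split()
# Assumed links (Spanish index, English index); "a" (index 5) is unaligned.
ALIGNMENT = {(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 3),
             (6, 4), (7, 6), (8, 5)}

PHRASES = extract_phrases(SPANISH, ENGLISH, ALIGNMENT)
for pair in sorted(PHRASES):
    print(pair)
```

Unaligned words such as "a" can attach to a neighbouring phrase on either side, which is how both (la, the) and (a la, the) end up in the extracted set.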
![Page 49: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/49.jpg)
Advantages of Phrase-Based SMT
• Many-to-many mappings can handle non-compositional phrases
• Local context is very useful for disambiguating
  • “Interest rate” → …
  • “Interest in” → …
• The more data, the longer the learned phrases
  • Sometimes whole sentences
Slide courtesy of Kevin Knight http://www.sims.berkeley.edu/courses/is290-2/f04/lectures/mt-lecture.ppt
![Page 50: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/50.jpg)
String to Tree Translation
(Yamada and Knight 2001)
He adores listening to music
He music to listening adores
He/ha music to listening/no ga adores/desu
![Page 51: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/51.jpg)
Clause restructuring (Collins et al.)
• Ich werde Ihnen den Report aushaendigen … damit Sie den eventuell uebernehmen koennen.
• I will pass_on to_you the report, so_that you can adopt that perhaps
• verb initial: that perhaps adopt can -> adopt that perhaps can
• verb second: so that you adopt … can -> so that you can adopt
• move subject: so that can you adopt -> so that you can adopt
• particles: we accept the presidency *Particle* -> we accept the presidency (in German, split-prefix phrasal verbs are very common, e.g., “anrufen” -> “rufen Sie bitte noch einmal an” – call right back please)
Slide from Radev
![Page 52: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/52.jpg)
Synchronous Grammars
• Generate parse trees in parallel in two languages using different rules
• E.g.,
  • NP -> ADJ N (in English)
  • NP -> N ADJ (in Spanish)
• ITG (Inversion Transduction Grammar) [Wu 1995]
  • Don’t allow all permutations in derivations
  • Only <> and [] are allowed
Slide from Radev
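The two ITG operators can be illustrated with a small binary tree in which every internal node is either straight ([]: children appear in the same order in both languages) or inverted (<>: children are swapped on the target side). This is a minimal sketch; the class names `Leaf` and `Node`, the function `yield_pair`, and the example phrase are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class Leaf:
    src: str   # word in the first language
    tgt: str   # its translation in the second language

@dataclass
class Node:
    left: "Tree"
    right: "Tree"
    inverted: bool = False   # False: [] straight, True: <> inverted

Tree = Union[Leaf, Node]

def yield_pair(tree: Tree) -> Tuple[List[str], List[str]]:
    """Read off the two word sequences generated by one synchronous derivation."""
    if isinstance(tree, Leaf):
        return [tree.src], [tree.tgt]
    left_src, left_tgt = yield_pair(tree.left)
    right_src, right_tgt = yield_pair(tree.right)
    src = left_src + right_src
    tgt = right_tgt + left_tgt if tree.inverted else left_tgt + right_tgt
    return src, tgt

# "the [green witch]" with ADJ N inverted to N ADJ on the Spanish side.
EXAMPLE = Node(
    Leaf("the", "la"),
    Node(Leaf("green", "verde"), Leaf("witch", "bruja"), inverted=True),
)
print(yield_pair(EXAMPLE))
```

Because reordering happens only by swapping the two children of a node, some permutations of longer sequences can never be produced, which is the restriction the slide refers to.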
![Page 53: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/53.jpg)
MT Approaches: Practical Considerations
• Resources Availability
  • Parsers and Generators
    • Input/Output compatibility
  • Translation Lexicons
    • Word-based vs. Transfer/Interlingua
  • Parallel Corpora
    • Domain of interest
    • Bigger is better
• Time Availability
  • Statistical training, resource building
Slide from Nizar Habash
![Page 54: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/54.jpg)
Today
• Multilingual Challenges for MT
• MT Approaches
  • Statistical
  • Neural net (Thursday)
• MT Evaluation
![Page 55: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/55.jpg)
MT Evaluation
• More art than science
• Wide range of Metrics/Techniques
  • interface, …, scalability, …, faithfulness, ... space/time complexity, … etc.
• Automatic vs. Human-based
  • Dumb Machines vs. Slow Humans
Slide from Nizar Habash
![Page 56: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/56.jpg)
Human-based Evaluation Example: Accuracy Criteria
5 contents of original sentence conveyed (might need minor corrections)
4 contents of original sentence conveyed BUT errors in word order
3 contents of original sentence generally conveyed BUT errors in relationship between phrases, tense, singular/plural, etc.
2 contents of original sentence not adequately conveyed, portions of original sentence incorrectly translated, missing modifiers
1 contents of original sentence not conveyed, missing verbs, subjects, objects, phrases or clauses
Slide from Nizar Habash
![Page 57: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/57.jpg)
Human-based Evaluation Example: Fluency Criteria
5 clear meaning, good grammar, terminology and sentence structure
4 clear meaning BUT bad grammar, bad terminology or bad sentence structure
3 meaning graspable BUT ambiguities due to bad grammar, bad terminology or bad sentence structure
2 meaning unclear BUT inferable
1 meaning absolutely unclear
Slide from Nizar Habash
![Page 58: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/58.jpg)
Today: Crowdsourcing
• Amazon Mechanical Turk or CrowdFlower
• Create a HIT for each sentence
• Get multiple workers to rate
• Pay .01 to .10 per HIT
• Complete an evaluation in hours (vs days/weeks)
• Ethics?
![Page 59: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/59.jpg)
Automatic Evaluation Example: Bleu Metric (Papineni et al 2001)
• Bleu
  • BiLingual Evaluation Understudy
  • Modified n-gram precision with length penalty
  • Quick, inexpensive and language independent
  • Correlates highly with human evaluation
  • Bias against synonyms and inflectional variations
Slide from Nizar Habash
![Page 60: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/60.jpg)
![Page 61: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/61.jpg)
Automatic Evaluation Example: Bleu Metric
Test Sentence
colorless green ideas sleep furiously
Gold Standard References
all dull jade ideas sleep irately
drab emerald concepts sleep furiously
colorless immature thoughts nap angrily
Unigram precision = 4/5
Slide from Nizar Habash
![Page 62: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/62.jpg)
Automatic Evaluation Example: Bleu Metric
Test Sentence
colorless green ideas sleep furiously
Gold Standard References
all dull jade ideas sleep irately
drab emerald concepts sleep furiously
colorless immature thoughts nap angrily
Unigram precision = 4 / 5 = 0.8
Bigram precision = 2 / 4 = 0.5
Bleu Score = $(a_1 a_2 \dots a_n)^{1/n} = (0.8 \times 0.5)^{1/2} = 0.6325 \rightarrow 63.25$
Slide from Nizar Habash
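The arithmetic on this slide can be reproduced with a short script. This is a minimal sketch under stated assumptions, not a reference implementation: the function names are made up for illustration, tokenization is plain whitespace splitting, and n-grams stop at bigrams (`max_n=2`) as on the slide. The brevity penalty is included for completeness; it equals 1 here because the candidate is as long as the closest reference.

```python
from collections import Counter
from math import exp, log


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as
    often as it appears in the reference that contains it most often."""
    cand_counts = Counter(ngrams(candidate, n))
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0


def brevity_penalty(candidate, references):
    """1 if the candidate is longer than the closest reference length,
    otherwise exp(1 - r/c)."""
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    return 1.0 if c > r else exp(1 - r / c)


def bleu(candidate, references, max_n):
    """Brevity penalty times the geometric mean of the n-gram precisions."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    geo_mean = exp(sum(log(p) for p in precisions) / max_n)
    return brevity_penalty(candidate, references) * geo_mean


candidate = "colorless green ideas sleep furiously".split()
references = [
    "all dull jade ideas sleep irately".split(),
    "drab emerald concepts sleep furiously".split(),
    "colorless immature thoughts nap angrily".split(),
]

print(modified_precision(candidate, references, 1))   # 0.8
print(modified_precision(candidate, references, 2))   # 0.5
print(round(bleu(candidate, references, max_n=2), 4))  # 0.6325
```

The only unmatched unigram is "green", and the two matched bigrams are "ideas sleep" and "sleep furiously", which gives the 4/5 and 2/4 shown on the slide.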
![Page 63: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/63.jpg)
BLEU scores for 110 translation systems trained on Europarl
Koehn, MT Summit, 2005 http://homepages.inf.ed.ac.uk/pkoehn/publications/europarl-mtsummit05.pdf
![Page 64: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/64.jpg)
![Page 65: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/65.jpg)
Automatic Evaluation Example: METEOR (Lavie and Agarwal 2007)
• Metric for Evaluation of Translation with Explicit word Ordering
• Extended matching between translation and reference
  • Porter stems, WordNet synsets
• Unigram precision, recall, parameterized F-measure
• Reordering penalty
• Parameters can be tuned to optimize correlation with human judgments
• Not biased against "non-statistical" MT systems
Slide from Nizar Habash
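The combination of a parameterized F-measure with a reordering penalty can be sketched as below. This is an illustrative sketch, not the official METEOR implementation: it assumes the unigram alignment has already been computed (including any stem or synset matches) and takes only its summary counts as input. The function name and the default values of `alpha`, `beta`, and `gamma` are placeholders chosen for illustration, not values from the slides; in practice they are the parameters that get tuned against human judgments.

```python
def meteor_style_score(matches, hyp_len, ref_len, chunks,
                       alpha=0.9, beta=3.0, gamma=0.5):
    """Parameterized F-measure discounted by a fragmentation penalty.

    matches: number of aligned unigrams between hypothesis and reference
    hyp_len, ref_len: lengths of hypothesis and reference in words
    chunks: number of contiguous, same-order runs the matches fall into
    alpha, beta, gamma: tunable parameters (placeholder defaults)
    """
    if matches == 0:
        return 0.0
    precision = matches / hyp_len
    recall = matches / ref_len
    # alpha close to 1 weights recall more heavily than precision.
    f_mean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    # More chunks per match means more reordering, hence a larger penalty.
    penalty = gamma * (chunks / matches) ** beta
    return (1 - penalty) * f_mean


# Hypothetical counts: 4 matched words in a single chunk vs. four chunks.
in_order = meteor_style_score(matches=4, hyp_len=5, ref_len=5, chunks=1)
scrambled = meteor_style_score(matches=4, hyp_len=5, ref_len=5, chunks=4)
print(in_order, scrambled)
```

With the same number of matches, the hypothesis whose matches form fewer chunks scores higher, which is how word order enters the metric.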
![Page 66: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/66.jpg)
MetricsMATR Workshop
• Workshop in AMTA conference 2008
  • Association for Machine Translation in the Americas
• Evaluating evaluation metrics
• Compared 39 metrics
  • 7 baselines and 32 new metrics
• Various measures of correlation with human judgment
• Different conditions: text genre, source language, number of references, etc.
Slide from Nizar Habash
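"Correlation with human judgment" is typically measured by comparing a metric's scores with human ratings for the same outputs. The sketch below computes Pearson and Spearman correlation from scratch. It is an illustration only: the slide does not say which correlation measures the workshop used, and the score lists are made-up toy numbers, not workshop data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy numbers invented for illustration: one metric score and one
# human rating (1-5) per system output.
metric_scores = [0.21, 0.35, 0.30, 0.52, 0.47]
human_scores = [2, 3, 3, 5, 4]

print(pearson(metric_scores, human_scores))
print(spearman(metric_scores, human_scores))
```

Rank correlation only asks whether the metric orders outputs the way humans do, while Pearson correlation also rewards a linear relationship between the two scales.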
![Page 67: Machine Translation: Challenges and Approacheskathy/NLP/2017/ClassSlides/Class22MT/MT2017.pdfMT Pyramid S (Source) T (Target) I (Interlingua) ... • verb iniPal: ... , split-prefix](https://reader031.fdocuments.in/reader031/viewer/2022022522/5b3176937f8b9a2c0b8bb671/html5/thumbnails/67.jpg)
Automatic Evaluation Example: SEPIA (Habash and El Kholy 2008)
• A syntactically-aware evaluation metric
  • (Liu and Gildea, 2005; Owczarzak et al., 2007; Giménez and Màrquez, 2007)
• Uses dependency representation
  • MICA parser (Nasr & Rambow 2006)
  • 77% of all structural bigrams are surface n-grams of size 2, 3, 4
• Includes dependency surface span as a factor in score
  • long-distance dependencies should receive a greater weight than short-distance dependencies
• Higher degree of grammaticality?
[Chart: percentage axis from 0% to 50%; categories 1, 2, 3, 4plus]