Page 1: New CS11-737: Multilingual Natural Language Processing Translationdemo.clab.cs.cmu.edu/11737fa20/slides/multiling-06... · 2020. 9. 17. · Use Google translate to back-translate

CS11-737: Multilingual Natural Language Processing

Yulia Tsvetkov

Translation

Translation

Mr. and Mrs. Dursley, who lived at number 4 on Privet Drive, were proud to say they were very normal, fortunately.

El señor y la señora Dursley, que vivían en el número 4 de Privet Drive, estaban orgullosos de decir que eran muy normales, afortunadamente.

Plan

● The practice of translation
● Machine translation (MT)
● MT data sources
● MT evaluation

Translation is important and ubiquitous

Why is it difficult to translate?

● Lexical ambiguities and divergences across languages

[Examples from Jurafsky & Martin Speech and Language Processing 2nd ed.]

Why is it difficult to translate?

● Cross-lingual lexical and structural divergences

黛玉自在枕上感念寶釵。。。又聽見窗外竹梢蕉葉之上, 雨聲淅瀝, 清寒透幕, 不覺又滴下淚來。

dai yu zi zai zhen shang gan nian bao chai... you ting jian chuang wai zhu shao xiang ye zhe shang, yu sheng xi li, qing han tou mu, bu jue you di xia lei lai

From “Dream of the Red Chamber”, Cao Xue Qin (1792)

Why is it difficult to translate?

[Example from Jurafsky & Martin Speech and Language Processing 2nd ed.]

Why is it difficult to translate?

● Ambiguities
  ○ words
  ○ morphology
  ○ semantics
  ○ pragmatics
● Gaps in data
  ○ availability of corpora
  ○ commonsense knowledge
● + Understanding of context, connotation, social norms, etc.

3 Classical methods for MT

● Direct
● Transfer
● Interlingua

The Vauquois triangle (1968)

Direct translation

● Word-by-word dictionary translation
● Rely on linguistic knowledge for simple reordering or morphological processing

source language text → morphological analysis → lexical transfer using bilingual dictionary → local reordering → morphological generation → target language text
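
The four stages of direct translation can be sketched as a toy program. This is only an illustration, not a system from the lecture: the Spanish-to-English lexicon, the suffix rules, and the single noun-adjective reordering rule are all invented for the example, and a real direct MT system would need far richer dictionary entries.

```python
# Toy direct MT: morphological analysis -> lexical transfer -> local reordering -> generation.
# LEXICON and every rule below are hypothetical and cover only a handful of words.

LEXICON = {  # source lemma -> (target word, part of speech)
    "el": ("the", "DET"),
    "la": ("the", "DET"),
    "casa": ("house", "NOUN"),
    "gato": ("cat", "NOUN"),
    "blanco": ("white", "ADJ"),
    "negro": ("black", "ADJ"),
}


def analyze(token):
    """Morphological analysis: return (lemma, is_plural) using two toy suffix rules."""
    word = token.lower()
    if word in LEXICON:
        return word, False
    # Rule 1: strip a plural -s.
    stem, plural = (word[:-1], True) if word.endswith("s") else (word, False)
    if stem in LEXICON:
        return stem, plural
    # Rule 2: map a feminine -a form to the -o lemma ("blanca" -> "blanco").
    if stem.endswith("a") and stem[:-1] + "o" in LEXICON:
        return stem[:-1] + "o", plural
    return word, False  # unknown word: leave it untouched


def transfer(lemma):
    """Lexical transfer with the bilingual dictionary; unknown words pass through."""
    return LEXICON.get(lemma, (lemma, "UNK"))


def reorder(items):
    """Local reordering: Spanish NOUN ADJ becomes English ADJ NOUN."""
    out = list(items)
    i = 0
    while i < len(out) - 1:
        if out[i]["pos"] == "NOUN" and out[i + 1]["pos"] == "ADJ":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


def generate(item):
    """Morphological generation: English marks plural on nouns only."""
    if item["pos"] == "NOUN" and item["plural"]:
        return item["word"] + "s"
    return item["word"]


def translate(sentence):
    items = []
    for token in sentence.split():
        lemma, plural = analyze(token)
        word, pos = transfer(lemma)
        items.append({"word": word, "pos": pos, "plural": plural})
    return " ".join(generate(item) for item in reorder(items))


print(translate("las casas blancas"))
```

Even this tiny example shows why the approach is brittle: every divergence between the two languages needs its own hand-written rule.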

Direct MT dictionary entry

Transfer approaches

● Levels of transfer

Transfer approaches

● Syntactic transfer

Transfer approaches

● Semantic transfer

Interlingua

Learning from data

Parallel corpora

Mining parallel data from microblogs (Ling et al., 2013)

opus.nlpl.eu

Is it a good translation?

MT evaluation is hard

● MT evaluation is a research topic on its own
● Language variability: there is no single correct translation
  ○ Is system A better than system B?
● Human evaluation is subjective

Human evaluation

● Adequacy and fluency
  ○ Usually on a Likert scale (1 “not adequate at all” to 5 “completely adequate”)
● Ranking of the outputs of different systems at the system level
● Post-editing effort: how much effort does it take for a translator (or even a monolingual speaker) to “fix” the MT output so it is “good”?
● Task-based evaluation: was the performance of the MT system sufficient to perform a task?

Automatic evaluation

● Precision-based
  ○ BLEU, NIST, ...
● F-score-based
  ○ METEOR, chrF, ...
● Error rates
  ○ WER, TER, PER, ...
● Using syntax/semantics
  ○ PosBleu, MEANT, DepRef, ...
● Embedding-based
  ○ BERTScore, YiSi-1, ESIM, ...
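
As one concrete instance of the error-rate family, word error rate (WER) is the word-level edit distance between hypothesis and reference, divided by the reference length. The sketch below assumes whitespace tokenization and a single reference; the function name `word_error_rate` is chosen here for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # reference word missing from the hypothesis
                curr[j - 1] + 1,         # extra word in the hypothesis
                prev[j - 1] + (r != h),  # substitution, or free match if the words agree
            ))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Unlike the precision-based metrics, lower is better here, and the value can exceed 1 when the hypothesis is much longer than the reference.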

Automatic evaluation

● The BLEU score, proposed by IBM (Papineni et al., 2002)
  ○ Count n-gram overlap between the machine translation output and the reference translations
  ○ Compute precision for n-grams of size 1 to 4
  ○ No recall (because it is difficult to define with multiple references)
  ○ To compensate for the lack of recall: “brevity penalty”. Translations that are too short are penalized
  ○ Final score is the geometric average of the n-gram precisions, times the brevity penalty
  ○ Calculate the aggregate score over a large test set
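
The recipe above can be sketched in a few lines of Python. This is a minimal, unsmoothed version under some assumptions: inputs are already tokenized, n-gram counts are clipped against the references (the "modified precision" of Papineni et al.), and the brevity penalty uses the closest reference length. The name `corpus_bleu` is chosen here for illustration; for reported results, a standard tool should be preferred over a hand-rolled script.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """hypotheses: list of token lists; references: one list of token lists per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Closest reference length; ties go to the shorter reference.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)  # per-n-gram maximum over the references
            # Clipped matches: an n-gram is credited at most as often as it occurs in a reference.
            matches[n - 1] += sum((hyp_counts & max_ref).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    # Geometric average of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


hyp = "the cat sat on".split()
ref = "the cat sat on the mat".split()
print(corpus_bleu([hyp], [[ref]]))
```

In the usage example every n-gram of the hypothesis appears in the reference, so all four precisions are 1 and the score is determined by the brevity penalty alone. Counts are pooled over the whole test set before the precisions are computed, which is why the score is reported at the corpus level rather than averaged over sentences.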

BLEU vs. human judgments

MT venues and competitions

● MT tracks in *CL conferences
● WMT, IWSLT, AMTA, ...

● www.statmt.org

Class discussion

● Pick a 4-line excerpt from a poem in English
● Use Google Translate to back-translate the poem via a pivot language, e.g.,
  ○ English → Spanish → English
  ○ English → L1 → L2 → English, where L1 and L2 are typologically different from English and from each other
● Compare the original poem and its English back-translation, and share your observations. For example,
  ○ What information got lost in the process of translation?
  ○ Are there translation errors associated with linguistic properties of pivot languages and with linguistic divergences across languages?
  ○ Try different pivot languages: can you provide insights about the quality of MT for those language pairs?
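
One possible way to make the comparison systematic is a word-level diff between the original and the back-translation, pasted in by hand. The sketch below uses Python's standard `difflib`; the helper name `compare` and the two example strings are made up for illustration and are not real translation output.

```python
import difflib


def compare(original, back_translation):
    """Return a word-level similarity ratio and the list of non-matching spans."""
    a = original.lower().split()
    b = back_translation.lower().split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is "replace", "delete" (lost in translation) or "insert" (added).
            changes.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return matcher.ratio(), changes


# Made-up strings; substitute your own excerpt and its back-translation.
ratio, changes = compare("the cat sat on the mat", "the cat sat on the rug")
print(round(ratio, 2), changes)
```

A surface diff like this only flags where the wording changed; deciding whether a change reflects a real loss of meaning or a divergence in the pivot language still requires reading the two versions side by side.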