Active Learning for Statistical Phrase-based Machine Translation
Gholamreza Haffari
Joint work with: Maxim Roy, Anoop Sarkar
Simon Fraser University
NAACL talk, Boulder, June 2009
The Problem
• Statistical Machine Translation (SMT)
• MFE is a standard log-linear model composed of two main components:
– Phrase tables
– Language model
• Good phrase tables are typically learned from large bilingual (F,E)-text
– What if we don’t have large bilingual text?
[Diagram: Language F → MFE → Language E]
A Solution
• Suppose we are given a large monolingual text in the source language F
• Pay a human expert to translate these sentences into the target language E
– This way, we will have a bigger bilingual text
• But our budget is limited!
– We cannot afford to translate all monolingual sentences
A Better Solution
• Choose a subset of monolingual sentences for which, if we had the translation, the SMT performance would increase the most
• Only ask the human expert for the translation of these highly informative sentences
• This is the goal of Active Learning
– Workshop on Active Learning for NLP
Active Learning for SMT
[Diagram: the active learning loop. MFE is trained on the bilingual text (F, E) and decodes the monolingual text (F) into translated text (F, E); informative sentences are selected and translated by a human; the new (F, E) pairs join the bilingual text and the model is re-trained.]
For more details, see the paper
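The loop on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: `train`, `score`, and `ask_human` are hypothetical callables standing in for the SMT trainer, a sentence selection strategy, and the human translator; none of these names come from the talk.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence in F, translation in E)


def active_learning_loop(
    bilingual: List[Pair],
    monolingual: List[str],
    train: Callable[[List[Pair]], Callable[[str], str]],  # hypothetical trainer
    score: Callable[[str, str], float],   # hypothetical informativeness score
    ask_human: Callable[[str], str],      # hypothetical human translator
    batch_size: int,
    iterations: int,
) -> List[Pair]:
    """Grow the bilingual text by repeatedly translating the top-scored sentences."""
    bilingual = list(bilingual)
    pool = list(dict.fromkeys(monolingual))  # drop duplicates, keep order
    for _ in range(iterations):
        if not pool:
            break
        model = train(bilingual)  # (re-)train the model on the current bilingual text
        # Decode every pooled sentence and score it together with its translation.
        scored = [(score(f, model(f)), f) for f in pool]
        scored.sort(key=lambda t: t[0], reverse=True)  # most informative first
        chosen = [f for _, f in scored[:batch_size]]
        # Pay for human translations of the chosen sentences only.
        bilingual.extend((f, ask_human(f)) for f in chosen)
        chosen_set = set(chosen)
        pool = [f for f in pool if f not in chosen_set]
    return bilingual
```

Each selection strategy in the rest of the talk would plug in as a different `score`; a random baseline would simply return a random number.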
Outline
• General idea of active learning (AL) for statistical machine translation (SMT)
• Sentence Selection Strategies
– Similarity, Decoder’s Confidence
– Hierarchical Adaptive Sampling
– Sentence merit based on the translation units
• Experiments
– The simulated AL setting
– The real AL setting
Intuitive Underpinnings for Sentence Selection
• Sentences for which the model is not confident about their translations
– Hopefully, high-confidence translations are good ones
• Sentences similar to the bilingual text are easy for the model to translate
– Select the ones dissimilar to the bilingual text
• Cluster monolingual sentences
– Choose some representative sentences for each cluster
Sentence Selection strategies
• Baseline: Randomly choose sentences from the pool of monolingual sentences
• Previous Work: Decoder’s confidence for the translations (Kato & Barnard, 2007)
• Our proposed methods:
– Similarity to the bilingual training data
– Reverse model
– Hierarchical Adaptive Sampling (HAS)
– Utility of the translation units
Reverse Model
Comparing
– the original sentence, and
– the final sentence
tells us something about the value of the sentence
[Example: “I will let you know about the issue later” → (MEF) → “Je vais vous faire plus tard sur la question” → (Rev: MFE) → “I will later on the question”]
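A rough sketch of the round-trip comparison follows. The slide does not say how the original and final sentences are compared, so the token-level `difflib.SequenceMatcher` ratio used here is an assumption, as is the choice to treat low similarity as a sign of an informative sentence. `forward` and `reverse` are hypothetical stand-ins for the two translation models.

```python
from difflib import SequenceMatcher
from typing import Callable, List


def round_trip_similarity(original: str, round_trip: str) -> float:
    """Token-level similarity in [0, 1]; 1.0 means the round trip is lossless."""
    a = original.lower().split()
    b = round_trip.lower().split()
    return SequenceMatcher(None, a, b).ratio()


def select_by_reverse_model(
    sentences: List[str],
    forward: Callable[[str], str],  # hypothetical: source -> target
    reverse: Callable[[str], str],  # hypothetical: target -> source
    k: int,
) -> List[str]:
    """Return the k sentences whose round-trip translation drifts the most."""
    scored = [(round_trip_similarity(s, reverse(forward(s))), s) for s in sentences]
    scored.sort(key=lambda t: t[0])  # lowest similarity first
    return [s for _, s in scored[:k]]


# The example from the slide: the round trip loses most of the sentence.
sim = round_trip_similarity(
    "I will let you know about the issue later",
    "I will later on the question",
)
print(f"round-trip similarity: {sim:.2f}")
```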
Hierarchical Adaptive Sampling
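[Diagram: a tree over the monolingual sentences. Root U0: monolingual sentences; children U1 and U2; U2 is split into U2,1 and U2,2. Annotations: “Average decoder’s score”; “Sort sentences with respect to similarity to the bilingual text”; “Sample sentences from these two nodes”. MFE is trained on the bilingual text (F, E).]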
(Dasgupta & Hsu, 2008)
Utility of the Translation Units
Phrases are the basic units of translations in phrase-based SMT
I will let you know about the issue later
[Figure: phrase counts. Monolingual text: 6, 6, 18, 3. Bilingual text: 5, 6, 12, 3, 7.]
The more frequent a phrase is in the monolingual text, the more important it is
The more frequent a phrase is in the bilingual text, the less important it is
Generative Models for Phrases
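[Figure: phrase counts and the probabilities estimated from them. Monolingual text (model m): counts 6, 6, 18, 3; probabilities .25, .25, .05, .33, .12. Bilingual text (model b): counts 5, 6, 12, 3, 7; probabilities .21, .22, .05, .09, .14, .29.]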
Averaged Probability Ratio Score
• For a monolingual sentence S
– Consider the bag of its phrases
– Score: normalized probability ratio P(S | m) / P(S | b), where m and b are the phrase models of the monolingual and bilingual text
– We will refer to it as Geom-Phrase
• Dividing the phrase probabilities captures our intuition about the utility of the translation units
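A minimal sketch of such a score, under assumptions: the phrase models are taken to be smoothed relative frequencies (the add-alpha smoothing is a choice made here, not something the slides specify), and "normalized" is read as a geometric average, computed as the mean log ratio over the bag of phrases. The function names and the toy counts are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def phrase_model(
    phrases: Iterable[str], vocab: Iterable[str], alpha: float = 1.0
) -> Dict[str, float]:
    """Add-alpha smoothed relative frequencies over a shared phrase vocabulary."""
    counts = Counter(phrases)
    vocab = set(vocab) | set(counts)
    total = sum(counts.values()) + alpha * len(vocab)
    return {x: (counts[x] + alpha) / total for x in vocab}


def geom_phrase_score(
    bag: List[str], p_mono: Dict[str, float], p_bi: Dict[str, float]
) -> float:
    """Mean log ratio log P(x|m) / P(x|b); its exp is the geometric mean of the ratios."""
    return sum(math.log(p_mono[x] / p_bi[x]) for x in bag) / len(bag)


# Toy data (hypothetical): "let you know" is frequent in the monolingual
# text but unseen in the bilingual text, so it should score highest.
mono = ["the issue"] * 6 + ["let you know"] * 18 + ["later"] * 3
bi = ["the issue"] * 12 + ["later"] * 3
vocab = set(mono) | set(bi)
p_m = phrase_model(mono, vocab)
p_b = phrase_model(bi, vocab)
for phrase in sorted(vocab):
    print(phrase, round(geom_phrase_score([phrase], p_m, p_b), 3))
```

Working in the log domain avoids underflow for long sentences, and sharing one vocabulary between the two models keeps every ratio well defined.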
Sentence Segmentation
• How to prepare the bag of phrases for a sentence S?
– For the bilingual text, we have the segmentation from the training phase of the SMT model
– For the monolingual text, we run the SMT model to produce the top-n translations and the corresponding segmentations
Extensions of the Score
• Instead of using phrases, we may use n-grams
• We may alternatively use the following score
– We will refer to it as Arithmetic Average
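The formula for this slide did not survive in the transcript. The sketch below is therefore only one plausible reading, assumed here: the units are word n-grams instead of phrases, and the "arithmetic average" is the plain mean of the per-unit probability ratios. The names `ngrams` and `arithmetic_ratio_score` are hypothetical.

```python
from typing import Dict, List


def ngrams(tokens: List[str], n: int) -> List[str]:
    """All contiguous word n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def arithmetic_ratio_score(
    units: List[str], p_mono: Dict[str, float], p_bi: Dict[str, float]
) -> float:
    """Mean of P(x|m) / P(x|b) over the units (assumed reading of the slide)."""
    return sum(p_mono[x] / p_bi[x] for x in units) / len(units)


# Toy probabilities (hypothetical) over 1-grams.
p_m = {"a": 0.5, "b": 0.5}
p_b = {"a": 0.25, "b": 0.5}
print(ngrams("I will let you know".split(), 2))
print(arithmetic_ratio_score(ngrams("a b".split(), 1), p_m, p_b))
```

Compared with a geometric average, an arithmetic mean of ratios is dominated by the few units with very large ratios, so the two scores can rank sentences differently.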
Sentence Selection strategies (Recap)
• Baseline: Randomly choose sentences from the pool of monolingual sentences
• Previous Work: Decoder’s confidence for the translations (Kato & Barnard, 2007)
• Our proposed methods:
– Similarity to the bilingual training data
– Reverse model
– Hierarchical Adaptive Sampling (HAS)
– Utility of the translation units
Outline
• General idea of active learning (AL) for statistical machine translation (SMT)
• Sentence Selection Strategies
– Similarity, Decoder’s Confidence
– Hierarchical Adaptive Sampling
– Sentence merit based on the translation units
• Experiments
– The simulated AL setting
– The real AL setting
Experimental Setup
• Dataset size (number of sentences):

| Language pair | Bilingual text | Monolingual text | Test |
|---|---|---|---|
| Bangla-English | 11K | 20K | 1K |
| French-, German-, Spanish-English | 5K | 20K | 2K |

• In each iteration, we select 200 (or 100) sentences from the monolingual sentence set, for 25 (or 5) iterations
• We use Portage from NRC as the underlying SMT system (Ueffing et al., 2007)
The Simulated AL Setting
[Plot: curves for Geometric Phrase, Random, and Decoder’s Confidence; an arrow marks the direction labelled “Better”.]
The Real AL Setting
• Our human translator is different from the text author
– The methods are good at adapting to the new writing style
[Plot: curves for Geometric Phrase and Random.]
Domain Adaptation
• Now suppose both the test and the monolingual text are out-of-domain with respect to the bilingual text
– The ‘Decoder’s Confidence’ does a good job
– The ‘Geom 1-gram’ outperforms the other methods since it quickly expands the lexicon set in an effective manner
[Plot: curves for Geom 1-gram, Random, and Decoder’s Conf.]
Analysis
• The coverage of the bilingual text is important but is not the only factor
– Notice the Geom 1-gram and Geom-phrase methods
[Plot: y-axis “Coverage”.]
Conclusions
• We presented different sentence selection methods for SMT in an AL setting
• Using knowledge about the internal architecture of the SMT system is crucial
• Yet, we are after better sentence selection strategies
– See our upcoming paper in ACL09
Merci
Thank You
Domain Adaptation
• Selecting sentences based on:
– The ‘Confidence’ does a good job
– The ‘1-gram’ outperforms the other methods since it quickly expands the lexicon set in an effective manner

| Method | BLEU% | PER% | WER% |
|---|---|---|---|
| Geom 1-gram | 14.92 | 34.83 | 46.06 |
| Confidence | 14.74 | 35.02 | 46.11 |
| Random | 14.11 | 35.28 | 46.47 |
The Simulated AL Setting
| Language pair | Geometric Average BLEU% | Geometric Average PER% | Geometric Average WER% | Random (Baseline) BLEU% | Random (Baseline) PER% | Random (Baseline) WER% |
|---|---|---|---|---|---|---|
| French-English | 22.49 | 27.99 | 38.45 | 21.97 | 28.31 | 38.80 |
| German-English | 17.54 | 31.51 | 44.28 | 17.25 | 31.63 | 44.41 |
| Spanish-English | 23.03 | 28.86 | 39.17 | 23.00 | 28.97 | 39.21 |

• Using measures other than BLEU
– WER: word error rate
– PER: position-independent word error rate
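For readers unfamiliar with the two error rates, here is a small sketch of how they are commonly computed for a single hypothesis and reference. The exact variants used in the experiments are not given on the slides; the PER formula below, (max(|hyp|, |ref|) - matches) / |ref|, is one common definition and is an assumption here. Both functions assume a non-empty reference.

```python
from collections import Counter
from typing import List


def edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,              # drop hypothesis word h
                           cur[j - 1] + 1,           # add reference word r
                           prev[j - 1] + (h != r)))  # match or substitute
        prev = cur
    return prev[-1]


def wer(hyp: str, ref: str) -> float:
    """Word error rate: edit distance divided by the reference length."""
    h, r = hyp.split(), ref.split()
    return edit_distance(h, r) / len(r)


def per(hyp: str, ref: str) -> float:
    """Position-independent error rate: like WER, but word order is ignored."""
    h, r = hyp.split(), ref.split()
    matches = sum((Counter(h) & Counter(r)).values())  # bag-of-words overlap
    return (max(len(h), len(r)) - matches) / len(r)


# A reordered hypothesis is penalised by WER but not by PER.
print(wer("b a c", "a b c"), per("b a c", "a b c"))
```

Lower is better for both measures, which is the opposite direction from BLEU in the tables above.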