Active Learning for Statistical Phrase-based Machine Translation
Gholamreza Haffari. Joint work with: Maxim Roy, Anoop Sarkar
Simon Fraser University. NAACL talk, Boulder, June 2009
The Problem
• Statistical Machine Translation (SMT): a model M_FE translates text from language F into language E
• M_FE is a standard log-linear model and is composed of two main components:
– Phrase tables
– Language model
• Good phrase tables are typically learned from large bilingual (F,E)-text
– What if we don't have large bilingual text?
A Solution
• Suppose we are given a large monolingual text in the source language F
• Pay a human expert and ask him/her to translate these sentences into the target language E
– This way, we will have a bigger bilingual text
• But our budget is limited!
– We cannot afford to translate all monolingual sentences
A Better Solution
• Choose a subset of monolingual sentences for which, if we had the translations, the SMT performance would increase the most
• Only ask the human expert for the translation of these highly informative sentences
• This is the goal of Active Learning
– Workshop on Active Learning for NLP
Active Learning for SMT
[Diagram: the AL loop. Train M_FE on the bilingual (F,E) text; decode the monolingual F text into translated text; select informative sentences; have a human translate them; add the new sentence pairs to the bilingual text; re-train. For more details, see the paper.]
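The loop in the diagram can be sketched in Python. Everything below is schematic: `train`, `decode`, `score`, and `human_translate` are hypothetical caller-supplied callables standing in for the real SMT components (Portage in this work), not part of any toolkit's API.

```python
def active_learning_loop(bilingual, monolingual, iterations, batch_size,
                         train, decode, score, human_translate):
    """Generic AL-for-SMT loop: train, select, translate, re-train."""
    model = train(bilingual)
    for _ in range(iterations):
        # Decode the remaining monolingual pool with the current model.
        translations = {s: decode(model, s) for s in monolingual}
        # Pick the sentences the selection strategy deems most informative.
        batch = sorted(monolingual,
                       key=lambda s: score(s, translations[s]),
                       reverse=True)[:batch_size]
        # A human translates only the selected sentences.
        bilingual += [(s, human_translate(s)) for s in batch]
        monolingual = [s for s in monolingual if s not in batch]
        model = train(bilingual)  # re-train on the enlarged bilingual text
    return model
```

Each sentence selection strategy on the following slides corresponds to one particular choice of the `score` callable.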
Outline
• General idea of active learning (AL) for statistical machine translation (SMT)
• Sentence Selection Strategies
– Similarity, Decoder's Confidence
– Hierarchical Adaptive Sampling
– Sentence merit based on the translation units
• Experiments
– The simulated AL setting
– The real AL setting
Intuitive Underpinnings for Sentence Selection
• Sentences for which the model is not confident about their translations
– Hopefully, high-confidence translations are good ones
• Sentences similar to the bilingual text are easy for the model to translate
– Select the ones dissimilar to the bilingual text
• Cluster the monolingual sentences
– Choose some representative sentences for each cluster
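The "select dissimilar sentences" intuition can be sketched as an n-gram overlap measure. This particular definition is illustrative, not the exact similarity function used in the paper:

```python
def similarity_to_bilingual(sentence, bilingual_ngrams, n=2):
    """Fraction of the sentence's n-grams that already occur in the
    bilingual training text. Low values mark dissimilar sentences,
    which are the ones this intuition would select."""
    tokens = sentence.split()
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0.0
    return sum(g in bilingual_ngrams for g in grams) / len(grams)
```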
Sentence Selection strategies
• Baseline: Randomly choose sentences from the pool of monolingual sentences
• Previous Work: Decoder’s confidence for the translations (Kato & Barnard, 2007)
• Our proposed methods:
– Similarity to the bilingual training data
– Reverse model
– Hierarchical Adaptive Sampling (HAS)
– Utility of the translation units
Reverse Model
• Comparing the original sentence and the final round-trip sentence tells us something about the value of the sentence
• Example:
– Original: I will let you know about the issue later
– M_EF output: Je vais vous faire plus tard sur la question
– Reverse model M_FE output: I will later on the question
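One simple way to quantify the round-trip comparison on this slide is the bag-of-words recall of the original sentence in its double translation; the exact comparison used in the paper may differ.

```python
from collections import Counter

def round_trip_overlap(original, round_trip):
    """Fraction of the original sentence's words recovered after the
    round trip through M_EF and the reverse model M_FE. A low value
    suggests the sentence is hard for the current models, making it a
    good candidate for human translation."""
    orig = Counter(original.lower().split())
    back = Counter(round_trip.lower().split())
    recovered = sum(min(count, back[word]) for word, count in orig.items())
    return recovered / sum(orig.values())
```

On the slide's example, only 4 of the 9 original words survive the round trip, giving a low score.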
Hierarchical Adaptive Sampling
[Diagram: the monolingual pool U0 is sorted by similarity to the bilingual text and split into nodes U1 and U2; guided by the average decoder score, a node is split further (e.g., U2 into U2,1 and U2,2), and sentences are sampled from the two frontier nodes. After (Dasgupta & Hsu, 2008).]
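A minimal sketch of the descent described above. The halving split, the fixed depth, and sampling only from the final node are simplifying assumptions (the slide samples from two frontier nodes):

```python
def has_select(pool, similarity, decoder_score, batch_size, depth=2):
    """Hierarchical Adaptive Sampling sketch (after Dasgupta & Hsu, 2008):
    sort by similarity to the bilingual text, repeatedly halve, and
    descend into the half where the decoder is less confident on average."""
    node = sorted(pool, key=similarity)
    for _ in range(depth):
        if len(node) < 2:
            break
        mid = len(node) // 2
        halves = (node[:mid], node[mid:])
        average = lambda xs: sum(decoder_score(s) for s in xs) / len(xs)
        node = min(halves, key=average)  # lower score = less confident
    return node[:batch_size]
```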
Utility of the Translation Units
• Phrases are the basic units of translation in phrase-based SMT
[Figure: counts of the phrases of the sentence "I will let you know about the issue later" in the monolingual text and in the bilingual text]
• The more frequent a phrase is in the monolingual text, the more important it is
• The more frequent a phrase is in the bilingual text, the less important it is
Generative Models for Phrases
[Table: phrase counts and the corresponding probabilities under two generative models: P(.|m), estimated from the monolingual text, and P(.|b), estimated from the bilingual text]
Averaged Probability Ratio Score
• For a monolingual sentence S
– Consider the bag of its phrases
– Score: the normalized probability ratio P(S|m) / P(S|b)
– We will refer to it as Geom-Phrase
• Dividing the phrase probabilities captures our intuition about the utility of the translation units
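Reading "normalized probability ratio" as a geometric average over the bag of phrases, the score can be sketched as below; the `eps` smoothing for unseen phrases is my assumption, not the paper's recipe.

```python
from math import prod

def geom_phrase_score(phrases, p_mono, p_bi, eps=1e-6):
    """Geom-Phrase sketch: geometric mean of P(x|m)/P(x|b) over the bag
    of phrases of a sentence. Higher values mean the phrases are common
    in the monolingual text but rare in the bilingual text."""
    ratios = [p_mono.get(x, eps) / p_bi.get(x, eps) for x in phrases]
    return prod(ratios) ** (1.0 / len(ratios))
```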
Sentence Segmentation
• How to prepare the bag of phrases for a sentence S?
– For the bilingual text, we have the segmentation from the training phase of the SMT model
– For the monolingual text, we run the SMT model to produce the top-n translations and the corresponding segmentations
Extensions of the Score
• Instead of using phrases, we may use n-grams
• We may alternatively replace the geometric average of the probability ratios with their arithmetic average
– We will refer to it as Arithmetic Average
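Both extensions can be combined in one sketch: scoring over n-grams with an arithmetic rather than geometric average (again with assumed `eps` smoothing):

```python
def arith_ngram_score(sentence, n, p_mono, p_bi, eps=1e-6):
    """Arithmetic-average variant of the score, computed over the
    sentence's n-grams instead of its phrases."""
    tokens = sentence.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    ratios = [p_mono.get(g, eps) / p_bi.get(g, eps) for g in ngrams]
    return sum(ratios) / len(ratios)
```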
Sentence Selection strategies (Recap)
• Baseline: Randomly choose sentences from the pool of monolingual sentences
• Previous Work: Decoder’s confidence for the translations (Kato & Barnard, 2007)
• Our proposed methods:
– Similarity to the bilingual training data
– Reverse model
– Hierarchical Adaptive Sampling (HAS)
– Utility of the translation units
Outline
• General idea of active learning (AL) for statistical machine translation (SMT)
• Sentence Selection Strategies
– Similarity, Decoder's Confidence
– Hierarchical Adaptive Sampling
– Sentence merit based on the translation units
• Experiments
– The simulated AL setting
– The real AL setting
Experimental Setup
• Dataset sizes:

                     Bilingual text   Monolingual text   Test
  Bangla-English     11K              20K                1K
  Fr,Gr,Sp-English   5K               20K                2K

• We select 200 (or 100) sentences from the monolingual sentence set for 25 (or 5) iterations
• We use Portage from NRC as the underlying SMT system (Ueffing et al., 2007)
The Simulated AL Setting
[Plot: BLEU learning curves for Geometric Phrase, Decoder's Confidence, and Random selection (higher is better)]
The Real AL Setting
• Our human translator is different from the text author
– The methods are good at adapting to the new writing style
[Plot: learning curves for Geometric Phrase and Random selection]
Domain Adaptation
• Now suppose both the test text and the monolingual text are out-of-domain with respect to the bilingual text
– The 'Decoder's Confidence' does a good job
– The 'Geom 1-gram' outperforms the other methods, since it quickly expands the lexicon set in an effective manner
[Plot: learning curves for Geom 1-gram, Decoder's Confidence, and Random selection]
Analysis
• The coverage of the bilingual text is important, but it is not the only factor
– Notice the Geom 1-gram and Geom-Phrase methods
[Plot: coverage curves for the selection methods]
Conclusions
• We presented different sentence selection methods for SMT in an AL setting
• Using knowledge about the internal architecture of the SMT system is crucial
• Yet, we are after better sentence selection strategies
– See our upcoming paper in ACL09
Merci
Thank You
Domain Adaptation
• Selecting sentences based on:
– The 'Confidence' does a good job
– The '1-gram' outperforms the other methods, since it quickly expands the lexicon set in an effective manner

  Method        Bleu%   per%    wer%
  Geom 1-gram   14.92   34.83   46.06
  Confidence    14.74   35.02   46.11
  Random        14.11   35.28   46.47
The Simulated AL Setting
  Language Pair     Geometric Average        Random (Baseline)
                    Bleu%   per%    wer%     Bleu%   per%    wer%
  French-English    22.49   27.99   38.45    21.97   28.31   38.80
  German-English    17.54   31.51   44.28    17.25   31.63   44.41
  Spanish-English   23.03   28.86   39.17    23.00   28.97   39.21

• Using measures other than BLEU:
– wer: word error rate
– per: position-independent word error rate