Meta-level Statistical Machine Translation System

Human Language Technology Lab, Amirkabir University of Technology

IJCNLP 2013, Nagoya, Japan


Sajad Ebrahimi, Kourosh Meshgi, Shahram Khadivi and Mohammad Ebrahim Shiri Ahmad Abady


Outline
• Introduction
• Background
• Stacking for Classification
• Adapting Stacking to SMT
• Experiments and Results
• Related Work
• Conclusion and Future Work

1. Introduction

Traditional approaches to system combination need multiple structurally different SMT systems. In this research, we focus on a single SMT system: we introduce a meta-level SMT that can learn how to reduce or correct translation errors. To do this, we use an ensemble learning algorithm called Stacking. The basic idea:
• a collection of base-level SMTs is generated to obtain a meta-level corpus;
• then a meta-level SMT is trained on this corpus.

We address the issue of how to adapt Stacking to SMT.


2. Background

2.1 Log-linear model and statistical machine translation: Given a source string $s$, the goal of SMT is to find a target string $t$ from all possible translations:

$$t = \arg\max_{t_1} \{ \Pr(t_1 \mid s) \}$$

In meta-SMT, given a machine translation output $t$, the goal is to find a target sentence $\hat{t}$:

$$\hat{t} = \arg\max_{t_2} \{ \Pr(t_2 \mid t) \}$$
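The heading refers to the log-linear model but does not spell it out. For reference, the standard log-linear formulation used by phrase-based decoders such as Moses is sketched below. The feature functions $h_m$ (e.g. phrase translation, language model, reordering, word penalty) and weights $\lambda_m$ are generic notation, not taken from the slides. We assume the meta-SMT has the same form, with the base-level output $t$ playing the role of the source sentence.

```latex
\begin{aligned}
\Pr(t_1 \mid s) &= \frac{\exp\big(\sum_{m=1}^{M} \lambda_m h_m(t_1, s)\big)}
                        {\sum_{t'} \exp\big(\sum_{m=1}^{M} \lambda_m h_m(t', s)\big)} \\
t &= \arg\max_{t_1} \sum_{m=1}^{M} \lambda_m h_m(t_1, s) \\
\hat{t} &= \arg\max_{t_2} \sum_{m=1}^{M'} \lambda'_m h'_m(t_2, t)
\end{aligned}
```

The normalizer does not depend on $t_1$, so it drops out of the argmax. The weights are the quantities that MERT tunes on the development data in Section 4.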

2.2 Stacking for Classification

Overview: Proposed by Wolpert (1992), stacking learns a meta-level (or level-1) classifier based on the outputs of base-level (or level-0) classifiers, estimated via cross-validation as follows. Define a data set of examples, split into J disjoint, almost equal parts, and a set of different learning algorithms.

[Figure: J-fold cross-validation. Each example is a feature vector with a class value; at each fold, the data are divided into a training set and a test set.]

At each j-th step, given the learning algorithms, we invoke each of them on the training part to induce a base-level classifier and apply it to the test part. The concatenated predictions plus the original class value form a meta-level example. At the end of the entire cross-validation, these examples make up the full meta-level data set.

The meta-level data set is given to a learning algorithm to induce the meta-level classifier. Finally, all the learning algorithms are applied to the entire data set, inducing the final base-level classifiers to be used at runtime.

To classify a new instance, the concatenated predictions of all base-level classifiers form a meta-level vector that is assigned a class value by the meta-level classifier.
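The procedure above can be sketched in Python. This is a toy illustration, not the setup of Wolpert (1992) or of these slides. The two base learners (majority class and 1-nearest neighbour) and the table-lookup meta learner are hypothetical stand-ins chosen only to keep the example self-contained. The J folds are formed by taking every J-th example.

```python
from collections import Counter


def majority_learner(data):
    """Toy base learner: always predicts the most frequent class of its training data."""
    label = Counter(y for _, y in data).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbour_learner(data):
    """Toy base learner: 1-nearest neighbour under squared Euclidean distance."""
    def classify(x):
        _, label = min(data, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
        return label
    return classify


def table_learner(data):
    """Toy meta learner: majority class for each distinct meta-level vector."""
    votes = {}
    for x, y in data:
        votes.setdefault(x, Counter())[y] += 1
    fallback = Counter(y for _, y in data).most_common(1)[0][0]
    return lambda x: votes[x].most_common(1)[0][0] if x in votes else fallback


def build_meta_dataset(data, learners, J):
    """J-fold cross-validation producing the full meta-level data set."""
    folds = [data[j::J] for j in range(J)]  # J disjoint, almost equal parts
    meta = []
    for j in range(J):
        train = [ex for k, fold in enumerate(folds) if k != j for ex in fold]
        classifiers = [learn(train) for learn in learners]  # induced on the training part
        for x, y in folds[j]:  # applied to the held-out test part
            # concatenated base-level predictions + the original class value
            meta.append((tuple(c(x) for c in classifiers), y))
    return meta


def train_stacking(data, learners, meta_learner, J=5):
    meta_classifier = meta_learner(build_meta_dataset(data, learners, J))
    # final base-level classifiers are induced on the entire data set for runtime use
    base = [learn(data) for learn in learners]
    return lambda x: meta_classifier(tuple(c(x) for c in base))
```

Usage: with `data = [((0.0,), "a"), ((0.1,), "a"), ((0.2,), "a"), ((1.0,), "b"), ((1.1,), "b"), ((1.2,), "b")]` and `train_stacking(data, [majority_learner, nearest_neighbour_learner], table_learner, J=3)`, the meta learner learns to ignore the uninformative majority learner and follow the nearest-neighbour prediction. This is the kind of combination behaviour that stacking is meant to discover.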

We now adapt stacking to SMT in a principled way.

3. Adapting Stacking to SMT

The adaptation works as follows:

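[Figure: stacking adapted to SMT. Base-level systems SMT_j and a final SMT are built with the same SMT paradigm. At runtime, new source sentences are translated by the SMT, and its output is passed to the meta-SMT, which produces the target sentences.]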

3.1 Training base-level SMTs

We train 5 phrase-based SMT systems, one on each training part of the cross-validation, and obtain the output of each system on its corresponding test part. We need these outputs for the next step.
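Here is a minimal sketch of this step. It assumes a hypothetical `train_smt` hook that wraps a full base-level training run (word alignment, phrase extraction, tuning) and returns a function producing an n-best list. The toy trainer only memorises sentence pairs. It stands in for a real phrase-based system so the cross-validation logic can be run. Neither function is the authors' code.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]                   # (source sentence, human reference translation)
Translator = Callable[[str], List[str]]  # returns an n-best list, best hypothesis first
Trainer = Callable[[Sequence[Pair]], Translator]


def cross_validated_outputs(corpus: Sequence[Pair], train_smt: Trainer,
                            J: int = 5) -> List[Tuple[List[str], str]]:
    """Train one base-level SMT per fold and translate that fold's held-out test part.

    Every sentence is translated by a system that did not see it in training,
    so the outputs contain realistic errors for the meta-level corpus.
    """
    outputs: List[Tuple[List[str], str]] = [([], "")] * len(corpus)
    for j in range(J):
        test_idx = range(j, len(corpus), J)   # fold j: every J-th sentence
        held_out = set(test_idx)
        train_part = [p for i, p in enumerate(corpus) if i not in held_out]
        translate = train_smt(train_part)      # hypothetical hook: base-level SMT_j
        for i in test_idx:
            source, reference = corpus[i]
            outputs[i] = (translate(source), reference)
    return outputs


def toy_trainer(pairs: Sequence[Pair]) -> Translator:
    """Hypothetical stand-in for SMT training: memorises sentences, else copies the input."""
    memory = dict(pairs)
    return lambda src: [memory.get(src, src), src.upper()]
```

With the toy trainer, every first hypothesis is the untranslated input. No held-out sentence was memorised, which is exactly the property that cross-validation guarantees for the real base-level systems.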

3.2 Training meta-level SMTs

We gather the n-best outputs of the base-level SMTs on the corresponding test sets to build a meta-level corpus, pairing these outputs with the correct human translations. Then, we train a meta-SMT on this new corpus. We train our meta-SMT on 10 meta-level corpora, progressively created from the 1-best up to the 10-best outputs of the base-level systems. We call these systems meta-SMT (1-best), meta-SMT (2-best), and so on.
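The progressive construction of the meta-level corpora can be sketched as follows. We assume, as one plausible reading of the slides, that the k-best corpus pairs each of the top-k hypotheses of a sentence with that sentence's human reference as a separate training pair. `outputs` is the list of (n-best list, reference) pairs from the cross-validation step. The function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def build_meta_corpora(outputs: Sequence[Tuple[List[str], str]],
                       max_n: int = 10) -> Dict[int, List[Tuple[str, str]]]:
    """Meta-level corpus k pairs each of the top-k base-level hypotheses with its reference.

    The meta-SMT then learns to "translate" base-level output (its source side)
    into the human reference (its target side).
    """
    return {
        k: [(hyp, ref) for nbest, ref in outputs for hyp in nbest[:k]]
        for k in range(1, max_n + 1)
    }
```

Sentences whose n-best list is shorter than k simply contribute all of their hypotheses, so the corpora grow monotonically from k = 1 to k = 10.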


3.3 Tuning meta-level SMTs

To build a meta-level development set, we tune 5 base-level SMT systems on the tuning part and obtain the outputs of these systems on the corresponding test sets.

Finally, a meta-level development set is created by pairing these outputs with the correct human translations. This set is used to tune the meta-level SMTs.


4. Experiments

The corpus used for training and for the cross-validation process is the Verbmobil project corpus.

4.1 Data

English: 249K words, 23K sentences
Persian: 216K words, 23K sentences

4.2 Experimental setup

• GIZA++: bidirectional word alignment
• SRILM: language model training
• Case-insensitive BLEU: translation quality measurement
• Moses decoder: phrase-based SMT (both base-level and meta-level)
• MERT: tuning the feature weights on the development data


4.3 Evaluation

Type of SMT: BLEU (%) on the test set
baseline SMT: 30.47
meta-SMT (1-best): 31.20
meta-SMT (2-best): 31.00
meta-SMT (3-best): 31.37
meta-SMT (4-best): 31.49
meta-SMT (5-best): 31.41
meta-SMT (6-best): 31.05
meta-SMT (7-best): 31.19
meta-SMT (8-best): 31.40
meta-SMT (9-best): 31.30
meta-SMT (10-best): 31.54

BLEU (%) scores of the baseline SMT and the meta-SMTs on the Verbmobil test set, which has 250 sentences with four reference translations.
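To make the table easier to read, the gains over the baseline can be computed directly from the reported scores. The numbers below are copied from the table, and the script only does the subtraction.

```python
# BLEU (%) scores copied from the table above
baseline = 30.47
meta_bleu = {1: 31.20, 2: 31.00, 3: 31.37, 4: 31.49, 5: 31.41,
             6: 31.05, 7: 31.19, 8: 31.40, 9: 31.30, 10: 31.54}

# absolute BLEU gain of each meta-SMT (k-best) over the baseline, rounded to 2 decimals
gains = {k: round(score - baseline, 2) for k, score in meta_bleu.items()}
best = max(gains, key=gains.get)

print(f"best: meta-SMT ({best}-best), +{gains[best]:.2f} BLEU")  # best: meta-SMT (10-best), +1.07 BLEU
print(f"smallest gain: +{min(gains.values()):.2f} BLEU")         # smallest gain: +0.53 BLEU
```

All ten meta-SMTs improve on the baseline, by between 0.53 and 1.07 BLEU points. The gain does not grow monotonically with n.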


Some examples:

• Delete a wrong word:
  • EN: that is perfect . then we have talked about everything . goodbye .
  • FA (main): . ردیمک صحبت اش درباره چیز همه ما پس است عالی خداحافظ. . میبینم آن
  • FA (meta): . . . خداحافظ کردیم صحبت دیروز چیز همه ما پس است عالی آن
• Translate an untranslated word:
  • EN: I think we will take the Metropol hotel . could you reserve two single rooms ?
  • FA (main): هتل را ما میکنم فکر رزرو . Metropol من دو شما ؟ rooms میتوانیم مجزا
  • FA (meta): هتل را ما میکنم فکر رزرو . Metropol من دو شما ؟ اتاقها میتوانیم تک بیندازم
  • EN: yes , I would suggest the flight at a quarter past seven .
  • FA (main): میکنم پیشنهاد را من ، هفت . flight بله ساعت از بعد ربع یک
  • FA (meta): میکنم پیشنهاد را من ، هفت . پرواز بله ساعت از بعد ربع یک


• Rephrase and reordering:
  • EN: the best thing would be for us to take the subway from our hotel to the station.
  • FA (main): تا بود خواهد ما برای چیز ما مترو از را بهترین . ایستگاه تا هتل
  • FA (meta): از تا بود خواهد ما برای چیز ما بهترین . مترو ایستگاه تا هتل


Two factors possibly contribute to these results:
• performing cross-validation on the training set;
• re-optimizing the system with the stacking-based development set.

We perform two experiments to investigate the effect of each factor:
• Straight1: test the approach without any cross-validation process, but with the development set obtained from stacking.
• Straight2: build meta-level SMTs tuned with a development set obtained directly from the baseline SMT (i.e., without performing cross-validation on it).


[Figure: Comparison of Stacking, Straight1 and Straight2 across n-best list sizes.]


After analyzing the results, we conclude that both factors, i.e., cross-validation and re-optimizing the system with the stacking-based development set, are important for outperforming the baseline SMT system: using both factors consistently leads to the best results.

We conducted statistical significance tests using the paired bootstrap resampling method proposed by Koehn (2004) to measure the reliability of the conclusion that the meta-SMTs are better than the baseline SMT. All stacking-based meta-SMTs are better than the baseline SMT in 99% of the bootstrap samples.
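Paired bootstrap resampling (Koehn, 2004) repeatedly draws test sets of the same size with replacement, scores both systems on the same resampled sentences, and counts how often one system wins. The sketch below is a minimal illustration. It assumes corpus-level BLEU-4 computed from per-sentence sufficient statistics (clipped n-gram matches, hypothesis n-gram counts, hypothesis length, reference length). The statistics format and function names are our own. A real setup would extract these statistics with a BLEU scorer over the four references.

```python
import math
import random
from typing import Sequence, Tuple

# Per-sentence BLEU sufficient statistics (format assumed for this sketch):
# (clipped n-gram matches for n=1..4, hypothesis n-gram counts for n=1..4,
#  hypothesis length, reference length)
Stats = Tuple[Sequence[int], Sequence[int], int, int]


def corpus_bleu(stats: Sequence[Stats]) -> float:
    """Corpus-level BLEU-4 with uniform weights and brevity penalty, from summed statistics."""
    matches = [sum(s[0][n] for s in stats) for n in range(4)]
    totals = [sum(s[1][n] for s in stats) for n in range(4)]
    hyp_len = sum(s[2] for s in stats)
    ref_len = sum(s[3] for s in stats)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # some n-gram precision is zero, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    log_brevity = min(0.0, 1.0 - ref_len / hyp_len)  # log of the brevity penalty
    return math.exp(log_brevity + log_precision)


def paired_bootstrap(stats_a: Sequence[Stats], stats_b: Sequence[Stats],
                     samples: int = 1000, seed: int = 0) -> float:
    """Fraction of bootstrap test sets on which system A scores higher BLEU than system B."""
    assert len(stats_a) == len(stats_b), "both systems must translate the same test set"
    rng = random.Random(seed)
    n = len(stats_a)
    wins = 0
    for _ in range(samples):
        # draw n sentence indices with replacement; the same indices are used for both systems
        idx = [rng.randrange(n) for _ in range(n)]
        if corpus_bleu([stats_a[i] for i in idx]) > corpus_bleu([stats_b[i] for i in idx]):
            wins += 1
    return wins / samples
```

A win rate of at least 0.99 for a meta-SMT (system A) against the baseline (system B) corresponds to the 99% level reported above.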

5. Related Work

Xiao et al. (2010) presented a general solution for the adaptation of bagging and boosting to SMT. Their results showed that ensemble learning algorithms are promising for SMT.

Simard et al. (2007a) trained a “monolingual” phrase-based SMT system on the output of an RBMT system for the source side of the training set, paired with the corresponding human-translated (manually post-edited) reference.

Béchara et al. (2011) designed a full phrase-based SMT pipeline that included a translation step and a post-editing step, using a novel context-aware approach.

6. Conclusion and Future Work

We have presented a simple and effective approach to correcting translation errors by building a meta-level SMT on a meta-level corpus that is created from the original corpus by cross-validation.

Experimental results showed that such a meta-SMT can fix many translation errors that occur in the baseline translations.

As future work, we plan to develop a technique for combining multiple SMT systems using the stacked generalization algorithm.


Moreover, we are running more tests with different language pairs and larger corpora.

We will also apply our framework to other SMT paradigms, such as hierarchical phrase-based SMT and syntax-based SMT.

7. References

1. Almut Silja Hildebrand and Stephan Vogel. 2008. Combination of machine translation systems via hypothesis selection from combined n-best lists. In Proc. of the 8th AMTA conference, pages 254-261.

2. Evgeny Matusov, Nicola Ueffing, and Hermann Ney. 2006. Computing consensus translation from multiple machine translation systems using enhanced hypotheses alignment. In Proc. of EACL 2006, pages 33-40.

3. Antti-Veikko Rosti, Spyros Matsoukas, and Richard Schwartz. 2007. Improved word-level system combination for machine translation. In Proc. of the 45th Annual Meeting of the Association for Computational Linguistics, pages 312-319.

4. Michel Simard, Cyril Goutte, and Pierre Isabelle. 2007a. Statistical phrase-based post-editing. In Proc. of NAACL-HLT, pages 508-515.

5. Hanna Béchara, Yanjun Ma, and Josef van Genabith. 2011. Post-editing for a statistical MT system. In Proc. of MT Summit XIII, pages 308-315.

6. David H. Wolpert. 1992. Stacked generalization. Neural Networks, 5(2):241-259.

7. Leo Breiman. 1996b. Bagging predictors. Machine Learning, 24(2):123-140.

THANK YOU
