Page 1: Exploration of System Combination in Statistical Machine Translation

Exploration of system combination in statistical machine translation

Le Truong Vinh Phu

Supervisor: Prof. Ng Hwee Tou

Master of Computing dissertation

School of Computing

27th May 2014

Page 2: Exploration of System Combination in Statistical Machine Translation

•  Introduction

•  Literature Review

•  Multi-Engine Machine Translation (MEMT)

•  Experiments

•  Conclusion and Future Research

Outline

2

Page 3: Exploration of System Combination in Statistical Machine Translation

•  Introduction

♦  Machine translation (MT)

♦  Statistical machine translation (SMT)

♦  Machine translation system combination

♦  Problem description & objective

•  Literature Review

•  Multi-Engine Machine Translation (MEMT)

•  Experiments

•  Conclusion and Future Research

Outline

3

Page 4: Exploration of System Combination in Statistical Machine Translation

•  the use of computers to automate translation

•  difficulty: translation divergences

•  real-world benefits

•  different paradigms and approaches

♦  dictionary-based

♦  rule-based

♦  statistical

Machine translation (MT)

4

Page 5: Exploration of System Combination in Statistical Machine Translation

•  enabled by the availability of large corpora (monolingual and bilingual)

•  relying on probability models

♦  faithfulness

♦  fluency

•  P(F|E): translation model, P(E): language model

•  Phrase-based SMT (Koehn et al., 2003)

Statistical machine translation (SMT)

5
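
The two models on this slide are usually combined through the noisy-channel formulation. The block below is a sketch of that standard derivation for a foreign sentence F and an English candidate E; the symbol \hat{E} for the selected translation is introduced here and does not appear on the slides.

```latex
\hat{E} = \operatorname*{arg\,max}_{E} P(E \mid F)
        = \operatorname*{arg\,max}_{E} \frac{P(F \mid E)\, P(E)}{P(F)}
        = \operatorname*{arg\,max}_{E} P(F \mid E)\, P(E)
```

P(F|E) rewards faithfulness and P(E) rewards fluency; P(F) is constant for a given input, so it drops out of the maximization.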

Page 6: Exploration of System Combination in Statistical Machine Translation

•  Language model:

♦  conditional probability of a word given previous words

♦  requires monolingual corpus

•  Alignment

Statistical machine translation (SMT)

6
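
As a rough illustration of "conditional probability of a word given previous words", the sketch below estimates bigram probabilities from a tiny monolingual corpus. It assumes maximum-likelihood counts with no smoothing and a history of one word; the function names and the toy corpus are hypothetical, and production language models use longer histories and smoothing.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count contexts and bigrams over whitespace-tokenized sentences."""
    context_counts, bigram_counts = Counter(), Counter()
    for sent in sentences:
        # Boundary markers let the model score sentence starts and ends.
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts

def bigram_prob(prev, word, context_counts, bigram_counts):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen contexts."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]

# Toy monolingual corpus (hypothetical data).
corpus = ["the cat sat on the mat", "the cat sat on a mat"]
contexts, bigrams = train_bigram_lm(corpus)
p_cat_given_the = bigram_prob("the", "cat", contexts, bigrams)
```

Here "the" occurs three times as a context and is followed by "cat" twice, so the estimate for P(cat | the) is 2/3.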

Page 7: Exploration of System Combination in Statistical Machine Translation

•  Reordering model:

♦  penalties for long distance reordering

♦  distance-based (Koehn et al., 2005), phrase-based and hierarchical reordering (Galley & Manning, 2008)

•  Automatic evaluation:

♦  BLEU (Papineni et al., 2002)

Statistical machine translation (SMT)

7
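
A distance-based reordering penalty can be sketched as follows. This is a minimal illustration, assuming the common formulation in which each phrase pays alpha raised to the jump distance |start_i - end_{i-1} - 1| on the source side; the function names and the value alpha = 0.5 are hypothetical choices, not settings from the thesis.

```python
def distortion_distances(source_spans):
    """Jump distance |start_i - end_{i-1} - 1| for each phrase, in target order.

    source_spans holds inclusive (start, end) source positions, 0-based,
    listed in the order the phrases are produced on the target side.
    """
    distances = []
    prev_end = -1  # position just before the first source word
    for start, end in source_spans:
        distances.append(abs(start - prev_end - 1))
        prev_end = end
    return distances

def distortion_penalty(source_spans, alpha=0.5):
    """Product of alpha ** distance; alpha is an illustrative value in (0, 1]."""
    penalty = 1.0
    for d in distortion_distances(source_spans):
        penalty *= alpha ** d
    return penalty

# Monotone translation: every phrase starts right after the previous one.
monotone = [(0, 1), (2, 2), (3, 5)]
# Same phrases, but the second source phrase is translated first.
swapped = [(2, 2), (0, 1), (3, 5)]
```

The monotone order has distances [0, 0, 0] and pays no penalty, while the swapped order has distances [2, 3, 1], so longer jumps are penalized more heavily.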

Page 8: Exploration of System Combination in Statistical Machine Translation

•  different MT systems => different strengths and weaknesses

•  synthesizing a consensus translation

•  main aspects:

♦  combination method

♦  selection of good component systems to combine

MT system combination

8

Page 9: Exploration of System Combination in Statistical Machine Translation

•  Problem description:

♦  In which situations and settings does system combination work well?

•  Objective:

♦  evaluating system combination via empirical experiments

Ø  available datasets: NIST OpenMT, WMT

♦  utilizing system combination to improve a Chinese-to-English phrase-based system

Problem description & objective

9

Page 10: Exploration of System Combination in Statistical Machine Translation

•  Introduction

•  Literature Review

♦  System combination

♦  Confusion network decoding

♦  Other approaches

♦  Diverse hypothesis generation

•  Multi-Engine Machine Translation (MEMT)

•  Experiments

•  Conclusion and Future Research

Outline

10

Page 11: Exploration of System Combination in Statistical Machine Translation

•  successfully applied in speech recognition (Fiscus, 1997; Mangu et al., 2000)

•  crucial steps: aligning hypotheses, controlling word order

•  variety of approaches:

♦  hypothesis re-ranking (Hildebrand & Vogel, 2008)

♦  confusion networks (Rosti et al., 2007a, 2007b)

♦  collaborative decoding (Li et al., 2009)

System combination

11

Page 12: Exploration of System Combination in Statistical Machine Translation

•  current mainstream

•  Bangalore et al. (2001), Matusov et al. (2006), Rosti et al. (2007a, 2007b), Sim et al. (2007), He et al. (2008)

•  Rosti et al. (2007a)

♦  Sentence level

♦  Phrase level

♦  Word level

Confusion network decoding

12

Page 13: Exploration of System Combination in Statistical Machine Translation

Confusion network decoding

•  Example: a confusion network built from three hypotheses: “cat sat the mat”, “cat sitting on the mat”, and “hat on a mat”

13
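
Word-level confusion network decoding can be sketched as weighted voting over aligned slots. The example below assumes the three hypotheses have already been aligned by hand into five slots (the alignment step, which is the hard part, is not shown), and the system weights are hypothetical. Real decoders typically also add language model scores and other features.

```python
from collections import defaultdict

EPS = ""  # empty arc: the system contributes no word in this slot

def decode_confusion_network(columns, system_weights):
    """Pick the highest-weighted word in each slot and drop empty arcs."""
    output = []
    for column in columns:
        votes = defaultdict(float)
        for word, weight in zip(column, system_weights):
            votes[word] += weight
        best = max(votes, key=votes.get)
        if best != EPS:
            output.append(best)
    return " ".join(output)

# One tuple per slot; entries are ordered by system:
#   system 1: "cat sat the mat"
#   system 2: "cat sitting on the mat"
#   system 3: "hat on a mat"
columns = [
    ("cat", "cat", "hat"),
    ("sat", "sitting", EPS),
    (EPS, "on", "on"),
    ("the", "the", "a"),
    ("mat", "mat", "mat"),
]
weights = [0.3, 0.4, 0.3]  # illustrative system weights
consensus = decode_confusion_network(columns, weights)
```

With these weights the consensus is "cat sitting on the mat": the first system is outvoted on "sat" and on its missing "on", and the third system is outvoted on "hat" and "a".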

Page 14: Exploration of System Combination in Statistical Machine Translation

•  Collaborative decoding (Li et al., 2009)

♦  avoid early pruning of potentially good translations

♦  leverage agreement information of n-grams

•  Multi-Engine Machine Translation (MEMT)

♦  METEOR alignment (Banerjee & Lavie, 2005)

♦  no fixed backbone

Other approaches

14

Page 15: Exploration of System Combination in Statistical Machine Translation

•  Not a trivial problem (Siohan et al., 2005)

•  Key point: complementary error patterns

•  Approaches:

♦  selecting systems from different paradigms

♦  diversifying one baseline system

Ø  introducing randomness (Siohan et al., 2005)

Ø  different morphological decompositions of source language (de Gispert et al., 2009)

Ø  varying alignment algorithms (Xu & Rosti, 2010)

Ø  controlling target “trait” values (Devlin and Matsoukas, 2012)

Diverse hypothesis generation

15

Page 16: Exploration of System Combination in Statistical Machine Translation

•  Exploiting multiple Chinese word segmentation standards: Zhang et al. (2008), Dyer et al. (2008), Xu et al. (2005)

•  Zhang et al. (2008):

♦  Exploiting four SIGHAN standards: AS, CITYU, MSR, PKU

Diverse hypothesis generation

16

Page 17: Exploration of System Combination in Statistical Machine Translation

•  Introduction

•  Literature Review

•  Multi-Engine Machine Translation (MEMT)

♦  Overview

♦  Description

•  Experiments

•  Conclusion and Future Research

Outline

17

Page 18: Exploration of System Combination in Statistical Machine Translation

•  Open source toolkit: http://kheafield.com/code/memt/

•  WMT system name: cmu-combo (2009), cmu-heafield-combo (2010, 2011)

•  Superior performance in WMT 2011

•  Easy to use, robust and efficient

Overview

18

Page 19: Exploration of System Combination in Statistical Machine Translation

•  Combining 1-best outputs of component systems

♦  Pair-wise alignment (METEOR)

♦  Beam search

♦  Z-MERT tuning (Zaidan, 2009)

•  Features:

♦  length

♦  language model

♦  backoff

♦  match

Description

19

Page 20: Exploration of System Combination in Statistical Machine Translation

•  METEOR alignment:

♦  exact matches

♦  identical stems (Porter, 2001)

♦  WordNet synonyms (Miller, 1995)

♦  TERp unigram paraphrases (Snover et al., 2009)

Description

20
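
The staged matching on this slide can be illustrated with a small sketch that aligns two hypotheses by exact match first and by identical stem second. This is only an approximation under stated assumptions: `crude_stem` is a toy suffix stripper standing in for the Porter stemmer, matching is greedy left to right rather than the alignment search METEOR performs, and the WordNet synonym and TERp paraphrase stages are omitted because they need external resources. All names are hypothetical.

```python
def crude_stem(word):
    """Toy stand-in for the Porter stemmer: strip a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def staged_align(hyp_a, hyp_b):
    """Greedily link words by exact match first, then by identical stem."""
    a, b = hyp_a.split(), hyp_b.split()
    links = {}       # index in a -> (index in b, match type)
    used_b = set()   # indices in b that are already linked
    stages = [("exact", lambda w: w), ("stem", crude_stem)]
    for name, key in stages:
        for i, word_a in enumerate(a):
            if i in links:
                continue  # already linked in an earlier stage
            for j, word_b in enumerate(b):
                if j not in used_b and key(word_a) == key(word_b):
                    links[i] = (j, name)
                    used_b.add(j)
                    break
    return links

links = staged_align("the cats sat on the mat", "the cat sitting on a mat")
```

In this example "the", "on" and "mat" are linked in the exact stage and "cats"/"cat" in the stem stage, while "sat"/"sitting" stays unaligned because neither stage relates the two forms.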

Page 21: Exploration of System Combination in Statistical Machine Translation

•  Search space:

♦  picking one word at a time, from left to right

♦  maintaining two sets of “captured” and “uncaptured” words

♦  no duplication, fluency across switches

♦  no fixed backbone

Description

21

Page 22: Exploration of System Combination in Statistical Machine Translation

•  final hypothesis weaves together parts of component outputs

Description

22

Page 23: Exploration of System Combination in Statistical Machine Translation

•  Introduction

•  Literature Review

•  Multi-Engine Machine Translation (MEMT)

•  Experiments

♦  MEMT on WMT11

♦  MEMT on NIST MT08

♦  Diversifying Chinese-English phrase-based SMT

♦  Exploiting multiple CWS standards

•  Conclusion and Future Research

Outline

23

Page 24: Exploration of System Combination in Statistical Machine Translation

•  http://www.statmt.org/wmt11

•  two language pairs: French-English and Spanish-English

•  Ranking participating systems by BLEU on the test set

•  Selecting different component systems for system combination

MEMT on WMT11

24

Page 25: Exploration of System Combination in Statistical Machine Translation

•  French-English

[Figure: system combination gain]

MEMT on WMT11

25

Page 26: Exploration of System Combination in Statistical Machine Translation

•  Spanish-English

[Figure: system combination gain]

MEMT on WMT11

26

Page 27: Exploration of System Combination in Statistical Machine Translation

•  Spanish-English

♦  Why does E1 (combining all systems) score lower than E2 (excluding the bottom two)?

MEMT on WMT11

27

Page 28: Exploration of System Combination in Statistical Machine Translation

•  LDC catalog nos. LDC2010T21 and LDC2010T01

•  No accompanying system papers

•  Challenging: mix of newswire and web texts

•  Chinese-English and Arabic-English

♦  split datasets into tuning set and test set

MEMT on NIST MT08

28

Page 29: Exploration of System Combination in Statistical Machine Translation

•  Chinese-English:

♦  Tuning set: 524 sentences, test set: 788 sentences

♦  Combining the top 5 systems out of 23 systems

♦  similar to Ma and McKeown (2012)

•  Arabic-English:

♦  Tuning set: 509 sentences, test set: 803 sentences

♦  Combining the top 7 systems out of 14 systems

MEMT on NIST MT08

29

Page 30: Exploration of System Combination in Statistical Machine Translation

•  Chinese-English, gain = 3.76

MEMT on NIST MT08

30

Page 31: Exploration of System Combination in Statistical Machine Translation

•  Arabic-English, gain = 3.47

MEMT on NIST MT08

31

Page 32: Exploration of System Combination in Statistical Machine Translation

•  Varying different steps of the training pipeline

•  Tune on MTC1+MTC3 datasets (LDC2002T01 and LDC2004T07), test on NIST02-NIST08 evaluation sets

•  Varying decoding algorithm: Maximum A Posteriori (MAP), Minimum Bayes Risk (MBR), Lattice Minimum Bayes Risk (LMBR)

•  Varying reordering model: word-based (wbe), phrase-based (phrase), hierarchical (hier), combined reordering (phrase-hier)

Diversifying Chinese-English SMT

32
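
The difference between MAP and MBR decoding can be sketched on an n-best list. This is a minimal illustration, assuming a toy similarity (`unigram_f1`, a hypothetical stand-in for the sentence-level BLEU gain normally used) and made-up posterior probabilities; lattice MBR, which works over the full search lattice, is not shown.

```python
from collections import Counter

def unigram_f1(hyp, ref):
    """Toy similarity: F1 over unigram overlap between two sentences."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())  # clipped count of shared words
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def map_decode(nbest):
    """MAP: return the hypothesis with the highest posterior probability."""
    return max(nbest, key=lambda item: item[1])[0]

def mbr_decode(nbest):
    """MBR: return the hypothesis with the highest expected similarity
    to all hypotheses in the list, weighted by their probabilities."""
    def expected_gain(candidate):
        return sum(prob * unigram_f1(candidate, other) for other, prob in nbest)
    return max((hyp for hyp, _ in nbest), key=expected_gain)

# Hypothetical n-best list of (translation, posterior probability).
nbest = [
    ("a hat on a mat", 0.4),
    ("the cat sat on the mat", 0.3),
    ("the cat sat on a mat", 0.3),
]
```

MAP picks "a hat on a mat" because it has the single highest probability, whereas MBR picks "the cat sat on a mat", which is close to most of the probability mass in the list.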

Page 33: Exploration of System Combination in Statistical Machine Translation

•  Varying decoding algorithm, gain = -0.17

Diversifying Chinese-English SMT

33

Page 34: Exploration of System Combination in Statistical Machine Translation

•  Varying reordering model, gain = 0.19

Diversifying Chinese-English SMT

34

Page 35: Exploration of System Combination in Statistical Machine Translation

•  Chinese Word Segmentation

♦  Correlates weakly with MT quality

♦  Potential source of diversity

•  SIGHAN Bakeoff evaluation campaign

♦  Academia Sinica (AS)

♦  City University of Hong Kong (CITYU)

♦  Penn Chinese Treebank (CTB)

♦  Microsoft Research (MSR)

♦  Peking University (PKU)

Exploiting multiple CWS standards

35

Page 36: Exploration of System Combination in Statistical Machine Translation

•  Chinese Word Segmentation

Exploiting multiple CWS standards

36
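
To show how different segmentation conventions turn the same character string into different token sequences, the sketch below runs forward maximum matching with two toy lexicons. This is only an illustration: the lexicons are hypothetical and do not reproduce any SIGHAN standard, and dictionary matching is used here as a simple stand-in for whatever segmenter is trained on each standard.

```python
def max_match(sentence, lexicon, max_len=4):
    """Forward maximum matching: take the longest lexicon word at each position.

    Single characters are always accepted as a fallback, so the loop
    is guaranteed to advance.
    """
    words, i = [], 0
    while i < len(sentence):
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i : i + length]
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words

# Two hypothetical lexicons: one keeps "北京大学" (Peking University) as a
# single word, the other splits it into "北京" (Beijing) and "大学" (university).
coarse_lexicon = {"北京大学", "学生"}
fine_lexicon = {"北京", "大学", "学生"}

sentence = "北京大学学生"  # "Peking University students"
coarse = max_match(sentence, coarse_lexicon)
fine = max_match(sentence, fine_lexicon)
```

The coarse lexicon yields two tokens and the fine lexicon three, so phrase-based systems trained on the two segmentations see different units and can make different translation choices.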

Page 37: Exploration of System Combination in Statistical Machine Translation

•  Baseline System

♦  Chinese-English phrase-based SMT systems trained with Moses

♦  Segmenting and training five different systems corresponding to five CWS standards

♦  Training bi-text: 8,290,649 sentence pairs

♦  Interpolated language model of order 5

♦  Tuning set MTC1+MTC3: 1928 sentences, 4 references each

♦  GIZA++ alignment, combined reordering scheme, MBR decoding

Exploiting multiple CWS standards

37

Page 38: Exploration of System Combination in Statistical Machine Translation

•  System combination experiments

♦  Same tuning set MTC1+MTC3

♦  Z-MERT and PRO tuning

♦  Test sets: NIST 2002 to 2006, 2008

♦  Evaluation: mteval-v11b, case-insensitive

Exploiting multiple CWS standards

38

Page 39: Exploration of System Combination in Statistical Machine Translation

•  Results – component systems

Exploiting multiple CWS standards

39

Page 40: Exploration of System Combination in Statistical Machine Translation

•  Results – combining 5 systems

♦  Average gain: 0.52 BLEU (Z-MERT) and 0.82 BLEU (PRO)

Exploiting multiple CWS standards

40

Page 41: Exploration of System Combination in Statistical Machine Translation

•  Results – combining the top 3 systems

♦  Average gain: 0.35 BLEU (Z-MERT) and 0.64 BLEU (PRO)

♦  Lower than when combining 5 systems

Exploiting multiple CWS standards

41

Page 42: Exploration of System Combination in Statistical Machine Translation

•  Discussion

♦  CWS is a good source of diversity for building multiple SMT systems

♦  Benefits:

Ø  Reducing segmentation errors

Ø  Reducing out-of-vocabulary words

Ø  Providing diverse translations

Exploiting multiple CWS standards

42

Page 43: Exploration of System Combination in Statistical Machine Translation

•  Component system outputs

Exploiting multiple CWS standards

43

Page 44: Exploration of System Combination in Statistical Machine Translation

•  Combined system output

Exploiting multiple CWS standards

44

Page 45: Exploration of System Combination in Statistical Machine Translation

Conclusion and future research

•  Conclusion

♦  System combination does benefit MT

♦  Exceptions

Ø  Combining very few systems

Ø  Some component systems with exceptionally bad performance

Ø  Combining very similar systems (non-complementary)

♦  Achieved the goal of improving Chinese-English SMT system

45

Page 46: Exploration of System Combination in Statistical Machine Translation

Conclusion and future research

•  Future research

♦  Evaluating different combination algorithms

Ø  Collaborative decoding (Li et al., 2009)

♦  Trait-based approach as a way to generate diverse inputs (Devlin and Matsoukas, 2012)

46

Page 47: Exploration of System Combination in Statistical Machine Translation

Summary

•  Empirical experiments

♦  MEMT as system combination module

♦  WMT and NIST evaluation sets

•  System combination does benefit MT quality

♦  comparable, complementary input systems

•  Exploiting multiple CWS as a way to diversify SMT systems

♦  improve a strong Chinese-English phrase-based system

♦  average gain 0.5-0.8 BLEU in NIST02-06 and NIST08

47

Page 48: Exploration of System Combination in Statistical Machine Translation

Thank You

48