Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of...

96
Syntax in Statistical Machine Translation Qun Liu 14 July 2015 at IIS Academia Sinica

Transcript of Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of...

Page 1: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

Syntax in Statistical Machine Translation

Qun Liu 14 July 2015

at IIS Academia Sinica

Page 2: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Overview of Syntax in SMT

Introduc)on  

Syntax-­‐based  Transla0on  Models  

Syntax-­‐based  Language  Models  

Syntax-­‐based  Transla0on  Evalua0ons  

Conclusion  and  Future  Work

Page 3: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Google Translate - An Example

日本和美国的关系 Relationship between Japan and the United States 日本外交政策和美国亚太再平衡的关系 Japan's foreign policy and the US Asia-Pacific rebalancing of relations

Page 4: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Google Translate - An Example

日本和美国的关系 Relationship between Japan and the United States 日本外交政策和美国亚太再平衡的关系 Japan's foreign policy and the US Asia-Pacific rebalancing of relations When input sentences become longer, it is more

difficult for the Google Translate to capture their syntax structures.

Page 5: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Google Translate - An Example

Language  Paris   à   ß  English  - French   91   92  English  - German   77   86  English  - Italian   87   89  

English  - Japanese   26   49  English  - Chinese   17   49  

Aiken, Milam, and Shilpa Balan. "An analysis of Google Translate accuracy." Translation journal 16.2 (2011): 1-3.

Page 6: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Google Translate - An Example

Language  Paris   à   ß  English  - French   91   92  English  - German   77   86  English  - Italian   87   89  

English  - Japanese   26   49  English  - Chinese   17   49  

Aiken, Milam, and Shilpa Balan. "An analysis of Google Translate accuracy." Translation journal 16.2 (2011): 1-3.

Google Translate performs worse for language pairs with bigger difference in syntax structures.

Page 7: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Overview of Syntax in SMT

Introduc0on  

Syntax-­‐based  Transla)on  Models  

Syntax-­‐based  Language  Models  

Syntax-­‐based  Transla0on  Evalua0ons  

Conclusion  and  Future  Work

Page 8: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie MT Pyramid

Vauquois’ Triangle

Analysis Generation

Page 9: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Word-based Models

IBM Model 1-5

Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2):263-311.

Page 10: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Word-based Models

Page 11: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie IBM Models

Source Target Probability

Bushi (布什)

Bush 0.7

President 0.2

US 0.1

yu (与)

and 0.6

with 0.4

juxing (举行)

hold 0.7

had 0.3 le

(了) hold 0.01

Page 12: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Phrase-based Models

Phrase-based Model Philipp Koehn, Franz J. Och, and Daniel Marcu. 2003. Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics Conference, pages 127-133, Edmonton, Canada, May.

Alignment Template Model Franz J. Och and Hermann Ney. 2004. The Alignment Template Approach to Statistical Machine Translation. Computational Linguistics, 30(4):417-449.

Page 13: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Phrase-based Models

Page 14: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Phrase-based Model

Source Target Probability

Bushi (布什)

Bush 0.5

president Bush 0.3

the US president 0.2

Bushi yu (布什与)

Bush and 0.8

the president and 0.2

yu Shalong (与沙⻰龙)

and Shalong 0.6

with Shalong 0.4

juxing le huiang (举行了会谈)

hold a meeting 0.7

had a meeting 0.3

Page 15: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Syntax-based Model

Dependency  Syntax-­‐based  Model  

Cons0tuent  Syntax-­‐based  Model  

Hierarchical  Phrase-­‐based  Model  

Page 16: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Hierarchical Phrase-based Model

bushi yu shalong juxing le huitan

Bush held a talk with Sharon

X 1

X 1

X 2

X 2

X 3

X 3

Page 17: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Hierarchical Phrase-based Model

Source Target Probability

juxing le huiang (举行了会谈)

hold a meeting 0.6

had a meeting 0.3

X huitang (X会谈)

X a meeting 0.8

X a talk 0.2

juxing le X (举行了X)

hold a X 0.5

had a X 0.5 Bushi yu Shalong

(布什与沙⻰龙) Bush and Sharon 0.8

Bushi X (布什X)

Bush X 0.7

X yu Y (X与Y)

X and Y 0.9

Page 18: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Hierarchical Phrase-based Model

Advantage: •  Non-linguistic knowledge used

– Language Independent•  High Performance

– Synchronous CFGDisadvantage:

•  Limitation in long distance dependency– Use of Glue Rules for long phrases

Page 19: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Hierarchical Phrase-based Model

Advantage: •  Non-linguistic knowledge used

– Language Independent•  High Performance

– Synchronous CFGDisadvantage:

•  Limitation in long distance dependency– Use of Glue Rules for long phrases

Page 20: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Synchronous CFG

An Introduction to Synchronous Grammars

David Chiang∗

21 June 2006

1 Introduction

Synchronous context-free grammars are a generalization of context-free grammars (CFGs) that generatepairs of related strings instead of single strings. Thus they are useful in many situations where one mightwant to specify a recursive relationship between two languages. Originally, they were developed in the late1960s for programming-language compilation [1]. In natural language processing, they have been used formachine translation [19, 20, 3] and (less commonly, perhaps) semantic interpretation.

As a preview, consider the following English sentence and its (admittedly somewhat unnatural) equiva-lent in Japanese (with English glosses):

(1) The boy stated that the student said that the teacher danced

(2) shoonen-gathe boy

gakusei-gathe student

sensei-gathe teacher

odottadanced

tothat

ittasaid

tothat

hanasitastated

One might imagine writing a finite-state transducer to perform a word-for-word translation between Englishand Japanese sentences, but not to perform the kind of reordering seen here. But a synchronous CFG can dothis.

The term synchronous CFG is recent and far from universal. They were originally known as syntax-directed transduction grammars [10] or syntax-directed translation schemata [1], the latter still probablybeing the most common name. In the NLP community they are also known as 2-multitext grammars [11],and inversion transduction grammars [19] are a special case of synchronous CFGs.

2 Definition

We give only an informal definition here. A synchronous CFG is like a CFG, but its productions have tworight-hand sides—call them the source side and the target side—that are related in a certain way. Below isan example synchronous CFG for a fragment of English and Japanese:

S→ ⟨NP 1 VP 2 ,NP 1 VP 2 ⟩(3)VP→ ⟨V 1 NP 2 ,NP 2 V 1 ⟩(4)NP→ ⟨i,watashi wa⟩(5)NP→ ⟨the box, hako wo⟩(6)V→ ⟨open, akemasu⟩(7)

∗Thanks to Philip Resnik, Anoop Sarkar, Kevin Knight, and Liang Huang for their helpful feedback.

1

The implementation of decoding algorithm is straightforward – just like a parsing procedure, either CYK or Chart algorithm works

Page 21: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Hierarchical Phrase-based Model

Advantage: •  Non-linguistic knowledge used

– Language Independent•  High Performance

– Synchronous CFGDisadvantage:

•  Limitation in long distance dependency– Use of Glue Rules for long phrases

Page 22: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie

•  Using Glue Rules means sequentially concatenating all the target phrases, which lead to a back-off to phrase based model

•  Two cases to use Glue Rules: Ø No hierarchical rules applicable Ø The span to be covered by the hierarchical

rule is longer than a threshold

Glue Rules

Page 23: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Glue Rules

•  Using Glue Rules means sequentially concatenate all the phrases which lead to a back-off to phrase based model

•  Two cases to use Glue Rules: Ø No hierarchical rules applicable Ø The span to be covered by the hierarchical

rule is longer than a threshold

Hierarchical  Rules  failed  to  capture  dependency  between  words  with  a  distance  longer  than  a  threshold  

 

Page 24: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Syntax-based Model

Dependency  Syntax-­‐based  Model  

Cons0tuent  Syntax-­‐based  Model  

Hierarchical  Phrase-­‐based  Model  

Page 25: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Constituent Syntax-based Models

Forest-­‐based  

Tree-­‐to-­‐Tree  

Tree-­‐to-­‐String  String-­‐to-­‐Tree  

Page 26: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie String-to-Tree Model

●  Kenji Yamada and Kevin Knight. 2001. A syntax-based statistical machine translation model. In Proceedings of ACL 2001.

●  Daniel Marcu, Wei Wang, Abdessamad Echihabi, and Kevin Knight. 2006. SPMT: Statistical machine translation with syntactified target language phrases. In Proceedings of EMNLP 2006.

●  Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, and Ignacio Thayer. 2006. Scalable inference and training of context-rich syntactic translation models. In Proceedings of COLING-ACL 2006.

Page 27: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie String-to-Tree Model

Page 28: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie String-to-Tree Model

Source Target Probability

juxing le huiang

(举行了会谈)

VP(VPD(hold) NP(DT(a) NN(meeting))) 0.6

VP(VPD(had) NP(DP(a) NN(meeting))) 0.3

VP(VPD(had) NP(DT(a) NN(talk))) 0.1

x1 huitang (x1会谈)

VP(x1:VPD NP(DT(a) NN(meeting))) 0.8

VP(x1:VPD NP(DT(a) NN(talk))) 0.2

juxing le x1 (举行了x1)

VP(VPD(hold) NP(DT(a) x1:NN)) 0.5 VP(VPD(had) NP(DT(a) x1:NN)) 0.5

x1 yu x2 (x1与x2)

NP(x1:NNP CC(and) x2:NNP)) 0.9

Page 29: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Tree-to-String Model

●  Yang Liu, Qun Liu, and Shouxun Lin. 2006. Tree-to-String Alignment Template for Statistical Machine Translation. In Proceedings of COLING/ACL 2006, pages 609-616, Sydney, Australia, July.

(Meritorious Asian NLP Paper Award) •  Huang, Liang, Kevin Knight, and Aravind Joshi.

"Statistical syntax-directed translation with extended domain of locality." Proceedings of AMTA. 2006.

Page 30: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Tree-to-String Model

Page 31: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Tree-to-String Model

Source Target Probability

VPB(VS(juxing) AS(le) NPB(huiang)) (举行了会谈)

hold a meeting 0.6 have a meeting 0.3

have a talk 0.1

VPB(VS(juxing) AS(le) x1) (举行了x1)

hold a x1 0.5 have a x1 0.5

VP(PP(P(yu) x1:NPB) x2:VPB) (与 x1 x2)

x2 with x1 0.9

IP(x1:NPB VP(x2:PP x3:VPB)) x1 x3 x2 0.7

Page 32: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Tree-to-Tree Model

●  Jason Eisner. 2003. Learning non-isomorphic tree mappings for machine translation. In Proc. of ACL 2003

●  Min Zhang, Hongfei Jiang, Aiti Aw, Haizhou Li, Chew Lim Tan, and Sheng Li. "A tree sequence alignment-based tree-to-tree translation model." ACL-08: HLT (2008): 559.

●  Yang Liu, Yajuan Lü, and Qun Liu. 2009. Improving Tree-to-Tree Translation with Packed Forests. In Proceedings of ACL/IJCNLP 2009, pages 558-566, Singapore, August.

Page 33: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Tree-to-Tree Model

Page 34: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Constituent Syntax-based Models

Advantage: •  Linguistic knowledge used

Ø  Long distance dependencyDisadvantage:

•  Ungrammatical phrases•  Syntactic Ambiguity•  Computational Complexity

Ø  Synchronous TSG

Page 35: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Constituent Syntax-based Models

Advantage: •  Linguistic knowledge used

Ø  Long distance dependencyDisadvantage:

•  Ungrammatical phrases•  Syntactic Ambiguity•  Computational Complexity

Ø  Synchronous TSG

Page 36: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Ungrammatical Phrases

•  Pure tree-based models get very low performance, even lower than phrase-based models

•  Various techniques are developed to incorporate ungrammatical phrases into tree-based models, which lead to an significant improvement on tree-based models He is afraid of death

R V A P N

PP

AP

VP

S

Page 37: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Constituent Syntax-based Models

Advantage: •  Linguistic knowledge used

Ø  Long distance dependencyDisadvantage:

•  Ungrammatical phrases•  Syntactic Ambiguity•  Computational Complexity

Ø  Synchronous TSG

Page 38: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Syntactic Ambiguity

Page 39: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie 1-best ➜ n-best trees?

Page 40: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Forest-based Translation

•  Mi, Haitao, Liang Huang, and Qun Liu. "Forest-Based Translation." Proceedings of ACL 2008.

•  Mi, Haitao, and Liang Huang. "Forest-based translation rule extraction." Proceedings of the EMNLP 2008.

Page 41: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Packed Forest

Page 42: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie N-best Trees vs. Forest

Page 43: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Constituent Syntax-based Models

Advantage: •  Linguistic knowledge used

Ø  Long distance dependencyDisadvantage:

•  Ungrammatical phrases•  Syntactic Ambiguity•  Computational Complexity

Ø  Synchronous TSG

Page 44: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Synchronous TSG

In a TSG derivation, we start with an elementary tree that is rooted in the start symbol, and then repeatedlychoose a leaf nonterminal symbol X and attach to it an elementary tree rooted in X. For example:

S

NP VP

V

misses

NP⇒

S

NP

John

VP

V

misses

NP(24)

S

NP

John

VP

V

misses

NP

Mary

(25)

In a synchronous TSG, the productions are pairs of elementary trees, and the leaf nonterminals are linkedjust as in synchronous CFG:

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

S

NP 1 VP

V

misses

NP 2

,

S

NP 2 VP

V

manque

PP

P

a

NP 1

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎝NP

John,NP

Jean

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎠

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

NP

Mary,NP

Marie

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

8

Synchronous CFG can be regarded as a special case of Synchronous TSG where the trees are limited to have only two layers of nodes

Page 45: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Synchr. TSG vs Synchr. CFG

Considering matching a rule starting from the root node

For Synchronous CFG, there is only one possible tree in the source side

Page 46: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Synchr. TSG vs Synchr. CFG

Considering matching a rule starting from the root node

For Synchronous TSG, there are a large number of possible trees in the source side

Page 47: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Synchr. TSG vs Synchr. CFG

Considering matching a rule starting from the root node

For Synchronous TSG, there are a large number of possible trees in the source side

Page 48: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Synchr. TSG vs Synchr. CFG

Considering matching a rule starting from the root node

For Synchronous TSG, there are a large number of possible trees in the source side

Page 49: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Synchr. TSG vs Synchr. CFG

Considering matching a rule starting from the root node

For Synchronous TSG, there are a large number of possible trees in the source side

Page 50: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Synchr. TSG vs Synchr. CFG

Considering matching a rule starting from the root node

For Synchronous TSG, there are a large number of possible trees in the source side

Page 51: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Synchr. TSG vs Synchr. CFG

Considering matching a rule starting from the root node

For Synchronous TSG, there are a large number of possible trees in the source side

Page 52: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Synchr. TSG vs Synchr. CFG

Considering matching a rule starting from the root node

For Synchronous TSG, there are a large number of possible trees in the source side

Page 53: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Synchr. TSG vs Synchr. CFG

•  The implementation of Synchronous TSG is much more complex than Synchronous CFG, both in space and in time

•  Technologies are developed to deal with the rule indexing problem for Synchronous TSG decoder [Zhang et al., ACL-IJCNLP 2009]

•  The syntax based decoder implemented in Moses does not support Synchronous TSG model with rules having more than two layers.

Page 54: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Constituent Syntax-based Models

Advantage: •  Linguistic knowledge used

Ø  Long distance dependencyDisadvantage:

•  Ungrammatical phrases•  Syntactic Ambiguity•  Computational Complexity

Ø  Synchronous TSGIs it possible to build a linguistically syntax-based model with the complexity of Synchronous CFG?

Page 55: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Syntax-based Model

Dependency  Syntax-­‐based  Model  

Cons0tuent  Syntax-­‐based  Model  

Hierarchical  Phrase-­‐based  Model  

Page 56: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie History

Ding Y. et al. 2003, 2004 Quick C. et al. 2005 Xiong D. et al. 2007

Page 57: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Difficult of Dependency-based SMT

Page 58: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Difficult of Dependency-based SMT

…世界杯(World Cup)…在(in)…成功(Successfully) 举行(was held)

Problem: Low Coverage, Sparcity

A dependency translation rule:

Page 59: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie History

Ding Y. et al. 2003, 2004 Quick C. et al. 2005 Xiong D. et al. 2007

Dependency-Treelet-based Approach

Page 60: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Dependency-Treelet-based Approach

Dependency Treelet: Any connected subgraph of a dependency tree

Page 61: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Dependency-Treelet-based Rules

Page 62: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Problem of Dep-Treelet-based Approach

•  The partition of a dependency tree to a set of treelets is too flexible (more flexible than the partition of a constituent tree in a tree-to-string model)

•  The reordering is difficult in target side: -  These are no sequential orders between treelets-  The translation of a treelet is usually non-

continuous

Page 63: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Dependency-to-String Model

• Our Solution -  One layer subtree (head-dependency)-  Using POS for Smoothing

Jun Xie, Haitao Mi and Qun Liu, A novel dependency-to-string model for statistical machine translation, in the Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP2011), pages 216-226, Edinburgh, Scotland, UK. July 27–31, 2011

Page 64: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Dep-to-String Rule

Page 65: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Smoothing with: Leaf nodes

Page 66: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Smoothing with: Internal nodes

Page 67: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Smoothing with: Leaf & Internal node

Page 68: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Smoothing with: Head node

Page 69: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Smoothing with: Head & Leaf nodes

Page 70: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Smoothing with: Head & Internal nodes

Page 71: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Smoothing with: All nodes

Page 72: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Experiments

Page 73: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Dependency-to-String Model

Advantage: •  Linguistic knowledge used

Ø  Long distance dependency•  Computational Complexity

Ø  Equivalent to: Synchronous CFGDisadvantage:

•  Ungrammatical phrases•  Syntactic Ambiguity

Page 74: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Dependency-to-String Model implemented as Synchronous CFG

Liangyou Li, Jun Xie, Andy Way, Qun Liu, Transformation and Decomposition for Efficiently Implementing and Improving Dependency-to-String Model In Moses, In Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation. Pages 122-131. Doha, Qatar. 2014. •  Implement Dependency-to-String in a Synchronous CFG

which is compatible with Moses chart decoder -  Open Source Tools: dep2str

•  Implement pseudo-forest to support partially matched head-dependency structures

Page 75: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Summary: Syntax-based Models

Syntax-­‐based  

Formally  

ME-­‐BTG  Model  

Hierarchical  Phrase-­‐Based  Model  

Head  Transducer  

Linguis0cally  

cons0tuent  

Tree-­‐to-­‐String  Model  

String-­‐to-­‐Tree  Model  

Tree-­‐to-­‐Tree  Model  

Dependency  

Tree-­‐to-­‐String  Model  

String-­‐to-­‐Tree  Model  

Tree-­‐to-­‐Tree  Model  

Page 76: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Summary: Syntax-based Models

Syntax-­‐based  

Formally  

ME-­‐BTG  Model  

Hierarchical  Phrase-­‐Based  Model  

Head  Transducer  

Linguis0cally  

cons0tuent  

Tree-­‐to-­‐String  Model  

String-­‐to-­‐Tree  Model  

Tree-­‐to-­‐Tree  Model  

Dependency  

Tree-­‐to-­‐String  Model  

String-­‐to-­‐Tree  Model  

Tree-­‐to-­‐Tree  Model  

Our Contribution:

Page 77: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Overview of Syntax in SMT

Introduc0on  

Syntax-­‐based  Transla0on  Models  

Syntax-­‐based  Language  Models  

Syntax-­‐based  Transla0on  Evalua0ons  

Conclusion  and  Future  Work

Page 78: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Structural Language Model

Using syntax information to capture long distance dependency in target side

Page 79: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Structural Language Model

Using syntax information to capture long distance dependency in target side

Page 80: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Existing work

•  Generative Structural Language Model (Charniak, 2003) Eugene Charniak, Kevin Knight, and Kenji Yamada. 2003. Syntax-based language models for statistical machine translation. In Proceedings of MT Summit IX. Intl. Assoc. for Machine Translation.

•  Idea -  Estimate head n-gram probability-  Using POS for smoothing

•  Disadvantage -  Only available when the target tree is generated-  Can only be used in re-ranking rather than decoding-  Generative model: features are fixed and not tunable

Page 81: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Our work

Heng Yu, Haitao Mi, Liang Huang, and Qun Liu. 2014. A Structured Language Model For Incremental Tree-to-String Translation. To be appeared in Proceedings of the 25th International Conference on Computational Linguistics (Coling2014)

•  Dependency-based Language Model •  Incremental: can be used in left-to-right decoding •  Discriminative Model:

•  Large number of used-defined features•  Feature weights tunable

Page 82: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Preliminary work

•  Incremental Tree-to-String Decoding Liang Huang and Haitao Mi. 2010. Efficient incremental decoding for tree-to-string translation. In Proceedings of EMNLP, pages 273–283.

•  Structured perceptron with inexact search Liang Huang, Suphan Fayong, and Yang Guo. 2012. Structured perceptron with inexact search. In Proceedings of NAACL 2012, Montreal, Quebec.

Page 83: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Incremental Tree-to-String Decoding Huang 2010

Page 84: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Incremental Tree-to-String Decoding

Left-to-Right vs Top-down or Bottom-up

Huang 2010

Page 85: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Online Structural LM for SMT

Page 86: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Experiment Results

Decoding time:

8 x Baseline

Training Corpus: 1.5M sent. pairs

Page 87: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Overview of Syntax in SMT

Introduc0on  

Syntax-­‐based  Transla0on  Models  

Syntax-­‐based  Language  Models  

Syntax-­‐based  Transla)on  Evalua)ons  

Conclusion  and  Future  Work

Page 88: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Evaluation for Machine Translation

Candidate 1: It is a guide to action which ensures that the military always obeys the command of the party Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct

Reference 1: It is a guide to action that ensures that the military will forever heed party commands Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the party Reference 3: It is the practical guide for the army to heed the directions of the party Question: Given the human translations as references, how to evaluation the machine translation candidates automatically?

Page 89: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Existing MT Evaluation Metrics

•  Lexicalized Metrics BLEU NIST Rouge WER PER METEOR AMBER

•  Syntax-based Metrics STM HWCM

•  Semantic-based Metrics MEANT HMEANT

•  Combinational Metrics LAYERED DISCOTK

Page 90: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Existing MT Evaluation Metrics

Page 91: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Parsing as Evaluation

•  We proposed a novel MT Evaluation Metrics based on Dependency Parsing Model

•  We use the reference translations as the training corpus to train a parser

•  The parser are used to parse the translation candidates

•  The score of the parsing model obtained by the translation candidates are regarded as its quality score.

Page 92: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie WMT 2015 Metric Shared Tasks

Page 93: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Overview of Syntax in SMT

Introduc0on  

Syntax-­‐based  Transla0on  Models  

Syntax-­‐based  Language  Models  

Syntax-­‐based  Transla0on  Evalua0ons  

Conclusion  and  Future  Work

Page 94: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Conclusion

•  Syntax-based Translation Models •  Constituent Tree-to-String Model•  Forest-based Translation Approach•  Dependency-based Model

•  Syntax-based Language Model •  Online Discriminative Structural LM for SMT

•  Syntax-based Translation Evaluation Metrics •  Dependency Parsing as Evaluation for SMT

Page 95: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

www.adaptcentre.ie Future Work

•  Graph-based Translation Model -  Sequence-based è Tree-based è Graph-based-  A natural framework to incorporate various linguistic

knowledge (1)  n-gram (2) morphology (3) syntax (4) semantic

•  Dependency Parsing as Evaluation for SMT -  Extension to a discriminative model-  Used as a combination framework

Page 96: Syntax in Statistical Machine Translation...Statistical Phrase-Based Translation. In Proceedings of the Human Language Technology and North American Association for Computational Linguistics

Q&A !Qun Liu, Professor, Dr.,PI

ADAPT Centre Dublin City University

Institute of Computing Technology Chinese Academy of Sciences

Email: [email protected]