Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech...

Left-to-Right Hierarchical Phrase-based Translation System (LR-Hiero) Maryam Siahbani


Page 1: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Left-to-Right Hierarchical Phrase-based Translation System(LR-Hiero)

Maryam Siahbani

Page 2: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Overview

• History of Machine Translation
• Rule-based MT
• Statistical MT
  – Training
  – Decoding
• Left-to-Right Hierarchical Phrase-based MT
• Using LR-Hiero in Simultaneous Translation

Page 3: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

History of Machine Translation

• Late 1940’s: Early rule-based systems
  – Computers would replace human translators within 5 years!
• 1966: ALPAC report cuts research funding
• Early 1970’s: First commercial system (Systran)
• Late 1980’s: IBM developed first statistical models inspired by speech research
• Late 2000’s: Explosion in MT research
• 2006: First version of Google Translate

Page 4: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Rule-based Machine Translation

• Rules hand-written by linguists

• State of the art until early 2000’s– e.g. Systran

• Expensive to create, maintain and adapt

[Figure: a French NP (Noun "chat", Adjective "noir") mapped to an English NP (Noun "cat", Adjective "black")]

Page 5: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Statistical Machine Translation

• Data-driven approaches to MT
• Learn translation from textual data
  – Parallel data
• Language independent
• Normally use probabilistic models
  – The best translation = the most probable translation: $e^* = \arg\max_e P(e \mid f)$, where f is the source sentence and e a candidate translation
• State of the art for most language pairs
  – Best systems include rules (hybrid)

Page 6: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Statistical Machine Translation

[Diagram: training data (monolingual and bilingual) feeds the training pipeline, which produces the translation model; the decoder uses the translation model to turn an input sentence into a translation]

Page 7: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Translation Data

Parallel text (Web, United Nations, European/Canadian Parliament, Wikipedia, etc.)

Page 8: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Statistical Machine Translation (SMT)

Aligned Words

[Figure: word alignment between the Chinese (Zh) sentence "我们 十分 关注 非洲 地区 发生 的 事情" and the English (En) sentence "we are very much concerned with what happens in African region"]

Learn alignment from parallel text

Page 9: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Statistical Machine Translation (SMT)

Aligned Words

[Figure: word alignment between the Chinese (Zh) sentence "我们 十分 关注 非洲 地区 发生 的 事情" and the English (En) sentence "we are very much concerned with what happens in African region"]

Learn alignment from parallel text

Translation rules

Id   Source              Target                  Weight
r1   关注 X_1            concerned with X_1      -5.3
r2   X_1 发生 X_2 事情   what happens X_2 X_1    -4.8
r3   非洲 地区           African region          -3.1

Learn weighted translation rules from word-aligned text

Page 10: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Translation Rules (phrase-pairs)

Source          Target            p(e|f)
den Vorschlag   the proposal      0.6227
den Vorschlag   's proposal       0.1068
den Vorschlag   a proposal        0.0341
den Vorschlag   the idea          0.0250
den Vorschlag   this proposal     0.0227
den Vorschlag   proposal          0.0205
den Vorschlag   of the proposal   0.0159
den Vorschlag   the proposals     0.0159

* German-English phrase table trained on Europarl

• Millions of translation rules
• Scores are kept as log probabilities, e.g. log10(0.0159) = -1.7986
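The probability-to-log conversion on this slide can be sketched in a few lines. This is only an illustration: the dictionary holds the "den Vorschlag" entries from the table, and the helper name `to_log10` and the use of base-10 logs are assumptions. Base 10 is the base that reproduces the -1.7986 value shown for a probability of 0.0159.

```python
import math

# Phrase-table entries for "den Vorschlag", copied from the slide.
phrase_table = {
    "the proposal": 0.6227,
    "'s proposal": 0.1068,
    "a proposal": 0.0341,
    "the idea": 0.0250,
    "this proposal": 0.0227,
    "proposal": 0.0205,
    "of the proposal": 0.0159,
    "the proposals": 0.0159,
}


def to_log10(table):
    """Convert p(e|f) values to base-10 log probabilities (assumed base)."""
    return {target: math.log10(p) for target, p in table.items()}


log_table = to_log10(phrase_table)
# The most probable target phrase also has the highest log probability.
best_target = max(log_table, key=log_table.get)
print(best_target, round(log_table[best_target], 4))
```

Working in log space lets a decoder add rule scores instead of multiplying many small probabilities.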

Page 11: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Statistical Machine Translation (SMT)

Aligned Words

[Figure: word alignment between the Chinese (Zh) sentence "我们 十分 关注 非洲 地区 发生 的 事情" and the English (En) sentence "we are very much concerned with what happens in African region"]

Learn alignment from parallel text

Translation rules

Id   Source              Target                  Weight
r1   关注 X_1            concerned with X_1      -5.3
r2   X_1 发生 X_2 事情   what happens X_2 X_1    -4.8
r3   非洲 地区           African region          -3.1

Learn weighted translation rules from word-aligned text

Decoder

[Diagram: the input f passes through the decoder, which uses the translation model, and comes out as the translation e]

• Find the translation for any given input (f)
• Decoder generates many candidate translations, scores them and returns the most likely one

$e^* = \arg\max_{e} P(e \mid f) = \arg\max_{e = y(d)} \sum_{r \in d} w \cdot h(r)$

Page 12: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Measuring Translation Quality: BLEU score

• BLEU is a simple but effective scoring metric shown to correlate with human judgment of translation quality
• The idea is to measure overlap between the translation generated by the MT system and the reference translation
• Measure one-word overlaps, two-word overlaps, … (n-grams)
• Compute a precision score for each n-gram order
• Impose a brevity penalty for candidates that are shorter than the reference

Page 13: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Measuring Translation Quality: BLEU score

• Input:
  – Ich war in meinen zwanzigern bevor ich erstmals in ein kunstmuseum ging .
• Reference translation:
  – I was in my twenties before I ever went to an art museum .
• Low BLEU score (41.1):
  – I was twenty I ever went to art .
• High BLEU score (89.0):
  – I was in my twenties before I first went to an art museum .
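The ingredients of BLEU listed above (clipped n-gram precision for n = 1..4 and a brevity penalty) can be sketched as follows. This is a simplified, unsmoothed, single-reference, sentence-level version, so its values will not match the 41.1 and 89.0 shown on the slide; the function names are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified BLEU: geometric mean of clipped n-gram precisions
    times a brevity penalty (no smoothing, one reference)."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Each candidate n-gram is credited at most as often as it occurs
        # in the reference ("clipping").
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Candidates shorter than the reference are penalised.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "I was in my twenties before I ever went to an art museum ."
low = sentence_bleu("I was twenty I ever went to art .", reference)
high = sentence_bleu(
    "I was in my twenties before I first went to an art museum .", reference)
print(f"low={low:.3f} high={high:.3f}")
```

The short candidate loses on both counts: few of its longer n-grams match, and the brevity penalty applies because it has 9 tokens against 14 in the reference.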

Page 14: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Hierarchical Phrase-based Translation (Hiero)

Page 15: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Hierarchical Phrase-based Translation (Hiero)

[Diagram: aligned words (Zh-En) -> translation rules -> translation model, a Synchronous Context-Free Grammar (SCFG) -> decoder]

Translation Rules

X -> <我们 十分 X_1 / we are very much X_1>
X -> <关注 X_1 发生 的 X_2 / concerned with X_2 happens in X_1>
X -> <事情 / what>
X -> <非洲 地区 / african region>

[Figure: synchronous derivation of "我们 十分 关注 非洲 地区 发生 的 事情" into "we are very much concerned with what happens in african region" using the rules above]
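How a synchronous derivation produces the source and target strings together can be sketched with the four rules on this slide. The representation (rules as source/target string pairs, derivations as nested tuples) and the function name `realize` are assumptions made for this illustration, not the data structures of any Hiero decoder.

```python
# Each rule is (source side, target side); X_1 and X_2 are linked non-terminals.
R1 = ("我们 十分 X_1", "we are very much X_1")
R2 = ("关注 X_1 发生 的 X_2", "concerned with X_2 happens in X_1")
R3 = ("事情", "what")
R4 = ("非洲 地区", "african region")


def realize(rule, children=()):
    """Expand a rule whose non-terminal X_i is filled by children[i - 1].

    Each child is itself a (rule, children) tuple. Returns the pair
    (source tokens, target tokens) generated by the derivation.
    """
    src_side, tgt_side = rule
    filled = [realize(*child) for child in children]

    def fill(side, index):
        out = []
        for tok in side.split():
            if tok.startswith("X_"):
                # Substitute the sub-derivation linked to this non-terminal.
                out.extend(filled[int(tok[2:]) - 1][index])
            else:
                out.append(tok)
        return out

    return fill(src_side, 0), fill(tgt_side, 1)


# R1's X_1 is rewritten by R2; R2's X_1 is "非洲 地区" and its X_2 is "事情".
derivation = (R1, [(R2, [(R4,), (R3,)])])
source_tokens, target_tokens = realize(*derivation)
print(" ".join(source_tokens))
print(" ".join(target_tokens))
```

Because both sides are expanded from the same derivation, the reordering of X_1 and X_2 in R2 is what moves "what" in front of "happens in african region".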

Page 16: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Hiero Decoder

• Bottom-up dynamic programming algorithm
• O(n^3)
• LM computation

[Figure: bottom-up derivation of "我们 十分 关注 非洲 地区 发生 的 事情 。" into "we are very much concerned with what happens in african regions ."; the rule X -> <关注 X_1 发生 的 X_2 / concerned with X_2 happens in X_1> is applied with X_1 = african region and X_2 = what, and LM marks appear between the target fragments "concerned with", "what", "happens in" and "african region" that are joined]

Page 17: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Left-to-Right Hierarchical Phrase-based Translation System

Page 18: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Left-to-Right Target Generation (Watanabe et al. 2006)

[Figure: derivation tree for "我们 十分 关注 非洲 地区 发生 的 事情" that generates "we are very much concerned with what happens in african region" left to right, expanding the non-terminals X_1 and X_2]

Greibach Normal Form (GNF) rules:

X -> <我们 十分 X_1 / we are very much X_1>
X -> <关注 X_1 / concerned with X_1>
X -> <X_1 发生 X_2 事情 / what happens X_2 X_1>

Non-GNF rule:

X -> <X_1 发生 的 X_2 / X_2 happens in X_1>
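The GNF / non-GNF distinction above can be checked mechanically. The sketch below assumes the restriction is "the target side starts with at least one terminal and all non-terminals come after the terminals", which matches the four rules on this slide; the function name and the `X_` prefix convention are illustrative.

```python
def is_gnf_target(target_side):
    """True if the target side is terminals followed only by non-terminals.

    Assumed form of the restriction: one or more terminal words first,
    then zero or more non-terminals (X_1, X_2, ...), nothing after them.
    """
    tokens = target_side.split()
    if not tokens or tokens[0].startswith("X_"):
        return False  # must begin with a terminal
    seen_nonterminal = False
    for tok in tokens:
        if tok.startswith("X_"):
            seen_nonterminal = True
        elif seen_nonterminal:
            return False  # a terminal after a non-terminal breaks the form
    return True


rules = {
    "我们 十分 X_1": "we are very much X_1",
    "关注 X_1": "concerned with X_1",
    "X_1 发生 X_2 事情": "what happens X_2 X_1",
    "X_1 发生 的 X_2": "X_2 happens in X_1",
}
gnf_flags = {src: is_gnf_target(tgt) for src, tgt in rules.items()}
print(gnf_flags)
```

A target side of this shape lets the decoder emit the rule's words immediately and postpone the non-terminals, which is what makes left-to-right generation possible.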

Page 19: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

LR-Hiero Rule Extraction

• Search for sub-phrases within larger ones
  – Smaller phrases are replaced by non-terminal X
• Dynamic programming algorithm to extract rules for LR-Hiero
  – Linear time complexity (in number of rules)

[Figure: in the word-aligned pair "我们 十分 关注 非洲 地区 发生 的 事情" / "we are very much concerned with what happens in African region", the sub-phrase following "我们 十分" / "we are very much" is replaced by X_1 on both sides]

Extracted rule: <我们 十分 X_1 / we are very much X_1>
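The basic step of replacing an aligned sub-phrase by a non-terminal can be sketched as below. This shows only the substitution on one phrase pair; it is not the dynamic programming extractor described in the talk, and the helper names are assumptions.

```python
def replace_subphrase(tokens, sub, label):
    """Replace the first occurrence of the token sequence `sub` by `label`."""
    n = len(sub)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == sub:
            return tokens[:i] + [label] + tokens[i + n:]
    raise ValueError("sub-phrase not found")


def abstract_rule(src, tgt, sub_src, sub_tgt, label="X_1"):
    """Turn a phrase pair into a hierarchical rule by abstracting one
    aligned sub-phrase pair into the same non-terminal on both sides."""
    new_src = replace_subphrase(src.split(), sub_src.split(), label)
    new_tgt = replace_subphrase(tgt.split(), sub_tgt.split(), label)
    return " ".join(new_src), " ".join(new_tgt)


rule = abstract_rule(
    "我们 十分 关注 非洲 地区 发生 的 事情",
    "we are very much concerned with what happens in African region",
    "关注 非洲 地区 发生 的 事情",
    "concerned with what happens in African region",
)
print(rule)
```

A real extractor would also check that the sub-phrase pair is consistent with the word alignment before abstracting it.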

Page 20: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

LR-Hiero Rule Extraction

• Search for sub-phrases within larger ones
  – Smaller phrases are replaced by non-terminal X
• A novel dynamic programming algorithm to extract rules for LR-Hiero
  – Linear time complexity vs. exhaustive search

[Figure: the same word-aligned sentence pair, now with two sub-phrases replaced by X_1 and X_2]

Extracted rules:

<我们 十分 X_1 / we are very much X_1>
<X_1 发生 X_2 事情 / what happens X_2 X_1>

Page 21: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

LR-Hiero Rule Extraction

• Linear time complexity vs. exhaustive search
• Can easily extract rules with more non-terminals

[Figure: "Effect of No. of Non-terminals on extraction time"; x-axis: No. of non-terminals (1 to 4); y-axis: Time (sec.), 0 to 3500; series: Hiero Heuristic, DP Extractor]

Expressive Hierarchical Rule Extraction for Left-to-Right Translation. M. Siahbani and A. Sarkar. AMTA (2014)

Page 22: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Left-to-Right Decoding

Source sentence with word positions:

0 我们 1 十分 2 关注 3 非洲 4 地区 5 发生 6 的 7 事情 8

Rules:

X -> <我们 十分 X_1 / we are very much X_1>
X -> <关注 X_1 / concerned with X_1>
X -> <X_1 发生 X_2 事情 / what happens X_2 X_1>
X -> <的 / in>
X -> <非洲 地区 / African region>

Hypotheses (target prefix, followed by the source spans still to be translated):

<s> [0,8]
<s> we are very much [2,8]
<s> we are very much concerned with [3,8]
<s> we are very much concerned with what happens [6,7] [3,5]
<s> we are very much concerned with what happens in [3,5]
<s> we are very much concerned with what happens in African region

Page 23: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Left-to-Right Decoding

Source sentence with word positions:

0 我们 1 十分 2 关注 3 非洲 4 地区 5 发生 6 的 7 事情 8

Hypotheses:

<s> [0,8]
<s> we are very much [2,8]
<s> we are very much concerned with [3,8]
<s> we are very much concerned with what happens [6,7] [3,5]
<s> we are very much concerned with what happens in [3,5]
<s> we are very much concerned with what happens in African region

Scores shown next to the hypotheses in the figure: -7.7, -7.1, -5.9, -4.5, -3.3, 0

Typical CKY:

Candidate translations are scored by:

$t^* = \arg\max_{t = y(d)} \sum_{r \in d} w \cdot f(r)$

Rules with weights:

<我们 十分 X_1 / we are very much X_1>, -4.7
<关注 X_1 / concerned with X_1>, -3.8
<X_1 发生 X_2 事情 / what happens X_2 X_1>, -3.6
<的 / in>, -1.2
<非洲 地区 / African region>, -2.7
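The hypothesis sequence on this slide can be replayed with a small sketch: each step rewrites the first uncovered source span, appends the rule's target words to the prefix, and puts the spans of the rule's non-terminals (in target order) at the front of the span list. The rule choices and sub-spans are hard-coded from the slide; a real decoder would search over them. The function and variable names are assumptions.

```python
def replay_lr_derivation(sentence_length, steps):
    """Replay a left-to-right derivation.

    steps: list of (target words, spans of the rule's non-terminals in
    target-side order). Returns the trace of (target prefix, open spans).
    """
    prefix = ["<s>"]
    spans = [(0, sentence_length)]
    trace = [(" ".join(prefix), list(spans))]
    for target_words, child_spans in steps:
        spans.pop(0)                       # the rule covers the first open span
        spans = list(child_spans) + spans  # its non-terminals are handled next
        prefix.extend(target_words.split())
        trace.append((" ".join(prefix), list(spans)))
    return trace


# Rule applications and sub-spans as shown on the slide.
STEPS = [
    ("we are very much", [(2, 8)]),
    ("concerned with", [(3, 8)]),
    ("what happens", [(6, 7), (3, 5)]),
    ("in", []),
    ("African region", []),
]

trace = replay_lr_derivation(8, STEPS)
for text, open_spans in trace:
    print(text, open_spans)
```

The target prefix only ever grows at its right end, so a language model can score each extension as soon as it is produced.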

Page 24: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

LR-Hiero Results

[Figure: BLEU (translation accuracy, axis 17 to 29) plotted against LM calls (translation time, axis 1000 to 8000) for Czech-English, German-English and Chinese-English; legend: LR-Hiero, State-of-the-art]

• 3 times faster
• Comparable translation accuracy

Page 25: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Statistical Machine Translation (SMT)

• Available SMT systems:
  – Phrase-based: Moses (Edinburgh), Phrasal (Stanford)
  – Hierarchical phrase-based (Hiero): Jane 2 (Aachen University), Joshua (JHU), Kriya (SFU), CDEC (CMU)
  – Left-to-right hierarchical phrase-based: LR-Hiero
• LR-Hiero is available at https://github.com/sfu-natlang/lrhiero
  – Time efficient
  – Can model complex translation
  – Generates translation in a left-to-right manner
  – Suitable choice for online translation

Page 26: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Simultaneous Translation

Page 27: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Speech to Speech Translation

• Karlsruhe (KIT) Lecture Translator
• NICT Speech Translator
• Skype Translator

Page 28: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Incremental Translation

• Facilitate continuous translation with low latency
  – Latency: time difference between the start of the source sentence (speech) and the start of the target sentence (speech)
• Ensure acceptable translation accuracy

Non-incremental (latency 6 sec):
  Good evening, I would like a taxi to the airport please
  -> Buenas noches. Quiero un taxi al aeropuerto por favor

Incremental:
  Good evening, I would -> Buenas noches quiero (0.7 sec)
  like a taxi -> como un taxi (0.2 sec)
  to the airport please -> al aeropuerto por favor (0.2 sec)

Page 29: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Integrating Segmentation with Translation Process

[Figure: the word "Good" has arrived; a "segment?" decision is made before the "translate" step]

Page 30: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Integrating Segmentation with Translation Process

[Figure: "Good evening" has arrived; the "segment?" decision is made again before the "translate" step]

Page 31: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Integrating Segmentation with Translation Process

[Figure: "Good evening I" has arrived; after the "segment?" decision, the "translate" step outputs "Buenas noches"]
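The loop suggested by these three slides (read a word, ask "segment?", translate when a segment is complete) can be sketched as below. Both the segmentation decision and the translator are stand-ins: a toy lookup table built from the taxi example earlier in the deck replaces the LR-Hiero decoder and the segmenter, and all names are hypothetical.

```python
# Toy segment table taken from the incremental example in the deck; it plays
# the role of both the segmenter's knowledge and the translator (assumption).
TOY_SEGMENTS = {
    "Good evening, I would": "Buenas noches quiero",
    "like a taxi": "como un taxi",
    "to the airport please": "al aeropuerto por favor",
}


def should_segment(buffer):
    """Hypothetical segmentation decision: cut when the buffer is a known segment."""
    return " ".join(buffer) in TOY_SEGMENTS


def translate(buffer):
    """Hypothetical translator: table lookup, falling back to the input text."""
    text = " ".join(buffer)
    return TOY_SEGMENTS.get(text, text)


def incremental_translate(words):
    """Consume words one by one and emit a translation at each segment boundary."""
    buffer, outputs = [], []
    for word in words:
        buffer.append(word)
        if should_segment(buffer):
            outputs.append(translate(buffer))
            buffer = []
    if buffer:  # flush whatever is left at the end of the utterance
        outputs.append(translate(buffer))
    return outputs


outputs = incremental_translate(
    "Good evening, I would like a taxi to the airport please".split())
print(outputs)
```

Emitting output at each boundary is what keeps latency low: the first target words appear after four source words rather than after the whole sentence.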

Page 32: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Incremental Translation Results

• Task: English-German TED speech translation
• MT system training data: IWSLT 2013 train data + Europarl v7 data [Koehn 2005]

System            BLEU    Latency (sec)   Segs/Second
Non-incremental   21.08   6.353           0.15
Prosodic          20.88   0.468           2.27
Incremental       20.86   0.311           3.22

BLEU is the translation accuracy measure.

Page 33: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Publications

• Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering. Siahbani, Maryam and Sankaran, Baskaran and Sarkar, Anoop. EMNLP (2013)
• Two Improvements to Left-to-Right Decoding for Hierarchical Phrase-based Machine Translation. Siahbani, Maryam and Sarkar, Anoop. EMNLP (2014)
• Expressive Hierarchical Rule Extraction for Left-to-Right Translation. Siahbani, Maryam and Sarkar, Anoop. AMTA (2014)
• Incremental Translation using a Hierarchical Phrase-based Translation System. Siahbani, Maryam and Mehdizadeh Seraj, Ramtin and Sankaran, Baskaran and Sarkar, Anoop. SLT (2014)

Page 34: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Questions?

Page 35: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Partial Hypothesis

Source sentence with word positions:

0 我们 1 十分 2 关注 3 非洲 4 地区 5 发生 6 的 7 事情 8

<s> [0,8], -3.3
<s> we are very much [2,8], -4.5
<s> we are very much concerned with [3,8], -5.9
<s> we are very much concerned with what happens [6,7][3,5], -7.1

Page 36: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

LR-Decoding with Beam Search

• LR-decoding integrated with beam search (Watanabe et al. 2006)
• Stacks: hypotheses with the same number of source-side words covered
• Exhaustively generating all possible partial hypotheses for a given stack

Page 37: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Cube pruning

• Each cube: a group of hypotheses and applicable rules
• Cubes are fed to a priority queue which fills the current stack

Page 38: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Cube pruning

• Rows: hypotheses
• Columns: rules
• Rows and columns are sorted based on the scores
• Assumption: The best hypothesis is in the top left
  – The next best are the neighbours of this entry

                                made (0.9)   done (1.1)   do (3.2)
students have not yet (10.2)    12.5         12.4         14.3
pupils have not yet (11.5)      12.6         12.8         14.7
student has not (12.7)          13.3         13.5         15.4
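The neighbour-expansion idea behind cube pruning can be sketched with a priority queue over the grid on this slide. The sketch treats the numbers as costs (lower is better), which is an assumption, and reads the combined scores straight from the grid instead of computing them with a language model; the function name is illustrative.

```python
import heapq

# Combined scores from the slide: rows are hypotheses, columns are rules.
GRID = [
    [12.5, 12.4, 14.3],  # students have not yet
    [12.6, 12.8, 14.7],  # pupils have not yet
    [13.3, 13.5, 15.4],  # student has not
]


def cube_prune(cost, k):
    """Pop k entries, starting at the top-left corner and adding the right
    and lower neighbours of every popped entry to the priority queue."""
    rows, cols = len(cost), len(cost[0])
    heap = [(cost[0][0], 0, 0)]
    seen = {(0, 0)}
    popped = []
    while heap and len(popped) < k:
        value, i, j = heapq.heappop(heap)
        popped.append(value)
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < rows and nj < cols and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (cost[ni][nj], ni, nj))
    return popped


first_three = cube_prune(GRID, 3)
print(first_three)  # the top-left entry is popped first although 12.4 is lower
```

Only a handful of grid cells are ever scored, which is where the saving in LM calls comes from; the price is that entries can come out slightly out of order.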

Page 39: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Time Efficiency: avg of LM queries

[Figure: average number of LM queries; legend includes Watanabe et al. (2006)]

Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering. M. Siahbani, B. Sankaran and A. Sarkar. EMNLP (2013)

Page 40: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Reordering Features

• LR-Hiero as proposed by Watanabe et al. (2006) scores about 2 BLEU points lower than Hiero

Page 41: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Reordering Features

• Distortion feature (computed when each rule is applied)
  – Example: d = (5-3) + (7-6) + (8-6) + (7-3) + (8-5) = 12
• Number of reordering rules (rules whose non-terminals are reordered between the source and target side)
  – r<> = 1: <X_1 发生 X_2 事情 / what happens X_2 X_1>
  – r<> = 0: <X_1 发生 X_2 事情 / what happens X_1 X_2>

Source sentence with word positions:

0 我们 1 十分 2 关注 3 非洲 4 地区 5 发生 6 的 7 事情 8
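Both features on this slide are simple enough to sketch. The indicator below marks a rule as reordering when its non-terminals appear in a different order on the target side than on the source side, and the distortion value simply sums the position differences written out on the slide. How those position pairs are derived from a rule application is not reconstructed here; the pairs are copied as given, and the function names are assumptions.

```python
def nonterminal_order(side):
    """Non-terminals of one rule side, in the order they appear."""
    return [tok for tok in side.split() if tok.startswith("X_")]


def reordering_indicator(source_side, target_side):
    """1 if the rule swaps its non-terminals between source and target, else 0."""
    return int(nonterminal_order(source_side) != nonterminal_order(target_side))


reordered = reordering_indicator("X_1 发生 X_2 事情", "what happens X_2 X_1")
monotone = reordering_indicator("X_1 发生 X_2 事情", "what happens X_1 X_2")

# Position pairs copied from the slide's example: d = (5-3) + (7-6) + ...
jumps = [(3, 5), (6, 7), (6, 8), (3, 7), (5, 8)]
distortion = sum(end - start for start, end in jumps)
print(reordered, monotone, distortion)
```

Counting reordering rules gives the model a direct handle on how much a derivation deviates from monotone translation, in addition to the distance-based distortion value.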

Page 42: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Translation Quality

[Figure: translation quality comparison; legend includes Watanabe et al. (2006)]

Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering. M. Siahbani, B. Sankaran and A. Sarkar. EMNLP (2013)

Page 43: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Search Error in Cube Pruning

• Assumption: The best hypothesis is in the top left
  – The next best are the neighbours of this entry
• Adding LM score violates the assumption

Cube 1 (hypothesis scores: 6.6, 6.7, 6.9; rule scores: 0.9, 1.3, 3.2)

8.1   8.2   8.5
8.0   8.4   8.6
8.3   8.9   8.8

Cube 2 (hypothesis scores: 6.2, 6.3, 6.5; rule scores: 1.0, 1.3, 1.5)

9.1   8.9   9.3
8.0   8.5   9.0
7.7   7.9   8.1

[Figure annotations: the entries 8.1, 8.0 and 8.2 are highlighted]

Page 44: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Search Error in Cube Pruning

• Assumption: The best hypothesis is in the top left
  – The next best are the neighbours of this entry
• Adding LM score violates the assumption

Cube 1 (hypothesis scores: 6.6, 6.7, 6.9; rule scores: 0.9, 1.3, 3.2)

8.1   8.2   8.5
8.0   8.4   8.6
8.3   8.9   8.8

Cube 2 (hypothesis scores: 6.2, 6.3, 6.5; rule scores: 1.0, 1.3, 1.5)

9.1   8.9   9.3
8.0   8.5   9.0
7.7   7.9   8.1

[Figure annotations: the entries 8.0 and 7.7 are highlighted; label: Queue diversity]

Page 45: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Queue Diversity

Two Improvements to Left-to-Right Decoding for Hierarchical Phrase-based Machine Translation. M. Siahbani and A. Sarkar. EMNLP (2014)

[Figure: two bar charts for Chinese-English comparing LR-Hiero, LR-Hiero+CP and LR-Hiero+CP (QD=10); one shows BLEU score (axis 23.5 to 26.5), the other shows No. of LM calls (axis 0 to 40000)]

Page 46: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Lexicalized Reordering Model

• Distortion penalty is weak
  – Penalizes deviation from the monotonic translation
• Learn reordering preferences for each phrase (with respect to the previous phrase)
  – Monotone
  – Swap
  – Discontinuous

[Figure: alignment grid with source (F) and target (E) axes, from "Statistical Machine Translation", Koehn 2010]

Page 47: Left-to-Right Hierarchical Phrase-based Translation and its Application in Simultaneous Speech Translation - Maryam Siahbani

Lexicalized Reordering Model

• Collect orientation information during rule extraction
  – Convert each rule to a phrase-pair (possibly discontinuous)
  – M: If there is a phrase-pair on the top-left
  – S: If there is a phrase-pair on the top-right
  – D: otherwise
• Estimation by relative frequency

[Figure: alignment grid with source (F) and target (E) axes, from "Statistical Machine Translation", Koehn 2010]
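Estimation by relative frequency can be sketched as counting how often each orientation (M, S, D) was observed for a rule and dividing by the rule's total count. The observations below are made up for illustration, the function name is an assumption, and no smoothing is applied.

```python
from collections import Counter


def estimate_orientation_probs(observations):
    """Relative-frequency estimate of p(orientation | rule).

    observations: iterable of (rule, orientation) pairs collected during
    rule extraction, with orientation in {"M", "S", "D"}.
    """
    observations = list(observations)
    pair_counts = Counter(observations)
    rule_counts = Counter(rule for rule, _ in observations)
    return {
        (rule, orientation): count / rule_counts[rule]
        for (rule, orientation), count in pair_counts.items()
    }


# Hypothetical orientation observations for two rules.
observed = [
    ("关注 X_1 / concerned with X_1", "M"),
    ("关注 X_1 / concerned with X_1", "M"),
    ("关注 X_1 / concerned with X_1", "M"),
    ("关注 X_1 / concerned with X_1", "S"),
    ("的 / in", "D"),
]
probs = estimate_orientation_probs(observed)
for key, value in sorted(probs.items()):
    print(key, value)
```

Orientations that were never observed for a rule get no entry here; a practical model would smooth the counts so that unseen orientations keep a small non-zero probability.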