Machine Translation
Course 8
Diana Trandabăț
Academic year: 2014-2015
Reordering
• Different languages put words in different order
• Computational Problem
– during translation we have to compose an output sentence
– when composing a sentence of n words, there are n! possible orderings
– exponential spaces are hard to search
• Linguistic Problem
– how do different languages define the word order?
– how can we model reordering transformations during translation?
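The n! blow-up mentioned above is easy to see numerically; a quick Python sketch:

```python
import math

# Number of possible orderings of an n-word output sentence grows as n!,
# so exhaustive search over orderings quickly becomes infeasible.
for n in [5, 10, 15, 20]:
    print(n, math.factorial(n))
# 20 words already give ~2.4 * 10^18 orderings.
```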
Reordering
• Reordering remains a hard problem for SMT and motivates new models
• Reordering is the most important feature for predicting translation performance (on Indo-European languages)
Bag of words
the nobody security Swiss account the of will bank trust
nobody will trust the security of the Swiss bank account
• From the New York Times
• Can you put the words in the correct order?
– Does a good language model suffice?
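Whether a language model alone can restore the order can be probed with a toy bigram model over a shortened bag of words. The bigram scores below are invented for illustration; a real model would be estimated from a large target-language corpus:

```python
from itertools import permutations

# Toy bigram "language model" with hand-picked, hypothetical scores.
bigram = {
    ("the", "swiss"): 0.4, ("swiss", "bank"): 0.5, ("bank", "account"): 0.5,
    ("the", "bank"): 0.2, ("account", "swiss"): 0.01,
}
DEFAULT = 0.001  # back-off score for unseen bigrams

def lm_score(words):
    """Product of bigram scores over the word sequence."""
    score = 1.0
    for a, b in zip(words, words[1:]):
        score *= bigram.get((a, b), DEFAULT)
    return score

bag = ["swiss", "the", "account", "bank"]
best = max(permutations(bag), key=lm_score)
print(best)  # -> ('the', 'swiss', 'bank', 'account')
```

Note the cost: even this 4-word bag requires scoring 24 permutations, which is why a good language model alone does not suffice at sentence length.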
The Language Model/Translation Model Debate
• Language model argument
– Humans can make sense of scrambled sentences, machines should too
– Build really big language models to determine what makes sense in the target language
• Translation model argument
– structure in source sentence defines relationship between words
– we can learn how to map this structure into a target-language structure
Example
German: Ich halte eine Vorlesung über die Reihenfolge von Wörten
English: I teach a class about the order of words
Monotone reordering (no reordering)
Example
French: Barack Obama vit dans la Maison Blanche
English: Barack Obama lives in the White House
Adjective/noun swapping
Example
German: Ich muss eine Vorlesung über die Reihenfolge von Wörten halten
English: I must teach a class about the order of words
German main verb movement
Example
English: we have taken these considerations very much into account when we designed the budget
French:
lorsque nous avons préparé le budget , nous avons pris cela en considération
Clause reordering, zero alignment of the comma
“Free” word order
• The following German sentences mean the same ("The man gives the woman the book."):
– Der Mann gibt der Frau das Buch.
– Das Buch gibt der Mann der Frau.
– Der Frau gibt der Mann das Buch.
– Der Mann gibt das Buch der Frau.
– Das Buch gibt der Frau der Mann.
– Der Frau gibt das Buch der Mann.
Amount of reordering (in a couple of European languages)
From (Birch et al., EMNLP2008)
Amount of reordering: Italian (it), Swedish (sv), English (en), Greek (el), Portuguese (pt), Danish (da), Spanish (es), French (fr), *Dutch (nl), *Finnish (fi), *German (de)
* hard languages for reordering
Reordering models
• Distortion• Distance-based models• Lexicalised models• Pre-ordering• Clause reordering
Distortion
• Corresponding words do not appear in the same order in source and target languages
• Assign an offset to move words from their original position to their final position
• In IBM model 3, the distortion factor d(j | i, le, lf) determines how likely it is that an English word in position i aligns to a foreign word in position j, given the lengths le and lf of both sentences
• Note, positions are absolute positions
Distortion (Reordering)
• Making the offset dependent on all the words would be too costly… so in IBM model 3, the offset only depends on:
– the position of the word within the sentence
– the length of the sentences in both languages
• P(offset = o | Position = p, SourceLength = m, ForeignLength = n)
– ex: chien brun → brown dog
• brown → offset -1, dog → offset +1
– offset of brown = P(offset | 1, 2, 2)
– ex: P(+1 | 1, 2, 2) = 0.3, P(0 | 1, 2, 2) = 0.6, P(-1 | 1, 2, 2) = 0.1
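A minimal sketch of such an offset model as a lookup table, using the toy probabilities from the "chien brun" example (the table itself is hypothetical; a real one is estimated during EM training of the IBM models):

```python
# Hypothetical distortion table: P(offset | word position, source length, target length).
offset_prob = {
    (+1, 1, 2, 2): 0.3,
    ( 0, 1, 2, 2): 0.6,
    (-1, 1, 2, 2): 0.1,
}

def p_offset(offset, position, src_len, tgt_len):
    """Look up P(offset | position, lengths); unseen events get probability 0."""
    return offset_prob.get((offset, position, src_len, tgt_len), 0.0)

# "chien brun" -> "brown dog": each word moves by one position.
print(p_offset(+1, 1, 2, 2))  # 0.3
print(p_offset( 0, 1, 2, 2))  # 0.6
```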
An Example (en|fr)
Given the English
The brown dog did not
go home
Fertility Model
1 1 1 1 2 1 3
Transformed English
The brown dog did not not go home home home
Translation Model
Le brun chien est n' pas allé à la maison
Offset Model 0 +1 -1 +1 -1 0 0 0 0 0
A possible Translation
Le chien brun n' est pas allé à la maison
• Then use Language Model P(e) to evaluate fluency of all possible translations
e* = argmax_e P(f | e) × P(e)
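A toy sketch of this noisy-channel argmax: the two candidates below get the same translation-model score, so the language model decides between the orderings (all probabilities are invented for illustration):

```python
# e* = argmax_e P(f|e) * P(e): combine translation-model and language-model scores.
# All probabilities are hypothetical toy numbers.
p_f_given_e = {"the brown dog did not go home": 0.02,
               "the dog brown did not go home": 0.02}
p_e = {"the brown dog did not go home": 0.001,
       "the dog brown did not go home": 0.00001}

def score(e):
    return p_f_given_e[e] * p_e[e]

e_star = max(p_f_given_e, key=score)
print(e_star)  # -> "the brown dog did not go home"
```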
Distortion Probabilities
• The translation model includes probabilities for “distortion”
– Ex.: P(5 | 2, 4, 6)
– the probability that the source word in position 2 (2nd word in the source language) will produce a target word in position 5, when the source sentence has 4 words and the target sentence has 6 words
Distortion Probabilities
• For phrase-based MT: measure the distortion of phrase i as the distance between the start of the f phrase generated by ēi, ai and the end of the f phrase generated by the previous phrase ēi−1 with its alignment ai−1
• Where:
– f phrase is the phrase in the source language
– ēi is the phrase in the target language corresponding to f
– ai is the alignment of phrase i
– ēi−1 is the previous phrase in the target language
• Typically assume the probability of a distortion decreases exponentially with the distance of the movement.
d(i) = c · α^|starti − endi−1 − 1|
• Set 0 < α < 1 based on fit to phrase-aligned training data
• Then set c to normalize the distortion of our phrase i, d(i), so it sums to 1
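The exponential distortion penalty d(i) = α^|start_i − end_(i−1) − 1| can be sketched as follows; the α value is illustrative, and the constant c (normalisation) is omitted:

```python
ALPHA = 0.75  # 0 < alpha < 1, fit to phrase-aligned data in practice (value illustrative)

def distortion(start_i, end_prev):
    """Unnormalised exponential distortion penalty for phrase i,
    given the source start of phrase i and the source end of phrase i-1."""
    return ALPHA ** abs(start_i - end_prev - 1)

print(distortion(4, 3))   # monotone continuation: distance 0 -> penalty 1.0
print(distortion(8, 3))   # jumping ahead over 4 words -> 0.75**4, ~0.316
```

Monotone phrase orders thus cost nothing, and the penalty decays exponentially with the size of the jump.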
Reordering models
• Distortion• Distance-based models• Lexicalised models• Pre-ordering• Clause reordering
Distance Model
• Simple reordering model
– Cost of a reordering depends only on the distance of the reordering
– Distribution can be estimated from alignments
– Or just a Gaussian with mean 1
– Or log p(aj | aj−1, I) = aj − aj−1, i.e. reordering cost proportional to distance
Distance-based Reordering
• Distance-based reordering models have a simplistic view of language
– Consider how many words a phrase moves (e.g. a maximum of 6 words)
• In reality, movement depends on the words
– German: modal verbs require moving the main verb to the end of the sentence
– French: adjectives require moving the noun before the adjective
• Use the aligned training corpus to estimate a model
• These models do not generalize well, since their parameters are tied to absolute word positions within sentences, which tend to be different for the same words across sentences
Relative distortion model
– defined in terms of the difference between the position of the current phrase and the position of the previous phrase in the source sentence.
– Or, assigning a probability distribution over possible relative jumps conditioned on source words.
– Conditioning on the source words allows for a much more fine-grained model. For instance, words that tend to act as modifiers (e.g., adjectives) would have a different distribution than verbs or nouns => this leads to lexicalized reordering models
Reordering models
• Distortion• Distance-based models• Lexicalised models• Pre-ordering• Clause reordering
Lexicalized Reordering Models
• Also called Block Distortion model
– Used in the Moses SMT system
– Condition on actual words
– Different possibilities:
• Condition on source words vs target words
• Condition on words at start of jump (O – outbound) vs words at landing point (I – inbound)

p(aj − aj−1 | fj, fj−1, e_aj, e_aj−1, I/O)

Where:
– aj − aj−1 is the offset
– fj is the foreign word at position j
– fj−1 is the previous foreign word
– e_aj is the source word aligned with the foreign word at position j
– e_aj−1 is the source word aligned with the previous foreign word
– I or O stands for inbound (end of jump) or outbound (start of jump)
Lexicalized model
• Three types of reordering:
– monotone (m)
– swap (s)
– discontinuous (d)
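These three orientations can be detected from the source-side spans of consecutive target phrases; the sketch below is a simplified version of what a lexicalised-reordering extractor does (it compares source spans only, ignoring the full alignment matrix):

```python
def orientation(prev_src_span, cur_src_span):
    """Classify the orientation of phrase i relative to phrase i-1.
    Spans are (start, end) word positions on the source side, inclusive."""
    prev_start, prev_end = prev_src_span
    cur_start, cur_end = cur_src_span
    if cur_start == prev_end + 1:
        return "monotone"       # current phrase directly follows the previous one
    if cur_end == prev_start - 1:
        return "swap"           # current phrase directly precedes the previous one
    return "discontinuous"

print(orientation((0, 1), (2, 4)))  # monotone
print(orientation((2, 4), (0, 1)))  # swap
print(orientation((0, 1), (5, 6)))  # discontinuous
```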
Block Distortion Model
• Given current block, look at links at the corners
• Top: how did I come from previous phrase?
• Bottom: how do I continue to next phrase?
[Diagram: alignment grid (source F × target E) showing the current block with its four corners (left-top, left-bottom, right-top, right-bottom), plus the previous and next blocks.]
Block Distortion Model
• Top-Left: prev-to-current = monotone
Block Distortion Model
• Top-Right: prev-to-current = swap
Block Distortion Model
• Neither top-left nor top-right: prev-to-current = disjoint
Block Distortion Model
• Bottom-Right: current-to-next = monotone
Block Distortion Model
• Bottom-Left: current-to-next = swap
Block Distortion Model
• Neither bottom-left nor bottom-right: current-to-next = disjoint
Reordering models
• Distortion• Distance-based models• Lexicalised models• Pre-ordering• Clause reordering
Extracting reordering rules
• Many types of reorderings are systematic
– move verb group together
– subject – verb – object
– move negation in front of verb
• Write rules by hand or try to derive them from annotated data
Word reordering
• Trying all possible word reorderings is an NP-complete problem, which makes searching for the optimal solution among all possible permutations computationally intractable.
• Therefore, the number of permutations considered is typically limited for efficiency reasons by placing reordering restrictions:
– allow only reordering of at most n words at any given time
– use contiguity restrictions on the reordering
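A quick sketch of how severely such a distance restriction prunes the permutation space (brute force over all permutations, so only practical for short sentences):

```python
from itertools import permutations

def restricted_orderings(n, limit):
    """All permutations of an n-word sentence in which no word moves
    more than `limit` positions from its original slot."""
    return [p for p in permutations(range(n))
            if all(abs(p[i] - i) <= limit for i in range(n))]

# Of the 6! = 720 orderings of a 6-word sentence, very few survive the restriction.
for limit in [0, 1, 2]:
    print(limit, len(restricted_orderings(6, limit)))
```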
Pre-reordering
• A different approach to allow for a limited reordering is to reorder the input sentence such that the source and the target sentences have similar word order and then proceed to monotonically decode the reordered source sentence.
• Pre-reordering rearranges the source words into the order they appear in the target language.
– the reordered input and the output sentence then have the same word order.
• It is very efficient since the optimal decoding can be found in polynomial time.
Rewrite patterns
• A different method is to automatically acquire rewrite patterns to be applied to any given input sentence, so that the rewritten source and target sentences have similar word order.
– rewrite patterns can be automatically extracted by parsing the source and target sides of the training parallel corpus, i.e. using syntactic rules.
– or, rewriting the input sentence can be done using heuristics.
• Reordering is better handled during the search and as part of the optimization function, rather than in a separate preprocessing step.
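A hand-written rewrite pattern of the kind described above can be sketched as follows. It pre-reorders French noun–adjective pairs into English adjective–noun order; the adjective set is a hypothetical stand-in for POS tags that a parser or tagger would supply:

```python
# Hypothetical adjective lexicon standing in for real POS tags.
ADJECTIVES = {"brun", "blanche", "rouge"}

def preorder_noun_adj(words):
    """Swap each noun + following adjective so the French source
    matches English adjective-noun order (pre-reordering sketch)."""
    out = list(words)
    i = 0
    while i < len(out) - 1:
        if out[i + 1] in ADJECTIVES:             # word followed by an adjective
            out[i], out[i + 1] = out[i + 1], out[i]  # swap to adjective-noun
            i += 2
        else:
            i += 1
    return out

print(preorder_noun_adj(["le", "chien", "brun"]))  # ['le', 'brun', 'chien']
```

After this rewrite, "le brun chien" can be decoded monotonically into "the brown dog".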
Word reorderings restrictions
• Allowing a restricted set of word reorderings:
– word reordering is done only for words belonging to the verb group
– word reordering based on reordering operations extracted from syntactic parse trees
Reordering models
• Distortion• Distance-based models• Lexicalised models• Pre-ordering• Clause reordering
Clause Level Reordering
• Why clause structure?
– languages differ vastly in their clause structure (English: SVO, Arabic: VSO, German: fairly free order; a lot of details differ: position of adverbs, sub-clauses, etc.)
– large-scale restructuring is a problem for phrase models
• Restructuring
– reordering of constituents (main focus)
– add/drop/change of function words
Clause Structure
Example from (Koehn, 2006)
Syntax tree from German parser
Clause Reordering When Translating
• Reordering when translating into English
– tree is flattened
– clause level constituents line up
Clause structure
• Clause level reordering is a well defined task
– label German constituents with their English order
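Labelling constituents with their English order and sorting can be sketched as below; the order labels here are supplied by hand for the earlier modal-verb example, whereas in practice a trained model would predict them:

```python
# German clause constituents, each labelled with its position in English order
# (labels hand-supplied for illustration).
clause = [("Ich", 0), ("muss", 1), ("eine Vorlesung", 3),
          ("über die Reihenfolge von Wörten", 4), ("halten", 2)]

# Sorting by label moves the clause-final verb "halten" next to the modal,
# mirroring English "I must teach a class about the order of words".
reordered = [c for c, _ in sorted(clause, key=lambda x: x[1])]
print(" ".join(reordered))
# -> "Ich muss halten eine Vorlesung über die Reihenfolge von Wörten"
```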
Examples of translations improved when using reordering
• we must also this criticism should be taken seriously .
→ we must also take this criticism seriously .
• i am with him that it is necessary , the institutional balance by means of a political revaluation of both the commission and the council to maintain .
→ i agree with him in this , that it is necessary to maintain the institutional balance by means of a political revaluation of both the commission and the council .
• perhaps it would be a constructive dialog between the government and opposition parties , social representative a positive impetus in the right direction .
→ perhaps a constructive dialog between government and opposition parties and social representative could give a positive impetus in the right direction .
“I’m trying to translate what my cat says and put it in a book, but how many homonyms are there for meow?”
Jarod Kintz, The Days of Yay are Here! Wake Me Up When They're Over.