September 2004 CSAW 2004 1
Extraction of Bilingual Information from Parallel Texts
Mike Rosner
Outline
• Machine Translation
• Traditional vs. Statistical Architectures
• Experimental Results
• Conclusions
Translational Equivalence: a many-to-many relation
[Figure: many-to-many mapping between SOURCE and TARGET]
Traditional Machine Translation
Remarks
• Character of System
– Knowledge based.
– High quality results if domain is well delimited.
– Knowledge takes the form of specialised rules (analysis; synthesis; transfer).
• Problems
– Limited coverage.
– Knowledge acquisition bottleneck.
– Extensibility.
Statistical Translation
• Robust
• Domain independent
• Extensible
• Does not require language specialists
• Uses noisy channel model of translation
Noisy Channel Model: Sentence Translation (Brown et al. 1990)
[Figure: noisy channel diagram; labels: source sentence, target sentence, sentence]
The Problem of Translation
• Given a sentence T of the target language, seek the sentence S from which a translator produced T, i.e. find S that maximises P(S|T).
• By Bayes' theorem, P(S|T) = P(S) x P(T|S) / P(T), whose denominator is independent of S.
• Hence it suffices to maximise P(S) x P(T|S).
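The decision rule above can be sketched in a few lines of Python. This is only an illustration: the candidate sentences and every probability value are invented, and `best_source` is a hypothetical helper name, not something from the talk.

```python
# Toy noisy-channel decision rule: choose the source sentence S that
# maximises P(S) * P(T|S). P(T) is ignored because it does not depend on S.

def best_source(candidates, lm_prob, tm_prob):
    """Return the candidate S with the highest P(S) * P(T|S)."""
    return max(candidates, key=lambda s: lm_prob[s] * tm_prob[s])

# Made-up values for the observed target sentence T = "Jean aime Marie".
lm_prob = {  # P(S), language model
    "John loves Mary": 0.020,
    "Mary loves John": 0.020,
    "John Mary loves": 0.001,
}
tm_prob = {  # P(T|S), translation model
    "John loves Mary": 0.30,
    "Mary loves John": 0.05,
    "John Mary loves": 0.30,
}

best = best_source(list(lm_prob), lm_prob, tm_prob)
print(best)  # John loves Mary
```

In this toy setting the language model rules out the ill-formed word order and the translation model rules out the fluent but unfaithful candidate, which is the division of labour the two factors are meant to provide.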
A Statistical MT System
[Figure: the Source Language Model supplies P(S) and the Translation Model supplies P(T|S), with P(S) * P(T|S) = P(S,T); the Decoder takes T and recovers S.]
The Three Components of a Statistical MT model
1. Method for computing language model probabilities (P(S))
2. Method for computing translation probabilities (P(T|S))
3. Method for searching amongst source sentences for one that maximises P(S) * P(T|S)
Probabilistic Language Models
• General: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * ... * P(sn|s1 ... s(n-1))
• Trigram: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * P(s3|s1,s2) * ... * P(sn|s(n-2),s(n-1))
• Bigram: P(s1 s2 ... sn) = P(s1) * P(s2|s1) * ... * P(sn|s(n-1))
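As a rough sketch of the bigram case, the conditional probabilities can be estimated by relative frequency from source-language text. The three-sentence corpus is invented, there is no smoothing, and the start symbol `<s>` is an assumption of this sketch, so that P(s1) is modelled as P(s1|<s>).

```python
from collections import Counter

def train_bigram(sentences):
    """Estimate P(word | prev) by relative frequency (no smoothing)."""
    context_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1

    def cond_prob(word, prev):
        if context_counts[prev] == 0:
            return 0.0  # unseen context
        return bigram_counts[(prev, word)] / context_counts[prev]

    return cond_prob

def sentence_prob(sentence, cond_prob):
    """P(s1 ... sn) as a product of bigram probabilities."""
    tokens = ["<s>"] + sentence.split()
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= cond_prob(word, prev)
    return p

corpus = ["John loves Mary", "John loves Sue", "Mary loves John"]
cond_prob = train_bigram(corpus)
p_seen = sentence_prob("John loves Mary", cond_prob)    # (2/3) * 1 * (1/3)
p_unseen = sentence_prob("Sue loves John", cond_prob)   # 0: "Sue" never starts a sentence
print(p_seen, p_unseen)
```

The zero for the unseen sentence shows why a practical model needs smoothing and far more text than this.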
A Simple Alignment Based Translation Model
Assumption: target sentence is generated from the source sentence word-by-word
S: John loves Mary
T: Jean aime Marie
Sentence Translation Probability
• According to this model, the translation probability of the sentence is just the product of the translation probabilities of the words.
• P(T|S) = P(Jean aime Marie | John loves Mary) = P(Jean|John) * P(aime|loves) * P(Marie|Mary)
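A minimal sketch of this product, assuming a strict one-to-one, same-order pairing of words. The values in `t_prob` are invented for illustration and are not estimates from any corpus.

```python
import math

# Hypothetical word translation probabilities P(t|s), keyed as (target, source).
t_prob = {
    ("Jean", "John"): 0.9,
    ("aime", "loves"): 0.8,
    ("Marie", "Mary"): 0.9,
}

def sentence_translation_prob(target, source, t_prob):
    """P(T|S) as the product of word translation probabilities.

    Assumes the i-th target word is generated by the i-th source word.
    """
    t_words, s_words = target.split(), source.split()
    if len(t_words) != len(s_words):
        raise ValueError("this simple model needs sentences of equal length")
    return math.prod(t_prob.get((t, s), 0.0) for t, s in zip(t_words, s_words))

p = sentence_translation_prob("Jean aime Marie", "John loves Mary", t_prob)
print(round(p, 3))  # 0.648
```

The equal-length check makes the model's limitation explicit: sentence pairs whose words do not line up one-to-one cannot be scored this way.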
More Realistic Example
The proposal will not now be implemented
Les propositions ne seront pas mises en application maintenant
Some Further Parameters
• Word Translation Probability: P(t|s)
• Fertility: the number of words in the target that are paired with each source word: (0 – N)
• Distortion: the difference in sentence position between the source word and the target word: P(i|j,l), the probability of target position i given source position j and target sentence length l
Searching
• Maintain list of hypotheses. Initial hypothesis: (Jean aime Marie | *)
• Search proceeds iteratively. At each iteration we extend the most promising hypotheses with additional words:
Jean aime Marie | John(1) *
Jean aime Marie | * loves(2) *
Jean aime Marie | * Mary(3) *
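The idea of keeping a list of hypotheses and extending the most promising ones can be sketched as a small beam search. This is a deliberate simplification and not the search procedure of Brown et al.: it assumes one source word per target word, chosen left to right, and it scores each extension with a bigram language model times a word translation probability. `T_PROB`, `LM_PROB` and `decode` are hypothetical names with invented values.

```python
import heapq

# Invented parameters for illustration only.
T_PROB = {  # P(t|s), keyed as (target, source)
    ("Jean", "John"): 0.9, ("Jean", "Mary"): 0.05,
    ("aime", "loves"): 0.8, ("aime", "likes"): 0.6,
    ("Marie", "Mary"): 0.9, ("Marie", "John"): 0.05,
}
LM_PROB = {  # bigram P(word | prev), keyed as (prev, word); "<s>" is sentence start
    ("<s>", "John"): 0.5, ("<s>", "Mary"): 0.5,
    ("John", "loves"): 0.6, ("John", "likes"): 0.2,
    ("Mary", "loves"): 0.6, ("Mary", "likes"): 0.2,
    ("loves", "Mary"): 0.5, ("loves", "John"): 0.5,
    ("likes", "Mary"): 0.5, ("likes", "John"): 0.5,
}

def decode(target, beam_width=2):
    """Return the best (score, source_words) hypotheses for a target sentence."""
    source_vocab = sorted({s for (_, s) in T_PROB})
    beam = [(1.0, ())]  # initial hypothesis: no source words chosen yet
    for t_word in target.split():
        extended = []
        for score, prefix in beam:
            prev = prefix[-1] if prefix else "<s>"
            for s_word in source_vocab:
                step = LM_PROB.get((prev, s_word), 0.0) * T_PROB.get((t_word, s_word), 0.0)
                if step > 0.0:
                    extended.append((score * step, prefix + (s_word,)))
        # Keep only the most promising hypotheses for the next iteration.
        beam = heapq.nlargest(beam_width, extended)
    return beam

hypotheses = decode("Jean aime Marie")
best_score, best_words = hypotheses[0]
print(" ".join(best_words))  # John loves Mary
```

With a beam width of 2 the runner-up hypothesis "John likes Mary" survives to the end, which illustrates how pruning trades completeness for speed.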
Parameter Estimation
• In general: large quantities of data
• For language model, we need only source language text.
• For translation model, we need pairs of sentences that are translations of each other.
• Use EM Algorithm (Baum 1972) to optimize model parameters.
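To make the EM step concrete, here is a sketch for a stripped-down model that has word translation probabilities only, with no fertility and no distortion (the style usually called IBM Model 1). That restriction, the three-pair corpus and the function name `train_word_translation` are assumptions of this example, not the setup described in the talk.

```python
from collections import defaultdict

def train_word_translation(pairs, iterations=10):
    """Estimate P(t|s) by EM from (source, target) sentence pairs."""
    pairs = [(s.split(), t.split()) for s, t in pairs]
    target_vocab = {t for _, t_words in pairs for t in t_words}
    # Uniform start: every target word equally likely for every source word.
    prob = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of (t, s) pairings
        total = defaultdict(float)  # expected number of pairings involving s
        for s_words, t_words in pairs:
            for t in t_words:
                norm = sum(prob[(t, s)] for s in s_words)
                for s in s_words:
                    # E-step: posterior probability that s generated t.
                    share = prob[(t, s)] / norm
                    count[(t, s)] += share
                    total[s] += share
        # M-step: re-estimate P(t|s) from the expected counts.
        prob = defaultdict(float, {(t, s): c / total[s] for (t, s), c in count.items()})
    return prob

corpus = [
    ("the house", "la maison"),
    ("the flower", "la fleur"),
    ("a house", "une maison"),
]
t_prob = train_word_translation(corpus)
for t in ("maison", "la", "une"):
    print(t, round(t_prob[(t, "house")], 3))
```

Starting from uniform values, the probability mass for "house" moves towards "maison" because that pairing recurs across sentence pairs, while "la" and "une" are explained better by "the" and "a".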
Experiment (Brown et al. 1990)
• Hansard. 40,000 pairs of sentences = approx. 800,000 words in each language.
• Considered 9,000 most common words in each language.
• Assumptions (initial parameter values)
– each of the 9,000 target words equally likely as translations of each of the source words
– each of the fertilities from 0 to 25 equally likely for each of the 9,000 source words
– each target position equally likely given each source position and target length
English: not

French        Probability
pas           .469
ne            .460
non           .024
pas du tout   .003
faux          .003
plus          .002
ce            .002
que           .002
jamais        .002

Fertility     Probability
2             .758
0             .133
1             .106
English: hear

French        Probability
bravo         .992
entendre      .005
entendu       .002
entends       .001

Fertility     Probability
0             .584
1             .416
Bajada 2003/4
• 400 sentence pairs from Malta/EU accession treaty
• Three different types of alignment
– Paragraph (precision 97%, recall 97%)
– Sentence (precision 91%, recall 95%)
– Word: 2 translation models
  • Model 1: distortion independent
  • Model 2: distortion dependent
Bajada 2003/4
                        Model 1   Model 2
word pairs present        244       244
word pairs identified     145       145
correct                    58        77
incorrect                  87        68
precision                 40%       53%
recall                    24%       32%
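The precision and recall rows follow from the counts in the table: precision is correct / identified and recall is correct / present. A short check, using only the figures reported above (`precision_recall` is a hypothetical helper name):

```python
def precision_recall(correct, identified, present):
    """Precision = correct / identified; recall = correct / present."""
    return correct / identified, correct / present

results = {}
for name, correct in [("Model 1", 58), ("Model 2", 77)]:
    precision, recall = precision_recall(correct, identified=145, present=244)
    results[name] = (precision, recall)
    print(f"{name}: precision {precision:.0%}, recall {recall:.0%}")
# Model 1: precision 40%, recall 24%
# Model 2: precision 53%, recall 32%
```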
Conclusion/Future Work
• Larger data sets
• Finer models of word/word translation probabilities taking into account
– fertility
– morphological variants of the same words
• Role and tools for bilingual informant (not linguistic specialist)