Extraction of Bilingual Information from Parallel Texts
September 2004 CSAW 2004 1
Mike Rosner
Outline
• Machine Translation
• Traditional vs. Statistical Architectures
• Experimental Results
• Conclusions
Translational Equivalence: a many-to-many relation
[Figure: SOURCE and TARGET expressions linked by a many-to-many relation]
Traditional Machine Translation
Remarks
• Character of System
  – Knowledge based.
  – High-quality results if the domain is well delimited.
  – Knowledge takes the form of specialised rules (analysis; synthesis; transfer).
• Problems
  – Limited coverage.
  – Knowledge acquisition bottleneck.
  – Extensibility.
Statistical Translation
• Robust
• Domain independent
• Extensible
• Does not require language specialists
• Uses noisy channel model of translation
Noisy Channel Model: Sentence Translation (Brown et al. 1990)
[Figure: noisy channel diagram linking the source sentence and the target sentence]
The Problem of Translation
• Given a sentence T of the target language, seek the sentence S from which a translator produced T, i.e. find the S that maximises P(S|T).
• By Bayes' theorem,
  $$P(S \mid T) = \frac{P(S)\,P(T \mid S)}{P(T)}$$
  whose denominator is independent of S.
• Hence it suffices to maximise P(S) * P(T|S).
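The argument above can be checked numerically. The sketch below is only an illustration: the candidate sentences and all probability values are invented, and P(T) is computed by treating the listed candidates as the only sentences with non-zero P(T|S).

```python
# Hypothetical candidates S for one fixed target sentence T.
# Each entry maps S to (P(S), P(T|S)); all values are made up.
candidates = {
    "John loves Mary": (0.020, 0.60),
    "Mary loves John": (0.020, 0.05),
    "John likes Mary": (0.010, 0.30),
}

# Joint score P(S) * P(T|S) for every candidate.
joint = {s: p_s * p_t_given_s for s, (p_s, p_t_given_s) in candidates.items()}

# P(T) is the same constant for every S (assumption: no other S contributes).
p_t = sum(joint.values())

# Posterior P(S|T) by Bayes' theorem.
posterior = {s: score / p_t for s, score in joint.items()}

best_by_joint = max(joint, key=joint.get)
best_by_posterior = max(posterior, key=posterior.get)
print(best_by_joint, best_by_posterior)
```

Dividing every score by the same positive constant cannot change their ranking, so both selections agree.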
A Statistical MT System
[Figure: a Source Language Model supplies P(S) and a Translation Model supplies P(T|S), with P(S) * P(T|S) = P(S,T); S is mapped to T, and the Decoder maps T back to S]
The Three Components of a Statistical MT model
1. Method for computing language model probabilities (P(S))
2. Method for computing translation probabilities (P(T|S))
3. Method for searching amongst source sentences for one that maximises P(S) * P(T|S)
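Probabilistic Language Models
• General: $P(s_1 s_2 \ldots s_n) = P(s_1)\,P(s_2 \mid s_1) \cdots P(s_n \mid s_1 \ldots s_{n-1})$
• Trigram: $P(s_1 s_2 \ldots s_n) = P(s_1)\,P(s_2 \mid s_1)\,P(s_3 \mid s_1, s_2) \cdots P(s_n \mid s_{n-2}, s_{n-1})$
• Bigram: $P(s_1 s_2 \ldots s_n) = P(s_1)\,P(s_2 \mid s_1) \cdots P(s_n \mid s_{n-1})$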
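As a rough illustration of the bigram case, the sketch below estimates bigram probabilities by relative frequency from a tiny invented corpus and multiplies them along a sentence. The corpus, the function names, and the `<s>` start marker (which stands in for the P(s1) term) are assumptions made for this example; no smoothing is applied, so unseen bigrams get probability zero.

```python
from collections import Counter

def train_bigram(sentences):
    """Estimate P(w_i | w_{i-1}) by relative frequency (no smoothing)."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        histories.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))   # adjacent word pairs
    return {pair: n / histories[pair[0]] for pair, n in bigrams.items()}

def sentence_probability(model, sentence):
    """P(s1...sn) = P(s1|<s>) * P(s2|s1) * ... * P(sn|s(n-1))."""
    words = ["<s>"] + sentence.split()
    p = 1.0
    for pair in zip(words, words[1:]):
        p *= model.get(pair, 0.0)               # unseen bigram -> probability 0
    return p

# Toy corpus, invented for illustration.
corpus = ["John loves Mary", "Mary loves John", "John sleeps"]
model = train_bigram(corpus)
p = sentence_probability(model, "John loves Mary")
print(p)
```

A trigram model follows the same pattern with two-word histories; it needs far more data to estimate reliably.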
A Simple Alignment Based Translation Model
Assumption: target sentence is generated from the source sentence word-by-word
S: John loves Mary
T: Jean aime Marie
Sentence Translation Probability
• According to this model, the translation probability of the sentence is just the product of the translation probabilities of the words.
• P(T|S) = P(Jean aime Marie | John loves Mary) = P(Jean|John) * P(aime|loves) * P(Marie|Mary)
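A minimal sketch of this product, assuming a one-to-one alignment in the same word order. The word translation probabilities in `T_PROB` are invented for illustration and do not come from the talk.

```python
from math import prod

# Hypothetical word translation probabilities P(t|s).
T_PROB = {
    ("Jean", "John"): 0.9,
    ("aime", "loves"): 0.8,
    ("Marie", "Mary"): 0.9,
}

def sentence_translation_probability(target, source, t_prob):
    """P(T|S) as the product of P(t_i|s_i) under a word-by-word alignment."""
    if len(target) != len(source):
        raise ValueError("this simple model needs sentences of equal length")
    # Word pairs missing from the table are treated as impossible.
    return prod(t_prob.get((t, s), 0.0) for t, s in zip(target, source))

p = sentence_translation_probability(
    "Jean aime Marie".split(), "John loves Mary".split(), T_PROB
)
print(p)
```

The equal-length restriction is exactly what breaks down on realistic sentence pairs, which motivates the further parameters introduced later in the talk.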
More Realistic Example
The proposal will not now be implemented
Les propositions ne seront pas mises en application maintenant
Some Further Parameters
• Word translation probability: P(t|s)
• Fertility: the number of target words paired with each source word (0 to N)
• Distortion: the difference in sentence position between the source word and the target word: P(i|j,l), where i is the target position, j the source position, and l the target length
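To make fertility and distortion concrete, the sketch below reads both off a word alignment for the "proposal" sentence pair. The alignment itself is an assumption made for this example (one plausible linking, with "be" left unaligned); it is not given in the slides.

```python
from collections import Counter

source = "The proposal will not now be implemented".split()
target = "Les propositions ne seront pas mises en application maintenant".split()

# Assumed alignment: target position -> source position (both 1-based).
ALIGNMENT = {1: 1, 2: 2, 3: 4, 4: 3, 5: 4, 6: 7, 7: 7, 8: 7, 9: 5}

def fertilities(source, alignment):
    """Number of target words linked to each source word, in source order."""
    links = Counter(alignment.values())
    return [(word, links[j]) for j, word in enumerate(source, start=1)]

def distortion_events(target, alignment):
    """(i, j, l) triples: target position i, source position j, target length l."""
    l = len(target)
    return [(i, j, l) for i, j in sorted(alignment.items())]

fert = fertilities(source, ALIGNMENT)
events = distortion_events(target, ALIGNMENT)
print(fert)
print(events)
```

Under this alignment "not" has fertility 2 (ne, pas), "implemented" has fertility 3 (mises en application), and "be" has fertility 0; each (i, j, l) triple is one observation for estimating P(i|j,l).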
Searching
• Maintain list of hypotheses. Initial hypothesis: (Jean aime Marie | *)
• Search proceeds iteratively. At each iteration we extend the most promising hypotheses with additional words:
  Jean aime Marie | John(1) *
  Jean aime Marie | * loves(2) *
  Jean aime Marie | * Mary(3) *
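The hypothesis-extension idea can be sketched as a best-first search over partial source sentences. This is a simplified stand-in, not the decoder described in the talk: it builds the source sentence strictly left to right, assumes a word-by-word alignment, and uses invented bigram and translation probabilities (`BIGRAM`, `T_PROB`, `VOCAB` are all hypothetical).

```python
import heapq

# Hypothetical toy parameters.
BIGRAM = {
    ("<s>", "John"): 0.6, ("<s>", "Mary"): 0.4,
    ("John", "loves"): 0.5, ("John", "likes"): 0.5,
    ("loves", "Mary"): 0.7, ("likes", "Mary"): 0.7,
}
T_PROB = {
    ("Jean", "John"): 0.9, ("aime", "loves"): 0.6,
    ("aime", "likes"): 0.3, ("Marie", "Mary"): 0.9,
}
VOCAB = ["John", "loves", "likes", "Mary"]

def search(target):
    """Return the best complete source hypothesis and its score P(S) * P(T|S)."""
    # Heap entries are (-score, partial source sentence); start with the empty hypothesis.
    heap = [(-1.0, ())]
    while heap:
        neg_score, hyp = heapq.heappop(heap)      # most promising hypothesis
        if len(hyp) == len(target):
            return hyp, -neg_score                # first complete hypothesis popped is best
        prev = hyp[-1] if hyp else "<s>"
        next_target = target[len(hyp)]
        for word in VOCAB:                        # extend with one more source word
            score = (-neg_score
                     * BIGRAM.get((prev, word), 0.0)
                     * T_PROB.get((next_target, word), 0.0))
            if score > 0.0:
                heapq.heappush(heap, (-score, hyp + (word,)))
    return None, 0.0

best, score = search(("Jean", "aime", "Marie"))
print(best, score)
```

Because every extension multiplies the score by a probability of at most 1, scores never increase, so the first complete hypothesis taken from the queue has the highest score. A realistic decoder would also handle fertility and distortion and would prune the hypothesis list.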
Parameter Estimation
• In general, large quantities of data are needed.
• For language model, we need only source language text.
• For translation model, we need pairs of sentences that are translations of each other.
• Use EM Algorithm (Baum 1972) to optimize model parameters.
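As a hedged illustration of EM on sentence pairs, the sketch below estimates word translation probabilities only, in the style of the lexical model usually called IBM Model 1. It is a simplification: it ignores fertility and distortion, has no empty (NULL) source word, and runs on a three-pair corpus invented for this example.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """EM estimate of word translation probabilities t_prob[(t, s)] = P(t|s)."""
    source_vocab = {s for src, _ in pairs for s in src}
    target_vocab = {t for _, tgt in pairs for t in tgt}
    # Start from a uniform distribution over target words for every source word.
    t_prob = {(t, s): 1.0 / len(target_vocab)
              for t in target_vocab for s in source_vocab}
    for _ in range(iterations):
        count = defaultdict(float)   # expected number of (t, s) links
        total = defaultdict(float)   # expected number of links from s
        # E-step: share each target word among the source words of its sentence.
        for src, tgt in pairs:
            for t in tgt:
                norm = sum(t_prob[(t, s)] for s in src)
                for s in src:
                    fraction = t_prob[(t, s)] / norm
                    count[(t, s)] += fraction
                    total[s] += fraction
        # M-step: renormalise expected counts into probabilities.
        for (t, s) in t_prob:
            t_prob[(t, s)] = count[(t, s)] / total[s]
    return t_prob

# Toy parallel corpus (source, target), invented for illustration.
pairs = [
    (("the", "house"), ("la", "maison")),
    (("the", "flower"), ("la", "fleur")),
    (("a", "house"), ("une", "maison")),
]
t_prob = train_model1(pairs)
print(round(t_prob[("maison", "house")], 3), round(t_prob[("la", "house")], 3))
```

Even though no alignments are given, "maison" ends up as the preferred translation of "house" because the two co-occur in more than one sentence pair.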
Experiment (Brown et al. 1990)
• Hansard: 40,000 pairs of sentences, approx. 800,000 words in each language.
• Considered 9,000 most common words in each language.
• Assumptions (initial parameter values)
  – each of the 9,000 target words equally likely as translations of each of the source words
  – each of the fertilities from 0 to 25 equally likely for each of the 9,000 source words
  – each target position equally likely given each source position and target length
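The uniform starting point can be written down directly. The sketch below only restates the three assumptions as arithmetic; the function and constant names are invented for this example.

```python
VOCAB_SIZE = 9000        # most common words considered in each language
MAX_FERTILITY = 25       # fertilities range over 0..25

def initial_translation_prob():
    """P(t|s): every target word equally likely for every source word."""
    return 1.0 / VOCAB_SIZE

def initial_fertility_prob():
    """P(fertility|s): each of the 26 values 0..25 equally likely."""
    return 1.0 / (MAX_FERTILITY + 1)

def initial_distortion_prob(target_length):
    """P(i|j,l): every target position equally likely, given j and l."""
    return 1.0 / target_length

# Number of word translation parameters implied by the two vocabularies.
n_translation_params = VOCAB_SIZE * VOCAB_SIZE
print(initial_translation_prob(), initial_fertility_prob(),
      initial_distortion_prob(9), n_translation_params)
```

Starting from uniform values means that everything the model ends up knowing about word pairs comes from the parallel data.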
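English: not

| French | Probability |
|---|---|
| pas | .469 |
| ne | .460 |
| non | .024 |
| pas du tout | .003 |
| faux | .003 |
| plus | .002 |
| ce | .002 |
| que | .002 |
| jamais | .002 |

| Fertility | Probability |
|---|---|
| 2 | .758 |
| 0 | .133 |
| 1 | .106 |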
English: hear

| French | Probability |
|---|---|
| bravo | .992 |
| entendre | .005 |
| entendu | .002 |
| entends | .001 |

| Fertility | Probability |
|---|---|
| 0 | .584 |
| 1 | .416 |
Bajada 2003/4
• 400 sentence pairs from Malta/EU accession treaty
• Three different types of alignment
  – Paragraph (precision 97%, recall 97%)
  – Sentence (precision 91%, recall 95%)
  – Word: 2 translation models
    • Model 1: distortion independent
    • Model 2: distortion dependent
Bajada 2003/4
| Word alignment | Model 1 | Model 2 |
|---|---|---|
| word pairs present | 244 | 244 |
| word pairs identified | 145 | 145 |
| correct | 58 | 77 |
| incorrect | 87 | 68 |
| precision | 40% | 53% |
| recall | 24% | 32% |
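The precision and recall figures follow from the counts in the table: precision is correct pairs over pairs identified, recall is correct pairs over pairs present. A short check, with a helper name chosen for this example:

```python
def precision_recall(correct, identified, present):
    """Precision = correct / identified; recall = correct / present."""
    return correct / identified, correct / present

results = {}
for name, correct in (("Model 1", 58), ("Model 2", 77)):
    p, r = precision_recall(correct, identified=145, present=244)
    results[name] = (p, r)
    print(f"{name}: precision {p:.0%}, recall {r:.0%}")
```

Both models propose the same number of pairs, so the gain of Model 2 comes entirely from a higher share of correct pairs.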
Conclusion/Future Work
• Larger data sets
• Finer models of word/word translation probabilities taking into account
  – fertility
  – morphological variants of the same words
• Role and tools for bilingual informant (not linguistic specialist)