A Probabilistic Framework for Structure-based Alignment
Language & Knowledge Engineering Lab
Kurohashi-lab M256430 Toshiaki Nakazawa
Outline
I. Introduction of Machine Translation
  i. What is Alignment?
  ii. Statistical Machine Translation (SMT)
  iii. Example-based Machine Translation (EBMT)
II. Baseline alignment method
III. A probabilistic framework for alignment
  i. Corresponding Pattern score (CP-score)
  ii. Integration of Maximum Entropy (ME)
IV. Experiments and results
V. Discussion and conclusion
Standard Way of Machine Translation
[Figure: Parallel Corpus → Alignment → Resource; Input → Translation → Output]
Parallel Corpus: Text written in two different languages whose content is almost the same.
Alignment: Finding the correspondences between two parallel sentences (word level, phrase level, etc.).
The performance of alignment affects the accuracy of translation.
Statistical Machine Translation (SMT)
• Learns translation models statistically from a parallel corpus
• Does not use any linguistic resources
• Small translation unit (= “word”)
  – Recently, a growing number of studies handle larger units (= “a couple of words” or “phrase”)
• Requires a large parallel corpus for highly accurate translation
Basic Method for SMT
Translate by maximizing the probability:
$$\arg\max_{E} P(E \mid J) = \arg\max_{E} P(E)\, P(J \mid E)$$
$P(E)$: Language Model, $P(J \mid E)$: Translation Model
Learned from a parallel corpus (usually with an unsupervised learning algorithm)
Ex) IBM Model [Brown et al., 93]
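The maximization above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the candidate list, the romanized source sentence, and the probability values are made up, and a real system searches a far larger space with learned models. The sketch only shows that the chosen E is the one with the largest product P(E) · P(J | E), computed in log space.

```python
import math

def decode(candidates, lm_prob, tm_prob, source):
    """Return the candidate E that maximizes P(E) * P(J | E)."""
    def log_score(e):
        # Sum of logs is equivalent to the product and avoids underflow.
        return math.log(lm_prob[e]) + math.log(tm_prob[(source, e)])
    return max(candidates, key=log_score)

# Toy, made-up numbers for a single (hypothetical) source sentence J.
J = "kuruma ga kita"
candidates = ["the car came", "car the came", "a car went"]
lm_prob = {"the car came": 0.020, "car the came": 0.001, "a car went": 0.015}
tm_prob = {
    (J, "the car came"): 0.30,
    (J, "car the came"): 0.30,
    (J, "a car went"): 0.05,
}

best = decode(candidates, lm_prob, tm_prob, J)
print(best)
```

In this toy setting the language model separates the two word orders that the translation model scores equally.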
Overview of EBMT
[Figure: Parallel Corpus → Alignment → TMDB (Translation Memory Database); Input → Translation → Output. Label in the figure: Advanced NLP technologies]
Example-based Machine Translation (EBMT)
• Divide the input sentence into a few parts
• Find similar expressions (examples) in the parallel corpus for each part
• Combine the examples to generate the output translation
• Use any linguistic resources as much as possible
• A larger translation unit (larger example) is better
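The divide, find, and combine steps can be sketched on flat word sequences. This is a deliberately simplified illustration and not the system described in these slides: the `memory` dictionary and its entries are toy assumptions, matching is exact rather than similarity-based, and no reordering is done (a structure-based system would combine examples over dependency trees). Trying the longest span first reflects the preference for larger examples.

```python
def translate_by_examples(words, memory):
    """Greedy longest-match lookup of examples in a toy translation memory."""
    output, i = [], 0
    while i < len(words):
        # Try the longest remaining span first, then shorter ones.
        for j in range(len(words), i, -1):
            chunk = " ".join(words[i:j])
            if chunk in memory:
                output.append(memory[chunk])
                i = j
                break
        else:
            # No example covers this word: pass it through unchanged.
            output.append(words[i])
            i += 1
    return " ".join(output)

# Toy translation memory (hypothetical entries for illustration only).
memory = {
    "at the intersection": "交差点で",
    "the car": "あの車が",
    "came": "来た",
    "the": "その",
}

result = translate_by_examples("the car came at the intersection".split(), memory)
print(result)
```

The larger example "the car" is chosen over the single word "the", and the output keeps the source word order, which is exactly the limitation that structural alignment is meant to address.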
Flow of EBMT
SMT vs. EBMT
| | SMT | EBMT |
|---|---|---|
| Good points | Works well enough for languages that lack sufficient NLP resources. | Actively utilizes any kind of NLP resource. High performance. |
| Bad points | Not easy to achieve high performance. Weak when the two languages differ widely. | Algorithm is usually heuristic. Modification is necessary for each language pair. |
We introduce a probabilistic framework for structure-based alignment.
Alignment
1. Transformation into dependency structure
J: 交差点で、突然あの車が飛び出して来たのです。
E: The car came at me from the side at the intersection.
J: JUMAN/KNP, E: Charniak’s nlparser → Dependency tree
[Figure: dependency trees. J nodes: 交差点 で、 / 突然 / あの車 が / 飛び出して 来た のです. E nodes: the car / came / at me / from the side / at the intersection]
2. Detection of word(s) correspondences
• Bilingual dictionaries
• Transliteration detection
  ローズワイン → rosuwain ⇔ rose wine (similarity: 0.78)
  新宿 → shinjuku ⇔ shinjuku (similarity: 1.0)
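The slides do not say how the transliteration similarity is computed. As one possible sketch, assuming a normalized edit distance between the romanized Japanese word and the English word (both function names are hypothetical), identical strings score 1.0 and partial matches fall between 0 and 1. This measure is not expected to reproduce the 0.78 shown above.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def similarity(romaji: str, english: str) -> float:
    """1 - (edit distance / longer length), ignoring spaces and case."""
    a = romaji.replace(" ", "").lower()
    b = english.replace(" ", "").lower()
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))

print(similarity("shinjuku", "shinjuku"))
print(similarity("rosuwain", "rose wine"))
```

A threshold on this value would then decide whether the pair is accepted as a correspondence; the threshold itself would have to be tuned.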
3. Disambiguation of correspondences
Disambiguation
[Figure: dependency trees. J nodes: 日本 で / 保険 会社 に対して / 保険 請求 の / 申し立て が / 可能です よ. E nodes: you / will have / to file / an insurance claim / with the insurance office / in Japan]
C_unamb → C_amb: 1 / (distance in J tree) + 1 / (distance in E tree)
Example: 1/2 + 1/1 = 1.5
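The distance-based score can be sketched as follows. The tree encoding (a child-to-parent dictionary), the node names, and the function names are assumptions for illustration; distance is taken as the number of edges on the path between two nodes, and the two correspondences are assumed to sit on different nodes so that neither distance is zero.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root of its tree."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def tree_distance(a, b, parent):
    """Number of edges between a and b in a tree given as child -> parent."""
    steps_from_a = {n: d for d, n in enumerate(path_to_root(a, parent))}
    for steps_from_b, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:  # first common ancestor
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes are not in the same tree")

def support_score(unamb, amb, parent_j, parent_e):
    """1/(distance in J tree) + 1/(distance in E tree).

    Each correspondence is a (j_node, e_node) pair.
    """
    d_j = tree_distance(unamb[0], amb[0], parent_j)
    d_e = tree_distance(unamb[1], amb[1], parent_e)
    return 1.0 / d_j + 1.0 / d_e

# Toy trees: j1 -> j2 -> j3 (root), e1 -> e2 (root).
parent_j = {"j1": "j2", "j2": "j3"}
parent_e = {"e1": "e2"}
score = support_score(("j1", "e1"), ("j3", "e2"), parent_j, parent_e)
print(score)  # 1/2 + 1/1
```

An ambiguous candidate that lies close to unambiguous correspondences in both trees receives a higher score than one that is far away.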
4. Handling of remaining phrases
5. Registration to translation example database
Corresponding Pattern (CP)
[Figure: example CPs. Pair labels shown in the trees: (1, 2) and (1, 1); (0, 2) and (0, 1); (0, 1) and (0, 1). Resulting CPs: (1, 2, 1, 1), (0, 2, 0, 1), (0, 1, 0, 1)]
CP-score
• Assign a score to each CP (= CP-score)
• Calculation of CP-score
  – Count the frequency of each CP, using the parallel corpus aligned by the baseline alignment method
  – Divide the frequency by the total frequency of all CPs (the CP-score is a probability of occurrence)
• Alignment Score (AS) by CP-score
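$$AS = \bigoplus_{i=1}^{M-1} \bigoplus_{j=i+1}^{M} \text{CP-score}_{i,j}$$
(⊕ stands for the aggregation operator over pairs of correspondences, which did not survive in this transcript.)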
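The CP-score estimation and the pairwise aggregation can be sketched as below. Several things are assumptions: the function names, the `cp_of` lookup that returns the CP for a pair of correspondences, the toy counts, and above all the aggregation operator, which is not legible in the transcript. The `combine` argument defaults to a product only as a guess; passing `sum` gives the other obvious reading.

```python
import math
from collections import Counter
from itertools import combinations

def estimate_cp_scores(observed_cps):
    """Relative frequency of each CP among all observed CPs."""
    counts = Counter(observed_cps)
    total = sum(counts.values())
    return {cp: n / total for cp, n in counts.items()}

def alignment_score(correspondences, cp_of, cp_scores, combine=math.prod):
    """Aggregate CP-scores over all pairs (i < j) of correspondences.

    Unseen CPs get a score of 0.0 in this sketch (no smoothing).
    """
    pair_scores = [cp_scores.get(cp_of(a, b), 0.0)
                   for a, b in combinations(correspondences, 2)]
    return combine(pair_scores)

# Toy CP observations, standing in for a corpus aligned by the baseline method.
observed = [(0, 1, 0, 1)] * 6 + [(0, 2, 0, 1)] * 3 + [(1, 2, 1, 1)]
cp_scores = estimate_cp_scores(observed)

# Hypothetical alignment with three correspondences and a fixed CP per pair.
pair_cp = {
    ("c1", "c2"): (0, 1, 0, 1),
    ("c1", "c3"): (0, 2, 0, 1),
    ("c2", "c3"): (1, 2, 1, 1),
}
score = alignment_score(["c1", "c2", "c3"], lambda a, b: pair_cp[(a, b)], cp_scores)
print(cp_scores, score)
```

`itertools.combinations` enumerates exactly the index pairs i = 1, …, M−1 and j = i+1, …, M from the formula.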
Alignment Disambiguation by AS
Adopt the alignment with the highest AS
Maximum Entropy (ME)
The principle of maximum entropy:
– a method for analyzing the available information in order to determine a unique epistemic probability distribution. (from Wikipedia)
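Alignment probability with ME [Och et al., 02]

$$\Pr(A \mid S, T) = \frac{\exp\left[\sum_{m=1}^{M} \lambda_m h_m(A, S, T)\right]}{\sum_{A'} \exp\left[\sum_{m=1}^{M} \lambda_m h_m(A', S, T)\right]}$$

S: Source sentence
T: Target sentence
A: Alignment
$h_m(A, S, T)$: Feature function
$\lambda_m$: Model parameter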
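The log-linear form above amounts to a softmax over candidate alignments. The sketch below assumes that the feature values h_m have already been computed for each candidate; the function name, the feature vectors, and the weights are made up for illustration, and training the weights λ_m is not shown.

```python
import math

def alignment_probabilities(feature_vectors, weights):
    """Pr(A | S, T) for each candidate A from its feature values h_m(A, S, T)."""
    raw = [sum(w * h for w, h in zip(weights, feats)) for feats in feature_vectors]
    shift = max(raw)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(r - shift) for r in raw]
    z = sum(exps)     # normalization over all candidate alignments A'
    return [e / z for e in exps]

# Two hypothetical candidate alignments with two features each.
candidates = [[1.0, 0.5], [0.2, 0.9]]
weights = [1.0, 0.5]
probs = alignment_probabilities(candidates, weights)
print(probs)
```

The shift by the maximum cancels in the ratio, so the result equals the formula as written while avoiding overflow for large scores.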
Feature Functions
1. Alignment Score (AS)
2. Parse score (Jap. and Eng.)
3. Depth pattern score (DP-score)
4. Probability of lexicon (Jap. and Eng.)
5. Coverage of the correspondences (Jap. and Eng.)
6. Average size of the correspondences (Jap. and Eng.)
Experiments
• Selected 500 moderately long sentences from the BTEC corpus of the IWSLT2005 training data set
• Manually annotated phrase-to-phrase alignment
• Conducted 5-fold cross validation
  – 400 for training and 100 for testing
• Calculated the F-measure
$$F = \frac{2PR}{P + R}$$
P: Precision
R: Recall
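As a small worked sketch of the F-measure, assume alignments are represented as sets of (source index, target index) links. That representation and the toy links are assumptions; the slides do not specify how phrase-level links were counted.

```python
def f_measure(predicted, gold):
    """F = 2PR / (P + R) over sets of alignment links."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)

# Toy example: 2 of 3 predicted links are correct, 2 of 4 gold links are found.
predicted = {(0, 0), (1, 1), (2, 3)}
gold = {(0, 0), (1, 1), (2, 2), (3, 3)}
print(f_measure(predicted, gold))  # P = 2/3, R = 1/2, F = 4/7
```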
Results
| Method | All sentences w/ function words | All sentences w/o function words | Ambiguous sentences w/ function words |
|---|---|---|---|
| Baseline | 63.86 | 65.14 | 60.43 |
| +CP-score | 64.21 | 65.54 | 61.60 |
| +ME | 64.58 | 66.03 | 63.00 |
| GIZA++ | 22.14 | 52.85 | 23.78 |
Discussion
• Not considering clauses
  – Correspondences in the same clause of the source sentence are likely to be in the same clause of the target sentence
• Sentence complexity
  – The proposed method works effectively for long and complex sentences
• Preciseness of the dictionary
  – Erroneous correspondences from the dictionary have a bad effect on alignment
Conclusion
Proposed a probabilistic framework to improve structure-based alignment
Proposed a new criterion, the CP-score, for evaluating alignment
Integrated the ME model into the alignment approach
Future Work
Refine the CP and CP-score
– Consider clauses
Select the feature functions
Test our method on other corpora
– Longer and more complex sentences