The ICT Statistical Machine Translation Systems for IWSLT 2007
Zhongjun He, Haitao Mi, Yang Liu, Deyi Xiong, Weihua Luo, Yun Huang, Zhixiang Ren, Yajuan Lu, Qun Liu
Institute of Computing Technology, Chinese Academy of Sciences
2007.10.15–2007.10.16

Outline
Overview
MT Systems
• Bruin
• Confucius
• Lynx
Official Evaluation
Discussion
Summary

Introduction of Our Group
Multilingual Interaction Technology Laboratory, Institute of Computing Technology, Chinese Academy of Sciences
Long history of work on MT
• Rule-based
• Example-based
Focus on SMT since 2004
Website: http://mtgroup.ict.ac.cn/

People Working on SMT at ICT
Staff
• Qun Liu (Researcher)
• Yajuan Lu (Associate Researcher)
• Yang Liu (Associate Researcher)
• Weihua Luo (Assistant Researcher)
PhD Students
• Zhongjun He
• Haitao Mi
• Jinsong Su
• Yang Feng
Master's Students
• Yun Huang
• Wenbin Jiang
• Zhixiang Ren
• …
IWSLT 2007 Evaluation
Chinese-English transcript translation task

Systems for IWSLT 2007 Evaluation
MT Systems:
• Bruin (formally syntax-based)
• Confucius (extended phrase-based)
• Lynx (linguistically syntax-based)

Bruin
Bruin is a formally syntax-based system
MaxEnt reordering model built on BTG rules
• Regard reordering as a binary classification problem
• Build a MaxEnt-based classifier
• Use boundary words instead of whole phrases as features for the classifier
[Figure: straight vs. inverted order of two blocks, source against target]

Features
• Source and target boundary words (lexical feature)
• Combinations of boundary words (collocation feature)
[Figure: blocks b1 and b2 with source boundary head words C1, C2 and target boundary head words E1, E2]
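
The feature slide can be made concrete with a minimal Python sketch. It assumes that the boundary word of a block is the first word on each side, and that the classifier is a two-class MaxEnt (softmax) model over the orders straight and inverted. The function names, feature strings and toy weight are hypothetical and are not taken from Bruin.

```python
import math

def boundary_features(b1, b2):
    """Lexical and collocation features from two neighboring blocks.

    Each block is a (source_words, target_words) pair. Assumption: the
    boundary word is the first word on each side.
    """
    c1, e1 = b1[0][0], b1[1][0]
    c2, e2 = b2[0][0], b2[1][0]
    lexical = [f"C1={c1}", f"E1={e1}", f"C2={c2}", f"E2={e2}"]
    collocation = [f"C1C2={c1}|{c2}", f"E1E2={e1}|{e2}"]
    return lexical + collocation

def reorder_probs(features, weights):
    """MaxEnt distribution over the two orders, given the active features."""
    orders = ("straight", "inverted")
    scores = {o: sum(weights.get((f, o), 0.0) for f in features) for o in orders}
    z = sum(math.exp(s) for s in scores.values())
    return {o: math.exp(s) / z for o, s in scores.items()}

# Toy blocks and a single hand-set weight (illustrative only).
b1 = (["中国", "的"], ["of", "China"])
b2 = (["经济", "发展"], ["economic", "development"])
toy_weights = {("C1=中国", "inverted"): 1.0}
print(reorder_probs(boundary_features(b1, b2), toy_weights))
```

With no weights at all, both orders get probability 0.5, which is the expected behavior of an untrained model.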

Training and Decoding
Training the model
• Learn reordering examples from a word-aligned bilingual corpus
• Generate features from the reordering examples
• Train a MaxEnt model on the features
Decoding
• CKY algorithm
For details, see Xiong et al., ACL 2006
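
As an illustration of the first training step, the sketch below labels a pair of neighboring blocks as a straight or inverted reordering example. It assumes blocks are given as end-exclusive source and target spans that have already been extracted from the word alignment. The function name `orientation` and the span representation are hypothetical.

```python
def orientation(b1, b2):
    """Return 'straight', 'inverted', or None for two blocks.

    Each block is ((src_start, src_end), (tgt_start, tgt_end)) with
    end-exclusive spans. b1 must end where b2 starts on the source side.
    """
    (s1, t1), (s2, t2) = b1, b2
    if s1[1] != s2[0]:
        return None  # not neighbors on the source side
    if t1[1] == t2[0]:
        return "straight"  # target order follows source order
    if t2[1] == t1[0]:
        return "inverted"  # target order is swapped
    return None  # target spans are not adjacent

# 中国 的 -> "of China" (target words 2-3), 经济 发展 -> "economic development" (target words 0-1)
print(orientation(((0, 2), (2, 4)), ((2, 4), (0, 2))))  # inverted
```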

Confucius
An extended phrase-based system
Log-linear model
Monotone decoding
We try a phrase-based similarity model, in which the translation of one source phrase can be reused for other, similar phrases
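
Monotone decoding can be sketched as a dynamic program over source prefixes. This is a simplified illustration, not the Confucius decoder: it assumes a phrase table that maps source word tuples to (target string, log score) options, uses a single score per phrase instead of a full log-linear feature set, and has no language model. The names `monotone_decode`, `phrase_table` and `max_len` are hypothetical.

```python
import math

def monotone_decode(source, phrase_table, max_len=4):
    """Best monotone segmentation; best[i] is the best cover of source[:i]."""
    n = len(source)
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for i in range(n):
        score_i, out_i = best[i]
        if score_i == -math.inf:
            continue  # no translation reaches this prefix
        for j in range(i + 1, min(n, i + max_len) + 1):
            for tgt, logp in phrase_table.get(tuple(source[i:j]), []):
                cand = score_i + logp
                if cand > best[j][0]:
                    # Phrases are appended left to right, never reordered.
                    best[j] = (cand, out_i + [tgt])
    return best[n]

toy_table = {
    ("经济",): [("economic", -0.5)],
    ("发展",): [("development", -0.4)],
    ("经济", "发展"): [("economic development", -0.7)],
}
score, phrases = monotone_decode(["经济", "发展"], toy_table)
print(" ".join(phrases))  # economic development
```

If the phrase table cannot cover the input, the function returns a score of negative infinity and an empty phrase list.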

Phrase-based Similarity Model
Input phrase: 全省 出口 总值 的 25.5%
1. Find the most similar phrase pair: 全市 出口 总值 的 半数 → half of the entire city 's export volume
2. Compare: 全省 出口 总值 的 25.5% against 全市 出口 总值 的 半数
3. Replace: 全省 出口 总值 的 25.5% → 25.5% of the entire province 's export volume
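
The find, compare and replace steps can be sketched in Python on the slide's own example. The slides do not define the similarity measure, the word alignment or the lexicon, so all three are assumptions here: similarity is the number of matching positions between equal-length phrases, and `align` and `lexicon` are hand-written toy data.

```python
def most_similar(source, phrase_table):
    """Find: the stored pair of equal length sharing the most words by position."""
    same_len = [p for p in phrase_table if len(p["src"]) == len(source)]
    return max(same_len, key=lambda p: sum(a == b for a, b in zip(p["src"], source)))

def translate_by_similarity(source, phrase_table, lexicon):
    pair = most_similar(source, phrase_table)
    target = list(pair["tgt"])
    for i, (old, new) in enumerate(zip(pair["src"], source)):
        if old != new:                           # Compare: positions that differ
            for j in pair["align"].get(i, []):   # Replace: aligned target words
                target[j] = lexicon[new]
    return " ".join(target)

phrase_table = [{
    "src": ["全市", "出口", "总值", "的", "半数"],
    "tgt": ["half", "of", "the", "entire", "city", "'s", "export", "volume"],
    "align": {0: [4], 4: [0]},  # toy alignment: 全市 -> city, 半数 -> half
}]
lexicon = {"全省": "province", "25.5%": "25.5%"}  # toy word translations

print(translate_by_similarity(["全省", "出口", "总值", "的", "25.5%"], phrase_table, lexicon))
# 25.5% of the entire province 's export volume
```

The sketch raises a `ValueError` when no stored phrase has the same length as the input. A real system would need a fallback for that case.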

Lynx
A linguistically syntax-based system
Based on the tree-to-string alignment template (TAT), which maps a source-language tree to a target-language string
Log-linear model

Translation Process: Parsing
Input: 中国 的 经济 发展
Parse tree: (NP (DNP (NP (NR 中国)) (DEG 的)) (NP (NN 经济) (NN 发展)))

Translation Process: Detachment
The parse tree is detached into subtrees:
• (NP DNP NP)
• (DNP NP (DEG 的))
• (NP (NR 中国))
• (NP (NN 经济) (NN 发展))

Translation Process: Production
Each subtree produces a target string:
• (DNP NP (DEG 的)) → of
• (NP (NR 中国)) → China
• (NP (NN 经济) (NN 发展)) → economic development

Translation Process: Combination
The partial translations "of", "China" and "economic development" are combined at the root NP:
economic development of China
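
The parsing, detachment, production and combination steps can be traced with a toy Python sketch on the example tree. It is a simplification rather than the Lynx decoder: each template covers exactly one tree level, templates are looked up by a string signature, and there is no scoring or beam search. The template set, the `X1`/`X2` variable notation and all function names are hypothetical.

```python
def is_preterminal(node):
    """A preterminal is (tag, word), e.g. ("NR", "中国")."""
    return len(node) == 2 and isinstance(node[1], str)

def signature(node):
    """Node label plus its children; preterminal children keep their word."""
    label, *children = node
    parts = [f"{c[0]}({c[1]})" if is_preterminal(c) else c[0] for c in children]
    return f"{label} -> {' '.join(parts)}"

def translate(node, templates):
    """Detach one level, produce its template, and combine sub-translations."""
    subs = [translate(c, templates) for c in node[1:] if not is_preterminal(c)]
    out = []
    for tok in templates[signature(node)].split():
        is_var = tok.startswith("X") and tok[1:].isdigit()
        out.append(subs[int(tok[1:]) - 1] if is_var else tok)
    return " ".join(out)

tree = ("NP",
        ("DNP", ("NP", ("NR", "中国")), ("DEG", "的")),
        ("NP", ("NN", "经济"), ("NN", "发展")))

# Toy templates: Xk stands for the translation of the k-th non-preterminal child.
templates = {
    "NP -> DNP NP": "X2 X1",
    "DNP -> NP DEG(的)": "of X1",
    "NP -> NR(中国)": "China",
    "NP -> NN(经济) NN(发展)": "economic development",
}

print(translate(tree, templates))  # economic development of China
```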

Training and Decoding
Training
• Extract TATs from a word-aligned, source-side-parsed bilingual corpus using a bottom-up strategy
• Impose several restrictions to reduce the number of extracted TATs
Decoding
• Bottom-up beam search
For details, see Liu et al., ACL 2006

Toolkits Used
Word alignment
• GIZA++ plus the "grow-diag-final" refinement method
Language model
• SRILM
Chinese parser
• Deyi Xiong's parser, a lexicalized PCFG model trained on the Penn Chinese Treebank
Chinese word segmentation
• ICTCLAS

Preprocessing and Postprocessing
Preprocessing
• Chinese word segmentation
• Rule-based translation of numbers, dates and Chinese names
• Parsing of Chinese sentences (for Lynx only)
Postprocessing
• Remove unknown words
• Capitalize the first word of each sentence
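
The two postprocessing steps are simple enough to sketch directly. The sketch assumes the decoder reports which output tokens are untranslated source words. That `unknown` set and the function name `postprocess` are hypothetical, since the slides do not say how unknown words are detected.

```python
def postprocess(tokens, unknown):
    """Drop untranslated tokens, then capitalize the first remaining word."""
    kept = [t for t in tokens if t not in unknown]
    if kept:
        kept[0] = kept[0][:1].upper() + kept[0][1:]
    return " ".join(kept)

print(postprocess(["economic", "development", "of", "中国"], {"中国"}))
# Economic development of
```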
Training data
Training Data List

Development and test sets
Corpus statistics of the IWSLT 2006 and 2007 development and test sets

Results on the IWSLT 2006 development set and test set
Small data: the training data released by IWSLT 2007
Large data: all the training data
Results on IWSLT 2007 test set

Discussion
Lynx (0.1777)
Training corpus:
• Training data:
  – About 39k sentence pairs of dialog data, provided by IWSLT 2007
  – About 5M sentence pairs of newswire data, released by LDC
• The domains are quite different
  – Newswire vs. dialogs
• The newswire data is too large (about 5M vs. 39k sentence pairs)

Discussion
Lynx (0.1777)
Parser:
• Trained on the Penn Chinese Treebank
• The domain is quite different here too
  – Newswire vs. dialogs
• Parsing errors (low parser performance)
• Lynx decoder
  – Depends only on the 1-best parse tree

Discussion
Models:
Bruin (0.3750)
• MaxEnt-based reordering model
• Long-distance word reordering
Confucius (0.2802)
• Monotone search
2007 test set (2006 test set)
• 6.7 words/sent (12.7 words/sent)
  – Bruin will do better
• Punctuation marks (no punctuation marks)
  – More positive reordering information
  – Bruin will do better

            2006 tst   2007 tst
Bruin       0.2283     0.3750
Confucius   0.2042     0.2802
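
The gap between the two systems on each test set can be checked with a few lines of Python. The scores are copied from the table above. The dictionary layout and the helper name `gap` are only illustrative.

```python
scores = {
    "Bruin": {"2006": 0.2283, "2007": 0.3750},
    "Confucius": {"2006": 0.2042, "2007": 0.2802},
}

def gap(year):
    """Score difference between Bruin and Confucius on one test set."""
    return round(scores["Bruin"][year] - scores["Confucius"][year], 4)

print(gap("2006"), gap("2007"))  # 0.0241 0.0948
```

Bruin's lead over Confucius is roughly four times larger on the 2007 test set than on the 2006 test set.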

Summary
MT
3 systems based on different translation models:
• MaxEnt BTG model
• TAT model
• Phrase-based similarity model
Future Work
• More new models
• System combination

References
Yang Liu, Qun Liu, and Shouxun Lin. 2006. Tree-to-String Alignment Template for Statistical Machine Translation. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 609-616, Sydney, Australia, July.
Deyi Xiong, Qun Liu, and Shouxun Lin. 2006. Maximum Entropy Based Phrase Reordering Model for Statistical Machine Translation. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 521-528, Sydney, Australia, July.
Thanks!