Head Finalization Reordering for Chinese-to-Japanese...
Head Finalization Reordering for
Chinese-to-Japanese
Machine Translation
Dan Han *, Katsuhito Sudoh †, Xianchao Wu $, Kevin Duh ʄ, Hajime Tsukada †, Masaaki Nagata †
* National Institute of Informatics
† NTT Communication Science Laboratories
$ Baidu Japan, Inc.
ʄ Nara Institute of Science and Technology (NAIST)
Outline
Introduction
Motivation & Objective
Syntactic-Based Reordering Rules
Other Reordering Issues
Conclusion
Introduction
English and Chinese are head-initial languages, while Japanese is a typical head-final language.
Head Finalization (HF) reordering rule for English-to-Japanese MT (Isozaki et al. 2010).
Methodology
Use the parsed result of Enju, an HPSG parser (Miyao and Tsujii, 2008) that outputs the syntactic heads.
Move each syntactic head to the end of its corresponding syntactic constituent.
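The head-finalization step described on this slide can be sketched on a toy tree. This is a minimal illustration, not the authors' implementation: the `Node` class, the `head` index, and the hand-built tree for "John hit a ball" are assumptions made for the example and stand in for a real Enju parse.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A toy parse-tree node (hypothetical; not the Enju output format)."""
    word: Optional[str] = None          # set on leaves only
    children: List["Node"] = field(default_factory=list)
    head: int = 0                       # index of the head child


def head_finalize(node: Node) -> List[str]:
    """Return the words under `node` with every head child moved last."""
    if node.word is not None:
        return [node.word]
    non_heads = [c for i, c in enumerate(node.children) if i != node.head]
    ordered = non_heads + [node.children[node.head]]
    words: List[str] = []
    for child in ordered:
        words.extend(head_finalize(child))
    return words


# Hand-built tree: [S John [VP* [V* hit] [NP a ball*]]], * = head child
tree = Node(children=[
    Node(word="John"),
    Node(children=[
        Node(word="hit"),
        Node(children=[Node(word="a"), Node(word="ball")], head=1),
    ], head=0),
], head=1)

print(" ".join(head_finalize(tree)))  # John a ball hit
```

The verb moves behind its object, which gives the object-verb order that Japanese uses.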
Introduction (cont.)
[Figure: head-marked parse tree of "John hit a ball" (nodes c0 to c7, terminals t0 to t3) and the corresponding Head-Final English (HFE) tree after reordering. * indicates the syntactic head. Source: H. Isozaki, K. Sudoh 2010.]
John hit a ball
ジャン(は) 打った - ボ-ル(を)
Motivation
HF works well for English-to-Japanese, but not for other language pairs.
We attempt to remedy this at the level of syntactic-head analysis.
Discrepancies in the definition of heads among Chinese, Japanese, and English cause reordering issues when HF is applied.
This work presents an example analysis and solution for Chinese-to-Japanese.
[Figure: head-marked parse tree of the Chinese source sentence (nodes c0 to c10, terminals t0 to t5; * marks the syntactic head) and the tree after basic Head Finalization. The HFC tokens below are given in the order in which they appear in the extracted slide text.]
Ch: 我 去 东京 和 京都 。
En: I go to Tokyo and Kyoto .
Ja: 私(は) 東京 と 京都 (に) 行く 。
Head-Final Chinese (HFC): 我 。 和 东京 去 京都
Problem!
[Figure: head-marked parse tree after refined reordering (nodes c0 to c10, terminals t0 to t5; * marks the syntactic head), shown next to the original tree of 我 去 东京 和 京都 。 (I go to Tokyo and Kyoto .)]
Refined-Head-Final Chinese (Refined-HFC): 我 东京 和 京都 去 。
Ja: 私(は) 東京 と 京都 (に) 行く 。
Perfect!
Objective
Present detailed syntactic analysis of reordering
issues.
Define novel reordering rules based on HF and
linguistically inspired refinements.
Syntactic-based Reordering Rules
Aspect Particle
Adverbial Modifier 'bu4'
Sentence-final Particle
Et cetera
Punctuation
Coordination
Aspect Particle
Example for Aspect Particle
Ch: 我去过东京。
En: I have been to Tokyo.
Ja: 私(は) 東京(に) い った
HFC: 我 (I) 东京(Tokyo) 过(have) 去(been to)
R-HFC: 我 东京 去 过
[Figure: head-marked parse tree for this example (nodes c0 to c8, terminals t0 to t4; * marks the syntactic head).]
Adverbial Modifier 'bu4'
Example for Adverbial Modifier bu4
Ch: 我不看电视。
En: I do not watch TV.
Ja: 私(は) テレビ(を) 見 ない
HFC: 我(I) 电视(TV) 不(do not) 看(watch)
R-HFC: 我 电视 看 不
[Figure: head-marked parse tree for this example (nodes c0 to c8, terminals t0 to t4; * marks the syntactic head).]
Sentence-final Particle
Example for Sentence-final Particle
Ch: 天气是真好啊。
En: It is good weather.
Ja: いい 天気 です ね
HFC: 啊 天气(weather) 真好(good) 是(is)
R-HFC: 天气 真好 是 啊
[Figure: head-marked parse tree for this example (nodes c0 to c8, terminals t0 to t4; * marks the syntactic head).]
Et cetera
Example for Et cetera
Ch: 水果包括苹果等。
En: Fruits include apples, etc.
Ja: 果物(は) リンゴ など(を) 含む
HFC: 水果(fruits) 等(etc.) 苹果(apples) 包括(include)
R-HFC: 水果 苹果 等 包括
[Figure: head-marked parse tree for this example (nodes c0 to c8, terminals t0 to t4; * marks the syntactic head).]
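The four exception examples (aspect particle, bu4, sentence-final particle, et cetera) can be reproduced with a small extension of head finalization. This is a sketch under stated assumptions, not the authors' rule set: the toy trees, the `tag` field, and the `PARTICLE_TAGS` labels are hypothetical, sentence-final punctuation is left out, and the single rule "a listed particle goes last within its constituent" is inferred only from the HFC and R-HFC strings on these slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical labels for the four exception types named on the slides.
PARTICLE_TAGS = {"ASPECT", "BU", "SFP", "ETC"}


@dataclass
class Node:
    word: Optional[str] = None          # set on leaves only
    tag: str = ""                       # exception label, if any
    children: List["Node"] = field(default_factory=list)
    head: int = 0                       # index of the head child


def leaf(word: str, tag: str = "") -> Node:
    return Node(word=word, tag=tag)


def tree(head: int, *children: Node) -> Node:
    return Node(children=list(children), head=head)


def reorder(node: Node, refined: bool) -> List[str]:
    """Head-finalize `node`; if `refined`, a tagged particle goes last instead."""
    if node.word is not None:
        return [node.word]
    last = node.head
    if refined:
        for i, child in enumerate(node.children):
            if child.tag in PARTICLE_TAGS:
                last = i
    ordered = [c for i, c in enumerate(node.children) if i != last]
    ordered.append(node.children[last])
    return [w for c in ordered for w in reorder(c, refined)]


# Assumed bracketings of the four source sentences (first argument = head index).
examples = {
    "aspect particle": tree(1, leaf("我"),
        tree(0, tree(0, leaf("去"), leaf("过", "ASPECT")), leaf("东京"))),
    "bu4": tree(1, leaf("我"),
        tree(0, tree(1, leaf("不", "BU"), leaf("看")), leaf("电视"))),
    "sentence-final particle": tree(0,
        tree(1, leaf("天气"), tree(0, leaf("是"), leaf("真好"))),
        leaf("啊", "SFP")),
    "et cetera": tree(1, leaf("水果"),
        tree(0, leaf("包括"), tree(0, leaf("苹果"), leaf("等", "ETC")))),
}

for name, t in examples.items():
    print(name,
          "| HFC:", " ".join(reorder(t, refined=False)),
          "| R-HFC:", " ".join(reorder(t, refined=True)))
```

With these assumed trees, both modes give the HFC and R-HFC token orders shown on the four example slides. A real system would take heads and tags from the parser rather than from hand-built trees.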
Experiments
Training data: CWMT data set (News domain, 282K sentences)
Additional training data: XINHUA parallel data set (News domain, 593K sentences), used to compare the results.
Dev & Test data: CWMT data set (1K sentences)
Word alignment: GIZA++
Decoder: Moses
Experiments (cont.)
BLEU and RIBES scores when the CWMT corpus was used for training.
Experiments (cont.)
BLEU and RIBES scores when the CWMT extended corpus (CWMT ext.) was used for training.
Other Reordering Issues
Serial Verb Construction
Complementizer
Verbal Nominalization and Nounal Verbalization
Adverbial Modifier
POS tagging and Parsing Errors
Example for Adverbial Modifier
Ch: 严格处罚违法行为
En: Severely penalize unlawful act
Ja: 違法 行為(を) 厳しく 処罰
HFC: 严格 违法 行为 处罚
Example for Complementizer
Ch: 忙完了
En: Have finished
Ja: 忙しく なくなり ました
HFC: 完 忙 了
Example for Serial Verb Construction
Ch: 维持深化中日关系
En: Maintain and deepen the Japan-China relations
Ja: 日中関係 (を) 維持 深化 し
HFC: 中日关系 深化 维持
Example for Verbal Nominalization
Ch: 健全安定发展的促进
En: sound, stably and increasingly promote
Ja: 健全 な 安定 し た 発展 の 促進
HFC: 安定 发展 的 促进 健全
POS Tagging & Parsing Errors
POS Tagging Errors
「伊朗」(イラン, Iran): tagged POS = “VV” or “JJ”; correct POS = “NR”
「胡主席」(フー・チンタオ, Hu Jintao): tagged POS = “VV”; correct POS = “NR”
「实施」(実施する, Implement): tagged POS = “NN”; correct POS = “VV”
POS Tagging & Parsing Errors
Parsing Errors
[Figure: head-marked parse trees for the two examples below (* marks the syntactic head). Leaf labels on the first tree: VV NN C M.]
Ch: 投资 20 亿 美元
HFC: 亿 美元 20 投资
Ja: 20 億 ドル (を) 投資 する
En: Invest 2 billion US dollars
Ch: 根据 “ 东南 快报 ”
HFC: “ 东南 快报 根据 ”
Ja: 「 東南 快報 」 に よる と
En: According to “TONAN NEWS”
POS Tagging & Parsing Errors
Parsing Errors
[Figure: the same two examples as on the previous slide, with the first tree redrawn over the reordered tokens 亿 美元 20 投资 (leaf labels: C M NN VV).]
Conclusion
Basic Head Finalization reordering rules improved Chinese-to-Japanese machine translation quality.
The refined HFC rules achieved a substantial further improvement.
– Due to more monotonic word alignment.
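One way to make "more monotonic" concrete is Kendall's tau over the target-side positions that the source words align to (RIBES is also built on rank correlation). The sketch below is an illustration only: the alignment lists are written by hand from the 我 去 东京 和 京都 。 example in these slides, not taken from GIZA++ output, and `kendall_tau` is a hypothetical helper.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(positions: Sequence[int]) -> float:
    """Kendall's tau of distinct target positions (1.0 means fully monotone)."""
    pairs = list(combinations(positions, 2))
    concordant = sum(1 for a, b in pairs if a < b)
    discordant = len(pairs) - concordant
    return (concordant - discordant) / len(pairs)


# Assumed Japanese token order: 私 東京 と 京都 行く 。 (positions 0..5)
original = [0, 4, 1, 2, 3, 5]   # 我 去 东京 和 京都 。
refined = [0, 1, 2, 3, 4, 5]    # 我 东京 和 京都 去 。

print(kendall_tau(original))  # 0.6
print(kendall_tau(refined))   # 1.0
```

Moving the verb to the end removes the three crossing alignment links, so the reordered source reads in the same order as the Japanese target.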
Thank you for your attention! Suggestions & Questions
Experiments (cont.)
Frequency with which each exception rule was applied during reordering on the CWMT extended corpus.