Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009,...
8/10/2009 ACL/IJCNLP 2009, Singapore 1
INSTITUTE OF COMPUTING TECHNOLOGY
Joint Decoding with Multiple Translation Models
Yang Liu, Haitao Mi, Yang Feng, and Qun Liu
Institute of Computing Technology, Chinese Academy of Sciences
{yliu,htmi,fengyang,liuqun}@ict.ac.cn

System Combination
[Figure: the source sentence f is fed to System 1, System 2, ..., System n, which produce E1, E2, ..., En; a Combiner turns these into the output e. Annotations: "postprocessing" and "decoding?"]

This Work
• We propose a technique called joint decoding to combine different models directly in the decoding phase.
• Our preliminary work shows promising results.

Translation Hypergraph
[Figure: translation hypergraph over the source sentence "fabiao yanjiang" (positions 0 1 2). Nodes: "give" 0-1, "talk" 1-2, "give a talk" 0-2, "give talks" 0-2, and the goal node S.]

Consensus Translations
[Figure: three translation hypergraphs with panel labels phrase-based, hierarchical phrase-based, and tree-to-string. Node sets in the order extracted:]
(1) S, "give" 0-1, "talks" 1-2, "give talks" 0-2, "give a talk" 0-2
(2) S, "give" 0-1, "talk" 1-2, "give a talk" 0-2, "give talks" 0-2
(3) S, "give" 0-1, "talk" 1-2, "give a talk" 0-2, "give a speech" 0-2

Sharing Nodes
[Figure: one packed hypergraph with goal node S and nodes "give" 0-1, "talk" 1-2, "talks" 1-2, "give a talk" 0-2, "give a speech" 0-2, "give talks" 0-2.]
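
The node sharing on this slide can be sketched in Python. This is a minimal illustration, not the authors' implementation. It assumes that a node is identified by its partial translation and the source span it covers; the names `Node` and `pack` and the model labels "A", "B", "C" are hypothetical.

```python
from collections import namedtuple

# Hypothetical node signature: a partial translation plus the source span it covers.
Node = namedtuple("Node", ["translation", "start", "end"])


def pack(hypergraphs):
    """Merge per-model node lists, keeping one copy of each identical node.

    `hypergraphs` maps a model label to its list of nodes. The result maps
    each distinct node to the sorted labels of the models that produced it.
    """
    producers = {}
    for model, nodes in hypergraphs.items():
        for node in nodes:
            producers.setdefault(node, set()).add(model)
    return {node: sorted(models) for node, models in producers.items()}


# Node lists as they appear on the "Consensus Translations" slide.
# The labels "A", "B", "C" are placeholders, not the slide's model names.
hypergraphs = {
    "A": [Node("give", 0, 1), Node("talks", 1, 2),
          Node("give talks", 0, 2), Node("give a talk", 0, 2)],
    "B": [Node("give", 0, 1), Node("talk", 1, 2),
          Node("give a talk", 0, 2), Node("give talks", 0, 2)],
    "C": [Node("give", 0, 1), Node("talk", 1, 2),
          Node("give a talk", 0, 2), Node("give a speech", 0, 2)],
}

packed = pack(hypergraphs)
for node in sorted(packed, key=lambda n: (n.end - n.start, n.start, n.translation)):
    print(f"{node.translation} {node.start}-{node.end}: {packed[node]}")
```

The twelve input nodes collapse to the six distinct nodes shown in the packed hypergraph; the goal node S is left out of the sketch.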

Scoring a Translation
\[ p(e \mid f) = \sum_{d \in \Delta(e, f)} p(e, d \mid f) \]
e: target sentence; f: source sentence; \Delta(e, f): the set of derivations that translate f into e; d: one derivation.
A derivation can come from any model!

MDD and MTD
\[ p(e \mid f) = \frac{\sum_{d \in \Delta(e, f)} \exp\left( \sum_i \lambda_i h_i(d, e, f) \right)}{Z} \]
\begin{align*}
\hat{e} &= \arg\max_{e} \sum_{d \in \Delta(e, f)} \exp\left( \sum_i \lambda_i h_i(d, e, f) \right) && \text{(max-translation decoding)} \\
        &\approx \arg\max_{e, d} \sum_i \lambda_i h_i(d, e, f) && \text{(max-derivation decoding)}
\end{align*}
Blunsom et al. (2008)
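
The difference between the two decision rules can be sketched with a toy example. The derivations and scores below are made up for illustration and do not come from the slides; each score stands for the linear model score of one derivation, and the normalizer Z is dropped because it does not affect the arg max. The function names are hypothetical.

```python
import math
from collections import defaultdict


def max_derivation(derivations):
    """Return the translation of the single best-scoring derivation."""
    translation, _ = max(derivations, key=lambda pair: pair[1])
    return translation


def max_translation(derivations):
    """Return the translation with the largest sum of exp(score) over its derivations."""
    totals = defaultdict(float)
    for translation, score in derivations:
        totals[translation] += math.exp(score)
    return max(totals, key=totals.get)


# Made-up (translation, linear score) pairs: "give a talk" has two
# derivations, "give talks" has one slightly better derivation.
derivations = [("give a talk", 1.0), ("give a talk", 1.0), ("give talks", 1.5)]

print(max_derivation(derivations))
print(max_translation(derivations))
```

In this toy case max-derivation decoding prefers "give talks" because 1.5 > 1.0, while max-translation decoding prefers "give a talk" because 2·exp(1.0) > exp(1.5).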

An Example of MTD
model   feature   weight   value
hier    p(e|f)    1.0      0.1
hier    p(f|e)    1.0      0.2
hier    l(e|f)    1.0      0.1
hier    l(f|e)    1.0      0.3
hier    rc        1.0      3
t2s     p(e|f)    1.0      0.2
t2s     p(f|e)    1.0      0.1
t2s     l(e|f)    1.0      0.3
t2s     l(f|e)    1.0      0.2
t2s     rc        1.0      4
const   lm        1.0      0.7
const   wc        1.0      3
Group scores: hier exp(3.7), t2s exp(4.8), const exp(3.7)
Translation score: (exp(3.7) + exp(4.8)) × exp(3.7)
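
The arithmetic behind this example can be checked with a short Python sketch. The weights and values are read off the slide's feature table; the grouping into `hier`, `t2s`, and `const` follows the slide. Reading the result as two derivations (one per model) that share the `const` features is one interpretation of the slide, not something it states explicitly.

```python
import math

# (weight, value) pairs read off the slide's feature table.
features = {
    "hier": {"p(e|f)": (1.0, 0.1), "p(f|e)": (1.0, 0.2),
             "l(e|f)": (1.0, 0.1), "l(f|e)": (1.0, 0.3), "rc": (1.0, 3)},
    "t2s": {"p(e|f)": (1.0, 0.2), "p(f|e)": (1.0, 0.1),
            "l(e|f)": (1.0, 0.3), "l(f|e)": (1.0, 0.2), "rc": (1.0, 4)},
    "const": {"lm": (1.0, 0.7), "wc": (1.0, 3)},
}


def linear_score(group):
    """Sum of weight * value over one group of features."""
    return sum(weight * value for weight, value in group.values())


hier = linear_score(features["hier"])    # about 3.7
t2s = linear_score(features["t2s"])      # about 4.8
const = linear_score(features["const"])  # about 3.7

# Sum over the two derivations; the shared factor exp(const) is pulled out.
mtd_score = (math.exp(hier) + math.exp(t2s)) * math.exp(const)
print(mtd_score)
```

Because the `const` features are the same for both derivations, exp(hier + const) + exp(t2s + const) factors into the product shown on the slide.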

Joint Decoding
[Figure sequence, slides 10-16: a hypergraph with goal node S is built step by step over the source sentence "fabiao yanjiang" (positions 0 1 2).]
Slide 10: S only.
Slide 11: "give" 0-1.
Slide 12: "give" 0-1, "talk" 1-2.
Slide 13: "give" 0-1, "talk" 1-2, "talks" 1-2.
Slide 14: "give" 0-1, "talk" 1-2, "talks" 1-2.
Slide 15: "give" 0-1, "talk" 1-2, "give a talk" 0-2, "give a speech" 0-2, "give talks" 0-2.
Slide 16: "give" 0-1, "talk" 1-2, "give a talk" 0-2. Caption: a pruned packed hypergraph.
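
The pruning step that produces the final hypergraph can be sketched as follows. This is only an illustration under stated assumptions: the scores are made up, the beam size of 1 is chosen to reproduce the slide's outcome, keeping the best score for a shared node is an assumption, and `merge_and_prune` is a hypothetical name.

```python
def merge_and_prune(candidates_per_model, beam_size):
    """Pool the candidates that every model proposes for one span, then prune.

    Identical translations are shared (assumption: the best score is kept),
    and only the `beam_size` highest-scoring nodes survive.
    """
    pooled = {}
    for candidates in candidates_per_model:
        for translation, score in candidates.items():
            pooled[translation] = max(score, pooled.get(translation, float("-inf")))
    ranked = sorted(pooled.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:beam_size])


# Made-up scores for the three candidates of span 0-2, proposed by two models.
span_0_2 = [
    {"give a talk": -1.0, "give talks": -3.0},
    {"give a talk": -1.2, "give a speech": -2.5},
]

kept = merge_and_prune(span_0_2, beam_size=1)
print(kept)
```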

Sharing Hyperedges
[Figure: hypergraph with goal node S and nodes "give" 0-1, "talk" 1-2, "give a talk" 0-2.]

Joint Decoding with Different Rules
[Figure sequence, slides 18-21: the source sentence "fabiao yanjiang" (positions 0 1 2) with a parse tree over the labels VV, NN, VP, S.]
Slide 18: source sentence and parse tree only.
Slide 19: rule X -> <fabiao, give> yields "give" 0-1.
Slide 20: rule X -> <yanjiang, talk> yields "talk" 1-2.
Slide 21: rule (VP (VV:x1) (NN:x2)) -> x1 a x2 yields "give a talk" 0-2.
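
The three rule applications can be traced in a few lines of Python. This is a sketch under assumptions: the rule encoding is hypothetical, and attributing the two X rules to the hierarchical model ("hier") and the tree rule to the tree-to-string model ("t2s") is an assumption based on the rule formats.

```python
# Hypothetical encoding of the three rules on the slides. Each entry records
# the model it is assumed to come from; that attribution is an assumption.
lexical_rules = {
    "fabiao": ("give", "hier"),      # X -> <fabiao, give>
    "yanjiang": ("talk", "hier"),    # X -> <yanjiang, talk>
}
tree_rule = (["x1", "a", "x2"], "t2s")   # (VP (VV:x1) (NN:x2)) -> x1 a x2

source = ["fabiao", "yanjiang"]
chart = {}        # span -> partial translation
derivation = []   # (span, model) for every rule applied

# Translate each source word with a lexical rule.
for i, word in enumerate(source):
    translation, model = lexical_rules[word]
    chart[(i, i + 1)] = translation
    derivation.append(((i, i + 1), model))

# Combine the two sub-translations with the tree rule, regardless of which
# model built them.
template, model = tree_rule
bindings = {"x1": chart[(0, 1)], "x2": chart[(1, 2)]}
chart[(0, 2)] = " ".join(bindings.get(token, token) for token in template)
derivation.append(((0, 2), model))

print(chart[(0, 2)])
print(derivation)
```

The resulting derivation mixes rules from both models, which is the point of the slide sequence.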

An Example of MDD
model   feature   weight   value
hier    p(e|f)    1.0      0.1
hier    p(f|e)    1.0      0.2
hier    l(e|f)    1.0      0.1
hier    l(f|e)    1.0      0.3
hier    rc        1.0      2
t2s     p(e|f)    1.0      0.2
t2s     p(f|e)    1.0      0.1
t2s     l(e|f)    1.0      0.3
t2s     l(f|e)    1.0      0.2
t2s     rc        1.0      1
const   lm        1.0      0.7
const   wc        1.0      3
Derivation score: 8.2
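
The score of 8.2 is the plain weighted sum of all feature values on this slide, which a short check confirms. The values are read off the slide; the variable names are hypothetical.

```python
import math

# (weight, value) pairs read off the slide, grouped as on the slide.
# Order within hier and t2s: p(e|f), p(f|e), l(e|f), l(f|e), rc.
features = {
    "hier": [(1.0, 0.1), (1.0, 0.2), (1.0, 0.1), (1.0, 0.3), (1.0, 2)],
    "t2s": [(1.0, 0.2), (1.0, 0.1), (1.0, 0.3), (1.0, 0.2), (1.0, 1)],
    "const": [(1.0, 0.7), (1.0, 3)],
}

# Max-derivation decoding scores one derivation with a single linear sum.
mdd_score = sum(w * v for group in features.values() for w, v in group)
print(round(mdd_score, 1))
```

Unlike the MTD example, no exponentials are summed here: a single derivation may use rules from both models, and its score stays linear in the features.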

Sharing Matrix
         Phrase       Hiero        T2S          S2T          T2T
Phrase   node, edge   node, edge   node, edge   node         node
Hiero    node, edge   node, edge   node, edge   node         node
T2S      node, edge   node, edge   node, edge   node         node
S2T      node         node         node         node, edge   node, edge
T2T      node         node         node         node, edge   node, edge

How to Tune Feature Weights for MTD?
model   feature   weight   value
hier    p(e|f)    1.0      0.1
hier    p(f|e)    1.0      0.2
hier    l(e|f)    1.0      0.1
hier    l(f|e)    1.0      0.3
hier    rc        1.0      3
t2s     p(e|f)    1.0      0.2
t2s     p(f|e)    1.0      0.1
t2s     l(e|f)    1.0      0.3
t2s     l(f|e)    1.0      0.2
t2s     rc        1.0      4
const   lm        1.0      0.7
const   wc        1.0      3
Translation score: (exp(3.7) + exp(4.8)) × exp(3.7)
\[ \hat{e} = \arg\max_{e} \sum_{d \in \Delta(e, f)} \exp\left( \sum_i \lambda_i h_i(d, e, f) \right) \]

MERT for MDD
• f
  • e1
    • d1 0.1 0.2 0.3 0.1
  • e2
    • d1 0.2 0.1 0.3 0.1
  • e3
    • d1 0.1 0.3 0.1 0.2
[Figure labels: e1, e2, e3]

MERT for MTD
• f
  • e1
    • d1 0.1 0.2 0.3 0.1 0.0 0.0 0.0 0.0
    • d2 0.0 0.0 0.0 0.0 0.2 0.3 0.4 0.1
  • e2
    • d1 0.0 0.0 0.0 0.0 0.3 0.1 0.2 0.1
    • d2 0.1 0.2 0.3 0.1 0.0 0.0 0.0 0.0
  • e3
    • d1 0.1 0.2 0.1 0.2 0.0 0.0 0.0 0.0
    • d2 0.0 0.0 0.0 0.0 0.1 0.3 0.2 0.1
(columns 1-4: model 1; columns 5-8: model 2)
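
The zero-padded vectors on this slide suggest that each derivation's features are placed in a joint feature space covering both models. A minimal sketch of that padding, assuming four features per model as on the slide; the helper name `embed` is hypothetical.

```python
def embed(model_index, local_features, sizes):
    """Place one model's feature values in the joint feature space.

    `sizes[i]` is the number of features of model i; the positions that
    belong to the other models are filled with 0.0.
    """
    joint = []
    for index, size in enumerate(sizes):
        if index == model_index:
            if len(local_features) != size:
                raise ValueError("feature count does not match the model")
            joint.extend(local_features)
        else:
            joint.extend([0.0] * size)
    return joint


sizes = [4, 4]  # four features for model 1, four for model 2

# The two derivations of e1 from the slide.
e1_d1 = embed(0, [0.1, 0.2, 0.3, 0.1], sizes)
e1_d2 = embed(1, [0.2, 0.3, 0.4, 0.1], sizes)
print(e1_d1)
print(e1_d2)
```

With this layout a single weight vector of length eight can score derivations from either model.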

Curves
\[ f(x) = \sum_{k=1}^{K} e^{a_k \times x + b_k} \]
[Figure labels: e1, e2, e3]
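
One way to read this formula is as the standard MERT line search applied to max-translation decoding: when the weights move along a direction by a step x, every derivation's linear score becomes a_k·x + b_k, and a translation's unnormalized score is the sum of the exponentials over its K derivations. The sketch below follows that reading, which is an assumption rather than something the slide spells out; `curve`, `weights`, and `direction` are hypothetical names, and the feature vectors are the two derivations of e1 from the "MERT for MTD" slide.

```python
import math


def curve(derivation_features, weights, direction):
    """Return f(x) = sum_k exp(a_k * x + b_k) for one translation.

    For derivation k with feature vector h_k:
      a_k = direction . h_k   (slope along the search direction)
      b_k = weights . h_k     (score at the current weights)
    """
    coeffs = []
    for h in derivation_features:
        a = sum(d * v for d, v in zip(direction, h))
        b = sum(w * v for w, v in zip(weights, h))
        coeffs.append((a, b))

    def f(x):
        return sum(math.exp(a * x + b) for a, b in coeffs)

    return f


# Derivations of e1 from the "MERT for MTD" slide.
e1 = [
    [0.1, 0.2, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.2, 0.3, 0.4, 0.1],
]
weights = [1.0] * 8                  # assumed starting point
direction = [1.0] + [0.0] * 7        # assumed: vary only the first weight

f_e1 = curve(e1, weights, direction)
print(f_e1(0.0), f_e1(1.0))
```

Under max-derivation decoding each candidate would be a single line a·x + b; summing exponentials turns each candidate into a curve, which matches the slide title.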

Setup
• Models
  • Hierarchical phrase-based (Chiang, 2005)
  • Tree-to-string (Liu et al., 2006)
• Training set: FBIS (6.9M + 8.9M)
• Language model: 4-gram trained on GIGAWORD Xinhua portion
• Development set: NIST 2002 C2E
• Test set: NIST 2005 C2E

Individual Decoding Vs. Joint Decoding
                        Max-derivation     Max-translation
Model   Sharing         Time     BLEU      Time     BLEU
Hiero   -               40.53    30.11     44.87    29.82
T2S     -               6.13     27.23     6.69     27.11
both    node            -        -         55.89    30.79
both    node & edge     48.45    31.63     54.91    31.49

Compared with System Combination
Method         Model   BLEU
individual     Hiero   30.11
individual     T2S     27.23
system comb.   both    31.50
joint          both    31.63

Individual Training Vs. Joint Training
Training     Max-derivation   Max-translation
individual   30.70            29.95
joint        31.63            30.79

Conclusion and Future Work
• We have presented a framework for combining different translation models in the decoding phase.
• Future work
  • Including more models
  • Forced decoding
  • Hypergraph-based MERT (Kumar et al., 2009)
Thanks!