Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009,...

33
8/10/2009 ACL/IJCNLP 2009, Singapore 1 INSTITUTE OF COMPUTING TECHNOLOGY Joint Decoding with Multiple Translation Models Yang Liu, Haitao Mi, Yang Feng, and Qun Liu Institute of Computing Technology, Chinese Academy of Sciences {yliu,htmi,fengyang,liuqun}@ict.ac.cn

Transcript of Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009,...

Page 1: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 1

INSTITUTE O

F COM

PUTING

TECHN

OLOG

YJoint Decoding with

Multiple Translation ModelsYang Liu, Haitao Mi, Yang Feng, and Qun Liu

Institute of Computing Technology,Chinese Academy of Sciences

{yliu,htmi,fengyang,liuqun}@ict.ac.cn

Page 2: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 2

INSTITUTE OF COMPUTINGTECHNOLOGY

System Combination

f

System 1

System 2

System n

E1

E2

En

Combiner e…

postprocessingdecoding?

Page 3: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 3

INSTITUTE OF COMPUTINGTECHNOLOGYThis Work

l We propose a technique called joint decodingto combine different models directly in the decoding phase.

l Our preliminary work shows promising results.

Page 4: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 4

INSTITUTE OF COMPUTINGTECHNOLOGYTranslation Hypergraph

S

give0-1

talk1-2

give a talk0-2

give talks0-2

fabiao yanjiang0 1 2

Page 5: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 5

INSTITUTE OF COMPUTINGTECHNOLOGYConsensus Translations

S

give0-1

talks1-2

give talks0-2

give a talk0-2

S

give0-1

talk1-2

give a talk0-2

give talks0-2

S

give0-1

talk1-2

give a talk0-2

give a speech0-2

phrase-based hierarchical phrase-based tree-to-string

Page 6: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 6

INSTITUTE OF COMPUTINGTECHNOLOGYSharing Nodes

S

give0-1

talk1-2

talks1-2

give a talk0-2

give a speech0-2

give talks0-2

Page 7: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 7

INSTITUTE OF COMPUTINGTECHNOLOGYScoring a Translation

( )( )

,( | ) , |

d e fp e f p e d f

∈∆

= ∑

target sentence source sentence

the set of derivations that translate f into eone derivation

a derivation can come from any model!

Page 8: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 8

INSTITUTE OF COMPUTINGTECHNOLOGYMDD and MTD

( )( )

( ),

exp , ,|

i ii

d e f

h d e fp e f

Z

λ

∈∆

=∑

( )( )

( )

,

,

ˆ arg max exp , ,

arg max , ,

e i id e f i

e d i ii

e h d e f

h d e f

λ

λ

∈∆

=

∑ ∑

max-translation decoding

max-derivation decoding

Blunsom et al. (2008)

Page 9: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 9

INSTITUTE OF COMPUTINGTECHNOLOGY

An Example of MTD

31.0wc

0.71.0lmconst

41.0rc

0.21.0l(f|e)

0.31.0l(e|f)

0.11.0p(f|e)

0.21.0p(e|f)

t2s

31.0rc

0.31.0l(f|e)

0.11.0l(e|f)

0.21.0p(f|e)

0.11.0p(e|f)

hier

valueweightname

featuremodel

exp(3.7)

exp(4.8)

exp(3.7)

(exp(3.7) + exp(4.8)) ×exp(3.7)

Page 10: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 10

INSTITUTE OF COMPUTINGTECHNOLOGY

Joint Decoding

S

fabiao yanjiang0 1 2

Page 11: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 11

INSTITUTE OF COMPUTINGTECHNOLOGY

Joint Decoding

S

give0-1

fabiao yanjiang0 1 2

Page 12: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 12

INSTITUTE OF COMPUTINGTECHNOLOGY

Joint Decoding

S

give0-1

talk1-2

fabiao yanjiang0 1 2

Page 13: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 13

INSTITUTE OF COMPUTINGTECHNOLOGY

Joint Decoding

S

give0-1

talk1-2

talks1-2

fabiao yanjiang0 1 2

Page 14: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 14

INSTITUTE OF COMPUTINGTECHNOLOGY

Joint Decoding

S

give0-1

talk1-2

talks1-2

fabiao yanjiang0 1 2

Page 15: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 15

INSTITUTE OF COMPUTINGTECHNOLOGY

Joint Decoding

S

give0-1

talk1-2

give a talk0-2

give a speech0-2

give talks0-2

fabiao yanjiang0 1 2

Page 16: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 16

INSTITUTE OF COMPUTINGTECHNOLOGY

Joint Decoding

S

give0-1

talk1-2

give a talk0-2

fabiao yanjiang0 1 2

a pruned packed hypergraph

Page 17: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 17

INSTITUTE OF COMPUTINGTECHNOLOGYSharing Hyperedges

S

give0-1

talk1-2

give a talk0-2

Page 18: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 18

INSTITUTE OF COMPUTINGTECHNOLOGY

Joint Decoding with Different Rules

fabiao yanjiang0 1 2

VV NN

VP

S

Page 19: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 19

INSTITUTE OF COMPUTINGTECHNOLOGY

Joint Decoding with Different Rules

fabiao yanjiang0 1 2

VV NN

VP

S

X -> <fabiao, give>

give0-1

Page 20: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 20

INSTITUTE OF COMPUTINGTECHNOLOGY

Joint Decoding with Different Rules

fabiao yanjiang0 1 2

VV NN

VP

S

X -> <yanjiang, talk>

give0-1

talk1-2

X -> <fabiao, give>

Page 21: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 21

INSTITUTE OF COMPUTINGTECHNOLOGY

Joint Decoding with Different Rules

fabiao yanjiang0 1 2

VV NN

VP

S

X -> <yanjiang, talk>

give0-1

talk1-2

X -> <fabiao, give>

(VP (VV:x1) (NN:x2)) -> x1 a x2

give a talk0-2

Page 22: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 22

INSTITUTE OF COMPUTINGTECHNOLOGY

An Example of MDD

31.0wc

0.71.0lmconst

11.0rc

0.21.0l(f|e)

0.31.0l(e|f)

0.11.0p(f|e)

0.21.0p(e|f)

t2s

21.0rc

0.31.0l(f|e)

0.11.0l(e|f)

0.21.0p(f|e)

0.11.0p(e|f)

hier

valueweightname

featuremodel

8.2

Page 23: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 23

INSTITUTE OF COMPUTINGTECHNOLOGYSharing Matrix

node, edge

node, edgenodenodenodeT2T

node, edge

node, edgenodenodenodeS2T

nodenodenode, edge

node, edge

node, edgeT2S

nodenodenode, edge

node, edge

node, edgeHiero

nodenodenode, edge

node, edge

node, edgePhrase

T2TS2TT2SHieroPhrase

Page 24: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 24

INSTITUTE OF COMPUTINGTECHNOLOGY

How to Tune Feature Weights for MTD?

31.0wc

0.71.0lmconst

41.0rc

0.21.0l(f|e)

0.31.0l(e|f)

0.11.0p(f|e)

0.21.0p(e|f)

t2s

31.0rc

0.31.0l(f|e)

0.11.0l(e|f)

0.21.0p(f|e)

0.11.0p(e|f)

hier

valueweightname

featuremodel

(exp(3.7) + exp(4.8)) ×exp(3.7)

( )( ),

ˆ arg max exp , ,e i id e f i

e h d e fλ∈∆

=

∑ ∑

Page 25: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 25

INSTITUTE OF COMPUTINGTECHNOLOGYMERT for MDD

l fl e1

l d1 0.1 0.2 0.3 0.1l e2

l d1 0.2 0.1 0.3 0.1l e3

l d1 0.1 0.3 0.1 0.2

e1

e2

e3

Page 26: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 26

INSTITUTE OF COMPUTINGTECHNOLOGYMERT for MTD

l fl e1

l d1 0.1 0.2 0.3 0.1 0.0 0.0 0.0 0.0l d2 0.0 0.0 0.0 0.0 0.2 0.3 0.4 0.1

l e2

l d1 0.0 0.0 0.0 0.0 0.3 0.1 0.2 0.1l d2 0.1 0.2 0.3 0.1 0.0 0.0 0.0 0.0

l e3

l d1 0.1 0.2 0.1 0.2 0.0 0.0 0.0 0.0l d2 0.0 0.0 0.0 0.0 0.1 0.3 0.2 0.1

model 1 model 2

Page 27: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 27

INSTITUTE OF COMPUTINGTECHNOLOGYCurves

( )1

k k

Ka x b

kf x e × +

=

= ∑

e1

e2

e3

Page 28: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 28

INSTITUTE OF COMPUTINGTECHNOLOGYSetup

l Modelsl Hierarchical phrase-based (Chiang, 2005)l Tree-to-string (Liu et al., 2006)

l Training set: FBIS (6.9M + 8.9M)l Language model: 4-gram trained on

GIGAWORD Xinhua portionl Development set: NIST 2002 C2El Test set: NIST 2005 C2E

Page 29: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 29

INSTITUTE OF COMPUTINGTECHNOLOGY

Individual Decoding Vs. Joint Decoding

31.4954.9131.6348.45node & edge

30.7955.89--nodeboth

27.116.6927.236.13-T2S

29.8244.8730.1140.53-Hiero

BLEUTimeBLEUTime

Max-translationMax-derivationSharingModel

Page 30: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 30

INSTITUTE OF COMPUTINGTECHNOLOGY

Compared with System Combination

31.63bothjoint

31.50bothsystem comb.

27.23T2S

30.11Hieroindividual

BLEUModelMethod

Page 31: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 31

INSTITUTE OF COMPUTINGTECHNOLOGY

Individual Training Vs. Joint Training

30.7931.63joint

29.9530.70individual

Max-translationMax-derivationTraining

Page 32: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 32

INSTITUTE OF COMPUTINGTECHNOLOGYConclusion and Future Work

l We have presented a framework for combining different translation models in the decoding phrase.

l Future workl Including more modelsl Forced decodingl Hypergraph-based MERT (Kumar et al., 2009)

Page 33: Joint Decoding with Multiple Translation Modelsly/talks/acl09_jd.pdf · 8/10/2009 ACL/IJCNLP 2009, Singapore 1 I N S T I T U TE O F C O M P U T I N G TE C H N O L OGY Joint Decoding

8/10/2009 ACL/IJCNLP 2009, Singapore 33

INSTITUTE OF COMPUTINGTECHNOLOGY

Thanks!