Neural Machine Translation with Source Dependency Representation

Page 1

Kehai Chen 1, Rui Wang 2, Masao Utiyama 2, Lemao Liu 3, Akihiro Tamura 4, Eiichiro Sumita 2 and Tiejun Zhao 1

1 Harbin Institute of Technology, China; 2 National Institute of Information and Communications Technology, Japan; 3 Tencent AI Lab, China; 4 Ehime University, Japan

Page 2

Overview

• Standard NMT model

[Figure: a standard attentional encoder-decoder translating the source words x1-x7 (Src) into the target words y1-y8 (Trg)]

Page 3

Overview

• Our proposed NMT model

[Figure: the proposed model; besides the source sentence (Src: x1-x7), a source dependency tree (Src dep, headed by "root") is fed to the encoder to generate the target words y1-y8]

Inspired by the use of syntactic knowledge in SMT, we explicitly integrate source dependency information into NMT.

Page 4

Related Work

• NMT with source syntax information
  - Tree2seq (Eriguchi et al., 2016; Li et al., 2017; among others): a tree-based neural network encodes source phrase structures
  - Extending source inputs with syntax labels (Sennrich and Haddow, 2016; Chen et al., 2017; among others): dependency labels are concatenated to the source words

Page 5

Related Work

• Our work
  - A compromise between the two lines of work above
  - A novel double-context approach to utilizing source dependency constraints

Page 6

Source Dependency Representation (SDR)

• Extract a dependency unit for each source word to capture long-distance source dependency constraints:

$U_{x_j} = \langle PA_{x_j}, SI_{x_j}, CH_{x_j} \rangle$

[Figure: source sentence x1-x7 (Src) with its dependency tree (Src dep), headed by "root"]

Page 7

Source Dependency Representation (SDR)

• Extract a dependency unit for each source word to capture long-distance source dependency constraints:

$U_{x_j} = \langle PA_{x_j}, SI_{x_j}, CH_{x_j} \rangle$

where $PA_{x_j}$, $SI_{x_j}$, and $CH_{x_j}$ denote the parent, sibling, and child words of source word $x_j$ in the dependency structure.

Take $x_2$ as an example:

$PA_{x_2} = \langle x_3 \rangle, \quad SI_{x_2} = \langle x_1, x_4, x_7 \rangle, \quad CH_{x_2} = \langle \epsilon \rangle$

then $U_2 = \langle x_3, x_1, x_4, x_7, \epsilon \rangle$.
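The extraction step above can be sketched in plain Python; `heads` is a hypothetical 1-indexed head map from a dependency parser (0 marks the root), not the authors' data format:

```python
EPS = "<eps>"  # placeholder used when a word has no children, as on the slide

def dependency_unit(j, heads):
    """Return U_xj = <parent, siblings, children> for word index j (1-indexed).

    heads[k] gives the head (parent) index of word k; 0 marks the root.
    """
    parent = heads[j]
    unit = [parent] if parent != 0 else []
    # siblings: the other words sharing the same parent as x_j
    unit += [k for k in sorted(heads) if k != j and heads[k] == parent]
    # children: words whose head is x_j; pad with EPS if there are none
    children = [k for k in sorted(heads) if heads[k] == j]
    unit += children if children else [EPS]
    return unit

# Toy tree matching the slide's example: x2's parent is x3, its siblings
# are x1, x4, x7, and it has no children, so U_2 = <x3, x1, x4, x7, eps>.
heads = {1: 3, 2: 3, 3: 0, 4: 3, 5: 4, 6: 5, 7: 3}
print(dependency_unit(2, heads))  # [3, 1, 4, 7, '<eps>']
```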

Page 8

Source Dependency Representation (SDR)

• Learn a semantic representation of each dependency unit with a CNN. Take $x_2$ as an example: $U_2 = \langle x_3, x_1, x_4, x_7, \epsilon \rangle$.

[Figure: CNN over the padded dependency unit <x3, x1, x4, x7, ε> — input layer (10×d), convolution layer 1 with a 3×d kernel (8×d), max-pooling layer 1 (4×d), convolution layer 2 with a 3×d kernel (2×d), max-pooling layer 2 (1×d), and an output layer producing $V_{U_2}$]
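A minimal NumPy sketch of this pipeline (a simplified per-dimension convolution, not the paper's exact CNN) reproduces the layer shapes on the slide:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6  # toy embedding size; the slides use d-dimensional word vectors

def conv3(X, K):
    """Simplified 3×d convolution ('valid' padding, one d-dim feature map):
    each output row mixes a window of 3 input rows, per dimension."""
    n = X.shape[0] - 2
    return np.tanh(np.stack([(K * X[i:i + 3]).sum(axis=0) for i in range(n)]))

def maxpool2(X):
    """Max-pool adjacent row pairs, halving the number of rows."""
    return X.reshape(X.shape[0] // 2, 2, X.shape[1]).max(axis=1)

# Dependency unit <x3, x1, x4, x7, eps> padded to 10 rows, as in the slide.
U = rng.standard_normal((10, d))
K1, K2 = rng.standard_normal((3, d)), rng.standard_normal((3, d))

h = maxpool2(conv3(U, K1))    # 10×d -> 8×d (conv) -> 4×d (pool)
V_U = maxpool2(conv3(h, K2))  # 4×d -> 2×d (conv) -> 1×d (pool)
print(V_U.shape)  # (1, 6)
```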

Page 9

Neural Machine Translation with SDR

SDRNMT-1:

[Figure: the encoder reads the word embeddings $V_{x_1}, \ldots, V_{x_J}$ over the source words x1-x7 into annotations $h_1, \ldots, h_J$; attention weights $\alpha_{i,1}, \ldots, \alpha_{i,J}$ form the context $c_i$, and the decoder state $s_i$ (computed from $s_{i-1}$ and $y_{i-1}$) emits the target word $y_i$]

Page 10

Neural Machine Translation with SDR

SDRNMT-1:

[Figure: a CNN encodes each dependency unit $U_j$ (e.g., $U_2 = \langle x_3, x_1, x_4, x_7, \epsilon \rangle$) into $V_{U_j}$, which is concatenated with the word embedding $V_{x_j}$ at the encoder input; the rest of the network is the standard attentional encoder-decoder]

$h_j = f_{enc}([V_{x_j} : V_{U_j}], h_{j-1})$

where $V_{x_j}$ is 360-dimensional and the learned $V_{U_j}$ is 260-dimensional.
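As a toy sketch of this encoder step (a plain tanh recurrence standing in for the paper's gated unit; all weights are random illustrations), note that the concatenated 360 + 260 = 620-dim input matches the baseline's 620-dim embedding:

```python
import numpy as np

rng = np.random.default_rng(1)
dim_x, dim_u, dim_h = 360, 260, 512  # 360-dim V_xj, 260-dim V_Uj; hidden size is illustrative

# f_enc as a plain tanh recurrence (the paper uses a gated unit; this is a sketch)
W = rng.standard_normal((dim_h, dim_x + dim_u)) * 0.01
R = rng.standard_normal((dim_h, dim_h)) * 0.01

def enc_step(V_x, V_U, h_prev):
    """h_j = f_enc([V_xj : V_Uj], h_{j-1})."""
    x = np.concatenate([V_x, V_U])  # [V_xj : V_Uj], 620-dim
    return np.tanh(W @ x + R @ h_prev)

h = np.zeros(dim_h)
for _ in range(7):  # seven source words, as in the running example
    V_x, V_U = rng.standard_normal(dim_x), rng.standard_normal(dim_u)
    h = enc_step(V_x, V_U, h)
print(h.shape)  # (512,)
```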

Page 11

Neural Machine Translation with SDR

SDRNMT-2:

[Figure: as in SDRNMT-1, a CNN turns each dependency unit $U_j$ into $V_{U_j}$, but the word embeddings $V_{x_j}$ and the dependency representations $V_{U_j}$ now enter the encoder as two separate streams]

Page 12

Neural Machine Translation with SDR

SDRNMT-2:

Encoder:

$h_j = f_{enc}(V_{x_j}, h_{j-1}),$
$d_j = f_{enc}(V_{U_j}, d_{j-1}).$

[Figure: two parallel encoder chains — word annotations $h_1, \ldots, h_J$ over the embeddings $V_{x_j}$, and dependency annotations $d_1, \ldots, d_J$ over the CNN outputs $V_{U_j}$]
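The two-stream encoder can be sketched with separate recurrences over the word embeddings and the CNN outputs (a tanh stand-in for the paper's gated unit; weights and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
dim_x, dim_u, dim_h = 360, 260, 512

def make_rnn(dim_in, dim_out):
    """Build a simple tanh recurrence v, prev -> tanh(W v + R prev)."""
    W = rng.standard_normal((dim_out, dim_in)) * 0.01
    R = rng.standard_normal((dim_out, dim_out)) * 0.01
    return lambda v, prev: np.tanh(W @ v + R @ prev)

f_word = make_rnn(dim_x, dim_h)  # h_j = f_enc(V_xj, h_{j-1})
f_dep = make_rnn(dim_u, dim_h)   # d_j = f_enc(V_Uj, d_{j-1})

h = d = np.zeros(dim_h)
hs, ds = [], []
for _ in range(7):  # seven source words
    h = f_word(rng.standard_normal(dim_x), h)
    d = f_dep(rng.standard_normal(dim_u), d)
    hs.append(h)
    ds.append(d)
print(len(hs), len(ds), hs[0].shape)  # 7 7 (512,)
```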

Page 13

Neural Machine Translation with SDR

SDRNMT-2:

Encoder:

$h_j = f_{enc}(V_{x_j}, h_{j-1}),$
$d_j = f_{enc}(V_{U_j}, d_{j-1}).$

Attention:

$e^s_{i,j} = f^s(s^s_{i-1} + h_j),$
$e^d_{i,j} = f^d(s^d_{i-1} + d_j),$

$\alpha_{i,j} = \dfrac{\exp(\lambda e^s_{i,j} + (1-\lambda) e^d_{i,j})}{\sum_{j'=1}^{J} \exp(\lambda e^s_{i,j'} + (1-\lambda) e^d_{i,j'})}.$

[Figure: a single shared attention $\alpha_i$ scores both annotation streams $h_j$ and $d_j$]
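A NumPy sketch of the joint attention (the additive scoring form is an assumption; the slide only specifies the interpolated softmax with weight λ):

```python
import numpy as np

rng = np.random.default_rng(3)
J, dim_h = 7, 512
lam = 0.5  # interpolation weight λ between word and dependency scores

h = rng.standard_normal((J, dim_h))      # word annotations h_1..h_J
d_ann = rng.standard_normal((J, dim_h))  # dependency annotations d_1..d_J
v = rng.standard_normal(dim_h) * 0.01
W = rng.standard_normal((dim_h, dim_h)) * 0.01
U = rng.standard_normal((dim_h, dim_h)) * 0.01
s_s = rng.standard_normal(dim_h)  # previous word decoder state s^s_{i-1}
s_d = rng.standard_normal(dim_h)  # previous dependency decoder state s^d_{i-1}

# e^s_{i,j} and e^d_{i,j}: additive scores against the two annotation streams
e_s = np.array([v @ np.tanh(W @ s_s + U @ h[j]) for j in range(J)])
e_d = np.array([v @ np.tanh(W @ s_d + U @ d_ann[j]) for j in range(J)])

# α_{i,j}: softmax over λ e^s + (1-λ) e^d — one shared weight per position
scores = lam * e_s + (1 - lam) * e_d
alpha = np.exp(scores) / np.exp(scores).sum()
print(round(alpha.sum(), 6))  # 1.0
```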

Page 14

Neural Machine Translation with SDR

SDRNMT-2 (the encoder and attention are as on the previous slide):

Decoder:

$c^s_i = \sum_{j=1}^{J} \alpha_{i,j} h_j, \qquad c^d_i = \sum_{j=1}^{J} \alpha_{i,j} d_j,$

$s^s_i = \varphi(s^s_{i-1}, y_{i-1}, c^s_i),$
$s^d_i = \varphi(s^d_{i-1}, y_{i-1}, c^d_i).$

[Figure: the decoder keeps two states $s^s_i$ and $s^d_i$, fed by the two contexts $c^s_i$ and $c^d_i$ built from the shared attention weights $\alpha_{i,j}$]
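The double-context decoder step can be sketched as follows (φ is a toy tanh recurrence standing in for the paper's gated unit; shapes and weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
J, dim_h, dim_e = 7, 512, 360
alpha = rng.random(J)
alpha /= alpha.sum()  # shared attention weights α_{i,j}
h = rng.standard_normal((J, dim_h))  # word annotations
d = rng.standard_normal((J, dim_h))  # dependency annotations

# c^s_i and c^d_i: two contexts from the SAME weights over different annotations
c_s = alpha @ h
c_d = alpha @ d

# φ as a toy tanh recurrence updating both decoder states in parallel
W_y = rng.standard_normal((dim_h, dim_e)) * 0.01
W_c = rng.standard_normal((dim_h, dim_h)) * 0.01
W_s = rng.standard_normal((dim_h, dim_h)) * 0.01

def phi(s_prev, y_prev, c):
    return np.tanh(W_s @ s_prev + W_y @ y_prev + W_c @ c)

y_prev = rng.standard_normal(dim_e)      # embedding of y_{i-1}
s_s = phi(np.zeros(dim_h), y_prev, c_s)  # s^s_i
s_d = phi(np.zeros(dim_h), y_prev, c_d)  # s^d_i
print(c_s.shape, s_s.shape)  # (512,) (512,)
```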

Page 15

Neural Machine Translation with SDR

SDRNMT-2 (complete model):

Encoder:
$h_j = f_{enc}(V_{x_j}, h_{j-1}), \qquad d_j = f_{enc}(V_{U_j}, d_{j-1}).$

Attention:
$e^s_{i,j} = f^s(s^s_{i-1} + h_j), \qquad e^d_{i,j} = f^d(s^d_{i-1} + d_j),$
$\alpha_{i,j} = \dfrac{\exp(\lambda e^s_{i,j} + (1-\lambda) e^d_{i,j})}{\sum_{j'=1}^{J} \exp(\lambda e^s_{i,j'} + (1-\lambda) e^d_{i,j'})}.$

Decoder:
$c^s_i = \sum_{j=1}^{J} \alpha_{i,j} h_j, \qquad c^d_i = \sum_{j=1}^{J} \alpha_{i,j} d_j,$
$s^s_i = \varphi(s^s_{i-1}, y_{i-1}, c^s_i), \qquad s^d_i = \varphi(s^d_{i-1}, y_{i-1}, c^d_i).$

Output:
$p(y_i \mid y_{<i}, x, T) = g(y_{i-1}, s^s_i, s^d_i, c^s_i, c^d_i).$
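The output layer g can be sketched as a single affine-plus-softmax readout over a toy vocabulary (the real g is a learned output layer; this only shows which quantities it reads):

```python
import numpy as np

rng = np.random.default_rng(5)
dim_h, dim_e, vocab = 512, 360, 1000

# g(·): one affine layer + softmax over a toy vocabulary, reading the
# previous word embedding and BOTH decoder states and BOTH contexts
W_o = rng.standard_normal((vocab, dim_e + 4 * dim_h)) * 0.01

def g(y_prev, s_s, s_d, c_s, c_d):
    z = W_o @ np.concatenate([y_prev, s_s, s_d, c_s, c_d])
    z -= z.max()  # numerical stability before exponentiation
    p = np.exp(z)
    return p / p.sum()

p = g(rng.standard_normal(dim_e), *[rng.standard_normal(dim_h) for _ in range(4)])
print(p.shape, round(p.sum(), 6))  # (1000,) 1.0
```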

Page 16

Neural Machine Translation with SDR

SDRNMT-2 — Double-Context NMT: the same model as on the previous slide; the decoder predicts each target word from both the word context $c^s_i$ and the dependency context $c^d_i$.

Page 17

Experiments

• Chinese-to-English translation task on a 1.42M-sentence-pair LDC corpus

• Source sentences of the training data are parsed with the Stanford Parser (Chang et al., 2009)

• For SDRNMT-1 and SDRNMT-2, the dimension of $V_{x_j}$ is 360 and the dimension of $V_{U_j}$ is 260; the input embedding of the baseline is 620

• Baselines: Phrase-Based Statistical Machine Translation (PBSMT) (Koehn et al., 2007), standard attentional NMT (AttNMT) (Bahdanau et al., 2014), and NMT with dependency labels (Sennrich and Haddow, 2016)

Page 18

Experiments

System            Dev(NIST02)  NIST03   NIST04  NIST05   NIST06   NIST08  AVG
PBSMT             33.15        31.02    33.78   30.33    29.62    23.53   29.66
AttNMT            36.31        34.02    37.11   32.86    32.54    25.44   32.40
Sennrich-deponly  36.68        34.51    38.09   33.37    32.96    26.96   32.98
SDRNMT-1          36.88        34.98*   38.14   34.61**  33.58*   27.06   33.32
SDRNMT-2          37.34        35.91**  38.73*  34.18*   33.76**  27.64*  34.04

"*" indicates statistically significantly better than "Sennrich-deponly" at p < 0.05 and "**" at p < 0.01 by bootstrap resampling (Koehn, 2004).
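The significance test cited here (paired bootstrap resampling, Koehn 2004) can be sketched on toy sentence-level scores:

```python
import random

def bootstrap_significance(scores_a, scores_b, n_samples=1000, seed=0):
    """Paired bootstrap resampling (in the spirit of Koehn, 2004):
    resample sentence indices with replacement and count how often
    system A beats system B on the resampled test set."""
    rng = random.Random(seed)
    n, wins = len(scores_a), 0
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]  # same indices for both systems
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return 1.0 - wins / n_samples  # approximate p-value for "A > B"

# Toy sentence-level scores (not real BLEU): A is consistently slightly better.
a = [0.32, 0.35, 0.31, 0.36, 0.33, 0.34, 0.30, 0.37]
b = [0.30, 0.33, 0.30, 0.34, 0.31, 0.32, 0.29, 0.35]
print(bootstrap_significance(a, b) < 0.05)  # True
```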

Page 19

Experimental Results

• Translation quality for different sentence lengths

[Figure: BLEU (roughly 25-35) against sentence length (10 to 80+) for PBSMT, AttNMT, Sennrich-deponly, SDRNMT-1, and SDRNMT-2]

Page 20

Conclusion

• A source dependency unit captures long-distance source dependency constraints

• The proposed SDRNMT-1 and SDRNMT-2 combine an NMT model and a CNN, which are jointly trained to learn the SDR and the translation, rather than being trained separately

• A double-context approach further exploits the source dependency representation