
Graph Convolutional Encoders for Syntax-aware NMT

Joost Bastings

Institute for Logic, Language and Computation (ILLC), University of Amsterdam
http://joost.ninja


Collaborators


Ivan Titov, Wilker Aziz, Diego Marcheggiani, Khalil Sima’an

Encoder-Decoder models: no syntax, no hierarchical structure

Sutskever et al. (2014), Cho et al. (2014)

Source: du vet ingenting jon snow

Target, produced one word at a time:
you → you know → you know nothing → you know nothing jon → you know nothing jon snow

Syntax can help!

Syntax:
● captures non-local dependencies between words
● captures relations between words

How to incorporate syntax into NMT?

Related Work

Multi-task learning / latent graph:
● Eriguchi et al. (ACL, 2017). Learning to Parse and Translate Improves NMT.
● Hashimoto et al. (EMNLP, 2017). NMT with Source-Side Latent Graph Parsing.
● Nadejde et al. (WMT, 2017). Predicting Target Language CCG Supertags Improves NMT.

Tree-RNN:
● Eriguchi et al. (ACL, 2016). Tree-to-Sequence Attentional NMT.
● Chen et al. (ACL, 2017). Improved NMT with a Syntax-Aware Encoder and Decoder.
● Yang et al. (EMNLP, 2017). Towards Bidirectional Hierarchical Representations for Attention-based NMT.

Serialized trees:
● Aharoni & Goldberg (ACL, 2017). Towards String-to-Tree NMT.
● Li et al. (ACL, 2017). Modeling Source Syntax for NMT.

Sequence-to-Dependency:
● Wu et al. (ACL, 2017). Sequence-to-Dependency NMT.


Graph Convolutional Networks: Neural Message Passing

Many linguistic representations can be expressed as a graph

[Figure: dependency parse of "the bastard defended the wall", with arcs det(bastard, the), nsubj(defended, bastard), det(wall, the), dobj(defended, wall)]
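For concreteness, a sketch of how this dependency graph could be represented in code as a list of labelled arcs; the (head, dependent, label) tuple format is just an illustrative choice, not the toolkit's actual data structure.

```python
# "the bastard defended the wall", word positions 0..4,
# written as labelled dependency arcs (head, dependent, label)
dep_edges = [
    (1, 0, "det"),    # bastard -> the
    (2, 1, "nsubj"),  # defended -> bastard
    (4, 3, "det"),    # wall -> the
    (2, 4, "dobj"),   # defended -> wall
]
```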

Graph Convolutional Networks: message passing

Kipf & Welling (2017). Related ideas earlier, e.g., Scarselli et al. (2009).

[Figure: an undirected graph; the update of node v aggregates messages from its neighbours]
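As a minimal sketch of this update (an illustration, not the talk's implementation): each node sums transformed representations of its neighbours, including itself via a self-loop, and applies a nonlinearity. Kipf & Welling additionally normalize the adjacency matrix; that step is omitted here for brevity.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def gcn_layer(H, A, W, b):
    """One GCN layer (sketch): h_v' = relu( sum_{u in N(v)} W^T h_u + b ).

    H: (n, d_in)      current node representations, one row per node
    A: (n, n)         adjacency matrix with self-loops (A[v, u] = 1 iff u is a neighbour of v)
    W: (d_in, d_out)  shared weight matrix, b: (d_out,) shared bias
    """
    return relu(A @ H @ W + b)
```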

GCNs: multilayer convolution operation

Input X = H(0): initial feature representations of the nodes
Hidden layers H(1), H(2), ...: representations informed by each node's neighbourhood
Output Z = H(N)

Adapted from Thomas Kipf. The computation is parallelizable and can be made quite efficient (e.g., Hamilton, Ying and Leskovec (2017)).
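Stacking layers lets information travel further along the graph: after N layers, each node's representation is informed by its N-hop neighbourhood. A minimal sketch, reusing the hypothetical gcn_layer above:

```python
def gcn_stack(X, A, layers):
    """Apply N GCN layers in sequence (sketch). `layers` is a list of (W, b) pairs."""
    H = X                          # H(0): initial node features
    for W, b in layers:            # each layer mixes in one more hop of neighbourhood
        H = gcn_layer(H, A, W, b)
    return H                       # Z = H(N)
```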

Syntactic GCNs: directionality and labels

Weight matrix for each direction: Wout, Win, Wloop
Bias for each label, e.g. bnsubj

[Figure: node v exchanges messages along labelled dependency arcs (nsubj, dobj, det), each transformed with a direction-specific matrix (e.g., Wout), plus a self-loop message with Wloop]
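Put together, a sketch of the resulting layer, assuming one weight matrix per edge direction (in, out, self-loop) and one bias per direction-specific label. The variable names and the mapping of "in"/"out" to arc direction below are assumptions, not the talk's code:

```python
import numpy as np

def syntactic_gcn_layer(H, edges, W, b):
    """Syntactic GCN layer (sketch):
        h_v' = relu( sum_{u in N(v)}  W[dir(u, v)] h_u  +  b[lab(u, v)] )

    H:     (n, d) node representations
    edges: list of (head, dependent, label) dependency arcs
    W:     dict of weight matrices {"in": ..., "out": ..., "loop": ...}
    b:     dict of bias vectors, one per direction-specific label (plus "self")
    """
    n = H.shape[0]
    out = np.zeros((n, W["loop"].shape[1]))
    for v in range(n):                                       # self-loop messages
        out[v] += H[v] @ W["loop"] + b["self"]
    for head, dep, label in edges:                           # every arc is used in both directions
        # naming is an assumption: "out" = along the arc, "in" = against it
        out[dep] += H[head] @ W["out"] + b[label]
        out[head] += H[dep] @ W["in"] + b[label + "_rev"]
    return np.maximum(0.0, out)                              # ReLU
```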

Syntactic GCNs: edge-wise gating

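The formula on this slide did not survive the transcript. In the accompanying paper (Bastings et al., 2017), each message is additionally multiplied by a scalar gate computed from the sending node and the edge, so the model can down-weight messages along unreliable (e.g., mis-parsed) arcs. A sketch with hypothetical parameter names:

```python
import numpy as np

def edge_gate(h_u, w_gate, b_gate):
    """Scalar gate in (0, 1) for the message sent from node u along one edge (sketch).
    w_gate: (d,) direction-specific gate weights; b_gate: scalar label-specific gate bias.
    The gated message is then  g * (W[dir] h_u + b[lab])."""
    return 1.0 / (1.0 + np.exp(-(h_u @ w_gate + b_gate)))
```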

Graph Convolutional Encoders

[Figure, built up over several slides: the dependency-parsed sentence "the bastard defended the wall" (det, nsubj, det, dobj) is first encoded by a base encoder (BiRNN, CNN, ..); GCN layer 1 and GCN layer 2 are then stacked on top, propagating information along the labelled arcs with direction-specific matrices (e.g., Wout)]
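A minimal sketch of how the pieces compose; base_encode stands in for whichever base encoder is used (BiRNN, CNN, or plain embeddings), the parameter layout is assumed, and syntactic_gcn_layer is the hypothetical function sketched earlier:

```python
def encode_with_gcn(tokens, edges, base_encode, gcn_layers):
    """Base sentence encoder followed by a stack of syntactic GCN layers (sketch).
    gcn_layers: list of (W, b) parameter dicts, one pair per GCN layer."""
    H = base_encode(tokens)                      # (n, d) contextual word states
    for W, b in gcn_layers:
        H = syntactic_gcn_layer(H, edges, W, b)  # each layer mixes in one more hop of syntax
    return H                                     # syntax-aware word representations
```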

GCNs for NMT

[Figure, built up over several slides: the syntax-aware encoder states for "the bastard defended the wall" (base encoder + GCN layers over the dependency graph) feed an attention mechanism; a context vector is computed at each step and an RNN decoder produces the translation word by word: bastarden → bastarden försvarade → bastarden försvarade väggen]
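For completeness, a sketch of the attention step at each decoding position. Dot-product scoring is used here only for brevity and is an assumption about the scoring function; the variable names are illustrative:

```python
def attention_context(decoder_state, encoder_states):
    """Attention over the syntax-aware encoder states (sketch).
    decoder_state: (d,); encoder_states: (n, d); returns a context vector (d,)."""
    scores = encoder_states @ decoder_state       # one score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over source positions
    return weights @ encoder_states               # weighted sum of encoder states
```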

Experiments


Artificial reordering experiment

● Random sequences over a vocabulary of 26 types
● Each token is linked to its predecessor
● Sequences are randomly permuted
● Each token is also linked to a random token with a "fake edge", using a different label set

Example: s n o w → n s w o

A BiRNN+GCN model learns to put the permuted sequences back into order, i.e. it distinguishes real edges from fake edges.
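A sketch of how one such training example could be generated; the exact details (sequence length, whether fake edges may coincide with real ones, indexing conventions) are assumptions for illustration:

```python
import random
import string

def make_example(length=5, seed=None):
    """Generate one artificial reordering example (sketch)."""
    rng = random.Random(seed)
    target = [rng.choice(string.ascii_lowercase) for _ in range(length)]   # 26 types
    edges = [(i, i - 1, "real") for i in range(1, length)]                 # link each token to its predecessor
    edges += [(i, rng.randrange(length), "fake") for i in range(length)]   # fake edge per token, separate label set
    order = list(range(length))
    rng.shuffle(order)                                                     # random permutation
    source = [target[i] for i in order]
    pos = {orig: new for new, orig in enumerate(order)}                    # re-index edges to permuted positions
    source_edges = [(pos[a], pos[b], lab) for a, b, lab in edges]
    return source, source_edges, target                                    # model must recover `target` from `source`
```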

Three baselines

● Bag of words
● Convolutional
● Recurrent

Machine Translation experiments

We parse the English source sentences using SyntaxNet.

BPE on the target side (8k merges; 16k for WMT’16).

Embeddings: 256 units; GRUs/CNNs: 512 units (800 for WMT’16).

                          Train    Validation (newstest2015)    Test (newstest2016)
English-German  NCv11     227k     2169                         2999
English-German  WMT’16    4.5M
English-Czech   NCv11     181k     2656                         2999

Results English-German (BLEU)


Results English-Czech (BLEU)


Effect of GCN layers

                English-German         English-Czech
                BLEU1     BLEU4        BLEU1     BLEU4
BiRNN           44.2      14.1         37.8      8.9
+ GCN 1L        45.0      14.1         38.3      9.6
+ GCN 2L        46.3      14.8         39.6      9.9

Effect of sentence length


Flexibility of GCNs

In principle we can condition on any kind of graph-based linguistic structure:

● Semantic role labels
● Co-reference chains
● AMR semantic graphs

Conclusion


Simple and effective approach to integrating syntax into NMT models

Consistent BLEU improvements for two challenging language pairs

GCNs are capable of encoding any kind of graph-based structure

Thank you!


Code: Neural Monkey + GCN (TensorFlow)

joost.ninja

Code: Semantic Role Labeling (Theano)

diegma.github.io

Ivan is hiring PhDs/Postdocs. Deadline tomorrow!

This work was supported by the European Research Council (ERC StG BroadSem 678254) and the Dutch National Science Foundation (NWO VIDI 639.022.518, NWO VICI 277-89-002).