Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal...

71

Transcript of Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal...

Page 1: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Multilingual NLP via Cross-LingualWord Embeddings

Ivan Vuli¢

LTL, University of Cambridge & PolyAI

FoTran Workshop

Helsinki; September 28, 2018

Page 2: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Joint Work with...

...plus eorts of many other researchers...1 / 70

Page 3: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

M

A cold open...

2 / 70

Page 4: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Embeddings are Everywhere (Monolingual)

Intrinsic (semantic) tasks:

Semantic similarity and synonym detection, word analogiesConcept categorization

Selectional preferences, language grounding

Extrinsic (downstream) tasks

Named entity recognition, part-of-speech tagging, chunkingSemantic role labeling, dependency parsing

Sentiment analysis, discourse parsing

Beyond NLP:

Information retrieval, Web search

Question answering, dialogue systems3 / 70

Page 5: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Embeddings are Everywhere (Cross-lingual)

Intrinsic (semantic) tasks:

Cross-lingual and multilingual semantic similarity

Bilingual lexicon learning, multi-modal representations

Extrinsic (downstream) tasks:

Cross-lingual SRL, POS tagging, NERCross-lingual dependency parsing and sentiment analysisCross-lingual annotation and model transfer

Statistical/Neural machine translation

Beyond NLP:

(Cross-Lingual) Information retrieval

(Cross-Lingual) Question answering4 / 70

Page 6: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

M

A slow intro...

5 / 70

Page 7: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Motivation

We want to understand and model the meaning of...

Source: dreamstime.com

...without manual/human input and without perfect MT6 / 70

Page 8: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Motivation

The NLP community has developed useful features for several tasks but ndingfeatures that are...

1. task-invariant (POS tagging, SRL, NER, parsing, ...)

(monolingual word embeddings)

2. language-invariant (English, Dutch, Chinese, Spanish, ...)

(cross-lingual word embeddings → this talk)

...is non-trivial and time-consuming (20+ years of feature engineering...)

7 / 70

Page 9: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Motivation

The NLP community has developed useful features for several tasks but ndingfeatures that are...

1. task-invariant (POS tagging, SRL, NER, parsing, ...)

(monolingual word embeddings)

2. language-invariant (English, Dutch, Chinese, Spanish, ...)

(cross-lingual word embeddings → this talk)

...is non-trivial and time-consuming (20+ years of feature engineering...)

Learn word-level features which generalise across tasks and languages

8 / 70

Page 10: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

The World Existed B.E. (Before Embeddings)

B.E. Example 1: Cross-lingual (Brown) clusters

[Tackstrom et al., NAACL 2012, Faruqui and Dyer, ACL 2013]

9 / 70

Page 11: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

The World Existed B.E.

B.E. Example 2a: Traditional count-based cross-lingual spaces...

[Gaussier et al., ACL 2004; Laroche and and Langlais, COLING 2010]

10 / 70

Page 12: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

The World Existed B.E.

B.E. Example 2b: ...and bootstrapping from a limited bilingual signal

[Peirsman and Padó, NAACL 2010; Vuli¢ and Moens, EMNLP 2013]

11 / 70

Page 13: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

The World Existed B.E.

B.E. Example 3: Cross-lingual topical spaces

[Mimno et al., EMNLP 2009; Vuli¢ et al., ACL 2011]12 / 70

Page 14: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Motivation Revisited

Cross-lingual word embeddings:

- simple: quick and ecient to train- state-of-the-art and omnipresent- lightweight and cheap

Cross-lingual word embedding models are to MT what(monolingual) word embedding models are to language modeling.

by Sebastian Ruder

Are they?

They support cross-lingual NLP: enabling cross-lingual modelingand cross-lingual transfer

13 / 70

Page 15: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Learning Word Representations

Key idea

Distributional hypothesis → words with similar meanings are likelyto appear in similar contexts

[Harris, Word 1954]

shouts:

Meaning as use!

calmly states:

You shall know a word by the company it keeps.14 / 70

Page 16: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Word Embeddings

Representation of each word w ∈ V :

vec(w) = [f1, f2, . . . , fdim]

Word representations in the same semantic (or embedding) space!

Image courtesy of [Gouws et al., ICML 2015]

15 / 70

Page 17: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Cross-Lingual Word Embeddings (CLWEs)

Representation of a word wS1 ∈ V S :

vec(wS1 ) = [f1

1 , f12 , . . . , f

1dim]

Exactly the same representation for wT2 ∈ V T :

vec(wT2 ) = [f2

1 , f22 , . . . , f

2dim]

Language-independent word representations in the same sharedsemantic (or embedding) space!

16 / 70

Page 18: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Cross-Lingual Word Embeddings

[Luong et al., NAACL 2015]17 / 70

Page 19: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Cross-Lingual Word Embeddings

Monolingual vs. Cross-lingual

Q1 → Algorithm Design: How to align semantic spaces in twodierent languages?

Q2 → Data Requirements: Which bilingual signals are used forthe alignment?

18 / 70

Page 20: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Data Requirements: Bilingual Signals

Parallel Comparable

Word Dictionaries ImagesSentence Translations CaptionsDocument - Wikipedia

Nature and alignment level of data sources required by CLWE models

Data requirements are more important for the nal CLWE modelperformance than the actual underlying architecture[Levy et al., EACL 2017]

19 / 70

Page 21: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Data Requirements: Bilingual Signals

[Upadhyay et al., ACL 2016]

Illustration of dierent data requirements and bilingual signals

1. Word-level: word alignments and translation dictionaries

2. Sentence-level: sentence alignments

3. Document-level: Wikipedia/news articles

4. Other: visual alignment: image captions, eye-tracking data

5. Without any bilingual signal?

20 / 70

Page 22: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Welcome to the CLWE Jungle I

Parallel ComparableWord Mikolov et al. (2013) Kiela et al. (2015)

Faruqui & Dyer (2014) Vuli¢ et al. (2016)Xing et al. (2015) Vuli¢ et al. (2017)Dinu et al. (2015) Zhang et al. (2017)Lazaridou et al. (2015) Hauer et al. (2017)Zhang et al. (2016)Zou et al. (2013)Ammar et al. (2016)Artexte et al. (2016)Xiao & Guo (2014)Gouws and Søgaard (2015)Duong et al. (2016)Gardner et al. (2015)Smith et al. (2017)Mrk²i¢ et al. (2017)Artetxe et al. (2017)

21 / 70

Page 23: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Welcome to the CLWE Jungle II

Parallel ComparableSentence alignments Guo et al. (2015) Calixto et al. (2017)

Vyas and Carpuat (2016) Gella et al. (2017)Hermann and Blunsom (2013)Lauly et al. (2013)Ko£iský et al. (2014)Chandar et al. (2014)Pham et al. (2015)Klementiev et al. (2012)Soyer et al. (2015)Luong et al. (2015)Gouws et al. (2015)Coulmance et al. (2015)Shi et al. (2015)Rajendran et al. (2016)Levy et al. (2017)

Document Hermann and Blunsom (2014) Vuli¢ & Korhonen (2016)Vuli¢ and Moens (2016)Søgaard et al. (2015)Mogadala and Rettinger (2016)

22 / 70

Page 24: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

General (Simplied) Methodology

(Bilingual) data sources are more important than the chosen algorithm

Most algorithms are formulated as: L1(W) + L2(V) + Ω(W,V )

23 / 70

Page 25: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Learning from Word-Level Information

1. Mapping-based or oine approaches: train monolingual wordrepresentations independently and then learn a transformation matrixfrom one language to the other

2. Pseudo-bilingual approaches: train a monolingual WE model onautomatically constructed corpora containing words from both languages

3. Joint online approaches: take parallel text as input and minimize thesource and target language losses jointly with the regularization term

These approaches are equivalent, modulo optimization strategies

24 / 70

Page 26: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Learning from Word-Level Information: Example 1

The geometric relations that hold between words are similar across languages:e.g., numbers and animals in English show a similar geometric constellation astheir Spanish counterparts

[Mikolov et al., arXiv 2013]

25 / 70

Page 27: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Learning from Word-Level Information: Example 1

Basic mapping approach

[Mikolov et al., arXiv 2013]

Learn to transform the pre-trained source language embeddings into a space

where the distance between a word and its translation pair is minimised26 / 70

Page 28: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Learning from Word-Level Information: Example 1

Basic mapping approach

Based on given word translation pairs (xsi , xti), i = 1, . . . , N

Mean-squared error between the Euclidean distances (of the actualand transformed vector)

ΩMSE =n∑

i=1

‖Wxsi − xt

i‖2

ΩMSE = ‖WXs −Xt‖2F

Standard (re)formulation:

J = L1SGNS + L2SGNS + ΩMSE27 / 70

Page 29: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Bootstrapping from Small Seed Dictionaries

The latest instance of the bootstrapping idea[Artetxe et al., ACL 2017]

Self-learning iterative framework: starting from cognates or numerals

28 / 70

Page 30: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Bootstrapping from Small Seed Dictionaries

Lexicon induction results from [Artetxe et al., ACL 2017]

Bootstrapping helps with really small seed dictionaries

A performance plateau with simple mapping approaches has been reached?29 / 70

Page 31: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Learning without any Bilingual Signal?

The rst instance of the fully unsupervised learning idea[Conneau et al., ICLR 2018; Artetxe et al., ACL 2018]

Adversarial training: playing the two-way game between a generator (mapping)and a discriminator

Alignment at the level of distributions

Supporting unsupervised NMT [Lample et al., ICLR 2018] and unsupervisedCLIR [Litschko et al., SIGIR 2018]

30 / 70

Page 32: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Learning without any Bilingual Signal?

But is it really that great?

[Søgaard Ruder, Ruder, Vuli¢; ACL 2018]

31 / 70

Page 33: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Mapping Approach: Bilingual to Multilingual

Fixing one space (English) and learning transformations for all monolingualword embeddings in other languages

This constructs a multilingual space from input monolingual spaces

[Smith et al., ICLR 2017]:

A unied multilingual space for 89 languages using fastText vectors and 5kGoogle Translate pairs for each language pair

32 / 70

Page 34: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Learning from Word-Level Information: Example 2

Pseudo-bilingual training

[Gouws and Søgaard, NAACL 2015]

Barista: Bilingual Adaptive Reshuing with Individual Stochastic Alternatives

1. Build a pseudo-bilingual corpus (or corrupt the original training data) usinga set of predened equivalences (e.g., POS tags, word translation pairs)

2. Train a monolingual WE model on the corrupted/shued corpus

Original : we build the house

POS : we build la voiture / they run la house

Translations: we construire the maison / nous build la house

33 / 70

Page 35: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Learning from Word-Level Information: Example 2

[Gouws and Søgaard, NAACL 2015]

Barista: Bilingual Adaptive Reshuing with Individual Stochastic Alternatives

1 Input: Source corpus Cs, target corpus Ct, dictionary D

2 Concatenate Cs and Ct and then shue the concatenated corpus

3 Pass over the corpus, for each word w: if w′|(w,w′) ∨ (w′, v) ∈ D isnon-empty and of cardinality k, replace w with w′ with probability 1/2k;keep w otherwise

4 Train a standard WE model on the pseudo-bilingual corpus C′

34 / 70

Page 36: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Learning from Word-Level Information: Example 2

The chosen equivalence class dictates the shape of the output space

Cross-lingual embedding space with POS equivalences35 / 70

Page 37: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Learning from Word-Level Information: Example 3

Joint (Online) Training

Bilingual Skip-Gram (BiSkip) proposed by Luong et al., NAACL 2015

Very similar to the original mapping approach, but formulated as online

monolingual and cross-lingual training on pseudo-bilingual corpora

J = L1SGNS + L2SGNS + ΩL1,L2

ΩL1,L2 = SGNSL1→L2 + SGNSL2→L1 36 / 70

Page 38: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Learning from Word-Level Information: Example 3

Multilingual Skip-Gram (MultiSkip) by Ammar et al., NAACL 2016

Simple extension (again) → summing up bilingual objectives for allavailable parallel corpora

J = L1SGNS + L2

SGNS + L3SGNS + ΩL1,L2

+ ΩL2,L3+ ΩL1,L3

37 / 70

Page 39: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Learning from Word-Level Information: Example 4

Post-processing/specialisation

State-of-the-art: Paragram and Attract-Repel[Wieting et al., TACL 2015; Mrk²i¢ et al., TACL 2017]

If S is the set of synonymous word pairs, the procedure iterates overmini-batches of such constraints BS , optimising the following cost function:

S(BS) =∑

(xl,xr)∈BS

(ReLU (δsim + xltl − xlxr)

+ ReLU (δsim + xrtr − xlxr))

where δsim is the similarity margin and tl and tr are negative examples for the

given word pair (xl, xr).

38 / 70

Page 40: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

CLWEs via Semantic Specialisation

Cross-lingual supervision: again word-level linguistic constraints(e.g., synonyms)

Constraint Relation Source

(en_sweet, it_dolce) SYN BabelNet(en_work, fr_travail) SYN BabelNet(en_school, de_schule) SYN BabelNet(fr_montagne, de_gebirge) SYN BabelNet(sh_gradona£elnik, en_mayor) SYN BabelNet(nl_vrouw, it_donna) SYN BabelNet

(en_sour, it_dolce) ANT BabelNet(en_asleep, fr_éveillé) ANT BabelNet(en_cheap, de_teuer) ANT BabelNet(de_langsam, es_rápido) ANT BabelNet(sh_obeshrabriti, en_encourage) ANT BabelNet(fr_jour, nl_nacht) ANT BabelNet

39 / 70

Page 41: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

CLWEs via Semantic Specialisation

Negative Examples for each Synonymy Pair

For each synonymy pair (xl,xr), the negative example pair (tl, tr)is chosen from the remaining in-batch vectors so that tl is the oneclosest (cosine similarity) to xl and tr is closest to xr.

40 / 70

Page 42: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Learning from Aligned Sentences: Example 1

Trans-gram model: very similar to BiSkip, but it relies on full sententialcontexts

[Coulmance et al., EMNLP 2015]

J = L1SGNS + L2

SGNS + ΩL1,L2

ΩL1,L2 = SGNSL1→L2 + SGNSL2→L1

Deja vu, right?41 / 70

Page 43: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Learning from Aligned Sentences: Example 2

BilBOWA model: very similar to online models, but using sentence-levelcross-lingual information

[Gouws et al., ICML 2017] Deja vu, right?

J = L1SGNS + L2

SGNS + ΩBilBOWA

ΩBilBOWA = ‖ 1

m

m∑ws

i∈sents

xsi −

1

n

n∑wt

j∈sentt

xtj‖2

42 / 70

Page 44: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Learning from Aligned Documents: Example 1

Again, pseudo-bilingual training

43 / 70

Page 45: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Learning from Aligned Documents: Example 2

Inverted indexing

[Søgaard et al. ACL 2015]; slide courtesy of Anders Søgaard

44 / 70

Page 46: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Summary

Slide courtesy of Anders Søgaard45 / 70

Page 47: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Summary

Slide courtesy of Anders Søgaard

46 / 70

Page 48: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Evaluation

Quality of Embeddings

How good are they in representing word meaning?

How good are they as features in downstream

tasks?

Intrinsic Extrinsic

Page 49: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Evaluation

Intrinsic >> Extrinsic?

Intrinsic << Extrinsic?

Intrinsic Extrinsic?

Most Ideal but hardly true!

[Schnabel et al, EMNLP 2015; Tsvetkov et al, EMNLP 2015]

Page 50: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Similarity-Based vs Feature-Based?

Parallel to the division to intrinsic and extrinsic tasks, there is anotherperspective:

Similarity-based use of embeddings: applications that rely on (constrained)nearest neighbour search in the cross-lingual word embedding graph

Feature-based use of embeddings: use cross-lingual embeddings directly asfeatures: information retrieval, cross-lingual transfer?

Page 51: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Cross-Lingual Lexicon Inductionand Cross-Lingual Word Similarity

Semantic similarity = measuring distance in the induced space

Bilingual lexicons: (en_morning, it_mattina), (en_carpet, pt_tapete), ...

50 / 70

Page 52: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Cross-Lingual Lexicon Induction

Similarity-based vs. classication-based lexicon induction

Word embeddings used as features for classication

[Irvine and Callison-Burch, CompLing 2017; Heyman et al., EACL 2017]

Page 53: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Extrinsic EvaluationCross-lingual Document Classication

Quite popular in the CLWE literature

Cross-lingual knowledge transfer?

[Klementiev et al., COLING 2012; Heyman et al., DAMI 2015]

Page 54: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Cross-Lingual Document Classication

53 / 70

Page 55: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Extrinsic EvaluationCross-lingual Document Classication

Using word-level information composed into a document-level representation

[Hermann and Blunsom, ACL 2014]

1. Is this really an optimal way to approach this task?2. Is this really an optimal way to evaluate word embeddings?

Page 56: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Cross-Lingual Dependency Parsing

55 / 70

Page 57: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Other ApplicationsCross-lingual Task X, Cross-lingual Task Y, ...

Cross-lingual supersense tagging

[Gouws and Søgaard, NAACL 2015]

Word alignment

[Levy et al., EACL 2017]

POS tagging

[Søgaard et al., ACL 2015; Zhang et al., NAACL 2016]

Entity linking: wikication

[Tsai and Roth, NAACL 2016]

Relation prediction

[Glava² and Vuli¢, NAACL 2018]

56 / 70

Page 58: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Other ApplicationsCross-lingual Task X, Cross-lingual Task Y, ...

Cross-lingual frame-semantic parsing and discourse parsing

[Johannsen et al., EMNLP 2015, Braud et al., EACL 2017]

Cross-lingual lexical entailment

[Vyas and Carpuat, NAACL 2016, Upadhyay et al., NAACL 2018]

Semantic role labeling

[Kozhevnikov and Titov, ACL 2013; Akbik et al., ACL 2016]

Sentiment analysis

[Zhou et al., ACL 2016; Abdalla and Hirst, arXiv 2017]

57 / 70

Page 59: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Extrinsic Evaluation and ApplicationCross-lingual Information Retrieval

Semantic representations for cross-lingual information IR

[Vuli¢ and Moens, SIGIR 2015; Mitra and Craswell, arXiv 2017]

Page 60: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Extrinsic Evaluation and ApplicationCross-lingual Information Retrieval

Document and Query Embeddings

Composition of word embeddings:[Mitchell and Lapata, ACL 2008]

−→d = −→w1 +−→w2 + . . .+−−−→w|Nd|

The dim-dimensional document embedding in the samecross-lingual word embedding space:

−→d = [fd,1, . . . , fd,k, . . . , fd,dim]

Page 61: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Extrinsic Evaluation and ApplicationCross-lingual Information Retrieval

Document and Query Embeddings

→ The same principles with queries

−→Q = −→q1 +−→q2 + . . .+−→qm

The dim-dimensional query embedding in the same bilingualword embedding space:

−→Q = [fQ,1, . . . , fQ,k, . . . , fQ,dim]

Page 62: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Extrinsic Evaluation and ApplicationLanguage Understanding: Multilingual Dialog State Tracking

61 / 70

Page 63: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Extrinsic Evaluation and ApplicationLanguage Understanding: Multilingual Dialog State Tracking

The Neural Belief Tracker is a novel DST model/framework whichaims to satisfy the following design goals:

1 End-to-end learnable (no SLU modules or semantic dictionaries).

2 Generalisation to unseen slot values.

3 Capability of leveraging the semantic content of pre-trained wordvector spaces without human supervision.

[Mrk²i¢ et al., ACL 2017; Mrk²i¢ and Vuli¢, ACL 2018]

62 / 70

Page 64: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Extrinsic Evaluation and ApplicationLanguage Understanding: Multilingual Dialog State Tracking

Representation Learning + Label Embedding +Separate Binary Decisions

To overcome data sparsity, NBT models use label embedding todecompose multi-class classication into many binary ones.

Page 65: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Extrinsic Evaluation and ApplicationLanguage Understanding: Multilingual Dialog State Tracking

DST in Italian and German

Multilingual WOZ 2.0 (Mrk²i¢ et al., 2017); Italianand GermanThe 1,200 dialogues in WOZ 2.0 were translated by native Italian and Germanspeakers instructed to consider preceding dialogue context.

Word Vector Space EN IT DE

Best Baseline Vector Space 81.6 71.8 50.5Monolingual Distributional Vectors 77.6 71.2 46.6+ Monolingual Specialisation 80.9 72.7 52.4++ Cross-Lingual Specialisation 80.3 75.3 55.7+++ Multilingual DST Model 82.8 77.1 57.7

Page 66: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Extrinsic Evaluation and ApplicationLanguage Understanding: Multilingual Dialog State Tracking

DST Models for Resource-Poor Languages

Ontology Grounding: Multilingual DST Models

The domain ontology (i.e. the concepts it expresses) is language agnostic,which means that `labels' persist across languages. Using training data for two(or more) languages, and cross-lingual vectors of high quality, we train therst-ever multilingual DST model.

Page 67: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Take-Home Messages

1. Cross-Lingual Word Embedding Models are More Similar toOld(er) NLP Ideas than It Seems

But they are simple, ecient, and state-of-the-art...

2. CLWE Models are More Similar than It SeemsWe have addressed similarities between a plethora of cross-lingual wordembedding models

3. Data is Crucial (more than the Chosen Algorithm)

Optimization and various modeling tricks matter, but the chosen multilingualsignal is the most important factor

4. CL Word Embeddings Support Cross-Lingual NLP

They are useful across a wide variety of cross-lingual NLP tasks, forcross-lingual (re-lexicalized) transfer, language understanding, IR, ...

Page 68: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Take-Home Messages

1. Cross-Lingual Word Embedding Models are More Similar toOld(er) NLP Ideas than It Seems

But they are simple, ecient, and state-of-the-art...

2. CLWE Models are More Similar than It SeemsWe have addressed similarities between a plethora of cross-lingual wordembedding models

3. Data is Crucial (more than the Chosen Algorithm)

Optimization and various modeling tricks matter, but the chosen multilingualsignal is the most important factor

4. CL Word Embeddings Support Cross-Lingual NLP

They are useful across a wide variety of cross-lingual NLP tasks, forcross-lingual (re-lexicalized) transfer, language understanding, IR, ...

Page 69: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Take-Home Messages

1. Cross-Lingual Word Embedding Models are More Similar toOld(er) NLP Ideas than It Seems

But they are simple, ecient, and state-of-the-art...

2. CLWE Models are More Similar than It SeemsWe have addressed similarities between a plethora of cross-lingual wordembedding models

3. Data is Crucial (more than the Chosen Algorithm)

Optimization and various modeling tricks matter, but the chosen multilingualsignal is the most important factor

4. CL Word Embeddings Support Cross-Lingual NLP

They are useful across a wide variety of cross-lingual NLP tasks, forcross-lingual (re-lexicalized) transfer, language understanding, IR, ...

Page 70: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Embeddings are Everywhere (Cross-lingual)

Intrinsic (semantic) tasks:

Cross-lingual and multilingual semantic similarity

Bilingual lexicon learning, multi-modal representations

Extrinsic (downstream) tasks:

Cross-lingual SRL, POS tagging, NERCross-lingual dependency parsing and sentiment analysisCross-lingual annotation and model transfer

Statistical/Neural machine translation

Beyond NLP:

(Cross-Lingual) Information retrieval

(Cross-Lingual) Question answering69 / 70

Page 71: Multilingual NLP via Cross-Lingual Word Embeddings...Bilingual lexicon learning, multi-modal representations Extrinsic (downstream) tasks: Cross-lingual SRL, POS tagging, NER Cross-lingual

Cross-Lingual Space of Thankyous

This talk was based on a recent tutorial:

http://people.ds.cam.ac.uk/iv250/tutorial/xlingrep-tutorial.pdf

Book under review: Morgan & Claypool Synthesis Lectures

with Anders Søgaard, Sebastian Ruder, Manaal Faruqui70 / 70