Dependency-Based Word Embeddings

Omer Levy and Yoav Goldberg, Bar-Ilan University, Israel


Neural Embeddings
• Dense vectors
• Each dimension is a latent feature
• word2vec (Mikolov et al., 2013)
• State-of-the-Art: Skip-Gram with Negative Sampling
• “Linguistic Regularities”

king − man + woman ≈ queen

Linguistic Regularities in Sparse and Explicit Word Representations (Friday, 2:00 PM, CoNLL 2014)
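The “linguistic regularities” refer to analogies recovered by vector arithmetic. A minimal sketch of the vector-offset analogy (not the paper's code), assuming `vectors` is a hypothetical dict mapping words to unit-normalized NumPy arrays:

```python
import numpy as np

def analogy(a, b, c, vectors, topn=1):
    """Return the words whose vectors are closest (by cosine) to vec(b) - vec(a) + vec(c)."""
    target = vectors[b] - vectors[a] + vectors[c]
    target = target / np.linalg.norm(target)
    # Rank all words by dot product (cosine, since vectors are unit-normalized),
    # excluding the three query words themselves.
    scores = {w: float(np.dot(v, target))
              for w, v in vectors.items() if w not in (a, b, c)}
    return sorted(scores, key=scores.get, reverse=True)[:topn]

# analogy("man", "king", "woman", vectors)  ->  ["queen"], ideally
```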

Our Main Contribution:

Generalizing Skip-Gram with Negative Sampling

Skip-Gram with Negative Sampling v2.0
• Original implementation assumes bag-of-words contexts
• We generalize to arbitrary contexts

• Dependency contexts create qualitatively different word embeddings

• Provide a new tool for linguistically analyzing embeddings

Context Types

Example
Australian scientist discovers star with telescope

Target Word
discovers

Bag of Words (BoW) Context
The words in a fixed window around the target, e.g. for k = 2: Australian, scientist, star, with

Syntactic Dependency Context
The target's neighbors in the dependency parse, labeled with the relation:
scientist/nsubj, star/dobj, telescope/prep_with
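A concrete sketch contrasting the two context types on this sentence. This is not the authors' code: the parse is hard-coded rather than produced by a parser, and the helper names are made up.

```python
# Example sentence and a hand-written dependency parse
# (prepositions collapsed, as in Stanford Dependencies: prep_with).
sentence = ["Australian", "scientist", "discovers", "star", "with", "telescope"]

# (head_index, relation, dependent_index)
parse = [(2, "nsubj", 1), (2, "dobj", 3), (2, "prep_with", 5), (1, "amod", 0)]

def bow_contexts(words, i, k):
    """Words within a window of k positions around index i."""
    return [w for j, w in enumerate(words) if j != i and abs(j - i) <= k]

def dep_contexts(words, i, parse):
    """Parse-tree neighbors of index i, labeled with the relation.
    The -1 suffix marks the inverse arc, from dependent back to head."""
    contexts = []
    for head, rel, dep in parse:
        if head == i:
            contexts.append(f"{words[dep]}/{rel}")
        elif dep == i:
            contexts.append(f"{words[head]}/{rel}-1")
    return contexts

print(bow_contexts(sentence, 2, k=2))    # ['Australian', 'scientist', 'star', 'with']
print(dep_contexts(sentence, 2, parse))  # ['scientist/nsubj', 'star/dobj', 'telescope/prep_with']
```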

Generalizing Skip-Gram with Negative Sampling

How does Skip-Gram work?
• Skip-gram represents each word as a vector

• Skip-gram represents each context word as a different vector

• Same word has 2 different embeddings (as “word”, as “context”)

How does Skip-Gram work?
Text → Bag-of-Words Contexts → Word-Context Pairs → Learning
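For reference, a minimal sketch of the SGNS learning step on one (word, context) pair, in plain NumPy SGD rather than the actual word2vec C implementation; sizes and names are illustrative. Note the two matrices: the same word has one row in W (as “word”) and another in C (as “context”).

```python
import numpy as np

rng = np.random.default_rng(0)
V, dim, k, lr = 10_000, 100, 5, 0.025      # vocab size, dimensions, negatives, step size
W = (rng.random((V, dim)) - 0.5) / dim     # word embeddings (the "word" role)
C = np.zeros((V, dim))                     # context embeddings (the "context" role)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pair(word, context):
    """One SGD step: push the true context up, k sampled negatives down."""
    # Negatives drawn uniformly here; word2vec samples from unigram^0.75.
    samples = [(context, 1.0)] + [(int(n), 0.0) for n in rng.integers(0, V, size=k)]
    w = W[word].copy()
    w_grad = np.zeros(dim)
    for c, label in samples:
        g = lr * (label - sigmoid(np.dot(w, C[c])))  # gradient of the logistic loss
        w_grad += g * C[c]
        C[c] += g * w
    W[word] += w_grad
```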

Our Modification
Text → Arbitrary Contexts → Word-Context Pairs → Learning

Modified word2vec publicly available!

Our Modification: Example
Text (Wikipedia) → Syntactic Contexts (Stanford Dependencies) → Word-Context Pairs → Learning
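A sketch of the “text → syntactic contexts → word-context pairs” step. Here spaCy stands in for the Stanford parser used in the talk (so prepositions are not collapsed into prep_with), and the tab-separated “word context” line format for the generalized trainer (the authors released word2vecf) is an assumption.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def dependency_pairs(text):
    """Yield (word, context) pairs; each dependency arc contributes both directions."""
    for token in nlp(text):
        if token.dep_ == "ROOT":
            continue
        head = token.head
        yield head.text, f"{token.text}/{token.dep_}"    # head sees its dependent
        yield token.text, f"{head.text}/{token.dep_}-1"  # dependent sees its head (inverse)

# One "word<TAB>context" pair per line (format assumed, not confirmed).
with open("pairs.txt", "w") as out:
    for word, ctx in dependency_pairs("Australian scientist discovers star with telescope"):
        out.write(f"{word}\t{ctx}\n")
```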

What is the effect of different context types?

• Thoroughly studied in explicit representations (distributional)
• Lin (1998), Padó and Lapata (2007), and many others…

General Conclusion:
• Bag-of-words contexts induce topical similarities
• Dependency contexts induce functional similarities: words that share the same semantic type (cohyponyms)

• Does this hold for embeddings as well?

Embedding Similarity with Different Contexts

Target Word: Hogwarts (Harry Potter's school)

Bag of Words (k=5)         Dependencies
Dumbledore                 Sunnydale
hallows                    Collinwood
half-blood                 Calarts
Malfoy                     Greendale
Snape                      Millfield
(related to Harry Potter)  (schools)

Embedding Similarity with Different Contexts

Target Word: Turing (computer scientist)

Bag of Words (k=5)          Dependencies
nondeterministic            Pauling
non-deterministic           Hotelling
computability               Heting
deterministic               Lessing
finite-state                Hamming
(related to computability)  (scientists)

Online Demo!

Embedding Similarity with Different Contexts

Target Word: dancing (dance gerund)

Bag of Words (k=5)   Dependencies
singing              singing
dance                rapping
dances               breakdancing
dancers              miming
tap-dancing          busking
(related to dance)   (gerunds)

Embedding Similarity with Different Contexts
• Dependency-based embeddings have more functional similarities

• This phenomenon goes beyond these examples

• Quantitative Analysis (in the paper)


Quantitative Analysis

[Figure: precision-recall curves comparing Dependencies, BoW (k=2), and BoW (k=5)]

Why do dependencies induce functional similarities?

Dependency Contexts & Functional Similarity• Thoroughly studied in explicit representations (distributional)• Lin (1998), Padó and Lapata (2007), and many others…

• In explicit representations, we can look at the features and analyze

• But embeddings are a black box!
• Dimensions are latent and don’t necessarily have any meaning

Analyzing Embeddings

Peeking into Skip-Gram’s Black Box
• Skip-Gram allows a peek…

• Contexts are embedded in the same space!

• Given a word w, find the contexts it “activates” most: the contexts c with the highest dot product w · c
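A minimal sketch of this peek, assuming `W` and `C` are the trained word and context embedding matrices and `word2id` / `context_vocab` are hypothetical vocabulary lookups:

```python
import numpy as np

def top_contexts(word, W, C, word2id, context_vocab, topn=5):
    """Return the topn contexts c with the highest activation w . c for the given word."""
    w = W[word2id[word]]
    scores = C @ w                       # dot product with every context vector
    best = np.argsort(-scores)[:topn]    # indices of the highest-scoring contexts
    return [(context_vocab[i], float(scores[i])) for i in best]

# top_contexts("Hogwarts", W, C, word2id, context_vocab)
# -> students/prep_at-1, educated/prep_at-1, ... (as in the tables below)
```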

Associated Contexts

Target Word   Dependencies
Hogwarts      students/prep_at-1
              educated/prep_at-1
              student/prep_at-1
              stay/prep_at-1
              learned/prep_at-1

Associated Contexts

Target Word   Dependencies
Turing        machine/nn-1
              test/nn-1
              theorem/poss-1
              machines/nn-1
              tests/nn-1

Associated Contexts

Target Word   Dependencies
dancing       dancing/conj
              dancing/conj-1
              singing/conj-1
              singing/conj
              ballroom/nn

Analyzing Embeddings
• We found a way to linguistically analyze embeddings

• Together with the ability to engineer contexts…

• …we now have the tools to create task-tailored embeddings!

Conclusion

• Generalized Skip-Gram with Negative Sampling to arbitrary contexts

• Different contexts induce different similarities

• Suggest a way to peek inside the black box of embeddings

• Code, demo, and word vectors available from our websites

• Make linguistically-motivated task-tailored embeddings today!

Thank you for listening :)