Dependency-Based Word Embeddings
Transcript of Dependency-Based Word Embeddings
Neural Embeddings
• Dense vectors
• Each dimension is a latent feature
• word2vec (Mikolov et al., 2013)
• State-of-the-Art: Skip-Gram with Negative Sampling
• “Linguistic Regularities”
king - man + woman ≈ queen
Linguistic Regularities in Sparse and Explicit Word Representations
Friday, 2:00 PM, CoNLL 2014
Skip-Gram with Negative Sampling v2.0
• Original implementation assumes bag-of-words contexts
• We generalize to arbitrary contexts
• Dependency contexts create qualitatively different word embeddings
• Provide a new tool for linguistically analyzing embeddings
How does Skip-Gram work?
• Skip-gram represents each word as a vector
• Skip-gram represents each context word as a different vector
• Same word has 2 different embeddings (as “word”, as “context”)
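The two-embeddings point can be made concrete with a minimal sketch of the negative-sampling loss for a single (word, context) pair. This is an illustration under assumptions, not the word2vec source: the names `W`, `C`, and `sgns_loss`, the matrix sizes, and the random initialization are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(W, C, word, context, negatives):
    """Negative-sampling loss for one observed (word, context) pair.

    W: word-embedding matrix, one row per word.
    C: context-embedding matrix, one row per context (a separate table).
    negatives: indices of randomly sampled "noise" contexts.
    """
    v_w = W[word]
    # Reward a high score for the observed pair...
    pos = np.log(sigmoid(v_w @ C[context]))
    # ...and a low score for each sampled negative context.
    neg = np.sum(np.log(sigmoid(-(C[negatives] @ v_w))))
    return -(pos + neg)

# Toy setup (assumed sizes): 5 words, 7 contexts, 4 dimensions.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(5, 4))
C = rng.normal(scale=0.1, size=(7, 4))
loss = sgns_loss(W, C, word=2, context=3, negatives=np.array([0, 5]))
```

Because `W` and `C` are separate tables, nothing requires the context vocabulary to equal the word vocabulary.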
Our Modification
Text
Arbitrary Contexts
Word-Context Pairs
Learning
Modified word2vec publicly available!
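As a baseline for the pipeline above, here is a minimal sketch of how bag-of-words contexts turn text into word-context pairs. The function name `bow_pairs` and the example sentence are illustrative assumptions; the window size `k` plays the role of the k in "BoW (k=5)".

```python
def bow_pairs(tokens, k):
    """Pair every token with each neighbor at most k positions away."""
    pairs = []
    for i, word in enumerate(tokens):
        lo = max(0, i - k)
        hi = min(len(tokens), i + k + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((word, tokens[j]))
    return pairs

# Illustrative sentence (an assumption, not taken from the slides).
sentence = "australian scientist discovers star with telescope".split()
pairs = bow_pairs(sentence, k=2)
```

Only this pair-extraction step needs to change to move from bag-of-words to other context types; the learning step consumes the pairs either way.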
Our Modification: Example
Text (Wikipedia)
Syntactic Contexts (Stanford Dependencies)
Word-Context Pairs
Learning
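A minimal sketch of the syntactic variant, assuming the sentence has already been parsed and prepositions have been collapsed into the arc label (as in labels like prep_at). The `Arc` type, the function name, and the hand-written parse are hypothetical; the "-1" suffix marks the inverse relation, matching the notation used for contexts later in the talk.

```python
from collections import namedtuple

# One dependency arc: head --label--> modifier.
Arc = namedtuple("Arc", ["head", "label", "modifier"])

def dependency_pairs(arcs):
    """Turn dependency arcs into (word, context) pairs."""
    pairs = []
    for arc in arcs:
        # The head sees its modifier through the arc label.
        pairs.append((arc.head, f"{arc.modifier}/{arc.label}"))
        # The modifier sees its head through the inverse relation.
        pairs.append((arc.modifier, f"{arc.head}/{arc.label}-1"))
    return pairs

# Hand-written parse of an illustrative sentence (an assumption):
# "australian scientist discovers star with telescope"
arcs = [
    Arc("scientist", "amod", "australian"),
    Arc("discovers", "nsubj", "scientist"),
    Arc("discovers", "dobj", "star"),
    Arc("discovers", "prep_with", "telescope"),
]
dep_pairs = dependency_pairs(arcs)
```

Here "telescope" becomes a context of "discovers" even though the two words are not adjacent, while an adjacent but syntactically unrelated word would not.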
What is the effect of different context types?
• Thoroughly studied in explicit representations (distributional)
• Lin (1998), Padó and Lapata (2007), and many others…
General Conclusion:
• Bag-of-words contexts induce topical similarities
• Dependency contexts induce functional similarities
• Share the same semantic type
• Cohyponyms
• Does this hold for embeddings as well?
Embedding Similarity with Different Contexts
Target Word: Hogwarts (Harry Potter’s school)
Bag of Words (k=5): Dumbledore, hallows, half-blood, Malfoy, Snape (related to Harry Potter)
Dependencies: Sunnydale, Collinwood, Calarts, Greendale, Millfield (schools)
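Neighbor lists like the one above are typically produced by ranking the vocabulary by similarity to the target word. The sketch below assumes cosine similarity; the function name, the placeholder vocabulary, and the two-dimensional vectors are made up for illustration and are not trained embeddings.

```python
import numpy as np

def most_similar(target, vocab, W, topn=5):
    """Rank all other words by cosine similarity to the target word."""
    # Normalize rows so that a dot product equals cosine similarity.
    unit = W / np.linalg.norm(W, axis=1, keepdims=True)
    sims = unit @ unit[vocab.index(target)]
    order = np.argsort(-sims)  # highest similarity first
    ranked = [(vocab[i], float(sims[i])) for i in order if vocab[i] != target]
    return ranked[:topn]

# Placeholder words and toy vectors (assumptions for the example).
vocab = ["target", "near", "mid", "far"]
W = np.array([[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.0, 1.0]])
neighbors = most_similar("target", vocab, W, topn=2)
```

Running the same function over embeddings trained with different context types is what makes the side-by-side columns comparable.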
Embedding Similarity with Different Contexts
Target Word: Turing (computer scientist)
Bag of Words (k=5): nondeterministic, non-deterministic, computability, deterministic, finite-state (related to computability)
Dependencies: Pauling, Hotelling, Heting, Lessing, Hamming (scientists)
Online Demo!
Embedding Similarity with Different Contexts
Target Word: dancing (dance gerund)
Bag of Words (k=5): singing, dance, dances, dancers, tap-dancing (related to dance)
Dependencies: singing, rapping, breakdancing, miming, busking (gerunds)
Embedding Similarity with Different Contexts
• Dependency-based embeddings have more functional similarities
• This phenomenon goes beyond these examples
• Quantitative Analysis (in the paper)
Dependency-based embeddings have more functional similarities
Quantitative Analysis
[Figure: precision-recall curves, with Recall (0 to 1) on the x-axis and Precision (0.3 to 1) on the y-axis, for three embeddings: Dependencies, BoW (k=2), BoW (k=5)]
Dependency Contexts & Functional Similarity
• Thoroughly studied in explicit representations (distributional)
• Lin (1998), Padó and Lapata (2007), and many others…
• In explicit representations, we can look at the features and analyze
• But embeddings are a black box!
• Dimensions are latent and don’t necessarily have any meaning
Peeking into Skip-Gram’s Black Box
• Skip-Gram allows a peek…
• Contexts are embedded in the same space!
• Given a word w, find the contexts it “activates” most:
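The ranking formula did not survive in this transcript. A minimal sketch, assuming a context's "activation" is the dot product between the word's vector and the context's vector (the quantity Skip-Gram scores), could look like this. The names, placeholder vocabularies, and toy numbers are hypothetical.

```python
import numpy as np

def top_contexts(word, words, contexts, W, C, topn=5):
    """Return the contexts whose vectors score highest against a word vector."""
    # One dot product per context, all in the shared embedding space.
    scores = C @ W[words.index(word)]
    order = np.argsort(-scores)[:topn]  # highest score first
    return [(contexts[i], float(scores[i])) for i in order]

# Placeholder vocabularies and toy vectors (assumptions for the example).
words = ["w0", "w1"]
contexts = ["c0", "c1", "c2"]
W = np.array([[1.0, 0.0], [0.0, 1.0]])
C = np.array([[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]])
best = top_contexts("w0", words, contexts, W, C, topn=2)
```

With dependency contexts, the returned strings are readable features such as a word plus a relation label, which is what makes the inspection linguistically meaningful.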
Associated Contexts
Target Word: Hogwarts
Dependencies: students/prep_at-1, educated/prep_at-1, student/prep_at-1, stay/prep_at-1, learned/prep_at-1
Associated Contexts
Target Word: Turing
Dependencies: machine/nn-1, test/nn-1, theorem/poss-1, machines/nn-1, tests/nn-1
Associated Contexts
Target Word: dancing
Dependencies: dancing/conj, dancing/conj-1, singing/conj-1, singing/conj, ballroom/nn
Analyzing Embeddings
• We found a way to linguistically analyze embeddings
• Together with the ability to engineer contexts…
• …we now have the tools to create task-tailored embeddings!
Conclusion
• Generalized Skip-Gram with Negative Sampling to arbitrary contexts
• Different contexts induce different similarities
• Suggest a way to peek inside the black box of embeddings
• Code, demo, and word vectors available from our websites
• Make linguistically-motivated task-tailored embeddings today!
Thank you for listening :)