Transcript of "Linguistic Regularities in Sparse and Explicit Word Representations" (CoNLL 2014)
Linguistic Regularities in Sparse and Explicit Word Representations
Omer Levy, Yoav Goldberg
Bar-Ilan University
Israel
Neural Embeddings
• Dense vectors
• Each dimension is a latent feature
• Common software package: word2vec
  Italy: (−7.35, 9.42, 0.88, …) ∈ ℝ^100
• "Magic":
  king − man + woman = queen
  (analogies)
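As a quick illustration of that "magic", here is a minimal sketch using gensim (`vectors.bin` is a placeholder path, not a file from the talk; any word2vec-format file works):

```python
from gensim.models import KeyedVectors

# Load pretrained word2vec-format vectors ("vectors.bin" is a placeholder path).
vecs = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# king - man + woman: gensim ranks all words by cosine similarity to the
# combined vector, excluding the three query words themselves.
print(vecs.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# e.g. [('queen', 0.71)] -- the exact score depends on the vectors used
```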
Explicit Representations (Distributional)
• Sparse vectors
• Each dimension is an explicit context
• Common association metric: PMI, PPMI
  Italy: (pizza: 17, pasta: 5, Fiat: 2, …) ∈ ℕ^|vocab| (|vocab| ≈ 100,000)
• Does the same "magic" work for explicit representations too?
• Baroni et al. (2014) showed that embeddings outperform explicit representations, but…
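For concreteness, a minimal sketch of how such a representation can be built: a toy PPMI word-context matrix from raw co-occurrence counts (the corpus, window size, and helper names here are illustrative, not from the talk):

```python
import numpy as np
from collections import Counter

def ppmi_matrix(corpus, window=2):
    """Build a PPMI word-context matrix from a list of tokenized sentences."""
    pair_counts = Counter()
    for sent in corpus:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if i != j:
                    pair_counts[(w, sent[j])] += 1
    words = sorted({w for w, _ in pair_counts})
    contexts = sorted({c for _, c in pair_counts})
    w_idx = {w: k for k, w in enumerate(words)}
    c_idx = {c: k for k, c in enumerate(contexts)}
    counts = np.zeros((len(words), len(contexts)))
    for (w, c), n in pair_counts.items():
        counts[w_idx[w], c_idx[c]] = n
    total = counts.sum()
    p_wc = counts / total                               # joint probabilities
    p_w = counts.sum(axis=1, keepdims=True) / total     # word marginals
    p_c = counts.sum(axis=0, keepdims=True) / total     # context marginals
    with np.errstate(divide="ignore"):                  # log(0) -> -inf is fine
        pmi = np.log(p_wc / (p_w * p_c))
    return np.maximum(pmi, 0), w_idx, c_idx             # PPMI clips negatives to 0

# Toy usage: each row of M is one word's sparse explicit vector.
corpus = [["italy", "likes", "pizza"], ["italy", "makes", "fiat"]]
M, w_idx, c_idx = ppmi_matrix(corpus)
```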
Questions
• Are analogies unique to neural embeddings?
  Compare neural embeddings with explicit representations
• Why does vector arithmetic reveal analogies?
  Unravel the mystery behind neural embeddings and their "magic"
Mikolov et al. (2013a,b,c)
• Neural embeddings have interesting geometries
• These patterns capture "relational similarities"
• Can be used to solve analogies:
  a is to a* as b is to b*
  man is to woman as king is to queen
• Can be recovered by simple vector arithmetic:
  a − a* = b − b*
  best − good + strong = strongest
[Diagram: the vectors a, a*, b, b* in ℝ^d, with the offset a − a* parallel to b − b*]
Are analogies unique to neural embeddings?
• Experiment: compare embeddings to explicit representations
• Learn different representations from the same corpus
• Evaluate with the same recovery method (sketched below):
  argmax_{b*} cos(b*, a* − a + b)
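A minimal sketch of this recovery method (called 3CosAdd in the paper). Assumptions: `W` is a matrix of L2-normalized word vectors and `w_idx` maps words to rows; both names are mine, not the paper's:

```python
import numpy as np

def recover_analogy(W, w_idx, a, a_star, b):
    """Return the word b* maximizing cos(b*, a* - a + b).

    W: (vocab_size, dim) matrix whose rows are L2-normalized word vectors;
    w_idx: dict mapping each word to its row index in W.
    """
    target = W[w_idx[a_star]] - W[w_idx[a]] + W[w_idx[b]]
    # Proportional to cosine: rows of W are unit-length, and ||target|| is
    # the same for every candidate, so it does not affect the argmax.
    scores = W @ target
    for q in (a, a_star, b):            # exclude the question words
        scores[w_idx[q]] = -np.inf
    idx_w = {i: w for w, i in w_idx.items()}
    return idx_w[int(np.argmax(scores))]
```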
Analogy Datasets
• 4 words per analogy: a is to a* as b is to b*
• Given 3 words: a is to a* as b is to ?
• Guess the best-suited b* from the entire vocabulary V, excluding the question words a, a*, b
• MSR: ~8,000 syntactic analogies
• Google: ~19,000 syntactic and semantic analogies
Embedding vs Explicit (Round 1)

[Bar chart: accuracy]
               MSR    Google
  Embedding    54%    63%
  Explicit     29%    45%

Many analogies are recovered by the explicit representation, but many more by the embedding.
Why does vector arithmetic reveal analogies?
• We wish to find the closest b* to a* − a + b
• This is done with cosine similarity:

  argmax_{b*∈V} cos(b*, a* − a + b) = argmax_{b*∈V} [cos(b*, a*) − cos(b*, a) + cos(b*, b)]

  (For unit-length vectors the two sides agree: expanding the dot product gives
  cos(b*, a* − a + b) = [cos(b*, a*) − cos(b*, a) + cos(b*, b)] / ‖a* − a + b‖,
  and the denominator is the same for every candidate b*.)

  vector arithmetic = similarity arithmetic

• Instantiated: we wish to find the closest x to king − man + woman:

  argmax_x cos(x, king − man + woman) = argmax_x [cos(x, king) − cos(x, man) + cos(x, woman)]
                                                      royal?                        female?
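A quick numeric sanity check of this equivalence on random unit vectors (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(1000, 50))
W /= np.linalg.norm(W, axis=1, keepdims=True)   # unit-length "word" vectors
a, a_star, b = W[0], W[1], W[2]

# Vector arithmetic: cosine of every word to the single point a* - a + b.
t = a_star - a + b
vec_scores = (W @ t) / np.linalg.norm(t)

# Similarity arithmetic: cos(x, a*) - cos(x, a) + cos(x, b) for every word x.
sim_scores = W @ a_star - W @ a + W @ b

# The rankings agree: the two scores differ only by the constant factor ||t||.
assert np.argmax(vec_scores) == np.argmax(sim_scores)
```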
What does each similarity term mean?
• Observe the joint features with explicit representations! (an illustrative sketch follows below)

  queen ∩ king      queen ∩ woman
  uncrowned         Elizabeth
  majesty           Katherine
  second            impregnate
  …                 …
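One simple way to surface such joint features (an illustrative sketch, not necessarily the paper's exact procedure; `M` and `w_idx` are as in the PPMI sketch above, and `idx_c` is an assumed column-index-to-context mapping):

```python
import numpy as np

def joint_features(M, w_idx, idx_c, w1, w2, k=5):
    """Contexts that score high for BOTH words in a PPMI matrix."""
    overlap = M[w_idx[w1]] * M[w_idx[w2]]   # large only where both vectors are large
    top = np.argsort(overlap)[::-1][:k]
    return [idx_c[i] for i in top if overlap[i] > 0]

# e.g. joint_features(M, w_idx, idx_c, "queen", "king") would surface contexts
# like "uncrowned" or "majesty" on a large corpus.
```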
The Additive Objective
• Example: London is to England as Baghdad is to ? (the gold answer is Iraq)

  cos(Iraq,  England) − cos(Iraq,  London) + cos(Iraq,  Baghdad) = 0.15 − 0.13 + 0.63 = 0.65
  cos(Mosul, England) − cos(Mosul, London) + cos(Mosul, Baghdad) = 0.13 − 0.14 + 0.75 = 0.74

  Mosul outranks Iraq because its single high similarity to Baghdad dominates the sum.
• Problem: one similarity might dominate the rest
• This is much more prevalent in explicit representations
• Might explain why the explicit representation underperformed
How can we do better?
• Instead of adding similarities, multiply them!

  argmax_{b*} [cos(b*, a*) · cos(b*, b)] / cos(b*, a)

  (The paper adds a small ε to the denominator to prevent division by zero; a code sketch follows below.)
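A minimal sketch of this multiplicative objective (3CosMul in the paper), reusing the assumed `W`/`w_idx` layout from the 3CosAdd sketch; following the paper, cosines are shifted to [0, 1] before multiplying, and ε guards the denominator:

```python
import numpy as np

EPS = 0.001  # the paper's epsilon, guarding against division by zero

def recover_analogy_mul(W, w_idx, a, a_star, b):
    """Return the word b* maximizing cos(b*,a*) * cos(b*,b) / (cos(b*,a) + EPS)."""
    # Shifted cosine of one word to every word: rows of W are unit-normalized,
    # and the shift from [-1, 1] to [0, 1] keeps every factor non-negative.
    sim = lambda w: (W @ W[w_idx[w]] + 1) / 2
    scores = sim(a_star) * sim(b) / (sim(a) + EPS)
    for q in (a, a_star, b):            # exclude the question words
        scores[w_idx[q]] = -np.inf
    idx_w = {i: w for w, i in w_idx.items()}
    return idx_w[int(np.argmax(scores))]
```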
Multiplication > Addition

[Bar chart: accuracy]
          Embedding           Explicit
          MSR    Google       MSR    Google
  Add     54%    63%          29%    45%
  Mul     59%    67%          57%    68%
Explicit is on-par with Embedding

[Bar chart: accuracy, multiplicative objective]
               MSR    Google
  Embedding    59%    67%
  Explicit     57%    68%
Explicit is on-par with Embedding
• Embeddings are not "magical"
• Embedding-based similarities have a more uniform distribution
• The additive objective performs better on smoother distributions
• The multiplicative objective overcomes this issue
Conclusion
• Are analogies unique to neural embeddings?
  No! They occur in sparse and explicit representations as well.
• Why does vector arithmetic reveal analogies?
  Because vector arithmetic is equivalent to similarity arithmetic.
• Can we do better?
  Yes! The multiplicative objective is significantly better.
More Results and Analyses (in the paper)
• Evaluation on closed-vocabulary analogy questions (SemEval 2012)
• Experiments with a third objective function (PairDirection)
• Do different representations reveal the same analogies?
• Error analysis
• A feature-level interpretation of how word similarity reveals analogies