
Word Embeddings - Deep Learning for NLP

Kevin Patel

ICON 2017

December 21, 2017


Outline

1. Introduction

2. Word Embeddings
   - Count Based Embeddings
   - Prediction Based Embeddings
   - Multilingual Word Embeddings
   - Interpretable Word Embeddings

3. Evaluating Word Embeddings
   - Intrinsic Evaluation
   - Extrinsic Evaluation
   - Evaluation Frameworks
   - Visualizing Word Embeddings


4. Discussion on Lower Bounds

5. Applications of Word Embeddings
   - Are Word Embeddings Useful for Sarcasm Detection?
   - Iterative Unsupervised Most Frequent Sense Detection using Word Embeddings

6. Conclusion


Layman(ish) Intro to ML

In simple terms, Machine Learning consists of:
- Representing data in some numeric form
- Learning some function on that representation


How to place words to learn, say, Binary Sentiment Classification?

Good: Positive
Awesome: Positive
Bad: Negative


Representations for Learning Algorithms
Detect whether the following image is a dog or not?

Basic idea: feed raw pixels as input vector. Works well:

Inherent structure in the image

Detect whether a word is a dog or not? Example: Labrador
- Nothing in the spelling of labrador connects it to dog
- We need a representation of labrador which indicates that it is a dog


Local Representations

- Information about a particular item is located solely in the corresponding representational element (dimension)
- Effectively, one unit is turned on in a network, all the others are off
- No sharing between represented data
- Each feature is independent
- No generalization on the basis of similarity between features


Distributed Representations

- Information about a particular item is distributed among a set of (not necessarily) mutually exclusive representational elements (dimensions)
  - One item spread over multiple dimensions
  - One dimension contributing to multiple items
- A new input is processed similarly to the training samples it resembles
- Better generalization


Distributed Representations: Example

Number   Local Representation   Distributed Representation
0        1 0 0 0 0 0 0 0        0 0 0
1        0 1 0 0 0 0 0 0        0 0 1
2        0 0 1 0 0 0 0 0        0 1 0
3        0 0 0 1 0 0 0 0        0 1 1
4        0 0 0 0 1 0 0 0        1 0 0
5        0 0 0 0 0 1 0 0        1 0 1
6        0 0 0 0 0 0 1 0        1 1 0
7        0 0 0 0 0 0 0 1        1 1 1
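As a small illustration of the table above, a minimal Python sketch (my own, not part of the talk) that generates both encodings, assuming only NumPy:

import numpy as np

def local_representation(n, size=8):
    # one item, one dimension: a single unit is "on", all others are off
    v = np.zeros(size, dtype=int)
    v[n] = 1
    return v

def distributed_representation(n, bits=3):
    # binary code: each item is spread over several dimensions,
    # and each dimension participates in several items
    return np.array([(n >> i) & 1 for i in reversed(range(bits))])

for n in range(8):
    print(n, local_representation(n), distributed_representation(n))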


Word Embeddings: Intuition
- Word Embeddings: distributed vector representations of words such that similarity among the vectors correlates with semantic similarity among the corresponding words
- Given that sim(dog, cat) is more than sim(dog, furniture), cos(vec(dog), vec(cat)) is greater than cos(vec(dog), vec(furniture))
- Such similarity information is uncovered from context
- Consider the following sentences:
  - I like sweet food.
  - You like spicy food.
  - They like xyzabc food.
- What is xyzabc? The meaning of a word can be inferred from its neighbors (context) and from words that share those neighbors
  - Neighbors of xyzabc: like, food
  - Words that share neighbors with xyzabc: sweet, spicy


Modelling Meaning via Word Embeddings

Geometric metaphor of meaning (Sahlgren, 2006):
- Meanings are locations in semantic space, and semantic similarity is proximity between the locations.

Distributional Hypothesis (Harris, 1970):
- Words with similar distributional properties have similar meanings
- Only differences in meaning can be modelled


Entire Vector vs. Individual dimensions

- Only proximity in the entire space is represented
- No phenomenological correlations with individual dimensions of the high-dimensional space (in the majority of algorithms)
- Those models that do have some such correlations are known as interpretable models


Modelling Meaning via Word Embeddings

Co-occurrence matrix (Rubenstein and Goodenough, 1965)
- A mechanism to capture distributional properties
- Rows of the co-occurrence matrix can be directly considered as word vectors

Neural Word Embeddings
- Vector representations learnt using neural networks: Bengio et al. (2003); Collobert and Weston (2008a); Mikolov et al. (2013b)


Co-occurrence Matrix

- Originally proposed by Schütze (1992)
- Foundation of the count based approaches that follow
- Automatic derivation of vectors
- Collect co-occurrence counts in a matrix
- Rows or columns are the vectors of the corresponding word
- If counting in both directions, the matrix is symmetric
- If counting in one direction only, the matrix is asymmetric, and is known as directional co-occurrence


Co-occurrence Matrix (contd.)
Corpus: <> I like cats <> I love dogs <> I hate rats <> I rate bats <>

Co-occurrence Matrix

word   <>  I   like  love  hate  rate  rats  cats  dogs  bats
<>      0  4   0     0     0     0     1     1     1     1
I       4  0   1     1     1     1     0     0     0     0
like    0  1   0     0     0     0     0     1     0     0
love    0  1   0     0     0     0     0     0     1     0
hate    0  1   0     0     0     0     1     0     0     0
rate    0  1   0     0     0     0     0     0     0     1
rats    1  0   0     0     1     0     0     0     0     0
cats    1  0   1     0     0     0     0     0     0     0
dogs    1  0   0     1     0     0     0     0     0     0
bats    1  0   0     0     0     1     0     0     0     0
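A minimal sketch (my own reconstruction, not the speaker's code) of how this symmetric co-occurrence matrix can be computed with a window of 1:

import numpy as np

corpus = "<> I like cats <> I love dogs <> I hate rats <> I rate bats <>".split()
vocab = sorted(set(corpus))
index = {w: i for i, w in enumerate(vocab)}

# count neighbours one position to the left and one to the right (symmetric counting)
M = np.zeros((len(vocab), len(vocab)), dtype=int)
for pos, word in enumerate(corpus):
    for ctx in (pos - 1, pos + 1):
        if 0 <= ctx < len(corpus):
            M[index[word], index[corpus[ctx]]] += 1

print(vocab)
print(M)   # rows (or columns) of M are the word vectors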


Word Embeddings: Count Based Embeddings


LSA

- Latent Semantic Analysis
- Originally developed as Latent Semantic Indexing (LSI) (Dumais et al., 1988)
- Adapted for word-space models
- Developed to tackle the inability of plain co-occurrence matrix models to handle synonymy
  - A query about hotels cannot retrieve results about motels
- Words and Documents dimensions → Latent dimensions
- Uses Singular Value Decomposition (SVD) for dimensionality reduction


LSA (contd.)

- Words-by-documents matrix
- Entropy based weighting of co-occurrences:

  f_ij = (log(TF_ij) + 1) × (1 − ∑_j (p_ij log p_ij / log D))   (1)

  where D is the number of documents, TF_ij is the frequency of term i in document j, f_i is the frequency of term i in the document collection, and p_ij = TF_ij / f_i

- Truncated SVD to reduce dimensionality
- Cosine measure to compute vector similarities
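A hedged sketch of this pipeline with scikit-learn; the TF-IDF weighting below merely stands in for the entropy weighting of Equation (1), which scikit-learn does not ship:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the hotel room was clean",
        "we booked a cheap motel room",
        "the dog chased the cat"]

X = TfidfVectorizer().fit_transform(docs)      # documents x terms matrix (weighted counts)
svd = TruncatedSVD(n_components=2)             # truncated SVD: keep 2 latent dimensions
doc_vecs = svd.fit_transform(X)                # documents in the latent space
term_vecs = svd.components_.T                  # terms in the latent space

print(cosine_similarity(doc_vecs))             # similarities via the cosine measure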


HAL

- Hyperspace Analogue to Language (Lund and Burgess, 1996a)
- Developed specifically for word representations
- Uses directional co-occurrence


HAL (contd.)

- Directional word by word matrix
- Distance weighting of the co-occurrences
- Concatenation of row and column vectors
- Dimensionality reduction optional
  - Discard low-variance dimensions
- Normalization of vectors to unit length
- Similarities computed through either Manhattan or Euclidean distance


Word Embeddings: Prediction Based Embeddings


NNLM

- Neural Network Language Model
- Proposed by Bengio et al. (2003)
- Predict word given context
- Word vectors learnt as a by-product of language modelling


NNLM: Original Model


NNLM: Simplified (1)


NNLM: Simplified (2)


Skip Gram

- Proposed by Mikolov et al. (2013b)
- Predict context given word


Skip Gram (contd.)

Given a sequence of training words w_1, w_2, ..., w_T, maximize

  (1/T) ∑_{t=1}^{T} ∑_{−c ≤ j ≤ c, j ≠ 0} log p(w_{t+j} | w_t)   (2)

where

  p(w_O | w_I) = exp(u_{w_O}^T v_{w_I}) / ∑_{w=1}^{W} exp(u_w^T v_{w_I})   (3)
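In practice the objective in Equations (2)-(3) is rarely implemented from scratch; a hedged sketch with gensim (assuming gensim 4.x, where the dimensionality parameter is called vector_size):

from gensim.models import Word2Vec

sentences = [["I", "like", "sweet", "food"],
             ["You", "like", "spicy", "food"],
             ["They", "like", "xyzabc", "food"]]

# sg=1 selects the skip-gram objective (predict context given word)
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=200)
print(model.wv.most_similar("xyzabc", topn=3))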


Global Vectors (GloVe)

- Proposed by Pennington et al. (2014)
- Predict context given word
- Similar to Skip-gram, but the objective function is different:

  J = ∑_{i,j=1}^{V} f(X_ij) (w_i^T w_j + b_i + b_j − log X_ij)^2   (4)

  where X_ij can be the likelihood of the i-th and j-th words occurring together, and f is a weighting function
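For illustration only (not the speaker's code), a small NumPy sketch of the cost contributed by one co-occurrence pair in Equation (4); x_max = 100 and alpha = 0.75 are the weighting-function values reported by Pennington et al. (2014):

import numpy as np

def f(x, x_max=100.0, alpha=0.75):
    # down-weights very frequent co-occurrences, caps the weight at 1
    return (x / x_max) ** alpha if x < x_max else 1.0

def pair_cost(w_i, w_j, b_i, b_j, X_ij):
    # weighted squared error between the dot product (plus biases) and log co-occurrence
    return f(X_ij) * (w_i @ w_j + b_i + b_j - np.log(X_ij)) ** 2

w_i, w_j = np.random.rand(50), np.random.rand(50)
print(pair_cost(w_i, w_j, 0.0, 0.0, X_ij=25.0))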


Tuning word embeddings

- Techniques which tune already trained word embeddings for various tasks using additional information
- Ling et al. (2015) improve the quality of word2vec for syntactic tasks such as POS tagging
  - Take word positioning into account
  - Structured Skip-Gram and Continuous Window: available as wang2vec
- Levy and Goldberg (2014) use dependency parse trees
  - Linear windows capture broad topical similarities, and dependency contexts capture functional similarities
- Patel et al. (2017) use a medical code hierarchy to improve medical domain specific word embeddings


Word Embeddings: Multilingual Word Embeddings


Objective
- English data >> data for other languages
- Language independent phenomena learnt on English should be applicable to other languages
- Solution via word embeddings:
  - Project words of different languages into a common subspace
  - Goal of multilingual word embeddings: a shared subspace for all languages
- Neural MT learns such embeddings implicitly by optimizing the MT objective
- We shall discuss explicit models
  - These models are for MT what word2vec and GloVe are for NLP
  - Much lower cost of training as compared to Neural MT
- Applications: Machine Translation, Automated Bilingual Dictionary Generation, Cross-lingual Information Retrieval, etc.


Types of Cross-lingual Embeddings

Based on the underlying approach:
- Monolingual mapping
- Cross-lingual training
- Joint optimization

Based on the resource used:
- Word-aligned data
- Sentence-aligned data
- Document-aligned data
- Lexicon
- No parallel data


Monolingual Mapping

Learning in two steps:
- Train separate embeddings w_e and w_f on large monolingual corpora of the corresponding languages e and f
- Learn transformations g1 and g2 such that w_e = g1(w_f) and w_f = g2(w_e)

Transformations learnt using bilingual word mappings (lexicon)


Monolingual Mapping (contd.)

Linear Projection proposed by Mikolov et al. (2013a)

Learn a matrix W such that w_e ≈ W·w_f, i.e. which minimizes

  ∑_{i=1}^{n} ∥W w_{f_i} − w_{e_i}∥^2

We adapted this method for automatic synset linking in multilingual wordnets (accepted at GWC 2018)
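A minimal sketch of learning such a W by ordinary least squares, assuming the rows of Xf (source language) and Xe (target language) are the paired vectors of a bilingual lexicon; this is an illustration, not the original implementation:

import numpy as np

n, d = 1000, 100                     # hypothetical lexicon size and embedding dimensionality
Xf = np.random.rand(n, d)            # source-language vectors (one lexicon entry per row)
Xe = np.random.rand(n, d)            # corresponding target-language vectors

# solve min_W sum_i || Xf[i] W - Xe[i] ||^2  (row-vector convention)
W, *_ = np.linalg.lstsq(Xf, Xe, rcond=None)

def project(wf):
    # map a source-language word vector into the target-language space
    return wf @ W

print(project(Xf[0]).shape)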


Monolingual Mapping (Contd.)

- Linear projection (Mikolov et al., 2013a): Lexicon
- Projection via CCA (Faruqui and Dyer, 2014b): Lexicon
- Alignment-based projection (Guo et al., 2015): Word-aligned data
- Adversarial auto-encoder (Barone, 2016): No parallel data


Cross Lingual Training

- Goal: optimizing a cross-lingual objective
- Mainly rely on sentence alignments
- Require a parallel corpus for training


Cross Lingual Training (contd.)

- Bilingual Compositional Sentence Model proposed by Hermann and Blunsom (2013)
- Train two models to produce sentence representations of aligned sentences in two languages
- Minimize the distance between sentence representations of aligned sentences


Cross Lingual Training (contd.)

- Bilingual compositional sentence model (Hermann and Blunsom, 2013): Sentence-aligned data
- Distributed word alignment (Kočiskỳ et al., 2014): Sentence-aligned data
- Translation-invariant LSA (Huang et al., 2015): Lexicon
- Inverted Indexing on Wikipedia (Søgaard et al., 2015): Document-aligned data


Joint Optimization

- Jointly optimize both monolingual (M) and cross-lingual (Ω) constraints
- Objective: minimize M_{l1} + M_{l2} + λ·Ω_{l1→l2} + Ω_{l2→l1}, where λ decides the weight of the cross-lingual constraints


Joint Optimization (contd.)

Multitask Language Model proposed by Klementiev et al. (2012):
- Train a neural language model (NNLM)
- Jointly optimize the monolingual maximum likelihood (M) with a word alignment based MT regularization term (Ω)


Joint Optimization (contd.)

- Multi-task language model (Klementiev et al., 2012): Word-aligned data
- Bilingual skip-gram (Luong et al., 2015): Word-aligned data
- Bilingual bag-of-words without alignment (Gouws et al., 2015): Sentence-aligned data
- Bilingual sparse representations (Vyas and Carpuat, 2016): Word-aligned data


Word Embeddings: Interpretable Word Embeddings


Interpretability and Explainability

- A model is interpretable if a human can make sense out of it
- Example: decision trees
- Interpretable models enable one to explain the performance of the system and tune it accordingly
- However, in practice, interpretable models generally perform poorly compared to other systems


Interpretable Word Embeddings
Dimensions become interpretable by ordering words based on their value along that dimension

Word d205 (top words by value): iguana 0.599371, bueller 0.584335, chimpanzee 0.577834, wasp 0.556845, chimp 0.553980, hamster 0.534810, giraffe 0.532316, unicorn 0.529533, caterpillar 0.528376, baboon 0.526324, gorilla 0.521590, tortoise 0.519941, sparrow 0.516842, lizard 0.515716, cockroach 0.505015, crocodile 0.491139, alligator 0.486275, moth 0.471682, kangaroo 0.469284, toad 0.463514

Word d272 (top words by value): thigh 0.875286, knee 0.872282, shoulder 0.866209, elbow 0.857403, wrist 0.852959, ankle 0.851555, groin 0.841347, forearm 0.837988, leg 0.836661, pelvis 0.777564, neck 0.758420, spine 0.754774, torso 0.707458, hamstring 0.701921, buttocks 0.689092, knees 0.676485, ankles 0.658485, jaw 0.653126, biceps 0.650972, hips 0.647000

Examples from the NNSE embeddings of Murphy et al. (2012)


NNSE
- Non-Negative Sparse Embeddings proposed by Murphy et al. (2012)
- Word embeddings that are interpretable and cognitively plausible
- Capture a mixture of topical and taxonomical semantics
- Computation:
  - Dependency co-occurrence adjusted with PPMI (to normalize for word frequency) and reduced with sparse SVD
  - Document co-occurrence adjusted with PPMI and reduced with sparse SVD
  - Their union factorized using a variant of non-negative sparse coding
- Resulting word embeddings have both topical neighbors (judge is near prison) and taxonomical neighbors (judge is near referee)
- Code unavailable; embeddings available at http://www.cs.cmu.edu/~bmurphy/NNSE/


OIWE

- Online Interpretable Word Embeddings proposed by Luo et al. (2015)
- Main idea: apply sparsity to skip-gram
- Achieve sparsity by setting to 0 any dimension of a vector that falls below 0
- Propose two techniques to do this via gradient descent
- They outperform NNSE on the word intrusion task
- Code available on GitHub at https://github.com/SkTim/OIWE


Evaluating Word Embeddings: Intrinsic Evaluation


Word Pair Similarity

- Evaluates the generalizability of word embeddings
- One of the most widely used evaluations
- Many datasets available: WS353, RG65, MEN, SimLex, SCWS, etc.

Word1    Word2     Human Score   Model1 Score   Model2 Score
street   street    10.00         1.0            1.0
street   avenue    8.88          0.04           0.38
street   block     6.88          0.14           0.26
street   place     6.44          0.21           0.18
street   children  4.94          -0.08          0.15
Spearman Correlation             0.6            1.0
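A minimal sketch of the evaluation itself, assuming a dict emb mapping words to vectors; the Spearman correlation between model and human scores is the reported number:

import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def word_pair_similarity(pairs, human_scores, emb):
    # pairs: list of (word1, word2); human_scores: gold similarity judgements
    model_scores = [cosine(emb[w1], emb[w2]) for w1, w2 in pairs]
    rho, _ = spearmanr(human_scores, model_scores)
    return rho

# e.g. word_pair_similarity([("street", "avenue"), ("street", "block")], [8.88, 6.88], emb)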


Word Analogy task

- Proposed by Mikolov et al. (2013b)
- Tries to answer questions of the form: man is to woman as king is to ?
- Often discussed in the media
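A rough sketch of the usual vector-arithmetic answer (3CosAdd) to such questions; emb and vocab are assumed inputs, not anything from the talk:

import numpy as np

def analogy(a, b, c, emb, vocab):
    # return the word d maximizing cos(emb[d], emb[b] - emb[a] + emb[c]),
    # excluding the three query words themselves
    target = emb[b] - emb[a] + emb[c]
    target = target / np.linalg.norm(target)
    best, best_sim = None, -1.0
    for w in vocab:
        if w in (a, b, c):
            continue
        v = emb[w] / np.linalg.norm(emb[w])
        if v @ target > best_sim:
            best, best_sim = w, v @ target
    return best

# analogy("man", "woman", "king", emb, vocab) is expected to return "queen"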


Categorization

- Evaluates the ability of embeddings to form proper clusters
- Given sets of words with different labels, cluster them and check the correspondence between clusters and sets
- The purer the clusters, the better the embeddings
- Datasets available: BLESS, Battig, etc.
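A hedged sketch of this evaluation: cluster the vectors with k-means and score cluster purity against the gold sets (words, labels, and emb are assumed inputs):

import numpy as np
from sklearn.cluster import KMeans

def categorization_purity(words, labels, emb, n_clusters):
    X = np.array([emb[w] for w in words])
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    # purity: each cluster is credited with its most frequent gold label
    correct = 0
    for c in set(clusters):
        gold = [labels[i] for i in range(len(words)) if clusters[i] == c]
        correct += max(gold.count(g) for g in set(gold))
    return correct / len(words)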


Word Intrusion Detection

- Proposed by Murphy et al. (2012)
- Provides a way to interpret dimensions
- Most approaches do not report results on this task
  - Experiments done by us suggest many of them are not interpretable


Word Intrusion Detection (contd.)

The approach:
1. Select a dimension
2. Reverse sort all vectors based on this dimension
3. Select the top 5 words
4. Select a word which is in the bottom half of this list, and is in the top 10 percentile in some other columns
5. Give a random permutation of these 6 words to a human evaluator
   Example: bathroom, closet, attic, balcony, quickly, toilet
6. Check precision
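A rough sketch of how an intrusion item for one dimension could be assembled (steps 1-4); the human judgement in steps 5-6 obviously cannot be automated. E is an assumed words-by-dimensions matrix and vocab the corresponding word list:

import numpy as np
import random

def intrusion_item(E, vocab, dim):
    order = np.argsort(-E[:, dim])                    # step 2: reverse sort on the chosen dimension
    top5 = [vocab[i] for i in order[:5]]              # step 3: top 5 words
    bottom_half = order[len(order) // 2:]             # step 4: low on this dimension...
    others = [d for d in range(E.shape[1]) if d != dim]
    thresholds = np.percentile(E[:, others], 90, axis=0)
    candidates = [i for i in bottom_half
                  if (E[i, others] >= thresholds).any()]   # ...but top 10 percentile in some other column
    intruder = vocab[random.choice(candidates)]
    item = top5 + [intruder]
    random.shuffle(item)                              # step 5: random permutation for the evaluator
    return item, intruder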


Evaluating Word Embeddings: Extrinsic Evaluation


Extrinsic Evaluations

- Evaluating word embeddings on downstream NLP tasks such as part of speech tagging, named entity recognition, etc.
- Makes more sense, as we ultimately want to use embeddings for such tasks
- However, performance does not rely solely on the embeddings
  - Improvement/degradation could be due to other factors such as network architecture, hyperparameters, etc.
  - If an embedding E1 is better than another embedding E2 when used with some network architecture for NER, does that mean E1 will be better for all architectures for NER?


Evaluations on Unified Architectures

- Unified architectures such as Collobert and Weston (2008b) are used for extrinsic evaluations
- For different tasks, the architecture remains the same, except the last layer, where the output neurons are changed according to the task at hand
- If an embedding E1 is better than another embedding E2 on all tasks on such a unified architecture, then we can expect it to be truly better


Evaluating Word Embeddings: Evaluation Frameworks


WordVectors.org

- Proposed by Faruqui and Dyer (2014a)
- A web interface for evaluating a collection of word pair similarity datasets on your embeddings, available at http://wordvectors.org/
- Also provides visualization for common sets of words like (Male, Female) and (Antonym, Synonym) pairs


VecEval

- Proposed by Nayak et al. (2016)
- A web based tool for performing extrinsic evaluations: http://www.veceval.com/
- Claimed to support six different tasks: POS, NER, Chunking, Sentiment Analysis, Natural Language Inference, Question Answering
- Has never worked for me
- Web interface no longer active; code available on GitHub at https://github.com/NehaNayak/veceval


Anago

- A Keras implementation of sequence labelling based on Lample et al. (2016)'s architecture
- Can perform POS, NER, SRL, etc.
- Used in our lab for extrinsic evaluation
- Code available on GitHub at https://github.com/Hironsan/anago


Evaluating Word Embeddings: Visualizing Word Embeddings


Visualizing Word Embeddings

Various ways to visualize word embeddings: PCA, Isomap, t-SNE, etc., available in scikit-learn

from sklearn import decomposition, manifold
import matplotlib.pyplot as plt

# E is the embedding matrix (one row per word)
vis = decomposition.TruncatedSVD(n_components=2)   # PCA-style projection; manifold.TSNE(n_components=2) also works
E_vis = vis.fit_transform(E)
plt.scatter(E_vis[:, 0], E_vis[:, 1])              # plot E_vis here
plt.show()

Check out http://scikit-learn.org/stable/auto_examples/manifold/plot_lle_digits.html for many methods applied to MNIST visualization


Related Work

- Baroni et al. (2014): Neural word embeddings are better than traditional methods such as LSA, HAL, RI (Landauer and Dumais, 1997; Lund and Burgess, 1996b; Sahlgren, 2005)
- Levy et al. (2015): Superiority of neural word embeddings is not due to the embedding algorithm, but due to certain design choices and hyperparameter optimizations
  - Varies other hyperparameters; keeps number of dimensions = 500
- Schnabel et al. (2015); Zhai et al. (2016); Ghannay et al. (2016): No justification for the chosen number of dimensions in their evaluations
- Melamud et al. (2016): The optimal number of dimensions differs across evaluations of word embeddings


Why Do Dimensions Matter? A Practical Example
- Various app developers want to utilize word embeddings
- Example memory limit for an app: 200 MB
- Size of the Google pre-trained vectors file: 3.4 GB
- Natural thought process: decrease dimensions
  - To what value? 100? 50? 20?
  - Depends on the words/entities we want to place in the space



Number of Dimensions and Equidistant Points
- The number of dimensions of a vector space imposes a restriction on the number of equidistant points it can have
- Given that the distance is Euclidean, if the number of dimensions λ = N, then the maximum number of equidistant points E in the corresponding space is N + 1 (Swanepoel, 2004)
- Given that the distance is cosine, no closed form solution exists

Dimensions λ and max. no. of equiangular lines E (Barg and Yu, 2014)

λ            E      λ            E
3            6      18           61
4            6      19           76
5            10     20           96
6            16     21           126
7<=n<=13     28     22           176
14           30     23           276
15           36     24<=n<=41    276
16           42     42           288
17           51     43           344


Objective

Problem Statement
Does the number of pairwise equidistant words enforce a lower bound on the number of dimensions for word embeddings?

- 'Equidistance' determined using the co-occurrence matrix
- Plan of action:
  - Verify using a toy corpus
  - Evaluate on an actual corpus


Motivation (1/4)

Consider the following toy corpus:
<> I like cats <> I love dogs <> I hate rats <> I rate bats <>

Corresponding co-occurrence matrix:

word   <>  I   like  love  hate  rate  rats  cats  dogs  bats
like    0  1   0     0     0     0     0     1     0     0
love    0  1   0     0     0     0     0     0     1     0
hate    0  1   0     0     0     0     1     0     0     0
rate    0  1   0     0     0     0     0     0     0     1

Distance between any pair of these words = √2

The words form a regular tetrahedron


Motivation (2/4)
Mean and standard deviation of the mean of a point's distance to the other points

Dimension   Mean   Std dev
1           0.94   0.94
2           1.77   0.80
3           2.63   0.10

[Figure: scatter plots of the words hate, like, love, rate placed in 1d, 2d, and 3d (Before and After panels)]


Motivation (3/4)
Hypothesis:
- If the learning algorithm of word embeddings does not get enough dimensions, then it will fail to uphold the equality constraint
  - The standard deviation of the mean of all pairwise distances will be higher
- As we increase the dimensions, the algorithm will get more degrees of freedom to model the equality constraint in a better way
  - There will be statistically significant changes in the standard deviation
- Once the lower bound on dimensions is reached, the algorithm gets enough degrees of freedom
  - From this point onwards, even if we increase dimensions, there will not be any statistically significant difference in the standard deviation


Motivation (4/4)

Dim   σ       P-value      Dim   σ       P-value
7     0.358                12    0.154   0.0058
8     0.293   0.0020       13    0.111   0.0001
9     0.273   0.0248       14    0.044   0.0001
10    0.238   0.0313       15    0.047   0.3096
11    0.189   0.0013       16    0.054   0.1659

Average standard deviation (σ) for 15 pairwise equidistant words (along with two-tail p-values of Welch's unpaired t-test for statistical significance)


Approach (1/5)
1. Compute the word × word co-occurrence matrix from the corpus

<> I like cats <> I love dogs <> I hate rats <> I rate bats <>

word   <>  I   like  love  hate  rate  rats  cats  dogs  bats
<>      0  4   0     0     0     0     1     1     1     1
I       4  0   1     1     1     1     0     0     0     0
like    0  1   0     0     0     0     0     1     0     0
love    0  1   0     0     0     0     0     0     1     0
hate    0  1   0     0     0     0     1     0     0     0
rate    0  1   0     0     0     0     0     0     0     1
rats    1  0   0     0     1     0     0     0     0     0
cats    1  0   1     0     0     0     0     0     0     0
dogs    1  0   0     1     0     0     0     0     0     0
bats    1  0   0     0     0     1     0     0     0     0


Approach (2/5)
2. Create the corresponding word × word cosine similarity matrix

<> I like cats <> I love dogs <> I hate rats <> I rate bats <>

        <>   I    like  love  hate  rate  rats  cats  dogs  bats
<>     1.0  0.0   0.8   0.8   0.8   0.8   0.0   0.0   0.0   0.0
I      0.0  1.0   0.0   0.0   0.0   0.0   0.8   0.8   0.8   0.8
like   0.8  0.0   1.0   0.5   0.5   0.5   0.0   0.0   0.0   0.0
love   0.8  0.0   0.5   1.0   0.5   0.5   0.0   0.0   0.0   0.0
hate   0.8  0.0   0.5   0.5   1.0   0.5   0.0   0.0   0.0   0.0
rate   0.8  0.0   0.5   0.5   0.5   1.0   0.0   0.0   0.0   0.0
rats   0.0  0.8   0.0   0.0   0.0   0.0   1.0   0.5   0.5   0.5
cats   0.0  0.8   0.0   0.0   0.0   0.0   0.5   1.0   0.5   0.5
dogs   0.0  0.8   0.0   0.0   0.0   0.0   0.5   0.5   1.0   0.5
bats   0.0  0.8   0.0   0.0   0.0   0.0   0.5   0.5   0.5   1.0
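Steps 1-2 can be reproduced directly from the co-occurrence matrix M built in the earlier sketch (again an illustration, assuming M and vocab from that snippet):

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

S = cosine_similarity(M)       # word x word cosine similarities from the co-occurrence rows
S = np.round(S, 1)             # the slide rounds the values to one decimal (0.0, 0.5, 0.8, 1.0)
print(vocab)
print(S)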


Approach (3/5)
3. For each similarity value sk, create a graph where the words are nodes, with an edge between node i and node j if sim(i, j) = sk

<> I like cats <> I love dogs <> I hate rats <> I rate bats <>

[Graphs shown for Sim = 0.0, 0.5, 0.8, 1.0]


Approach (4/5)
4. Find the maximum clique in this graph. The number of nodes in this clique is the maximum number of pairwise equidistant points Ek

<> I like cats <> I love dogs <> I hate rats <> I rate bats <>

Sim    Ek
0.5    4
0.8    0
1.0    0

[Graphs shown for Sim = 0.5, 0.8, 1.0]


Approach (5/5)

5. Reverse-lookup E_k to get the number of dimensions λ (sketched below)

Sim   E_k
0.5   4
0.8   0
1.0   0

Max E_k = 4

λ              E      λ              E
3              6      18             61
4              6      19             76
5              10     20             96
6              16     21             126
7 <= n <= 13   28     22             176
14             30     23             276
15             36     24 <= n <= 41  276
16             42     42             288
17             51     43             344
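Step 5 is then a table lookup: the smallest λ whose E value is at least the observed maximum E_k. A sketch, with the table above encoded as a dictionary (interval rows expanded):

# lambda -> E, transcribed from the table above
BOUNDS = {3: 6, 4: 6, 5: 10, 6: 16, **{n: 28 for n in range(7, 14)},
          14: 30, 15: 36, 16: 42, 17: 51, 18: 61, 19: 76, 20: 96, 21: 126,
          22: 176, 23: 276, **{n: 276 for n in range(24, 42)}, 42: 288, 43: 344}

def dimension_lower_bound(max_ek):
    # smallest dimension whose E value can accommodate max_ek points
    for dim in sorted(BOUNDS):
        if BOUNDS[dim] >= max_ek:
            return dim
    return None    # beyond the table; a larger table would be needed

# Toy corpus: dimension_lower_bound(4) == 3.
# For the Brown corpus, the equiangular word set shown later yields the reported bound of 19.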


Evaluation

Used the Brown Corpus
Found 19 as the lower bound using our approach
Context window: 1 to the left and 1 to the right
Number of dimensions: 1 to 35
5 randomly initialized models for each configuration (average results reported)
Intrinsic evaluation:
Word Pair Similarity: predicting sim(w_a, w_b) using the corresponding word embeddings
Word Analogy: finding the missing w_d in the relation "a is to b as c is to d" (sketched below)
Categorization: checking the purity of clusters formed by word embeddings
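As an illustration of the word analogy test, the standard vector-offset (3CosAdd) method predicts w_d as the word closest to v_b - v_a + v_c; a minimal sketch, assuming `emb` maps words to unit-normalised numpy vectors:

import numpy as np

def analogy(a, b, c, emb):
    target = emb[b] - emb[a] + emb[c]
    target /= np.linalg.norm(target)
    best, best_sim = None, -np.inf
    for w, v in emb.items():
        if w in (a, b, c):               # the query words themselves are excluded
            continue
        sim = float(np.dot(target, v))   # cosine similarity for unit vectors
        if sim > best_sim:
            best, best_sim = w, sim
    return best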


Results

Figure: Performance for the Word Pair Similarity task with respect to the number of dimensions


Figure: Performance for the Word Analogy task with respect to the number of dimensions


Figure: Performance for the Categorization task with respect to the number of dimensions


Analysis

Found lower bound consistent with experimental evaluation

Set of pairwise equiangular points (vectors) from the Brown corpus:
Poltawa, snakestrike, burnings, Tsar's, miswritten, brows, maintained, South-East, far-famed, 27%, non-dramas, octagonal, boatyards, U-2, Devol, mourners, Hearing, sideshow, third-story, upcoming, pram, dolphins, Croydon, neuromuscular, Gladius, pvt, littered, annoying, vuhranduh, athletes, eraser, provincialism, Daly, wreaths, villain, suspicious, nooks, fielder, belly, Gogol's, interchange, two-to-three, resemble, discounted, kidneys, Hangman's, commend, accordion, summarizing, optimality, Orlando, Leamington, swift, Taras-Tchaikovsky, puts, groomed, spit, firmer, rosy-fingered, Bechhofer, campfire, Tomas


Limitations

The max-clique-finding component of the approach renders it intractable for larger corpora; an alternative is needed


Applications of Word Embeddings: Are Word Embeddings Useful for Sarcasm Detection?


Problem Statement

Detect whether a sentence is sarcastic or not, especially among sentences which do not contain sentiment-bearing words

Example: A woman needs a man just like a fish needs a bicycle


Motivation

Similarity between word embeddings acts as a proxy for measuring contextual incongruity
Example: A woman needs a man just like a fish needs a bicycle
similarity(man, woman) = 0.766
similarity(fish, bicycle) = 0.131
The imbalance in the similarities above is an indication of contextual incongruity


Approach

The gist of the approach is adding word-embedding-similarity-based features, such as:
Maximum similarity between all pairs of words in a sentence
Minimum similarity between all pairs of words in a sentence
(A sketch of this feature extraction follows.)
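A minimal sketch of such features (illustrative only; `emb` is assumed to map words to numpy vectors, and out-of-vocabulary tokens are skipped):

import itertools
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def similarity_features(tokens, emb):
    sims = [cosine(emb[w1], emb[w2])
            for w1, w2 in itertools.combinations(tokens, 2)
            if w1 in emb and w2 in emb]
    if not sims:
        return {'max_sim': 0.0, 'min_sim': 0.0}
    return {'max_sim': max(sims), 'min_sim': min(sims)}

# For the bicycle example above, max_sim would come from pairs like (man, woman)
# and min_sim from pairs like (fish, bicycle).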


Evaluation

1. Liebrecht et al. (2013): They consider unigrams, bigrams and trigrams as features.
2. González-Ibánez et al. (2011): Two sets of features: unigrams and dictionary-based.
3. Buschmeier et al. (2014):
   Hyperbole (captured by 3 positive or negative words in a row)
   Quotation marks and ellipsis
   Positive/negative sentiment words followed by an exclamation or question mark
   Positive/negative sentiment scores followed by ellipsis ('...')
   Punctuation, interjections, and laughter expressions
4. Joshi et al. (2015): In addition to unigrams, they use features based on implicit and explicit incongruity
   Implicit incongruity features: patterns with implicit sentiment, extracted in a pre-processing step
   Explicit incongruity features: number of sentiment flips, length of positive and negative sub-sequences, and lexical polarity


Results

Word Embedding   Average F-score Gain
LSA              0.453
Glove            0.651
Dependency       1.048
Word2Vec         1.143

Average gain in F-score for the four types of word embeddings. These values are computed for a subset of these embeddings consisting of words common to all four.


Applications of Word Embeddings: Iterative Unsupervised Most Frequent Sense Detection using Word Embeddings


WordNet

Groups synonymous words into synsets
Synset example:
Synset ID: 02139199
Synset members: bat, chiropteran
Gloss: nocturnal mouselike mammal with forelimbs modified to form membranous wings and anatomical adaptations for echolocation by which they navigate
Example: Bats are creatures of the night.
Relations with other synsets (hypernym/hyponym: parent/child, meronym/holonym: part/whole); see the NLTK lookup sketch below
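The same kind of synset information can be queried through NLTK's WordNet interface (the synset name bat.n.01 below is illustrative; numeric offsets differ across WordNet versions):

from nltk.corpus import wordnet as wn    # requires nltk.download('wordnet')

for s in wn.synsets('bat'):
    print(s.name(), s.lemma_names(), s.definition())

s = wn.synset('bat.n.01')                # the chiropteran sense
print(s.hypernyms())                     # parent synsets
print(s.part_meronyms())                 # part/whole relations, if any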


Introduction

Word Sense Disambiguation (WSD): one of the relatively hard problems in NLP
Both supervised and unsupervised ML explored in the literature
Most Frequent Sense (MFS) baseline: a strong baseline for WSD
Given a WSD problem instance, simply assign the most frequent sense of that word
Ignores context, yet gives really strong results, due to the skew in the sense distribution of the data
Computing MFS:
Trivial for sense-annotated corpora, which are not available in large amounts
Need to learn from raw data


Problem Statement

Given a raw corpus, estimate the most frequent sense of different words in that corpus

Bhingardive et al. (2015) showed that pretrained word embeddings can be used to compute the most frequent sense
Our work further strengthens the claim by Bhingardive et al. (2015) that word embeddings indeed capture the most frequent sense
Our approach outperforms others at the task of MFS extraction
To compute MFS using our approach:
1. Train word embeddings on the raw corpus (a minimal training sketch follows the list).
2. Apply our approach on the trained word embeddings.
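Step 1 can be done with any embedding toolkit; a minimal gensim sketch (the file name and parameter values are illustrative, and vector_size is the gensim 4 name for what older versions call size):

from gensim.models import Word2Vec

# raw_corpus.txt: one whitespace-tokenised sentence per line (illustrative path)
sentences = [line.split() for line in open('raw_corpus.txt', encoding='utf-8')]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=5, sg=1, workers=4)
model.wv.save('corpus_vectors.kv')       # the trained vectors feed step 2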


Intuition

Strive for consistency in the assignment of senses to maintain semantic congruity
Example:
If cricket and bat co-occur a lot, then cricket taking the insect sense and bat taking the reptile sense is less likely
If cricket and bat co-occur a lot, and cricket's MFS is sports, then bat taking the reptile sense is extremely unlikely
Key point: solve easy words, then use them for difficult words
In other words, iterate over the degree of polysemy from 2 onward



Algorithm

The vote for sense s_j due to word w_i has two components:
WordNet similarity between mfs(w_i) and s_j
Embedding-space similarity between w_i and the current word
(A rough sketch of the full iterative procedure follows.)
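Since the algorithm itself appears only as a figure in the original slides, the sketch below is a rough reconstruction of the voting scheme from the description above; the helper neighbours(word, k), the use of Wu-Palmer similarity, and the iteration limit are assumptions made for illustration, not the authors' exact choices.

from nltk.corpus import wordnet as wn

def vote_for_mfs(word, solved, neighbours, k=10, weighted=True):
    # Score every candidate sense of `word` by votes from already-solved neighbours.
    scores = {}
    for sense in wn.synsets(word):
        total = 0.0
        for nb, emb_sim in neighbours(word, k):
            if nb not in solved:                       # only solved words can vote
                continue
            wn_sim = solved[nb].wup_similarity(sense) or 0.0
            total += wn_sim * (emb_sim if weighted else 1.0)
        scores[sense] = total
    return max(scores, key=scores.get) if scores else None

def iterative_mfs(vocab, neighbours, max_polysemy=20):
    # Monosemous words are solved trivially; then iterate over degree of polysemy.
    solved = {w: wn.synsets(w)[0] for w in vocab if len(wn.synsets(w)) == 1}
    for degree in range(2, max_polysemy + 1):
        for w in vocab:
            if w not in solved and len(wn.synsets(w)) == degree:
                sense = vote_for_mfs(w, solved, neighbours)
                if sense is not None:
                    solved[w] = sense
    return solved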


Parameters

K
Similarity measure
Unweighted (no vector-space component) vs. weighted


Evaluation

Two setups:
Evaluating MFS as a solution for WSD
Evaluating MFS as a classification task


MFS as solution for WSD

Method                   Senseval2   Senseval3
Bhingardive (reported)   52.34       43.28
SemCor (reported)        59.88       65.72
Bhingardive              48.27       36.67
Iterative                63.2        56.72
SemCor                   67.61       71.06

Accuracy of WSD using MFS (Nouns)


MFS as solution for WSD (contd.)

Method                   Senseval2   Senseval3
Bhingardive (reported)   37.79       26.79
Bhingardive (optimal)    43.51       33.78
Iterative                48.1        40.4
SemCor                   60.03       60.98

Accuracy of WSD using MFS (All Parts of Speech)


MFS as classification task

Method        Nouns   Adjectives   Adverbs   Verbs   Total
Bhingardive   43.93   81.79        46.55     37.84   58.75
Iterative     48.27   80.77        46.55     44.32   61.07

Percentage match between predicted MFS and WFS


MFS as classification task (contd.)

Method        Nouns (49.20)   Verbs (26.44)   Adjectives (19.22)   Adverbs (5.14)   Total
Bhingardive   29.18           25.57           26.00                33.50            27.83
Iterative     35.46           31.90           30.43                47.78            34.19

Percentage match between predicted MFS and true SemCor MFS. Note that numbers in column headers indicate what percent of total words belong to that part of speech.


Analysis

Better than Bhingardive et al. (2015); not able to beat SemCor and WFS.

There are words for which WFS doesn't give the proper dominant sense. Consider the following examples:
tiger: an audacious person
life: characteristic state or mode of living (social life, city life, real life)
option: right to buy or sell property at an agreed price
flavor: general atmosphere of place or situation
season: period of year marked by special events

Tagged words rank too low to make a significant impact. For example:
While detecting MFS for a bisemous word, the first monosemous neighbour actually ranks 1101, i.e., over a thousand polysemous words are closer than this monosemous word.
The monosemous word may not be the one that can influence the MFS.


Summary

Proposed an iterative approach for unsupervised most frequent sense detection using word embeddings
Similar trends, yet better overall results than Bhingardive et al. (2015)
Future work:
Apply the approach to other languages


Conclusion

Discussed why we need word embeddings
Briefly looked at classical word embeddings
Discussed a few cross-lingual word embeddings and interpretable word embeddings
Mentioned evaluation mechanisms and tools
Argued for the existence of lower bounds on the number of dimensions of word embeddings
Discussed some in-house applications


Thank You


References

Barg, A. and Yu, W.-H. (2014). New bounds for equiangular lines. Contemporary Mathematics, 625:111–121.

Barone, A. V. M. (2016). Towards cross-lingual distributed representations without parallel text trained with adversarial autoencoders. arXiv preprint arXiv:1608.02996.

Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 238–247. Association for Computational Linguistics.

Bengio, Y., Ducharme, R., Vincent, P., and Janvin, C. (2003). A neural probabilistic language model. J. Mach. Learn. Res., 3:1137–1155.


Bhingardive, S., Singh, D., V, R., Redkar, H., and Bhattacharyya, P. (2015). Unsupervised most frequent sense detection using word embeddings. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1238–1243, Denver, Colorado. Association for Computational Linguistics.

Buschmeier, K., Cimiano, P., and Klinger, R. (2014). An impact analysis of features in a classification approach to irony detection in product reviews. In Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 42–49.


Collobert, R. and Weston, J. (2008a). A unified architecture for natural language processing: deep neural networks with multitask learning. In Cohen, W. W., McCallum, A., and Roweis, S. T., editors, ICML, volume 307 of ACM International Conference Proceeding Series, pages 160–167. ACM.

Collobert, R. and Weston, J. (2008b). A unified architecture for natural language processing: deep neural networks with multitask learning. In Cohen, W. W., McCallum, A., and Roweis, S. T., editors, ICML, volume 307 of ACM International Conference Proceeding Series, pages 160–167. ACM.

Dumais, S. T., Furnas, G. W., Landauer, T. K., Deerwester, S., and Harshman, R. (1988). Using latent semantic analysis to improve access to textual information. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 281–285. ACM.


Faruqui, M. and Dyer, C. (2014a). Community evaluation and exchange of word vectors at wordvectors.org. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Baltimore, USA. Association for Computational Linguistics.

Faruqui, M. and Dyer, C. (2014b). Improving vector space word representations using multilingual correlation. Association for Computational Linguistics.

Ghannay, S., Favre, B., Estéve, Y., and Camelin, N. (2016). Word embedding evaluation and combination. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Paris, France. European Language Resources Association (ELRA).


González-Ibánez, R., Muresan, S., and Wacholder, N. (2011). Identifying sarcasm in Twitter: a closer look. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2, pages 581–586. Association for Computational Linguistics.

Gouws, S., Bengio, Y., and Corrado, G. (2015). Bilbowa: Fast bilingual distributed representations without word alignments. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pages 748–756.

Guo, J., Che, W., Yarowsky, D., Wang, H., and Liu, T. (2015). Cross-lingual dependency parsing based on distributed representations. In ACL (1), pages 1234–1244.

Harris, Z. S. (1970). Distributional structure. Springer.


Hermann, K. M. and Blunsom, P. (2013). Multilingual distributed representations without word alignment. arXiv preprint arXiv:1312.6173.

Huang, K., Gardner, M., Papalexakis, E., Faloutsos, C., Sidiropoulos, N., Mitchell, T., Talukdar, P. P., and Fu, X. (2015). Translation invariant word embeddings. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1084–1088.

Joshi, A., Sharma, V., and Bhattacharyya, P. (2015). Harnessing context incongruity for sarcasm detection. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, volume 2, pages 757–762.


Klementiev, A., Titov, I., and Bhattarai, B. (2012). Inducing crosslingual distributed representations of words. Proceedings of COLING 2012, pages 1459–1474.

Kočiskỳ, T., Hermann, K. M., and Blunsom, P. (2014). Learning bilingual word representations by marginalizing alignments. arXiv preprint arXiv:1405.0947.

Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., and Dyer, C. (2016). Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360.

Landauer, T. K. and Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211–240.


Levy, O. and Goldberg, Y. (2014). Dependency-based word embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, June 22-27, 2014, Baltimore, MD, USA, Volume 2: Short Papers, pages 302–308.

Levy, O., Goldberg, Y., and Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3:211–225.

Liebrecht, C., Kunneman, F., and van den Bosch, A. (2013). The perfect solution for detecting sarcasm in tweets #not. Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, WASSA.


Ling, W., Dyer, C., Black, A. W., and Trancoso, I. (2015). Two/too simple adaptations of word2vec for syntax problems. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1299–1304.

Lund, K. and Burgess, C. (1996a). Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers, 28(2):203–208.

Lund, K. and Burgess, C. (1996b). Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers, 28(2):203–208.


Luo, H., Liu, Z., Luan, H., and Sun, M. (2015). Online learning of interpretable word embeddings. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1687–1692.

Luong, T., Pham, H., and Manning, C. D. (2015). Bilingual word representations with monolingual quality in mind. In VS@HLT-NAACL, pages 151–159.

Melamud, O., McClosky, D., Patwardhan, S., and Bansal, M. (2016). The role of context types and dimensionality in learning word embeddings. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, USA, June 12-17, 2016, pages 1030–1040. The Association for Computational Linguistics.


Mikolov, T., Le, Q. V., and Sutskever, I. (2013a). Exploiting similarities among languages for machine translation. CoRR, abs/1309.4168.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013b). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, pages 3111–3119. Curran Associates, Inc.

Murphy, B., Talukdar, P., and Mitchell, T. (2012). Learning Effective and Interpretable Semantic Models using Non-Negative Sparse Embedding, pages 1933–1949. Association for Computational Linguistics.


Nayak, N., Angeli, G., and Manning, C. D. (2016). Evaluating word embeddings using a representative suite of practical tasks. ACL 2016, page 19.

Patel, K., Patel, D., Golakiya, M., Bhattacharyya, P., and Birari, N. (2017). Adapting pre-trained word embeddings for use in medical coding. In BioNLP 2017, pages 302–306, Vancouver, Canada. Association for Computational Linguistics.

Pennington, J., Socher, R., and Manning, C. D. (2014). Glove: Global vectors for word representation. Proceedings of the Empirical Methods in Natural Language Processing (EMNLP 2014), 12.

Rubenstein, H. and Goodenough, J. B. (1965). Contextual correlates of synonymy. Commun. ACM, 8(10):627–633.


Sahlgren, M. (2005). An introduction to random indexing. In Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering, TKE 2005.

Sahlgren, M. (2006). The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces. PhD thesis, Institutionen för lingvistik.

Schnabel, T., Labutov, I., Mimno, D., and Joachims, T. (2015). Evaluation methods for unsupervised word embeddings. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 298–307.

Schütze, H. (1992). Dimensions of meaning. In Supercomputing '92: Proceedings, pages 787–796. IEEE.


Søgaard, A., Agić, Ž., Alonso, H. M., Plank, B., Bohnet, B., and Johannsen, A. (2015). Inverted indexing for cross-lingual NLP. In The 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2015).

Swanepoel, K. J. (2004). Equilateral sets in finite-dimensional normed spaces. In Seminar of Mathematical Analysis, volume 71, pages 195–237. Secretariado de Publicaciones, Universidad de Sevilla, Seville.

Vyas, Y. and Carpuat, M. (2016). Sparse bilingual word representations for cross-lingual lexical entailment. In HLT-NAACL, pages 1187–1197.


Zhai, M., Tan, J., and Choi, J. D. (2016). Intrinsic and extrinsic evaluations of word embeddings. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI'16, pages 4282–4283. AAAI Press.


Web References

http://www.indiana.edu/~gasser/Q530/Notes/representation.html
