Introduction to word embeddings with Python


Introduction to word embeddings

Pavel Kalaidin, @facultyofwonder

Moscow Data Fest, September 12th, 2015

the distributional hypothesis: words that occur in similar contexts tend to have similar meanings

лойс

"годно, лойс" ("good stuff, лойс")

"лойс за песню" ("лойс for the song")

"из принципа не поставлю лойс" ("I won't give a лойс on principle")

"взаимные лойсы" ("mutual лойсы")

"лойс, если согласен" ("лойс if you agree")

What is the meaning of лойс?


кек

"кек, что ли?" ("кек, really?")

"кек)))))))"

"ну ты кек" ("you're such a кек")

What is the meaning of кек?


vector representations of words

a simple and flexible platform for understanding text (and probably not messing up)

one-hot encoding?

1 0 0 0 0 0 0 0 … 0 0 (a single 1 in a |V|-dimensional vector)
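
A minimal sketch of what this looks like in code (the toy vocabulary is made up for illustration):

```python
# one-hot encoding: a |V|-dimensional vector with a single 1
vocab = ["cat", "dog", "likes", "milk"]
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

print(one_hot("dog"))  # [0, 1, 0, 0]
```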

co-occurrence matrix

recall: word-document co-occurrence matrix for LSA

from the entire document to a window (length 5-10)
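
A sketch of window-based counting (window size and toy corpus are made up):

```python
# count co-occurrences within a symmetric window around each word
from collections import defaultdict

def cooccurrences(sentences, window=5):
    counts = defaultdict(int)
    for sent in sentences:
        for i, word in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if i != j:
                    counts[(word, sent[j])] += 1
    return counts

corpus = [["the", "cat", "likes", "milk"], ["the", "dog", "likes", "milk"]]
print(cooccurrences(corpus, window=2)[("cat", "milk")])  # 1
```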

still seems suboptimal -> big, sparse, etc.

lower dimensions: we want dense vectors (say, 25-1000)

How?

matrix factorization?

SVD of co-occurrence matrix

lots of memory?
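
One way this looks in code, a sketch assuming SciPy (matrix size and density are stand-ins):

```python
# dense vectors via truncated SVD of a (stand-in) co-occurrence matrix
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

X = sparse_random(5000, 5000, density=0.001, format="csr")  # pretend co-occurrence counts
U, S, Vt = svds(X, k=100)  # keep only the top 100 singular values
word_vectors = U * S       # one dense 100-dim vector per word (row)
```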

idea: directly learn low-dimensional vectors

here comes word2vec

Distributed Representations of Words and Phrases and their Compositionality, Mikolov et al., 2013 [paper]

idea: instead of capturing co-occurrence counts

predict surrounding words

Two models:

CBOW: predicting the word given its context

skip-gram: predicting the context given a word

Explained in great detail here, so we'll skip it for now. Also see: word2vec Parameter Learning Explained, Rong [paper]

CBOW: several times faster than skip-gram, slightly better accuracy for frequent words

skip-gram: works well with small amounts of data, represents rare words and phrases well

Examples?

w_woman - w_man = w_queen - w_king

classic example

<censored example>

word2vec Explained: Deriving Mikolov et al.'s Negative-Sampling Word-Embedding Method, Goldberg and Levy, 2014 [arxiv]

all done with gensim: github.com/piskvorky/gensim/
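
A minimal training sketch (toy corpus; keyword names follow gensim 4.x, where older versions used size= instead of vector_size=):

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "likes", "milk"], ["the", "dog", "likes", "milk"]]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)  # sg=1: skip-gram

print(model.wv.most_similar("cat", topn=3))
# on a real corpus, the classic analogy becomes a query:
# model.wv.most_similar(positive=["woman", "king"], negative=["man"])
```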

...failing to take advantage of the vast amount of repetition in the data

so back to co-occurrences

GloVe, for Global Vectors. Pennington et al., 2014: nlp.stanford.edu/pubs/glove.pdf

ratios of co-occurrence probabilities seem to cancel noise

The gist: model ratios with vectors
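
(In the paper's notation, roughly: find word vectors w and context vectors w̃ such that F(w_i - w_j, w̃_k) = P_ik / P_jk, where P_ik = X_ik / X_i is the probability of seeing word k in the context of word i.)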

The model

Preserving linearity

Preventing mixing dimensions

Restoring symmetry, part 1


Restoring symmetry, part 2

Least squares problem it is now
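
(The weighted least-squares objective from the paper: J = Σ_{i,j} f(X_ij) (w_i·w̃_j + b_i + b̃_j - log X_ij)², where f is a weighting function that caps the influence of very frequent co-occurrences.)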

SGD -> AdaGrad

ok, Python code

glove-python: github.com/maciejkula/glove-python
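
A usage sketch following the project's README (toy corpus, made-up hyperparameters):

```python
from glove import Corpus, Glove

sentences = [["the", "cat", "likes", "milk"], ["the", "dog", "likes", "milk"]]

corpus = Corpus()
corpus.fit(sentences, window=5)          # build the co-occurrence matrix

glove = Glove(no_components=100, learning_rate=0.05)
glove.fit(corpus.matrix, epochs=10, no_threads=2)
glove.add_dictionary(corpus.dictionary)  # attach the word -> id mapping

print(glove.most_similar("cat", number=3))
```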

two sets of vectors: input and context (+ biases)

to combine them: average / sum / drop one

complexity: |V|^2 (worst case: all word pairs)

complexity: |C|^0.8 (in practice: non-zero co-occurrences only)

Evaluation: it works

#spb #gatchina #msk #kyiv #minsk #helsinki

Compared to word2vec

#spb #gatchina #msk #kyiv #minsk #helsinki

Abusing models

music playlists: github.com/mattdennewitz/playlist-to-vec
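
The trick, sketched (not the repo's actual code): treat each playlist as a sentence and each track ID as a word, then train word2vec as usual.

```python
# playlists as sentences, track IDs as words (made-up data)
from gensim.models import Word2Vec

playlists = [["track_42", "track_7", "track_99"],
             ["track_7", "track_99", "track_13"]]
model = Word2Vec(playlists, vector_size=50, window=5, min_count=1)
print(model.wv.most_similar("track_7", topn=2))  # tracks that co-occur with track_7
```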

deep walk: DeepWalk: Online Learning of Social Representations [link]

predicting hashtags; interesting read: #TAGSPACE: Semantic Embeddings from Hashtags [link]

RusVectōrēs: distributional semantic models for Russian: ling.go.mail.ru/dsm/en/

corpus matters

building block for bigger models ╰(*´︶`*)╯

</slides>