
Neural Network Language Models and word2vec Tambet Matiisen 8.10.2014



Neural Network Language Models and word2vec

Tambet Matiisen

8.10.2014


Sources

• Yoshua Bengio. Neural net language models.

• Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space.

• Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality.

• Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic Regularities in Continuous Space Word Representations.

• Tomas Mikolov, Quoc V. Le and Ilya Sutskever. Exploiting Similarities among Languages for Machine Translation.


Language models

• A language model captures the statistical characteristics of sequences of words in a natural language, typically allowing one to make probabilistic predictions of the next word given preceding ones.

• E.g. the standard “trigram” method:

$$P(w_t \mid w_{t-2}, w_{t-1}) = \frac{\mathrm{count}(w_{t-2}\, w_{t-1}\, w_t)}{\mathrm{count}(w_{t-2}\, w_{t-1})}$$
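As a rough illustration of the count-based trigram estimate, the sketch below computes P(w_t | w_{t-2}, w_{t-1}) from raw counts. The toy corpus and the names `train_counts` and `trigram_prob` are assumptions made for this example. There is no smoothing, so unseen contexts simply get probability 0.

```python
from collections import Counter

def train_counts(tokens):
    """Count all trigrams and all bigrams in a token sequence."""
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    bigrams = Counter(zip(tokens, tokens[1:]))
    return trigrams, bigrams

def trigram_prob(w2, w1, w, trigrams, bigrams):
    """count(w2 w1 w) / count(w2 w1); returns 0.0 for an unseen context."""
    context = bigrams[(w2, w1)]
    if context == 0:
        return 0.0
    return trigrams[(w2, w1, w)] / context

# Hypothetical toy corpus, for illustration only.
tokens = "the cat sat on the mat and the cat ran".split()
trigrams, bigrams = train_counts(tokens)
print(trigram_prob("the", "cat", "sat", trigrams, bigrams))
```

Any word sequence that never occurs in the training text gets probability zero here, which is the sparsity problem that distributed representations are meant to ease.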


Neural network language models

• A neural network language model is a language model based on neural networks, exploiting their ability to learn distributed representations.

• A distributed representation of a word is a vector of activations of neurons (real values) which characterizes the meaning of the word.

• A distributed representation is opposed to a local representation, in which only one neuron (or very few) is active at each time.
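To make the contrast concrete, here is a small sketch that compares a local (one-hot) representation with a distributed one. The three-word vocabulary and the dense values are invented for illustration and are not learned from data.

```python
vocab = ["king", "queen", "apple"]

def one_hot(word):
    """Local representation: exactly one active unit per word."""
    return [1.0 if w == word else 0.0 for w in vocab]

# Distributed representation: every unit carries some information.
# These numbers are made up for the example, not trained.
dense = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.0, 0.9],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# One-hot vectors of two different words never overlap.
print(dot(one_hot("king"), one_hot("queen")))
# Dense vectors can express graded similarity between words.
print(dot(dense["king"], dense["queen"]) > dot(dense["king"], dense["apple"]))
```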


NNLM architecture

[Figure: NNLM architecture. Layers, from input to output:]

• Sparse representation of word t-2 and of word t-1: V nodes each

• Learned distributed representation of word t-2 and of word t-1: D nodes each, VxD weights (shared)

• Hidden layer to predict output from features of the input words: H nodes, HxD weights from each distributed representation

• Softmax output layer (one unit per next word): V nodes, HxV weights
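The layer sizes above can be traced in a minimal forward pass. This is a sketch under stated assumptions: the sizes V, D and H are toy values, the weights are random rather than trained, biases are omitted, and the tanh hidden activation is an assumption rather than something the slide specifies.

```python
import math
import random

V, D, H = 5, 3, 4                    # vocabulary, embedding and hidden sizes (toy values)
rng = random.Random(0)

def matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

C = matrix(V, D)                     # shared VxD table of word representations
W_h = [matrix(H, D), matrix(H, D)]   # one HxD matrix per context position
W_o = matrix(V, H)                   # hidden-to-output weights, one row per next word

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def nnlm_forward(w_prev2, w_prev1):
    """Return P(next word | indices of the two preceding words)."""
    # Multiplying a one-hot (sparse) input by C is just a row lookup.
    embeds = [C[w_prev2], C[w_prev1]]
    # Sum the contributions of both positions, then apply the nonlinearity.
    pre = [sum(parts) for parts in zip(*(matvec(W, e) for W, e in zip(W_h, embeds)))]
    hidden = [math.tanh(a) for a in pre]
    return softmax(matvec(W_o, hidden))

probs = nnlm_forward(1, 3)
print(len(probs), round(sum(probs), 6))
```

Because `C` is shared between both positions, a word gets the same representation wherever it appears in the context.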


word2vec

• An efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words.

• The word vectors can be used to significantly improve and simplify many NLP applications.


CBOW architecture

Predicts current word given the context.

[Figure: CBOW architecture. Labels: sparse representation; weights = distributed representation (NB! shared); softmax]
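A minimal sketch of the CBOW forward pass, assuming a full softmax output and averaging of the context vectors. The sizes and random weights are toy values, and `cbow_forward` is a name chosen for this example.

```python
import math
import random

V, D = 6, 3                          # toy vocabulary and vector sizes
rng = random.Random(1)
# Input weights: row i is the distributed representation of word i.
W_in = [[rng.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]
W_out = [[rng.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]

def cbow_forward(context_ids):
    """Return P(current word | context words) using a full softmax."""
    # The same W_in is used for every context position (shared weights),
    # and the context vectors are averaged, so word order is ignored.
    h = [sum(W_in[i][d] for i in context_ids) / len(context_ids) for d in range(D)]
    scores = [sum(w * x for w, x in zip(row, h)) for row in W_out]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = cbow_forward([0, 1, 3, 4])   # two words on each side of the current word
print(round(sum(probs), 6))
```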


Skip-gram architecture

Predicts the surrounding words given the current word

[Figure: skip-gram architecture. Labels: sparse representation; weights = distributed representation; output weights; one softmax per surrounding word]
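The training examples for skip-gram are (current word, surrounding word) pairs. The sketch below generates them with a fixed window. Both the fixed window and the function name `skipgram_pairs` are assumptions made for this example.

```python
def skipgram_pairs(tokens, window):
    """Yield (current word, surrounding word) pairs within the window."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]

pairs = list(skipgram_pairs(["the", "quick", "brown", "fox"], window=1))
print(pairs)
```

Each pair is one prediction task: given the first word, predict the second.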


Linguistic regularities

The word vector space implicitly encodes many regularities among words, e.g. vector(KINGS) – vector(KING) + vector(QUEEN) is close to vector(QUEENS).
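The vector arithmetic can be tried on hand-made vectors. In this sketch the three dimensions and all values are invented for illustration (real word vectors are learned and have many more dimensions), and the input words are excluded from the candidate answers.

```python
import math

# Hand-made toy vectors, not trained embeddings.
vectors = {
    "king":   [1.0, 0.0, 0.0],
    "kings":  [1.0, 0.0, 1.0],
    "queen":  [0.0, 1.0, 0.0],
    "queens": [0.0, 1.0, 1.0],
    "apple":  [0.6, 0.6, 0.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c):
    """Return the word closest to vector(b) - vector(a) + vector(c)."""
    target = [y - x + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# vector(KINGS) - vector(KING) + vector(QUEEN)
print(analogy("king", "kings", "queen"))
```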


Semantic-Syntactic Word Relationship test set


Accuracy

[Figure: accuracy comparison. Labels: days; minutes; hours]


From words to phrases

• Find words that appear frequently together and infrequently in other contexts.

• The bigrams with score above the chosen threshold are then used as phrases.

• The δ is used as a discounting coefficient and prevents too many phrases consisting of very infrequent words from being formed.

$$\mathrm{score}(w_i, w_j) = \frac{\mathrm{count}(w_i w_j) - \delta}{\mathrm{count}(w_i) \times \mathrm{count}(w_j)}$$
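The phrase-scoring rule can be sketched in a few lines. The toy corpus, the value of δ and the threshold below are arbitrary choices for illustration.

```python
from collections import Counter

def phrase_scores(tokens, delta):
    """score(wi, wj) = (count(wi wj) - delta) / (count(wi) * count(wj))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        (a, b): (n - delta) / (unigrams[a] * unigrams[b])
        for (a, b), n in bigrams.items()
    }

# Hypothetical toy corpus.
tokens = "new york is big and new york is old and the sky is big".split()
scores = phrase_scores(tokens, delta=1)

threshold = 0.2  # arbitrary value for this toy corpus
phrases = [bigram for bigram, s in scores.items() if s > threshold]
print(phrases)
```

With δ = 1, every bigram that occurs only once scores 0, which shows how the discount keeps rare word pairs from becoming phrases.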


Examples - analogy


Examples – distance (rare words)


Examples – addition


Parameters

• Architecture: skip-gram (slower, better for infrequent words) vs CBOW (fast)

• The training algorithm: hierarchical softmax (better for infrequent words) vs negative sampling (better for frequent words, better with low dimensional vectors)

• Sub-sampling of frequent words: can improve both accuracy and speed for large data sets (useful values are in range 1e-3 to 1e-5)

• Dimensionality of the word vectors: usually more is better, but not always

• Context (window) size: for skip-gram usually around 10, for CBOW around 5
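To give a feel for the sub-sampling parameter, the sketch below uses the formulation from the "Distributed Representations of Words and Phrases and their Compositionality" paper listed under Sources: each occurrence of a word w is discarded with probability 1 − sqrt(t / f(w)), where f(w) is the word's relative frequency and t is the threshold. The function name and the example frequencies are assumptions. Actual implementations may use a slightly different formula.

```python
import math

def discard_probability(word_freq, t=1e-5):
    """Probability of dropping one occurrence of a word.

    word_freq is the word's relative frequency in the corpus and
    t is the sub-sampling threshold.
    """
    return max(0.0, 1.0 - math.sqrt(t / word_freq))

# Hypothetical relative frequencies, from very common to very rare.
for freq in (0.05, 1e-3, 1e-5, 1e-6):
    print(freq, round(discard_probability(freq), 3))
```

Words rarer than the threshold are never dropped, while very frequent words lose most of their occurrences.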


Machine translation using distributed representations

1. Build monolingual models of languages using large amounts of text.

2. Use a small bilingual dictionary to learn a linear projection between the languages.

3. Translate a word by projecting its vector representation from the source language space to the target language space.

4. Output the most similar word vector from target language space as the translation.
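The four steps above can be sketched end to end on toy data. Everything here is an assumption made for illustration: the 2-D vectors stand in for trained monolingual models, the three-entry dictionary is invented, and plain batch gradient descent on the squared error is used as a simple stand-in for whatever optimizer a real system would use.

```python
import math

# Step 1 (stand-in): toy 2-D "embeddings" for each language, values made up.
source = {"one": [1.0, 0.0], "two": [0.0, 1.0], "three": [1.0, 1.0], "dog": [2.0, 1.0]}
target = {"uno": [0.0, 2.0], "dos": [-2.0, 0.0], "tres": [-2.0, 2.0],
          "perro": [-2.0, 4.0], "gato": [-4.0, 2.0]}
dictionary = [("one", "uno"), ("two", "dos"), ("three", "tres")]  # small seed dictionary

def project(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def fit_projection(pairs, lr=0.1, steps=300):
    """Step 2: gradient descent on sum ||W x - z||^2 over the dictionary pairs."""
    W = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        grad = [[0.0, 0.0], [0.0, 0.0]]
        for src, tgt in pairs:
            x, z = source[src], target[tgt]
            err = [p - t for p, t in zip(project(W, x), z)]
            for r in range(2):
                for c in range(2):
                    grad[r][c] += 2.0 * err[r] * x[c]
        W = [[W[r][c] - lr * grad[r][c] for c in range(2)] for r in range(2)]
    return W

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def translate(word, W):
    """Steps 3 and 4: project the source vector, return the nearest target word."""
    projected = project(W, source[word])
    return max(target, key=lambda t: cosine(projected, target[t]))

W = fit_projection(dictionary)
print(translate("dog", W))
```

Note that "dog" is not in the seed dictionary. It is translated only because the learned projection generalizes from the three dictionary pairs.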


English vs Spanish


Translation accuracy

[Figure panels: English → Spanish; English → Vietnamese]


How is this related to neuroscience?


How to calculate similarity matrix

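import sys

import gensim

# Both the vector file and the word list are required.
if len(sys.argv) < 3:
    print("Usage: matrix.py <vectorfile> <wordfile>")
    sys.exit(1)

# Load pre-trained vectors stored in the binary word2vec format.
# Recent gensim versions expose this loader on KeyedVectors rather than Word2Vec.
model = gensim.models.KeyedVectors.load_word2vec_format(sys.argv[1], binary=True)

# The word file contains one word per line.
with open(sys.argv[2]) as f:
    words = f.read().splitlines()

# Print one comma-separated row of pairwise similarities per word.
for w1 in words:
    print(",".join(str(model.similarity(w1, w2)) for w2 in words))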


Discovery of structural form - animals


Discovery of structural form - cities