From Word Embeddings To Document Distances
Matt J. Kusner, Yu Sun, Nicholas I. Kolkin, Kilian Q. Weinberger
Goal: a distance between two documents
?
Applications: document classification, multi-lingual document matching, song identification
Word Embedding: word2vec [Mikolov et al., 2013]
trained on 100 billion words
3 million different words embedded in \( \mathbb{R}^d \)
different from [Collobert & Weston, 2008] and [Mnih & Hinton, 2009]: word2vec is not deep!
Word Embedding: word2vec [Mikolov et al., 2013]
\[ X \in \mathbb{R}^{d \times n} \]
distance between words i and j: \( \|x_i - x_j\|_2 \) is roughly their dissimilarity
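The word-level distance above can be illustrated with a minimal sketch. The three-dimensional vectors below are made up for illustration (they are not real word2vec values), and the helper name `word_distance` is hypothetical.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for this example only;
# real word2vec vectors have hundreds of dimensions.
embeddings = {
    "king":  [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "frog":  [0.0, 0.1, 0.9],
}

def word_distance(word_i, word_j, table=embeddings):
    """Euclidean distance ||x_i - x_j||_2 between two embedded words."""
    return math.dist(table[word_i], table[word_j])

# With these toy vectors, related words end up closer than unrelated ones.
print(word_distance("king", "queen"))
print(word_distance("king", "frog"))
```

With a real embedding table the same function applies unchanged; only the lookup dictionary differs.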
Word Embedding: word2vec [Mikolov et al., 2013]
[Figure: the words King, Man, Woman, and Queen in the embedding space \( \mathbb{R}^d \)]
How can we leverage these high quality word embeddings
to compute document distances?
[Figure: words scattered in the embedding space, e.g. cat, dog, run, tree, car, race, frog, oat, pickle, fly, pattern, ocean, steam, rock, rocket, up, baby, bunny, win]
Word Mover’s Distance
Goal
?
Word Mover’s Distance
[Figure: document d (Obama, speaks, media, Illinois) and document d' (President, greets, press, Chicago) in the word embedding space \( \mathbb{R}^d \); each word carries mass 1/4]
word mover’s distance = minimum word distance to transform mass d into d’
Word Mover’s Distance
[Figure: the flow matrix T; entry T_ij is the mass moved from word i in d to word j in d', e.g. 1/4 from Illinois to Chicago]
Word Mover’s Distance
[Figure: d now contains Obama, speaks, Illinois, with masses 1/4 and 1/2 marked; d' contains President, greets, press, Chicago, with mass 1/4 marked]
Word Mover’s Distance
\[
\mathrm{WMD}(d, d') \triangleq \min_{T \ge 0} \sum_{i,j=1}^{n} T_{ij} \, \|x_i - x_j\|_2
\quad \text{s.t.} \quad
\sum_{j=1}^{n} T_{ij} = d_i \;\; \forall i, \qquad
\sum_{i=1}^{n} T_{ij} = d'_j \;\; \forall j
\]
Remarks
in CV this is the Earth Mover’s Distance (EMD) [Rubner et al., 1998]
an old optimal transport problem [Monge, 1781]
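The transport problem defining WMD can be sketched as a small linear program. This is an illustrative sketch, not the authors' released code: it uses `scipy.optimize.linprog` as a generic LP solver, restricts T to the words that actually occur in each document (an assumption made to keep the problem small), and the function name `wmd` and the toy vectors are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def wmd(X, Xp, d, dp):
    """Transport cost between two weighted sets of word vectors.

    X:  (n, dim) embeddings of the words in document d
    Xp: (m, dim) embeddings of the words in document d'
    d, dp: nonnegative word weights, each summing to 1
    Returns the optimal cost and the (n, m) flow matrix T.
    """
    X, Xp = np.asarray(X, float), np.asarray(Xp, float)
    n, m = len(X), len(Xp)
    # Cost c(i, j) = ||x_i - x'_j||_2, flattened row-major to match T.ravel().
    cost = np.linalg.norm(X[:, None, :] - Xp[None, :, :], axis=2).ravel()
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0   # sum_j T_ij = d_i
    for j in range(m):
        A_eq[n + j, j::m] = 1.0            # sum_i T_ij = d'_j
    b_eq = np.concatenate([d, dp])
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun, res.x.reshape(n, m)

# Toy 2-D "embeddings", invented for illustration.
X, d = [[0.0, 0.0], [1.0, 0.0]], [0.5, 0.5]
Xp, dp = [[0.0, 1.0], [1.0, 1.0]], [0.5, 0.5]
value, T = wmd(X, Xp, d, dp)
print(value)
print(T)
```

A general-purpose LP solver is used here only for clarity; specialized transport solvers are the usual choice when many document pairs must be compared.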
How well does WMD perform on document classification via k-nearest neighbors (k-NN)?
Classic Approaches
bag-of-words, TF-IDF, LSI, LDA
[Figure: an example word-count vector (3, 0, 0, 1, 0, 2, ...) and TF-IDF vector (0.01, 0, 0, 0.2, 0, 0.04, ...) over words such as ‘campaign’, ‘Obama’, ‘speech’, ‘Washington’; topic distributions over sports, politics, music, Civil War; a politics topic over the words ‘Obama’, ‘speech’, ‘Washington’, ‘football’, ‘Madonna’, ‘Vicksburg’, ‘guitar’, ‘soccer’]
[Salton & Buckley, 1988] [Deerwester et al., 1990] [Blei et al., 2003]
Results: k-NN
[Figure: bar chart of k-nearest neighbor test error (%) per dataset for each method]
Methods: BOW [Frakes & Baeza-Yates, 1992], TF-IDF [Jones, 1972], Okapi BM25 [Robertson & Walker, 1994], LSI [Deerwester et al., 1990], LDA [Blei et al., 2003], mSDA [Chen et al., 2012], Componential Counting Grid [Perina et al., 2013], Word Mover's Distance

dataset     train inputs   BOW dim
bbcsport    517            13243
twitter     2176           6344
recipe      3059           5708
ohsumed     3999           31789
classic     4965           24277
reuters     5485           22425
amazon      5600           42063
20news      11293          29671

All hyper-parameters set with bayesopt.m [Gardner et al. 2014]

Results: k-NN
[Figure: average error w.r.t. BOW for BOW, TF-IDF, Okapi BM25, LSI, LDA, mSDA, CCG, WMD; bar values 1.29, 1.15, 1.0, 0.72, 0.60, 0.55, 0.49, 0.42]
Computational Complexity
LP with 2n constraints: \( O(n^3 \log n) \) [Pele & Werman, 2009]
approximations: [Rubner et al., 1998]; [Levina & Bickel, 2001]; [Grauman & Darrell, 2004]; [Shirdhonkar & Jacobs, 2008]
Approximation 1: Word Centroid Distance [Rubner et al., 1998]
\[ \mathrm{WCD}(d, d') \triangleq \|Xd - Xd'\|_2 \]
\( O(nd) \)
[Figure: the centroids Xd and Xd' of the two documents in the word embedding space \( \mathbb{R}^d \)]
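The Word Centroid Distance can be sketched in a few lines. This is a minimal illustration under assumptions: documents are given as lists of word vectors with matching weights, and the names `centroid` and `wcd` and the toy vectors are hypothetical.

```python
import math

def centroid(vectors, weights):
    """Weighted mean of word vectors (the product X d in the WCD formula)."""
    dim = len(vectors[0])
    return [sum(w * v[k] for v, w in zip(vectors, weights)) for k in range(dim)]

def wcd(X, d, Xp, dp):
    """Euclidean distance between the two document centroids."""
    return math.dist(centroid(X, d), centroid(Xp, dp))

# Toy 2-D "embeddings", invented for illustration.
X, d = [[0.0, 0.0], [2.0, 0.0]], [0.5, 0.5]
Xp, dp = [[1.0, 1.0], [1.0, -1.0]], [0.5, 0.5]

# Both centroids are (1, 0), so the distance is zero even though
# the documents share no word vectors.
print(wcd(X, d, Xp, dp))
```

The toy case shows why the centroid distance is cheap but coarse: averaging can hide differences that a word-by-word transport cost would still see.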
Faster Approximations
[Figure: two panels, one labeled amazon, plotting WMD, RWMD, and WCD distances from a random test input to the training inputs; x-axis: training input index, y-axis: distance]
for a random test input...
Approximation 2: Relaxed Word Mover’s Distance
Relax the WMD problem by keeping only one of its two constraints:
\[
D_1 \triangleq \min_{T \ge 0} \sum_{i,j=1}^{n} T_{ij} \, \|x_i - x_j\|_2
\quad \text{s.t.} \quad \sum_{j=1}^{n} T_{ij} = d_i \;\; \forall i
\]
just a nearest-neighbor search!
\[
D_2 \triangleq \min_{T \ge 0} \sum_{i,j=1}^{n} T_{ij} \, \|x_i - x_j\|_2
\quad \text{s.t.} \quad \sum_{i=1}^{n} T_{ij} = d'_j \;\; \forall j
\]
just a nearest-neighbor search!
\[ \mathrm{RWMD}(d, d') \triangleq \max(D_1, D_2) \]
\( O(n^2 d) \)
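The relaxed distance can be sketched as two nearest-neighbor passes. This is an illustrative sketch, assuming D1 keeps only the constraint on the mass leaving d and D2 only the constraint on the mass arriving at d'; the names `relaxed_cost` and `rwmd` and the toy vectors are hypothetical.

```python
import math

def relaxed_cost(X, d, Xp):
    """Cost when only one constraint is kept: every word ships all of
    its mass to its nearest word in the other document."""
    return sum(w * min(math.dist(x, y) for y in Xp) for x, w in zip(X, d))

def rwmd(X, d, Xp, dp):
    """Relaxed WMD: the larger of the two one-sided relaxations."""
    D1 = relaxed_cost(X, d, Xp)    # words of d move to nearest words of d'
    D2 = relaxed_cost(Xp, dp, X)   # words of d' move to nearest words of d
    return max(D1, D2)

# Toy 2-D "embeddings", invented for illustration.
X, d = [[0.0, 0.0], [1.0, 0.0]], [0.5, 0.5]
Xp, dp = [[0.0, 1.0], [5.0, 1.0]], [0.5, 0.5]
print(relaxed_cost(X, d, Xp), relaxed_cost(Xp, dp, X), rwmd(X, d, Xp, dp))
```

Taking the maximum matters in the toy case: from d's side both words find a nearby target, while from the side of d' the far-away word at (5, 1) has no close match, so the second relaxation is the larger one.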
Faster Approximations
[Figure: average kNN error w.r.t. BOW; bar labels WCD, RWMD (c1), RWMD (c2), RWMD, WMD; bar values 0.93, 0.56, 0.54, 0.45, 0.42]
[Figure: average distance w.r.t. WMD; bar labels WCD, RWMD (c1), RWMD (c2), RWMD, WMD; bar values 0.92, 0.92, 0.30, 0.96, 1.0 as extracted, order not recoverable]
Other Embeddings
Conclusion
• Word Mover’s Distance: document distances from word embeddings
• Very accurate, as it leverages high-quality word2vec embeddings
• Fast through approximations: WCD \( O(nd) \), RWMD \( O(n^2 d) \), WMD \( O(n^3 \log n) \)
Thank you. Questions?
Code: http://matthewkusner.com