From Word Embeddings To Document...

43
From Word Embeddings To Document Distances Matt J. Kusner Yu Sun Nicholas I. Kolkin Kilian Q. Weinberger

Transcript of From Word Embeddings To Document...

Page 1: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

From Word Embeddings To Document DistancesMatt J. Kusner Yu Sun Nicholas I. Kolkin Kilian Q. Weinberger

Page 2: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Goal: a distance between two documents

?

Page 3: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Applicationsdocument classification multi-lingual document matching

song identification

Page 4: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Word Embeddingword2vec

trained on 100 billion words

3 million different words embedded

[Mikolov et al., 2013]

Rd

different from [Collobert & Weston, 2008]

[Mnih & Hinton, 2009] word2vec is not deep!

words

Page 5: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Word Embeddingword2vec

[Mikolov et al., 2013]

Rd

X 2 Rd⇥n

distance between words i and j:

kxi � xjk2

is roughly their dissimilarity

xixj

words

Page 6: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Word Embeddingword2vec

[Mikolov et al., 2013]

Rd

KingMan

Woman Queen

Page 7: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

How can we leverage these high quality word embeddings

to compute document distances?

Page 8: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

catdogruntree

carrace

frog

oatpicklefly

patt

ern ocean

steam

rock rocket

upbabybunny win

Word Mover’s Distance

Page 9: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Goal

?

Page 10: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Word Mover’s Distance

word embeddingRd

Obama

Illinois

mediapress

President

greets

speaks

P

Obama

speaksmediaIllinois

d1/4

PPresidentgreetspressChicago

1/4

d0

Chicago

Page 11: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Word Mover’s Distance

word embeddingRd

Obama

Illinois

mediapress

President

greets

speaks

P

Obama

speaksmediaIllinois

d1/4

PPresidentgreetspressChicago

1/4

d0

Chicago

Page 12: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Word Mover’s Distance

word embeddingRd

Obama

Illinois

mediapress

President

greets

speaks

P

Obama

speaksmediaIllinois

d1/4

PPresidentgreetspressChicago

1/4

d0

Chicago

word mover’s distance = minimum word distance to transform mass d into d’

Page 13: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Word Mover’s Distance

word embeddingRd

Obama

Illinois

mediapress

President

greets

speaks

P

Obama

speaksmediaIllinois

d1/4

PPresidentgreetspressChicago

1/4

d0

Chicago

Page 14: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

word embeddingRd

Obama

Illinois

mediapress

President

greets

speaks

P

Obama

speaksmediaIllinois

d1/4

PPresidentgreetspressChicago

1/4

d0

Chicago

Word Mover’s Distance

Page 15: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Word Mover’s Distance

word embeddingRd

Obama

Illinois

mediapress

President

greets

speaks

P

Obama

speaksmediaIllinois

d1/4

PPresidentgreetspressChicago

1/4

d0

Chicago

Page 16: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Word Mover’s Distance

word embeddingRd

P

Obama

speaksmediaIllinois

d1/4

PPresidentgreetspressChicago

1/4

d0

i

j

1/4

IllinoisChicago

T

Page 17: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Word Mover’s Distance

word embeddingRd

Obama

Illinois

press

President

greets

speaks

PPresidentgreetspressChicago

1/4

d0

Chicago

P

Obama

speaksIllinois

d1/41/2

Page 18: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Word Mover’s Distance

word embeddingRd

Obama

Illinois

press

President

greets

speaks

PPresidentgreetspressChicago

1/4

d0

Chicago

P

Obama

speaksIllinois

d1/41/2

Page 19: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Word Mover’s Distance

word embeddingRd

Obama

Illinois

press

President

greets

speaks

PPresidentgreetspressChicago

1/4

d0

Chicago

P

Obama

speaksIllinois

d1/41/2

Page 20: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Word Mover’s DistanceP

PresidentgreetspressChicago

1/4

d0WMD(d,d0) ,P

Obama

speaksIllinois

d1/41/2

minT�0

nX

i,j=1

Tijkxi � xjk2

Obama

press

Presidentxi

xj

xk

1/4

1/4

1/2

Page 21: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Word Mover’s DistanceP

PresidentgreetspressChicago

1/4

d0WMD(d,d0) ,WMD(d,d0) ,

8i

8j

WMD(d,d0) ,

s.t.

P

Obama

speaksIllinois

d1/41/2

minT�0

nX

i,j=1

Tijkxi � xjk2

nX

j=1

Tij = di

nX

i=1

Tij = d0j

Page 22: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Remarks

in CV this is the Earth Mover’s Distance (EMD) [Rubner et al., 1998]

an old optimal transport problem [Monge, 1781]

nX

j=1

Fij = di

nX

i=1

Fij = d0j

s.t. 8i

8j

WMD(d,d0) ,

8i

8j

s.t.

minT�0

nX

i,j=1

Tijkxi � xjk2

nX

j=1

Tij = di

nX

i=1

Tij = d0j

Page 23: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

How well does WMD perform on document classification via k-

nearest neighbors (k-NN)?

Page 24: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Classic Approaches

3

0

0

1

0

2

0

0

0

0

1

‘campaign’

‘Obama’

‘speech’

‘Washington’

bag-of-words TF-IDF

0.01

0

0

0.2

0

0.04

0

0

0

0

0.13

LSI LDA

‘Oba

ma’

‘speech’

‘a’

‘Oba

ma’

‘speech’

‘a’

topic distributions

sports

politics

music

Civil War

politics topic

‘Obam

a’‘speech’‘W

ashington’

‘football’‘M

adonna’‘Vicksburg’‘guitar’

‘soccer’

[Salton & Buckley, 1988] [Deerwester et al., 1990] [Blei et al., 2003]

Page 25: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Results: k-NN

1 2 3 4 5 6 7 8010203040506070

twitter recipe ohsumed classic reuters amazon

test

err

or % 43

33

44

33 32 3229

6663 61

49 51

44

36

8.0 9.7

62

44 4135

6.95.06.72.8

3329

148.16.96.3

3.5

59

42

28

1417

129.37.4

34

1722 21

8.46.44.3

21

4.6

53 5359

5448

4543

5156 54

58

3640

31 29 27

20newsbbcsport

k-nearest neighbor errorBOW [Frakes & Baeza-Yates, 1992] TF-IDF [Jones, 1972]Okapi BM25 [Robertson & Walker, 1994]

LSI [Deerwester et al., 1990]LDA [Blei et al., 2003]mSDA [Chen et al., 2012]Componential Counting Grid [Perina et al., 2013]

Word Mover's Distance

train inputs:

BOW dim:

517

13243 6344

2176 3059

5708

3999

31789

4965

24277

5485

22425

5600

42063

11293

29671

All hyper-parameters set with bayesopt.m [Gardner et al. 2014]

Page 26: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

1 2 3 4 5 6 7 80

0.20.40.60.81

1.21.41.6

aver

age

erro

r w.r.

t. BO

W 1.61.41.21.00.80.60.40.2

0

1.291.15

1.00.72

0.60 0.55 0.49 0.42

BOW TF-IDF Okapi BM25

LSI LDA mSDA CCG

WMD

Results: k-NN

Page 27: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

LP with 2n constraintsO(n3 log n)

[Pele & Werman, 2009]

minF�0

nX

i,j=1

Fijkxi � xjk2

nX

j=1

Fij = di

nX

i=1

Fij = d0j

s.t. 8i

8j

WMD(d,d0) ,

Computational Complexity

8i

8j

s.t.

minT�0

nX

i,j=1

Tijkxi � xjk2

nX

j=1

Tij = di

nX

i=1

Tij = d0j

Page 28: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

[Rubner et al., 1998]; [Levina & Bickel, 2001]; [Grauman & Darrell, 2004]; [Shirdhonkar & Jacobs, 2008]

approximations:

minF�0

nX

i,j=1

Fijkxi � xjk2

nX

j=1

Fij = di

nX

i=1

Fij = d0j

s.t. 8i

8j

WMD(d,d0) ,

Computational Complexity

8i

8j

s.t.

minT�0

nX

i,j=1

Tijkxi � xjk2

nX

j=1

Tij = di

nX

i=1

Tij = d0j

Page 29: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Approximation 1[Rubner et al., 1998]

word embeddingRd

Obama

Illinois

mediapress

President

greets

speaks

Chicago

P

Obama

speaksmediaIllinois

d1/4

PPresidentgreetspressChicago

1/4

d0

Page 30: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

[Rubner et al., 1998]

word embeddingRd

Obama

Illinois

mediapress

President

greets

speaks

Chicago

Approximation 1P

Obama

speaksmediaIllinois

d1/4

PPresidentgreetspressChicago

1/4

d0

Page 31: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Rd

Obama

Illinois

mediapress

President

greets

speaks

Chicago

[Rubner et al., 1998]

word embedding

Approximation 1P

Obama

speaksmediaIllinois

d1/4

PPresidentgreetspressChicago

1/4

d0

Word Centroid DistanceWCD(d,d0) , kXd�Xd0k2

O(nd)

Xd0Xd

Page 32: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Faster Approximations

0 500 1000 1500 2000 25000

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

0 100 200 300 400 500 600 700 800 900 10000

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.2

1.00.8

0.6

0.4

0.2

1.4

0500 1500 25001000 20000

amazon

WMDRWMDWCD

training input index

dist

ance

1.6

1.4

1.2

1.0

0.80.6

0.4

1.6

0.2

twitter

200 400 800 10006000training input index

0

for a random test input...

Page 33: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

minF�0

nX

i,j=1

Fijkxi � xjk2

nX

j=1

Fij = di

nX

i=1

Fij = d0j

s.t. 8i

8j

Approximation 2

8i

8j

s.t.

minT�0

nX

i,j=1

Tijkxi � xjk2

nX

j=1

Tij = di

nX

i=1

Tij = d0j

Page 34: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

8i

8j

s.t.

minT�0

nX

i,j=1

Tijkxi � xjk2

nX

j=1

Tij = di

nX

i=1

Tij = d0j

Approximation 2

D1

Page 35: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

8i

8j

s.t.

minT�0

nX

i,j=1

Tijkxi � xjk2

nX

j=1

Tij = di

nX

i=1

Tij = d0j

just a nearest-neighbor search!

Approximation 2

D1

Page 36: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

8i

8j

s.t.

minT�0

nX

i,j=1

Tijkxi � xjk2

nX

j=1

Tij = di

nX

i=1

Tij = d0j

Approximation 2

just a nearest-neighbor search!

D2

Page 37: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Relaxed Word Mover’s Distance

O(n2d)

Approximation 2

RWMD(d,d0) , max(D1, D2)

8i

8j

s.t.

minT�0

nX

i,j=1

Tijkxi � xjk2

nX

j=1

Tij = di

nX

i=1

Tij = d0j

D1

8i

8j

s.t.

minT�0

nX

i,j=1

Tijkxi � xjk2

nX

j=1

Tij = di

nX

i=1

Tij = d0j

D2

Page 38: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Faster Approximations

0 500 1000 1500 2000 25000

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

0 100 200 300 400 500 600 700 800 900 10000

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.2

1.00.8

0.6

0.4

0.2

1.4

0500 1500 25001000 20000

amazon

WMDRWMDWCD

training input index

dist

ance

1.6

1.4

1.2

1.0

0.80.6

0.4

1.6

0.2

twitter

200 400 800 10006000training input index

0

for a random test input...

Page 39: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Faster Approximations

0 500 1000 1500 2000 25000

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

0 100 200 300 400 500 600 700 800 900 10000

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.2

1.00.8

0.6

0.4

0.2

1.4

0500 1500 25001000 20000

amazon

WMDRWMDWCD

training input index

dist

ance

1.6

1.4

1.2

1.0

0.80.6

0.4

1.6

0.2

twitter

200 400 800 10006000training input index

0

for a random test input...

Page 40: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Faster Approximations

10

0.20.40.60.81

1.21.21.0

0.8

0.6

0.4

0.2

RWMD WMDWCDRWMDRWMD

0.93

0.56 0.540.45 0.42

0

average kNN error w.r.t. BOW

c2 c1

1.21.0

0.8

0.6

0.4

0.2

RWMD WMDWCDRWMDRWMD

0.92 0.92

0.30

0.96 1.0

0

average distance w.r.t. WMD

c2 c1 D1 D2

Page 41: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Other Embeddings

Page 42: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Conclusion

• Word Mover’s Distance:

• document distances fromword embeddings

• Very accurate as it leverages high quality word2vec embedding

• Fast through approximationsO(n2d)O(n3 log n) O(nd)

WCD RWMDWMD

1 2 3 4 5 6 7 80

0.20.40.60.81

1.21.41.6

aver

age

erro

r w.r.

t. BO

W 1.61.41.21.00.80.60.40.2

0

1.291.15

1.00.72

0.60 0.55 0.49 0.42

BOW TF-IDF Okapi BM25

LSI LDA mSDA CCG

WMD

Obama

Illinois

mediapress

President

greets

speaks

Chicago

Page 43: From Word Embeddings To Document Distancesmkusner.github.io/presentations/From_Word_Embeddings_To... · 2020-05-16 · From Word Embeddings To Document Distances ... word embeddings

Thank you. Questions?Code: http://matthewkusner.com