Query Expansion with Locally-Trained Word Embeddings (ACL 2016)


Transcript of Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

Page 1: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

Query Expansion with Locally-Trained Word Embeddings

Fernando Diaz, Bhaskar Mitra, Nick Craswell (Microsoft)

Page 2: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

[Plot: a distribution p(d) over documents d]

Page 3: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

[Plot: the distribution p(d) over documents d and the query-conditioned distribution p(d|q) for a query q]

Page 4: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

terms similar to "cut":

global      local*
cutting     tax
squeeze     deficit
reduce      vote
slash       budget
reduction   reduction
spend       house
lower       bill
halve       plan
soften      spend
freeze      billion

global: trained using full corpus
local: trained using topically-constrained corpus
*gas

Page 5: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

[Figure: t-SNE projections under the global and local embeddings of the top words by p̃(d|q) (blue: query; red: top words by p(d|q))]
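A minimal sketch of this kind of visualization, assuming the top-ranked terms and their embedding vectors are already available; the names global_vecs, local_vecs, and the toy term list are illustrative placeholders, not data from the talk.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def project_and_plot(vectors, terms, title, ax):
    # t-SNE maps the k-dimensional term vectors into 2D for visual inspection.
    xy = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(vectors)
    ax.scatter(xy[:, 0], xy[:, 1], c="red", s=10)
    for (x, y), term in zip(xy, terms):
        ax.annotate(term, (x, y), fontsize=7)
    ax.set_title(title)

terms = ["cut", "tax", "deficit", "budget", "bill", "plan"]  # illustrative only
global_vecs = np.random.randn(len(terms), 100)               # placeholder embeddings
local_vecs = np.random.randn(len(terms), 100)                # placeholder embeddings

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
project_and_plot(global_vecs, terms, "global", ax1)
project_and_plot(local_vecs, terms, "local", ax2)
plt.show()
```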

Page 6: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)
Page 7: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)
Page 8: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

• local term clustering [Lesk, 1968; Attar and Fraenkel, 1977]

• local latent semantic analysis [Hull, 1994; Schütze et al., 1995; Singhal et al., 1997]

• local document clustering [Tombros and van Rijsbergen, 2001; Tombros et al., 2002; Willett, 1985]

• one sense per discourse [Gale et al., 1992]

Page 9: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

[Diagram: a query issued against the target corpus, producing results]

Page 10: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

q = [gas:1.0 tax:1.0 petroleum:0.0 tariff:0.0 …]

query = gas tax

Page 11: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

q = [gas:1.0 tax:1.0 petroleum:0.0 tariff:0.0 …]

query = gas tax

d = [gas:0.0 tax:0.0 petroleum:0.7 tariff:0.5 …]

Page 12: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

q = [gas:1.0 tax:1.0 petroleum:0.0 tariff:0.0 …]

query = gas tax

W = [ …
      gas:  petroleum:0.9  indigestion:0.6  …
      tax:  tariff:0.7  strain:0.4  …
      … ]

Page 13: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

q = [gas:1.0 tax:1.0 petroleum:0.8 tariff:0.6 …]

query = gas tax

d = [gas:0.0 tax:0.0 petroleum:0.7 tariff:0.5 …]

Page 14: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

W = UUᵀ

U: the m × k embedding matrix
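A minimal sketch of the expansion step walked through on pages 10-14, assuming U is an m × k matrix of term embeddings and vocab maps terms to rows (both names are illustrative, not from the talk). The expanded term weights come from Wq = U(Uᵀq), so the m × m matrix W never needs to be materialized.

```python
import numpy as np

def expand_query(query_terms, U, vocab, top_k=10):
    """Score every vocabulary term against the query in embedding space.

    U     : (m, k) array, one embedding per vocabulary term
    vocab : dict mapping term -> row index in U
    Returns the top_k (term, weight) pairs under W q with W = U U^T.
    """
    m = U.shape[0]
    q = np.zeros(m)
    for term in query_terms:          # sparse query vector, e.g. gas:1.0 tax:1.0
        if term in vocab:
            q[vocab[term]] = 1.0
    # W q computed as U (U^T q) without forming the m x m matrix W
    scores = U @ (U.T @ q)
    inv_vocab = {i: t for t, i in vocab.items()}
    top = np.argsort(-scores)[:top_k]
    return [(inv_vocab[i], float(scores[i])) for i in top]

# Illustrative usage with a toy vocabulary and random embeddings:
vocab = {"gas": 0, "tax": 1, "petroleum": 2, "tariff": 3, "indigestion": 4}
U = np.random.randn(len(vocab), 8)
print(expand_query(["gas", "tax"], U, vocab, top_k=3))
```

In practice the rows of U would typically be length-normalized so the scores behave like cosine similarities; the toy weights here only mirror the petroleum:0.8 / tariff:0.6 example in spirit.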

Page 15: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

[Plot: the distribution p(d) over documents d and the query-conditioned distribution p(d|q) for a query q]

Page 16: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

[Plot: the distribution p̃(d|q) over documents d for a query q]

Page 17: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

[Diagram: a query issued against the target corpus and against an external corpus, each producing results]

Page 18: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

U is trained using one of:

• uniform p(d) on the target corpus

• uniform p(d) on an external corpus

• p(d|q) on the target corpus

• p(d|q) on an external corpus
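A minimal sketch of training a query-specific ("local") embedding under the p(d|q) conditions above, assuming a retrieval function search(query, corpus, k) that returns the top-k documents as a stand-in for sampling by p(d|q); search and corpus are hypothetical names, and the word2vec call uses the gensim 4.x API rather than the authors' exact training setup.

```python
from gensim.models import Word2Vec

def train_local_embedding(query, corpus, search, top_k=1000, dim=100):
    # Approximate p(d|q) by taking the top-ranked documents for the query.
    top_docs = search(query, corpus, k=top_k)
    sentences = [doc.lower().split() for doc in top_docs]  # naive tokenization
    # Train a skip-gram model only on this topically-constrained subcorpus.
    model = Word2Vec(sentences, vector_size=dim, window=5, min_count=2, sg=1)
    return model

# model = train_local_embedding("gas tax", corpus, search)
# model.wv.most_similar("cut")   # local neighbors, cf. the table on page 4
```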

Page 19: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

corpus    docs         words        queries
trec12    469,949      438,338      150
robust    528,155      665,128      250
web       50,220,423   90,411,624   200

Page 20: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

global                local
target                target
wikipedia+gigaword*   gigaword†
google news*          wikipedia†

*publicly available embedding; †publicly available external corpus

[Diagrams: the corresponding pipelines, using the target corpus alone or together with an external corpus]

Page 21: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

[Bar chart (local vs global): NDCG@10 (0.0 to 0.5) on trec12, robust, and web; expansion = none, global, local]

Page 22: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

[Bar chart (local embedding): NDCG@10 (0.0 to 0.5) on trec12, robust, and web; training corpus = target, gigaword, wikipedia]

Page 23: Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

• local embedding provides a stronger representation than global embedding

• potential impact for other topic-specific natural language processing tasks

• future work

  • effectiveness improvements

  • efficiency improvements