Robust Query Expansion - gatech.eduzha/CSE8801/query-expansion/0... · 2009. 8. 27. · Robust...

Robust Query Expansion

Internship Closing Talk

Joshua V Dillon¹, Kevyn Collins-Thompson²

¹Georgia Institute of Technology, Atlanta, Georgia

²Microsoft Research, Redmond, Washington

August 11, 2009



What is query expansion? ... Who is John Galt?!

User submits the query John Galt.

Standard retrieval: documents that mention Dagny Taggart but not John Galt will not be retrieved.

Query expansion: the query is augmented with related terms, e.g., Atlas Shrugged and Ayn Rand, so those documents are retrieved.

Reduces query/document vocabulary mismatch by expanding the query using words or phrases with “similar meaning.”

And it’s a BIG deal!

Large upside potential:

Correct alteration: 6+ NDCG gain (oracle)

Query expansion research: 10-15% MAP gain

Many diverse approaches: alteration, expansion, reduction (long queries)



Robust? Risk? Reward? Hogwash!

State-of-the-art query expansion methods perform well on average but have limited real-world deployment.

Risky: large variance across queries and in optimal parameter settings

Increasingly complex decision environments

Personalization, implicit/explicit relevance, computation budget, ...

Need a framework for principled, selective query model estimation capable of handling diverse constraints...

Existing work:

Self-tuning methods, [Tao/Zhai, SIGIR ’06]

Non-convex, Expectation-Maximization

Expands relevant words “into” top-k documents

Picks relevant documents for fixed terms

Risk-aware methods, [Collins-Thompson, NIPS ’08]

Casts risk/reward as a quadratic program with linear constraints

Domain knowledge: aspect balance/coverage, query support, ...

Picks (possibly zero) terms but has no notion of documents

My Contribution:

Model the parameter space in a large-scale computing environment

Improve results by employing a translation model while providing a more theoretically motivated risk model

A unified framework that elegantly combines the advantages of both self-tuning and risk-aware methods


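Definition

A query expansion is measured by the relative improvement it provides over no expansion, for a bounded positive performance measure s(q). Writing $\tilde q$ for the expanded query,

$$I_s(q, \tilde q) = 1 - \frac{s(q)}{s(\tilde q)},$$

so $I_s > 0$ exactly when the expansion helps.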

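Definition

We use mean average precision (MAP) as the query performance measure s(q), viz.,

$$s(q) = |\mathrm{rel}(q, C)|^{-1} \sum_{k=1}^{N} P(q, k)\, \delta\big(d_k \in \mathrm{rel}(q, C)\big),$$

$$P(q, k) = k^{-1} |\mathrm{rel}(q, F_k(q))|, \qquad \mathrm{rel}(q, C) = \{\text{documents in } C \text{ relevant to } q\},$$

where $d_k$ is the document at rank k and $F_k(q)$ the top-k retrieved documents. Strictly, s(q) is the average precision of q; MAP is its mean over the query set.

So performance under a MAP criterion emphasizes returning more relevant documents higher in rank. Other measures include P5, P20, ...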
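A minimal Python sketch of s(q) for a single query (the talk's own test-bed was Matlab; this is not that code). It assumes the ranking is a list of document IDs and rel(q, C) is given as a set; the function names are illustrative. Relevant documents that are never retrieved contribute zero to the sum.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """s(q) = |rel(q, C)|^-1 * sum_k P(q, k) * delta(d_k in rel(q, C))."""
    if not relevant:
        return 0.0
    hits = 0      # running count |rel(q, F_k(q))|
    total = 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k  # P(q, k), counted only where d_k is relevant
    return total / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: the mean of s(q) over (ranking, relevant set) pairs, one per query."""
    scores = [average_precision(ranking, rel) for ranking, rel in runs]
    return sum(scores) / len(scores)
```

For the ranking d1, d2, d3, d4 with relevant set {d1, d3}, the hits sit at ranks 1 and 3, so s(q) = (1/1 + 2/3)/2 = 5/6.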


Definition

Risk represents the extent of downside loss in relative improvement, viz.,

$$R(q, \tilde q) = -P\big(I_s(q, \tilde q) \le 0\big)\, E\big[I_s(q, \tilde q) \mid I_s(q, \tilde q) \le 0\big].$$

Definition

Conversely, reward represents the extent of upside gain in relative improvement, viz.,

$$V(q, \tilde q) = P\big(I_s(q, \tilde q) > 0\big)\, E\big[I_s(q, \tilde q) \mid I_s(q, \tilde q) > 0\big].$$

Hence $E[I_s(q, \tilde q)] = V(q, \tilde q) - R(q, \tilde q)$ is the overall expected relative improvement.
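Over a set of queries, the decomposition $E[I_s] = V - R$ can be checked directly. A sketch, assuming $I_s(q, \tilde q) = 1 - s(q)/s(\tilde q)$ with $s(\tilde q) > 0$, every query weighted equally, and hypothetical helper names:

```python
from typing import Sequence, Tuple


def relative_improvement(s_orig: float, s_exp: float) -> float:
    """I_s(q, q~) = 1 - s(q) / s(q~); positive exactly when the expansion helps."""
    return 1.0 - s_orig / s_exp


def risk_reward(s_orig: Sequence[float], s_exp: Sequence[float]) -> Tuple[float, float, float]:
    """Empirical risk R, reward V and mean improvement E[I] over a query set.

    P(I <= 0) * E[I | I <= 0] reduces to (sum of non-positive I) / n,
    and likewise for the positive part, so no explicit conditioning is needed.
    """
    imps = [relative_improvement(a, b) for a, b in zip(s_orig, s_exp)]
    n = len(imps)
    risk = -sum(i for i in imps if i <= 0) / n
    reward = sum(i for i in imps if i > 0) / n
    mean = sum(imps) / n
    return risk, reward, mean
```

Because risk and reward are partial sums of the same per-query improvements, their difference is exactly the mean improvement.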


Wait a minute...

In some sense, current approaches actually do address the risk/reward tradeoff by interpolating the original query with the expanded query, i.e.,

Naïve Risk/Reward Tradeoff

$$q' = \lambda q + (1 - \lambda)\tilde q, \qquad \lambda \in [0, 1] \quad (1)$$
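Interpolation (1) on term weights, as a sketch: the original and expanded queries are assumed to be dictionaries mapping terms to weights (the slide does not fix a representation), and a term missing from one side gets weight 0.

```python
from typing import Dict


def interpolate(original: Dict[str, float], expanded: Dict[str, float], lam: float) -> Dict[str, float]:
    """q' = lam * q + (1 - lam) * q~, treating a missing term as weight 0."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    terms = set(original) | set(expanded)
    return {w: lam * original.get(w, 0.0) + (1.0 - lam) * expanded.get(w, 0.0) for w in terms}
```

λ = 1 returns the original query and λ = 0 the expansion; one λ is shared by every term, which is the coarseness a per-term risk model targets.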

Can we improve this tradeoff?

To account for the uncertainty of a given expanded query $\tilde q$, we employ the quadratic program

Robust Risk/Reward Tradeoff

$$\arg\min_{x \in X} \; J(x) = -x^T \mu + \frac{\kappa}{2} x^T \Sigma x \quad (2)$$

where

$x = [x_R; x_{\bar R}]$, $x_R = [P(R_1), \ldots, P(R_m)]^T$, $x_{\bar R} = 1 - x_R$,

X encodes our domain knowledge,

$\mu_i$ represents our expected belief in the relevance of term i,

and $\Sigma_{ij}$ the risk of terms i, j.

This objective is the robust counterpart of a linear program with an ellipsoidal uncertainty set and is theoretically motivated by Ben-Tal & Nemirovski, OR Letters ’99.
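A sketch of (2) under a strong simplification: X is taken to be the box $[0, 1]^n$ only, ignoring the coupling $x_{\bar R} = 1 - x_R$ and the other domain-knowledge constraints X encodes (those would call for a general QP solver). It uses plain projected gradient descent in NumPy; the solver choice and names are assumptions, not the method used in the talk.

```python
import numpy as np


def solve_box_qp(mu, Sigma, kappa, iters=10000, tol=1e-12):
    """Minimise J(x) = -x^T mu + (kappa / 2) x^T Sigma x over the box [0, 1]^n.

    Projected gradient descent with step 1 / L, where L is the largest
    eigenvalue of kappa * Sigma (the Lipschitz constant of grad J).
    Assumes Sigma is symmetric positive semidefinite and kappa > 0.
    """
    mu = np.asarray(mu, dtype=float)
    H = kappa * np.asarray(Sigma, dtype=float)
    L = max(float(np.linalg.eigvalsh(H).max()), 1e-12)
    x = np.full(mu.shape, 0.5)  # start in the middle of the box
    for _ in range(iters):
        grad = -mu + H @ x
        x_new = np.clip(x - grad / L, 0.0, 1.0)  # gradient step, then project onto the box
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

With Σ = I and κ = 1 the minimiser is simply µ clipped to [0, 1]. Positive off-diagonal risk in Σ shrinks the weights of correlated terms together, and a larger κ makes the solution more conservative.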

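We find $\mu_i$ as

$$\mu_i = E[R_i \mid q, \alpha, \beta] = P(R_i \mid \alpha)\, \delta(w_i \in q) + P(R_i \mid \beta)\, \delta(w_i \notin q) \quad (3)$$

$$P(R_i \mid \alpha) \triangleq \alpha + (1 - \alpha) P(R_i \mid w_i), \qquad P(R_i \mid \beta) \triangleq \beta\, P(R_i \mid w_i),$$

using

$$P(R_i \mid w_i) = \frac{P(w_i \mid R_i)}{P(w_i \mid R_i) + P(w_i \mid \bar R_i)},$$

which follows from Bayes’ rule assuming $P(R_i) = P(\bar R_i) = 1/2$.

Hence $\mu_i$ is cast as a function of $P(w_i \mid R_i)$ and $P(w_i \mid \bar R_i)$, which we obtain from a query expansion algorithm, as follows.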
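Equation (3) for a single term, as a sketch; it assumes $P(w_i \mid R_i)$ and $P(w_i \mid \bar R_i)$ are already available as numbers from an expansion algorithm, and the function names are hypothetical.

```python
def p_rel_given_word(p_w_rel: float, p_w_nonrel: float) -> float:
    """Bayes' rule with equal priors: P(R|w) = P(w|R) / (P(w|R) + P(w|R-bar))."""
    denom = p_w_rel + p_w_nonrel
    if denom <= 0:
        raise ValueError("P(w|R) + P(w|R-bar) must be positive")
    return p_w_rel / denom


def expected_relevance(in_query: bool, p_w_rel: float, p_w_nonrel: float,
                       alpha: float, beta: float) -> float:
    """mu_i from (3): query terms get a floor of alpha, other terms are scaled by beta."""
    p = p_rel_given_word(p_w_rel, p_w_nonrel)
    if in_query:
        return alpha + (1.0 - alpha) * p   # P(R_i | alpha)
    return beta * p                        # P(R_i | beta)
```

For example, $P(w_i \mid R_i) = 0.03$ and $P(w_i \mid \bar R_i) = 0.01$ give $P(R_i \mid w_i) = 0.75$, so $\mu_i = 0.875$ for a query term with α = 0.5 and 0.3 for a non-query term with β = 0.4.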

Ponte/Lavrenko Relevance Model

Standard query expansion of the Lemur toolkit

Works surprisingly well in practice (when it works, that is...)

1 $P(w) \approx |C|^{-1} \sum_{d \in C} tf(w, d)$

2 $P(w \mid d) \approx tf(w, d)$

3 Return words and relevance,

$$P(R_i \mid q) \propto \sum_{d \in F_k(q)} e^{-s'(d)} P(w_i \mid d),$$

as sorted by $P(w_i \mid F_k(q)) / P(w_i)$.
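The three steps as a NumPy sketch. Several details are assumptions rather than what Lemur does: |C| is read as the number of documents, the document scores s'(d) are simply passed in, and $P(w_i \mid F_k(q))$ is taken to be the normalised relevance weight itself when ranking.

```python
import numpy as np


def relevance_model(tf, top_k, doc_scores):
    """Sketch of steps 1-3.

    tf         : (n_words, n_docs) term-frequency matrix for the collection C
    top_k      : column indices of the feedback documents F_k(q)
    doc_scores : s'(d) for each document in top_k (taken as given here)
    Returns P(R_i | q) normalised to sum to 1, and word indices sorted by
    P(w_i | F_k(q)) / P(w_i), largest first.
    """
    tf = np.asarray(tf, dtype=float)
    p_w = tf.sum(axis=1) / tf.shape[1]              # step 1, with |C| = number of documents
    p_w_d = tf[:, top_k]                            # step 2: P(w|d) ~ tf(w, d)
    weights = np.exp(-np.asarray(doc_scores, dtype=float))
    p_rel = p_w_d @ weights                         # step 3: sum_d e^{-s'(d)} P(w|d)
    p_rel /= p_rel.sum()
    # assumption: P(w_i | F_k(q)) is taken to be the normalised p_rel itself
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(p_w > 0, p_rel / p_w, 0.0)
    order = np.argsort(-ratio, kind="stable")
    return p_rel, order
```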

Tao/Zhai Relevance Model

Use EM to estimate a mixture of word relevance and non-relevance multinomials, regularized by the original query.

Interesting twist #1: gradually relax the effect of the query as a prior.

Interesting twist #2: quit after expected relevance reaches a certain threshold.

Goal: eliminate interpolation, since $\theta_R$ should itself be the interpolated query expansion. Such interpolation, we can suppose, will be smoother than the naïve tradeoff.

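A bit more detail...

1 E-step:

$$P(Z_{w,d}) = \frac{\alpha_d P(w \mid \theta_R)}{\alpha_d P(w \mid \theta_R) + (1 - \alpha_d) P(w \mid \theta_N)}$$

2 M-step:

$$\alpha_d = \frac{\sum_{w \in V} P(Z_{w,d})\, tf(w, d)}{\sum_{w \in V} tf(w, d)}$$

$$P(w \mid \theta_R) = \frac{\mu P(w \mid \theta_q) + \sum_{d \in F_k(q)} tf(w, d)\, P(Z_{w,d})}{\mu + \sum_{w' \in V} \sum_{d \in F_k(q)} tf(w', d)\, P(Z_{w',d})}$$

$$\mu \leftarrow \delta \mu$$

3 quit when expected relevance is greater than µ

... you’re feeling sleeeeepy, so sleeeeeeepy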
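A compact NumPy sketch of this EM loop, not the talk's implementation. Assumptions beyond the slide: $\theta_R$ is initialised from the pooled feedback counts (starting from the query model alone would leave every non-query word at zero probability forever), the background model $\theta_N$ is strictly positive and fixed, and "expected relevance" is read as the total expected count of relevant word occurrences. Names and defaults are illustrative.

```python
import numpy as np


def tao_zhai_em(tf_fb, p_bg, p_query, mu=10.0, delta=0.9, alpha0=0.5, max_iter=500):
    """Query-regularised mixture EM with a decaying prior weight mu.

    tf_fb   : (n_words, k) term counts of the feedback documents F_k(q)
    p_bg    : P(w | theta_N), a fixed background model (assumed strictly positive)
    p_query : P(w | theta_q), the original query model used as the prior
    Returns P(w | theta_R), the per-document alpha_d, and the final mu.
    """
    tf_fb = np.asarray(tf_fb, dtype=float)
    p_bg = np.asarray(p_bg, dtype=float)
    p_query = np.asarray(p_query, dtype=float)
    doc_len = tf_fb.sum(axis=0)
    alpha = np.full(tf_fb.shape[1], alpha0)
    # assumption: start theta_R at the pooled feedback distribution
    p_rel = tf_fb.sum(axis=1) / tf_fb.sum()
    for _ in range(max_iter):
        # E-step: P(Z_{w,d}) for every word/document pair
        rel = alpha[None, :] * p_rel[:, None]
        z = rel / (rel + (1.0 - alpha[None, :]) * p_bg[:, None])
        # M-step
        counts = tf_fb * z                       # tf(w, d) P(Z_{w,d})
        alpha = counts.sum(axis=0) / doc_len
        evidence = counts.sum()                  # expected relevant word count
        p_rel = (mu * p_query + counts.sum(axis=1)) / (mu + evidence)
        mu *= delta                              # relax the query prior
        if evidence > mu:                        # stop once the evidence outweighs the prior
            break
    return p_rel, alpha, mu
```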


Recall our objective,

$$\arg\min_{x \in X} \; -x^T \mu + \frac{\kappa}{2} x^T \Sigma x$$

We construct Σ as a block-diagonal super-matrix, viz.,

$$\Sigma = \begin{bmatrix} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{bmatrix}$$

We now examine 2 × 2 approaches for estimating Σ and the motivation behind each.

On one hand, we can interpret $\Sigma_1, \Sigma_2$ as intrinsic term-term uncertainty, possibly suggesting $\Sigma_1 \triangleq \Sigma_2 \triangleq \Sigma_R$.

Alternatively, we could posit that the uncertainty set varies for relevant and non-relevant terms, i.e., $\Sigma_1 \triangleq \Sigma_R$, $\Sigma_2 \triangleq \Sigma_{\bar R}$.

In both cases our source of relevance information comes from the top-k (feedback) documents for a given query, denoted $F_k(q)$. The non-relevant uncertainty $\Sigma_{\bar R}$ could be estimated from the bottom-k documents or a secondary dataset.

Constructing Σ: jac

Smoothed Jaccard similarity heuristic (previous work).

Jaccard similarity coefficient

Measures similarity between sample sets (no longer treating documents as multisets) and is defined as $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$.

$$D_{ij} \overset{\exp}{\propto} J_{ij} \quad (5)$$

$$S_{ij} = \gamma \exp\left\{-\frac{1}{\sigma^2} D_{ij}\right\}$$

$$\Sigma_{ij} = \begin{cases} \|S(i, q)\|_p, & i = j \\ S(i, j), & i \neq j \end{cases} \quad (7)$$

Use “dilated” Jaccard coefficient to quantify word-word similarity

Set diagonal elements of Σ to “distance from query”
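A sketch of the jac construction. Equation (5) is not fully legible in this copy, so the distance $D_{ij} = 1 - J_{ij}$ used here is a hypothetical stand-in; γ, σ and p are free parameters, and documents are treated as sets of words. The diagonal follows (7) literally, i.e. the p-norm of word i's similarities to the query words.

```python
import numpy as np


def jaccard_sigma(occ, query_idx, gamma=1.0, sigma=1.0, p=2):
    """Sigma from (5)-(7), with the HYPOTHETICAL choice D_ij = 1 - J_ij.

    occ       : (n_words, n_docs) word-in-document indicators (documents as sets)
    query_idx : row indices of the query words, used for the diagonal ||S(i, q)||_p
    """
    occ = (np.asarray(occ) > 0).astype(float)
    inter = occ @ occ.T                                  # |A ∩ B|
    sizes = occ.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - inter      # |A ∪ B|
    J = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
    D = 1.0 - J                                          # assumed Jaccard distance
    S = gamma * np.exp(-D / sigma**2)                    # S_ij = gamma exp(-D_ij / sigma^2)
    Sigma = S.copy()                                     # off-diagonal: S(i, j)
    for i in range(S.shape[0]):
        Sigma[i, i] = np.linalg.norm(S[i, query_idx], ord=p)  # diagonal: ||S(i, q)||_p
    return Sigma
```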

Constructing Σ: hco

Heat kernel-based stochastic translation of word co-occurrence distributions (new work).

1 Estimate word co-occurrence distributions

2 Compute the normalized graph Laplacian of geodesic distances between these distributions

3 Compute expected word-word distance under this translation

$$\Sigma_{ij} = \begin{cases} \text{expected (under translation) word-query distance}, & i = j \\ \text{expected (under translation) word-word distance}, & i \neq j \end{cases} \quad (8)$$

Estimating $T_{ij} = P(w_i \to w_j)$ [hco, 1 of 6]

General approach: diffusion kernel $K_t(q_u, q_v)$ on a graph (V, E) whose nodes are distributions that correspond to words

V: each vertex is a contextual distribution $q_v(w) = P(w \mid v)$ corresponding to a word v

E: graph edge weights are the Fisher diffusion kernel on the multinomial simplex

T is obtained from the diffusion kernel on (V, E):

$$q_v(w) \propto \sum_{d} tf(w, d)\, tf(v, d)$$

$$e(u, v) = \exp\left(-\frac{1}{\sigma^2} \arccos^2 \sum_{w} \sqrt{q_u(w)\, q_v(w)}\right)$$

$$T \propto \exp(-tL),$$

where L is the normalized Laplacian.

t controls the amount of translation: $\lim_{t \to 0} T = I$ and $\lim_{t \to \infty} T$ = stationary.
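A NumPy sketch of T, with several assumptions marked in the comments: no self-loops in W, the symmetric normalised Laplacian $L = I - D^{-1/2} W D^{-1/2}$, $\exp(-tL)$ computed via an eigendecomposition, and rows normalised so that $T_{ij}$ reads as $P(w_i \to w_j)$. The slide only states $T \propto \exp(-tL)$, so the normalisation is a choice, not necessarily the talk's.

```python
import numpy as np


def translation_matrix(tf, sigma=1.0, t=1.0):
    """Heat-kernel translation T_ij = P(w_i -> w_j) from co-occurrence counts.

    tf : (n_words, n_docs) term-frequency matrix; every word must occur somewhere.
    """
    tf = np.asarray(tf, dtype=float)
    C = tf @ tf.T                                    # sum_d tf(w, d) tf(v, d)
    Q = C / C.sum(axis=1, keepdims=True)             # row v is q_v(w)
    R = np.sqrt(Q)
    bc = np.clip(R @ R.T, 0.0, 1.0)                  # sum_w sqrt(q_u(w) q_v(w))
    W = np.exp(-np.arccos(bc) ** 2 / sigma**2)       # edge weights e(u, v)
    np.fill_diagonal(W, 0.0)                         # assumption: no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # normalised Laplacian
    evals, evecs = np.linalg.eigh(L)                 # L is symmetric
    K = (evecs * np.exp(-t * evals)) @ evecs.T       # heat kernel exp(-tL)
    K = np.clip(K, 0.0, None)                        # drop tiny negative round-off
    return K / K.sum(axis=1, keepdims=True)          # assumption: row-normalise
```

As t → 0 the result approaches the identity (no translation), and for large t every row approaches the same stationary distribution, matching the stated limits.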

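Expected Distance [hco, 2 of 6]

Two words x, w stochastically translate into words y, z and are represented by unit vectors $\theta^{mle}_y = 1_y$ and $\theta^{mle}_z = 1_z$.

The distance $d(\theta^{mle}_y, \theta^{mle}_z)$ is a random variable, summarized by its expectation (given in closed form), i.e.,

$$E_{p(y|x)p(z|w)} \|\theta^{mle}_y - \theta^{mle}_z\|_2^2 = N_1^{-2} \sum_{i=1}^{N_1} \sum_{j \in \{1,\ldots,N_1\} \setminus \{i\}} (TT^\top)_{x_i, x_j} + N_2^{-2} \sum_{i=1}^{N_2} \sum_{j \in \{1,\ldots,N_2\} \setminus \{i\}} (TT^\top)_{w_i, w_j} - 2 N_1^{-1} N_2^{-1} \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} (TT^\top)_{x_i, w_j} + N_1^{-1} + N_2^{-1}$$

Note: this formula, written for word sequences $x = (x_1, \ldots, x_{N_1})$ and $w = (w_1, \ldots, w_{N_2})$, is more general than needed, since in our case $N_1 = N_2 = 1$, where it reduces to $2 - 2(TT^\top)_{x,w}$.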
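The closed form can be coded directly and checked against brute-force enumeration over all translations. A sketch, assuming θ for a word sequence is the average of the indicator vectors of its translated words and that each word translates independently via T; the function name is illustrative.

```python
import numpy as np


def expected_sq_distance(T, xs, ws):
    """Closed-form E ||theta_y - theta_z||_2^2 for word sequences xs (N1 words) and ws (N2 words).

    Each word is translated independently with P(a -> b) = T[a, b], and theta is the
    empirical (mle) word distribution of the translated sequence.
    """
    T = np.asarray(T, dtype=float)
    G = T @ T.T                                      # (T T^T)_{a,b}
    n1, n2 = len(xs), len(ws)
    gx = G[np.ix_(xs, xs)]
    gw = G[np.ix_(ws, ws)]
    gxw = G[np.ix_(xs, ws)]
    within_x = (gx.sum() - np.trace(gx)) / n1**2     # pairs i != j within xs
    within_w = (gw.sum() - np.trace(gw)) / n2**2     # pairs i != j within ws
    cross = 2.0 * gxw.sum() / (n1 * n2)
    return within_x + within_w - cross + 1.0 / n1 + 1.0 / n2
```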

Example, Simplex [hco, 3 of 6]

[Figure: contextual distributions $q_{Dagny}$ and $q_{Microsoft}$ on the simplex]

Example, expected distances near “german” [hco, 4 of 6]

[Figure: terms near ’german’ (axis scale ×10⁻⁴)]

Example, expected distances far from “german” [hco, 5 of 6]

[Figure: terms far from ’german’]

Large Deviation Interpretation [hco, 6 of 6]

By the Chernoff-Stein lemma, the KL-divergence is the best exponent in the probability of type II error (with bounded type I error), i.e.,

$$\beta^{opt}_n \approx \exp\big(-\gamma n D(q_u \| q_v)\big).$$

Examining the Taylor series expansion of the KL-divergence for nearby $q_u, q_v$, one also finds that for the Fisher geodesic distance $d(p, q)$,

$$d^2(q_u, q_v) \approx 2 D(q_u \| q_v).$$

Thus one may interpret the heat kernel translation model as being based on a graph whose edge weights approximate the optimal error rate of a test of $Q = q_u$ vs. $Q = q_v$.
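A quick numerical check of $d^2 \approx 2D$, assuming the standard closed form of the Fisher geodesic distance on the multinomial simplex, $d(p, q) = 2\arccos \sum_w \sqrt{p(w) q(w)}$; the example distributions are arbitrary.

```python
import numpy as np


def kl(p, q):
    """D(p || q) for strictly positive distributions p, q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))


def fisher_geodesic(p, q):
    """Fisher geodesic distance on the multinomial simplex: 2 arccos(sum_w sqrt(p(w) q(w)))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    bc = np.clip(np.sum(np.sqrt(p * q)), 0.0, 1.0)
    return float(2.0 * np.arccos(bc))


p = np.array([0.2, 0.3, 0.5])
q = p + 1e-3 * np.array([1.0, -2.0, 1.0])  # a nearby distribution (still sums to 1)
ratio = fisher_geodesic(p, q) ** 2 / (2.0 * kl(p, q))  # close to 1 for nearby p, q
```

Since the edge weight uses the same arccos term, $e(u, v) = \exp(-d^2/(4\sigma^2)) \approx \exp(-D(q_u \| q_v)/(2\sigma^2))$, which is the sense in which the edge weights mirror the Chernoff-Stein exponent.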


Game-plan:

Compiled Matlab

Microsoft Computing Resources

+ Hyperparameter Sweep

kajabillions of embarrassingly parallel experiments

Reality:

Devil’s in the details...

[Sad Seattle Josh]


Robust Tao/Zhai, hco of query: “1938 german mauser”

[Figure: term relevance for ’1938 german mauser’]

Robust Ponte/Lavrenko, hco of query: “1938 german mauser”

[Figure]

Contributions/Closing Remarks

Employed heat kernel-based stochastic translation as a risk model for query expansion

Presented initial results for a term- and document-aware risk/reward query expansion model

Conducted initial analysis of the hyperparameter space to isolate key parameter interactions

Built a large-scale Matlab experiment test-bed using MS Computing Resources

Continue to formulate a more “elegant” unification of the Tao/Zhai relevance model directly into the optimization objective

Thanks!


Related Work:

Kevyn Collins-Thompson, NIPS 2008

Aharon Ben-Tal & Arkadi Nemirovski, OR Letters 1999

Victor Lavrenko & James Allan, SIGIR 2005

Tao Tao & ChengXiang Zhai, SIGIR 2005

Joshua V Dillon et al., UAI 2007

