Modeling Diversity in Information Retrieval
ACM SIGIR 2009 Workshop on Redundancy, Diversity, and Interdependent Document Relevance, July 23, 2009, Boston, MA
ChengXiang (“Cheng”) Zhai
Department of Computer Science
Graduate School of Library & Information Science
Institute for Genomic Biology
Department of Statistics
University of Illinois, Urbana-Champaign
Joint work with John Lafferty, William Cohen, and Xuehua Shen
Different Reasons for Diversification
• Redundancy reduction
• Diverse information needs
  – Mixture of users
  – Single user with an under-specified query
  – Aspect retrieval
  – Overview of results
• Active relevance feedback
• …
Outline
• Risk minimization framework
• Capturing different needs for diversification
• Language models for diversification
IR as Sequential Decision Making
User (Information Need) ↔ System (Model of Information Need)
• A1: Enter a query
  – System decides: Which documents to present? How to present them?
• Ri: results (i = 1, 2, 3, …)
  – User decides: Which documents to view?
• A2: View document
  – System decides: Which part of the document to show? How?
• R’: Document content
  – User decides: View more?
• A3: Click on “Back” button
Retrieval Decisions
User U:  A1  A2  …  At-1  At
System:  R1  R2  …  Rt-1  Rt = ?
History H = {(Ai, Ri)}, i = 1, …, t-1
Document collection: C

Given U, C, At, and H, choose the best Rt from all possible responses to At: Rt ∈ r(At)

• Query = “Jaguar”: r(At) = all possible rankings of C; Rt = the best ranking for the query
• Click on “Next” button: r(At) = all possible size-k subsets of unseen docs; Rt = the best k unseen docs
A Risk Minimization Framework
• Observed: user U, interaction history H, current user action At, document collection C
• All possible responses: r(At) = {r1, …, rn}
• Inferred: user model M = (S, θ_U, …), where S = seen docs and θ_U = information need
• Loss function: L(ri, At, M)
• Optimal response: r* (minimum loss)

$$R_t = \arg\min_{r \in r(A_t)} \int_M L(r, A_t, M)\, P(M \mid U, H, A_t, C)\, dM$$

The integral is the Bayes risk: U, H, At, and C are observed, and M is inferred.
A Simplified Two-Step Decision-Making Procedure
• Approximate the Bayes risk by the loss at the mode of the posterior distribution
• Two-step procedure
  – Step 1: Compute an updated user model M* based on the currently available information
  – Step 2: Given M*, choose a response to minimize the loss function

$$
\begin{aligned}
R_t &= \arg\min_{r \in r(A_t)} \int_M L(r, A_t, M)\, P(M \mid U, H, A_t, C)\, dM \\
&\approx \arg\min_{r \in r(A_t)} L(r, A_t, M^*)\, P(M^* \mid U, H, A_t, C) \\
&= \arg\min_{r \in r(A_t)} L(r, A_t, M^*) \\
\text{where } M^* &= \arg\max_M P(M \mid U, H, A_t, C)
\end{aligned}
$$
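The two-step procedure can be illustrated with a minimal Python sketch over a discrete set of candidate user models. Everything here is an assumption for illustration: the function names, the toy posterior, and the 0/1 loss are hypothetical and are not taken from the talk.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple


def two_step_response(
    posterior: Dict[Hashable, float],
    responses: Sequence[Hashable],
    loss: Callable[[Hashable, Hashable], float],
) -> Tuple[Hashable, Hashable]:
    """Step 1: take the posterior mode M*. Step 2: minimize the loss under M*."""
    m_star = max(posterior, key=posterior.get)
    r_star = min(responses, key=lambda r: loss(r, m_star))
    return m_star, r_star


def bayes_response(posterior, responses, loss):
    """Full Bayes-risk minimizer over the same discrete models, for comparison."""
    return min(
        responses,
        key=lambda r: sum(p * loss(r, m) for m, p in posterior.items()),
    )


# Toy example (hypothetical numbers): two readings of the query "Jaguar".
posterior = {"car": 0.7, "animal": 0.3}
responses = ["car_pages", "animal_pages"]


def toy_loss(response, model):
    # Zero loss when the response matches the assumed information need.
    return 0.0 if response.startswith(model) else 1.0


m_star, r_star = two_step_response(posterior, responses, toy_loss)
r_bayes = bayes_response(posterior, responses, toy_loss)
```

In this toy case the posterior mode and the full Bayes risk agree on the response; with a flatter posterior or an asymmetric loss the two can differ, which is the price of the approximation.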
Optimal Interactive Retrieval
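[Figure: interaction loop between the user and the IR system over collection C. For each user action At (A1, A2, A3, …), the system infers M*t from P(Mt | U, H, At, C) and returns the response Rt that minimizes L(r, At, M*t).]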
Refinement of Risk Minimization
• At ∈ {“enter a query”, “click on Back button”, “click on Next button”, …}
• r(At): decision space (At dependent)
  – r(At) = all possible subsets of C + presentation strategies
  – r(At) = all possible rankings of docs in C
  – r(At) = all possible rankings of unseen docs
  – …
• M: user model
  – Essential component: θ_U = user information need
  – S = seen documents
  – n = “Topic is new to the user”
• L(Rt, At, M): loss function
  – Generally measures the utility of Rt for a user modeled as M
  – Often encodes retrieval criteria (e.g., using M to select a ranking of docs)
• P(M | U, H, At, C): user model inference
  – Often involves estimating a unigram language model θ_U
Generative Model of Document & Query [Lafferty & Zhai 01]
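• User U (partially observed) → query model θ_Q ~ p(θ_Q | U) (inferred) → query q ~ p(q | θ_Q, U) (observed)
• Source S (partially observed) → document model θ_D ~ p(θ_D | S) (inferred) → document d ~ p(d | θ_D, S) (observed)
• Relevance R ~ p(R | θ_Q, θ_D)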
Risk Minimization with Language Models [Lafferty & Zhai 01, Zhai & Lafferty 06]
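• Observed: query q from user U; doc set C from source S
• Inferred models: θ_Q, θ_1, …, θ_N
• Choices: (D1, π1), (D2, π2), …, (Dn, πn), each incurring a loss L

$$(D^*, \pi^*) = \arg\min_{D, \pi} \int_\Theta L(D, \pi, \theta)\, p(\theta \mid q, U, C, S)\, d\theta$$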
Optimal Ranking for Independent Loss
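Decision space = {rankings}; sequential browsing; independent loss. Here s_i is the probability that the user stops after viewing the top i documents.

$$
\begin{aligned}
\pi^* &= \arg\min_\pi \int_\Theta L(\pi, \theta)\, p(\theta \mid q, U, C, S)\, d\theta \\
L(\pi, \theta) &= \sum_{i=1}^{N} s_i \sum_{j=1}^{i} l(\theta_{\pi_j} \mid \theta_{\pi_1}, \ldots, \theta_{\pi_{j-1}}) \\
&= \sum_{i=1}^{N} s_i \sum_{j=1}^{i} l(\theta_{\pi_j}) \\
&= \sum_{j=1}^{N} \Big( \sum_{i=j}^{N} s_i \Big)\, l(\theta_{\pi_j}) \\
\pi^* &= \arg\min_\pi \sum_{j=1}^{N} \Big( \sum_{i=j}^{N} s_i \Big) \int_\Theta l(\theta_{\pi_j})\, p(\theta \mid q, U, C, S)\, d\theta \\
&= \arg\min_\pi \sum_{j=1}^{N} \Big( \sum_{i=j}^{N} s_i \Big) \int l(\theta_{\pi_j})\, p(\theta_{\pi_j} \mid q, U, C, S)\, d\theta_{\pi_j} \\
r(d_k \mid q, U, C, S) &= \int l(\theta_k)\, p(\theta_k \mid q, U, C, S)\, d\theta_k \\
\pi^* &= \text{ranking based on } r(d_k \mid q, U, C, S)
\end{aligned}
$$

Independent risk = independent scoring: the “risk ranking principle” [Zhai 02, Zhai & Lafferty 06]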
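A small Python sketch can make the independent-loss result concrete: when each document carries its own risk, sorting by ascending risk minimizes the expected loss of the whole ranking. The stopping probabilities and per-document risks below are made-up numbers, and the function names are hypothetical.

```python
from itertools import permutations


def ranking_risk(order, stop_probs, doc_risk):
    """Expected loss of a ranking: sum_j (sum_{i >= j} s_i) * r(d_j)."""
    total = 0.0
    for j, doc in enumerate(order):
        # Probability that the user reads at least down to position j.
        weight = sum(stop_probs[j:])
        total += weight * doc_risk[doc]
    return total


def risk_ranking(doc_risk):
    """Rank documents independently, in ascending order of risk."""
    return sorted(doc_risk, key=doc_risk.get)


# Hypothetical per-document risks and stopping probabilities s_1..s_3.
doc_risk = {"d1": 0.9, "d2": 0.2, "d3": 0.5}
stop_probs = [0.5, 0.3, 0.2]

best = risk_ranking(doc_risk)
# Brute-force check over all rankings of this tiny collection.
brute = min(
    permutations(doc_risk),
    key=lambda order: ranking_risk(order, stop_probs, doc_risk),
)
```

The brute-force search is only there to confirm the sort on three documents; it does not scale and is not part of the method.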
Risk Minimization for Diversification
• Redundancy reduction: loss function includes a redundancy/novelty measure
  – Special case: list presentation + MMR [Zhai et al. 03]
• Diverse information needs: loss function defined on latent topics
  – Special case: PLSA/LDA + aspect retrieval [Zhai 02]
• Active relevance feedback: loss function considers both relevance and benefit for feedback
  – Special case: feedback only (hard queries) [Shen & Zhai 05]
Subtopic Retrieval
Query: What are the applications of robotics in the world today?
Find as many DIFFERENT applications as possible.
Example subtopics:
  A1: spot-welding robotics
  A2: controlling inventory
  A3: pipe-laying robots
  A4: talking robot
  A5: robots for loading & unloading memory tapes
  A6: robot [telephone] operators
  A7: robot cranes
  … …

Subtopic judgments:
        A1   A2   A3   …    …    …    Ak
  d1    1    1    0    0    …    0    0
  d2    0    1    1    1    …    0    0
  d3    0    0    0    0    …    1    0
  …
  dk    1    0    1    0    …    0    1
Need to model interdependent document relevance
Diversify = Remove Redundancy [Zhai et al. 03]
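$$
\begin{aligned}
\pi^* &= \arg\min_\pi \int_\Theta L(\pi, \theta)\, p(\theta \mid q, U, C, S)\, d\theta
= \arg\min_\pi \sum_{j=1}^{N} \Big( \sum_{i=j}^{N} s_i \Big)\, r(d_{\pi_j} \mid d_{\pi_1}, \ldots, d_{\pi_{j-1}}) \\
r(d_k \mid d_1, \ldots, d_{k-1}) &= \int_\Theta l(d_k \mid d_1, \ldots, d_{k-1}, \theta)\, p(\theta \mid q, U, C, S)\, d\theta \\
l(d_k \mid d_1, \ldots, d_{k-1}, \theta_Q, \{\theta_i\}_{i=1}^{k-1}) &= c_2\, p(Rel \mid d_k)\,(1 - p(New \mid d_k)) + c_3\,(1 - p(Rel \mid d_k)) \\
&\stackrel{rank}{=} p(Rel \mid d_k)\,(1 - \rho - p(New \mid d_k)) \\
&\stackrel{rank}{=} p(q \mid d_k)\,(1 - \rho - p(New \mid d_k)), \quad \text{where } \rho = \frac{c_3}{c_2} > 1
\end{aligned}
$$

ρ: “Willingness to tolerate redundancy”

Cost       NEW   NOT-NEW
REL        0     C2
NON-REL    C3    C3

C2 < C3, since a redundant relevant doc is better than a non-relevant doc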
Greedy Algorithm for Ranking: Maximal Marginal Relevance (MMR)
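The greedy ranking can be sketched in Python as follows. This is only an illustration under stated assumptions: the loss p(q|d)(1 − ρ − p(New|d)) follows the cost-based combination on this slide, but `jaccard_novelty` is a hypothetical stand-in for p(New|d) (the talk estimates novelty with a mixture model or KL-divergence), and the relevance scores and term sets are made up.

```python
def jaccard_novelty(doc_terms, selected_terms):
    """Stand-in for p(New|d): 1 minus the max Jaccard overlap with ranked docs.

    Assumes every document has at least one term.
    """
    if not selected_terms:
        return 1.0
    return 1.0 - max(
        len(doc_terms & s) / len(doc_terms | s) for s in selected_terms
    )


def greedy_mmr(rel, terms, rho=1.5):
    """At each rank, pick the document with the smallest conditional loss."""
    remaining = set(rel)
    ranking = []
    while remaining:
        chosen = [terms[d] for d in ranking]

        def loss(d):
            # Cost-based combination: p(q|d) * (1 - rho - p(New|d)).
            return rel[d] * (1.0 - rho - jaccard_novelty(terms[d], chosen))

        # sorted() makes tie-breaking deterministic.
        best = min(sorted(remaining), key=loss)
        ranking.append(best)
        remaining.remove(best)
    return ranking


# Hypothetical relevance scores p(q|d) and term sets.
rel = {"a": 0.9, "b": 0.85, "c": 0.6}
terms = {
    "a": {"robot", "welding"},
    "b": {"robot", "welding"},
    "c": {"robot", "crane"},
}
diverse = greedy_mmr(rel, terms, rho=1.2)
tolerant = greedy_mmr(rel, terms, rho=10.0)
```

With ρ close to 1 the redundant document "b" drops below the novel "c"; with a large ρ redundancy is tolerated and the ranking falls back to relevance order.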
A Mixture Model for Redundancy
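• The document d is modeled as a two-component mixture: λ P(w | Background) + (1 − λ) P(w | Old)
  – P(w | Background): estimated from the collection
  – P(w | Old): estimated from the reference document
• λ = ? p(New | d) = λ = probability of “new” (estimated using EM)
• p(New | d) can also be estimated using KL-divergence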
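A minimal EM sketch for the mixing weight, in Python. It assumes that λ is the weight on the background model and 1 − λ the weight on the reference ("old") document model, and that both component models are fixed; the function name, the toy vocabulary, and the probabilities are hypothetical.

```python
def estimate_lambda(doc_counts, p_old, p_bg, iters=50, lam=0.5):
    """Estimate lambda in p(w) = lam * p_bg(w) + (1 - lam) * p_old(w) by EM."""
    total = sum(doc_counts.values())
    for _ in range(iters):
        expected_bg = 0.0
        for word, count in doc_counts.items():
            bg = lam * p_bg.get(word, 0.0)
            old = (1.0 - lam) * p_old.get(word, 0.0)
            if bg + old > 0.0:
                # E-step: posterior that this word came from the background.
                expected_bg += count * bg / (bg + old)
        # M-step: lambda is the expected fraction of background-generated words.
        lam = expected_bg / total
    return lam


# Toy models (made-up numbers): the old document is about welding robots.
p_bg = {"robot": 0.25, "welding": 0.25, "crane": 0.25, "tape": 0.25}
p_old = {"robot": 0.5, "welding": 0.5}

redundant_doc = {"robot": 5, "welding": 5}
novel_doc = {"crane": 5, "tape": 5}

lam_redundant = estimate_lambda(redundant_doc, p_old, p_bg)
lam_novel = estimate_lambda(novel_doc, p_old, p_bg)
```

A document that the old model explains well drives λ toward 0, while a document made of words the old model never generates drives it to 1.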
Evaluation metrics
• Intuitive goals:
  – Should see documents from many different subtopics appear early in a ranking (subtopic coverage/recall)
  – Should not see many different documents that cover the same subtopics (redundancy)
• How do we quantify these?
  – One problem: the “intrinsic difficulty” of queries can vary.
Evaluation metrics: a proposal
• Definition: Subtopic recall at rank K is the fraction of subtopics a such that one of d1, …, dK is relevant to a.
• Definition: minRank(S, r) is the smallest rank K such that the ranking produced by IR system S has subtopic recall r at rank K.
• Definition: Subtopic precision at recall level r for IR system S is:

$$\text{S-precision}(S, r) = \frac{\mathrm{minRank}(S_{opt}, r)}{\mathrm{minRank}(S, r)}$$

This generalizes ordinary recall-precision metrics.
It does not explicitly penalize redundancy.
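These definitions can be sketched in Python as below. Two assumptions are worth flagging: the optimal system is approximated by a greedy set-cover ranking (an approximation, not the exact optimum), and the judgments and function names are made up for illustration.

```python
def subtopic_recall(ranking, doc_subtopics, all_subtopics, k):
    """Fraction of subtopics covered by the top-k documents."""
    covered = set()
    for doc in ranking[:k]:
        covered |= doc_subtopics.get(doc, set())
    return len(covered & all_subtopics) / len(all_subtopics)


def min_rank(ranking, doc_subtopics, all_subtopics, r):
    """Smallest rank K at which subtopic recall reaches r (None if never)."""
    for k in range(1, len(ranking) + 1):
        if subtopic_recall(ranking, doc_subtopics, all_subtopics, k) >= r:
            return k
    return None


def greedy_ranking(doc_subtopics):
    """Greedy approximation of the optimal ranking: most new subtopics first."""
    remaining = dict(doc_subtopics)
    covered, order = set(), []
    while remaining:
        best = max(sorted(remaining), key=lambda d: len(remaining[d] - covered))
        order.append(best)
        covered |= remaining.pop(best)
    return order


def s_precision(ranking, doc_subtopics, all_subtopics, r):
    """minRank of the (approximate) optimal system over minRank of the system."""
    opt = min_rank(greedy_ranking(doc_subtopics), doc_subtopics, all_subtopics, r)
    got = min_rank(ranking, doc_subtopics, all_subtopics, r)
    if opt is None or got is None:
        return None
    return opt / got


# Hypothetical judgments for three documents and three subtopics.
doc_subtopics = {"d1": {"A1", "A2"}, "d2": {"A2"}, "d3": {"A3"}}
all_subtopics = {"A1", "A2", "A3"}
system_ranking = ["d2", "d1", "d3"]
sp = s_precision(system_ranking, doc_subtopics, all_subtopics, 1.0)
```

Here the system needs three documents to cover all subtopics while the greedy ranking needs two, so S-precision at recall 1.0 is 2/3.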
Evaluation metrics: rationale
[Figure: subtopic recall (0.0 to 1.0) plotted against rank K for the curves minRank(Sopt, r) and minRank(S, r); precision at recall r is the ratio minRank(Sopt, r) / minRank(S, r).]
For subtopics, the minRank(Sopt, r) curve’s shape is not predictable and linear.
Evaluating redundancy
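Definition: the cost of a ranking d1, …, dK is

$$\mathrm{cost}(d_1, \ldots, d_K) = \sum_{i=1}^{K} \big( a \cdot |\mathrm{subtopics}(d_i)| + b \big)$$

where b is the cost of seeing a document and a is the cost of seeing a subtopic inside a document (the earlier minRank measure corresponds to a = 0).
Definition: minCost(S, r) is the minimal cost at which recall r is obtained.
Definition: weighted subtopic precision at r is

$$\text{WS-precision}(S, r) = \frac{\mathrm{minCost}(S_{opt}, r)}{\mathrm{minCost}(S, r)}$$

We will use a = b = 1.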
Evaluation Metrics Summary
• Measure performance (size of ranking minRank, cost of ranking minCost) relative to optimal.
• Generalizes ordinary precision/recall.
• Possible problems:
  – Computing minRank, minCost is NP-hard!
  – A greedy approximation seems to work well for our data set
Experiment Design
• Dataset: TREC “interactive track” data.
  – London Financial Times: 210k docs, 500 MB
  – 20 queries from TREC 6-8
    • Subtopics: average 20, min 7, max 56
    • Judged docs: average 40, min 5, max 100
    • Non-judged docs assumed not relevant to any subtopic.
• Baseline: relevance-based ranking (using language models)
• Two experiments
  – Ranking only relevant documents
  – Ranking all documents
S-Precision: re-ranking relevant docs
WS-precision: re-ranking relevant docs
Results for ranking all documents
“Upper bound”: use subtopic names to build an explicit subtopic model.
Summary: Remove Redundancy
• Mixture model is effective for identifying novelty in relevant documents
• Trading off novelty and relevance is hard
• Relevance seems to be the dominating factor in TREC interactive-track data
Diversity = Satisfy Diverse Info. Need [Zhai 02]
• Need to directly model latent aspects and then optimize results based on aspect/topic matching
• Reducing redundancy doesn’t ensure complete coverage of diverse aspects
Aspect Generative Model of Document & Query
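• User U → query model θ_Q ~ p(θ_Q | τ, U) → query q ~ p(q | τ, θ_Q)
• Source S → document model θ_D ~ p(θ_D | τ, S) → document d ~ p(d | τ, θ_D)
• Aspects: τ = (τ_1, …, τ_k)

PLSI:
$$p(d \mid \tau, \theta_D) = \prod_{i=1}^{n} \sum_{a=1}^{A} p(d_i \mid \tau_a)\, p(a \mid \theta_D), \quad \text{where } d = d_1 \ldots d_n$$

LDA:
$$p(d \mid \tau, \theta_D) = \int \prod_{i=1}^{n} \sum_{a=1}^{A} p(d_i \mid \tau_a)\, p(a \mid \pi)\, \mathrm{Dir}(\pi \mid \theta_D)\, d\pi$$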
Aspect Loss Function
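$$l(d_k \mid d_1, \ldots, d_{k-1}, \theta_Q, \{\theta_i\}_{i=1}^{k-1}) = D(\hat{\theta}_Q \,\|\, \hat{\theta}_D^{1,\ldots,k})$$

where

$$p(a \mid \hat{\theta}_D^{1,\ldots,k}) = \lambda\, \frac{1}{k-1} \sum_{i=1}^{k-1} p(a \mid \hat{\theta}_i) + (1 - \lambda)\, p(a \mid \hat{\theta}_k)$$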
Aspect Loss Function: Illustration
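[Figure: aspect distributions compared against the desired coverage p(a | θ_Q): the “already covered” aspects p(a | θ_1), …, p(a | θ_{k-1}), a new candidate p(a | θ_k), and the resulting combined coverage. Three example candidates are labeled non-relevant, redundant, and perfect.]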
Evaluation Measures
• Aspect Coverage (AC): measures per-doc coverage
  – #distinct-aspects / #docs
• Aspect Uniqueness (AU): measures redundancy
  – #distinct-aspects / #aspects
• Example, with aspect vectors d1 = 0001001, d2 = 0101100, d3 = 1000101:

  #doc        1          2          3           …
  #asp        2          5          8           …
  #uniq-asp   2          4          5
  AC          2/1=2.0    4/2=2.0    5/3=1.67
  AU          2/2=1.0    4/5=0.8    5/8=0.625
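The worked example can be reproduced with a short Python sketch. The function name is hypothetical; the three aspect vectors are the ones from the example, taken in the order d1, d2, d3.

```python
def aspect_metrics(doc_vectors):
    """Return cumulative (AC, AU) after each rank for binary aspect strings."""
    seen = set()          # distinct aspects covered so far
    total_aspects = 0     # aspect occurrences so far, counting repeats
    rows = []
    for n_docs, vec in enumerate(doc_vectors, start=1):
        aspects = {i for i, bit in enumerate(vec) if bit == "1"}
        total_aspects += len(aspects)
        seen |= aspects
        ac = len(seen) / n_docs
        au = len(seen) / total_aspects if total_aspects else 0.0
        rows.append((ac, au))
    return rows


rows = aspect_metrics(["0001001", "0101100", "1000101"])
for rank, (ac, au) in enumerate(rows, start=1):
    print(f"rank {rank}: AC={ac:.2f} AU={au:.3f}")
```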
Effectiveness of Aspect Loss Function (PLSI)
                                 Aspect Coverage             Aspect Uniqueness
Data set        Novelty coeff.   Prec(λ)       AC(λ)         Prec(λ)       AU(λ)
Mixed Data      λ=0              0.265(0)      0.845(0)      0.265(0)      0.355(0)
                λ>0              0.249(0.8)    1.286(0.8)    0.263(0.6)    0.344(0.6)
                Improve          -6.0%         +52.2%        -0.8%         -3.1%
Relevant Data   λ=0              1(0)          1.772(0)      1(0)          0.611(0)
                λ>0              1(0.1)        2.153(0.1)    1(0.9)        0.685(0.9)
                Improve          0%            +21.5%        0%            +12.1%

$$p(a \mid \hat{\theta}_D^{1,\ldots,k}) = \lambda\, \frac{1}{k-1} \sum_{i=1}^{k-1} p(a \mid \hat{\theta}_i) + (1 - \lambda)\, p(a \mid \hat{\theta}_k)$$
Effectiveness of Aspect Loss Function (LDA)
                                 Aspect Coverage             Aspect Uniqueness
Data set        Novelty coeff.   Prec(λ)       AC(λ)         Prec(λ)       AU(λ)
Mixed Data      λ=0              0.277(0)      0.863(0)      0.277(0)      0.318(0)
                λ>0              0.273(0.5)    0.897(0.5)    0.259(0.9)    0.348(0.9)
                Improve          -1.4%         +3.9%         -6.5%         +9.4%
Relevant Data   λ=0              1(0)          1.804(0)      1(0)          0.631(0)
                λ>0              1(0.99)       1.866(0.99)   1(0.99)       0.705(0.99)
                Improve          0%            +3.4%         0%            +11.7%
Comparison of 4 MMR Methods
              Mixed Data                    Relevant Data
MMR Method    AC Improve    AU Improve      AC Improve    AU Improve
CC            0%(+)         0%(+)           +2.6%(1.5)    +13.8%(1.5)
QB            0%(0)         0%(0)           +1.8%(0.6)    +5.6%(0.99)
MQM           +0.2%(0.4)    +1.0%(0.95)     +0.2%(0.1)    +1.2%(0.9)
MDM           +1.5%(0.5)    +2.2%(0.5)      0%(0.1)       +1.1%(0.5)

Values in parentheses are the corresponding parameter settings of each method.

CC - Cost-based Combination
QB - Query Background Model
MQM - Query Marginal Model
MDM - Document Marginal Model
Summary: Diverse Information Need
• Mixture model is effective for capturing latent topics
• Direct modeling of latent aspects/topics is more effective than indirect modeling through MMR in improving aspect coverage, but MMR is better for improving aspect uniqueness
• With direct topic modeling and matching, aspect coverage can be improved at the price of lower relevance-based precision
Diversify = Active Feedback [Shen & Zhai 05]
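Decision problem: decide the subset of documents D = {d1, …, dk} for relevance judgment, where j = (j1, …, jk) are the judgments.

$$D^* = \arg\min_D \int_\Theta L(D, \theta)\, p(\theta \mid U, q, C)\, d\theta$$

$$
\begin{aligned}
L(D, \theta) &= \sum_{j} l(D, j, \theta)\, p(j \mid D, U, \theta) \\
&= \sum_{j} l(D, j, \theta) \prod_{i=1}^{k} p(j_i \mid d_i, U, \theta)
\end{aligned}
$$

Independent Loss

$$l(D, j, \theta) = \sum_{i=1}^{k} l(d_i, j_i, \theta)$$

$$
\begin{aligned}
L(D, \theta) &= \sum_{j} \sum_{i=1}^{k} l(d_i, j_i, \theta) \prod_{i'=1}^{k} p(j_{i'} \mid d_{i'}, U, \theta) \\
&= \sum_{i=1}^{k} \sum_{j_i} l(d_i, j_i, \theta)\, p(j_i \mid d_i, U, \theta)
\end{aligned}
$$

$$D^* = \arg\min_D \sum_{i=1}^{k} \int_\Theta \sum_{j_i} l(d_i, j_i, \theta)\, p(j_i \mid d_i, U, \theta)\, p(\theta \mid U, q, C)\, d\theta$$

$$r(d_i) = \int_\Theta \sum_{j_i} l(d_i, j_i, \theta)\, p(j_i \mid d_i, U, \theta)\, p(\theta \mid U, q, C)\, d\theta$$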
Independent Loss (cont.)
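$$r(d_i) = \int_\Theta \sum_{j_i} l(d_i, j_i, \theta)\, p(j_i \mid d_i, U, \theta)\, p(\theta \mid U, q, C)\, d\theta$$

Uncertainty Sampling

$$\forall d_i \in C: \quad l(d_i, 1, \theta) = \log p(R = 1 \mid d_i, \theta), \quad l(d_i, 0, \theta) = \log p(R = 0 \mid d_i, \theta)$$

$$r(d_i) = -\int_\Theta H(R \mid d_i, \theta)\, p(\theta \mid U, q, C)\, d\theta$$

Top K

$$\forall d_i \in C: \quad l(d_i, 1, \theta) = C_1, \quad l(d_i, 0, \theta) = C_0, \quad C_1 < C_0$$

$$r(d_i) = C_0 + (C_1 - C_0) \int_\Theta p(j_i = 1 \mid d_i, U, \theta)\, p(\theta \mid U, q, C)\, d\theta$$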
Dependent Loss
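The loss L(D, U, θ) combines a relevance term, $\sum_{i=1}^{k} p(j_i = 1 \mid d_i, U, \theta)$, with a diversity term defined on (D, θ).

Heuristics: consider relevance first, then diversity
• Select the top N documents, then choose K of them:
  – Gapped Top K: N = (G + 1)K
  – K Cluster Centroid: cluster the N docs into K clusters
  – MMR
  – …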
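The two simplest selection strategies can be sketched in Python. This assumes that gapped selection starts at rank 1 and then skips G documents between picks, which is one plausible reading of N = (G + 1)K; the function names are hypothetical, and the K-cluster-centroid and MMR variants are left out because they need a document similarity measure.

```python
def top_k(ranked_docs, k):
    """Normal feedback: present the K highest-ranked documents."""
    return ranked_docs[:k]


def gapped_top_k(ranked_docs, k, gap):
    """Pick K documents from the top N = (gap + 1) * K, skipping `gap` docs
    after each pick (assumed to start at rank 1)."""
    return ranked_docs[: (gap + 1) * k : gap + 1]


# Hypothetical initial ranking of 16 documents.
ranked = [f"d{i}" for i in range(1, 17)]
normal = top_k(ranked, 4)
gapped = gapped_top_k(ranked, k=4, gap=2)
```

With K = 4 and G = 2 the gapped strategy draws from the top 12 documents instead of the top 4, trading some expected relevance for a more varied set of examples.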
Illustration of Three AF Methods
[Figure: a ranked list of documents (ranks 1 to 16, …) showing which documents each method selects: Top-K (normal feedback), Gapped Top-K, and K-cluster centroid.]
Aiming at high diversity …
Evaluating Active Feedback
[Figure: evaluation pipeline. Query → initial results (no feedback) → select K docs (Top-k, gapped, clustering) → K docs looked up in the judgment file → judged docs (+/−) → feedback → feedback results.]
Retrieval Methods (Lemur toolkit)
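• Kullback-Leibler divergence scoring: estimate a query model θ_Q from query Q and a document model θ_D from document D, then rank by D(θ_Q || θ_D) to produce the results
• Active feedback: select feedback docs F = {d1, …, dn} and estimate a feedback model θ_F
• Mixture model feedback: θ_Q' = (1 − α) θ_Q + α θ_F
  – Only learn from relevant docs
• Default parameter settings unless otherwise stated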
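A minimal Python sketch of KL-divergence scoring with an interpolated feedback model. It does not use the Lemur toolkit: the function names are hypothetical, the language models are made-up toy distributions, and a small floor `eps` stands in for proper document-model smoothing.

```python
import math


def interpolate(theta_q, theta_f, alpha):
    """Updated query model: (1 - alpha) * theta_Q + alpha * theta_F."""
    vocab = set(theta_q) | set(theta_f)
    return {
        w: (1 - alpha) * theta_q.get(w, 0.0) + alpha * theta_f.get(w, 0.0)
        for w in vocab
    }


def kl_divergence(theta_q, theta_d, eps=1e-12):
    """D(theta_Q || theta_D); eps is a crude stand-in for smoothing."""
    return sum(
        p * math.log(p / max(theta_d.get(w, 0.0), eps))
        for w, p in theta_q.items()
        if p > 0
    )


def rank_by_kl(theta_q, doc_models):
    """Rank documents by ascending divergence from the query model."""
    return sorted(doc_models, key=lambda d: kl_divergence(theta_q, doc_models[d]))


# Toy models (made-up numbers).
theta_q = {"jaguar": 1.0}
theta_f = {"jaguar": 0.5, "car": 0.5}   # estimated from relevant feedback docs
doc_models = {
    "car_doc": {"jaguar": 0.4, "car": 0.4, "engine": 0.2},
    "cat_doc": {"jaguar": 0.5, "cat": 0.5},
}

before = rank_by_kl(theta_q, doc_models)
updated_q = interpolate(theta_q, theta_f, alpha=0.5)
after = rank_by_kl(updated_q, doc_models)
```

Before feedback the toy query slightly prefers the document with the higher "jaguar" probability; after mixing in the feedback model, the car-related document moves to the top.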
Comparison of Three AF Methods
Results include judged docs.

Collection   Active FB Method   #Rel   MAP      Pr@10doc
HARD         Top-K              146    0.325    0.527
             Gapped             150    0.330    0.548
             Clustering         105    0.332    0.565
AP88-89      Top-K              198    0.228    0.351
             Gapped             180    0.234*   0.389*
             Clustering         118    0.237    0.393

bold font = worst, * = best
• Top-K is the worst!
• Clustering uses fewest relevant docs
Appropriate Evaluation of Active Feedback
• Original DB with judged docs (AP88-89, HARD)
  – Can’t tell if the ranking of un-judged documents is improved
• Original DB without judged docs
  – Different methods have different test documents
• New DB (AP88-89, AP90)
  – See the learning effect more explicitly
  – But the docs must be similar to original docs
Comparison of Different Test Data
Test Data                         Active FB Method   #Rel   MAP     Pr@10doc
AP88-89 (including judged docs)   Top-K              198    0.228   0.351
                                  Gapped             180    0.234   0.389
                                  Clustering         118    0.237   0.393
AP90                              Top-K              198    0.220   0.321
                                  Gapped             180    0.222   0.326
                                  Clustering         118    0.223   0.325

• Clustering generates fewer, but higher quality examples
• Top-K is consistently the worst!
Summary: Active Feedback
• Presenting the top-k is not the best strategy
• Clustering can generate fewer, higher quality feedback examples
Conclusions
• There are many reasons for diversifying search results (redundancy, diverse information needs, active feedback)
• Risk minimization framework can model all these cases of diversification
• Different scenarios may need different techniques and different evaluation measures
References
• Risk Minimization
– [Lafferty & Zhai 01] John Lafferty and ChengXiang Zhai. Document language models, query models, and risk minimization for information retrieval. In Proceedings of the ACM SIGIR 2001, pages 111-119.
– [Zhai & Lafferty 06] ChengXiang Zhai and John Lafferty, A risk minimization framework for information retrieval, Information Processing and Management, 42(1), Jan. 2006, pages 31-55.
• Subtopic Retrieval
– [Zhai et al. 03] ChengXiang Zhai, William Cohen, and John Lafferty, Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval, In Proceedings of ACM SIGIR 2003.
– [Zhai 02] ChengXiang Zhai, Language Modeling and Risk Minimization in Text Retrieval, Ph.D. thesis, Carnegie Mellon University, 2002.
• Active Feedback
– [Shen & Zhai 05] Xuehua Shen and ChengXiang Zhai, Active Feedback in Ad Hoc Information Retrieval, In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'05), 2005, pages 59-66.