Risk-sensitive Information Retrieval
![Page 1: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/1.jpg)
Risk-sensitive Information Retrieval
Kevyn Collins-Thompson
Associate Professor, University of Michigan
FIRE Invited talk, Friday Dec. 6, 2013
![Page 2: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/2.jpg)
We tend to remember that 1 failure, rather than the previous 200 successes
![Page 3: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/3.jpg)
Current retrieval algorithms work well on average across queries…
[Figure: histogram of number of queries (y-axis, 0 to 25) by percent MAP gain (x-axis, -100 to >100) for query expansion, the current state-of-the-art method. Queries hurt (model ≤ baseline) lie on one side and queries helped (model > baseline) on the other. Mean Average Precision gain: +30%.]
![Page 4: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/4.jpg)
…but are high risk = significant expectation of failure due to high variance across individual queries.
[Figure: the same histogram of number of queries by percent MAP gain for query expansion, the current state-of-the-art method, marking queries hurt (model ≤ baseline) and queries helped (model > baseline).]
This is one of the reasons that even state-of-the-art algorithms are impractical for many real-world scenarios.
Failure = Your algorithm makes results worse than if it had not been applied.
![Page 5: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/5.jpg)
We want more robust IR algorithms with two objectives:
1. Maximize average effectiveness
2. Minimize risk of significant failures
[Figure: two histograms of number of queries by percent MAP gain (-100 to >100), each marking queries hurt and queries helped. One shows query expansion with the current state-of-the-art method (average gain: +30%); the other shows the robust version (average gain: +30%).]
![Page 6: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/6.jpg)
Defining risk and reward in IR
1. Reward = Effectiveness measure
   – NDCG, ERR, MAP, …
2. Define failure for a single query
   – Typically relative to a baseline
   – e.g. 25% loss in MAP
   – e.g. Query hurt (ΔMAP < 0)
3. Risk = aggregate failure across queries
   – e.g. P(> 25% MAP loss)
   – e.g. Average NDCG loss > 10%
   – e.g. # of queries hurt
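The three definitions above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `risk_report`, the per-query scores, and the choice of average precision as the effectiveness measure are all assumptions made for the example.

```python
def risk_report(baseline, model, loss_threshold=0.25):
    """Summarize reward and risk of `model` relative to `baseline`.

    Both arguments map a query id to a per-query effectiveness score
    (average precision here; any measure could be substituted).
    """
    deltas = {q: model[q] - baseline[q] for q in baseline}
    n = len(deltas)

    # Reward: mean change in the effectiveness measure.
    mean_gain = sum(deltas.values()) / n

    # Failure definition 1: the query is hurt (delta < 0).
    queries_hurt = sum(1 for d in deltas.values() if d < 0)

    # Failure definition 2: relative loss exceeds the threshold (e.g. 25%).
    big_losses = sum(
        1 for q, d in deltas.items()
        if baseline[q] > 0 and -d / baseline[q] > loss_threshold
    )

    # Risk: failures aggregated across queries.
    return {
        "mean_gain": mean_gain,
        "queries_hurt": queries_hurt,
        "p_big_loss": big_losses / n,
    }


# Hypothetical per-query average precision values.
baseline_ap = {"q1": 0.40, "q2": 0.20, "q3": 0.50, "q4": 0.10}
model_ap = {"q1": 0.60, "q2": 0.10, "q3": 0.55, "q4": 0.10}
report = risk_report(baseline_ap, model_ap)
print(report)
```

In this toy data the model improves the mean while halving the score of one query, which is the kind of case an average alone would hide.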
[Figure: histogram of number of queries by percent MAP gain (-100 to >100).]
![Page 7: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/7.jpg)
Some examples of risky IR operations
• Query rewriting and expansion
  – Spelling correction, common word variants, synonyms and related words, acronym normalization, …
  – Baseline: the unmodified query
• Personalized search
  – Trying to disambiguate queries, given unknown user intent
  – Personalized, groupized and contextual re-ranking
  – Baseline: the original, non-adjusted ranking. Or: ranking from previous version of ranking algorithm.
• Resource allocation
  – Choice of index tiering, collection selection
![Page 8: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/8.jpg)
DDR 2012 Seattle
Example: Gain/loss distribution of topic-based personalization across queries
[Sontag et al. WSDM 2012]
[Figure: “Reliability of Personalization Models”. Proportion of queries (y-axis, roughly -0.01 to 0.07) by change in rank position of last satisfied click (x-axis, -6 to 6), relative to Bing production ranking.]
![Page 9: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/9.jpg)
Another example: Gain/loss distribution of location-based personalization across queries
[Bennett et al., SIGIR 2011]
[Figure: histogram over percent gain bins from [-100,-90) to [90,100), with counts from 0 to 10000.]
P(Loss > 20%) = 8% when ranking is affected
![Page 10: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/10.jpg)
The three key points of this talk
1. Many key IR operations are risky to apply.
2. This risk can often be reduced by better algorithm design.
3. Evaluation should include risk analysis.
   – Look at the nature of the gain and loss distribution
   – Not just averages.
![Page 11: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/11.jpg)
This risk-reward tradeoff occurs again and again in search… but is often ignored
• A search engine provider must choose between two personalization algorithms:
  – Algorithm A has expected NDCG gain = +2.5 points
    • But P(Loss > 20%) = 60%
  – Algorithm B has NDCG gain = +2.1 points
    • But P(Loss > 20%) = 10%
• Which one will be deployed?
![Page 12: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/12.jpg)
Algorithm deployment typically driven by focus on average NDCG/ERR/MAP/… gain
• Little/no consideration of downside risk.
• Benefits of reducing risk:
  – User perception: failures are memorable
  – Desire to avoid churn: predictability, stability
  – Increased statistical power of experiments
• Goal: Understand, optimize, and control risk-reward tradeoffs in search algorithms
![Page 13: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/13.jpg)
Motivating questions
• How can effectiveness and robustness be jointly optimized for key IR tasks?
• What tradeoffs are possible?
• What are effective definitions of “risk” for different IR tasks?
• When and how can search engines effectively “hedge” their bets for uncertain choices?
• How can we improve our valuation models for more complex needs, multiple queries or sessions?
![Page 14: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/14.jpg)
Scenario 1: Query expansion [Collins-Thompson, NIPS 2008; CIKM 2009]
![Page 15: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/15.jpg)
Example: Ignoring aspect balance increases algorithm risk
Hypothetical query: ‘merit pay law for teachers’
Expansion term weights:
court 0.026
appeals 0.018
federal 0.012
employees 0.010
case 0.010
education 0.009
school 0.008
union 0.007
seniority 0.007
salary 0.006
Legal aspect is modeled… BUT education & pay aspects are thrown away.
![Page 16: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/16.jpg)
A better approach is to optimize selection of terms as a set
Hypothetical query: ‘merit pay law for teachers’
Expansion term weights:
court 0.026
appeals 0.018
federal 0.012
employees 0.010
case 0.010
education 0.009
school 0.008
union 0.007
seniority 0.007
salary 0.006
More balanced query model
Empirical evidence: Udupa, Bhole and Bhattacharya. ICTIR 2009
![Page 17: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/17.jpg)
Using financial optimization based on portfolio theory to mitigate risk in query expansion [Collins-Thompson, NIPS 2008]
• Reward:
  – Baseline provides initial weight vector c
  – Prefer words with higher $c_i$ values: $R(x) = c^T x$
• Risk:
  – Model uncertainty in c using a covariance matrix Σ
  – Model uncertainty in Σ using regularized $\Sigma_\gamma = \Sigma + \gamma D$
  – Diagonal: captures individual term variance (term centrality)
  – Off-diagonal: term covariance (co-occurrence/term association)
• Combined objective: $U(x) = R(x) - \frac{\kappa}{2} V(x) = c^T x - \frac{\kappa}{2}\, x^T (\Sigma + \gamma D)\, x$
• Markowitz-type model
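A Markowitz-type objective of this kind, reward $c^T x$ minus a risk-aversion weight times the variance $x^T (\Sigma + \gamma D) x$, can be evaluated with a short sketch. The three `c` values come from the term list on the earlier slide (court, appeals, education); the covariance entries, the identity choice for D, and the values of `gamma` and `kappa` are hypothetical and only meant to show how correlated terms are penalized.

```python
def utility(x, c, sigma, d, gamma=0.1, kappa=1.0):
    """U(x) = c^T x - (kappa / 2) * x^T (Sigma + gamma * D) x."""
    n = len(x)
    reward = sum(c[i] * x[i] for i in range(n))
    variance = sum(
        x[i] * (sigma[i][j] + gamma * d[i][j]) * x[j]
        for i in range(n)
        for j in range(n)
    )
    return reward - 0.5 * kappa * variance


# Baseline weights for: court, appeals, education.
c = [0.026, 0.018, 0.009]

# Hypothetical covariance: court and appeals co-vary strongly,
# education is independent of both.
sigma = [
    [0.04, 0.03, 0.00],
    [0.03, 0.04, 0.00],
    [0.00, 0.00, 0.04],
]
d = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # assumed: identity regularizer

concentrated = [1.0, 1.0, 0.0]  # court + appeals (one aspect)
balanced = [1.0, 0.0, 1.0]      # court + education (two aspects)

u_concentrated = utility(concentrated, c, sigma, d)
u_balanced = utility(balanced, c, sigma, d)
print(u_concentrated, u_balanced)
```

With these numbers the balanced selection scores higher once risk is priced in, even though the concentrated selection has the larger reward; setting `kappa=0` recovers the reward-only ordering.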
![Page 18: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/18.jpg)
Black-box approach works with any expansion algorithm via post-process optimizer
[Collins-Thompson, NIPS 2008]
[Diagram: the query is passed to the baseline expansion algorithm. Its output, a word graph (Σ) built from top-ranked documents (or another source of term associations), and constraints on word sets feed a convex optimizer, which produces the robust query model.]
Word graph (Σ):
• Individual term risk (diagonal)
• Conditional term risk (off-diagonal)
We don’t assume the baseline is good or reliable!
![Page 19: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/19.jpg)
Controlling the risk of using query expansion terms
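REXP algorithm (convex program over the term weights x, with budget variable y):
minimize $-c^T x + \frac{\kappa}{2}\, x^T \Sigma_\gamma x$ plus a budget term in y (Risk & reward; budget)
subject to
  a bound on $Ax$ (Aspect balance)
  a bound on $g_i^T x$, for $w_i \in Q$ (Aspect coverage)
  $l_i \le x_i \le u_i$, for $w_i \in Q$ (Query term support)
  a bound on $w^T x$ in terms of y (Budget / sparsity)
  $0 \le x \le 1$
[Figure: scatter-plot illustrations (axes X and Y) of three properties. Aspect balance: bad vs. good. Term centrality: variable vs. centered. Aspect coverage: low vs. high.]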
![Page 20: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/20.jpg)
Example solution output
Query: parkinson’s disease
Baseline expansion:
parkinson 0.996
disease 0.848
syndrome 0.495
disorders 0.492
parkinsons 0.491
patient 0.483
brain 0.360
patients 0.313
treatment 0.289
diseases 0.153
alzheimers 0.114
...and 90 more...
Post-processed robust version:
parkinson 0.9900
disease 0.9900
syndrome 0.2077
parkinsons 0.1350
patients 0.0918
brain 0.0256
(All other terms zero)
![Page 21: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/21.jpg)
Evaluating Risk-Reward Tradeoffs: Introducing Risk-Reward Curves
Given a baseline Mb, can we improve average effectiveness over Mb without hurting too many queries?
[Figure: risk-reward curve with risk (probability of failure) on the x-axis and average effectiveness (over baseline) on the y-axis. Robust algorithm: higher effectiveness for any given level of risk.]
[Figure: two histograms of number of queries by percent MAP gain (-100 to >100), one for a gain-only model and one for a risk-averse model.]
![Page 22: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/22.jpg)
Risk-reward curves as a function of algorithm risk-aversion parameter
![Page 23: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/23.jpg)
Risk-Reward Tradeoff Curves
[Figure: risk-reward tradeoff curves for Algorithm A and Algorithm B, with R-Loss (risk increase, 0 to 2500) on the x-axis and percent MAP gain (0 to 40) on the y-axis.]
Risk-reward curves: Algorithm A dominates Algorithm B with a consistently superior tradeoff.
Curves UP and to the LEFT are better
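One way to make "up and to the left" checkable is to treat each curve as a list of (risk, gain) points and ask whether one algorithm achieves at least as much gain as the other within every risk budget. The sketch below does that; the helper names and all curve points are hypothetical, and taking zero gain at zero risk (no expansion) as the fallback is an assumption of this example.

```python
def best_gain_within(curve, risk_budget):
    """Best gain achievable on `curve` without exceeding `risk_budget`.

    `curve` is a list of (risk, gain) points. If no point fits the budget,
    fall back to zero gain, i.e. leaving the baseline untouched.
    """
    gains = [gain for risk, gain in curve if risk <= risk_budget]
    return max(gains) if gains else 0.0


def dominates(curve_a, curve_b):
    """True if A matches or beats B at every risk level B reaches,
    and strictly beats it at least once."""
    pairs = [(best_gain_within(curve_a, risk), gain) for risk, gain in curve_b]
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)


# Hypothetical (R-Loss, percent MAP gain) points for two algorithms.
curve_a = [(0, 0.0), (500, 15.0), (1000, 25.0), (2000, 32.0)]
curve_b = [(0, 0.0), (500, 8.0), (1000, 14.0), (2000, 20.0)]

print(dominates(curve_a, curve_b), dominates(curve_b, curve_a))
```

Curves that cross would make both calls return `False`, which is the case where the choice depends on how much risk the application can tolerate.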
![Page 24: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/24.jpg)
Risk-aversion parameter in query expansion: weight given to original vs expansion query
[Figure: six panels plotting the baseline curve against R-Loss on the x-axis: QMOD trec7a, QMOD trec8a, QMOD trec12, QMOD Robust2004, QMOD gov2, and QMOD wt10g.]
![Page 25: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/25.jpg)
Robust version dominates baseline version (MAP)
[Figure: six panels, each plotting the QMOD and baseline curves against R-Loss on the x-axis: QMOD trec7a, QMOD trec8a, QMOD trec12, QMOD Robust2004, QMOD gov2, and QMOD wt10g.]
![Page 26: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/26.jpg)
Robust version significantly reduces the worst expansion failures
[Figure: six histograms of number of queries by percent MAP gain (-100 to >100) for the baseline: QMOD trec12, QMOD trec7a, QMOD trec8a, QMOD wt10g, QMOD robust2004, and QMOD gov2.]
![Page 27: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/27.jpg)
Robust version significantly reduces the worst expansion failures
[Figure: six histograms of number of queries by percent MAP gain (-100 to >100), each comparing baseline and QMOD: QMOD trec12, QMOD trec7a, QMOD trec8a, QMOD wt10g, QMOD robust2004, and QMOD gov2.]
![Page 28: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/28.jpg)
Aspect constraints are well-calibrated to actual expansion benefit
• About 15% of queries have infeasible programs (constraints can’t be satisfied)
• Infeasible → No expansion
[Figure: log-odds of reverting to the original query (y-axis, -0.35 to 0.35) by percent MAP gain using baseline expansion (x-axis bins from [-100%, -75%) to >= 100%).]
Scenario 2: Risk-sensitive objectives in learning to rank
[Wang, Bennett, Collins-Thompson SIGIR 2012]
![Page 30: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/30.jpg)
What Causes Risk in Ranking?
Significant differences exist between queries
- Click entropies, clarity, length
- Transactional, informational, navigational
Many ways to rank / re-rank
- What features to use?
- What learning algorithm to use?
- How much personalization?
“Risk”: one intuitive definition is the probability that this is the wrong technique for a particular query (i.e. it hurts performance relative to the baseline).
![Page 31: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/31.jpg)
Framing the Learning Problem
[Diagram: learning takes training data, a model class, and an objective, and produces a ranking model. The ranking model takes a query and documents and produces a top-K ranked retrieval. A baseline model is also shown.]
Open questions:
• Ranking model?
• Optimization objective?
• How to learn?
CHALLENGES:
• Low-risk and effective (relative to baseline)
• Optimally balance risk & reward
• Captures risk & reward
![Page 32: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/32.jpg)
A Combined Risk-Reward Optimization Objective
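[Figure: gain distribution marking queries hurt and queries helped.]
Reward: average positive gain (over all queries)
$\text{Reward} = \frac{1}{N} \sum_{Q \in T} \max[0,\; M_m(Q) - M_b(Q)]$ (new model minus baseline)
Risk: average negative gain (over all queries)
$\text{Risk} = \frac{1}{N} \sum_{Q \in T} \max[0,\; M_b(Q) - M_m(Q)]$ (baseline minus new model)
Here N is the number of queries.
Objective: T(α) = Reward – (1+α) Risk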
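The objective T(α) = Reward – (1+α) Risk is easy to compute from per-query scores. The sketch below assumes reward is the mean positive gain and risk the mean loss relative to the baseline, as the slide describes; the function name and the per-query scores are hypothetical.

```python
def tradeoff(model, baseline, alpha):
    """T(alpha) = Reward - (1 + alpha) * Risk over the queries in `baseline`."""
    n = len(baseline)
    # Reward: average positive gain over all queries.
    reward = sum(max(0.0, model[q] - baseline[q]) for q in baseline) / n
    # Risk: average loss relative to the baseline over all queries.
    risk = sum(max(0.0, baseline[q] - model[q]) for q in baseline) / n
    return reward - (1 + alpha) * risk


# Hypothetical per-query effectiveness scores.
baseline = {"q1": 0.5, "q2": 0.5, "q3": 0.5, "q4": 0.5}
model_a = {"q1": 0.9, "q2": 0.9, "q3": 0.1, "q4": 0.5}  # big wins, one big loss
model_b = {"q1": 0.6, "q2": 0.6, "q3": 0.5, "q4": 0.5}  # small wins, no losses

for alpha in (0, 5):
    print(alpha, tradeoff(model_a, baseline, alpha), tradeoff(model_b, baseline, alpha))
```

With no risk aversion the high-variance model A scores higher; at α = 5 the ordering flips in favor of model B, which never falls below the baseline.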
![Page 33: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/33.jpg)
A General Family of Risk-Sensitive Objectives
Objective: T(α) = Reward – (1+α) Risk
Gives a family of tradeoff objectives that captures a spectrum of risk/reward tradeoffs
Some special cases:
• α = 0: standard average performance optimization (high reward, high risk)
• α = very large (low risk, low reward)
Robustness of model increases with larger α
Optimal value of α can be chosen based on application
Can substitute in any effectiveness measure
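A short check of why the case with no risk aversion reduces to standard average-performance optimization. It assumes Reward and Risk are the average positive and negative gains over N queries, with $M_m$ the new model and $M_b$ the baseline (the subscript m is notation chosen for this note):

```latex
\begin{aligned}
T(0) &= \text{Reward} - \text{Risk} \\
     &= \frac{1}{N} \sum_{Q} \Big( \max[0,\, M_m(Q) - M_b(Q)] - \max[0,\, M_b(Q) - M_m(Q)] \Big) \\
     &= \frac{1}{N} \sum_{Q} \big( M_m(Q) - M_b(Q) \big)
\end{aligned}
```

The last step uses $\max[0, d] - \max[0, -d] = d$. Since the baseline term does not depend on the new model, maximizing T(0) is the same as maximizing the new model's average effectiveness.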
![Page 34: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/34.jpg)
Integrating Risk-Sensitive Objective into LambdaMART
• Extension of LambdaMART (MART + LambdaRank)
• Each tree models gradient of tradeoff wrt doc scores
[Diagram: the model is a sum of trees (+ + … +). For documents sorted by scores, λij combines the derivative of the cross-entropy with the change in tradeoff due to swapping i and j.]
[Figure: histogram of number of queries by percent MAP gain (-100 to >100), marking queries hurt and queries helped, with the annotations “Heavily promote” and “Heavily penalize”.]
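The sketch below illustrates how a pairwise gradient of this kind might be computed. It is not the authors' implementation: it assumes the standard LambdaRank form of the pairwise gradient, with the usual change in NDCG replaced by the change in a per-query tradeoff, where a query below the baseline has its gain weighted by (1 + α). All function names and numbers are hypothetical.

```python
import math


def query_tradeoff(m, m_baseline, alpha):
    """Per-query contribution: gains count once, losses count (1 + alpha) times."""
    gain = m - m_baseline
    return gain if gain >= 0 else (1 + alpha) * gain


def delta_tradeoff(m, delta_m, m_baseline, alpha):
    """Change in the per-query tradeoff when a swap changes the measure by delta_m."""
    return (query_tradeoff(m + delta_m, m_baseline, alpha)
            - query_tradeoff(m, m_baseline, alpha))


def lambda_ij(s_i, s_j, delta_t, sigma=1.0):
    """LambdaRank-style gradient for a pair where document i should outrank j."""
    return -sigma / (1.0 + math.exp(sigma * (s_i - s_j))) * abs(delta_t)


alpha, m_baseline = 5, 0.5
# The same swap (+0.05 in the measure) on a query currently below the
# baseline and on a query currently above it.
delta_hurt = delta_tradeoff(0.3, 0.05, m_baseline, alpha)
delta_helped = delta_tradeoff(0.7, 0.05, m_baseline, alpha)
print(lambda_ij(0.0, 0.0, delta_hurt), lambda_ij(0.0, 0.0, delta_helped))
```

Under these assumptions the swap on the query below the baseline yields a gradient (1 + α) times larger, so training effort concentrates on queries at risk of being hurt.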
![Page 35: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/35.jpg)
Experiment Setup
• Task: Personalization
  – Dataset: Location (Bennett et al., 2011)
  – Selective per-query strategy: Min location entropy
    • Low location entropy predicts likely local intent
  – Baseline: Re-ranking model learned on all personalization features
![Page 36: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/36.jpg)
Risk-sensitive re-ranking for location personalization (α = 0, no risk-aversion)
[Figure: histogram over percent gain bins from [-100,-90) to [90,100), with counts from 0 to 10000, for alpha = 0.]
![Page 37: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/37.jpg)
Risk-sensitive re-ranking for location personalization (α = 1, mild risk-aversion)
[Figure: histogram over percent gain bins from [-100,-90) to [90,100), with counts from 0 to 10000, comparing alpha = 0 and alpha = 1.]
![Page 38: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/38.jpg)
Risk-sensitive re-ranking for location personalization (α = 5, medium risk-aversion)
[Figure: histogram over percent gain bins from [-100,-90) to [90,100), with counts from 0 to 10000, comparing alpha = 0, alpha = 1, and alpha = 5.]
![Page 39: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/39.jpg)
Risk-sensitive re-ranking for location personalization (α = 10, highly risk-averse)
[Figure: histogram over percent gain bins from [-100,-90) to [90,100), with counts from 0 to 10000, comparing alpha = 0, alpha = 1, alpha = 5, and alpha = 10.]
P(Loss > 20%) → 0 while maintaining significant gains
![Page 40: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/40.jpg)
Gain vs Risk
![Page 41: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/41.jpg)
TREC Web Track 2013: Promoting research on risk-sensitive retrieval
• New collection:
  – ClueWeb12
• New task:
  – Risk-sensitive retrieval
• New topics:
  – Single + multi-faceted topics
![Page 42: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/42.jpg)
Participating groups
• TU Delft (CWI)
• TU Delft (wistud)
• Univ. Montreal
• OmarTech, Beijing
• Chinese Acad. Sciences
• MSR/CMU
• RMIT
• Technion
• Univ. Delaware (Fang)
• Univ. Delaware (udel)
• Jiangsu Univ.
• Univ. Glasgow
• Univ. Twente
• Univ. Waterloo
• Univ. Weimar
TREC 2013: 15 groups, 61 runs (TREC 2012: 12 groups, 48 runs)
Automatic runs: 53
Manual runs: 8
Category A runs: 52
Category B runs: 9
![Page 43: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/43.jpg)
Topic development
• Multi-faceted vs single-faceted topics
• Faceted type and structure were not revealed until after run submission
• Initial topic release provided the query only
201: raspberry pi
202: uss carl vinson
203: reviews of les miserable
204: rules of golf
205: average charitable donation
![Page 44: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/44.jpg)
Example multi-faceted topics showing informational, navigational subtopics
<topic number="235" type="faceted">
  <query>ham radio</query>
  <description>How do you get a ham radio license?</description>
  <subtopic number="1" type="inf">How do you get a ham radio license?</subtopic>
  <subtopic number="2" type="nav">What are the ham radio license classes?</subtopic>
  <subtopic number="3" type="inf">How do you build a ham radio station?</subtopic>
  <subtopic number="4" type="inf">Find information on ham radio antennas.</subtopic>
  <subtopic number="5" type="nav">What are the ham radio call signs?</subtopic>
  <subtopic number="6" type="nav">Find the web site of Ham Radio Outlet.</subtopic>
</topic>
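A topic in this format can be read with Python's standard `xml.etree.ElementTree` module. The sketch below is only an illustration of the structure: the `parse_topic` helper is hypothetical, and the embedded topic is shortened to three of the six subtopics.

```python
import xml.etree.ElementTree as ET

# Shortened copy of topic 235 (three of its six subtopics).
TOPIC_XML = """<topic number="235" type="faceted">
  <query>ham radio</query>
  <description>How do you get a ham radio license?</description>
  <subtopic number="1" type="inf">How do you get a ham radio license?</subtopic>
  <subtopic number="2" type="nav">What are the ham radio license classes?</subtopic>
  <subtopic number="6" type="nav">Find the web site of Ham Radio Outlet.</subtopic>
</topic>"""


def parse_topic(xml_text):
    """Return the topic number, type, query, and (number, type, text) subtopics."""
    topic = ET.fromstring(xml_text)
    return {
        "number": int(topic.get("number")),
        "type": topic.get("type"),
        "query": topic.findtext("query").strip(),
        "subtopics": [
            (int(sub.get("number")), sub.get("type"), sub.text.strip())
            for sub in topic.findall("subtopic")
        ],
    }


parsed = parse_topic(TOPIC_XML)
nav_subtopics = [s for s in parsed["subtopics"] if s[1] == "nav"]
print(parsed["query"], len(parsed["subtopics"]), len(nav_subtopics))
```

Single-facet topics have no `subtopic` elements, so the same helper returns an empty subtopic list for them.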
![Page 45: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/45.jpg)
Example single-facet topics
<topic number="227" type="single">
  <query>i will survive lyrics</query>
  <description>Find the lyrics to the song "I Will Survive".</description>
</topic>
<topic number="229" type="single">
  <query>beef stroganoff recipe</query>
  <description>Find complete (not partial) recipes for beef stroganoff.</description>
</topic>
![Page 46: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/46.jpg)
Track instructions
• Via github, participants were provided:
  – Baseline runs (ClueWeb09 and ClueWeb12)
  – Risk-sensitive versions of standard evaluation tools
    • Compute risk-sensitive versions of ERR-IA, NDCG, etc.
    • gdeval, ndeval: new alpha parameter
• Ad-hoc task
  – Submit up to 3 runs, each with top 10k results, etc.
• Risk-sensitive task
  – Submit up to 3 runs: alpha = 1, 5, 10
  – Could perform new retrieval, not just re-ranking
  – Participants asked to self-identify alpha-level for each run
![Page 47: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/47.jpg)
Ad-hoc run rank (ERR@10)
[Figure: ERR@10 (y-axis, 0 to 0.2) by ad-hoc run rank (x-axis, 0 to 18).]
![Page 48: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/48.jpg)
Visualization of adhoc runs ERR@10 vs nDCG@10
![Page 49: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/49.jpg)
Baseline run for risk evaluation
• Goals:
  – Good ad-hoc effectiveness (ERR and NDCG)
  – Standard, easily reproducible algorithm
• Approach:
  – Selected based on ClueWeb09 performance
  – RM3 pseudo-relevance feedback from the Indri retrieval engine.
  – For each query:
    • 10 feedback documents, 20 feedback terms
    • Linear interpolation weight of 0.60 with original query.
  – Waterloo spam classifier filtered out all documents with percentile-score < 70.
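The interpolation step can be illustrated with a small sketch. This is not Indri's implementation: it only shows a linear mix of an original query model with a truncated feedback model, and it assumes the 0.60 weight applies to the original query (the slide does not say which side the weight falls on). The term probabilities are hypothetical.

```python
def interpolate(original, feedback, original_weight=0.60, num_terms=20):
    """Mix the original query model with the top feedback terms.

    Assumption: `original_weight` is placed on the original query and the
    remainder on the feedback model.
    """
    # Keep the highest-weighted feedback terms and renormalize them.
    top = sorted(feedback.items(), key=lambda kv: kv[1], reverse=True)[:num_terms]
    total = sum(p for _, p in top)
    fb = {term: p / total for term, p in top}

    terms = set(original) | set(fb)
    return {
        term: original_weight * original.get(term, 0.0)
        + (1 - original_weight) * fb.get(term, 0.0)
        for term in terms
    }


# Hypothetical term distributions.
original_query = {"parkinson": 0.5, "disease": 0.5}
feedback_model = {"parkinson": 0.4, "syndrome": 0.3, "brain": 0.2, "patients": 0.1}

expanded = interpolate(original_query, feedback_model)
print(sorted(expanded.items(), key=lambda kv: kv[1], reverse=True))
```

Raising `original_weight` toward 1.0 leaves the query closer to its unexpanded form, which is the sense in which the interpolation weight acts as a risk-aversion knob for query expansion.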
![Page 50: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/50.jpg)
Ad-hoc run performance (ERR@10) by topic
[Figure: two panels of per-topic ERR@10, topics 201-225 and topics 226-250. Baseline in red.]
![Page 51: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/51.jpg)
Two systems with strong average performance but different per-query variability profiles
[Figure: per-topic panels for topics 201-225. Technion: run clustmrfaf. Glasgow: run uogTrAIwLmb.]
![Page 52: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/52.jpg)
Two systems with strong average performance but different per-query variability profiles
[Figure: per-topic panels for topics 226-250. Technion: run clustmrfaf. Glasgow: run uogTrAIwLmb.]
![Page 53: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/53.jpg)
Risk-sensitive evaluation measures
Losses are weighted (1 + α) times as heavily as successes.
When α = 0, the system will ignore the baseline.
When α is large, the system will try to avoid large losses w.r.t. the baseline.
The ad-hoc task corresponds to the α = 0 case.
The measure is computed over two sets: the set of queries that gain over the baseline and the set of queries that lose relative to the baseline.
![Page 54: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/54.jpg)
Risk-sensitive results summary (ordered by alpha = 1)
![Page 55: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/55.jpg)
Relative ad-hoc vs risk-sensitive ERR@20 (alpha = 1)
Ad-hoc vs risk-averse ERR@10
![Page 56: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/56.jpg)
Relative ad-hoc vs risk-sensitive ERR@20 (alpha = 5)
Ad-hoc vs risk-averse ERR@10
![Page 57: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/57.jpg)
Relative ad-hoc vs risk-sensitive ERR@20 (alpha = 10)
Ad-hoc vs risk-averse ERR@10
![Page 58: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/58.jpg)
Change in relative ranking of the top 10 systems as risk-aversion (alpha) increases (ERR@10)
![Page 59: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/59.jpg)
Did runs self-identified as risk-sensitive do better under the corresponding risk-sensitive measure?
[Figure: risk-sensitive ERR@10 delta from each team's own ad-hoc run (y-axis, -0.5 to 0.1) by alpha (x-axis, 0 to 12), for diro_web_13, ICTNET, MSR_Redmond, udel, UJS, uogTr, webis, and wistud. A second panel shows ad-hoc ERR@10 (0 to 0.12) for the same teams.]
![Page 60: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/60.jpg)
Conclusions from TREC 2013 Risk-sensitive Task
• Evidence of some success in building robust systems that avoid baseline failures
• Less evidence of systems that are good at making explicit risk-reward tradeoffs
• Error (failure) profiles are still very different across systems, suggesting room for further improvements:
  – Query performance/failure prediction
  – Robust ranking objectives
  – Combining or selecting from multiple systems
![Page 61: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/61.jpg)
Research directions in risk-aware retrieval
• Measuring user-perceived impact of risky systems
  – Some limited user studies, for recommender systems
  – No large-scale studies of Web search
• Whole-page relevance as investment
  – Objective: Diversify across different user intent hypotheses…
    • While also enforcing consistency constraints
  – When and how to modify the UI based on task/intent?
• Federated search
  – Handle growing number of diverse information resources
  – Integrating latency, cost with retrieval risk
![Page 62: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/62.jpg)
The three key points of this talk
1. Many key IR operations are risky to apply.
   • e.g. query expansion, personalized ranking
2. This risk can often be reduced by better algorithm design and feature choices
   • Convex optimization, confidence-oriented features
3. Evaluation should include risk analysis.
   – Robustness gain/loss histograms
   – Risk-reward curves
   – Risk-averse effectiveness measures
Consider participating in TREC Web Track 2014!
![Page 63: Risk-sensitive Information Retrieval](https://reader036.fdocuments.in/reader036/viewer/2022062323/5681660d550346895dd949c8/html5/thumbnails/63.jpg)
Thanks! Questions?
• Now admitting new PhD students to my lab for Fall 2014
• Application deadline: December 15, 2013
Contact Kevyn Collins-Thompson [email protected]