Dynamic Information Retrieval Tutorial - WSDM 2015

Transcript of Dynamic Information Retrieval Tutorial - WSDM 2015

Page 1: Dynamic Information Retrieval Tutorial - WSDM 2015

WSDM Tutorial February 2nd 2015

Grace Hui Yang

Marc Sloan

Jun Wang

Guest Speaker: Charlie Clarke

Dynamic Information Retrieval

Modeling

Page 2: Dynamic Information Retrieval Tutorial - WSDM 2015

Dynamic Information Retrieval

[Diagram: a user with an information need, the observed documents, and documents to explore]

Devise a strategy for helping the user explore the information space in order to learn which documents are relevant and which aren't, and satisfy their information need.

Page 3: Dynamic Information Retrieval Tutorial - WSDM 2015

Evolving IR

Paradigm shifts in IR as new models emerge
e.g. VSM → BM25 → Language Model
Different ways of defining the relationship between query and document
Static → Interactive → Dynamic
Evolution in modeling user interaction with the search engine

Page 4: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Static IR

Interactive IR

Dynamic IR

Session Search

Dynamic Ranking

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Page 5: Dynamic Information Retrieval Tutorial - WSDM 2015

Conceptual Model – Static IR

Static IR → Interactive IR → Dynamic IR

No feedback

Page 6: Dynamic Information Retrieval Tutorial - WSDM 2015

Characteristics of Static IR

Does not learn directly from the user
Parameters updated periodically

Page 7: Dynamic Information Retrieval Tutorial - WSDM 2015

Commonly Used Static IR Models

BM25
PageRank
Language Model
Learning to Rank

Page 8: Dynamic Information Retrieval Tutorial - WSDM 2015

Feedback in IR


Page 9: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Static IR

Interactive IR

Dynamic IR

Session Search

Dynamic Ranking

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Page 10: Dynamic Information Retrieval Tutorial - WSDM 2015

Conceptual Model – Interactive IR

Static IR → Interactive IR → Dynamic IR

Exploit Feedback

Page 11: Dynamic Information Retrieval Tutorial - WSDM 2015

Interactive Recommender Systems

Learn the user's taste interactively!
At the same time, provide good recommendations!

Page 12: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example

Multi-page search scenario
User image-searches for "jaguar"
Rank two of the four results over two pages:

r = 0.5, r = 0.51, r = 0.9, r = 0.49

Page 13: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Static Ranking

Ranked according to the PRP
Page 1: 1. r = 0.9, 2. r = 0.51
Page 2: 1. r = 0.5, 2. r = 0.49

Page 14: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Relevance Feedback

Interactive search
Improve the 2nd page based on feedback from the 1st page
Use clicks as relevance feedback
Rocchio¹ algorithm on terms in the image webpage:

$w_{q'} = \alpha w_q + \frac{\beta}{|D_r|} \sum_{d \in D_r} w_d - \frac{\gamma}{|D_n|} \sum_{d \in D_n} w_d$

The new query is closer to the relevant documents and different from the non-relevant documents.

¹Rocchio, J. J. '71; Baeza-Yates & Ribeiro-Neto '99
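A minimal sketch of the Rocchio update above, assuming term-weight vectors stored as numpy arrays; the default α, β, γ values are illustrative, not taken from the tutorial:

```python
import numpy as np

def rocchio_update(w_q, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """w_q' = alpha*w_q + (beta/|Dr|)*sum(rel) - (gamma/|Dn|)*sum(nonrel)."""
    w_new = alpha * np.asarray(w_q, dtype=float)
    if len(rel_docs) > 0:                       # D_r: clicked (relevant) docs
        w_new = w_new + beta * np.mean(rel_docs, axis=0)
    if len(nonrel_docs) > 0:                    # D_n: unclicked (non-relevant) docs
        w_new = w_new - gamma * np.mean(nonrel_docs, axis=0)
    return w_new
```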

Page 15: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Relevance Feedback

Ranked according to the PRP and Rocchio
Page 1: 1. r = 0.9 (*), 2. r = 0.51   (* = click)
Page 2: 1. r = 0.5, 2. r = 0.49

Page 16: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Relevance Feedback

No click when searching for animals
Page 1: 1. r = 0.9, 2. r = 0.51
Page 2: 1. ?, 2. ?

Page 17: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Value Function

Optimize both pages using dynamic IR
Bellman equation for the value function, simplified example:

$V_t(\theta_t, \Sigma_t) = \max_{s_t} \left[ \theta_{s_t} + E\big( V_{t+1}(\theta_{t+1}, \Sigma_{t+1}) \mid C_t \big) \right]$

$\theta_t, \Sigma_t$ = relevance and covariance of documents for page $t$
$C_t$ = clicks on page $t$
$V_t$ = 'value' of the ranking on page $t$

Maximize value over all pages based on estimated feedback.

X. Jin, M. Sloan and J. Wang '13

Page 18: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Covariance

The covariance matrix represents similarity between images:

$\Sigma = \begin{pmatrix} 1 & 0.8 & 0.1 & 0 \\ 0.8 & 1 & 0.1 & 0 \\ 0.1 & 0.1 & 1 & 0.95 \\ 0 & 0 & 0.95 & 1 \end{pmatrix}$

X. Jin, M. Sloan and J. Wang '13

Page 19: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Myopic Value

For the myopic ranking, $V_2 = 16.380$

X. Jin, M. Sloan and J. Wang '13

Page 20: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Myopic Ranking

The page 2 ranking stays the same regardless of clicks.

X. Jin, M. Sloan and J. Wang '13

Page 21: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Optimal Value

For the optimal ranking, $V_2 = 16.528$

X. Jin, M. Sloan and J. Wang '13

Page 22: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Optimal Ranking

If the car is clicked, the Jaguar logo is more relevant on the next page.

X. Jin, M. Sloan and J. Wang '13

Page 23: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Optimal Ranking

In all other scenarios, rank the animal first on the next page.

X. Jin, M. Sloan and J. Wang '13

Page 24: Dynamic Information Retrieval Tutorial - WSDM 2015

Static IR Visualization

[Scatter plot: documents in a vector space. X: docs about apple fruit; O: docs about apple iphone; x: docs about apple ceo]

Documents exist in a vector space.

Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015

Page 25: Dynamic Information Retrieval Tutorial - WSDM 2015

Static IR Visualization

[Scatter plot as before, with query point Q]

t = 1: Static IR considers relevancy

Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015


Page 27: Dynamic Information Retrieval Tutorial - WSDM 2015

Interactive IR Update

[Scatter plot: after feedback (+1 on the clicked document, −1 on the others), the query point moves from Q to Q′]

t = 1: Static IR considers relevancy
t = 2: Interactive IR considers local gains

Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015


Page 29: Dynamic Information Retrieval Tutorial - WSDM 2015

Dynamic Ranking Principle

[Scatter plot with query point Q]

t = 1: Relevancy + Variance

Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015

Page 30: Dynamic Information Retrieval Tutorial - WSDM 2015

Dynamic Ranking Principle

[Scatter plot with query point Q and feedback annotations (+1, −1, −1)]

t = 1: Relevancy + Variance + |Correlations|

Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015

Page 31: Dynamic Information Retrieval Tutorial - WSDM 2015

Dynamic Ranking Principle

[Scatter plot with query point Q]

t = 1: Relevancy + Variance + |Correlations|
Diversified, exploratory relevance ranking

Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015

Page 32: Dynamic Information Retrieval Tutorial - WSDM 2015

Dynamic Ranking Principle

[Scatter plot: the query point moves from Q to Q′]

t = 1: Relevancy + Variance + |Correlations|
Diversified, exploratory relevance ranking
t = 2: Personalized re-ranking

Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015

Page 33: Dynamic Information Retrieval Tutorial - WSDM 2015

Interactive vs Dynamic IR

Interactive:
• Treats interactions independently
• Responds to immediate feedback
• Static IR used before feedback is received

Dynamic:
• Optimizes over all interactions
• Long-term gains
• Models future user feedback
• Also used at the beginning of the interaction

Page 34: Dynamic Information Retrieval Tutorial - WSDM 2015

Interactive & Dynamic Techniques

Interactive:
• Rocchio equation in relevance feedback
• Collaborative filtering in recommender systems
• Active learning in interactive retrieval

Dynamic:
• POMDP in multi-page search and ad recommendation
• Multi-armed bandits in online evaluation
• MDP in session search

Page 35: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Static IR

Interactive IR

Dynamic IR

Session Search

Dynamic Ranking

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Page 36: Dynamic Information Retrieval Tutorial - WSDM 2015

Conceptual Model – Dynamic IR

Static IR → Interactive IR → Dynamic IR

Explore and exploit feedback

Page 37: Dynamic Information Retrieval Tutorial - WSDM 2015

Characteristics of Dynamic IR

Rich interactions: query formulation, document clicks, document examination, eye movements, mouse movements, etc.

[Luo et al., IRJ under revision 2014]

Page 38: Dynamic Information Retrieval Tutorial - WSDM 2015

Characteristics of Dynamic IR

Temporal dependency

[Diagram: an information need I drives iterations 1…n; at iteration i, query q_i yields ranked documents D_i and clicked documents C_i]

[Luo et al., IRJ under revision 2014]

Page 39: Dynamic Information Retrieval Tutorial - WSDM 2015

Characteristics of Dynamic IR

Overall goal: optimize over all iterations for a goal (an IR metric or user satisfaction) via an optimal policy

[Luo et al., IRJ under revision 2014]

Page 40: Dynamic Information Retrieval Tutorial - WSDM 2015

Dynamic Information Retrieval: the next generation search engine

Dynamic Relevance – user-perceived relevance changes
Dynamic Users – users change behavior over time; user history
Dynamic Queries – changing query definitions, e.g. 'Twitter'
Dynamic Documents – topic trends, filtering, document content change
Dynamic Information Needs – information needs evolve over time

Page 41: Dynamic Information Retrieval Tutorial - WSDM 2015

Why Not Existing Supervised Learning for Dynamic IR Modeling?

Lack of enough training data:
Dynamic IR problems contain a sequence of dynamic interactions, e.g. a series of queries in a session.
Repeated sequences are rare (close to zero), even in large query logs (WSCD 2013 & 2014, query logs from Yandex).
The chance of finding repeated adjacent query pairs is also low:

Dataset   | Repeated Adjacent Query Pairs | Total Adjacent Query Pairs | Repeated Percentage
WSCD 2013 | 476,390                       | 17,784,583                 | 2.68%
WSCD 2014 | 1,959,440                     | 35,376,008                 | 5.54%

Page 42: Dynamic Information Retrieval Tutorial - WSDM 2015

Our Solution

Try to find an optimal solution through a sequence of dynamic interactions.
Trial and error: learn from repeated, varied attempts, continued until success.
No (or less) supervised learning.

Page 43: Dynamic Information Retrieval Tutorial - WSDM 2015

Trial and Error


q1 – "dulles hotels"

q2 – "dulles airport"

q3 – "dulles airport

location"

q4 – "dulles metrostop"

Page 44: Dynamic Information Retrieval Tutorial - WSDM 2015

What is a Desirable Model for Dynamic IR?

Model interactions, which means it needs placeholders for actions;
Model the information need hidden behind user queries and other interactions;
Set up a reward mechanism to guide the entire search algorithm in adjusting its retrieval strategies;
Represent Markov properties to handle the temporal dependency.

A model in Trial and Error setting will do!

A Markov Model will do!

Page 45: Dynamic Information Retrieval Tutorial - WSDM 2015

Markov Decision Process

MDP extends a Markov chain with actions and rewards¹

s_i – state; a_i – action; r_i – reward; p_i – transition probability

[Diagram: s0 –(a0, r0)→ s1 –(a1, r1)→ s2 –(a2, r2)→ s3 …]

A tuple (S, M, A, R, γ)

¹R. Bellman '57

Page 46: Dynamic Information Retrieval Tutorial - WSDM 2015

Definition of MDP


A tuple (S, M, A, R, γ)

S : state space

M: transition matrix

Ma(s, s') = P(s'|s, a)

A: action space

R: reward function

R(s,a) = immediate reward taking action a at state s

γ: discount factor, 0 < γ ≤ 1

policy π

π(s) = the action taken at state s

Goal is to find an optimal policy π* maximizing the expected total rewards.

Page 47: Dynamic Information Retrieval Tutorial - WSDM 2015

Optimality – Bellman Equation

The Bellman equation¹ for an MDP is a recursive definition of the optimal value function V*(·) (the state-value function):

$V^*(s) = \max_a \left[ R(s,a) + \gamma \sum_{s'} M_a(s, s')\, V^*(s') \right]$

Optimal policy:

$\pi^*(s) = \arg\max_a \left[ R(s,a) + \gamma \sum_{s'} M_a(s, s')\, V^*(s') \right]$

¹R. Bellman '57

Page 48: Dynamic Information Retrieval Tutorial - WSDM 2015

MDP algorithms

Model-based approaches (solve the Bellman equation):
Value Iteration
Policy Iteration
Modified Policy Iteration
Prioritized Sweeping

Model-free approaches:
Temporal Difference (TD) Learning
Q-Learning

Output: the optimal value V*(s) and the optimal policy π*(s)

[Bellman '57; Howard '60; Puterman and Shin '78; Singh & Sutton '96; Sutton & Barto '98; Sutton '88; Watkins '92]

[Slide adapted from Carlos Guestrin's ML lecture]
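As a hedged illustration of the model-based approaches listed above, here is a minimal value-iteration sketch for the MDP tuple (S, M, A, R, γ); the array shapes are assumptions made for this example, not a prescribed interface:

```python
import numpy as np

def value_iteration(M, R, gamma=0.9, tol=1e-8):
    """Iterate V(s) <- max_a [R(s,a) + gamma * sum_s' M_a(s,s') V(s')] to a fixed point.

    M: transition tensor, shape (|A|, |S|, |S|), M[a, s, t] = P(t | s, a)
    R: reward matrix, shape (|S|, |A|)
    Returns the optimal values V*(s) and a greedy policy pi*(s).
    """
    V = np.zeros(M.shape[1])
    while True:
        Q = R + gamma * np.einsum('ast,t->sa', M, V)  # Q[s, a]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```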

Page 49: Dynamic Information Retrieval Tutorial - WSDM 2015

Apply an MDP to an IR Problem

We can model IR systems using a Markov decision process:
Is there a temporal component?
States – what changes with each time step?
Actions – how does your system change the state?
Rewards – how do you measure feedback or effectiveness in your problem at each time step?
Transition probability – can you determine this? If not, then a model-free approach is more appropriate.

Page 50: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Session Search

Dynamic Ranking

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Page 51: Dynamic Information Retrieval Tutorial - WSDM 2015

TREC Session Tracks (2010–now)

Given a series of queries {q1, q2, …, qn}, the top-10 retrieval results {D1, …, Dn−1} for q1 to qn−1, and click information,
the task is to retrieve a list of documents for the current/last query, qn.
Relevance judgment is based on how relevant the documents are for qn, and how relevant they are to the information need of the entire session (given in the topic description).
There is no need to segment the sessions.

Page 52: Dynamic Information Retrieval Tutorial - WSDM 2015

1. pocono mountains pennsylvania
2. pocono mountains pennsylvania hotels
3. pocono mountains pennsylvania things to do
4. pocono mountains pennsylvania hotels
5. pocono mountains camelbeach
6. pocono mountains camelbeach hotel
7. pocono mountains chateau resort
8. pocono mountains chateau resort attractions
9. pocono mountains chateau resort getting to
10. chateau resort getting to
11. pocono mountains chateau resort directions

TREC 2012 Session 6

Information need: You are planning a winter vacation to the Pocono Mountains region in Pennsylvania in the US. Where will you stay? What will you do while there? How will you get there?

In a session, queries change constantly.

Page 53: Dynamic Information Retrieval Tutorial - WSDM 2015

Markov Decision Process

We propose to model session search as a Markov decision process (MDP) with two agents: the user and the search engine.

[Guan, Zhang and Yang SIGIR 2013]

Page 54: Dynamic Information Retrieval Tutorial - WSDM 2015

Settings of the Session MDP

States: queries
Environment: search results
Actions:
User actions: add / remove / keep query terms (these correspond nicely to our definition of query change)
Search engine actions: increase / decrease / keep term weights

[Guan, Zhang and Yang SIGIR 2013]

Page 55: Dynamic Information Retrieval Tutorial - WSDM 2015

Search Engine Agent's Actions

Query part | ∈ D_{i−1}? | Action    | Example
q_theme    | Y          | increase  | "pocono mountain" in s6
q_theme    | N          | increase  | "france world cup 98 reaction" in s28: france world cup 98 reaction stock market → france world cup 98 reaction
+Δq        | Y          | decrease  | 'policy' in s37: Merck lobbyists → Merck lobbyists US policy
+Δq        | N          | increase  | 'US' in s37: Merck lobbyists → Merck lobbyists US policy
−Δq        | Y          | decrease  | 'reaction' in s28: france world cup 98 reaction → france world cup 98
−Δq        | N          | no change | 'legislation' in s32: bollywood legislation → bollywood law

[Guan, Zhang and Yang SIGIR 2013]

Page 56: Dynamic Information Retrieval Tutorial - WSDM 2015

Bellman Equation

In an MDP, a future reward is not worth quite as much as a current reward, so a discount factor γ ∈ (0,1) is applied to future rewards.
The Bellman equation gives the optimal value (the expected long-term reward starting from state s and continuing with policy π from then on) for an MDP:

$V^*(s) = \max_a \left[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \right]$

Page 57: Dynamic Information Retrieval Tutorial - WSDM 2015

Our Tweak

In an MDP, a future reward is not worth quite as much as a current reward, so a discount factor γ ∈ (0,1) is applied to future rewards.
In session search, a past reward is not worth quite as much as a current reward, so a discount factor γ should be applied to past rewards.
We therefore model the MDP for session search in reverse order.

Page 58: Dynamic Information Retrieval Tutorial - WSDM 2015

Query Change retrieval Model (QCM)

The Bellman equation gives the optimal value for an MDP:

$V^*(s) = \max_a \left[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \right]$

The reward function is used as the document relevance score and is adapted backwards from the Bellman equation:

$Score(q_i, d) = P(q_i \mid d) + \gamma\, P(q_i \mid q_{i-1}, D_{i-1}, a)\, \max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$

where $P(q_i \mid d)$ is the current reward (relevance score), $P(q_i \mid q_{i-1}, D_{i-1}, a)$ is the query transition model, and $\max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$ is the maximum past relevance.

[Guan, Zhang and Yang SIGIR 2013]

Page 59: Dynamic Information Retrieval Tutorial - WSDM 2015

Calculating the Transition Model

According to the query change and the search engine actions, the score expands into term-level weight adjustments:

$\begin{aligned} Score(q_i, d) = \log P(q_i \mid d) &+ \alpha \sum_{t \in q_{theme}} \big[1 - P(t \mid d^*_{i-1})\big] \log P(t \mid d) \\ &- \beta \sum_{t \in +\Delta q,\; t \in d^*_{i-1}} P(t \mid d^*_{i-1}) \log P(t \mid d) \\ &+ \epsilon \sum_{t \in +\Delta q,\; t \notin d^*_{i-1}} idf(t) \log P(t \mid d) \\ &- \delta \sum_{t \in -\Delta q} P(t \mid d^*_{i-1}) \log P(t \mid d) \end{aligned}$

The first term is the current reward (relevance score); the remaining terms increase weights for theme terms, decrease weights for old added terms, increase weights for novel added terms, and decrease weights for removed terms.

[Guan, Zhang and Yang SIGIR 2013]

Page 60: Dynamic Information Retrieval Tutorial - WSDM 2015

Maximizing the Reward Function

Generate a maximum-rewarded document, denoted d*_{i−1}, from D_{i−1}; that is, the document most relevant to q_{i−1}. The relevance score is calculated as

$P(q_{i-1} \mid d_{i-1}) = 1 - \prod_{t \in q_{i-1}} \big\{1 - P(t \mid d_{i-1})\big\}$, where $P(t \mid d_{i-1}) = \frac{\#(t,\, d_{i-1})}{|d_{i-1}|}$

From several options, we choose to use only the document with top relevance: $\max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$

[Guan, Zhang and Yang SIGIR 2013]
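A small sketch of these two quantities, assuming documents are given as token lists; the function names are hypothetical:

```python
from collections import Counter

def p_query_given_doc(query_terms, doc_tokens):
    """P(q|d) = 1 - prod_{t in q} (1 - P(t|d)), with P(t|d) = #(t,d)/|d|."""
    counts = Counter(doc_tokens)
    prod = 1.0
    for t in query_terms:
        prod *= 1.0 - counts[t] / len(doc_tokens)
    return 1.0 - prod

def max_past_relevance(prev_query, prev_docs):
    """max over D_{i-1} of P(q_{i-1} | d): the most relevant past document's score."""
    return max(p_query_given_doc(prev_query, d) for d in prev_docs)
```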

Page 61: Dynamic Information Retrieval Tutorial - WSDM 2015

Scoring the Entire Session

The overall relevance score for a session of queries is aggregated recursively:

$\begin{aligned} Score_{session}(q_n, d) &= Score(q_n, d) + \gamma\, Score_{session}(q_{n-1}, d) \\ &= Score(q_n, d) + \gamma \big[ Score(q_{n-1}, d) + \gamma\, Score_{session}(q_{n-2}, d) \big] \\ &= \sum_{i=1}^{n} \gamma^{\,n-i}\, Score(q_i, d) \end{aligned}$

[Guan, Zhang and Yang SIGIR 2013]
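The closed form lends itself to a one-line aggregation; a minimal sketch, where γ = 0.92 is only an illustrative value:

```python
def session_score(per_query_scores, gamma=0.92):
    """Score_session(q_n, d) = sum_{i=1..n} gamma^(n-i) * Score(q_i, d).

    per_query_scores: [Score(q_1, d), ..., Score(q_n, d)] for one document d;
    past queries are discounted, so earlier queries count less.
    """
    n = len(per_query_scores)
    return sum(gamma ** (n - i) * s for i, s in enumerate(per_query_scores, start=1))
```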

Page 62: Dynamic Information Retrieval Tutorial - WSDM 2015

Experiments

TREC 2011–2012 query sets; dataset: ClueWeb09 Category B

Page 63: Dynamic Information Retrieval Tutorial - WSDM 2015

Search Accuracy (TREC 2012)

nDCG@10 (the official metric used in TREC):

Approach               | nDCG@10 | %chg    | MAP    | %chg
Lemur                  | 0.2474  | −21.54% | 0.1274 | −18.28%
TREC'12 median         | 0.2608  | −17.29% | 0.1440 | −7.63%
Our TREC'12 submission | 0.3021  | −4.19%  | 0.1490 | −4.43%
TREC'12 best           | 0.3221  | 0.00%   | 0.1559 | 0.00%
QCM                    | 0.3353  | 4.10%†  | 0.1529 | −1.92%
QCM+Dup                | 0.3368  | 4.56%†  | 0.1537 | −1.41%

Page 64: Dynamic Information Retrieval Tutorial - WSDM 2015

Search Accuracy (TREC 2011)

nDCG@10 (the official metric used in TREC):

Approach               | nDCG@10 | %chg    | MAP    | %chg
Lemur                  | 0.3378  | −23.38% | 0.1118 | −25.86%
TREC'11 median         | 0.3544  | −19.62% | 0.1143 | −24.20%
TREC'11 best           | 0.4409  | 0.00%   | 0.1508 | 0.00%
QCM                    | 0.4728  | 7.24%†  | 0.1713 | 13.59%†
QCM+Dup                | 0.4821  | 9.34%†  | 0.1714 | 13.66%†
Our TREC'12 submission | 0.4836  | 9.68%†  | 0.1724 | 14.32%†

Page 65: Dynamic Information Retrieval Tutorial - WSDM 2015

Search Accuracy for Different Session Types (TREC 2012)

Sessions are classified by:
Product: Factual / Intellectual
Goal quality: Specific / Amorphous

Approach  | Intellectual | %chg   | Amorphous | %chg   | Specific | %chg   | Factual | %chg
TREC best | 0.3369       | 0.00%  | 0.3495    | 0.00%  | 0.3007   | 0.00%  | 0.3138  | 0.00%
Nugget    | 0.3305       | −1.90% | 0.3397    | −2.80% | 0.2736   | −9.01% | 0.2871  | −8.51%
QCM       | 0.3870       | 14.87% | 0.3689    | 5.55%  | 0.3091   | 2.79%  | 0.3066  | −2.29%
QCM+DUP   | 0.3900       | 15.76% | 0.3692    | 5.64%  | 0.3114   | 3.56%  | 0.3072  | −2.10%

QCM better handles sessions that demonstrate evolution and exploration, because it treats a session as a continuous process, studying the changes across query transitions and modeling the dynamics.

Page 66: Dynamic Information Retrieval Tutorial - WSDM 2015

POMDP Model¹

[Diagram: hidden states s0, s1, s2, s3, … with actions a_i and rewards r_i; observations o1, o2, o3 are emitted from the hidden states, and the agent maintains a belief over them]

s_i – hidden state; o_i – observation; belief – a distribution over hidden states

¹R. D. Smallwood et al. '73

Page 67: Dynamic Information Retrieval Tutorial - WSDM 2015

POMDP Definition

A tuple (S, M, A, R, γ, O, Θ, B)
S: state space
M: transition matrix
A: action space
R: reward function
γ: discount factor, 0 < γ ≤ 1
O: observation set; an observation is a symbol emitted according to a hidden state
Θ: observation function; Θ(s, a, o) is the probability that o is observed when the system transitions into state s after taking action a, i.e. P(o | s, a)
B: belief space; a belief is a probability distribution over the hidden states

Page 68: Dynamic Information Retrieval Tutorial - WSDM 2015

A Markov Chain of Decision Making

[Diagram: decision-making states S1, S2, S3, …, Sn with actions A1–A4, over queries q1 = "old US coins", q2 = "collecting old US coins", q3 = "selling old US coins" and result sets D1, D2, D3]

"D1 is relevant and I stay to find out more about collecting…"
"D2 is relevant and I now move to the next topic…"
"D3 is irrelevant; I slightly edit the query and stay here a little longer…"

[Luo, Zhang and Yang SIGIR 2014]

Page 69: Dynamic Information Retrieval Tutorial - WSDM 2015

Hidden Decision Making States

S_RT (Relevant & Exploitation): scooter price → scooter stores
S_RR (Relevant & Exploration): collecting old US coins → selling old US coins
S_NRT (Non-Relevant & Exploitation): Philadelphia NYC travel → Philadelphia NYC train
S_NRR (Non-Relevant & Exploration): Boston tourism → NYC tourism

[Luo, Zhang and Yang SIGIR 2014]

Page 70: Dynamic Information Retrieval Tutorial - WSDM 2015

Dual Agent Stochastic Game

Hidden states, actions, rewards, Markov property

[Diagram: s0 → s1 → s2 → s3 … with actions a_i and rewards r_i]

A dual-agent, cooperative game between the user agent and the search engine agent, with joint optimization.

[Luo, Zhang and Yang SIGIR 2014]

Page 71: Dynamic Information Retrieval Tutorial - WSDM 2015

Actions

User actions (A_u):
add query terms (+Δq)
remove query terms (−Δq)
keep query terms (q_theme)

Search engine actions (A_se):
increase / decrease / keep term weights
switch a search technique on or off, e.g. whether to use query expansion
adjust parameters in search techniques, e.g. select the best k for the top-k docs used in PRF

Messages from the user (Σ_u): clicked documents; SAT-clicked documents
Messages from the search engine (Σ_se): the top-k returned documents
Messages are essentially documents that an agent thinks are relevant.

[Luo, Zhang and Yang SIGIR 2014]

Page 72: Dynamic Information Retrieval Tutorial - WSDM 2015

Dual-agent Stochastic Game

[Diagram: the user agent and the search engine agent interact with the documents (the world) through a belief updater; Σ_se = D_top_returned]

[Luo, Zhang and Yang SIGIR 2014]


Page 75: Dynamic Information Retrieval Tutorial - WSDM 2015

Observation Function (O)

O(s_{t+1}, a_t, ω_t) = P(ω_t | s_{t+1}, a_t): the probability of making observation ω_t after taking action a_t and landing in state s_{t+1}

Two types of observations: relevance-related and exploration/exploitation-related

[Luo, Zhang and Yang SIGIR 2014]

Page 76: Dynamic Information Retrieval Tutorial - WSDM 2015

Relevance-related Observation

Intuition: s_t is likely to be Relevant if ∃ d ∈ D_{t−1} that is SAT-clicked, and Non-Relevant otherwise. This observation happens after the user sends out the message Σ_u^t (clicks).

$O(s_t = Rel, \Sigma_u, \omega_t = Rel) \triangleq P(\omega_t = Rel \mid s_t = Rel, \Sigma_u) \propto P(s_t = Rel \mid \omega_t = Rel)\, P(\omega_t = Rel \mid \Sigma_u)$

Similarly, $O(s_t = NonRel, \Sigma_u, \omega_t = NonRel) \propto P(s_t = NonRel \mid \omega_t = NonRel)\, P(\omega_t = NonRel \mid \Sigma_u)$, as well as the cross terms $O(s_t = NonRel, \Sigma_u, \omega_t = Rel)$ and $O(s_t = Rel, \Sigma_u, \omega_t = NonRel)$.

[Luo, Zhang and Yang SIGIR 2014]

Page 77: Dynamic Information Retrieval Tutorial - WSDM 2015

Exploration-related Observation

This is a combined observation: it happens when updating the before-message belief state for a user action a_u (query change) and a search engine message Σ_se = D_{t−1}.

Intuition: s_t is likely to be Exploration if (+Δq_t ≠ ∅ and +Δq_t ∉ D_{t−1}) or (+Δq_t = ∅ and −Δq_t ≠ ∅), and Exploitation if (+Δq_t ≠ ∅ and +Δq_t ∈ D_{t−1}) or (+Δq_t = ∅ and −Δq_t = ∅).

$O(s_t = Exploitation, a_u = \Delta q_t, \Sigma_{se} = D_{t-1}, \omega_t = Exploitation) \propto P(s_t = Exploitation \mid \omega_t = Exploitation)\, P(\omega_t = Exploitation \mid \Delta q_t, D_{t-1})$

$O(s_t = Exploration, a_u = \Delta q_t, \Sigma_{se} = D_{t-1}, \omega_t = Exploration) \propto P(s_t = Exploration \mid \omega_t = Exploration)\, P(\omega_t = Exploration \mid \Delta q_t, D_{t-1})$

[Luo, Zhang and Yang SIGIR 2014]

Page 78: Dynamic Information Retrieval Tutorial - WSDM 2015

Belief Updates (B)

The belief state b is updated when a new observation is obtained:

$b_{t+1}(s_j) = P(s_j \mid \omega_t, a_t, b_t) = \frac{P(\omega_t \mid s_j, a_t, b_t) \sum_{s_i \in S} P(s_j \mid s_i, a_t, b_t)\, b_t(s_i)}{P(\omega_t \mid a_t, b_t)} = \frac{O(s_j, a_t, \omega_t) \sum_{s_i \in S} P(s_j \mid s_i, a_t, b_t)\, b_t(s_i)}{P(\omega_t \mid a_t, b_t)}$

Page 79: Dynamic Information Retrieval Tutorial - WSDM 2015

Joint Optimization – Win-Win

The long-term reward for the search engine agent:

$Q_{se}(b, a) = \sum_{s \in S} b(s)\, R(s, a) + \gamma \sum_{\omega \in \Omega} P(\omega \mid b, a_u, \Sigma_{se})\, P(\omega \mid b, \Sigma_u) \max_a Q_{se}(b', a)$

The long-term reward for the user agent:

$Q_u(b, a_u) = R(s, a_u) + \gamma \sum_{a_u} T(s_t \mid s_{t-1}, D_{t-1}) \max_{s_{t-1}} Q_u(s_{t-1}, a_u) = P(q_t \mid d) + \gamma \sum_{a_u} P(q_t \mid q_{t-1}, D_{t-1}, a) \max_{D_{t-1}} P(q_{t-1} \mid D_{t-1})$

Joint optimization: $a_{se} = \arg\max_a \big( Q_{se}(b, a) + Q_u(b, a_u) \big)$

[Luo, Zhang and Yang SIGIR 2014]

Page 80: Dynamic Information Retrieval Tutorial - WSDM 2015

Dynamic Search Engine Demo

http://dumplingproject.org


Page 81: Dynamic Information Retrieval Tutorial - WSDM 2015

Experiments

Evaluate on the TREC 2012 and 2013 Session Tracks.
The session logs contain: the session topic, user queries, previously retrieved URLs and snippets, user clicks, dwell time, etc.
Task: retrieve 2,000 documents for the last query in each session.
The evaluation is based on the whole session: a document related to any query in the session is a good document.

Datasets: ClueWeb09 and ClueWeb12 (spam and duplicates removed)

Page 82: Dynamic Information Retrieval Tutorial - WSDM 2015

Actions

increase the weights of the added terms by a factor x ∈ {1.05, 1.10, 1.15, 1.20, 1.25, 1.5, 1.75, 2};
decrease the weights of the added terms by a factor y ∈ {0.5, 0.57, 0.67, 0.8, 0.83, 0.87, 0.9, 0.95};
the Query Change Model (QCM) proposed in Guan et al. SIGIR '13:

$Score(q_i, d) = P(q_i \mid d) + \gamma\, P(q_i \mid q_{i-1}, D_{i-1}, a) \max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$

pseudo-relevance feedback, which assumes the top 20 retrieved documents are relevant;
directly using the query of the current iteration to perform retrieval;
combining all queries in a session, weighting them equally.

Page 83: Dynamic Information Retrieval Tutorial - WSDM 2015

Search Accuracy

Search accuracy on the TREC 2012 Session Track: win-win outperforms most retrieval algorithms on TREC 2012.

Page 84: Dynamic Information Retrieval Tutorial - WSDM 2015

Search Accuracy

Search accuracy on the TREC 2013 Session Track: win-win outperforms all retrieval algorithms on TREC 2013. It is highly effective in session search.

Page 85: Dynamic Information Retrieval Tutorial - WSDM 2015

Immediate Search Accuracy

Original run: the top returned documents provided by the TREC log data.
Win-win's immediate search accuracy is better than the Original at every iteration, and it increases as the number of search iterations increases.

[Figures: TREC 2012 and TREC 2013 Session Tracks]

Page 86: Dynamic Information Retrieval Tutorial - WSDM 2015

Belief Updates (B) – Example

TREC '13 session #87 topic: planning a trip to the United States. You will be there for a month and able to travel within a 150-mile radius of your destination. What are the best cities to visit?

q1 = "best US destinations", observation = NRR
Belief: S_RT (Relevant & Exploitation) 0.1784; S_RR (Relevant & Exploration) 0.1135; S_NRT (Non-Relevant & Exploitation) 0.2838; S_NRR (Non-Relevant & Exploration) 0.4243

Page 87: Dynamic Information Retrieval Tutorial - WSDM 2015

Belief Updates (B) – Example (continued)

q2 = "distance New York Boston", observation = RT
Belief: S_RT 0.0005; S_RR 0.0068; S_NRT 0.0715; S_NRR 0.9212


Page 89: Dynamic Information Retrieval Tutorial - WSDM 2015

Belief Updates (B) – Example (continued)

q3 = "maps.bing.com", observation = NRT
Belief: S_RT 0.0151; S_RR 0.4347; S_NRT 0.0276; S_NRR 0.5226


Page 91: Dynamic Information Retrieval Tutorial - WSDM 2015

Belief Updates (B) – Example (continued)

… q20 = "Philadelphia NYC train", observation = NRT
Belief: S_RT 0.0291; S_RR 0.7837; S_NRT 0.0081; S_NRR 0.1790


Page 93: Dynamic Information Retrieval Tutorial - WSDM 2015

Belief Updates (B) – Example (continued)

… q21 = "Philadelphia NYC bus", observation = NRT
Belief: S_RT 0.0304; S_RR 0.8126; S_NRT 0.0066; S_NRR 0.1505


Page 95: Dynamic Information Retrieval Tutorial - WSDM 2015

Coffee Break


Page 96: Dynamic Information Retrieval Tutorial - WSDM 2015

Apply an MDP to an IR Problem – Example

The user agent in session search:
States – the user's relevance judgments
Actions – new queries
Rewards – information gained

[Luo, Zhang, Yang SIGIR’14]

Page 97: Dynamic Information Retrieval Tutorial - WSDM 2015

POMDP → Belief Update

The agent uses a state estimator to update its belief about the hidden states: $b' = SE(b, a, o')$

$b'(s') = P(s' \mid o', a, b) = \frac{P(s', o' \mid a, b)}{P(o' \mid a, b)} = \frac{\Theta(s', a, o') \sum_{s} M(s, a, s')\, b(s)}{P(o' \mid a, b)}$
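A minimal numpy sketch of this state estimator, under assumed array shapes (nothing here is a fixed API from the tutorial):

```python
import numpy as np

def belief_update(b, a, o, M, Theta):
    """b'(s') = Theta(s', a, o) * sum_s M(s, a, s') b(s) / P(o | a, b).

    b: belief over states, shape (|S|,)
    M: transitions, M[s, a, s'] = P(s' | s, a), shape (|S|, |A|, |S|)
    Theta: observation function, Theta[s', a, o] = P(o | s', a)
    """
    predicted = b @ M[:, a, :]                 # sum_s M(s, a, s') b(s)
    unnormalized = Theta[:, a, o] * predicted
    return unnormalized / unnormalized.sum()   # normalize by P(o | a, b)
```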

Page 98: Dynamic Information Retrieval Tutorial - WSDM 2015

POMDP → Bellman Equation

The Bellman equation for a POMDP:

$V(b) = \max_a \left[ r(b, a) + \gamma \sum_{o'} P(o' \mid a, b)\, V(b') \right]$

A POMDP can be transformed into a continuous belief MDP (B, M′, A, r, γ):
B: the continuous belief space
M′: transition function $M'_a(b, b') = \sum_{o' \in O} \mathbf{1}_{a,o'}(b', b)\, P(o' \mid a, b)$, where $\mathbf{1}_{a,o'}(b', b) = 1$ if $SE(b, a, o') = b'$ and 0 otherwise
A: action space
r: reward function $r(b, a) = \sum_{s \in S} b(s)\, R(s, a)$

Page 99: Dynamic Information Retrieval Tutorial - WSDM 2015

Applying POMDP to Dynamic IR

POMDP element → Dynamic IR:
Environment – documents
Agents – the user and the search engine
States – queries, the user's decision-making status, the relevance of documents, etc.
Actions – provide a ranking of documents; weigh terms in the query; add/remove/keep query terms; switch a search technology on or off; adjust parameters of a search technology
Observations – queries, clicks, document lists, snippets, terms, etc.
Rewards – evaluation measures (such as DCG, nDCG or MAP); clicking information
Transition matrix – given in advance or estimated from training data
Observation function – problem-dependent; estimated from sample datasets

Page 100: Dynamic Information Retrieval Tutorial - WSDM 2015

Session Search Example – States

S_RT (Relevant & Exploitation): scooter price → scooter stores
S_RR (Relevant & Exploration): Hartford visitors → Hartford Connecticut tourism
S_NRT (Non-Relevant & Exploitation): Philadelphia NYC travel → Philadelphia NYC train
S_NRR (Non-Relevant & Exploration): distance New York Boston → maps.bing.com

[J. Luo et al., '14]

Page 101: Dynamic Information Retrieval Tutorial - WSDM 2015

Session Search Example – Actions (A_u, A_se)

User actions (A_u): add query terms (+Δq); remove query terms (−Δq); keep query terms (q_theme); clicked documents; SAT-clicked documents

Search engine actions (A_se): increase/decrease/keep term weights; switch query expansion on or off; adjust the number of top documents used in PRF; etc.

[J. Luo et al., '14]

Page 102: Dynamic Information Retrieval Tutorial - WSDM 2015

TREC Session Tracks (2010–2012)

Given a series of queries {q1, q2, …, qn}, the top-10 retrieval results {D1, …, Dn−1} for q1 to qn−1, and click information,
the task is to retrieve a list of documents for the current/last query, qn.
Relevance judgment is based on how relevant the documents are for qn, and how relevant they are to the information need of the entire session (given in the topic description).
There is no need to segment the sessions.

Page 103: Dynamic Information Retrieval Tutorial - WSDM 2015

Query change is an important form of feedback

We define query change as the syntactic editing change between two adjacent queries:

$\Delta q_i = q_i - q_{i-1}$

It includes $+\Delta q_i$, the added terms, and $-\Delta q_i$, the removed terms. The unchanged/shared terms are called the theme terms, $q_{theme}$.

Example: q1 = "bollywood legislation", q2 = "bollywood law"
Theme term = "bollywood"; added (+Δq) = "law"

Page 104: Dynamic Information Retrieval Tutorial - WSDM 2015

Where do these query changes come from?

Given the TREC Session settings, we consider two sources of query change:
the previous search results that the user viewed/read/examined
the information need

Example: Kurosawa → Kurosawa wife
'wife' is not in any previous results, but it is in the topic description.
However, knowing the information need before the search is difficult to achieve.

Page 105: Dynamic Information Retrieval Tutorial - WSDM 2015

Previous search results can influence query change in quite complex ways

Merck lobbyists → Merck lobbying US policy

D1 contains several mentions of 'policy', such as "A lobbyist who until 2004 worked as senior policy advisor to Canadian Prime Minister Stephen Harper was hired last month by Merck …"
These mentions are about Canadian policies, while the user adds "US policy" in q2.
Our guess is that the user might be inspired by 'policy' but prefers a different sub-concept than 'Canadian policy'.
Therefore, within the added terms "US policy", 'US' is the novel term and 'policy' is not, since it appeared in D1. The two terms should be treated differently.

Page 106: Dynamic Information Retrieval Tutorial - WSDM 2015

POMDP (Partially Observable Markov Decision Process)

Rich interactions → actions
Hidden, evolving information needs → hidden states
A long-term goal → rewards
Temporal dependency → Markov property
Multi-agent collaboration → SG (stochastic games)

Page 107: Dynamic Information Retrieval Tutorial - WSDM 2015

Recap – Characteristics of Dynamic IR

Rich interactions: query formulation, document clicks, document examination, eye movements, mouse movements, etc.
Temporal dependency
Overall goal

Page 108: Dynamic Information Retrieval Tutorial - WSDM 2015

Modeling Query Change

A framework inspired by reinforcement learning.
Reinforcement learning for a Markov decision process:
models a state space S and an action space A according to a transition model T = P(s_{i+1} | s_i, a_i);
a policy π(s) = a indicates which action a the agent can take at state s;
each state is associated with a reward function R that indicates the possible positive reward or negative loss that a state and an action may yield.
Reinforcement learning offers general solutions to MDPs and seeks the best policy for an agent.

Page 109: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Session Search

Dynamic Ranking

Multi Armed Bandits

Portfolio Ranking

Multi-Page Search

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Page 110: Dynamic Information Retrieval Tutorial - WSDM 2015


Markov Process

Hidden Markov Model

Markov Decision Process

Partially Observable Markov Decision Process

Multi-Armed Bandit

Family of Markov Models

Page 111: Dynamic Information Retrieval Tutorial - WSDM 2015

Multi Armed Bandits (MAB)

[Illustration: a row of slot machines]

"Which slot machine should I select in this round?" → reward

Page 112: Dynamic Information Retrieval Tutorial - WSDM 2015

Multi Armed Bandits (MAB)

"I won! Is this the best slot machine?" → reward

Page 113: Dynamic Information Retrieval Tutorial - WSDM 2015

MAB Definition

A tuple (S, A, R, B)
S: the hidden reward distribution of each bandit
A: choose which bandit to play
R: the reward for playing a bandit
B: the belief space, our estimate of each bandit's distribution

Page 114: Dynamic Information Retrieval Tutorial - WSDM 2015

Comparison with Markov Models

A single-state Markov decision process
No transition probability
Similar to a POMDP in that we maintain a belief state
Action = choose a bandit; it does not affect the state
Does not 'plan ahead' but intelligently adapts
Somewhere between interactive and dynamic IR

Page 115: Dynamic Information Retrieval Tutorial - WSDM 2015

MAB Policy Reward

A MAB algorithm describes a policy π for choosing bandits:
maximise the rewards from the chosen bandits over all time steps;
minimize the regret, the cumulative difference between the optimal reward and the actual reward:

$\sum_{t=1}^{T} \big[ Reward(a^*) - Reward(a_{\pi(t)}) \big]$

Page 116: Dynamic Information Retrieval Tutorial - WSDM 2015

Exploration vs Exploitation

Exploration: try out bandits to find which has the highest average reward; too much exploration leads to poor performance
Exploitation: play bandits that are known to pay out a higher reward on average
MAB algorithms balance exploration and exploitation:
start by exploring more to find the best bandits, then exploit more as the best bandits become known

Page 117: Dynamic Information Retrieval Tutorial - WSDM 2015

MAB – Index Algorithms

Gittins index¹: play the bandit with the highest 'dynamic allocation index'; modelled using an MDP but suffers from the 'curse of dimensionality'

ε-greedy²: play the highest-reward bandit with probability 1 − ε, and a random bandit with probability ε (see the sketch below)

UCB (Upper Confidence Bound)³

¹J. C. Gittins '89; ²Nicolò Cesa-Bianchi et al. '98; ³P. Auer et al. '02
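A minimal sketch of the ε-greedy rule; ε = 0.1 is an illustrative default:

```python
import random

def epsilon_greedy(avg_rewards, epsilon=0.1):
    """Exploit the best-looking bandit with probability 1 - epsilon, else explore."""
    if random.random() < epsilon:
        return random.randrange(len(avg_rewards))                      # explore
    return max(range(len(avg_rewards)), key=avg_rewards.__getitem__)   # exploit
```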

Page 118: Dynamic Information Retrieval Tutorial - WSDM 2015

Comparison of Markov

Models

Markov Process – a fully observable stochastic process
Hidden Markov Model – a partially observable stochastic process
MDP – a fully observable decision process
MAB – a decision process, either fully or partially observable
POMDP – a partially observable decision process

Model               | actions | rewards | states
Markov Process      | No      | No      | Observable
Hidden Markov Model | No      | No      | Unobservable
MDP                 | Yes     | Yes     | Observable
POMDP               | Yes     | Yes     | Unobservable
MAB                 | Yes     | Yes     | Fixed

Page 119: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Session Search

Dynamic Ranking

Multi Armed Bandits

Portfolio Ranking

Multi-Page Search

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Page 120: Dynamic Information Retrieval Tutorial - WSDM 2015

UCB Algorithm

$\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$

Page 121: Dynamic Information Retrieval Tutorial - WSDM 2015

UCB Algorithm

$\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$

Calculate for all $i$ and select the highest

Page 122: Dynamic Information Retrieval Tutorial - WSDM 2015

UCB Algorithm

$\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$

Calculate for all $i$ and select the highest
Average reward $\bar{x}_i$

Page 123: Dynamic Information Retrieval Tutorial - WSDM 2015

UCB Algorithm

$\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$

Calculate for all $i$ and select the highest
Average reward $\bar{x}_i$; time step $t$

Page 124: Dynamic Information Retrieval Tutorial - WSDM 2015

UCB Algorithm

$\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$

Calculate for all $i$ and select the highest
Average reward $\bar{x}_i$; time step $t$; number of times bandit $i$ has been played, $T_i$

Page 125: Dynamic Information Retrieval Tutorial - WSDM 2015

UCB Algorithm

$\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$

Calculate for all $i$ and select the highest
Average reward $\bar{x}_i$; time step $t$; number of times bandit $i$ has been played, $T_i$
The chance of playing infrequently played bandits increases over time
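Putting the pieces together, a minimal UCB1 selection sketch (the list-based bookkeeping is an assumption of this example):

```python
import math

def ucb1_select(avg_rewards, plays, t):
    """Pick argmax_i [ x_bar_i + sqrt(2 ln t / T_i) ].

    avg_rewards[i]: empirical mean reward of bandit i
    plays[i]: number of times bandit i has been played (T_i)
    """
    def score(i):
        if plays[i] == 0:
            return float('inf')    # ensure every bandit is tried once first
        return avg_rewards[i] + math.sqrt(2.0 * math.log(t) / plays[i])
    return max(range(len(avg_rewards)), key=score)
```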

Page 126: Dynamic Information Retrieval Tutorial - WSDM 2015

Iterative Expectation

$\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$

M. Sloan and J. Wang '13

Page 127: Dynamic Information Retrieval Tutorial - WSDM 2015

UCB Algorithm

$\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$

Documents $i$

M. Sloan and J. Wang '13

Page 128: Dynamic Information Retrieval Tutorial - WSDM 2015

Iterative Expectation

$\bar{r}_i + \sqrt{\frac{2 \ln t}{T_i}}$

Documents $i$; average probability of relevance $\bar{r}_i$

M. Sloan and J. Wang '13

Page 129: Dynamic Information Retrieval Tutorial - WSDM 2015

Iterative Expectation

$\bar{r}_i + \sqrt{\frac{2 \ln t}{\gamma_i(t)}}$

Documents $i$; average probability of relevance $\bar{r}_i$; 'effective' number of impressions

$\gamma_i(t) = \sum_{k=1}^{t} \alpha^{C_k} \beta^{1-C_k}$

$\alpha$ and $\beta$ reward clicks and non-clicks depending on rank

M. Sloan and J. Wang '13

Page 130: Dynamic Information Retrieval Tutorial - WSDM 2015

Iterative Expectation

$\bar{r}_i + \lambda \sqrt{\frac{2 \ln t}{\gamma_i(t)}}$

Documents $i$; average probability of relevance $\bar{r}_i$; 'effective' number of impressions

$\gamma_i(t) = \sum_{k=1}^{t} \alpha^{C_k} \beta^{1-C_k}$

$\alpha$ and $\beta$ reward clicks and non-clicks depending on rank; exploration parameter $\lambda$

M. Sloan and J. Wang '13
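A sketch of this document-level score, simplifying the rank-dependent α and β to scalars for readability:

```python
import math

def iterative_expectation_score(r_bar, clicks, alpha, beta, t, lam=1.0):
    """r_bar + lambda * sqrt(2 ln t / gamma(t)), with effective impressions
    gamma(t) = sum_k alpha^C_k * beta^(1 - C_k) over past click outcomes C_k."""
    gamma_t = sum(alpha ** c * beta ** (1 - c) for c in clicks)
    return r_bar + lam * math.sqrt(2.0 * math.log(t) / gamma_t)
```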

Page 131: Dynamic Information Retrieval Tutorial - WSDM 2015

Portfolio Theory of IR

Portfolio theory maximises the expected return for a given amount of risk¹
Diversity of the portfolio increases the likely return
We can consider documents as 'shares'
Documents are dependent on one another, unlike in the PRP
The portfolio theory of IR² allows us to introduce diversity

¹H. Markowitz '52; ²J. Wang et al. '09

Page 132: Dynamic Information Retrieval Tutorial - WSDM 2015

Portfolio Ranking

Documents are dependent on each other
Co-click matrix from users and logs¹
Portfolio armed bandit ranking²:
exploratively rank using iterative expectation;
diversify using portfolio optimisation over the co-click matrix;
update relevance and dependence with each click.
Both explorative and diverse

¹W. Wu et al. '11; ²M. Sloan and J. Wang '12

Page 133: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Session Search

Dynamic Ranking

Multi Armed Bandits

Portfolio Ranking

Multi-Page Search

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Page 134: Dynamic Information Retrieval Tutorial - WSDM 2015

Multi Page Search

[Figure: rankings over Page 1 and Page 2]

X. Jin, M. Sloan and J. Wang '13

Page 135: Dynamic Information Retrieval Tutorial - WSDM 2015

Multi Page Search Example – States & Actions

State: the relevance of the documents
Action: a ranking of the documents
Observation: clicks
Belief: a multivariate Gaussian
Reward: DCG over the two pages

X. Jin, M. Sloan and J. Wang '13

Page 136: Dynamic Information Retrieval Tutorial - WSDM 2015

Model


Page 137: Dynamic Information Retrieval Tutorial - WSDM 2015

Model

$N(\theta^1, \Sigma^1)$

$\theta^1$ – prior estimate of relevance
$\Sigma^1$ – prior estimate of covariance (document similarity, topic clustering)

Page 138: Dynamic Information Retrieval Tutorial - WSDM 2015

Model


Rank action for page 1

Page 139: Dynamic Information Retrieval Tutorial - WSDM 2015

Model


Page 140: Dynamic Information Retrieval Tutorial - WSDM 2015

Model

Feedback from page 1: $\boldsymbol{r} \sim N(\theta^1_{s}, \Sigma^1_{s})$

Page 141: Dynamic Information Retrieval Tutorial - WSDM 2015

Model

Update the estimates using $\boldsymbol{r}^1$. Partition the prior as

$\theta^1 = \begin{pmatrix} \theta_{\backslash s'} \\ \theta_{s'} \end{pmatrix}, \quad \Sigma^1 = \begin{pmatrix} \Sigma_{\backslash s'} & \Sigma_{\backslash s', s'} \\ \Sigma_{s', \backslash s'} & \Sigma_{s'} \end{pmatrix}$

then condition on the observed feedback:

$\theta^2 = \theta_{\backslash s'} + \Sigma_{\backslash s', s'} \Sigma_{s'}^{-1} (\boldsymbol{r}^1 - \theta_{s'})$
$\Sigma^2 = \Sigma_{\backslash s'} - \Sigma_{\backslash s', s'} \Sigma_{s'}^{-1} \Sigma_{s', \backslash s'}$
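The update is the standard multivariate-Gaussian conditioning; a numpy sketch under assumed shapes:

```python
import numpy as np

def gaussian_feedback_update(theta, Sigma, shown, r):
    """Condition N(theta, Sigma) on observed relevance r of the shown documents.

    shown: indices of the documents ranked on page 1 (the s' block)
    r: observed relevance feedback for those documents
    Returns (theta2, Sigma2) for the remaining documents.
    """
    rest = np.setdiff1d(np.arange(len(theta)), shown)
    S_ss = Sigma[np.ix_(shown, shown)]   # Sigma_{s'}
    S_rs = Sigma[np.ix_(rest, shown)]    # Sigma_{\s', s'}
    gain = S_rs @ np.linalg.inv(S_ss)
    theta2 = theta[rest] + gain @ (r - theta[shown])
    Sigma2 = Sigma[np.ix_(rest, rest)] - gain @ S_rs.T
    return theta2, Sigma2
```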

Page 142: Dynamic Information Retrieval Tutorial - WSDM 2015

Model


Rank using PRP

Page 143: Dynamic Information Retrieval Tutorial - WSDM 2015

Model

The utility of a ranking (two-page DCG):

$U = \lambda \sum_{j=1}^{M} \frac{\theta^1_{s_j}}{\log_2(j+1)} + (1 - \lambda) \sum_{j=M+1}^{2M} \frac{\theta^2_{s_j}}{\log_2(j+1)}$
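A direct transcription of this utility, assuming the per-page relevance estimates are given in display order:

```python
import math

def two_page_utility(theta1_ranked, theta2_ranked, lam=0.5):
    """U = lam * sum_{j=1..M} theta1[j]/log2(j+1)
         + (1-lam) * sum_{j=M+1..2M} theta2[j]/log2(j+1)."""
    M = len(theta1_ranked)
    page1 = sum(th / math.log2(j + 1) for j, th in enumerate(theta1_ranked, start=1))
    page2 = sum(th / math.log2(j + 1) for j, th in enumerate(theta2_ranked, start=M + 1))
    return lam * page1 + (1 - lam) * page2
```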

Page 144: Dynamic Information Retrieval Tutorial - WSDM 2015

Model – Bellman Equation

Optimize $s^1$ to improve $U_{s^2}$:

$V(\theta^1, \Sigma^1, 1) = \max_{s^1} \left[ \lambda\, \theta_{s^1} \cdot W^1 + E\big( V(\theta^2, \Sigma^2, 2) \big) \right]$

Page 145: Dynamic Information Retrieval Tutorial - WSDM 2015

$\lambda$

Balances exploration and exploitation on page 1
Tuned for different queries: navigational vs. informational
$\lambda = 1$ for non-ambiguous searches

Page 146: Dynamic Information Retrieval Tutorial - WSDM 2015

Approximation

Approximation by Monte Carlo sampling:

$\approx \max_{s^1} \left[ \lambda\, \theta_{s^1} \cdot W^1 + \max_{s^2} (1 - \lambda) \frac{1}{S} \sum_{\boldsymbol{r} \in O} \theta_{s^2} \cdot W^2\, P(\boldsymbol{r}) \right]$

A sequential ranking decision

Page 147: Dynamic Information Retrieval Tutorial - WSDM 2015

Experiment Data

Difficult to evaluate without access to live users
Simulated using 3 TREC collections and their relevance judgements:
WT10G – explicit ratings
TREC8 – clickthroughs
Robust – difficult (ambiguous) searches

Page 148: Dynamic Information Retrieval Tutorial - WSDM 2015

User Simulation

Rank M documents
Simulate user clicks according to the relevance judgements
Update the page-2 ranking
Measure at pages 1 and 2: Recall, Precision, nDCG, MRR
BM25 – the prior ranking model

Page 149: Dynamic Information Retrieval Tutorial - WSDM 2015

Investigating λ


Page 150: Dynamic Information Retrieval Tutorial - WSDM 2015

Baselines

$\lambda$ determined experimentally
BM25
BM25 with conditional update ($\lambda = 1$)
Maximum Marginal Relevance (MMR): diversification
MMR with conditional update
Rocchio: relevance feedback

Page 151: Dynamic Information Retrieval Tutorial - WSDM 2015

Results

[Results figures]

Page 156: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Session Search

Dynamic Ranking

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Page 157: Dynamic Information Retrieval Tutorial - WSDM 2015

Cold-start problem in recommender systems

Page 158: Dynamic Information Retrieval Tutorial - WSDM 2015

Interactive Recommender Systems

Page 159: Dynamic Information Retrieval Tutorial - WSDM 2015

Possible Solutions

Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. "Interactive collaborative filtering." CIKM, 2013.

Page 160: Dynamic Information Retrieval Tutorial - WSDM 2015

Objective

Address the cold-start problem with an interactive mechanism for collaborative filtering (CF)

Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. "Interactive collaborative filtering." CIKM, 2013.

Page 161: Dynamic Information Retrieval Tutorial - WSDM 2015

Proposed EE (exploration/exploitation) algorithms

Thompson Sampling

Linear-UCB

General Linear-UCB

Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. "Interactive collaborative filtering." CIKM, 2013.
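As one concrete illustration, a sketch of the Thompson Sampling idea under a Gaussian probabilistic-matrix-factorization posterior (an illustration of the approach, not the paper's exact algorithm; `Q`, `sigma2`, and `feedback` are assumptions):

```python
import numpy as np

def thompson_step(mu, Sigma, Q, rated, feedback, sigma2=1.0, rng=None):
    """One interactive-CF round: sample a user latent vector from its
    posterior, recommend the best unrated item, then update the
    posterior with the observed rating (Bayesian linear regression).
    Q: pretrained item latent vectors, shape (n_items, d)."""
    rng = rng or np.random.default_rng()
    p = rng.multivariate_normal(mu, Sigma)   # explore via sampling
    scores = Q @ p
    scores[list(rated)] = -np.inf            # never repeat an item
    i = int(np.argmax(scores))
    r = feedback(i)                          # observed rating for item i
    P = np.linalg.inv(Sigma)                 # prior precision
    P_new = P + np.outer(Q[i], Q[i]) / sigma2
    Sigma_new = np.linalg.inv(P_new)
    mu_new = Sigma_new @ (P @ mu + Q[i] * r / sigma2)
    return mu_new, Sigma_new, i
```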

Page 162: Dynamic Information Retrieval Tutorial - WSDM 2015

Cold-start users

Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. "Interactive collaborative filtering." CIKM, 2013.

Page 163: Dynamic Information Retrieval Tutorial - WSDM 2015

Ad selection problem


How can online publishers optimally select ads to maximize their ad income over time?

Sequential selection of Correlated Ads by POMDPs. Shuai Yuan, Jun Wang. CIKM 2012.

Selling in multiple channels with non-fixed prices

Page 164: Dynamic Information Retrieval Tutorial - WSDM 2015


Problem formulation

Sequential selection of Correlated Ads by POMDPs. Shuai Yuan, Jun Wang. CIKM 2012.

Page 165: Dynamic Information Retrieval Tutorial - WSDM 2015

Problem formulation


Sequential selection of Correlated Ads by POMDPs. Shuai Yuan, Jun Wang. CIKM 2012.

Page 166: Dynamic Information Retrieval Tutorial - WSDM 2015

Objective function


Sequential selection of Correlated Ads by POMDPs. Shuai Yuan, Jun Wang. CIKM 2012.

Page 167: Dynamic Information Retrieval Tutorial - WSDM 2015

Belief update


Sequential selection of Correlated Ads by POMDPs. Shuai Yuan, Jun Wang. CIKM 2012.
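The belief update here is the standard POMDP one; a generic discrete sketch (in the ad-selection model the state would encode the correlated ads' click behavior, which this sketch leaves abstract):

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """b'(s') ∝ O[a][s', o] * sum_s T[a][s, s'] * b(s).
    T[a]: |S| x |S| transition matrix for action a,
    O[a]: |S| x |O| observation probabilities."""
    b_pred = T[a].T @ b           # predicted next-state distribution
    b_new = O[a][:, o] * b_pred   # weight by likelihood of observation o
    return b_new / b_new.sum()    # renormalize
```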

Page 168: Dynamic Information Retrieval Tutorial - WSDM 2015

Results


Sequential selection of Correlated Ads by POMDPs. Shuai Yuan, Jun Wang. CIKM 2012.

Page 169: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Session Search

Dynamic Ranking

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Page 170: Dynamic Information Retrieval Tutorial - WSDM 2015

Dynamic Information Retrieval Evaluation

Guest talk at the WSDM 2015 tutorial on Dynamic Information Retrieval Modeling

Charlie Clarke

(with much much input from Mark Smucker)

University of Waterloo, Canada

Page 171: Dynamic Information Retrieval Tutorial - WSDM 2015

Moving from static ranking to dynamic domains

• How to extend IR evaluation methodologies to

dynamic domains?

• Three key ideas:

1. Realistic models of searcher interactions

2. Measure costs to the searcher in meaningful units (e.g., time, money, …)

3. Measure benefits to the searcher in meaningful units (e.g., time, nuggets, …)


This talk strongly reflects my opinions (not trying to be neutral). But I am the guest speaker.

Page 172: Dynamic Information Retrieval Tutorial - WSDM 2015

Evaluating Information Access Systems


searching, browsing, summarization, visualization, desktop, mobile, web, books, images, questions, etc., and combinations of these

Does the system work for its users?

Will this change make the system better or worse?

How do we quantify performance?

Page 173: Dynamic Information Retrieval Tutorial - WSDM 2015

Performance 101: Is this a good search result?


Page 174: Dynamic Information Retrieval Tutorial - WSDM 2015

How to evaluate?

Study users


Users in the wild:

• A/B Testing

• Result interleaving

• Clicks and dwell time

• Mouse movements

• Other implicit feedback

• …

Users in the lab:

• Time to task completion

• Think aloud protocols

• Questionnaires

• Eye tracking

• …

Page 175: Dynamic Information Retrieval Tutorial - WSDM 2015

Unfortunately user studies are

• Slow

• Expensive

• Conditions can never be exactly duplicated (e.g., learning to rank)


Page 176: Dynamic Information Retrieval Tutorial - WSDM 2015

Alternative: User performance prediction

Can we predict the impact of a proposed change to an information access system (while respecting and reflecting differences between users)?

Can we quantify performance improvements in meaningful units so that effect sizes can be considered in statistical testing? Are improvements practically significant, as well as statistically significant?

Want to predict the impact of a proposed change automatically, based on existing user performance data, rather than gathering new performance data.


The BIG goal

Page 177: Dynamic Information Retrieval Tutorial - WSDM 2015

Traditional Evaluation of Rankers

• Test collection:

– Documents

– Queries

– Relevance judgments

• Each ranker generates a ranked list of documents for each query

• Score ranked lists using relevance judgments and standard metrics (recall, mean average precision, nDCG, ERR, RBP, …).


Page 178: Dynamic Information Retrieval Tutorial - WSDM 2015


Example of a good-old-fashioned IR Metric

Ranked list of documents, with precision at rank N:

1. Non-relevant   0.00
2. Relevant       0.50
3. Non-relevant   0.33
4. Non-relevant   0.25
5. Relevant       0.40
6. Non-relevant   0.33
7. Non-relevant   0.29
8. …

Precision at rank N is the fraction of documents that are relevant in the first N documents.

Average Precision (AP) is the average of the precision at N for each relevant document:

$$AP = \frac{1}{R} \sum_{i} \text{Prec}(R_i)$$

where $R_i$ is the rank of the $i$-th of the $R$ relevant documents.

Mean average precision (MAP) is AP averaged over the set of queries.
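A few lines of Python reproduce the worked example above (assuming, for simplicity, that all R relevant documents appear in the ranked list):

```python
def average_precision(rels):
    """rels: 0/1 relevance down the ranked list."""
    hits, total = 0, 0.0
    for n, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / n    # precision at this relevant rank
    return total / hits if hits else 0.0

# The ranked list above: ranks 2 and 5 are relevant
print(average_precision([0, 1, 0, 0, 1, 0, 0]))  # (0.50 + 0.40) / 2 = 0.45
```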

Page 179: Dynamic Information Retrieval Tutorial - WSDM 2015

General form of effectiveness measures

Nearly all standard effectiveness measures

have the same basic form (including nDCG,

RBP, ERR, average precision,…):

Charles Clarke, University of Waterloo 179

Normalization

Rank Gain at rank k

Discount

factor
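A tiny sketch of this shared form; the two discounts below are the standard DCG and RBP ones, with an illustrative persistence parameter p:

```python
import math

def effectiveness(gains, discount, norm=1.0):
    """(1/norm) * sum over ranks k of discount(k) * gain(k)."""
    return sum(discount(k) * g for k, g in enumerate(gains, start=1)) / norm

dcg_discount = lambda k: 1.0 / math.log2(k + 1)         # DCG/nDCG
rbp_discount = lambda k, p=0.8: (1 - p) * p ** (k - 1)  # rank-biased precision
```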

Page 180: Dynamic Information Retrieval Tutorial - WSDM 2015

Implicit user model…

• User works down the ranked list spending equal time on each document. Captions, navigation, etc., have no impact.

• If they make it to rank i, they receive some benefit (i.e., gain).

• Eventually they stop, which is reflected in the discount (i.e., they are less likely to reach lower ranks).

• Normalization typically maps the score into the range [0:1]. Units may not be meaningful.


Page 181: Dynamic Information Retrieval Tutorial - WSDM 2015

Traditional Evaluation of Rankers

• Many effectiveness measures: precision, recall, average precision, rank-biased precision, discounted cumulative gain, etc.

• Widely used and accepted as standard practice.

• But…

• What does an improvement in average precision from 0.28 to 0.31 mean to users?

• Does an increase in the measure really translate to an improved user experience?

• How will an improvement in the performance of a single component impact overall system performance?


Page 182: Dynamic Information Retrieval Tutorial - WSDM 2015

How to better reflect user variation and system performance?


Example: What's the simplest possible user interface for search?

1) User issues a query
2) System returns material to read

i.e., the system returns stuff to read, in order (not a list of documents; more like a newspaper article)

A correspondingly simple user model has two parameters:

1) Reading speed
2) Time spent reading

Page 183: Dynamic Information Retrieval Tutorial - WSDM 2015

Reading speed distribution (from users in the lab)


Empirical distribution of reading speed during an information access task, and its fit to a log-normal distribution.

Page 184: Dynamic Information Retrieval Tutorial - WSDM 2015

Stopping time distribution (from users in the wild)


Empirical distribution of time spent searching during an information access task, and its fit to a log-normal distribution.

Page 185: Dynamic Information Retrieval Tutorial - WSDM 2015

Evaluating a search result


1) Generate a reading speed from the distribution

2) Generate a stopping time from the distribution

3) How much useful material did the user read?

4) Repeat for many (simulated) users

As an example, we use passage retrieval runs from the TREC 2006 HARD Track, which essentially assume our simple user interface.

We measure costs to the searcher in terms of time spent searching. We measure benefits to the searcher in terms of "time well spent".
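A hedged NumPy sketch of steps 1-4; the log-normal parameters here are placeholders, not the calibrated fits shown on the previous slides:

```python
import numpy as np

def simulate_users(doc_lengths, useful, n_users=10_000, rng=None):
    """Draw a reading speed and a stopping time per simulated user from
    log-normal distributions, then count useful characters read.
    `useful` flags which passages contain useful material."""
    rng = rng or np.random.default_rng(0)
    speed = rng.lognormal(mean=3.0, sigma=0.5, size=n_users)  # chars/sec (placeholder)
    stop = rng.lognormal(mean=5.0, sigma=1.0, size=n_users)   # seconds (placeholder)
    budget = speed * stop                        # characters each user reads
    bounds = np.cumsum(doc_lengths)
    gain = np.zeros(n_users)
    for lo, hi, u in zip(np.r_[0, bounds[:-1]], bounds, useful):
        read = np.clip(budget - lo, 0, hi - lo)  # chars read in this passage
        gain += np.where(u, read, 0)             # count only useful text
    return gain                                   # "useful characters read"
```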

Page 186: Dynamic Information Retrieval Tutorial - WSDM 2015

Useful characters read vs. Characters read


Performance of run york04ha1 on TREC 2004 HARD Track topic 424 ("Bollywood") with 10,000 simulated users.

Page 187: Dynamic Information Retrieval Tutorial - WSDM 2015

Useful characters read vs. Time spent reading


Performance of run york04ha1 on TREC 2004 HARD Track topic 424 ("Bollywood") with 10,000 simulated users.

Page 188: Dynamic Information Retrieval Tutorial - WSDM 2015

Time well spent vs. Time spent reading


Performance of run york04ha1 on TREC 2004 HARD Track topic 424 ("Bollywood") with 10,000 simulated users.

Page 189: Dynamic Information Retrieval Tutorial - WSDM 2015

Distribution of time well spent


Performance of run york04ha1 on TREC 2004 HARD Track topic 424 ("Bollywood") with 10,000 simulated users.

Page 190: Dynamic Information Retrieval Tutorial - WSDM 2015

Temporal precision vs. Time spent reading


Performance of run york04ha1 on TREC 2004 HARD Track topic 424 ("Bollywood") with 10,000 simulated users.

Page 191: Dynamic Information Retrieval Tutorial - WSDM 2015

Distribution of temporal precision


Performance of run york04ha1 on TREC 2004 HARD Track topic 424 ("Bollywood") with 10,000 simulated users.

Page 192: Dynamic Information Retrieval Tutorial - WSDM 2015

General Framework (Part I): Cumulative Gain

• Consider the performance of a system in terms of a cost-benefit (cumulative gain) curve G(t).
  – Measure costs (e.g., in terms of time spent).
  – Measure benefits (e.g., in terms of time well spent).

• A particular instance of G(t) represents a single user (described by a set of parameters) interacting with a system – not just a list!

• G(t) captures factors intrinsic to the system. We don't know how much time the user has to invest, but for different levels of investment, G(t) indicates the benefit.

Page 193: Dynamic Information Retrieval Tutorial - WSDM 2015

General Framework (Part II): Decay

• Consider the user's willingness to invest time in terms of a decay curve D(t), which provides a survival probability.

• We assume that G(t) and D(t) are independent. (System-dependent stopping probabilities are accommodated in G(t). Details on request.)

• D(t) captures factors extrinsic to the system. The user only has so much time they could invest. They cannot invest more, even if they would receive substantial additional benefit from further interaction.


Page 194: Dynamic Information Retrieval Tutorial - WSDM 2015

General form of effectiveness measures (REMINDER)

Nearly all standard effectiveness measures have the same basic form (including nDCG, RBP, ERR, average precision, …):

$$M = \frac{1}{\mathcal{N}} \sum_{k} D(k)\, G(k)$$

where $\mathcal{N}$ is a normalization, $k$ is the rank, $G(k)$ is the gain at rank $k$, and $D(k)$ is the discount factor.

Page 195: Dynamic Information Retrieval Tutorial - WSDM 2015

General Framework (Part III): Time-biased gain

Overall system performance may be expressed as expected cumulative gain (which also incorporates standard effectiveness measures):

$$\text{TBG} = \frac{1}{\mathcal{N}} \int_0^\infty G'(t)\, D(t)\, dt$$

where $\mathcal{N}$ is a normalization (here == 1?), $t$ is time, $G'(t)$ is the gain at time $t$, and $D(t)$ is the decay factor.
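In discrete form, each piece of gain arriving at time t is weighted by the survival probability D(t). A minimal sketch (the exponential decay and its half-life are only stand-ins for a survival curve fitted from user data):

```python
import numpy as np

def time_biased_gain(times, gains, decay):
    """sum_k gain_k * D(t_k): each relevant item contributes its gain
    weighted by the probability the user is still searching when it is
    reached."""
    return sum(g * decay(t) for t, g in zip(times, gains))

half_life = 224.0  # seconds (illustrative)
decay = lambda t: np.exp(-t * np.log(2) / half_life)
```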

Page 196: Dynamic Information Retrieval Tutorial - WSDM 2015

General Framework (Part IV): Multiple users

• Cumulative gain may be computed by
  – Simulation (drawing a set of parameters from a population of users).
  – Measuring actual interaction on live systems.
  – Combinations of measurement and simulation.

• Simulating and/or measuring multiple users allows us to consider performance differences across the population of users.

• Simulation provides matched pairs (the same user on both systems), increasing our ability to detect differences.


Page 197: Dynamic Information Retrieval Tutorial - WSDM 2015

General Framework

Most of the evaluation proposals in the references can be reformulated in terms of this general framework, including those that address issues of:

– Novelty and diversity
– Filtering, summarization, question answering
– Session search, etc.


One more example from our current research…

Page 198: Dynamic Information Retrieval Tutorial - WSDM 2015

Session search example

• Two (or more) result lists, e.g., from query reformulation, query suggestion, or switching search engines.

• Modeling searcher interaction requires a switch from one result to another.

• The optimal time to switch depends on the total time available to search.

For example (with many details omitted…):


Page 199: Dynamic Information Retrieval Tutorial - WSDM 2015

Simulation of searchers switching between lists: A vs. B


User starts on list A.

If the user has less than five minutes to search, they should stay on list A. If the user has more than five minutes to search, they should leave list A after 90 seconds.

But can we assume optimal behavior when modeling users?

Page 200: Dynamic Information Retrieval Tutorial - WSDM 2015

Simulation of searchers switching between lists: A vs. B


[Figure: Average Gain (relevant documents) vs. Switch Time (minutes), one curve per session duration (2, 4, 6, 8, and 10 minutes). Topic = 389, List A = sab05ror1, List B = uic0501]

Different view of the same simulation, with thousands of simulated users.

Here, benefits are measured by the number of relevant documents seen.

Optimal switching time depends on session duration.

Page 201: Dynamic Information Retrieval Tutorial - WSDM 2015

Summary

• Primary goal of IR evaluation: predict how changes to an IR system will impact the user experience.

• Evaluation in dynamic domains requires us to explicitly model the system interface and the user's search behavior. Costs and benefits must be measured in meaningful units (e.g., time).

• Successful IR evaluation requires measurement of users, both "in the wild" and in the lab. These measurements calibrate models, which make predictions, which improve systems.


Page 202: Dynamic Information Retrieval Tutorial - WSDM 2015

A few key papers

• Leif Azzopardi. 2009. Usage based effectiveness measures: monitoring application performance in information retrieval. CIKM '09.

• Leif Azzopardi, Diane Kelly, and Kathy Brennan. 2013. How query cost affects search behavior. SIGIR '13.

• Feza Baskaya, Heikki Keskustalo, and Kalervo Järvelin. 2012. Time drives interaction: simulating sessions in diverse searching environments. SIGIR '12.

• Ben Carterette. 2011. System effectiveness, user models, and user utility: a conceptual framework for investigation. SIGIR '11.

• Ben Carterette, Evangelos Kanoulas, and Emine Yilmaz. 2011. Simulating simple user behavior for system effectiveness evaluation. CIKM '11.

• Ben Carterette, Evangelos Kanoulas, and Emine Yilmaz. 2012. Incorporating variability in user behavior into systems based evaluation. CIKM '12.


Page 203: Dynamic Information Retrieval Tutorial - WSDM 2015

A few more key papers

• Olivier Chapelle, Donald Metlzer, Ya Zhang, and Pierre Grinspan. 2009. Expected reciprocal rank for graded relevance. CIKM '09.

• Charles L.A. Clarke, Nick Craswell, Ian Soboroff, and Azin Ashkan. 2011. A comparative analysis of cascade measures for novelty and diversity. WSDM '11.

• Charles L.A. Clarke and Mark D. Smucker. 2014. Time well spent. IIiX '14.

• Georges Dupret and Mounia Lalmas. 2013. Absence time and user engagement: evaluating ranking functions. WSDM '13.

• Kalervo Järvelin, Susan L. Price, Lois M. L. Delcambre, and Marianne Lykke Nielsen. 2008. Discounted cumulated gain based evaluation of multiple-query IR sessions. ECIR '08.

• Jiyun Luo, Christopher Wing, Hui Yang, and Marti Hearst. 2013. The water filling model and the cube test: multi-dimensional evaluation for professional search. CIKM '13.


Page 204: Dynamic Information Retrieval Tutorial - WSDM 2015

And yet more key papers

• Tetsuya Sakai and Zhicheng Dou. 2013. Summaries, ranked retrieval and sessions: a unified framework for information access evaluation. SIGIR '13.

• Mark D. Smucker and Charles L.A. Clarke. 2012. Time-based calibration of effectiveness measures. SIGIR '12.

• Mark D. Smucker and Charles L.A. Clarke. 2012. Modeling user variance in time-biased gain. HCIR '12.

• Emine Yilmaz, Milad Shokouhi, Nick Craswell, and Stephen Robertson. 2010. Expected browsing utility for web search evaluation. CIKM '10.

• Yiming Yang and Abhimanyu Lad. 2009. Modeling expected utility of multi-session information distillation. ICTIR '09.

• Plus many others (ask me).


Page 205: Dynamic Information Retrieval Tutorial - WSDM 2015

Dynamic Information Retrieval Evaluation

Guest talk at the WSDM 2015 tutorial on Dynamic Information Retrieval Modeling

Charlie Clarke

University of Waterloo, Canada

Thank you!

Page 206: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Session Search

Dynamic Ranking

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Page 207: Dynamic Information Retrieval Tutorial - WSDM 2015

Apply an MDP to an IR Problem

We can model IR systems using a Markov Decision Process:

Is there a temporal component?

States – What changes with each time step?

Actions – How does your system change the state?

Rewards – How do you measure feedback or effectiveness in your problem at each time step?

Transition Probability – Can you determine this? If not, then a model-free approach is more appropriate.

Page 208: Dynamic Information Retrieval Tutorial - WSDM 2015

Apply an MDP to an IR Problem - Example


User agent in session search

States – user’s relevance judgement

Action – new query

Reward – information gained

[Luo, Zhang, Yang SIGIR’14]
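When the transition probabilities are unknown, a model-free learner such as tabular Q-learning can be dropped into this kind of formulation. A sketch (state and action encodings are left abstract; this illustrates the idea rather than the cited paper's algorithm):

```python
from collections import defaultdict

Q = defaultdict(float)  # state-action values, initially 0

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step for a session-search MDP sketch:
    s = current relevance-judgement state, a = query reformulation
    taken, r = information gained, s_next = resulting state."""
    best_next = max(Q[(s_next, a2)] for a2 in actions) if actions else 0.0
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```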

Page 209: Dynamic Information Retrieval Tutorial - WSDM 2015

Apply an MDP to an IR Problem - Example


Search engine’s perspective

What if we can't directly observe the user's relevance judgement?

Click ≠ relevance


Page 210: Dynamic Information Retrieval Tutorial - WSDM 2015

Applying POMDP to Dynamic IR

POMDP → Dynamic IR

Environment → Documents

Agents → User, search engine

States → Queries, user's decision-making status, relevance of documents, etc.

Actions → Provide a ranking of documents; weigh terms in the query; add, remove, or keep query terms; switch a search technology on or off; adjust parameters for a search technology

Observations → Queries, clicks, document lists, snippets, terms, etc.

Rewards → Evaluation measures (such as DCG, nDCG or MAP); clicking information

Transition matrix → Given in advance or estimated from training data

Observation function → Problem dependent; estimated from sample datasets

Page 211: Dynamic Information Retrieval Tutorial - WSDM 2015

SIGIR Tutorial July 7th 2014

Grace Hui Yang

Marc Sloan

Jun Wang

Guest Speaker: Emine Yilmaz

Dynamic Information Retrieval Modeling

Panel Discussion

Page 212: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Session Search

Dynamic Ranking

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Conclusion

Page 213: Dynamic Information Retrieval Tutorial - WSDM 2015

Conclusions


Dynamic IR describes a new class of interactive model: it incorporates rich feedback and temporal dependency, and is goal oriented.

The family of Markov models and multi-armed bandit theory are useful in building DIR models.

Applicable to a range of IR problems, and useful in applications such as session search and evaluation.

Page 214: Dynamic Information Retrieval Tutorial - WSDM 2015

Dynamic IR Book


Published by Morgan & Claypool

'Synthesis Lectures on Information Concepts, Retrieval, and Services'

Due April / May 2015 (in time for SIGIR 2015)

Page 215: Dynamic Information Retrieval Tutorial - WSDM 2015

TREC 2015

Dynamic Domain Track, co-organized by Grace Hui Yang, John Frank, Ian Soboroff

Underexplored subsets of Web content: limited scope and richness of indexed content, which may not include relevant components of the deep web (temporary pages, pages behind forms, etc.), and basic search interfaces, where there is little collaboration or history beyond independent keyword search.

Complex, task-based, dynamic search: temporal dependency, rich interactions, complex and evolving information needs, professional users, and a wide range of search strategies.

Page 216: Dynamic Information Retrieval Tutorial - WSDM 2015

Task

An interactive search with multiple runs (a sketch of the loop follows):

Starting point: the system is given a search query.

Iterate until done (the system decides when to stop): the system returns a ranked list of 5 documents; the API returns relevance judgments; go to the next iteration of retrieval.

The goal of the system is to find relevant information for each topic as soon as possible.

One-shot ad-hoc search is included: the system simply decides to stop after iteration one.
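A hedged sketch of that loop; `retrieve` and `judge` stand in for a participant's system and the track's judgment API (both names are hypothetical, as is the naive stopping rule):

```python
def run_topic(query, retrieve, judge, max_iters=50):
    """One Dynamic Domain topic: retrieve 5 documents per iteration,
    collect relevance judgments, and stop when the system decides to."""
    feedback = []
    for _ in range(max_iters):
        docs = retrieve(query, feedback)[:5]  # return 5 documents
        judgments = judge(docs)               # API relevance feedback
        feedback.extend(zip(docs, judgments))
        if not any(judgments):                # naive stopping rule:
            break                             # quit on an all-irrelevant batch
    return feedback
```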

Page 217: Dynamic Information Retrieval Tutorial - WSDM 2015

Domains

Illicit goods – 30k forum posts from 5-10 forums (total ~300k posts). Which users are working together to sell illicit goods?

Ebola – One million tweets, plus 300k docs from in-country web sites (mostly official sites). Who is doing what and where?

Local Politics – 300k docs from local political groups in the Pacific Northwest and British Columbia. Who is campaigning for what and why?

Page 218: Dynamic Information Retrieval Tutorial - WSDM 2015

Timeline

TREC Call for Participation: January 2015

Data Available: March

Detailed Guidelines: April/May

Topics, Tasks available: June

Systems do their thing: June-July

Evaluation: August

Results to participants: September

Conference: November 2015

Page 219: Dynamic Information Retrieval Tutorial - WSDM 2015

TREC 2015

Total Recall Track

Co-organized by Gord Cormack, Maura Grossman, Adam Roegiest, Charlie Clarke

Explores high-recall tasks through an active learning process modeled on legal search tasks (eDiscovery, patent search). A participating system starts with a topic and proposes a relevant document.

The system gets immediate feedback on relevance.

It continues to propose additional documents and receive feedback until a stopping condition is reached.

Shared online infrastructure and collections with Dynamic Domain. Easy to participate in both if you participate in one.

Page 220: Dynamic Information Retrieval Tutorial - WSDM 2015

Acknowledgment


We thank Prof. Charlie Clarke for his guest lecture.

We sincerely thank Dr. Xuchu Dong for his help in the preparation of the tutorial.

We also thank the following colleagues for their comments and suggestions:

Dr Filip Radlinski

Prof. Maarten de Rijke

Page 221: Dynamic Information Retrieval Tutorial - WSDM 2015

References


Static IR

Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro-Neto. Addison-Wesley, 1999.

The PageRank Citation Ranking: Bringing Order to the Web. Lawrence Page , Sergey Brin , Rajeev Motwani , Terry Winograd. 1999

Implicit User Modeling for Personalized Search, Xuehua Shen et al., CIKM, 2005

A Short Introduction to Learning to Rank. Hang Li, IEICE Transactions 94-D(10): 1854-1862, 2011.

Portfolio Theory of Information Retrieval. J. Wang and J. Zhu. In SIGIR 2009

Page 222: Dynamic Information Retrieval Tutorial - WSDM 2015

References


Interactive IR

Relevance Feedback in Information Retrieval, Rocchio, J. J., The SMART Retrieval System (pp. 313-23), 1971

A study in interface support mechanisms for interactive information retrieval, Ryen W. White et al., JASIST, 2006

Visualizing stages during an exploratory search session, Bill Kules et al., HCIR, 2011

Dynamic Ranked Retrieval, Cristina Brandt et al., WSDM, 2011

Structured Learning of Two-level Dynamic Rankings, Karthik Raman et al., CIKM, 2011

Page 223: Dynamic Information Retrieval Tutorial - WSDM 2015

References


Dynamic IR

A hidden Markov model information retrieval system. D. R. H. Miller, T. Leek, and R. M. Schwartz. In SIGIR '99, pages 214-221.

Threshold setting and performance optimization in adaptive filtering, Stephen Robertson, JIR 2002

A large-scale study of the evolution of web pages, Dennis Fetterly et al., WWW 2003

Learning diverse rankings with multi-armed bandits. Filip Radlinski, Robert Kleinberg, Thorsten Joachims. ICML, 2008.

Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem, Yisong Yue et al., ICML 2009

Meme-tracking and the dynamics of the news cycle, Jure Leskovec et al., KDD 2009

Page 224: Dynamic Information Retrieval Tutorial - WSDM 2015

References


Dynamic IR

Mortal multi-armed bandits. Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, Eli Upfal. NIPS 2009

A Novel Click Model and Its Applications to Online Advertising, Zeyuan Allen Zhu et al., WSDM 2010

A contextual-bandit approach to personalized news article recommendation. Lihong Li, Wei Chu, John Langford, Robert E. Schapire. WWW, 2010

Inferring search behaviors using partially observable Markov model with duration (POMD), Yin He et al., WSDM, 2011

No Clicks, No Problem: Using Cursor Movements to Understand and Improve Search, Jeff Huang et al., CHI 2011

Balancing Exploration and Exploitation in Learning to Rank Online, Katja Hofmann et al., ECIR, 2011

Large-Scale Validation and Analysis of Interleaved Search Evaluation, Olivier Chapelle et al., TOIS 2012

Page 225: Dynamic Information Retrieval Tutorial - WSDM 2015

References


Dynamic IR

Using Control Theory for Stable and Efficient Recommender Systems. T. Jambor, J. Wang, N. Lathia. In: WWW '12, pages 11-20.

Sequential selection of correlated ads by POMDPs, Shuai Yuan et al., CIKM 2012

Utilizing query change for session search. D. Guan, S. Zhang, and H. Yang. In SIGIR ’13, pages 453–462.

Query Change as Relevance Feedback in Session Search (short paper). S. Zhang, D. Guan, and H. Yang. In SIGIR 2013.

Interactive exploratory search for multi page search results. X. Jin, M. Sloan, and J. Wang. In WWW ’13.

Interactive Collaborative Filtering. X. Zhao, W. Zhang, J. Wang. In: CIKM'2013, pages 1411-1420.

Page 226: Dynamic Information Retrieval Tutorial - WSDM 2015

References


Dynamic IR

Win-win search: Dual-agent stochastic game in session search. J. Luo, S. Zhang, and H. Yang. In SIGIR '14.

Iterative Expectation for Multi-Period Information Retrieval. M. Sloan and J. Wang. In WSCD 2013.

Dynamical Information Retrieval Modelling: A Portfolio-Armed Bandit Machine Approach. M. Sloan and J. Wang. In WWW 2012.

Jiyun Luo, Sicong Zhang, Xuchu Dong and Hui Yang. Designing States, Actions, and Rewards for Using POMDP in Session Search. In ECIR 2015.

Sicong Zhang, Jiyun Luo, Hui Yang. A POMDP Model for Content-Free Document Re-ranking. In SIGIR 2014.

Page 227: Dynamic Information Retrieval Tutorial - WSDM 2015

References


Markov Processes

A markovian decision process. R. Bellman. Indiana University Mathematics Journal, 6:679-684, 1957.

Dynamic Programming. R. Bellman. Princeton University Press, Princeton, NJ, USA, first edition, 1957.

Dynamic Programming and Markov Processes. R.A. Howard. MIT Press. 1960

Linear Programming and Sequential Decisions. Alan S. Manne. Management Science, 1960

Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Baum, Leonard E.; Petrie, Ted. The Annals of Mathematical Statistics 37, 1966

Page 228: Dynamic Information Retrieval Tutorial - WSDM 2015

References


Markov Processes

Learning to predict by the methods of temporal differences. Richard Sutton. Machine Learning 3. 1988

Computationally feasible bounds for partially observed Markov decision processes. W. Lovejoy. Operations Research 39: 162–175, 1991.

Q-Learning. Christopher J.C.H. Watkins, Peter Dayan. Machine Learning. 1992

Reinforcement learning with replacing eligibility traces. Singh, S. P. & Sutton, R. S. Machine Learning, 22, pages 123-158, 1996.

Reinforcement Learning: An Introduction. Richard S. Sutton and Andrew G. Barto. MIT Press, 1998.

Planning and acting in partially observable stochastic domains. L. Kaelbling, M. Littman, and A. Cassandra. Artificial Intelligence, 101(1-2):99–134, 1998.

Page 229: Dynamic Information Retrieval Tutorial - WSDM 2015

References


Markov Processes

Finding approximate POMDP solutions through belief compression. N. Roy. PhD Thesis Carnegie Mellon. 2003

VDCBPI: an approximate scalable algorithm for large scale POMDPs, P. Poupart and C. Boutilier. In NIPS-2004, pages 1081–1088.

Finding Approximate POMDP solutions Through Belief Compression. N. Roy, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, 23:1-40,2005.

Probabilistic robotics. S. Thrun, W. Burgard, D. Fox. Cambridge. MIT Press. 2005

Anytime Point-Based Approximations for Large POMDPs. J. Pineau, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, 27:335-380, 2006


Page 230: Dynamic Information Retrieval Tutorial - WSDM 2015

References


Markov Processes

The optimal control of partially observable Markov decision processes over a finite horizon. R. D. Smallwood, E.J. Sondik. Operations Research. 1973

Modified Policy Iteration Algorithms for Discounted Markov Decision Problems. M. L. Puterman and Shin M. C. Management Science 24, 1978.

An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains. A. A. Markov. Science in Context, 19:591-600, 12 2006.

Learning to Rank for Information Retrieval. Tie-Yan Liu. Springer Science & Business Media. 2011

Finite-Time Regret Bounds for the Multiarmed Bandit Problem, Nicolò Cesa-Bianchi, Paul Fischer. ICML 100-108, 1998

Multi-armed bandit allocation indices, Wiley, J. C. Gittins. 1989

Finite-time Analysis of the Multiarmed Bandit Problem, Peter Auer et. al., Machine Learning 47, Issue 2-3. 2002.