Dynamic Information Retrieval Tutorial - WSDM 2015

Transcript of Dynamic Information Retrieval Tutorial - WSDM 2015

Page 1: Dynamic Information Retrieval Tutorial - WSDM 2015

WSDM Tutorial February 2nd 2015

Grace Hui Yang

Marc Sloan

Jun Wang

Guest Speaker: Charlie Clarke

Dynamic Information Retrieval

Modeling

Page 2: Dynamic Information Retrieval Tutorial - WSDM 2015

Dynamic Information Retrieval

[Diagram: a user with an information need, the observed documents, and documents to explore]

Devise a strategy for helping the user explore the information space in order to learn which documents are relevant and which aren't, and satisfy their information need.

Page 3: Dynamic Information Retrieval Tutorial - WSDM 2015

Evolving IR

Paradigm shifts in IR as new models emerge
e.g. VSM → BM25 → Language Model
Different ways of defining the relationship between query and document
Static → Interactive → Dynamic
Evolution in modeling user interaction with the search engine

Page 4: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Static IR

Interactive IR

Dynamic IR

Session Search

Dynamic Ranking

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Page 5: Dynamic Information Retrieval Tutorial - WSDM 2015

Conceptual Model – Static IR

Static IR → Interactive IR → Dynamic IR

No feedback

Page 6: Dynamic Information Retrieval Tutorial - WSDM 2015

Characteristics of Static IR

Does not learn directly from the user
Parameters updated periodically

Page 7: Dynamic Information Retrieval Tutorial - WSDM 2015

Commonly Used Static IR Models

BM25
PageRank
Language Model
Learning to Rank

Page 8: Dynamic Information Retrieval Tutorial - WSDM 2015

Feedback in IR


Page 9: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Static IR

Interactive IR

Dynamic IR

Session Search

Dynamic Ranking

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Page 10: Dynamic Information Retrieval Tutorial - WSDM 2015

Conceptual Model – Interactive IR

Static IR → Interactive IR → Dynamic IR

Exploit Feedback

Page 11: Dynamic Information Retrieval Tutorial - WSDM 2015

Interactive Recommender Systems

Learn the user's taste interactively!
At the same time, provide good recommendations!

Page 12: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example

Multi-page search scenario
User image-searches for "jaguar"
Rank two of the four results over two pages:

r = 0.5, r = 0.51, r = 0.9, r = 0.49

Page 13: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Static Ranking

Ranked according to the PRP
Page 1: 1. r = 0.9, 2. r = 0.51
Page 2: 1. r = 0.5, 2. r = 0.49

Page 14: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Relevance Feedback

Interactive search
Improve the 2nd page based on feedback from the 1st page
Use clicks as relevance feedback
Rocchio¹ algorithm on terms in the image webpage:

$w_{q'} = \alpha w_q + \frac{\beta}{|D_r|} \sum_{d \in D_r} w_d - \frac{\gamma}{|D_n|} \sum_{d \in D_n} w_d$

The new query is closer to the relevant documents and different from the non-relevant documents.

¹Rocchio, J. J. '71; Baeza-Yates & Ribeiro-Neto '99
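A minimal sketch of the Rocchio update above, assuming term-weight vectors stored as numpy arrays; the default α, β, γ values are illustrative, not taken from the tutorial:

```python
import numpy as np

def rocchio_update(w_q, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """w_q' = alpha*w_q + (beta/|Dr|)*sum(rel) - (gamma/|Dn|)*sum(nonrel)."""
    w_new = alpha * np.asarray(w_q, dtype=float)
    if len(rel_docs) > 0:                       # D_r: clicked (relevant) docs
        w_new = w_new + beta * np.mean(rel_docs, axis=0)
    if len(nonrel_docs) > 0:                    # D_n: unclicked (non-relevant) docs
        w_new = w_new - gamma * np.mean(nonrel_docs, axis=0)
    return w_new
```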

Page 15: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Relevance Feedback

Ranked according to the PRP and Rocchio
Page 1: 1. r = 0.9 (*), 2. r = 0.51   (* = click)
Page 2: 1. r = 0.5, 2. r = 0.49

Page 16: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Relevance Feedback

No click when searching for animals
Page 1: 1. r = 0.9, 2. r = 0.51
Page 2: 1. ?, 2. ?

Page 17: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Value Function

Optimize both pages using dynamic IR
Bellman equation for the value function, simplified example:

$V_t(\theta_t, \Sigma_t) = \max_{s_t} \left[ \theta_{s_t} + E\big( V_{t+1}(\theta_{t+1}, \Sigma_{t+1}) \mid C_t \big) \right]$

$\theta_t, \Sigma_t$ = relevance and covariance of documents for page $t$
$C_t$ = clicks on page $t$
$V_t$ = 'value' of the ranking on page $t$

Maximize value over all pages based on estimated feedback.

X. Jin, M. Sloan and J. Wang '13

Page 18: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Covariance

The covariance matrix represents similarity between images:

$\Sigma = \begin{pmatrix} 1 & 0.8 & 0.1 & 0 \\ 0.8 & 1 & 0.1 & 0 \\ 0.1 & 0.1 & 1 & 0.95 \\ 0 & 0 & 0.95 & 1 \end{pmatrix}$

X. Jin, M. Sloan and J. Wang '13

Page 19: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Myopic Value

For the myopic ranking, $V_2 = 16.380$

X. Jin, M. Sloan and J. Wang '13

Page 20: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Myopic Ranking

The page 2 ranking stays the same regardless of clicks.

X. Jin, M. Sloan and J. Wang '13

Page 21: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Optimal Value

For the optimal ranking, $V_2 = 16.528$

X. Jin, M. Sloan and J. Wang '13

Page 22: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Optimal Ranking

If the car is clicked, the Jaguar logo is more relevant on the next page.

X. Jin, M. Sloan and J. Wang '13

Page 23: Dynamic Information Retrieval Tutorial - WSDM 2015

Toy Example – Optimal Ranking

In all other scenarios, rank the animal first on the next page.

X. Jin, M. Sloan and J. Wang '13

Page 24: Dynamic Information Retrieval Tutorial - WSDM 2015

Static IR Visualization

[Scatter plot: documents in a vector space. X: docs about apple fruit; O: docs about apple iphone; x: docs about apple ceo]

Documents exist in a vector space.

Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015

Page 25: Dynamic Information Retrieval Tutorial - WSDM 2015

Static IR Visualization

[Scatter plot as before, with query point Q]

t = 1: Static IR considers relevancy

Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015


Page 27: Dynamic Information Retrieval Tutorial - WSDM 2015

Interactive IR Update

[Scatter plot: after feedback (+1 on the clicked document, −1 on the others), the query point moves from Q to Q′]

t = 1: Static IR considers relevancy
t = 2: Interactive IR considers local gains

Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015


Page 29: Dynamic Information Retrieval Tutorial - WSDM 2015

Dynamic Ranking Principle

[Scatter plot with query point Q]

t = 1: Relevancy + Variance

Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015

Page 30: Dynamic Information Retrieval Tutorial - WSDM 2015

Dynamic Ranking Principle

[Scatter plot with query point Q and feedback annotations (+1, −1, −1)]

t = 1: Relevancy + Variance + |Correlations|

Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015

Page 31: Dynamic Information Retrieval Tutorial - WSDM 2015

Dynamic Ranking Principle

[Scatter plot with query point Q]

t = 1: Relevancy + Variance + |Correlations|
Diversified, exploratory relevance ranking

Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015

Page 32: Dynamic Information Retrieval Tutorial - WSDM 2015

Dynamic Ranking Principle

[Scatter plot: the query point moves from Q to Q′]

t = 1: Relevancy + Variance + |Correlations|
Diversified, exploratory relevance ranking
t = 2: Personalized re-ranking

Marc Sloan and Jun Wang, Dynamic Ranking Principle, under submission, 2015

Page 33: Dynamic Information Retrieval Tutorial - WSDM 2015

Interactive vs Dynamic IR

Interactive:
• Treats interactions independently
• Responds to immediate feedback
• Static IR used before feedback is received

Dynamic:
• Optimizes over all interactions
• Long-term gains
• Models future user feedback
• Also used at the beginning of the interaction

Page 34: Dynamic Information Retrieval Tutorial - WSDM 2015

Interactive & Dynamic Techniques

Interactive:
• Rocchio equation in relevance feedback
• Collaborative filtering in recommender systems
• Active learning in interactive retrieval

Dynamic:
• POMDP in multi-page search and ad recommendation
• Multi-armed bandits in online evaluation
• MDP in session search

Page 35: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Static IR

Interactive IR

Dynamic IR

Session Search

Dynamic Ranking

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Page 36: Dynamic Information Retrieval Tutorial - WSDM 2015

Conceptual Model – Dynamic IR

Static IR → Interactive IR → Dynamic IR

Explore and exploit feedback

Page 37: Dynamic Information Retrieval Tutorial - WSDM 2015

Characteristics of Dynamic IR

Rich interactions: query formulation, document clicks, document examination, eye movements, mouse movements, etc.

[Luo et al., IRJ under revision 2014]

Page 38: Dynamic Information Retrieval Tutorial - WSDM 2015

Characteristics of Dynamic IR

Temporal dependency

[Diagram: an information need I drives iterations 1…n; at iteration i, query q_i yields ranked documents D_i and clicked documents C_i]

[Luo et al., IRJ under revision 2014]

Page 39: Dynamic Information Retrieval Tutorial - WSDM 2015

Characteristics of Dynamic IR

Overall goal: optimize over all iterations for a goal (an IR metric or user satisfaction) via an optimal policy

[Luo et al., IRJ under revision 2014]

Page 40: Dynamic Information Retrieval Tutorial - WSDM 2015

Dynamic Information Retrieval: the next generation search engine

Dynamic Relevance – user-perceived relevance changes
Dynamic Users – users change behavior over time; user history
Dynamic Queries – changing query definitions, e.g. 'Twitter'
Dynamic Documents – topic trends, filtering, document content change
Dynamic Information Needs – information needs evolve over time

Page 41: Dynamic Information Retrieval Tutorial - WSDM 2015

Why Not Existing Supervised Learning for Dynamic IR Modeling?

Lack of enough training data:
Dynamic IR problems contain a sequence of dynamic interactions, e.g. a series of queries in a session.
Repeated sequences are rare (close to zero), even in large query logs (WSCD 2013 & 2014, query logs from Yandex).
The chance of finding repeated adjacent query pairs is also low:

Dataset   | Repeated Adjacent Query Pairs | Total Adjacent Query Pairs | Repeated Percentage
WSCD 2013 | 476,390                       | 17,784,583                 | 2.68%
WSCD 2014 | 1,959,440                     | 35,376,008                 | 5.54%

Page 42: Dynamic Information Retrieval Tutorial - WSDM 2015

Our Solution

Try to find an optimal solution through a sequence of dynamic interactions.
Trial and error: learn from repeated, varied attempts, continued until success.
No (or less) supervised learning.

Page 43: Dynamic Information Retrieval Tutorial - WSDM 2015

Trial and Error


q1 – "dulles hotels"

q2 – "dulles airport"

q3 – "dulles airport

location"

q4 – "dulles metrostop"

Page 44: Dynamic Information Retrieval Tutorial - WSDM 2015

What is a Desirable Model for Dynamic IR?

Model interactions, which means it needs placeholders for actions;
Model the information need hidden behind user queries and other interactions;
Set up a reward mechanism to guide the entire search algorithm in adjusting its retrieval strategies;
Represent Markov properties to handle the temporal dependency.

A model in Trial and Error setting will do!

A Markov Model will do!

Page 45: Dynamic Information Retrieval Tutorial - WSDM 2015

Markov Decision Process

MDP extends a Markov chain with actions and rewards¹

s_i – state; a_i – action; r_i – reward; p_i – transition probability

[Diagram: s0 –(a0, r0)→ s1 –(a1, r1)→ s2 –(a2, r2)→ s3 …]

A tuple (S, M, A, R, γ)

¹R. Bellman '57

Page 46: Dynamic Information Retrieval Tutorial - WSDM 2015

Definition of MDP


A tuple (S, M, A, R, γ)

S : state space

M: transition matrix

Ma(s, s') = P(s'|s, a)

A: action space

R: reward function

R(s,a) = immediate reward taking action a at state s

γ: discount factor, 0 < γ ≤ 1

policy π

π(s) = the action taken at state s

Goal is to find an optimal policy π* maximizing the expected total rewards.

Page 47: Dynamic Information Retrieval Tutorial - WSDM 2015

Optimality – Bellman Equation

The Bellman equation¹ for an MDP is a recursive definition of the optimal value function V*(·) (the state-value function):

$V^*(s) = \max_a \left[ R(s,a) + \gamma \sum_{s'} M_a(s, s')\, V^*(s') \right]$

Optimal policy:

$\pi^*(s) = \arg\max_a \left[ R(s,a) + \gamma \sum_{s'} M_a(s, s')\, V^*(s') \right]$

¹R. Bellman '57

Page 48: Dynamic Information Retrieval Tutorial - WSDM 2015

MDP algorithms

Model-based approaches (solve the Bellman equation):
Value Iteration
Policy Iteration
Modified Policy Iteration
Prioritized Sweeping

Model-free approaches:
Temporal Difference (TD) Learning
Q-Learning

Output: the optimal value V*(s) and the optimal policy π*(s)

[Bellman '57; Howard '60; Puterman and Shin '78; Singh & Sutton '96; Sutton & Barto '98; Sutton '88; Watkins '92]

[Slide adapted from Carlos Guestrin's ML lecture]
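As a hedged illustration of the model-based approaches listed above, here is a minimal value-iteration sketch for the MDP tuple (S, M, A, R, γ); the array shapes are assumptions made for this example, not a prescribed interface:

```python
import numpy as np

def value_iteration(M, R, gamma=0.9, tol=1e-8):
    """Iterate V(s) <- max_a [R(s,a) + gamma * sum_s' M_a(s,s') V(s')] to a fixed point.

    M: transition tensor, shape (|A|, |S|, |S|), M[a, s, t] = P(t | s, a)
    R: reward matrix, shape (|S|, |A|)
    Returns the optimal values V*(s) and a greedy policy pi*(s).
    """
    V = np.zeros(M.shape[1])
    while True:
        Q = R + gamma * np.einsum('ast,t->sa', M, V)  # Q[s, a]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```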

Page 49: Dynamic Information Retrieval Tutorial - WSDM 2015

Apply an MDP to an IR Problem

We can model IR systems using a Markov decision process:
Is there a temporal component?
States – what changes with each time step?
Actions – how does your system change the state?
Rewards – how do you measure feedback or effectiveness in your problem at each time step?
Transition probability – can you determine this? If not, then a model-free approach is more appropriate.

Page 50: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Session Search

Dynamic Ranking

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Page 51: Dynamic Information Retrieval Tutorial - WSDM 2015

TREC Session Tracks (2010–now)

Given a series of queries {q1, q2, …, qn}, the top-10 retrieval results {D1, …, Dn−1} for q1 to qn−1, and click information,
the task is to retrieve a list of documents for the current/last query, qn.
Relevance judgment is based on how relevant the documents are for qn, and how relevant they are to the information need of the entire session (given in the topic description).
There is no need to segment the sessions.

Page 52: Dynamic Information Retrieval Tutorial - WSDM 2015

1. pocono mountains pennsylvania
2. pocono mountains pennsylvania hotels
3. pocono mountains pennsylvania things to do
4. pocono mountains pennsylvania hotels
5. pocono mountains camelbeach
6. pocono mountains camelbeach hotel
7. pocono mountains chateau resort
8. pocono mountains chateau resort attractions
9. pocono mountains chateau resort getting to
10. chateau resort getting to
11. pocono mountains chateau resort directions

TREC 2012 Session 6

Information need: You are planning a winter vacation to the Pocono Mountains region in Pennsylvania in the US. Where will you stay? What will you do while there? How will you get there?

In a session, queries change constantly.

Page 53: Dynamic Information Retrieval Tutorial - WSDM 2015

Markov Decision Process

We propose to model session search as a Markov decision process (MDP) with two agents: the user and the search engine.

[Guan, Zhang and Yang SIGIR 2013]

Page 54: Dynamic Information Retrieval Tutorial - WSDM 2015

Settings of the Session MDP

States: queries
Environment: search results
Actions:
User actions: add / remove / keep query terms (these correspond nicely to our definition of query change)
Search engine actions: increase / decrease / keep term weights

[Guan, Zhang and Yang SIGIR 2013]

Page 55: Dynamic Information Retrieval Tutorial - WSDM 2015

Search Engine Agent's Actions

Query part | ∈ D_{i−1}? | Action    | Example
q_theme    | Y          | increase  | "pocono mountain" in s6
q_theme    | N          | increase  | "france world cup 98 reaction" in s28: france world cup 98 reaction stock market → france world cup 98 reaction
+Δq        | Y          | decrease  | 'policy' in s37: Merck lobbyists → Merck lobbyists US policy
+Δq        | N          | increase  | 'US' in s37: Merck lobbyists → Merck lobbyists US policy
−Δq        | Y          | decrease  | 'reaction' in s28: france world cup 98 reaction → france world cup 98
−Δq        | N          | no change | 'legislation' in s32: bollywood legislation → bollywood law

[Guan, Zhang and Yang SIGIR 2013]

Page 56: Dynamic Information Retrieval Tutorial - WSDM 2015

Bellman Equation

In an MDP, a future reward is not worth quite as much as a current reward, so a discount factor γ ∈ (0,1) is applied to future rewards.
The Bellman equation gives the optimal value (the expected long-term reward starting from state s and continuing with policy π from then on) for an MDP:

$V^*(s) = \max_a \left[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \right]$

Page 57: Dynamic Information Retrieval Tutorial - WSDM 2015

Our Tweak

In an MDP, a future reward is not worth quite as much as a current reward, so a discount factor γ ∈ (0,1) is applied to future rewards.
In session search, a past reward is not worth quite as much as a current reward, so a discount factor γ should be applied to past rewards.
We therefore model the MDP for session search in reverse order.

Page 58: Dynamic Information Retrieval Tutorial - WSDM 2015

Query Change retrieval Model (QCM)

The Bellman equation gives the optimal value for an MDP:

$V^*(s) = \max_a \left[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \right]$

The reward function is used as the document relevance score and is adapted backwards from the Bellman equation:

$Score(q_i, d) = P(q_i \mid d) + \gamma\, P(q_i \mid q_{i-1}, D_{i-1}, a)\, \max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$

where $P(q_i \mid d)$ is the current reward (relevance score), $P(q_i \mid q_{i-1}, D_{i-1}, a)$ is the query transition model, and $\max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$ is the maximum past relevance.

[Guan, Zhang and Yang SIGIR 2013]

Page 59: Dynamic Information Retrieval Tutorial - WSDM 2015

Calculating the Transition Model

According to the query change and the search engine actions, the score expands into term-level weight adjustments:

$\begin{aligned} Score(q_i, d) = \log P(q_i \mid d) &+ \alpha \sum_{t \in q_{theme}} \big[1 - P(t \mid d^*_{i-1})\big] \log P(t \mid d) \\ &- \beta \sum_{t \in +\Delta q,\; t \in d^*_{i-1}} P(t \mid d^*_{i-1}) \log P(t \mid d) \\ &+ \epsilon \sum_{t \in +\Delta q,\; t \notin d^*_{i-1}} idf(t) \log P(t \mid d) \\ &- \delta \sum_{t \in -\Delta q} P(t \mid d^*_{i-1}) \log P(t \mid d) \end{aligned}$

The first term is the current reward (relevance score); the remaining terms increase weights for theme terms, decrease weights for old added terms, increase weights for novel added terms, and decrease weights for removed terms.

[Guan, Zhang and Yang SIGIR 2013]

Page 60: Dynamic Information Retrieval Tutorial - WSDM 2015

Maximizing the Reward Function

Generate a maximum-rewarded document, denoted d*_{i−1}, from D_{i−1}; that is, the document most relevant to q_{i−1}. The relevance score is calculated as

$P(q_{i-1} \mid d_{i-1}) = 1 - \prod_{t \in q_{i-1}} \big\{1 - P(t \mid d_{i-1})\big\}$, where $P(t \mid d_{i-1}) = \frac{\#(t,\, d_{i-1})}{|d_{i-1}|}$

From several options, we choose to use only the document with top relevance: $\max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$

[Guan, Zhang and Yang SIGIR 2013]
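A small sketch of these two quantities, assuming documents are given as token lists; the function names are hypothetical:

```python
from collections import Counter

def p_query_given_doc(query_terms, doc_tokens):
    """P(q|d) = 1 - prod_{t in q} (1 - P(t|d)), with P(t|d) = #(t,d)/|d|."""
    counts = Counter(doc_tokens)
    prod = 1.0
    for t in query_terms:
        prod *= 1.0 - counts[t] / len(doc_tokens)
    return 1.0 - prod

def max_past_relevance(prev_query, prev_docs):
    """max over D_{i-1} of P(q_{i-1} | d): the most relevant past document's score."""
    return max(p_query_given_doc(prev_query, d) for d in prev_docs)
```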

Page 61: Dynamic Information Retrieval Tutorial - WSDM 2015

Scoring the Entire Session

The overall relevance score for a session of queries is aggregated recursively:

$\begin{aligned} Score_{session}(q_n, d) &= Score(q_n, d) + \gamma\, Score_{session}(q_{n-1}, d) \\ &= Score(q_n, d) + \gamma \big[ Score(q_{n-1}, d) + \gamma\, Score_{session}(q_{n-2}, d) \big] \\ &= \sum_{i=1}^{n} \gamma^{\,n-i}\, Score(q_i, d) \end{aligned}$

[Guan, Zhang and Yang SIGIR 2013]
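The closed form lends itself to a one-line aggregation; a minimal sketch, where γ = 0.92 is only an illustrative value:

```python
def session_score(per_query_scores, gamma=0.92):
    """Score_session(q_n, d) = sum_{i=1..n} gamma^(n-i) * Score(q_i, d).

    per_query_scores: [Score(q_1, d), ..., Score(q_n, d)] for one document d;
    past queries are discounted, so earlier queries count less.
    """
    n = len(per_query_scores)
    return sum(gamma ** (n - i) * s for i, s in enumerate(per_query_scores, start=1))
```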

Page 62: Dynamic Information Retrieval Tutorial - WSDM 2015

Experiments

TREC 2011–2012 query sets; dataset: ClueWeb09 Category B

Page 63: Dynamic Information Retrieval Tutorial - WSDM 2015

Search Accuracy (TREC 2012)

nDCG@10 (the official metric used in TREC):

Approach               | nDCG@10 | %chg    | MAP    | %chg
Lemur                  | 0.2474  | −21.54% | 0.1274 | −18.28%
TREC'12 median         | 0.2608  | −17.29% | 0.1440 | −7.63%
Our TREC'12 submission | 0.3021  | −4.19%  | 0.1490 | −4.43%
TREC'12 best           | 0.3221  | 0.00%   | 0.1559 | 0.00%
QCM                    | 0.3353  | 4.10%†  | 0.1529 | −1.92%
QCM+Dup                | 0.3368  | 4.56%†  | 0.1537 | −1.41%

Page 64: Dynamic Information Retrieval Tutorial - WSDM 2015

Search Accuracy (TREC 2011)

nDCG@10 (the official metric used in TREC):

Approach               | nDCG@10 | %chg    | MAP    | %chg
Lemur                  | 0.3378  | −23.38% | 0.1118 | −25.86%
TREC'11 median         | 0.3544  | −19.62% | 0.1143 | −24.20%
TREC'11 best           | 0.4409  | 0.00%   | 0.1508 | 0.00%
QCM                    | 0.4728  | 7.24%†  | 0.1713 | 13.59%†
QCM+Dup                | 0.4821  | 9.34%†  | 0.1714 | 13.66%†
Our TREC'12 submission | 0.4836  | 9.68%†  | 0.1724 | 14.32%†

Page 65: Dynamic Information Retrieval Tutorial - WSDM 2015

Search Accuracy for Different Session Types (TREC 2012)

Sessions are classified by:
Product: Factual / Intellectual
Goal quality: Specific / Amorphous

Approach  | Intellectual | %chg   | Amorphous | %chg   | Specific | %chg   | Factual | %chg
TREC best | 0.3369       | 0.00%  | 0.3495    | 0.00%  | 0.3007   | 0.00%  | 0.3138  | 0.00%
Nugget    | 0.3305       | −1.90% | 0.3397    | −2.80% | 0.2736   | −9.01% | 0.2871  | −8.51%
QCM       | 0.3870       | 14.87% | 0.3689    | 5.55%  | 0.3091   | 2.79%  | 0.3066  | −2.29%
QCM+DUP   | 0.3900       | 15.76% | 0.3692    | 5.64%  | 0.3114   | 3.56%  | 0.3072  | −2.10%

QCM better handles sessions that demonstrate evolution and exploration, because it treats a session as a continuous process, studying the changes across query transitions and modeling the dynamics.

Page 66: Dynamic Information Retrieval Tutorial - WSDM 2015

POMDP Model¹

[Diagram: hidden states s0, s1, s2, s3, … with actions a_i and rewards r_i; observations o1, o2, o3 are emitted from the hidden states, and the agent maintains a belief over them]

s_i – hidden state; o_i – observation; belief – a distribution over hidden states

¹R. D. Smallwood et al. '73

Page 67: Dynamic Information Retrieval Tutorial - WSDM 2015

POMDP Definition

A tuple (S, M, A, R, γ, O, Θ, B)
S: state space
M: transition matrix
A: action space
R: reward function
γ: discount factor, 0 < γ ≤ 1
O: observation set; an observation is a symbol emitted according to a hidden state
Θ: observation function; Θ(s, a, o) is the probability that o is observed when the system transitions into state s after taking action a, i.e. P(o | s, a)
B: belief space; a belief is a probability distribution over the hidden states

Page 68: Dynamic Information Retrieval Tutorial - WSDM 2015

A Markov Chain of Decision Making

[Diagram: decision-making states S1, S2, S3, …, Sn with actions A1–A4, over queries q1 = "old US coins", q2 = "collecting old US coins", q3 = "selling old US coins" and result sets D1, D2, D3]

"D1 is relevant and I stay to find out more about collecting…"
"D2 is relevant and I now move to the next topic…"
"D3 is irrelevant; I slightly edit the query and stay here a little longer…"

[Luo, Zhang and Yang SIGIR 2014]

Page 69: Dynamic Information Retrieval Tutorial - WSDM 2015

Hidden Decision Making States

S_RT (Relevant & Exploitation): scooter price → scooter stores
S_RR (Relevant & Exploration): collecting old US coins → selling old US coins
S_NRT (Non-Relevant & Exploitation): Philadelphia NYC travel → Philadelphia NYC train
S_NRR (Non-Relevant & Exploration): Boston tourism → NYC tourism

[Luo, Zhang and Yang SIGIR 2014]

Page 70: Dynamic Information Retrieval Tutorial - WSDM 2015

Dual Agent Stochastic Game

Hidden states, actions, rewards, Markov property

[Diagram: s0 → s1 → s2 → s3 … with actions a_i and rewards r_i]

A dual-agent, cooperative game between the user agent and the search engine agent, with joint optimization.

[Luo, Zhang and Yang SIGIR 2014]

Page 71: Dynamic Information Retrieval Tutorial - WSDM 2015

Actions

User actions (A_u):
add query terms (+Δq)
remove query terms (−Δq)
keep query terms (q_theme)

Search engine actions (A_se):
increase / decrease / keep term weights
switch a search technique on or off, e.g. whether to use query expansion
adjust parameters in search techniques, e.g. select the best k for the top-k docs used in PRF

Messages from the user (Σ_u): clicked documents; SAT-clicked documents
Messages from the search engine (Σ_se): the top-k returned documents
Messages are essentially documents that an agent thinks are relevant.

[Luo, Zhang and Yang SIGIR 2014]

Page 72: Dynamic Information Retrieval Tutorial - WSDM 2015

Dual-agent Stochastic Game

[Diagram: the user agent and the search engine agent interact with the documents (the world) through a belief updater; Σ_se = D_top_returned]

[Luo, Zhang and Yang SIGIR 2014]


Page 75: Dynamic Information Retrieval Tutorial - WSDM 2015

Observation Function (O)

O(s_{t+1}, a_t, ω_t) = P(ω_t | s_{t+1}, a_t): the probability of making observation ω_t after taking action a_t and landing in state s_{t+1}

Two types of observations: relevance-related and exploration/exploitation-related

[Luo, Zhang and Yang SIGIR 2014]

Page 76: Dynamic Information Retrieval Tutorial - WSDM 2015

Relevance-related Observation

Intuition: s_t is likely to be Relevant if ∃ d ∈ D_{t−1} that is SAT-clicked, and Non-Relevant otherwise. This observation happens after the user sends out the message Σ_u^t (clicks).

$O(s_t = Rel, \Sigma_u, \omega_t = Rel) \triangleq P(\omega_t = Rel \mid s_t = Rel, \Sigma_u) \propto P(s_t = Rel \mid \omega_t = Rel)\, P(\omega_t = Rel \mid \Sigma_u)$

Similarly, $O(s_t = NonRel, \Sigma_u, \omega_t = NonRel) \propto P(s_t = NonRel \mid \omega_t = NonRel)\, P(\omega_t = NonRel \mid \Sigma_u)$, as well as the cross terms $O(s_t = NonRel, \Sigma_u, \omega_t = Rel)$ and $O(s_t = Rel, \Sigma_u, \omega_t = NonRel)$.

[Luo, Zhang and Yang SIGIR 2014]

Page 77: Dynamic Information Retrieval Tutorial - WSDM 2015

Exploration-related Observation

This is a combined observation: it happens when updating the before-message belief state for a user action a_u (query change) and a search engine message Σ_se = D_{t−1}.

Intuition: s_t is likely to be Exploration if (+Δq_t ≠ ∅ and +Δq_t ∉ D_{t−1}) or (+Δq_t = ∅ and −Δq_t ≠ ∅), and Exploitation if (+Δq_t ≠ ∅ and +Δq_t ∈ D_{t−1}) or (+Δq_t = ∅ and −Δq_t = ∅).

$O(s_t = Exploitation, a_u = \Delta q_t, \Sigma_{se} = D_{t-1}, \omega_t = Exploitation) \propto P(s_t = Exploitation \mid \omega_t = Exploitation)\, P(\omega_t = Exploitation \mid \Delta q_t, D_{t-1})$

$O(s_t = Exploration, a_u = \Delta q_t, \Sigma_{se} = D_{t-1}, \omega_t = Exploration) \propto P(s_t = Exploration \mid \omega_t = Exploration)\, P(\omega_t = Exploration \mid \Delta q_t, D_{t-1})$

[Luo, Zhang and Yang SIGIR 2014]

Page 78: Dynamic Information Retrieval Tutorial - WSDM 2015

Belief Updates (B)

The belief state b is updated when a new observation is obtained:

$b_{t+1}(s_j) = P(s_j \mid \omega_t, a_t, b_t) = \frac{P(\omega_t \mid s_j, a_t, b_t) \sum_{s_i \in S} P(s_j \mid s_i, a_t, b_t)\, b_t(s_i)}{P(\omega_t \mid a_t, b_t)} = \frac{O(s_j, a_t, \omega_t) \sum_{s_i \in S} P(s_j \mid s_i, a_t, b_t)\, b_t(s_i)}{P(\omega_t \mid a_t, b_t)}$

Page 79: Dynamic Information Retrieval Tutorial - WSDM 2015

Joint Optimization – Win-Win

The long-term reward for the search engine agent:

$Q_{se}(b, a) = \sum_{s \in S} b(s)\, R(s, a) + \gamma \sum_{\omega \in \Omega} P(\omega \mid b, a_u, \Sigma_{se})\, P(\omega \mid b, \Sigma_u) \max_a Q_{se}(b', a)$

The long-term reward for the user agent:

$Q_u(b, a_u) = R(s, a_u) + \gamma \sum_{a_u} T(s_t \mid s_{t-1}, D_{t-1}) \max_{s_{t-1}} Q_u(s_{t-1}, a_u) = P(q_t \mid d) + \gamma \sum_{a_u} P(q_t \mid q_{t-1}, D_{t-1}, a) \max_{D_{t-1}} P(q_{t-1} \mid D_{t-1})$

Joint optimization: $a_{se} = \arg\max_a \big( Q_{se}(b, a) + Q_u(b, a_u) \big)$

[Luo, Zhang and Yang SIGIR 2014]

Page 80: Dynamic Information Retrieval Tutorial - WSDM 2015

Dynamic Search Engine Demo

http://dumplingproject.org


Page 81: Dynamic Information Retrieval Tutorial - WSDM 2015

Experiments

Evaluate on the TREC 2012 and 2013 Session Tracks.
The session logs contain: the session topic, user queries, previously retrieved URLs and snippets, user clicks, dwell time, etc.
Task: retrieve 2,000 documents for the last query in each session.
The evaluation is based on the whole session: a document related to any query in the session is a good document.

Datasets: ClueWeb09 and ClueWeb12 (spam and duplicates removed)

Page 82: Dynamic Information Retrieval Tutorial - WSDM 2015

Actions

increase the weights of the added terms by a factor x ∈ {1.05, 1.10, 1.15, 1.20, 1.25, 1.5, 1.75, 2};
decrease the weights of the added terms by a factor y ∈ {0.5, 0.57, 0.67, 0.8, 0.83, 0.87, 0.9, 0.95};
the Query Change Model (QCM) proposed in Guan et al. SIGIR '13:

$Score(q_i, d) = P(q_i \mid d) + \gamma\, P(q_i \mid q_{i-1}, D_{i-1}, a) \max_{D_{i-1}} P(q_{i-1} \mid D_{i-1})$

pseudo-relevance feedback, which assumes the top 20 retrieved documents are relevant;
directly using the query of the current iteration to perform retrieval;
combining all queries in a session, weighting them equally.

Page 83: Dynamic Information Retrieval Tutorial - WSDM 2015

Search Accuracy

Search accuracy on the TREC 2012 Session Track: win-win outperforms most retrieval algorithms on TREC 2012.

Page 84: Dynamic Information Retrieval Tutorial - WSDM 2015

Search Accuracy

Search accuracy on the TREC 2013 Session Track: win-win outperforms all retrieval algorithms on TREC 2013. It is highly effective in session search.

Page 85: Dynamic Information Retrieval Tutorial - WSDM 2015

Immediate Search Accuracy

Original run: the top returned documents provided by the TREC log data.
Win-win's immediate search accuracy is better than the Original at every iteration, and it increases as the number of search iterations increases.

[Figures: TREC 2012 and TREC 2013 Session Tracks]

Page 86: Dynamic Information Retrieval Tutorial - WSDM 2015

Belief Updates (B) – Example

TREC '13 session #87 topic: planning a trip to the United States. You will be there for a month and able to travel within a 150-mile radius of your destination. What are the best cities to visit?

q1 = "best US destinations", observation = NRR
Belief: S_RT (Relevant & Exploitation) 0.1784; S_RR (Relevant & Exploration) 0.1135; S_NRT (Non-Relevant & Exploitation) 0.2838; S_NRR (Non-Relevant & Exploration) 0.4243

Page 87: Dynamic Information Retrieval Tutorial - WSDM 2015

Belief Updates (B) – Example (continued)

q2 = "distance New York Boston", observation = RT
Belief: S_RT 0.0005; S_RR 0.0068; S_NRT 0.0715; S_NRR 0.9212


Page 89: Dynamic Information Retrieval Tutorial - WSDM 2015

Belief Updates (B) – Example (continued)

q3 = "maps.bing.com", observation = NRT
Belief: S_RT 0.0151; S_RR 0.4347; S_NRT 0.0276; S_NRR 0.5226


Page 91: Dynamic Information Retrieval Tutorial - WSDM 2015

Belief Updates (B) – Example (continued)

… q20 = "Philadelphia NYC train", observation = NRT
Belief: S_RT 0.0291; S_RR 0.7837; S_NRT 0.0081; S_NRR 0.1790


Page 93: Dynamic Information Retrieval Tutorial - WSDM 2015

Belief Updates (B) – Example (continued)

… q21 = "Philadelphia NYC bus", observation = NRT
Belief: S_RT 0.0304; S_RR 0.8126; S_NRT 0.0066; S_NRR 0.1505


Page 95: Dynamic Information Retrieval Tutorial - WSDM 2015

Coffee Break


Page 96: Dynamic Information Retrieval Tutorial - WSDM 2015

Apply an MDP to an IR Problem – Example

The user agent in session search:
States – the user's relevance judgments
Actions – new queries
Rewards – information gained

[Luo, Zhang, Yang SIGIR’14]

Page 97: Dynamic Information Retrieval Tutorial - WSDM 2015

POMDP → Belief Update

The agent uses a state estimator to update its belief about the hidden states: $b' = SE(b, a, o')$

$b'(s') = P(s' \mid o', a, b) = \frac{P(s', o' \mid a, b)}{P(o' \mid a, b)} = \frac{\Theta(s', a, o') \sum_{s} M(s, a, s')\, b(s)}{P(o' \mid a, b)}$
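A minimal numpy sketch of this state estimator, under assumed array shapes (nothing here is a fixed API from the tutorial):

```python
import numpy as np

def belief_update(b, a, o, M, Theta):
    """b'(s') = Theta(s', a, o) * sum_s M(s, a, s') b(s) / P(o | a, b).

    b: belief over states, shape (|S|,)
    M: transitions, M[s, a, s'] = P(s' | s, a), shape (|S|, |A|, |S|)
    Theta: observation function, Theta[s', a, o] = P(o | s', a)
    """
    predicted = b @ M[:, a, :]                 # sum_s M(s, a, s') b(s)
    unnormalized = Theta[:, a, o] * predicted
    return unnormalized / unnormalized.sum()   # normalize by P(o | a, b)
```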

Page 98: Dynamic Information Retrieval Tutorial - WSDM 2015

POMDP → Bellman Equation

The Bellman equation for a POMDP:

$V(b) = \max_a \left[ r(b, a) + \gamma \sum_{o'} P(o' \mid a, b)\, V(b') \right]$

A POMDP can be transformed into a continuous belief MDP (B, M′, A, r, γ):
B: the continuous belief space
M′: transition function $M'_a(b, b') = \sum_{o' \in O} \mathbf{1}_{a,o'}(b', b)\, P(o' \mid a, b)$, where $\mathbf{1}_{a,o'}(b', b) = 1$ if $SE(b, a, o') = b'$ and 0 otherwise
A: action space
r: reward function $r(b, a) = \sum_{s \in S} b(s)\, R(s, a)$

Page 99: Dynamic Information Retrieval Tutorial - WSDM 2015

Applying POMDP to Dynamic IR

POMDP element → Dynamic IR:
Environment – documents
Agents – the user and the search engine
States – queries, the user's decision-making status, the relevance of documents, etc.
Actions – provide a ranking of documents; weigh terms in the query; add/remove/keep query terms; switch a search technology on or off; adjust parameters of a search technology
Observations – queries, clicks, document lists, snippets, terms, etc.
Rewards – evaluation measures (such as DCG, nDCG or MAP); clicking information
Transition matrix – given in advance or estimated from training data
Observation function – problem-dependent; estimated from sample datasets

Page 100: Dynamic Information Retrieval Tutorial - WSDM 2015

Session Search Example – States

S_RT (Relevant & Exploitation): scooter price → scooter stores
S_RR (Relevant & Exploration): Hartford visitors → Hartford Connecticut tourism
S_NRT (Non-Relevant & Exploitation): Philadelphia NYC travel → Philadelphia NYC train
S_NRR (Non-Relevant & Exploration): distance New York Boston → maps.bing.com

[J. Luo et al., '14]

Page 101: Dynamic Information Retrieval Tutorial - WSDM 2015

Session Search Example – Actions (A_u, A_se)

User actions (A_u): add query terms (+Δq); remove query terms (−Δq); keep query terms (q_theme); clicked documents; SAT-clicked documents

Search engine actions (A_se): increase/decrease/keep term weights; switch query expansion on or off; adjust the number of top documents used in PRF; etc.

[J. Luo et al., '14]

Page 102: Dynamic Information Retrieval Tutorial - WSDM 2015

TREC Session Tracks (2010–2012)

Given a series of queries {q1, q2, …, qn}, the top-10 retrieval results {D1, …, Dn−1} for q1 to qn−1, and click information,
the task is to retrieve a list of documents for the current/last query, qn.
Relevance judgment is based on how relevant the documents are for qn, and how relevant they are to the information need of the entire session (given in the topic description).
There is no need to segment the sessions.

Page 103: Dynamic Information Retrieval Tutorial - WSDM 2015

Query change is an important form of feedback

We define query change as the syntactic editing change between two adjacent queries:

$\Delta q_i = q_i - q_{i-1}$

It includes $+\Delta q_i$, the added terms, and $-\Delta q_i$, the removed terms. The unchanged/shared terms are called the theme terms, $q_{theme}$.

Example: q1 = "bollywood legislation", q2 = "bollywood law"
Theme term = "bollywood"; added (+Δq) = "law"

Page 104: Dynamic Information Retrieval Tutorial - WSDM 2015

Where do these query changes come from?

Given the TREC Session settings, we consider two sources of query change:
the previous search results that the user viewed/read/examined
the information need

Example: Kurosawa → Kurosawa wife
'wife' is not in any previous results, but it is in the topic description.
However, knowing the information need before the search is difficult to achieve.

Page 105: Dynamic Information Retrieval Tutorial - WSDM 2015

Previous search results can influence query change in quite complex ways

Merck lobbyists → Merck lobbying US policy

D1 contains several mentions of 'policy', such as "A lobbyist who until 2004 worked as senior policy advisor to Canadian Prime Minister Stephen Harper was hired last month by Merck …"
These mentions are about Canadian policies, while the user adds "US policy" in q2.
Our guess is that the user might be inspired by 'policy' but prefers a different sub-concept than 'Canadian policy'.
Therefore, within the added terms "US policy", 'US' is the novel term and 'policy' is not, since it appeared in D1. The two terms should be treated differently.

Page 106: Dynamic Information Retrieval Tutorial - WSDM 2015

POMDP (Partially Observable Markov Decision Process)

Rich interactions → actions
Hidden, evolving information needs → hidden states
A long-term goal → rewards
Temporal dependency → Markov property
Multi-agent collaboration → SG (stochastic games)

Page 107: Dynamic Information Retrieval Tutorial - WSDM 2015

Recap – Characteristics of Dynamic IR

Rich interactions: query formulation, document clicks, document examination, eye movements, mouse movements, etc.
Temporal dependency
Overall goal

Page 108: Dynamic Information Retrieval Tutorial - WSDM 2015

Modeling Query Change

A framework inspired by reinforcement learning.
Reinforcement learning for a Markov decision process:
models a state space S and an action space A according to a transition model T = P(s_{i+1} | s_i, a_i);
a policy π(s) = a indicates which action a the agent can take at state s;
each state is associated with a reward function R that indicates the possible positive reward or negative loss that a state and an action may yield.
Reinforcement learning offers general solutions to MDPs and seeks the best policy for an agent.

Page 109: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Session Search

Dynamic Ranking

Multi Armed Bandits

Portfolio Ranking

Multi-Page Search

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Page 110: Dynamic Information Retrieval Tutorial - WSDM 2015


Markov Process

Hidden Markov Model

Markov Decision Process

Partially Observable Markov Decision Process

Multi-Armed Bandit

Family of Markov Models

Page 111: Dynamic Information Retrieval Tutorial - WSDM 2015

Multi Armed Bandits (MAB)

[Illustration: a row of slot machines]

"Which slot machine should I select in this round?" → reward

Page 112: Dynamic Information Retrieval Tutorial - WSDM 2015

Multi Armed Bandits (MAB)

"I won! Is this the best slot machine?" → reward

Page 113: Dynamic Information Retrieval Tutorial - WSDM 2015

MAB Definition

A tuple (S, A, R, B)
S: the hidden reward distribution of each bandit
A: choose which bandit to play
R: the reward for playing a bandit
B: the belief space, our estimate of each bandit's distribution

Page 114: Dynamic Information Retrieval Tutorial - WSDM 2015

Comparison with Markov Models

A single-state Markov decision process
No transition probability
Similar to a POMDP in that we maintain a belief state
Action = choose a bandit; it does not affect the state
Does not 'plan ahead' but intelligently adapts
Somewhere between interactive and dynamic IR

Page 115: Dynamic Information Retrieval Tutorial - WSDM 2015

MAB Policy Reward

A MAB algorithm describes a policy π for choosing bandits:
maximise the rewards from the chosen bandits over all time steps;
minimize the regret, the cumulative difference between the optimal reward and the actual reward:

$\sum_{t=1}^{T} \big[ Reward(a^*) - Reward(a_{\pi(t)}) \big]$

Page 116: Dynamic Information Retrieval Tutorial - WSDM 2015

Exploration vs Exploitation

Exploration: try out bandits to find which has the highest average reward; too much exploration leads to poor performance
Exploitation: play bandits that are known to pay out a higher reward on average
MAB algorithms balance exploration and exploitation:
start by exploring more to find the best bandits, then exploit more as the best bandits become known

Page 117: Dynamic Information Retrieval Tutorial - WSDM 2015

MAB – Index Algorithms

Gittins index¹: play the bandit with the highest 'dynamic allocation index'; modelled using an MDP but suffers from the 'curse of dimensionality'

ε-greedy²: play the highest-reward bandit with probability 1 − ε, and a random bandit with probability ε (see the sketch below)

UCB (Upper Confidence Bound)³

¹J. C. Gittins '89; ²Nicolò Cesa-Bianchi et al. '98; ³P. Auer et al. '02
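A minimal sketch of the ε-greedy rule; ε = 0.1 is an illustrative default:

```python
import random

def epsilon_greedy(avg_rewards, epsilon=0.1):
    """Exploit the best-looking bandit with probability 1 - epsilon, else explore."""
    if random.random() < epsilon:
        return random.randrange(len(avg_rewards))                      # explore
    return max(range(len(avg_rewards)), key=avg_rewards.__getitem__)   # exploit
```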

Page 118: Dynamic Information Retrieval Tutorial - WSDM 2015

Comparison of Markov

Models

Markov Process – a fully observable stochastic process
Hidden Markov Model – a partially observable stochastic process
MDP – a fully observable decision process
MAB – a decision process, either fully or partially observable
POMDP – a partially observable decision process

Model               | actions | rewards | states
Markov Process      | No      | No      | Observable
Hidden Markov Model | No      | No      | Unobservable
MDP                 | Yes     | Yes     | Observable
POMDP               | Yes     | Yes     | Unobservable
MAB                 | Yes     | Yes     | Fixed

Page 119: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Session Search

Dynamic Ranking

Multi Armed Bandits

Portfolio Ranking

Multi-Page Search

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Page 120: Dynamic Information Retrieval Tutorial - WSDM 2015

UCB Algorithm

$\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$

Page 121: Dynamic Information Retrieval Tutorial - WSDM 2015

UCB Algorithm

$\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$

Calculate for all $i$ and select the highest

Page 122: Dynamic Information Retrieval Tutorial - WSDM 2015

UCB Algorithm

$\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$

Calculate for all $i$ and select the highest
Average reward $\bar{x}_i$

Page 123: Dynamic Information Retrieval Tutorial - WSDM 2015

UCB Algorithm

$\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$

Calculate for all $i$ and select the highest
Average reward $\bar{x}_i$; time step $t$

Page 124: Dynamic Information Retrieval Tutorial - WSDM 2015

UCB Algorithm

$\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$

Calculate for all $i$ and select the highest
Average reward $\bar{x}_i$; time step $t$; number of times bandit $i$ has been played, $T_i$

Page 125: Dynamic Information Retrieval Tutorial - WSDM 2015

UCB Algorithm

$\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$

Calculate for all $i$ and select the highest
Average reward $\bar{x}_i$; time step $t$; number of times bandit $i$ has been played, $T_i$
The chance of playing infrequently played bandits increases over time
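Putting the pieces together, a minimal UCB1 selection sketch (the list-based bookkeeping is an assumption of this example):

```python
import math

def ucb1_select(avg_rewards, plays, t):
    """Pick argmax_i [ x_bar_i + sqrt(2 ln t / T_i) ].

    avg_rewards[i]: empirical mean reward of bandit i
    plays[i]: number of times bandit i has been played (T_i)
    """
    def score(i):
        if plays[i] == 0:
            return float('inf')    # ensure every bandit is tried once first
        return avg_rewards[i] + math.sqrt(2.0 * math.log(t) / plays[i])
    return max(range(len(avg_rewards)), key=score)
```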

Page 126: Dynamic Information Retrieval Tutorial - WSDM 2015

Iterative Expectation

$\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$

M. Sloan and J. Wang '13

Page 127: Dynamic Information Retrieval Tutorial - WSDM 2015

UCB Algorithm

$\bar{x}_i + \sqrt{\frac{2 \ln t}{T_i}}$

Documents $i$

M. Sloan and J. Wang '13

Page 128: Dynamic Information Retrieval Tutorial - WSDM 2015

Iterative Expectation

$\bar{r}_i + \sqrt{\frac{2 \ln t}{T_i}}$

Documents $i$; average probability of relevance $\bar{r}_i$

M. Sloan and J. Wang '13

Page 129: Dynamic Information Retrieval Tutorial - WSDM 2015

Iterative Expectation

$\bar{r}_i + \sqrt{\frac{2 \ln t}{\gamma_i(t)}}$

Documents $i$; average probability of relevance $\bar{r}_i$; 'effective' number of impressions

$\gamma_i(t) = \sum_{k=1}^{t} \alpha^{C_k} \beta^{1-C_k}$

$\alpha$ and $\beta$ reward clicks and non-clicks depending on rank

M. Sloan and J. Wang '13

Page 130: Dynamic Information Retrieval Tutorial - WSDM 2015

Iterative Expectation

$\bar{r}_i + \lambda \sqrt{\frac{2 \ln t}{\gamma_i(t)}}$

Documents $i$; average probability of relevance $\bar{r}_i$; 'effective' number of impressions

$\gamma_i(t) = \sum_{k=1}^{t} \alpha^{C_k} \beta^{1-C_k}$

$\alpha$ and $\beta$ reward clicks and non-clicks depending on rank; exploration parameter $\lambda$

M. Sloan and J. Wang '13
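A sketch of this document-level score, simplifying the rank-dependent α and β to scalars for readability:

```python
import math

def iterative_expectation_score(r_bar, clicks, alpha, beta, t, lam=1.0):
    """r_bar + lambda * sqrt(2 ln t / gamma(t)), with effective impressions
    gamma(t) = sum_k alpha^C_k * beta^(1 - C_k) over past click outcomes C_k."""
    gamma_t = sum(alpha ** c * beta ** (1 - c) for c in clicks)
    return r_bar + lam * math.sqrt(2.0 * math.log(t) / gamma_t)
```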

Page 131: Dynamic Information Retrieval Tutorial - WSDM 2015

Portfolio Theory of IR

Portfolio theory maximises the expected return for a given amount of risk¹
Diversity of the portfolio increases the likely return
We can consider documents as 'shares'
Documents are dependent on one another, unlike in the PRP
The portfolio theory of IR² allows us to introduce diversity

¹H. Markowitz '52; ²J. Wang et al. '09

Page 132: Dynamic Information Retrieval Tutorial - WSDM 2015

Portfolio Ranking

Documents are dependent on each other
Co-click matrix from users and logs¹
Portfolio armed bandit ranking²:
exploratively rank using iterative expectation;
diversify using portfolio optimisation over the co-click matrix;
update relevance and dependence with each click.
Both explorative and diverse

¹W. Wu et al. '11; ²M. Sloan and J. Wang '12

Page 133: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Session Search

Dynamic Ranking

Multi Armed Bandits

Portfolio Ranking

Multi-Page Search

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Page 134: Dynamic Information Retrieval Tutorial - WSDM 2015

Multi Page Search

[Figure: rankings over Page 1 and Page 2]

X. Jin, M. Sloan and J. Wang '13

Page 135: Dynamic Information Retrieval Tutorial - WSDM 2015

Multi Page Search Example – States & Actions

State: the relevance of the documents
Action: a ranking of the documents
Observation: clicks
Belief: a multivariate Gaussian
Reward: DCG over the two pages

X. Jin, M. Sloan and J. Wang '13

Page 136: Dynamic Information Retrieval Tutorial - WSDM 2015

Model


Page 137: Dynamic Information Retrieval Tutorial - WSDM 2015

Model

$N(\theta^1, \Sigma^1)$

$\theta^1$ – prior estimate of relevance
$\Sigma^1$ – prior estimate of covariance (document similarity, topic clustering)

Page 138: Dynamic Information Retrieval Tutorial - WSDM 2015

Model


Rank action for page 1

Page 139: Dynamic Information Retrieval Tutorial - WSDM 2015

Model


Page 140: Dynamic Information Retrieval Tutorial - WSDM 2015

Model

Feedback from page 1: $\boldsymbol{r} \sim N(\theta^1_{s}, \Sigma^1_{s})$

Page 141: Dynamic Information Retrieval Tutorial - WSDM 2015

Model

Update the estimates using $\boldsymbol{r}^1$. Partition the prior as

$\theta^1 = \begin{pmatrix} \theta_{\backslash s'} \\ \theta_{s'} \end{pmatrix}, \quad \Sigma^1 = \begin{pmatrix} \Sigma_{\backslash s'} & \Sigma_{\backslash s', s'} \\ \Sigma_{s', \backslash s'} & \Sigma_{s'} \end{pmatrix}$

then condition on the observed feedback:

$\theta^2 = \theta_{\backslash s'} + \Sigma_{\backslash s', s'} \Sigma_{s'}^{-1} (\boldsymbol{r}^1 - \theta_{s'})$
$\Sigma^2 = \Sigma_{\backslash s'} - \Sigma_{\backslash s', s'} \Sigma_{s'}^{-1} \Sigma_{s', \backslash s'}$
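The update is the standard multivariate-Gaussian conditioning; a numpy sketch under assumed shapes:

```python
import numpy as np

def gaussian_feedback_update(theta, Sigma, shown, r):
    """Condition N(theta, Sigma) on observed relevance r of the shown documents.

    shown: indices of the documents ranked on page 1 (the s' block)
    r: observed relevance feedback for those documents
    Returns (theta2, Sigma2) for the remaining documents.
    """
    rest = np.setdiff1d(np.arange(len(theta)), shown)
    S_ss = Sigma[np.ix_(shown, shown)]   # Sigma_{s'}
    S_rs = Sigma[np.ix_(rest, shown)]    # Sigma_{\s', s'}
    gain = S_rs @ np.linalg.inv(S_ss)
    theta2 = theta[rest] + gain @ (r - theta[shown])
    Sigma2 = Sigma[np.ix_(rest, rest)] - gain @ S_rs.T
    return theta2, Sigma2
```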

Page 142: Dynamic Information Retrieval Tutorial - WSDM 2015

Model


Rank using PRP

Page 143: Dynamic Information Retrieval Tutorial - WSDM 2015

Model

The utility of a ranking (two-page DCG):

$U = \lambda \sum_{j=1}^{M} \frac{\theta^1_{s_j}}{\log_2(j+1)} + (1 - \lambda) \sum_{j=M+1}^{2M} \frac{\theta^2_{s_j}}{\log_2(j+1)}$
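A direct transcription of this utility, assuming the per-page relevance estimates are given in display order:

```python
import math

def two_page_utility(theta1_ranked, theta2_ranked, lam=0.5):
    """U = lam * sum_{j=1..M} theta1[j]/log2(j+1)
         + (1-lam) * sum_{j=M+1..2M} theta2[j]/log2(j+1)."""
    M = len(theta1_ranked)
    page1 = sum(th / math.log2(j + 1) for j, th in enumerate(theta1_ranked, start=1))
    page2 = sum(th / math.log2(j + 1) for j, th in enumerate(theta2_ranked, start=M + 1))
    return lam * page1 + (1 - lam) * page2
```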

Page 144: Dynamic Information Retrieval Tutorial - WSDM 2015

Model – Bellman Equation

Optimize $s^1$ to improve $U_{s^2}$:

$V(\theta^1, \Sigma^1, 1) = \max_{s^1} \left[ \lambda\, \theta_{s^1} \cdot W^1 + E\big( V(\theta^2, \Sigma^2, 2) \big) \right]$

Page 145: Dynamic Information Retrieval Tutorial - WSDM 2015

$\lambda$

Balances exploration and exploitation on page 1
Tuned for different queries: navigational vs. informational
$\lambda = 1$ for non-ambiguous searches

Page 146: Dynamic Information Retrieval Tutorial - WSDM 2015

Approximation

Approximation by Monte Carlo sampling:

$\approx \max_{s^1} \left[ \lambda\, \theta_{s^1} \cdot W^1 + \max_{s^2} (1 - \lambda) \frac{1}{S} \sum_{\boldsymbol{r} \in O} \theta_{s^2} \cdot W^2\, P(\boldsymbol{r}) \right]$

A sequential ranking decision

Page 147: Dynamic Information Retrieval Tutorial - WSDM 2015

Experiment Data

Difficult to evaluate without access to live users
Simulated using 3 TREC collections and their relevance judgements:
WT10G – explicit ratings
TREC8 – clickthroughs
Robust – difficult (ambiguous) searches

Page 148: Dynamic Information Retrieval Tutorial - WSDM 2015

User Simulation

Rank M documents
Simulate user clicks according to the relevance judgements
Update the page-2 ranking
Measure at pages 1 and 2: Recall, Precision, nDCG, MRR
BM25 – the prior ranking model

Page 149: Dynamic Information Retrieval Tutorial - WSDM 2015

Investigating λ


Page 150: Dynamic Information Retrieval Tutorial - WSDM 2015

Baselines

$\lambda$ determined experimentally
BM25
BM25 with conditional update ($\lambda = 1$)
Maximum Marginal Relevance (MMR): diversification
MMR with conditional update
Rocchio: relevance feedback

Page 151: Dynamic Information Retrieval Tutorial - WSDM 2015

Results

[Results figures]

Page 156: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Session Search

Dynamic Ranking

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Page 157: Dynamic Information Retrieval Tutorial - WSDM 2015

Cold-start problem in recommender systems

Page 158: Dynamic Information Retrieval Tutorial - WSDM 2015

Interactive Recommender Systems

Page 159: Dynamic Information Retrieval Tutorial - WSDM 2015

Possible Solutions

Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. "Interactive collaborative filtering." CIKM, 2013.

Page 160: Dynamic Information Retrieval Tutorial - WSDM 2015

Objective

Address the cold-start problem with an interactive mechanism for collaborative filtering (CF)

Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. "Interactive collaborative filtering." CIKM, 2013.

Page 161: Dynamic Information Retrieval Tutorial - WSDM 2015

Proposed EE (exploration/exploitation) algorithms

Thompson Sampling

Linear-UCB

General Linear-UCB

Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. "Interactive collaborative filtering." CIKM, 2013.
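As one concrete illustration, a sketch of the Thompson Sampling idea under a Gaussian probabilistic-matrix-factorization posterior (an illustration of the approach, not the paper's exact algorithm; `Q`, `sigma2`, and `feedback` are assumptions):

```python
import numpy as np

def thompson_step(mu, Sigma, Q, rated, feedback, sigma2=1.0, rng=None):
    """One interactive-CF round: sample a user latent vector from its
    posterior, recommend the best unrated item, then update the
    posterior with the observed rating (Bayesian linear regression).
    Q: pretrained item latent vectors, shape (n_items, d)."""
    rng = rng or np.random.default_rng()
    p = rng.multivariate_normal(mu, Sigma)   # explore via sampling
    scores = Q @ p
    scores[list(rated)] = -np.inf            # never repeat an item
    i = int(np.argmax(scores))
    r = feedback(i)                          # observed rating for item i
    P = np.linalg.inv(Sigma)                 # prior precision
    P_new = P + np.outer(Q[i], Q[i]) / sigma2
    Sigma_new = np.linalg.inv(P_new)
    mu_new = Sigma_new @ (P @ mu + Q[i] * r / sigma2)
    return mu_new, Sigma_new, i
```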

Page 162: Dynamic Information Retrieval Tutorial - WSDM 2015

Cold-start users

Zhao, Xiaoxue, Weinan Zhang, and Jun Wang. "Interactive collaborative filtering." CIKM, 2013.

Page 163: Dynamic Information Retrieval Tutorial - WSDM 2015

Ad selection problem


How can online publishers optimally select ads to maximize their ad income over time?

Sequential selection of Correlated Ads by POMDPs. Shuai Yuan, Jun Wang. CIKM 2012.

Selling in multiple channels with non-fixed prices

Page 164: Dynamic Information Retrieval Tutorial - WSDM 2015


Problem formulation

Sequential selection of Correlated Ads by POMDPs. Shuai Yuan, Jun Wang. CIKM 2012.

Page 165: Dynamic Information Retrieval Tutorial - WSDM 2015

Problem formulation


Sequential selection of Correlated Ads by POMDPs. Shuai Yuan, Jun Wang. CIKM 2012.

Page 166: Dynamic Information Retrieval Tutorial - WSDM 2015

Objective function


Sequential selection of Correlated Ads by POMDPs. Shuai Yuan, Jun Wang. CIKM 2012.

Page 167: Dynamic Information Retrieval Tutorial - WSDM 2015

Belief update


Sequential selection of Correlated Ads by POMDPs. Shuai Yuan, Jun Wang. CIKM 2012.
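The belief update here is the standard POMDP one; a generic discrete sketch (in the ad-selection model the state would encode the correlated ads' click behavior, which this sketch leaves abstract):

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """b'(s') ∝ O[a][s', o] * sum_s T[a][s, s'] * b(s).
    T[a]: |S| x |S| transition matrix for action a,
    O[a]: |S| x |O| observation probabilities."""
    b_pred = T[a].T @ b           # predicted next-state distribution
    b_new = O[a][:, o] * b_pred   # weight by likelihood of observation o
    return b_new / b_new.sum()    # renormalize
```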

Page 168: Dynamic Information Retrieval Tutorial - WSDM 2015

Results


Sequential selection of Correlated Ads by POMDPs. Shuai Yuan, Jun Wang. CIKM 2012.

Page 169: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Session Search

Dynamic Ranking

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Page 170: Dynamic Information Retrieval Tutorial - WSDM 2015

Dynamic Information Retrieval Evaluation

Guest talk at the WSDM 2015 tutorial on Dynamic Information Retrieval Modeling

Charlie Clarke

(with much much input from Mark Smucker)

University of Waterloo, Canada

Page 171: Dynamic Information Retrieval Tutorial - WSDM 2015

Moving from static ranking to dynamic domains

• How to extend IR evaluation methodologies to

dynamic domains?

• Three key ideas:

1. Realistic models of searcher interactions

2. Measure costs to the searcher in meaningful units (e.g., time, money, …)

3. Measure benefits to the searcher in meaningful units (e.g., time, nuggets, …)


This talk strongly reflects my opinions (not trying to be neutral). But I am the guest speaker.

Page 172: Dynamic Information Retrieval Tutorial - WSDM 2015

Evaluating Information Access Systems


searching, browsing, summarization, visualization, desktop, mobile, web, books, images, questions, etc., and combinations of these

Does the system work for its users?

Will this change make the system better or worse?

How do we quantify performance?

Page 173: Dynamic Information Retrieval Tutorial - WSDM 2015

Performance 101: Is this a good search result?


Page 174: Dynamic Information Retrieval Tutorial - WSDM 2015

How to evaluate?

Study users


Users in the wild:

• A/B Testing

• Result interleaving

• Clicks and dwell time

• Mouse movements

• Other implicit feedback

• …

Users in the lab:

• Time to task completion

• Think aloud protocols

• Questionnaires

• Eye tracking

• …

Page 175: Dynamic Information Retrieval Tutorial - WSDM 2015

Unfortunately user studies are

• Slow

• Expensive

• Conditions can never be exactly duplicated (e.g., learning to rank)


Page 176: Dynamic Information Retrieval Tutorial - WSDM 2015

Alternative: User performance prediction

Can we predict the impact of a proposed change to an information access system (while respecting and reflecting differences between users)?

Can we quantify performance improvements in meaningful units so that effect sizes can be considered in statistical testing? Are improvements practically significant, as well as statistically significant?

Want to predict the impact of a proposed change automatically, based on existing user performance data, rather than gathering new performance data.


The BIG goal

Page 177: Dynamic Information Retrieval Tutorial - WSDM 2015

Traditional Evaluation of Rankers

• Test collection:

– Documents

– Queries

– Relevance judgments

• Each ranker generates a ranked list of documents for each query

• Score ranked lists using relevance judgments and standard metrics (recall, mean average precision, nDCG, ERR, RBP, …).


Page 178: Dynamic Information Retrieval Tutorial - WSDM 2015


Example of a good-old-fashioned IR Metric

Ranked list of documents, with precision at rank N:

1. Non-relevant   0.00
2. Relevant       0.50
3. Non-relevant   0.33
4. Non-relevant   0.25
5. Relevant       0.40
6. Non-relevant   0.33
7. Non-relevant   0.29
8. …

Precision at rank N is the fraction of documents that are relevant in the first N documents.

Average Precision (AP) is the average of the precision at N for each relevant document:

$$AP = \frac{1}{R} \sum_{i} \text{Prec}(R_i)$$

where $R_i$ is the rank of the $i$-th of the $R$ relevant documents.

Mean average precision (MAP) is AP averaged over the set of queries.
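A few lines of Python reproduce the worked example above (assuming, for simplicity, that all R relevant documents appear in the ranked list):

```python
def average_precision(rels):
    """rels: 0/1 relevance down the ranked list."""
    hits, total = 0, 0.0
    for n, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / n    # precision at this relevant rank
    return total / hits if hits else 0.0

# The ranked list above: ranks 2 and 5 are relevant
print(average_precision([0, 1, 0, 0, 1, 0, 0]))  # (0.50 + 0.40) / 2 = 0.45
```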

Page 179: Dynamic Information Retrieval Tutorial - WSDM 2015

General form of effectiveness measures

Nearly all standard effectiveness measures

have the same basic form (including nDCG,

RBP, ERR, average precision,…):

Charles Clarke, University of Waterloo 179

Normalization

Rank Gain at rank k

Discount

factor
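A tiny sketch of this shared form; the two discounts below are the standard DCG and RBP ones, with an illustrative persistence parameter p:

```python
import math

def effectiveness(gains, discount, norm=1.0):
    """(1/norm) * sum over ranks k of discount(k) * gain(k)."""
    return sum(discount(k) * g for k, g in enumerate(gains, start=1)) / norm

dcg_discount = lambda k: 1.0 / math.log2(k + 1)         # DCG/nDCG
rbp_discount = lambda k, p=0.8: (1 - p) * p ** (k - 1)  # rank-biased precision
```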

Page 180: Dynamic Information Retrieval Tutorial - WSDM 2015

Implicit user model…

• User works down the ranked list spending equal time on each document. Captions, navigation, etc., have no impact.

• If they make it to rank i, they receive some benefit (i.e., gain).

• Eventually they stop, which is reflected in the discount (i.e., they are less likely to reach lower ranks).

• Normalization typically maps the score into the range [0:1]. Units may not be meaningful.


Page 181: Dynamic Information Retrieval Tutorial - WSDM 2015

Traditional Evaluation of Rankers

• Many effectiveness measures: precision, recall, average precision, rank-biased precision, discounted cumulative gain, etc.

• Widely used and accepted as standard practice.

• But…

• What does an improvement in average precision from 0.28 to 0.31 mean to users?

• Does an increase in the measure really translate to an improved user experience?

• How will an improvement in the performance of a single component impact overall system performance?


Page 182: Dynamic Information Retrieval Tutorial - WSDM 2015

How to better reflect user variation and system performance?


Example: What's the simplest possible user interface for search?

1) User issues a query
2) System returns material to read

i.e., the system returns stuff to read, in order (not a list of documents; more like a newspaper article)

A correspondingly simple user model has two parameters:

1) Reading speed
2) Time spent reading

Page 183: Dynamic Information Retrieval Tutorial - WSDM 2015

Reading speed distribution (from users in the lab)


Empirical distribution of reading speed during an information access task, and its fit to a log-normal distribution.

Page 184: Dynamic Information Retrieval Tutorial - WSDM 2015

Stopping time distribution (from users in the wild)


Empirical distribution of time spent searching during an information access task, and its fit to a log-normal distribution.

Page 185: Dynamic Information Retrieval Tutorial - WSDM 2015

Evaluating a search result


1) Generate a reading speed from the distribution

2) Generate a stopping time from the distribution

3) How much useful material did the user read?

4) Repeat for many (simulated) users

As an example, we use passage retrieval runs from the TREC 2006 HARD Track, which essentially assume our simple user interface.

We measure costs to the searcher in terms of time spent searching. We measure benefits to the searcher in terms of "time well spent".
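A hedged NumPy sketch of steps 1-4; the log-normal parameters here are placeholders, not the calibrated fits shown on the previous slides:

```python
import numpy as np

def simulate_users(doc_lengths, useful, n_users=10_000, rng=None):
    """Draw a reading speed and a stopping time per simulated user from
    log-normal distributions, then count useful characters read.
    `useful` flags which passages contain useful material."""
    rng = rng or np.random.default_rng(0)
    speed = rng.lognormal(mean=3.0, sigma=0.5, size=n_users)  # chars/sec (placeholder)
    stop = rng.lognormal(mean=5.0, sigma=1.0, size=n_users)   # seconds (placeholder)
    budget = speed * stop                        # characters each user reads
    bounds = np.cumsum(doc_lengths)
    gain = np.zeros(n_users)
    for lo, hi, u in zip(np.r_[0, bounds[:-1]], bounds, useful):
        read = np.clip(budget - lo, 0, hi - lo)  # chars read in this passage
        gain += np.where(u, read, 0)             # count only useful text
    return gain                                   # "useful characters read"
```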

Page 186: Dynamic Information Retrieval Tutorial - WSDM 2015

Useful characters read vs. Characters read


Performance of run york04ha1 on TREC 2004 HARD Track topic 424 ("Bollywood") with 10,000 simulated users.

Page 187: Dynamic Information Retrieval Tutorial - WSDM 2015

Useful characters read vs. Time spent reading


Performance of run york04ha1 on TREC 2004 HARD Track topic 424 ("Bollywood") with 10,000 simulated users.

Page 188: Dynamic Information Retrieval Tutorial - WSDM 2015

Time well spent vs. Time spent reading


Performance of run york04ha1 on TREC 2004 HARD Track topic 424 ("Bollywood") with 10,000 simulated users.

Page 189: Dynamic Information Retrieval Tutorial - WSDM 2015

Distribution of time well spent


Performance of run york04ha1 on TREC 2004 HARD Track topic 424 ("Bollywood") with 10,000 simulated users.

Page 190: Dynamic Information Retrieval Tutorial - WSDM 2015

Temporal precision vs. Time spent reading


Performance of run york04ha1 on TREC 2004 HARD Track topic 424 ("Bollywood") with 10,000 simulated users.

Page 191: Dynamic Information Retrieval Tutorial - WSDM 2015

Distribution of temporal precision


Performance of run york04ha1 on TREC 2004 HARD Track topic 424 ("Bollywood") with 10,000 simulated users.

Page 192: Dynamic Information Retrieval Tutorial - WSDM 2015

General Framework (Part I): Cumulative Gain

• Consider the performance of a system in terms of a cost-benefit (cumulative gain) curve G(t).
  – Measure costs (e.g., in terms of time spent).
  – Measure benefits (e.g., in terms of time well spent).

• A particular instance of G(t) represents a single user (described by a set of parameters) interacting with a system – not just a list!

• G(t) captures factors intrinsic to the system. We don't know how much time the user has to invest, but for different levels of investment, G(t) indicates the benefit.

Page 193: Dynamic Information Retrieval Tutorial - WSDM 2015

General Framework (Part II): Decay

• Consider the user's willingness to invest time in terms of a decay curve D(t), which provides a survival probability.

• We assume that G(t) and D(t) are independent. (System-dependent stopping probabilities are accommodated in G(t). Details on request.)

• D(t) captures factors extrinsic to the system. The user only has so much time they could invest. They cannot invest more, even if they would receive substantial additional benefit from further interaction.


Page 194: Dynamic Information Retrieval Tutorial - WSDM 2015

General form of effectiveness measures (REMINDER)

Nearly all standard effectiveness measures have the same basic form (including nDCG, RBP, ERR, average precision, …):

$$M = \frac{1}{\mathcal{N}} \sum_{k} D(k)\, G(k)$$

where $\mathcal{N}$ is a normalization, $k$ is the rank, $G(k)$ is the gain at rank $k$, and $D(k)$ is the discount factor.

Page 195: Dynamic Information Retrieval Tutorial - WSDM 2015

General Framework (Part III): Time-biased gain

Overall system performance may be expressed as expected cumulative gain (which also incorporates standard effectiveness measures):

$$\text{TBG} = \frac{1}{\mathcal{N}} \int_0^\infty G'(t)\, D(t)\, dt$$

where $\mathcal{N}$ is a normalization (here == 1?), $t$ is time, $G'(t)$ is the gain at time $t$, and $D(t)$ is the decay factor.
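In discrete form, each piece of gain arriving at time t is weighted by the survival probability D(t). A minimal sketch (the exponential decay and its half-life are only stand-ins for a survival curve fitted from user data):

```python
import numpy as np

def time_biased_gain(times, gains, decay):
    """sum_k gain_k * D(t_k): each relevant item contributes its gain
    weighted by the probability the user is still searching when it is
    reached."""
    return sum(g * decay(t) for t, g in zip(times, gains))

half_life = 224.0  # seconds (illustrative)
decay = lambda t: np.exp(-t * np.log(2) / half_life)
```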

Page 196: Dynamic Information Retrieval Tutorial - WSDM 2015

General Framework (Part IV): Multiple users

• Cumulative gain may be computed by
  – Simulation (drawing a set of parameters from a population of users).
  – Measuring actual interaction on live systems.
  – Combinations of measurement and simulation.

• Simulating and/or measuring multiple users allows us to consider performance differences across the population of users.

• Simulation provides matched pairs (the same user on both systems), increasing our ability to detect differences.


Page 197: Dynamic Information Retrieval Tutorial - WSDM 2015

General Framework

Most of the evaluation proposals in the references can be reformulated in terms of this general framework, including those that address issues of:

– Novelty and diversity
– Filtering, summarization, question answering
– Session search, etc.


One more example from our current research…

Page 198: Dynamic Information Retrieval Tutorial - WSDM 2015

Session search example

• Two (or more) result lists, e.g., from query reformulation, query suggestion, or switching search engines.

• Modeling searcher interaction requires a switch from one result to another.

• The optimal time to switch depends on the total time available to search.

For example (with many details omitted…):


Page 199: Dynamic Information Retrieval Tutorial - WSDM 2015

Simulation of searchers switching between lists: A vs. B


User starts on list A.

If the user has less than five minutes to search, they should stay on list A. If the user has more than five minutes to search, they should leave list A after 90 seconds.

But can we assume optimal behavior when modeling users?

Page 200: Dynamic Information Retrieval Tutorial - WSDM 2015

Simulation of searchers switching between lists: A vs. B


[Figure: Average Gain (relevant documents) vs. Switch Time (minutes), one curve per session duration (2, 4, 6, 8, and 10 minutes). Topic = 389, List A = sab05ror1, List B = uic0501]

Different view of the same simulation, with thousands of simulated users.

Here, benefits are measured by the number of relevant documents seen.

Optimal switching time depends on session duration.

Page 201: Dynamic Information Retrieval Tutorial - WSDM 2015

Summary

• Primary goal of IR evaluation: predict how changes to an IR system will impact the user experience.

• Evaluation in dynamic domains requires us to explicitly model the system interface and the user's search behavior. Costs and benefits must be measured in meaningful units (e.g., time).

• Successful IR evaluation requires measurement of users, both "in the wild" and in the lab. These measurements calibrate models, which make predictions, which improve systems.


Page 202: Dynamic Information Retrieval Tutorial - WSDM 2015

A few key papers

• Leif Azzopardi. 2009. Usage based effectiveness measures: monitoring application performance in information retrieval. CIKM '09.

• Leif Azzopardi, Diane Kelly, and Kathy Brennan. 2013. How query cost affects search behavior. SIGIR '13.

• Feza Baskaya, Heikki Keskustalo, and Kalervo Järvelin. 2012. Time drives interaction: simulating sessions in diverse searching environments. SIGIR '12.

• Ben Carterette. 2011. System effectiveness, user models, and user utility: a conceptual framework for investigation. SIGIR '11.

• Ben Carterette, Evangelos Kanoulas, and Emine Yilmaz. 2011. Simulating simple user behavior for system effectiveness evaluation. CIKM '11.

• Ben Carterette, Evangelos Kanoulas, and Emine Yilmaz. 2012. Incorporating variability in user behavior into systems based evaluation. CIKM '12.


Page 203: Dynamic Information Retrieval Tutorial - WSDM 2015

A few more key papers

• Olivier Chapelle, Donald Metlzer, Ya Zhang, and Pierre Grinspan. 2009. Expected reciprocal rank for graded relevance. CIKM '09.

• Charles L.A. Clarke, Nick Craswell, Ian Soboroff, and Azin Ashkan. 2011. A comparative analysis of cascade measures for novelty and diversity. WSDM '11.

• Charles L.A. Clarke and Mark D. Smucker. 2014. Time well spent. IIiX '14.

• Georges Dupret and Mounia Lalmas. 2013. Absence time and user engagement: evaluating ranking functions. WSDM '13.

• Kalervo Järvelin, Susan L. Price, Lois M. L. Delcambre, and Marianne Lykke Nielsen. 2008. Discounted cumulated gain based evaluation of multiple-query IR sessions. ECIR '08.

• Jiyun Luo, Christopher Wing, Hui Yang, and Marti Hearst. 2013. The water filling model and the cube test: multi-dimensional evaluation for professional search. CIKM '13.


Page 204: Dynamic Information Retrieval Tutorial - WSDM 2015

And yet more key papers

• Tetsuya Sakai and Zhicheng Dou. 2013. Summaries, ranked retrieval and sessions: a unified framework for information access evaluation. SIGIR '13.

• Mark D. Smucker and Charles L.A. Clarke. 2012. Time-based calibration of effectiveness measures. SIGIR '12.

• Mark D. Smucker and Charles L.A. Clarke. 2012. Modeling user variance in time-biased gain. HCIR '12.

• Emine Yilmaz, Milad Shokouhi, Nick Craswell, and Stephen Robertson. 2010. Expected browsing utility for web search evaluation. CIKM '10.

• Yiming Yang and Abhimanyu Lad. 2009. Modeling expected utility of multi-session information distillation. ICTIR '09.

• Plus many others (ask me).


Page 205: Dynamic Information Retrieval Tutorial - WSDM 2015

Dynamic Information Retrieval Evaluation

Guest talk at the WSDM 2015 tutorial on Dynamic Information Retrieval Modeling

Charlie Clarke

University of Waterloo, Canada

Thank you!

Page 206: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Session Search

Dynamic Ranking

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Page 207: Dynamic Information Retrieval Tutorial - WSDM 2015

Apply an MDP to an IR Problem

We can model IR systems using a Markov Decision Process:

Is there a temporal component?

States – What changes with each time step?

Actions – How does your system change the state?

Rewards – How do you measure feedback or effectiveness in your problem at each time step?

Transition Probability – Can you determine this? If not, then a model-free approach is more appropriate.

Page 208: Dynamic Information Retrieval Tutorial - WSDM 2015

Apply an MDP to an IR Problem - Example


User agent in session search

States – user’s relevance judgement

Action – new query

Reward – information gained

[Luo, Zhang, Yang SIGIR’14]
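When the transition probabilities are unknown, a model-free learner such as tabular Q-learning can be dropped into this kind of formulation. A sketch (state and action encodings are left abstract; this illustrates the idea rather than the cited paper's algorithm):

```python
from collections import defaultdict

Q = defaultdict(float)  # state-action values, initially 0

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step for a session-search MDP sketch:
    s = current relevance-judgement state, a = query reformulation
    taken, r = information gained, s_next = resulting state."""
    best_next = max(Q[(s_next, a2)] for a2 in actions) if actions else 0.0
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```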

Page 209: Dynamic Information Retrieval Tutorial - WSDM 2015

Apply an MDP to an IR Problem - Example


Search engine’s perspective

What if we can't directly observe the user's relevance judgement?

Click ≠ relevance


Page 210: Dynamic Information Retrieval Tutorial - WSDM 2015

Applying POMDP to Dynamic IR

POMDP → Dynamic IR

Environment → Documents

Agents → User, search engine

States → Queries, user's decision-making status, relevance of documents, etc.

Actions → Provide a ranking of documents; weigh terms in the query; add, remove, or keep query terms; switch a search technology on or off; adjust parameters for a search technology

Observations → Queries, clicks, document lists, snippets, terms, etc.

Rewards → Evaluation measures (such as DCG, nDCG or MAP); clicking information

Transition matrix → Given in advance or estimated from training data

Observation function → Problem dependent; estimated from sample datasets

Page 211: Dynamic Information Retrieval Tutorial - WSDM 2015

SIGIR Tutorial July 7th 2014

Grace Hui Yang

Marc Sloan

Jun Wang

Guest Speaker: Emine Yilmaz

Dynamic Information Retrieval Modeling

Panel Discussion

Page 212: Dynamic Information Retrieval Tutorial - WSDM 2015

Outline


Introduction & Theory

Session Search

Dynamic Ranking

Recommendation and Advertising

Guest Talk: Charlie Clarke

Discussion Panel

Conclusion

Page 213: Dynamic Information Retrieval Tutorial - WSDM 2015

Conclusions


Dynamic IR describes a new class of interactive model: it incorporates rich feedback and temporal dependency, and is goal oriented.

The family of Markov models and multi-armed bandit theory are useful in building DIR models.

Applicable to a range of IR problems, and useful in applications such as session search and evaluation.

Page 214: Dynamic Information Retrieval Tutorial - WSDM 2015

Dynamic IR Book


Published by Morgan & Claypool

'Synthesis Lectures on Information Concepts, Retrieval, and Services'

Due April / May 2015 (in time for SIGIR 2015)

Page 215: Dynamic Information Retrieval Tutorial - WSDM 2015

TREC 2015

Dynamic Domain Track, co-organized by Grace Hui Yang, John Frank, Ian Soboroff

Underexplored subsets of Web content: limited scope and richness of indexed content, which may not include relevant components of the deep web (temporary pages, pages behind forms, etc.), and basic search interfaces, where there is little collaboration or history beyond independent keyword search.

Complex, task-based, dynamic search: temporal dependency, rich interactions, complex and evolving information needs, professional users, and a wide range of search strategies.

Page 216: Dynamic Information Retrieval Tutorial - WSDM 2015

Task

An interactive search with multiple runs (a sketch of the loop follows):

Starting point: the system is given a search query.

Iterate until done (the system decides when to stop): the system returns a ranked list of 5 documents; the API returns relevance judgments; go to the next iteration of retrieval.

The goal of the system is to find relevant information for each topic as soon as possible.

One-shot ad-hoc search is included: the system simply decides to stop after iteration one.
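A hedged sketch of that loop; `retrieve` and `judge` stand in for a participant's system and the track's judgment API (both names are hypothetical, as is the naive stopping rule):

```python
def run_topic(query, retrieve, judge, max_iters=50):
    """One Dynamic Domain topic: retrieve 5 documents per iteration,
    collect relevance judgments, and stop when the system decides to."""
    feedback = []
    for _ in range(max_iters):
        docs = retrieve(query, feedback)[:5]  # return 5 documents
        judgments = judge(docs)               # API relevance feedback
        feedback.extend(zip(docs, judgments))
        if not any(judgments):                # naive stopping rule:
            break                             # quit on an all-irrelevant batch
    return feedback
```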

Page 217: Dynamic Information Retrieval Tutorial - WSDM 2015

Domains

Illicit goods – 30k forum posts from 5-10 forums (total ~300k posts). Which users are working together to sell illicit goods?

Ebola – One million tweets, plus 300k docs from in-country web sites (mostly official sites). Who is doing what and where?

Local Politics – 300k docs from local political groups in the Pacific Northwest and British Columbia. Who is campaigning for what and why?

Page 218: Dynamic Information Retrieval Tutorial - WSDM 2015

Timeline

TREC Call for Participation: January 2015

Data Available: March

Detailed Guidelines: April/May

Topics, Tasks available: June

Systems do their thing: June-July

Evaluation: August

Results to participants: September

Conference: November 2015

Page 219: Dynamic Information Retrieval Tutorial - WSDM 2015

TREC 2015

Total Recall Track

Co-organized by Gord Cormack, Maura Grossman, Adam Roegiest, Charlie Clarke

Explores high-recall tasks through an active learning process modeled on legal search tasks (eDiscovery, patent search). A participating system starts with a topic and proposes a relevant document.

The system gets immediate feedback on relevance.

It continues to propose additional documents and receive feedback until a stopping condition is reached.

Shared online infrastructure and collections with Dynamic Domain. Easy to participate in both if you participate in one.

Page 220: Dynamic Information Retrieval Tutorial - WSDM 2015

Acknowledgment


We thank Prof. Charlie Clarke for his guest lecture.

We sincerely thank Dr. Xuchu Dong for his help in the preparation of the tutorial.

We also thank the following colleagues for their comments and suggestions:

Dr Filip Radlinski

Prof. Maarten de Rijke

Page 221: Dynamic Information Retrieval Tutorial - WSDM 2015

References


Static IR

Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro-Neto. Addison-Wesley, 1999.

The PageRank Citation Ranking: Bringing Order to the Web. Lawrence Page , Sergey Brin , Rajeev Motwani , Terry Winograd. 1999

Implicit User Modeling for Personalized Search, Xuehua Shen et al., CIKM, 2005

A Short Introduction to Learning to Rank. Hang Li, IEICE Transactions 94-D(10): 1854-1862, 2011.

Portfolio Theory of Information Retrieval. J. Wang and J. Zhu. In SIGIR 2009

Page 222: Dynamic Information Retrieval Tutorial - WSDM 2015

References


Interactive IR

Relevance Feedback in Information Retrieval, Rocchio, J. J., The SMART Retrieval System (pp. 313-23), 1971

A study in interface support mechanisms for interactive information retrieval, Ryen W. White et al., JASIST, 2006

Visualizing stages during an exploratory search session, Bill Kules et al., HCIR, 2011

Dynamic Ranked Retrieval, Cristina Brandt et al., WSDM, 2011

Structured Learning of Two-level Dynamic Rankings, Karthik Raman et al., CIKM, 2011

Page 223: Dynamic Information Retrieval Tutorial - WSDM 2015

References


Dynamic IR

A hidden Markov model information retrieval system. D. R. H. Miller, T. Leek, and R. M. Schwartz. In SIGIR '99, pages 214-221.

Threshold setting and performance optimization in adaptive filtering, Stephen Robertson, JIR 2002

A large-scale study of the evolution of web pages, Dennis Fetterly et al., WWW 2003

Learning diverse rankings with multi-armed bandits. Filip Radlinski, Robert Kleinberg, Thorsten Joachims. ICML, 2008.

Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem, Yisong Yue et al., ICML 2009

Meme-tracking and the dynamics of the news cycle, Jure Leskovec et al., KDD 2009

Page 224: Dynamic Information Retrieval Tutorial - WSDM 2015

References


Dynamic IR

Mortal multi-armed bandits. Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, Eli Upfal. NIPS 2009

A Novel Click Model and Its Applications to Online Advertising, Zeyuan Allen Zhu et al., WSDM 2010

A contextual-bandit approach to personalized news article recommendation. Lihong Li, Wei Chu, John Langford, Robert E. Schapire. WWW, 2010

Inferring search behaviors using partially observable Markov model with duration (POMD), Yin He et al., WSDM, 2011

No Clicks, No Problem: Using Cursor Movements to Understand and Improve Search, Jeff Huang et al., CHI 2011

Balancing Exploration and Exploitation in Learning to Rank Online, Katja Hofmann et al., ECIR, 2011

Large-Scale Validation and Analysis of Interleaved Search Evaluation, Olivier Chapelle et al., TOIS 2012

Page 225: Dynamic Information Retrieval Tutorial - WSDM 2015

References


Dynamic IR

Using Control Theory for Stable and Efficient Recommender Systems. T. Jambor, J. Wang, N. Lathia. In: WWW '12, pages 11-20.

Sequential selection of correlated ads by POMDPs, Shuai Yuan et al., CIKM 2012

Utilizing query change for session search. D. Guan, S. Zhang, and H. Yang. In SIGIR ’13, pages 453–462.

Query Change as Relevance Feedback in Session Search (short paper). S. Zhang, D. Guan, and H. Yang. In SIGIR 2013.

Interactive exploratory search for multi page search results. X. Jin, M. Sloan, and J. Wang. In WWW ’13.

Interactive Collaborative Filtering. X. Zhao, W. Zhang, J. Wang. In: CIKM'2013, pages 1411-1420.

Page 226: Dynamic Information Retrieval Tutorial - WSDM 2015

References


Dynamic IR

Win-win search: Dual-agent stochastic game in session search. J. Luo, S. Zhang, and H. Yang. In SIGIR '14.

Iterative Expectation for Multi-Period Information Retrieval. M. Sloan and J. Wang. In WSCD 2013.

Dynamical Information Retrieval Modelling: A Portfolio-Armed Bandit Machine Approach. M. Sloan and J. Wang. In WWW 2012.

Jiyun Luo, Sicong Zhang, Xuchu Dong and Hui Yang. Designing States, Actions, and Rewards for Using POMDP in Session Search. In ECIR 2015.

Sicong Zhang, Jiyun Luo, Hui Yang. A POMDP Model for Content-Free Document Re-ranking. In SIGIR 2014.

Page 227: Dynamic Information Retrieval Tutorial - WSDM 2015

References


Markov Processes

A markovian decision process. R. Bellman. Indiana University Mathematics Journal, 6:679-684, 1957.

Dynamic Programming. R. Bellman. Princeton University Press, Princeton, NJ, USA, first edition, 1957.

Dynamic Programming and Markov Processes. R.A. Howard. MIT Press. 1960

Linear Programming and Sequential Decisions. Alan S. Manne. Management Science, 1960

Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Baum, Leonard E.; Petrie, Ted. The Annals of Mathematical Statistics 37, 1966

Page 228: Dynamic Information Retrieval Tutorial - WSDM 2015

References


Markov Processes

Learning to predict by the methods of temporal differences. Richard Sutton. Machine Learning 3. 1988

Computationally feasible bounds for partially observed Markov decision processes. W. Lovejoy. Operations Research 39: 162–175, 1991.

Q-Learning. Christopher J.C.H. Watkins, Peter Dayan. Machine Learning. 1992

Reinforcement learning with replacing eligibility traces. Singh, S. P. & Sutton, R. S. Machine Learning, 22, pages 123-158, 1996.

Reinforcement Learning: An Introduction. Richard S. Sutton and Andrew G. Barto. MIT Press, 1998.

Planning and acting in partially observable stochastic domains. L. Kaelbling, M. Littman, and A. Cassandra. Artificial Intelligence, 101(1-2):99–134, 1998.

Page 229: Dynamic Information Retrieval Tutorial - WSDM 2015

References


Markov Processes

Finding approximate POMDP solutions through belief compression. N. Roy. PhD Thesis Carnegie Mellon. 2003

VDCBPI: an approximate scalable algorithm for large scale POMDPs, P. Poupart and C. Boutilier. In NIPS-2004, pages 1081–1088.

Finding Approximate POMDP solutions Through Belief Compression. N. Roy, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, 23:1-40,2005.

Probabilistic robotics. S. Thrun, W. Burgard, D. Fox. Cambridge. MIT Press. 2005

Anytime Point-Based Approximations for Large POMDPs. J. Pineau, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, 27:335-380, 2006


Page 230: Dynamic Information Retrieval Tutorial - WSDM 2015

References


Markov Processes

The optimal control of partially observable Markov decision processes over a finite horizon. R. D. Smallwood, E.J. Sondik. Operations Research. 1973

Modified Policy Iteration Algorithms for Discounted Markov Decision Problems. M. L. Puterman and Shin M. C. Management Science 24, 1978.

An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains. A. A. Markov. Science in Context, 19:591-600, 12 2006.

Learning to Rank for Information Retrieval. Tie-Yan Liu. Springer Science & Business Media. 2011

Finite-Time Regret Bounds for the Multiarmed Bandit Problem, Nicolò Cesa-Bianchi, Paul Fischer. ICML 100-108, 1998

Multi-armed bandit allocation indices, Wiley, J. C. Gittins. 1989

Finite-time Analysis of the Multiarmed Bandit Problem, Peter Auer et. al., Machine Learning 47, Issue 2-3. 2002.