Dynamic Information Retrieval Tutorial
Author: marccsloan
Category: Science

Description
Dynamic aspects of Information Retrieval (IR), including changes found in data, users and systems, are increasingly being utilized in search engines and information filtering systems. Examples include large datasets containing sequential data capturing document dynamics and modern IR systems observing user dynamics through interactivity. Existing IR techniques are limited in their ability to optimize over changes, learn with a minimal computational footprint and be responsive and adaptive. The objective of this tutorial is to provide a comprehensive and up-to-date introduction to Dynamic Information Retrieval Modeling: the statistical modeling of IR systems that can adapt to change. It is a natural follow-up to previous statistical IR modeling tutorials, with a fresh look at state-of-the-art dynamic retrieval models and their applications, including session search and online advertising. The tutorial covers techniques ranging from classic relevance feedback to the latest applications of partially observable Markov decision processes (POMDPs), and presents to fellow researchers and practitioners a handful of useful algorithms and tools for solving IR problems incorporating dynamics.
http://www.dynamicirmodeling.org/

@inproceedings{Yang:2014:DIR:2600428.2602297,
  author    = {Yang, Hui and Sloan, Marc and Wang, Jun},
  title     = {Dynamic Information Retrieval Modeling},
  booktitle = {Proceedings of the 37th International ACM SIGIR Conference on Research \& Development in Information Retrieval},
  series    = {SIGIR '14},
  year      = {2014},
  isbn      = {978-1-4503-2257-7},
  location  = {Gold Coast, Queensland, Australia},
  pages     = {1290--1290},
  numpages  = {1},
  url       = {http://doi.acm.org/10.1145/2600428.2602297},
  doi       = {10.1145/2600428.2602297},
  acmid     = {2602297},
  publisher = {ACM},
  address   = {New York, NY, USA},
  keywords  = {dynamic information retrieval modeling, probabilistic relevance model, reinforcement learning},
}
Transcript of Dynamic Information Retrieval Tutorial
SIGIR Tutorial, July 7th 2014. Grace Hui Yang, Marc Sloan, Jun Wang. Guest Speaker: Emine Yilmaz. Dynamic Information Retrieval Modeling.
Dynamic Information Retrieval Modeling Tutorial 2014
Age of Empire
Dynamic Information Retrieval
Documents to explore, information need, observed documents, user. Devise a strategy for helping the user explore the information space in order to learn which documents are relevant and which aren't, and satisfy their information need.
Evolving IR
Paradigm shifts in IR as new models emerge, e.g. VSM, BM25, Language Model: different ways of defining the relationship between query and document. Static, Interactive, Dynamic: an evolution in modeling user interaction with the search engine.
Outline
Introduction: Static IR, Interactive IR, Dynamic IR. Theory and Models. Session Search. Reranking. Guest Talk: Evaluation.
Conceptual Model: Static IR
Static IR, Interactive IR, Dynamic IR. No feedback.
Characteristics of Static IR
Does not learn directly from the user; parameters are updated periodically.
Static Information Retrieval Model
Learning to Rank
Commonly Used Static IR Models
BM25, PageRank, Language Model
Feedback in IR
Outline
Introduction: Static IR, Interactive IR, Dynamic IR. Theory and Models. Session Search. Reranking. Guest Talk: Evaluation.
Conceptual Model: Interactive IR
Static IR, Interactive IR, Dynamic IR. Exploit feedback.
Interactive User Feedback
Like, dislike, pause, skip
Interactive Recommender Systems
Learn the user's taste interactively! At the same time, provide good recommendations!
Example: Multi-Page Search. Ambiguous query.
Example: Multi-Page Search. Topic: Car.
Example: Multi-Page Search. Topic: Animal.
Example: Interactive Search. Click on car webpage.
Example: Interactive Search. Click on Next Page.
Example: Interactive Search. Page 2 results: Cars.
Example: Interactive Search. Click on animal webpage.
Example: Interactive Search. Page 2 results: Animals.
Example: Dynamic Search. Topic: Guitar.
Example: Dynamic Search. Diversified page 1. Topics: Cars, animals, guitars.
Toy Example
Multi-page search scenario: a user image-searches for "jaguar", and we must rank two of the four results on each of two pages. The four images have estimated relevance probabilities 0.5, 0.51, 0.9 and 0.49.
Toy Example: Static Ranking
Ranked according to the PRP (Probability Ranking Principle). Page 1: 1. (0.9), 2. (0.51). Page 2: 1. (0.5), 2. (0.49).
Toy Example: Relevance Feedback (Interactive Search)
Improve the 2nd page based on feedback from the 1st page, using clicks as relevance feedback. The Rocchio algorithm [Rocchio '71; Baeza-Yates & Ribeiro-Neto '99] is run on the terms in the clicked image webpages:
q' = α q + β (1/|D_r|) Σ_{d ∈ D_r} d − γ (1/|D_nr|) Σ_{d ∈ D_nr} d
The new query is closer to the relevant documents and further from the non-relevant documents.
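The Rocchio update above can be sketched in a few lines; a minimal sketch assuming simple term-weight vectors, with illustrative α, β, γ values and toy documents (not the tutorial's):

```python
# Minimal Rocchio relevance-feedback sketch:
# q' = alpha*q + beta*centroid(relevant) - gamma*centroid(non-relevant).
# The alpha/beta/gamma values and toy vectors are illustrative assumptions.
from collections import Counter

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    new_q = Counter()
    for term, w in query.items():
        new_q[term] += alpha * w
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, w in doc.items():
                new_q[term] += coeff * w / len(docs)
    # clip negative weights, as is common in practice
    return {t: w for t, w in new_q.items() if w > 0}

q = {"jaguar": 1.0}
clicked = [{"jaguar": 1.0, "car": 2.0}]      # relevant feedback
skipped = [{"jaguar": 1.0, "animal": 2.0}]   # non-relevant feedback
new_q = rocchio(q, clicked, skipped)
```

The new query vector gains weight on "car" and loses "animal", pulling the page-2 ranking toward the clicked topic.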
Toy Example: Relevance Feedback
Ranked according to the PRP and Rocchio. Page 1: 1. (0.9), 2. (0.51), with a click on the car image. Page 2: 1. (0.5), 2. (0.49), reranked using the feedback.
Toy Example: Relevance Feedback
No click when searching for animals: Page 1: 1. (0.9), 2. (0.51); the page-2 ranking is left undetermined.
Toy Example: Value Function
Optimize both pages using dynamic IR. Bellman equation for the value function (simplified example):
V_t(R_t, C_t) = max_{a_t} [ r_t + V_{t+1}(R_{t+1}, C_{t+1}) ]
where R_t, C_t are the relevance and covariance of the documents for page t, c_t are the clicks on page t, and V_t is the value of the ranking on page t. Maximize the value over all pages based on estimated feedback.
Toy Example: Covariance
The covariance matrix represents similarity between the images:
1     0.8   0.1   0
0.8   1     0.1   0
0.1   0.1   1     0.95
0     0     0.95  1
Toy Example: Myopic Value
For the myopic ranking, V_2 = 16.380.
Toy Example: Myopic Ranking
The page-2 ranking stays the same regardless of clicks.
Toy Example: Optimal Value
For the optimal ranking, V_2 = 16.528.
Toy Example: Optimal Ranking
If the car is clicked, the Jaguar logo is more relevant on the next page.
Toy Example: Optimal Ranking
In all other scenarios, rank the animal first on the next page.
Interactive vs Dynamic IR
Interactive: treats interactions independently; responds to immediate feedback; static IR is used before feedback is received.
Dynamic: optimizes over the whole interaction; long-term gains; models future user feedback; also used at the beginning of the interaction.
Outline
Introduction: Static IR, Interactive IR, Dynamic IR. Theory and Models. Session Search. Reranking. Guest Talk: Evaluation.
Conceptual Model: Dynamic IR
Static IR, Interactive IR, Dynamic IR. Explore and exploit feedback.
Characteristics of Dynamic IR
Rich interactions: query formulation, document clicks, document examination, eye movements, mouse movements, etc.
Characteristics of Dynamic IR
Temporal dependency: driven by an information need I, each iteration i produces a query q_i, ranked documents D_i and clicked documents C_i, for iterations 1 through n.
Characteristics of Dynamic IR
Overall goal: optimize over all iterations for a goal (an IR metric or user satisfaction) via an optimal policy.
Dynamic IR
Dynamic IR explores actions, learns from the user and adjusts its actions. It may hurt performance in a single stage, but improves performance over all stages.
Applications to IR
Dynamics are found in many different aspects of IR. Dynamic users: users change behaviour over time; user history. Dynamic documents: information filtering; document content change. Dynamic queries: changing query definitions, e.g. Twitter. Dynamic information needs: topic ontologies evolve over time. Dynamic relevance: seasonal or time-of-day changes in relevance.
User Interactivity in DIR
Modern IR interfaces: facets, verticals. Personalization: responsive to the particular user; complex log data. Mobile: richer user interactions. Ads: adaptive targeting.
Big Data
Data set sizes are always increasing; the computational footprint of learning to rank; rich, sequential data. Example: a complex user behaviour model found in data, taking into account reading, skipping and rereading behaviours, using a POMDP [Yin He et al. '11].
Online Learning to Rank
Learning to rank iteratively on sequential data, with clicks as implicit user feedback/preference; often uses multi-armed bandit techniques. Examples: click models to interpret clicks plus a contextual bandit to improve learning [Katja Hofmann et al. '11]; pairwise comparison of rankings using a duelling bandits formulation [Yisong Yue et al. '09].
Evaluation
Use complex user interaction data to assess rankings; compare ranking techniques in online testing; minimise user dissatisfaction. Examples: modelled cursor activity and correlated it with eye tracking to validate good or bad abandonment [Jeff Huang et al. '11]; interleave search results from two ranking algorithms to determine which is better [Olivier Chapelle et al. '12].
Filtering and News
Adaptive techniques to personalize information filtering or news recommendation; understand the complex dynamics of real-world events in search logs; capture temporal document change [Dennis Fetterly et al. '03]. Examples: relevance feedback to adapt threshold sensitivity over time in information filtering, maximising overall utility [Stephen Robertson '02]; detecting patterns and memes in news cycles and modeling how information spreads [Jure Leskovec et al. '09].
Advertising
Behavioural targeting and personalized ads; learn when to display new ads; maximise profit from the available ads. Examples: a POMDP with ad correlation to find the optimal ad to display to a user [Shuai Yuan et al. '12]; a dynamic click model that can interpret complex user behaviour in logs and apply the results to tail queries and unseen ads [Zeyuan Allen Zhu et al. '10].
Outline
Introduction. Theory and Models. Session Search. Reranking. Guest Talk: Evaluation.
Outline
Introduction. Theory and Models: why not use supervised learning; Markov models. Session Search. Reranking. Evaluation.
Why not use Supervised Learning for Dynamic IR Modeling?
Lack of enough training data. Dynamic IR problems contain a sequence of dynamic interactions, e.g. a series of queries in a session, and it is rare to find repeated sequences (close to zero), even in large query logs (WSCD 2013 & 2014, query logs from Yandex). The chance of finding repeated adjacent query pairs is also low:
Dataset   | Repeated Adjacent Query Pairs | Total Adjacent Query Pairs | Repeated Percentage
WSCD 2013 | 476,390                       | 17,784,583                 | 2.68%
WSCD 2014 | 1,959,440                     | 35,376,008                 | 5.54%
Our Solution
Try to find an optimal solution through a sequence of dynamic interactions. Trial and error: learn from repeated, varied attempts which are continued until success. No supervised learning.
Trial and Error
q1 "dulles hotels", q2 "dulles airport", q3 "dulles airport location", q4 "dulles metro stop"
Recap: Characteristics of Dynamic IR
Rich interactions: query formulation, document clicks, document examination, eye movements, mouse movements, etc. Temporal dependency. Overall goal.
What is a Desirable Model for Dynamic IR?
Model interactions, which means it needs to have placeholders for actions; model the information need hidden behind user queries and other interactions; set up a reward mechanism to guide the entire search algorithm to adjust its retrieval strategies; represent Markov properties to handle the temporal dependency. A model in a trial-and-error setting will do: a Markov model will do!
Outline
Introduction. Theory and Models: why not use supervised learning; Markov models. Session Search. Reranking. Evaluation.
Markov Process
Markov property [A. A. Markov '06] (the memoryless property): for a system, its next state depends only on its current state:
Pr(S_{i+1} | S_i, …, S_0) = Pr(S_{i+1} | S_i)
Markov process: a stochastic process with the Markov property, e.g. s_0 → s_1 → … → s_i → s_{i+1}.
Family of Markov Models
Markov Chain, Hidden Markov Model, Markov Decision Process, Partially Observable Markov Decision Process, Multi-armed Bandit
Markov Chain (S, M)
A discrete-time Markov process: state S = web page, transition probability M. Example: Google PageRank [L. Page et al. '99], which measures how likely a random web surfer will land on a page:
PageRank(A) = (1 − d)/N + d Σ_{B ∈ In(A)} PageRank(B)/L(B)
where N is the number of pages, In(A) the pages linked to A, L(B) the number of outlinks of B, and d accounts for the random jump factor. The stable state distribution of such a Markov chain is PageRank.
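The random-surfer computation can be sketched with power iteration; the three-page link graph and the damping factor d = 0.85 are illustrative assumptions:

```python
# Power-iteration sketch of PageRank under the damping-factor formulation.
# The toy link graph and d = 0.85 are illustrative assumptions.
def pagerank(links, d=0.85, iters=100):
    """links: dict page -> list of pages it links to."""
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in pages}
        for p, outs in links.items():
            if not outs:                    # dangling page: spread evenly
                for q in pages:
                    new[q] += d * pr[p] / n
            else:
                for q in outs:
                    new[q] += d * pr[p] / len(outs)
        pr = new
    return pr

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
pr = pagerank(graph)
```

The scores sum to one, and page C, which receives links from both A and B, collects the most rank.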
Hidden Markov Model (S, M, O, e)
A Markov chain whose states are hidden; observable symbols are emitted with some probability according to the state [Leonard E. Baum et al. '66]. s_i: hidden state; p_i: transition probability; o_i: observation; e_i: observation (emission) probability.
An HMM Example for IR
Construct an HMM for each document [Miller et al. '99]: the hidden state s_i is either the Document or General English, the transition probabilities p_i are a_0 or a_1, the observations t_i are query terms, and the emission probabilities e_i are Pr(t|D) or Pr(t|GE). Document-to-query relevance:
P(D|q) ∝ Π_{t ∈ q} (a_0 Pr(t|GE) + a_1 Pr(t|D))
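The two-state mixture score can be sketched directly; the toy documents and the mixture weight a_1 = 0.8 are illustrative assumptions:

```python
# Sketch of the slide's two-state HMM query likelihood: each query term is
# emitted either by the Document state (weight a1) or General English (a0).
# The toy corpus and mixture weight are illustrative assumptions.
from math import log

def hmm_score(query, doc, collection, a1=0.8):
    a0 = 1.0 - a1
    score = 0.0
    for t in query:
        p_doc = doc.count(t) / len(doc)               # Pr(t|D)
        p_ge = collection.count(t) / len(collection)  # Pr(t|GE)
        score += log(a0 * p_ge + a1 * p_doc)
    return score

doc_car = "jaguar car speed jaguar".split()
doc_animal = "jaguar animal jungle jaguar".split()
collection = doc_car + doc_animal
score_car = hmm_score(["jaguar", "car"], doc_car, collection)
score_animal = hmm_score(["jaguar", "car"], doc_animal, collection)
```

The car page scores higher for the query "jaguar car": the mixture rewards documents that can emit "car" from the document state, while the General English component smooths over terms a document lacks.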
Markov Decision Process (S, M, A, R, γ)
An MDP extends a Markov chain with actions and rewards [R. Bellman '57]: s_i: state; a_i: action; r_i: reward; p_i: transition probability.
Definition of MDP
A tuple (S, M, A, R, γ):
S: state space
M: transition matrix, M_a(s, s') = P(s'|s, a)
A: action space
R: reward function, R(s, a) = immediate reward for taking action a at state s
γ: discount factor, 0 < γ ≤ 1
Policy: π(s) = the action taken at state s. The goal is to find an optimal policy π* maximizing the expected total rewards.
Policy
Policy: π(s) = a; according to it, select action a at state s. E.g. π(s0) = move right and up; π(s1) = move right and up; π(s2) = move right. [Slide altered from Carlos Guestrin's ML lecture]
Value of Policy
Value: V^π(s) = the expected long-term reward starting from s and following π. Starting from s0:
V^π(s0) = E[R(s0) + γ R(s1) + γ² R(s2) + γ³ R(s3) + γ⁴ R(s4) + …]
Future rewards are discounted by γ ∈ [0,1). [Slide altered from Carlos Guestrin's ML lecture]
Computing the Value of a Policy
V^π(s0) = E[R(s0, π(s0)) + γ R(s1, π(s1)) + γ² R(s2, π(s2)) + …]
        = R(s0, π(s0)) + γ E[Σ_{s1} M_{π(s0)}(s0, s1) V^π(s1)]
        = R(s0, π(s0)) + γ Σ_{s1} M_{π(s0)}(s0, s1) V^π(s1)
a recursive value function over the current state s0 and the possible next states s1.
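The recursion above can be turned into iterative policy evaluation; the two-state chain, the policy and γ = 0.9 are illustrative assumptions:

```python
# Iterative policy evaluation: repeatedly apply
# V(s) <- R(s, pi(s)) + gamma * sum_s' M_pi(s)(s, s') * V(s').
# The two-state MDP below is an illustrative assumption.
def evaluate_policy(states, R, M, pi, gamma=0.9, iters=500):
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: R[s][pi[s]] +
                gamma * sum(p * V[s2] for s2, p in M[s][pi[s]].items())
             for s in states}
    return V

states = ["s0", "s1"]
R = {"s0": {"go": 1.0}, "s1": {"stay": 0.0}}                  # R(s, a)
M = {"s0": {"go": {"s1": 1.0}}, "s1": {"stay": {"s1": 1.0}}}  # M_a(s, s')
V = evaluate_policy(states, R, M, {"s0": "go", "s1": "stay"})
```

Here s1 yields no further reward, so V(s1) = 0 and V(s0) = 1 + γ·0 = 1, matching the recursion by hand.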
Optimality: Bellman Equation
The Bellman equation [R. Bellman '57] for an MDP is a recursive definition of the optimal state-value function V*(·):
V*(s) = max_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V*(s') ]
Optimal policy:
π*(s) = argmax_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V*(s') ]
Optimality: Bellman Equation
The Bellman equation can be rewritten with the action-value function Q*:
V*(s) = max_a Q*(s, a)
Q*(s, a) = R(s, a) + γ Σ_{s'} M_a(s, s') V*(s')
Optimal policy: π*(s) = argmax_a Q*(s, a). This is the relationship between V and Q.
MDP Algorithms
Solve the Bellman equation for the optimal value V*(s) and optimal policy π*(s).
Model-based approaches: Value Iteration, Policy Iteration, Modified Policy Iteration, Prioritized Sweeping.
Model-free approaches: Temporal Difference (TD) Learning, Q-Learning.
[Bellman '57; Howard '60; Puterman and Shin '78; Singh & Sutton '96; Sutton & Barto '98; Richard Sutton '88; Watkins '92] [Slide altered from Carlos Guestrin's ML lecture]
Value Iteration [Bellman '57]
Initialization: initialize V_0 arbitrarily.
Iteration: V_{i+1}(s) ← max_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V_i(s') ]
π(s) ← argmax_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V_i(s') ]
Stopping criteria: π(s) is good enough.
Greedy Value Iteration [Bellman '57]
Initialization: initialize V_0 arbitrarily.
Iteration: V_{i+1}(s) ← max_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V_i(s') ]
Stopping criteria: ||V_{i+1} − V_i|| < ε
Optimal policy: π*(s) ← argmax_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V(s') ]
Greedy Value Iteration: Algorithm
1. For each state s ∈ S: initialize V_0(s) arbitrarily.
2. i ← 0
3. Repeat
   3.1 i ← i + 1
   3.2 For each s: V_i(s) ← max_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V_{i−1}(s') ]
   until ||V_i − V_{i−1}|| < ε
4. For each s: π*(s) ← argmax_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V_i(s') ]
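The algorithm above in runnable form; the two-state MDP, γ = 0.5 and the tolerance ε are illustrative assumptions:

```python
# Greedy value iteration: sweep V(s) <- max_a [R(s,a) + gamma * sum M * V]
# until the change falls below eps, then read off the greedy policy.
# The two-state MDP below is an illustrative assumption.
def value_iteration(states, actions, R, M, gamma=0.5, eps=1e-8):
    V = {s: 0.0 for s in states}
    while True:
        newV = {s: max(R[s][a] +
                       gamma * sum(p * V[s2] for s2, p in M[s][a].items())
                       for a in actions[s])
                for s in states}
        delta = max(abs(newV[s] - V[s]) for s in states)
        V = newV
        if delta < eps:
            break
    pi = {s: max(actions[s],
                 key=lambda a: R[s][a] +
                 gamma * sum(p * V[s2] for s2, p in M[s][a].items()))
          for s in states}
    return V, pi

states = ["s0", "s1"]
actions = {"s0": ["safe", "risky"], "s1": ["stay"]}
R = {"s0": {"safe": 0.5, "risky": 0.0}, "s1": {"stay": 2.0}}
M = {"s0": {"safe": {"s0": 1.0}, "risky": {"s1": 1.0}},
     "s1": {"stay": {"s1": 1.0}}}
V, pi = value_iteration(states, actions, R, M)
```

Forgoing the immediate 0.5 reward to reach the higher-reward state is optimal here (V(s1) = 2/(1 − γ) = 4, V(s0) = 0 + γ·4 = 2), mirroring the dynamic-IR idea of sacrificing short-term gain for long-term value.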
Greedy Value Iteration: V*(s) = max_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V(s') ]
Transition matrices over states {S1, S2, S3}:
M_a1 = [0.3 0.7 0; 1.0 0 0; 0.8 0.2 0]    M_a2 = [0 0 1.0; 0 0.2 0.8; 0 1.0 0]
V(0)(S1) = max{R(S1, a1), R(S1, a2)} = 6
V(0)(S2) = max{R(S2, a1), R(S2, a2)} = 4
V(0)(S3) = max{R(S3, a1), R(S3, a2)} = 8
V(1)(S1) = max{3 + 0.96 × (0.3 × 6 + 0.7 × 4), 6 + 0.96 × (1.0 × 8)}
         = max{3 + 0.96 × 4.6, 6 + 0.96 × 8.0} = max{7.416, 13.68} = 13.68
Greedy Value Iteration
i   | V(i)(S1) | V(i)(S2) | V(i)(S3)
0   | 6        | 4        | 8
1   | 13.680   | 9.760    | 13.376
2   | 18.841   | 17.133   | 20.380
3   | 25.565   | 22.087   | 25.759
200 | 168.039  | 165.316  | 168.793
Policy Iteration [Howard '60]
Initialization: π_0(s) ← an arbitrary action; i ← 0.
Iteration (over i):
Policy evaluation: V^{π_i}(s) ← R(s, π_i(s)) + γ Σ_{s'} M_{π_i(s)}(s, s') V^{π_i}(s')
Policy improvement: π_{i+1}(s) ← argmax_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V^{π_i}(s') ]
Stopping criteria: the policy stops changing.
Policy Iteration: Algorithm
1. For each state s ∈ S: initialize π_0(s) arbitrarily; i ← 0.
2. Repeat
   2.1 Repeat (policy evaluation)
       For each s: V^{π_i}(s) ← R(s, π_i(s)) + γ Σ_{s'} M_{π_i(s)}(s, s') V^{π_i}(s')
       until the change in V is below ε
   2.2 For each s (policy improvement):
       π_{i+1}(s) ← argmax_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V^{π_i}(s') ]
   2.3 i ← i + 1
   until π_i = π_{i−1}
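A runnable sketch of the loop above, with a fixed number of evaluation sweeps standing in for the inner convergence test; the two-state MDP and γ = 0.5 are illustrative assumptions:

```python
# Policy iteration sketch: alternate iterative policy evaluation with greedy
# policy improvement until the policy stops changing.
# The two-state MDP below is an illustrative assumption.
def policy_iteration(states, actions, R, M, gamma=0.5, eval_iters=200):
    pi = {s: actions[s][0] for s in states}
    while True:
        # policy evaluation (iterative, approximate)
        V = {s: 0.0 for s in states}
        for _ in range(eval_iters):
            V = {s: R[s][pi[s]] +
                    gamma * sum(p * V[s2] for s2, p in M[s][pi[s]].items())
                 for s in states}
        # policy improvement (greedy with respect to V)
        new_pi = {s: max(actions[s],
                         key=lambda a: R[s][a] +
                         gamma * sum(p * V[s2] for s2, p in M[s][a].items()))
                  for s in states}
        if new_pi == pi:
            return V, pi
        pi = new_pi

states = ["s0", "s1"]
actions = {"s0": ["safe", "risky"], "s1": ["stay"]}
R = {"s0": {"safe": 0.5, "risky": 0.0}, "s1": {"stay": 2.0}}
M = {"s0": {"safe": {"s0": 1.0}, "risky": {"s1": 1.0}},
     "s1": {"stay": {"s1": 1.0}}}
V, pi = policy_iteration(states, actions, R, M)
```

Starting from the "safe" policy, one improvement step switches s0 to "risky" and the next iteration confirms it, so the loop terminates with the same optimum as value iteration.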
Modified Policy Iteration
The policy evaluation step in policy iteration is time-consuming, especially when the state space is large. Modified policy iteration calculates an approximate policy evaluation by running just a few (k) iterations; it sits between greedy value iteration (k = 1) and policy iteration (k = ∞).
Modified Policy Iteration: Algorithm
1. For each state s ∈ S: initialize π_0(s) arbitrarily; i ← 0.
2. Repeat
   2.1 Repeat k times (approximate policy evaluation)
       For each s: V^{π_i}(s) ← R(s, π_i(s)) + γ Σ_{s'} M_{π_i(s)}(s, s') V^{π_i}(s')
   2.2 For each s: π_{i+1}(s) ← argmax_a [ R(s, a) + γ Σ_{s'} M_a(s, s') V^{π_i}(s') ]
   2.3 i ← i + 1
   until π_i = π_{i−1}
MDP Algorithms
Solve the Bellman equation for the optimal value V*(s) and optimal policy π*(s).
Model-based approaches: Value Iteration, Policy Iteration, Modified Policy Iteration, Prioritized Sweeping.
Model-free approaches: Temporal Difference (TD) Learning, Q-Learning.
[Bellman '57; Howard '60; Puterman and Shin '78; Singh & Sutton '96; Sutton & Barto '98; Richard Sutton '88; Watkins '92] [Slide altered from Carlos Guestrin's ML lecture]
Temporal Difference Learning [Richard Sutton '88; Singh & Sutton '96; Sutton & Barto '98]
Monte Carlo sampling can be used for model-free policy iteration: estimate V^π(s) in policy evaluation by the average reward of trajectories from s. However, parts of the trajectories can be reused, so we estimate by an expectation over the next state s': E[r + γ V^π(s')].
The simplest estimation: V(s) ← r + γ V(s')
A smoothed version: V(s) ← α (r + γ V(s')) + (1 − α) V(s)
TD-learning rule: V(s) ← V(s) + α (r + γ V(s') − V(s))
where r is the immediate reward, α is the learning rate, and r + γ V(s') − V(s) is the temporal difference.
Temporal Difference Learning: Algorithm
1. For each state s ∈ S: initialize V(s) arbitrarily.
2. For each episode:
   2.1 Initialize s.
   2.2 Repeat
       2.2.1 Take action a at state s according to π.
       2.2.2 Observe the immediate reward r and the next state s'.
       2.2.3 V(s) ← V(s) + α (r + γ V(s') − V(s))
       2.2.4 s ← s'
       until s is a terminal state.
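The update rule above in runnable form on recorded trajectories; the deterministic two-state chain, α and γ are illustrative assumptions:

```python
# TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)) applied along
# observed (state, reward, next_state) transitions; next_state None marks
# a terminal transition. The toy chain is an illustrative assumption.
def td0(trajectories, alpha=0.1, gamma=0.9):
    V = {}
    for trajectory in trajectories:
        for s, r, s_next in trajectory:
            v_next = V.get(s_next, 0.0) if s_next is not None else 0.0
            td_error = r + gamma * v_next - V.get(s, 0.0)
            V[s] = V.get(s, 0.0) + alpha * td_error
    return V

episode = [("s0", 0.0, "s1"), ("s1", 1.0, None)]
V = td0([episode] * 1000)   # replay the same episode many times
```

With enough replays, V(s1) approaches the terminal reward 1 and V(s0) approaches its discounted value γ·1 = 0.9, without ever using a transition matrix.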
Q-Learning [Watkins '92]
TD-learning rule: V(s) ← V(s) + α (r + γ V(s') − V(s))
Q-learning rule: Q(s, a) ← Q(s, a) + α (r + γ max_{a'} Q(s', a') − Q(s, a))
with V*(s) = max_a Q(s, a), π*(s) = argmax_a Q(s, a), and
Q*(s, a) = R(s, a) + γ Σ_{s'} M_a(s, s') max_{a'} Q*(s', a')
Q-Learning: Algorithm
1. For each state s ∈ S and a ∈ A: initialize Q_0(s, a) arbitrarily.
2. i ← 0
3. For each episode:
   3.1 Initialize s.
   3.2 Repeat
       3.2.1 i ← i + 1
       3.2.2 Select an action a at state s according to Q_{i−1}.
       3.2.3 Take action a; observe the immediate reward r and the next state s'.
       3.2.4 Q_i(s, a) ← Q_{i−1}(s, a) + α (r + γ max_{a'} Q_{i−1}(s', a') − Q_{i−1}(s, a))
       3.2.5 s ← s'
       until s is a terminal state.
4. For each s: π*(s) ← argmax_a Q(s, a)
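The algorithm above on a one-step toy problem, using ε-greedy action selection for step 3.2.2; the environment, ε, α and the seed are illustrative assumptions:

```python
# Q-learning sketch: off-policy update
# Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
# with epsilon-greedy exploration. The one-step environment is an
# illustrative assumption.
import random

def q_learning(step, start, terminal, actions, episodes=2000,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        s = start
        while s != terminal:
            if rng.random() < epsilon:                     # explore
                a = rng.choice(actions[s])
            else:                                          # exploit
                a = max(actions[s], key=lambda a: Q.get((s, a), 0.0))
            r, s2 = step(s, a)
            best_next = 0.0 if s2 == terminal else max(
                Q.get((s2, a2), 0.0) for a2 in actions[s2])
            Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
                r + gamma * best_next - Q.get((s, a), 0.0))
            s = s2
    return Q

def step(s, a):
    # "right" pays 1, "left" pays 0; both end the episode
    return (1.0, "end") if a == "right" else (0.0, "end")

Q = q_learning(step, "s0", "end", {"s0": ["left", "right"]})
```

After enough episodes the learned values separate the arms: Q(s0, "right") approaches 1 while Q(s0, "left") stays at 0, so the greedy policy picks "right".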
Apply an MDP to an IR Problem
We can model IR systems using a Markov decision process. Is there a temporal component? States: what changes with each time step? Actions: how does your system change the state? Rewards: how do you measure feedback or effectiveness in your problem at each time step? Transition probability: can you determine it? If not, then a model-free approach is more suitable.
Apply an MDP to an IR Problem: Example
A user agent in session search. States: the user's relevance judgements. Actions: new queries. Reward: information gained.
Apply an MDP to an IR Problem: Example
From the search engine's perspective: what if we can't directly observe the user's relevance judgements? Does a click imply relevance?
Family of Markov Models
Markov Chain, Hidden Markov Model, Markov Decision Process, Partially Observable Markov Decision Process, Multi-armed Bandit
POMDP Model [R. D. Smallwood et al. '73]
As in an MDP, states s_i, actions a_i and rewards r_i, but the states are hidden: the agent receives observations o_i and maintains a belief over the hidden states.
POMDP Definition
A tuple (S, M, A, R, γ, O, Θ, B):
S: state space; M: transition matrix; A: action space; R: reward function; γ: discount factor, 0 < γ ≤ 1.
O: observation set; an observation is a symbol emitted according to a hidden state.
Θ: observation function; Θ(s', a, o) is the probability that o is observed when the system transitions into state s' after taking action a, i.e. P(o|s', a).
B: belief space; a belief is a probability distribution over the hidden states.
POMDP Belief Update
The agent uses a state estimator to update its belief about the hidden states, b' = SE(b, a, o):
b'(s') = P(s' | o, a, b) = P(o | s', a, b) P(s' | a, b) / P(o | a, b)
       = Θ(s', a, o) Σ_s M_a(s, s') b(s) / P(o | a, b)
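The state-estimator formula above in runnable form; the two-state relevance example and the click probabilities are illustrative assumptions:

```python
# Belief update following the state-estimator formula:
# b'(s') proportional to Theta(s', a, o) * sum_s M_a(s, s') * b(s),
# then normalised by P(o | a, b). The two-state numbers are illustrative.
def update_belief(b, a, o, M, Theta):
    """b: dict state -> prob; M[a][s][s2]; Theta[a][s2][o] = P(o | s2, a)."""
    unnorm = {}
    for s2 in b:
        pred = sum(M[a][s][s2] * b[s] for s in b)   # prediction step
        unnorm[s2] = Theta[a][s2][o] * pred          # correction step
    z = sum(unnorm.values())                         # = P(o | a, b)
    return {s2: v / z for s2, v in unnorm.items()}

b = {"rel": 0.5, "nonrel": 0.5}
M = {"rank": {"rel": {"rel": 1.0, "nonrel": 0.0},
              "nonrel": {"rel": 0.0, "nonrel": 1.0}}}
Theta = {"rank": {"rel": {"click": 0.8, "skip": 0.2},
                  "nonrel": {"click": 0.2, "skip": 0.8}}}
b_new = update_belief(b, "rank", "click", M, Theta)
```

Starting from a uniform belief, observing a click raises the belief in the "relevant" state from 0.5 to 0.8 under these assumed click probabilities.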
POMDP Bellman Equation
The Bellman equation for a POMDP:
V*(b) = max_a [ r(b, a) + γ Σ_{b'} τ(b, a, b') V*(b') ]
A POMDP can be transformed into a continuous belief MDP (B, τ, A, r, γ):
B: the continuous belief space.
τ: transition function, τ(b, a, b') = Σ_o P(b' | b, a, o) Pr(o | a, b), where P(b' | b, a, o) = 1 if SE(b, a, o) = b', and 0 otherwise.
A: action space.
r: reward function, r(b, a) = Σ_s b(s) R(s, a).
Solving POMDPs: The Witness Algorithm [L. Kaelbling et al. '98]
The optimal policy of a POMDP is the optimal policy of its belief MDP. The witness algorithm is a variation of the value iteration algorithm.
Policy Tree
A policy tree of depth i is an i-step non-stationary policy: the root specifies the action with i steps to go, each observation o_k leads into a subtree for i−1 steps to go, and so on down to 1 step to go. It is as if we run value iteration until the i-th iteration.
Value of a Policy Tree
We can only determine the value of a policy tree h from some belief state b, because the agent never knows the exact state:
V_h(b) = Σ_s b(s) V_h(s)
V_h(s) = R(s, a(h)) + γ Σ_{s'} M_{a(h)}(s, s') Σ_{o_k} Θ(s', a(h), o_k) V_{o_k(h)}(s')
where a(h) is the action at the root node of h and o_k(h) is the (i−1)-step subtree associated with o_k under the root node of h.
Idea of the Witness Algorithm
For each action a, compute Q_a^i, the set of candidate i-step policy trees with action a at their roots. The optimal value function at the i-th step, V_i(b), is the upper surface of the value functions of all i-step policy trees.
Optimal Value Function
Geometrically, V_i(b) = max_{h ∈ H} V_h(b) is piecewise linear and convex. In an example for a two-state POMDP, the belief space is one-dimensional under the simplex constraint b(s1) + b(s2) = 1, and the upper surface over the value lines V_h1(b), …, V_h5(b) allows pruning the set of policy trees.
Outline of the Witness Algorithm
1. Γ_1 ← {}
2. i ← 1
3. Repeat
   3.1 i ← i + 1
   3.2 For each a in A: Q_a ← witness(Γ_{i−1}, a)  (the inner loop)
   3.3 Prune the union of the Q_a to get Γ_i
   until ||V_i(b) − V_{i−1}(b)|| < ε
Inner Loop of the Witness Algorithm
1. Select a belief b arbitrarily; generate a best i-step policy tree h for b; add h to an agenda.
2. In each iteration:
   2.1 Select a policy tree h from the agenda.
   2.2 Look for a witness point b using Q_a and h.
   2.3 If such a witness point b is found:
       2.3.1 Calculate the best policy tree h' for b.
       2.3.2 Add h' to Q_a.
       2.3.3 Add all the alternative trees of h' to the agenda.
   2.4 Else remove h from the agenda.
3. Repeat the above iteration until the agenda is empty.
Other Solutions
QMDP [Thrun et al. '06], MC-POMDP (Monte Carlo POMDP) [Thrun et al. '05], Grid-Based Approximation [Lovejoy '91], Belief Compression [Roy '03]
Applying POMDP to Dynamic IR
POMDP → Dynamic IR:
Environment: documents.
Agents: user, search engine.
States: queries, the user's decision-making status, relevance of documents, etc.
Actions: provide a ranking of documents; weigh terms in the query; add/remove/keep query terms; switch a search technology on or off; adjust parameters for a search technology.
Observations: queries, clicks, document lists, snippets, terms, etc.
Rewards: evaluation measures (such as DCG, NDCG or MAP); clicking information.
Transition matrix: given in advance or estimated from training data.
Observation function: problem dependent; estimated from sample datasets.
Session Search Example: States [J. Luo et al. '14]
Four states: S_RT (Relevant & Exploitation), S_RR (Relevant & Exploration), S_NRT (Non-Relevant & Exploitation), S_NRR (Non-Relevant & Exploration). Example session starting from q0: "scooter price", "scooter stores Hartford", "visitors Hartford", "Connecticut tourism", "Philadelphia NYC travel", "Philadelphia NYC train", "distance New York Boston", maps.bing.com.
Session Search Example: Actions (A_u, A_se) [J. Luo et al. '14]
User actions (A_u): add query terms (+q); remove query terms (−q); keep query terms (q_theme); clicked documents; SAT-clicked documents.
Search engine actions (A_se): increase/decrease/keep term weights; switch query expansion on or off; adjust the number of top documents used in PRF; etc.
Multi-Page Search Example: States & Actions [Xiaoran Jin et al. '13]
State: relevance of documents. Action: ranking of documents. Observation: clicks. Belief: multivariate Gaussian. Reward: DCG over 2 pages.
SIGIR Tutorial, July 7th 2014. Grace Hui Yang, Marc Sloan, Jun Wang. Guest Speaker: Emine Yilmaz. Dynamic Information Retrieval Modeling: Exercise.
Family of Markov Models
Markov Chain, Hidden Markov Model, Markov Decision Process, Partially Observable Markov Decision Process, Multi-Armed Bandit
Multi-Armed Bandits (MAB)
Which slot machine should I select in this round? Reward.
Multi-Armed Bandits (MAB)
I won! Is this the best slot machine? Reward.
MAB Definition
A tuple (S, A, R, B):
S: the hidden reward distribution of each bandit.
A: choose which bandit to play.
R: the reward for playing a bandit.
B: belief space, our estimate of each bandit's distribution.
Comparison with Markov Models
A single-state Markov decision process with no transition probability. Similar to a POMDP in that we maintain a belief state. An action (choosing a bandit) does not affect the state. It does not plan ahead but intelligently adapts: somewhere between interactive and dynamic IR.
Markov Multi-Armed Bandits
Each bandit is its own Markov process (Markov Process 1, Markov Process 2, …, Markov Process k); playing a bandit is an action on that process. Which slot machine should I select in this round? Reward.
MAB Policy Reward
An MAB algorithm describes a policy for choosing bandits, maximising the rewards from the chosen bandits over all time steps, i.e. minimizing the regret
Regret = Σ_{t=1}^{T} (r* − r(t))
the cumulative difference between the optimal reward and the actual reward.
Exploration vs Exploitation
Exploration: try out bandits to find which has the highest average reward; too much exploration leads to poor performance. Exploitation: play bandits that are known to pay out a higher reward on average. MAB algorithms balance exploration and exploitation: start by exploring more to find the best bandits, then exploit more as the best bandits become known.
Exploration vs Exploitation
MAB Index Algorithms
- Gittins index [1]: play the bandit with the highest Dynamic Allocation Index; modelled using an MDP but suffers the curse of dimensionality
- ε-greedy [2]: play the highest-reward bandit with probability 1 − ε, a random bandit with probability ε
- UCB (Upper Confidence Bound) [3]: play the bandit j with the highest x̄_j + sqrt(2 ln n / n_j); the chance of playing infrequently played bandits increases over time
[1] J. C. Gittins, 1989  [2] Nicolò Cesa-Bianchi et al., 1998  [3] P. Auer et al., 2002
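As a concrete (non-tutorial) sketch, the ε-greedy and UCB1 policies above fit in a few lines; the Bernoulli payout probabilities and horizon below are illustrative assumptions, not from the slides:

```python
import math
import random

def ucb1(counts, rewards, t):
    """UCB1: play the arm maximizing mean reward + sqrt(2 ln t / n_j)."""
    for j, n in enumerate(counts):
        if n == 0:  # play every arm once before trusting the bound
            return j
    return max(range(len(counts)),
               key=lambda j: rewards[j] / counts[j] + math.sqrt(2 * math.log(t) / counts[j]))

def epsilon_greedy(counts, rewards, rng, epsilon=0.1):
    """Play the best observed arm with prob. 1 - epsilon, a random arm otherwise."""
    if rng.random() < epsilon or not any(counts):
        return rng.randrange(len(counts))
    return max(range(len(counts)),
               key=lambda j: rewards[j] / counts[j] if counts[j] else 0.0)

def run_ucb1(probs, horizon=5000, seed=0):
    """Simulate Bernoulli bandits with hidden payout probabilities `probs`."""
    rng = random.Random(seed)
    counts, rewards = [0] * len(probs), [0.0] * len(probs)
    for t in range(1, horizon + 1):
        arm = ucb1(counts, rewards, t)
        counts[arm] += 1
        rewards[arm] += 1.0 if rng.random() < probs[arm] else 0.0
    return counts

counts = run_ucb1([0.2, 0.5, 0.8])
most_played = counts.index(max(counts))  # UCB1 concentrates plays on the best arm
```

After 5000 rounds the highest-payout arm (index 2) dominates the play counts, illustrating the shift from exploration to exploitation.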
MAB use in IR
- Choosing ads to display to users [1]: each ad is a bandit; the user click-through rate is the reward
- Recommending news articles [2]: each news article is a bandit; similar to the information filtering case
- Diversifying search results [3]: each rank position is an MAB dependent on higher ranks; documents are the bandits chosen by each rank
[1] Deepayan Chakrabarti et al., 2009  [2] Lihong Li et al., 2010  [3] Radlinski et al., 2008
MAB Variations
- Contextual bandits [1]: the world has some context (e.g. user location); learn a policy that maps context to arms (online or offline)
- Duelling bandits [2]: play two (or more) bandits at each time step; observe relative rather than absolute reward; learn an ordering of the bandits
- Mortal bandits [3]: the value of bandits decays over time, so exploitation outweighs exploration
[1] Lihong Li et al., 2010  [2] Yisong Yue et al., 2009  [3] Deepayan Chakrabarti et al., 2009
Comparison of Markov Models
- MC: a fully observable stochastic process
- HMM: a partially observable stochastic process
- MDP: a fully observable decision process
- MAB: a decision process, either fully or partially observable
- POMDP: a partially observable decision process

Model  | Actions | Rewards | States
MC     | No      | No      | Observable
HMM    | No      | No      | Unobservable
MDP    | Yes     | Yes     | Observable
POMDP  | Yes     | Yes     | Unobservable
MAB    | Yes     | Yes     | Fixed
SIGIR Tutorial, July 7th 2014
Grace Hui Yang, Marc Sloan, Jun Wang
Guest Speaker: Emine Yilmaz
Dynamic Information Retrieval Modeling
Exercise
Outline
- Introduction
- Theory and Models
- Session Search
- Re-ranking
- Guest Talk: Evaluation
TREC Session Tracks (2010-2012)
- Given a series of queries {q_1, q_2, ..., q_n}, the top-10 retrieval results {D_1, ..., D_{n−1}} for q_1 through q_{n−1}, and click information
- The task is to retrieve a list of documents for the current/last query, q_n
- Relevance judgment is based on how relevant the documents are for q_n, and how relevant they are to the information need of the entire session (given in the topic description)
- There is no need to segment the sessions
TREC 2012 Session 6
1. pocono mountains pennsylvania
2. pocono mountains pennsylvania hotels
3. pocono mountains pennsylvania things to do
4. pocono mountains pennsylvania hotels
5. pocono mountains camelbeach
6. pocono mountains camelbeach hotel
7. pocono mountains chateau resort
8. pocono mountains chateau resort attractions
9. pocono mountains chateau resort getting to
10. chateau resort getting to
11. pocono mountains chateau resort directions
Information need: You are planning a winter vacation to the Pocono Mountains region in Pennsylvania in the US. Where will you stay? What will you do while there? How will you get there?
In a session, queries change constantly.
Query change is an important form of feedback
We define query change Δq_i = q_i − q_{i−1} as the syntactic editing change between two adjacent queries. It includes the added terms +Δq_i and the removed terms −Δq_i; the unchanged/shared terms are called theme terms, q_theme.
Example: q1 = "bollywood legislation", q2 = "bollywood law"
- Theme term = bollywood
- Added (+Δq) = law
- Removed (−Δq) = legislation
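The decomposition above is straightforward to sketch in code (the function name is hypothetical; a simple whitespace tokenizer is assumed):

```python
def query_change(q_prev, q_curr):
    """Split two adjacent session queries into theme (shared), added (+dq)
    and removed (-dq) terms, preserving term order."""
    prev, curr = q_prev.lower().split(), q_curr.lower().split()
    theme = [t for t in curr if t in prev]
    added = [t for t in curr if t not in prev]
    removed = [t for t in prev if t not in curr]
    return theme, added, removed

# The slide's example: q1 = "bollywood legislation", q2 = "bollywood law"
theme, added, removed = query_change("bollywood legislation", "bollywood law")
```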
Where do these query changes come from?
Given the TREC Session settings, we consider two sources of query change:
- the previous search results that the user viewed/read/examined
- the information need
Example: "Kurosawa" → "Kurosawa wife". The term "wife" is not in any previous result, but it is in the topic description. However, knowing the information need before the search is difficult to achieve.
Previous search results can influence query change in quite complex ways
Example: "Merck lobbyists" → "Merck lobbying US policy"
D1 contains several mentions of "policy", such as "A lobbyist who until 2004 worked as senior policy advisor to Canadian Prime Minister Stephen Harper was hired last month by Merck". These mentions are about Canadian policies, while the user adds "US policy" in q2. Our guess is that the user might be inspired by "policy" but prefers a sub-concept other than "Canadian policy".
Therefore, among the added terms "US policy", "US" is the novel term and "policy" is not, since it appeared in D1. The two terms should be treated differently.
Applying MDP to Session Search
We propose to model session search as a Markov decision process (MDP) with two agents: the user and the search engine.
- Environment: search results
- States: queries
- User actions: add/remove/keep query terms
- Search engine actions: increase/decrease/maintain term weights
Search Engine Agent's Actions

Term     | In D_{i−1}? | Action    | Example
q_theme  | Y           | increase  | "pocono mountain" in s6
q_theme  | N           | increase  | "france world cup 98 reaction" in s28: france world cup 98 reaction → stock market france world cup 98 reaction
+Δq      | Y           | decrease  | "policy" in s37: Merck lobbyists → Merck lobbyists US policy
+Δq      | N           | increase  | "US" in s37: Merck lobbyists → Merck lobbyists US policy
−Δq      | Y           | decrease  | "reaction" in s28: france world cup 98 reaction → france world cup 98
−Δq      | N           | no change | "legislation" in s32: bollywood legislation → bollywood law
Query Change retrieval Model (QCM)
The Bellman equation gives the optimal value for an MDP:

V*(s) = max_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V*(s') ]

The reward function serves as the document relevance score and is derived backwards from the Bellman equation:

Score(q_i, d) = P(q_i|d) + γ max_a P(q_i | q_{i−1}, D_{i−1}, a) max_{D_{i−1}} P(q_{i−1} | D_{i−1})

where P(q_i|d) is the current reward/relevance score, P(q_i | q_{i−1}, D_{i−1}, a) is the query transition model, and max_{D_{i−1}} P(q_{i−1} | D_{i−1}) is the maximum past relevance.
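The Bellman equation itself can be solved by value iteration. The tiny two-state MDP below is purely illustrative (hypothetical states, rewards, and transitions; it is not the QCM reward model):

```python
def value_iteration(states, actions, R, P, gamma=0.9, iters=100):
    """Repeatedly apply V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        # synchronous sweep: the comprehension reads the previous V
        V = {s: max(R[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in states)
                    for a in actions)
             for s in states}
    return V

# Hypothetical 2-state MDP: staying in "good" pays 1 per step, "bad" pays 0.
states, actions = ["good", "bad"], ["stay"]
R = {"good": {"stay": 1.0}, "bad": {"stay": 0.0}}
P = {"good": {"stay": {"good": 1.0, "bad": 0.0}},
     "bad": {"stay": {"good": 0.0, "bad": 1.0}}}
V = value_iteration(states, actions, R, P)
# V["good"] converges to 1 / (1 - gamma) = 10
```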
Calculating the Transition Model
According to the query change and the search engine actions, the score decomposes as:

Score(q_i, d) = log P(q_i|d)                                      (current reward/relevance score)
 + α Σ_{t ∈ q_theme} [1 − P(t|d*_{i−1})] log P(t|d)               (increase weights for theme terms)
 − β Σ_{t ∈ +Δq, t ∈ d*_{i−1}} P(t|d*_{i−1}) log P(t|d)          (decrease weights for old added terms)
 + ε Σ_{t ∈ +Δq, t ∉ d*_{i−1}} idf(t) log P(t|d)                 (increase weights for novel added terms)
 − δ Σ_{t ∈ −Δq} P(t|d*_{i−1}) log P(t|d)                        (decrease weights for removed terms)
Maximizing the Reward Function
Generate the maximum-rewarded document, denoted d*_{i−1}, from D_{i−1}: the document most relevant to q_{i−1}. Its relevance score can be calculated as

P(q_{i−1} | d_{i−1}) = Π_{t ∈ q_{i−1}} P(t | d_{i−1}),   P(t | d_{i−1}) = #(t, d_{i−1}) / |d_{i−1}|

From several options, we choose to use only the document with top relevance: max_{D_{i−1}} P(q_{i−1} | D_{i−1}).
Scoring the Entire Session
The overall relevance score for a session of queries is aggregated recursively:

Score_session(q_n, d) = Score(q_n, d) + γ Score_session(q_{n−1}, d)
                      = Score(q_n, d) + γ [Score(q_{n−1}, d) + γ Score_session(q_{n−2}, d)]
                      = Σ_{i=1}^{n} γ^{n−i} Score(q_i, d)
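Given precomputed per-query scores Score(q_i, d), the recursion and the closed form agree; the default γ = 0.92 below is only an illustrative value, not one from the slides:

```python
def session_score(scores, gamma=0.92):
    """Closed form: sum_i gamma^(n-i) * Score(q_i, d); later queries weigh more."""
    n = len(scores)
    return sum(gamma ** (n - i) * s for i, s in enumerate(scores, start=1))

def session_score_recursive(scores, gamma=0.92):
    """The slide's recursion: Score_session(q_n) = Score(q_n) + gamma * Score_session(q_{n-1})."""
    total = 0.0
    for s in scores:  # oldest query first
        total = s + gamma * total
    return total
```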
Experiments
TREC 2011-2012 query sets; dataset: ClueWeb09 Category B
Search Accuracy (TREC 2012), nDCG@10 (official metric used in TREC)

Approach               | nDCG@10 | %chg    | MAP    | %chg
Lemur                  | 0.2474  | -21.54% | 0.1274 | -18.28%
TREC'12 median         | 0.2608  | -17.29% | 0.1440 | -7.63%
Our TREC'12 submission | 0.3021  | -4.19%  | 0.1490 | -4.43%
TREC'12 best           | 0.3221  | 0.00%   | 0.1559 | 0.00%
QCM                    | 0.3353  | +4.10%  | 0.1529 | -1.92%
QCM+Dup                | 0.3368  | +4.56%  | 0.1537 | -1.41%
Search Accuracy (TREC 2011), nDCG@10 (official metric used in TREC)

Approach               | nDCG@10 | %chg    | MAP    | %chg
Lemur                  | 0.3378  | -23.38% | 0.1118 | -25.86%
TREC'11 median         | 0.3544  | -19.62% | 0.1143 | -24.20%
TREC'11 best           | 0.4409  | 0.00%   | 0.1508 | 0.00%
QCM                    | 0.4728  | +7.24%  | 0.1713 | +13.59%
QCM+Dup                | 0.4821  | +9.34%  | 0.1714 | +13.66%
Our TREC'12 submission | 0.4836  | +9.68%  | 0.1724 | +14.32%
Search Accuracy for Different Session Types (TREC 2012)
Sessions are classified by product (factual / intellectual) and goal quality (specific / amorphous).

Approach  | Intellectual | %chg    | Amorphous | %chg   | Specific | %chg   | Factual | %chg
TREC best | 0.3369       | 0.00%   | 0.3495    | 0.00%  | 0.3007   | 0.00%  | 0.3138  | 0.00%
Nugget    | 0.3305       | -1.90%  | 0.3397    | -2.80% | 0.2736   | -9.01% | 0.2871  | -8.51%
QCM       | 0.3870       | +14.87% | 0.3689    | +5.55% | 0.3091   | +2.79% | 0.3066  | -2.29%
QCM+DUP   | 0.3900       | +15.76% | 0.3692    | +5.64% | 0.3114   | +3.56% | 0.3072  | -2.10%

QCM better handles sessions that demonstrate evolution and exploration, because it treats a session as a continuous process, studying the changes across query transitions and modeling the dynamics.
Outline
- Introduction
- Theory and Models
- Session Search
- Re-ranking
- Guest Talk: Evaluation
Multi-Page Search
[Figure: a user examines result page 1, then continues to page 2]
Relevance Feedback
- No UI changes; the interactivity is hidden
- Private, performed in the browser
- Page 1: diverse ranking to maximise learning potential (exploration vs exploitation)
- Page 2: clickthroughs or explicit ratings; respond to the feedback from page 1; personalized
Model
- Prior: relevance scores θ ~ N(θ̂_1, Σ_1), where θ̂_1 is the prior estimate of relevance and Σ_1 the prior estimate of covariance (document similarity, topic clustering)
- Choose a rank action for page 1
- Observe feedback from page 1: y ~ N(θ̂_1, Σ_1)
- Update the estimates using the feedback y_s on the shown documents s (Gaussian conditioning):

  θ̂_2 = θ̂_1 + Σ_1^{·s} (Σ_1^{ss})^{−1} (y_s − θ̂_1^{s})
  Σ_2 = Σ_1 − Σ_1^{·s} (Σ_1^{ss})^{−1} Σ_1^{s·}

- Rank page 2 using the Probability Ranking Principle (PRP)
- Utility of a ranking (DCG over the two pages of M results each):

  U = Σ_{r=1}^{M} θ_r / log_2(r+1) + γ Σ_{r=M+1}^{2M} θ_r / log_2(r+1)

Bellman equation: optimize the page-1 ranking a_1 to improve page 2:

  V(θ̂_1, Σ_1) = max_{a_1} [ θ̂_1 · U(a_1) + γ E max_{a_2} θ̂_2(a_1) · U(a_2) ]
The discount parameter γ balances exploration and exploitation in page 1. It can be tuned for different query types (navigational vs informational) and is set to 1 for non-ambiguous search.

Approximation
The expectation in the Bellman equation is approximated by Monte Carlo sampling:

  max_{a_1} [ θ̂_1 · U(a_1) + γ (1/N) Σ_{n=1}^{N} max_{a_2} θ̂_2^{(n)} · U(a_2) ]

together with a sequential ranking decision.
Experiment Data
Difficult to evaluate without access to live users, so users were simulated using 3 TREC collections and their relevance judgements:
- WT10G: explicit ratings
- TREC-8: clickthroughs
- Robust: difficult (ambiguous) search

User Simulation
- Rank M documents; the simulated user clicks according to the relevance judgements
- Update the page-2 ranking
- Measure at pages 1 and 2: recall, precision, nDCG, MRR
- BM25 is the prior ranking model
Investigating the trade-off parameter [figures omitted]
Baselines
Parameters determined experimentally:
- BM25
- BM25 with conditional update
- Maximum Marginal Relevance (MMR) diversification
- MMR with conditional update
- Rocchio relevance feedback
Results
- Similar results across data sets and metrics
- The 2nd-page gain outweighs the 1st-page losses
- Outperformed Maximum Marginal Relevance when using MRR to measure diversity
- BM25 with conditional update (BM25U) is simply the no-exploration case
- Similar results when M = 5
Outline
- Introduction
- Theory and Models
- Session Search
- Re-ranking
- Guest Talk: Evaluation
Dynamic Information Retrieval Evaluation
Emine Yilmaz, University College London
 Information Retrieval Systems Match information seekers with the information they seek
 Retrieval Evaluation: Traditional View
 Retrieval Evaluation: Dynamic View
Different Approaches to Evaluation
Online evaluation:
- Design interactive experiments
- Use users' actions to evaluate quality
- Inherently dynamic in nature
Offline evaluation:
- Controlled laboratory experiments
- The user's interaction with the engine is only simulated
- Recent work has focused on dynamic IR evaluation
Online Evaluation
Standard click metrics:
- Clickthrough rate
- Probability that a user skips over results they have considered (pSkip)
Most recently: result interleaving (click / no-click evaluation)
What is result interleaving?
A way to compare rankers online: given the two rankings produced by two methods, present a combination of the rankings to users.
Team Draft Interleaving (Radlinski et al., 2008)
Input: two rankings (which can be seen as teams who pick players)
Repeat:
- Toss a coin to see which team (ranking) picks next
- The winner picks its best remaining player (document)
- The loser picks its best remaining player (document)
Output: one ranking (two teams of 5)
Credit assignment: the ranking providing more of the clicked results wins.
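A runnable sketch of the loop above (function names and the seeding are assumptions for reproducibility, not part of the original algorithm statement):

```python
import random

def team_draft_interleave(ranking_a, ranking_b, seed=None):
    """Team Draft Interleaving: each round a coin toss decides which team picks
    first; each team then adds its best remaining document. Returns the combined
    ranking and the team ('A' or 'B') credited with each slot."""
    rng = random.Random(seed)
    combined, teams = [], []
    remaining = lambda r: [d for d in r if d not in combined]
    while remaining(ranking_a) or remaining(ranking_b):
        order = ("A", "B") if rng.random() < 0.5 else ("B", "A")
        for team in order:
            picks = remaining(ranking_a if team == "A" else ranking_b)
            if picks:
                combined.append(picks[0])
                teams.append(team)
    return combined, teams

def interleaving_winner(teams, clicked_slots):
    """Credit assignment: the team owning more of the clicked slots wins."""
    a = sum(1 for i in clicked_slots if teams[i] == "A")
    b = sum(1 for i in clicked_slots if teams[i] == "B")
    return "A" if a > b else "B" if b > a else "tie"
```

In a live system the winner is tallied over many queries, not a single impression.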
Team Draft Interleaving example
Ranking A:
1. Napa Valley - The authority for lodging... (www.napavalley.com)
2. Napa Valley Wineries - Plan your wine... (www.napavalley.com/wineries)
3. Napa Valley College (www.napavalley.edu/homex.asp)
4. Been There - Tips - Napa Valley (www.ivebeenthere.co.uk/tips/16681)
5. Napa Valley Wineries and Wine (www.napavintners.com)
6. Napa Country, California - Wikipedia (en.wikipedia.org/wiki/Napa_Valley)
Ranking B:
1. Napa Country, California - Wikipedia (en.wikipedia.org/wiki/Napa_Valley)
2. Napa Valley - The authority for lodging... (www.napavalley.com)
3. Napa: The Story of an American Eden... (books.google.co.uk/books?isbn=...)
4. Napa Valley Hotels - Bed and Breakfast... (www.napalinks.com)
5. NapaValley.org (www.napavalley.org)
6. The Napa Valley Marathon (www.napavalleymarathon.org)
Presented Ranking:
1. Napa Valley - The authority for lodging... (www.napavalley.com)
2. Napa Country, California - Wikipedia (en.wikipedia.org/wiki/Napa_Valley)
3. Napa: The Story of an American Eden... (books.google.co.uk/books?isbn=...)
4. Napa Valley Wineries - Plan your wine... (www.napavalley.com/wineries)
5. Napa Valley Hotels - Bed and Breakfast... (www.napalinks.com)
6. Napa Valley College (www.napavalley.edu/homex.asp)
7. NapaValley.org (www.napavalley.org)
After observing the user's clicks on the presented ranking: B wins! Repeat over many different queries.
Offline Evaluation
- Controlled laboratory experiments; the user's interaction with the engine is only simulated
- Ask experts to judge each query result
- Predict how users behave when they search
- Aggregate the judgments to evaluate

Until recently, metrics assumed that the user's information need was not affected by the documents read (e.g. Average Precision, nDCG), even though users are more likely to stop searching when they see a highly relevant document.
Lately: metrics that incorporate the effect of the relevance of seen documents on user behavior, based on devising more realistic user models: EBU, ERR [Yilmaz et al. CIKM '10, Chapelle et al. CIKM '09].
Modeling User Behavior
Cascade-based models (example query: "black powder ammunition"):
- The user views search results from top to bottom
- At each rank i, the user has a certain probability of being satisfied, proportional to the relevance grade of the document at rank i
- Once the user is satisfied with a document, they terminate the search
Rank-Biased Precision
At each result, the user either stops or views the next item (with persistence probability p).

Total utility = Σ_{i=1}^{∞} rel_i p^{i−1}
Expected number of documents examined = Σ_{i=1}^{∞} p^{i−1} = 1/(1−p)
RBP = total utility / expected number of documents examined = (1−p) Σ_{i=1}^{∞} rel_i p^{i−1}
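The RBP formula in code, truncated at the length of the judged list (the default persistence p = 0.8 is a commonly used illustrative value):

```python
def rbp(rels, p=0.8):
    """Rank-Biased Precision: the user moves from rank i to rank i+1 with
    persistence p, so RBP = (1 - p) * sum_i rel_i * p^(i-1)."""
    # enumerate starts at 0, which supplies the p^(i-1) exponent for rank i
    return (1 - p) * sum(rel * p ** i for i, rel in enumerate(rels))
```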
Expected Reciprocal Rank [Chapelle et al. CIKM '09]
The user views an item, judges whether it is relevant (no / somewhat / highly), and either stops or views the next item.

ERR = Σ_{r=1}^{n} (1/r) P(user stops at position r) = Σ_{r=1}^{n} (1/r) R_r Π_{i=1}^{r−1} (1 − R_i)

where φ(r) = 1/r is the utility of finding "the perfect" document at rank r, R_i = (2^{g_i} − 1) / 2^{g_max} is the probability of stopping at (being satisfied by) document i, and g_i is the relevance grade of the i-th document.
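The ERR formula in code; g_max = 4 below assumes five relevance grades (0-4), which is an illustrative choice:

```python
def err(grades, g_max=4):
    """Expected Reciprocal Rank: R_i = (2^g_i - 1) / 2^g_max is the chance the
    user is satisfied at rank i; ERR = sum_r (1/r) * R_r * prod_{i<r} (1 - R_i)."""
    score, p_reach = 0.0, 1.0  # p_reach: prob. the user reaches rank r unsatisfied
    for r, g in enumerate(grades, start=1):
        R = (2 ** g - 1) / 2 ** g_max
        score += p_reach * R / r
        p_reach *= 1 - R
    return score
```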
Session Evaluation
Example session: "Paris Luxurious Hotels" → "Paris Hilton" → "J Lo"
 What is a good system?
Measuring goodness: the user steps down a ranked list of documents, observing each one until a decision point, then either (a) abandons the search or (b) reformulates. While stepping down or sideways, the user accumulates utility.
Evaluation over a single ranked list
Example session queries: "kenya cooking" → "traditional swahili kenya cooking" → "traditional kenya swahili" → "traditional food recipes"
Session DCG [Järvelin et al. ECIR 2008]
Example: the session "kenya cooking" → "traditional swahili kenya cooking" produces ranked lists RL1 and RL2, each scored by

DCG(RL) = Σ_{r=1}^{k} (2^{rel(r)} − 1) / log_b(r + b − 1)

and the session score discounts later reformulations:

sDCG = (1 / log_c(1 + c − 1)) DCG(RL1) + (1 / log_c(2 + c − 1)) DCG(RL2) + ...
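A sketch of the formula as written on the slide; the bases b = 2 and c = 4 are illustrative defaults, not prescribed values:

```python
import math

def dcg(rels, b=2):
    """Per-query DCG with a log-b discount: sum_r (2^rel(r) - 1) / log_b(r + b - 1).
    At r = 1 the denominator is log_b(b) = 1, so no special case is needed."""
    return sum((2 ** rel - 1) / math.log(r + b - 1, b)
               for r, rel in enumerate(rels, start=1))

def sdcg(session_rels, b=2, c=4):
    """Session DCG: the i-th reformulation's DCG is discounted by 1 / log_c(i + c - 1)."""
    return sum(dcg(rels, b) / math.log(i + c - 1, c)
               for i, rels in enumerate(session_rels, start=1))
```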
Model-based measures
A probabilistic space of users following different paths: Σ is the space of all paths, P(σ) is the probability of a user following path σ ∈ Σ, and M(σ) is a measure over a path. [Yang and Lad ICTIR 2009, Kanoulas et al. SIGIR 2011]

Probability of a path (illustrated over queries Q1, Q2, Q3): e.g. the probability of abandoning at reformulation 2 times the probability of reformulating at rank 3.
Expected Global Utility [Yang and Lad ICTIR 2009]
1. The user steps down the ranked results one by one
2. The user stops browsing documents based on a stochastic process that defines a stopping-probability distribution over ranks, and reformulates
3. The user gains something from relevant documents, accumulating utility

The probability of abandoning the session at reformulation i is geometric with parameter p_reform; the probability of reformulating at rank j is geometric with parameter p_down.

The probability of a user following a path σ is P(σ) = P(r_1, r_2, ..., r_K), where r_i is the stopping and reformulation point in list i. Assuming the stopping positions in each list are independent, P(r_1, r_2, ..., r_K) = P(r_1) P(r_2) ... P(r_K). A geometric distribution (as in RBP) models the stopping and reformulation behaviour: P(r_i = r) = p (1 − p)^{r−1}.
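A toy sketch of the expectation over paths, under strong simplifications: relevance is used directly as utility, the geometric distribution is truncated at each list's length without renormalizing, and p_down = 0.5 is illustrative:

```python
import itertools

def path_prob(stops, p_down=0.5):
    """P(r_1, ..., r_K) = prod_i p_down * (1 - p_down)^(r_i - 1),
    i.e. independent geometric stopping positions."""
    prob = 1.0
    for r in stops:
        prob *= p_down * (1 - p_down) ** (r - 1)
    return prob

def expected_global_utility(session_rels, p_down=0.5):
    """Enumerate every stopping-rank combination (path) across the session's
    ranked lists and weight the accumulated relevance by the path probability."""
    ranks = [range(1, len(rels) + 1) for rels in session_rels]
    egu = 0.0
    for stops in itertools.product(*ranks):
        # utility accumulated down to the stopping rank in each list
        utility = sum(sum(rels[:r]) for rels, r in zip(session_rels, stops))
        egu += path_prob(stops, p_down) * utility
    return egu
```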
Conclusions
Recent focus on evaluating the dynamic nature of the search process:
- Interleaving
- New offline evaluation metrics: ERR, RBP
- Session evaluation metrics
Outline
- Introduction
- Theory and Models
- Session Search
- Re-ranking
- Guest Talk: Evaluation
- Conclusion
Conclusions
- Dynamic IR describes a new class of interactive model: it incorporates rich feedback and temporal dependency, and is goal-oriented
- The family of Markov models and multi-armed bandit theory are useful in building DIR models
- Applicable to a range of IR problems, including session search and evaluation
Dynamic IR Book
Published by Morgan & Claypool, Synthesis Lectures on Information Concepts, Retrieval, and Services. Due March/April 2015 (in time for SIGIR 2015).
Acknowledgments
We thank Dr. Emine Yilmaz for giving the guest speech, and we sincerely thank Dr. Xuchu Dong for his help in preparing the tutorial. We also thank the following colleagues for their comments and suggestions: Dr. Jamie Callan, Dr. Ophir Frieder, Dr. Fernando Diaz, and Dr. Filip Radlinski.
Thank You
References: Static IR
- Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro-Neto. Addison-Wesley, 1999.
- The PageRank Citation Ranking: Bringing Order to the Web. Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd. 1999.
- Implicit User Modeling for Personalized Search. Xuehua Shen et al. CIKM, 2005.
- A Short Introduction to Learning to Rank. Hang Li. IEICE Transactions 94-D(10): 1854-1862, 2011.
References: Interactive IR
- Relevance Feedback in Information Retrieval. J. J. Rocchio. The SMART Retrieval System (pp. 313-323), 1971.
- A study in interface support mechanisms for interactive information retrieval. Ryen W. White et al. JASIST, 2006.
- Visualizing stages during an exploratory search session. Bill Kules et al. HCIR, 2011.
- Dynamic Ranked Retrieval. Cristina Brandt et al. WSDM, 2011.
- Structured Learning of Two-level Dynamic Rankings. Karthik Raman et al. CIKM, 2011.
References: Dynamic IR
- A hidden Markov model information retrieval system. D. R. H. Miller, T. Leek, and R. M. Schwartz. SIGIR '99, pages 214-221.
- Threshold setting and performance optimization in adaptive filtering. Stephen Robertson. JIR, 2002.
- A large-scale study of the evolution of web pages. Dennis Fetterly et al. WWW, 2003.
- Learning diverse rankings with multi-armed bandits. Filip Radlinski, Robert Kleinberg, Thorsten Joachims. ICML, 2008.
- Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem. Yisong Yue et al. ICML, 2009.
- Meme-tracking and the dynamics of the news cycle. Jure Leskovec et al. KDD, 2009.
References: Dynamic IR
- Mortal multi-armed bandits. Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, Eli Upfal. NIPS, 2009.
- A Novel Click Model and Its Applications to Online Advertising. Zeyuan Allen-Zhu et al. WSDM, 2010.
- A contextual-bandit approach to personalized news article recommendation. Lihong Li, Wei Chu, John Langford, Robert E. Schapire. WWW, 2010.
- Inferring search behaviors using partially observable Markov model with duration (POMD). Yin He et al. WSDM, 2011.
- No Clicks, No Problem: Using Cursor Movements to Understand and Improve Search. Jeff Huang et al. CHI, 2011.
- Balancing Exploration and Exploitation in Learning to Rank Online. Katja Hofmann et al. ECIR, 2011.
- Large-Scale Validation and Analysis of Interleaved Search Evaluation. Olivier Chapelle et al. TOIS, 2012.
References: Dynamic IR
- Using Control Theory for Stable and Efficient Recommender Systems. T. Jambor, J. Wang, N. Lathia. WWW '12, pages 11-20.
- Sequential selection of correlated ads by POMDPs. Shuai Yuan et al. CIKM, 2012.
- Utilizing query change for session search. D. Guan, S. Zhang, and H. Yang. SIGIR '13, pages 453-462.
- Query Change as Relevance Feedback in Session Search (short paper). S. Zhang, D. Guan, and H. Yang. SIGIR, 2013.
- Interactive exploratory search for multi page search results. X. Jin, M. Sloan, and J. Wang. WWW '13.
- Interactive Collaborative Filtering. X. Zhao, W. Zhang, J. Wang. CIKM, 2013, pages 1411-1420.
- Win-win search: Dual-agent stochastic game in session search. J. Luo, S. Zhang, and H. Yang. SIGIR '14.
References: Markov Processes
- A Markovian decision process. R. Bellman. Indiana University Mathematics Journal, 6:679-684, 1957.
- Dynamic Programming. R. Bellman. Princeton University Press, Princeton, NJ, USA, first edition, 1957.
- Dynamic Programming and Markov Processes. R. A. Howard. MIT Press, 1960.
- Linear Programming and Sequential Decisions. Alan S. Manne. Management Science, 1960.
- Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Leonard E. Baum, Ted Petrie. The Annals of Mathematical Statistics 37, 1966.
References: Markov Processes
- Learning to predict by the methods of temporal differences. Richard Sutton. Machine Learning 3, 1988.
- Computationally feasible bounds for partially observed Markov decision processes. W. Lovejoy. Operations Research 39:162-175, 1991.
- Q-Learning. Christopher J. C. H. Watkins, Peter Dayan. Machine Learning, 1992.
- Reinforcement learning with replacing eligibility traces. S. P. Singh and R. S. Sutton. Machine Learning, 22, pages 123-158, 1996.
- Reinforcement Learning: An Introduction. Richard S. Sutton and Andrew G. Barto. MIT Press, 1998.
- Planning and acting in partially observable stochastic domains. L. Kaelbling, M. Littman, and A. Cassandra. Artificial Intelligence, 101(1-2):99-134, 1998.
References: Markov Processes
- Finding approximate POMDP solutions through belief compression. N. Roy. PhD Thesis, Carnegie Mellon, 2003.
- VDCBPI: an approximate scalable algorithm for large scale POMDPs. P. Poupart and C. Boutilier. NIPS 2004, pages 1081-1088.
- Finding Approximate POMDP Solutions Through Belief Compression. N. Roy, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, 23:1-40, 2005.
- Probabilistic Robotics. S. Thrun, W. Burgard, D. Fox. MIT Press, 2005.
- Anytime Point-Based Approximations for Large POMDPs. J. Pineau, G. Gordon and S. Thrun. Volume 27, pages 335-380, 2006.
References: Markov Processes
- The optimal control of partially observable Markov decision processes over a finite horizon. R. D. Smallwood, E. J. Sondik. Operations Research, 1973.
- Modified Policy Iteration Algorithms for Discounted Markov Decision Problems. M. L. Puterman and M. C. Shin. Management Science 24, 1978.
- An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains. A. A. Markov. Science in Context, 19:591-600, 2006.
- Learning to Rank for Information Retrieval. Tie-Yan Liu. Springer Science & Business Media, 2011.
- Finite-Time Regret Bounds for the Multiarmed Bandit Problem. Nicolò Cesa-Bianchi, Paul Fischer. ICML, pages 100-108, 1998.
- Multi-armed bandit allocation indices. J. C. Gittins. Wiley, 1989.
- Finite-time Analysis of the Multiarmed Bandit Problem. Peter Auer et al. Machine Learning 47(2-3), 2002.