Dynamic Information Retrieval Tutorial

SIGIR Tutorial, July 7th 2014. Grace Hui Yang, Marc Sloan, Jun Wang. Guest Speaker: Emine Yilmaz. Dynamic Information Retrieval Modeling


Description

Dynamic aspects of Information Retrieval (IR), including changes found in data, users and systems, are increasingly being utilized in search engines and information filtering systems. Examples include large datasets containing sequential data capturing document dynamics and modern IR systems observing user dynamics through interactivity. Existing IR techniques are limited in their ability to optimize over changes, learn with minimal computational footprint and be responsive and adaptive. The objective of this tutorial is to provide a comprehensive and up-to-date introduction to Dynamic Information Retrieval Modeling. Dynamic IR Modeling is the statistical modeling of IR systems that can adapt to change. It is a natural follow-up to previous statistical IR modeling tutorials with a fresh look on state-of-the-art dynamic retrieval models and their applications including session search and online advertising. The tutorial covers techniques ranging from classic relevance feedback to the latest applications of partially observable Markov decision processes (POMDPs) and presents to fellow researchers and practitioners a handful of useful algorithms and tools for solving IR problems incorporating dynamics.

http://www.dynamic-ir-modeling.org/

@inproceedings{Yang:2014:DIR:2600428.2602297,
  author    = {Yang, Hui and Sloan, Marc and Wang, Jun},
  title     = {Dynamic Information Retrieval Modeling},
  booktitle = {Proceedings of the 37th International ACM SIGIR Conference on Research \& Development in Information Retrieval},
  series    = {SIGIR '14},
  year      = {2014},
  isbn      = {978-1-4503-2257-7},
  location  = {Gold Coast, Queensland, Australia},
  pages     = {1290--1290},
  numpages  = {1},
  url       = {http://doi.acm.org/10.1145/2600428.2602297},
  doi       = {10.1145/2600428.2602297},
  acmid     = {2602297},
  publisher = {ACM},
  address   = {New York, NY, USA},
  keywords  = {dynamic information retrieval modeling, probabilistic relevance model, reinforcement learning},
}

Transcript of Dynamic Information Retrieval Tutorial

  • SIGIR Tutorial, July 7th 2014. Grace Hui Yang, Marc Sloan, Jun Wang. Guest Speaker: Emine Yilmaz. Dynamic Information Retrieval Modeling
  • Dynamic Information Retrieval ModelingTutorial 20142
  • Age of Empire Dynamic Information Retrieval ModelingTutorial 20143
  • Dynamic Information Retrieval: documents to explore, an information need, observed documents, and the user. Devise a strategy for helping the user explore the information space in order to learn which documents are relevant and which aren't, and satisfy their information need.
  • Evolving IR Dynamic Information Retrieval ModelingTutorial 20145 Paradigm shifts in IR as new models emerge e.g.VSM BM25 Language Model Different ways of defining relationship between query and document Static Interactive Dynamic Evolution in modeling user interaction with search engine
  • Outline Dynamic Information Retrieval ModelingTutorial 20146 Introduction Static IR Interactive IR Dynamic IR Theory and Models Session Search Reranking GuestTalk: Evaluation
  • Conceptual Model Static IR Dynamic Information Retrieval ModelingTutorial 20147 Static IR Interactive IR Dynamic IR No feedback
  • Characteristics of Static IR Dynamic Information Retrieval ModelingTutorial 20148 Does not learn directly from user Parameters updated periodically
  • Static Information Retrieval Model Dynamic Information Retrieval ModelingTutorial 20149 Learning to Rank
  • Dynamic Information Retrieval ModelingTutorial 201410 Commonly Used Static IR Models BM25 PageRank Language Model
  • Feedback in IR Dynamic Information Retrieval ModelingTutorial 201411
  • Outline Dynamic Information Retrieval ModelingTutorial 201412 Introduction Static IR Interactive IR Dynamic IR Theory and Models Session Search Reranking GuestTalk: Evaluation
  • Conceptual Model Interactive IR Dynamic Information Retrieval ModelingTutorial 201413 Static IR Interactive IR Dynamic IR Exploit Feedback
  • Interactive User Feedback Dynamic Information Retrieval ModelingTutorial 201414 Like, dislike, pause, skip
  • Learn the users taste interactively! At the same time, provide good recommendations! Dynamic Information Retrieval ModelingTutorial 201415 Interactive Recommender Systems
  • Example - Multi Page Search Dynamic Information Retrieval ModelingTutorial 201416 Ambiguous Query
  • Example - Multi Page Search Dynamic Information Retrieval ModelingTutorial 201417 Topic: Car
  • Example - Multi Page Search Dynamic Information Retrieval ModelingTutorial 201418 Topic:Animal
  • Example Interactive Search Dynamic Information Retrieval ModelingTutorial 201419 Click on car webpage
  • Example Interactive Search Dynamic Information Retrieval ModelingTutorial 201420 Click on Next Page
  • Example Interactive Search Dynamic Information Retrieval ModelingTutorial 201421 Page 2 results: Cars
  • Example Interactive Search Dynamic Information Retrieval ModelingTutorial 201422 Click on animal webpage
  • Example Interactive Search Dynamic Information Retrieval ModelingTutorial 201423 Page 2 results: Animals
  • Example Dynamic Search Dynamic Information Retrieval ModelingTutorial 201424 Topic: Guitar
  • Example Dynamic Search Dynamic Information Retrieval ModelingTutorial 201425 Diversified Page 1 Topics: Cars, animals, guitars
  • Toy Example: multi-page search scenario. The user image-searches for "jaguar"; rank two of the four results on each of two pages. Prior relevance estimates of the four images: 0.5, 0.51, 0.9, 0.49.
  • Toy Example (Static Ranking): ranked according to the PRP. Page 1: 1. (0.9), 2. (0.51); Page 2: 1. (0.5), 2. (0.49).
  • Toy Example (Relevance Feedback, Interactive Search): improve the 2nd page based on feedback from the 1st page, using clicks as relevance feedback. The Rocchio algorithm [Rocchio '71; Baeza-Yates & Ribeiro-Neto '99] is applied to the terms of the clicked image webpages: q_new = α q + (β/|D_r|) Σ_{d∈D_r} d − (γ/|D_nr|) Σ_{d∈D_nr} d. The new query is closer to the relevant documents and further from the non-relevant ones.
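A minimal sketch of the Rocchio update described above, using term-frequency vectors; the function name, the weights (alpha=1, beta=0.75, gamma=0.15) and the toy vectors are illustrative assumptions, not values from the tutorial:

```python
import numpy as np

def rocchio(query_vec, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """One Rocchio update: move the query vector toward clicked (relevant)
    documents and away from unclicked (non-relevant) ones."""
    q_new = alpha * query_vec
    if len(rel_docs) > 0:
        q_new = q_new + beta * np.mean(rel_docs, axis=0)
    if len(nonrel_docs) > 0:
        q_new = q_new - gamma * np.mean(nonrel_docs, axis=0)
    return np.clip(q_new, 0.0, None)   # keep term weights non-negative

# toy usage: 3-term vocabulary, one clicked and one skipped result
q = np.array([1.0, 0.0, 0.0])
clicked = [np.array([0.8, 0.5, 0.0])]
skipped = [np.array([0.1, 0.0, 0.9])]
print(rocchio(q, clicked, skipped))
```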
  • Toy Example (Relevance Feedback): ranked according to PRP and Rocchio. Page 1: 1. (0.9), 2. (0.51), with a click observed on page 1; page 2 is then re-ranked over the remaining results (0.5, 0.49) using the updated query.
  • Toy Example (Relevance Feedback): if no click is received when the user is actually searching for animals, there is no feedback with which to re-rank page 2.
  • Toy Example (Value Function): optimize both pages using dynamic IR. Bellman equation for the value function (simplified example): V_t(θ_t, Σ_t) = max_{a_t} [ r_t + E_{c_t} V_{t+1}(θ_{t+1}, Σ_{t+1}) ], where θ_t, Σ_t are the relevance and covariance estimates of the documents for page t, c_t are the clicks on page t, and V_t is the value of the ranking on page t. Maximize the value over all pages based on estimating the feedback.
  • Toy Example (Covariance): the covariance matrix represents the similarity between the images:
    1     0.8   0.1   0
    0.8   1     0.1   0
    0.1   0.1   1     0.95
    0     0     0.95  1
  • Toy Example (Myopic Value): for the myopic ranking, the expected value over the two pages is 16.380.
  • Toy Example (Myopic Ranking): the page 2 ranking stays the same regardless of clicks.
  • Toy Example (Optimal Value): for the optimal ranking, the expected value over the two pages is 16.528.
  • Toy Example (Optimal Ranking): if the car image is clicked, the Jaguar logo is more relevant on the next page.
  • Toy Example (Optimal Ranking): in all other scenarios, rank the animal first on the next page.
  • Interactive vs Dynamic IR. Interactive: treats interactions independently; responds to immediate feedback; static IR is used before feedback is received. Dynamic: optimizes over the whole interaction; long-term gains; models future user feedback; also used at the beginning of the interaction.
  • Outline Dynamic Information Retrieval ModelingTutorial 201439 Introduction Static IR Interactive IR Dynamic IR Theory and Models Session Search Reranking GuestTalk: Evaluation
  • Conceptual Model Dynamic IR Dynamic Information Retrieval ModelingTutorial 201440 Static IR Interactive IR Dynamic IR Explore and exploit Feedback
  • Characteristics of Dynamic IR: rich interactions, including query formulation, document clicks, document examination, eye movements, mouse movements, etc.
  • Characteristics of Dynamic IR: temporal dependency. At each iteration i, the query q_i produces ranked documents D_i and clicked documents C_i, all driven by the information need I; iteration i+1 depends on what happened in iteration i, from iteration 1 through iteration n.
  • Characteristics of Dynamic IR Dynamic Information Retrieval ModelingTutorial 201443 Overall goal Optimize over all iterations for goal IR metric or user satisfaction Optimal policy
  • Dynamic IR Dynamic Information Retrieval ModelingTutorial 201444 Dynamic IR explores actions Dynamic IR learns from user and adjusts its actions May hurt performance in a single stage, but improves over all stages
  • Applications to IR: dynamics are found in many different aspects of IR. Dynamic users: users change behaviour over time, user history. Dynamic documents: information filtering, document content change. Dynamic queries: changing query definitions, e.g. Twitter. Dynamic information needs: topic ontologies evolve over time. Dynamic relevance: seasonal/time-of-day changes in relevance.
  • User Interactivity in DIR Dynamic Information Retrieval ModelingTutorial 201446 Modern IR interfaces Facets Verticals Personalization Responsive to particular user Complex log data Mobile Richer user interactions Ads Adaptive targeting
  • Big Data Dynamic Information Retrieval ModelingTutorial 201447 Data set sizes are always increasing Computational footprint of learning to rank Rich, sequential data 1Yin He et. al, 11 Complex user model behaviour found in data, takes into account reading, skipping and re-reading behaviours1 Uses a POMDP Example
  • Online Learning to Rank Dynamic Information Retrieval ModelingTutorial 201448 Learning to rank iteratively on sequential data Clicks as implicit user feedback/preference Often uses multi-armed bandit techniques 1Katja Hofmann et. al., 11 2YisongYue et. al.,09 Uses click models to interpret clicks and a contextual bandit to improve learning1 Pairwise comparison of rankings using duelling bandits formulation2 Example
  • Evaluation Dynamic Information Retrieval ModelingTutorial 201449 Use complex user interaction data to assess rankings Compare ranking techniques in online testing Minimise user dissatisfaction 1Jeff Huang et. al.,11 2Olivier Chapelle et. al.,12 Modelled cursor activity and correlated with eye tracking to validate good or bad abandonment1 Interleave search results from two ranking algorithms to determine which is better2 Example
  • Filtering and News: adaptive techniques to personalize information filtering or news recommendation; understand the complex dynamics of real-world events in search logs; capture temporal document change [Fetterly et al. '03]. Examples: using relevance feedback to adapt threshold sensitivity over time in information filtering to maximise overall utility [Robertson '02]; detecting patterns and memes in news cycles and modeling how information spreads [Leskovec et al. '09].
  • Advertising Dynamic Information Retrieval ModelingTutorial 201451 Behavioural targeting and personalized ads Learn when to display new ads Maximise profit from available ads 1ShuaiYuan et. al.,12 2ZeyuanAllen Zhu et. al.,10 Uses a POMDP and ad correlation to find the optimal ad to display to a user1 Dynamic click model that can interpret complex user behaviour in logs and apply results to tail queries and unseen ads2 Example
  • Outline Dynamic Information Retrieval ModelingTutorial 201452 Introduction Theory and Models Session Search Reranking GuestTalk: Evaluation
  • Outline Dynamic Information Retrieval ModelingTutorial 201453 Introduction Theory and Models Why not use supervised learning Markov Models Session Search Reranking Evaluation
  • Why not use Supervised Learning for Dynamic IR Modeling? Lack of enough training data: dynamic IR problems contain a sequence of dynamic interactions, e.g. a series of queries in a session. Repeated sequences are rare (close to zero), even in large query logs (WSCD 2013 & 2014, query logs from Yandex), and the chance of finding repeated adjacent query pairs is also low:
    Dataset     Repeated adjacent query pairs   Total adjacent query pairs   Repeated percentage
    WSCD 2013   476,390                         17,784,583                   2.68%
    WSCD 2014   1,959,440                       35,376,008                   5.54%
  • Our Solution Dynamic Information Retrieval ModelingTutorial 201455 Try to find an optimal solution through a sequence of dynamic interactions Trial and Error: learn from repeated, varied attempts which are continued until success No Supervised Learning
  • Trial and Error Dynamic Information Retrieval ModelingTutorial 201456 q1 "dulles hotels" q2 "dulles airport" q3 "dulles airport location" q4 "dulles metrostop"
  • Dynamic Information Retrieval ModelingTutorial 201457 Rich interactions Query formulation, Document clicks, Document examination, eye movement, mouse movements, etc. Temporal dependency Overall goal Recap Characteristics of Dynamic IR
  • What is a Desirable Model for Dynamic IR? It should model interactions, which means it needs placeholders for actions; model the information need hidden behind user queries and other interactions; set up a reward mechanism to guide the entire search algorithm to adjust its retrieval strategies; and represent Markov properties to handle the temporal dependency. A model in the trial-and-error setting will do. A Markov model will do!
  • Outline Dynamic Information Retrieval ModelingTutorial 201459 Introduction Theory and Models Why not use supervised learning Markov Models Session Search Reranking Evaluation
  • Markov Process. Markov property [A. A. Markov '06] (the memoryless property): for a system, its next state depends only on its current state: Pr(S_{i+1} | S_i, ..., S_0) = Pr(S_{i+1} | S_i). A Markov process is a stochastic process with the Markov property, e.g. a chain of states s_0 → s_1 → ... → s_i → s_{i+1}.
  • Dynamic Information Retrieval ModelingTutorial 201461 Markov Chain Hidden Markov Model Markov Decision Process Partially Observable Markov Decision Process Multi-armed Bandit Family of Markov Models
  • Markov Chain example: Google PageRank [Page et al. '99], a discrete-time Markov process (S, M). State S = a web page; M = transition probability matrix. PageRank measures how likely a random web surfer is to land on a page: PR(S) = (1 − λ)/N + λ Σ_{B ∈ in(S)} PR(B)/|out(B)|, where N is the number of pages, λ the probability of following a link (1 − λ is the random jump factor), in(S) the pages linking to S, and |out(B)| the number of outlinks of page B. The stable (stationary) state distribution of such a Markov chain is PageRank.
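To illustrate the stationary-distribution view of PageRank, here is a small power-iteration sketch; the toy link graph and the damping value 0.85 are assumptions for demonstration only:

```python
import numpy as np

def pagerank(M, damping=0.85, tol=1e-10):
    """Stationary distribution of the PageRank Markov chain.
    M[i, j] = probability of moving from page i to page j (rows sum to 1)."""
    n = M.shape[0]
    pr = np.full(n, 1.0 / n)
    while True:
        nxt = (1 - damping) / n + damping * pr @ M
        if np.abs(nxt - pr).sum() < tol:
            return nxt
        pr = nxt

# toy web graph: A links to B and C, B links to C, C links back to A
M = np.array([[0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
print(pagerank(M))
```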
  • Hidden Markov Model (S, M, O, e): a Markov chain whose states are hidden and which emits observable symbols with some probability according to its states [Baum et al. '66]. s_i: hidden state; p_i: transition probability; o_i: observation; e_i: observation probability (emission probability).
  • An HMM example for IR: construct an HMM for each document [Miller et al. '99]. States s_i: the document D or "General English" (GE); transition probabilities a_0, a_1; observations t_i: query terms; emission probabilities e_i: P(t|D) or P(t|GE). The document-to-query relevance is P(q|D) ∝ Π_{t ∈ q} ( a_0 P(t|GE) + a_1 P(t|D) ).
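A rough sketch of the two-state query-likelihood score above; the smoothing weight a1 = 0.7 and the toy document/collection are assumptions, and simple maximum-likelihood estimates stand in for P(t|D) and P(t|GE):

```python
from collections import Counter

def hmm_score(query_terms, doc_terms, collection_terms, a1=0.7):
    """Query likelihood under the two-state HMM: each query term is emitted
    either by the document state or by the 'General English' state."""
    a0 = 1.0 - a1
    doc_tf, col_tf = Counter(doc_terms), Counter(collection_terms)
    doc_len, col_len = len(doc_terms), len(collection_terms)
    score = 1.0
    for t in query_terms:
        p_doc = doc_tf[t] / doc_len       # P(t | D)
        p_ge = col_tf[t] / col_len        # P(t | GE)
        score *= a0 * p_ge + a1 * p_doc
    return score

doc = "dulles airport is a major airport near washington".split()
collection = doc + "hotels near dulles metro stop and city center".split()
print(hmm_score("dulles airport".split(), doc, collection))
```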
  • Markov Decision Process (S, M, A, R, γ): an MDP extends a Markov chain with actions and rewards [Bellman '57]. s_i: state; a_i: action; r_i: reward; p_i: transition probability.
  • Definition of MDP: a tuple (S, M, A, R, γ). S: state space. M: transition matrix, M_a(s, s') = P(s'|s, a). A: action space. R: reward function, R(s, a) = immediate reward for taking action a at state s. γ: discount factor, 0 < γ ≤ 1. A policy π(s) is the action taken at state s. The goal is to find an optimal policy π* maximizing the expected total reward.
  • Policy: π(s) = a, according to which an action a is selected at state s. E.g. π(s0) = move right and up; π(s1) = move right and up; π(s2) = move right. [Slide altered from Carlos Guestrin's ML lecture]
  • Value of a Policy: the value V^π(s) is the expected long-term reward starting from s and following π. Starting from s0: V^π(s0) = E[ R(s0) + γ R(s1) + γ² R(s2) + γ³ R(s3) + γ⁴ R(s4) + ... ], where future rewards are discounted by γ ∈ [0, 1) and the expectation is over the states s1, s2, ... reached by following π(s0), π(s1), .... [Slide altered from Carlos Guestrin's ML lecture]
  • Computing the value of a policy: V^π(s0) = E[ R(s0, π(s0)) + γ R(s1, π(s1)) + γ² R(s2, π(s2)) + γ³ R(s3, π(s3)) + ... ] = R(s0, π(s0)) + E[ Σ_{t≥1} γ^t R(s_t, π(s_t)) ] = R(s0, π(s0)) + γ Σ_{s'} P(s'|s0, π(s0)) V^π(s'), i.e. the value of the current state equals the immediate reward plus the discounted value of the possible next states.
  • Optimality: Bellman Equation. The Bellman equation [Bellman '57] for an MDP is a recursive definition of the optimal state-value function V*(·): V*(s) = max_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V*(s') ]. Optimal policy: π*(s) = argmax_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V*(s') ].
  • Optimality: Bellman Equation (action-value form). The Bellman equation can be rewritten as V*(s) = max_a Q*(s, a), with the action-value function Q*(s, a) = R(s, a) + γ Σ_{s'} P(s'|s, a) V*(s'). Optimal policy: π*(s) = argmax_a Q*(s, a). This is the relationship between V and Q.
  • MDP algorithms Dynamic Information Retrieval ModelingTutorial 201474 Value Iteration Policy Iteration Modified Policy Iteration Prioritized Sweeping Temporal Difference (TD) Learning Q-Learning Model free approaches Model-based approaches [Bellman, 57, Howard,60, Puterman and Shin,78, Singh & Sutton,96, Sutton & Barto,98, Richard Sutton,88,Watkins,92] Solve Bellman equation Optimal valueV*(s) Optimal policy *(s) [Slide altered from Carlos Guestrins ML lecture]
  • Value Iteration [Bellman '57]. Initialization: initialize V_0 arbitrarily. Iteration: V_{i+1}(s) ← max_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V_i(s') ]; π(s) ← argmax_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V_i(s') ]. Stopping criterion: π(s) is good enough.
  • Greedy Value Iteration [Bellman '57]. Initialization: initialize V_0 arbitrarily. Iteration: V_{i+1}(s) ← max_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V_i(s') ]. Stopping criterion: ||V_{i+1} − V_i|| < ε. Optimal policy: π*(s) ← argmax_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V(s') ].
  • Greedy Value Iteration Algorithm:
    1. For each state s ∈ S: initialize V_0(s) arbitrarily.
    2. i ← 0
    3. Repeat: 3.1 i ← i + 1; 3.2 for each s: V_i(s) ← max_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V_{i−1}(s') ]; until ||V_i − V_{i−1}|| < ε.
    4. For each s: π*(s) ← argmax_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V_i(s') ].
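A compact sketch of greedy value iteration as outlined above; the 2-state, 2-action MDP at the bottom is hypothetical and only exercises the routine:

```python
import numpy as np

def value_iteration(R, P, gamma=0.9, eps=1e-6):
    """Greedy value iteration.
    R[s, a]    : immediate reward for taking action a in state s
    P[a, s, s']: transition probability from s to s' under action a."""
    V = np.zeros(R.shape[0])
    while True:
        Q = R + gamma * np.einsum('ast,t->sa', P, V)   # Q[s, a]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < eps:
            return V_new, Q.argmax(axis=1)             # optimal value and policy
        V = V_new

# hypothetical 2-state, 2-action MDP, just to exercise the routine
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # action 0
              [[0.5, 0.5], [0.0, 1.0]]])   # action 1
V, policy = value_iteration(R, P)
print(V, policy)
```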
  • Greedy Value Iteration example, V*(s) = max_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V(s') ] with γ = 0.96 and transition matrices
    M_a1 = [0.3 0.7 0; 1.0 0 0; 0.8 0.2 0],  M_a2 = [0 0 1.0; 0 0.2 0.8; 0 1.0 0].
    V(0)(S1) = max{R(S1,a1), R(S1,a2)} = 6; V(0)(S2) = max{R(S2,a1), R(S2,a2)} = 4; V(0)(S3) = max{R(S3,a1), R(S3,a2)} = 8.
    V(1)(S1) = max{ 3 + 0.96*(0.3*6 + 0.7*4), 6 + 0.96*(1.0*8) } = max{3 + 0.96*4.6, 6 + 0.96*8.0} = max{7.416, 13.68} = 13.68.
  • Greedy Value Iteration example (continued), V*(s) = max_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V(s') ], with M_a1 and M_a2 as above:
    i     V(i)(S1)   V(i)(S2)   V(i)(S3)
    0     6          4          8
    1     13.680     9.760      13.376
    2     18.841     17.133     20.380
    3     25.565     22.087     25.759
    200   168.039    165.316    168.793
  • Policy Iteration [Howard '60]. Initialization: i ← 0, initialize π_0(s) arbitrarily. Iteration (over i): Policy Evaluation: V^{π_i}(s) ← R(s, π_i(s)) + γ Σ_{s'} P(s'|s, π_i(s)) V^{π_i}(s'); Policy Improvement: π_{i+1}(s) ← argmax_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V^{π_i}(s') ]. Stopping criterion: the policy stops changing.
  • Policy Iteration Algorithm:
    1. For each state s ∈ S: i ← 0, initialize π_0(s) arbitrarily, V^{π_0}(s) ← 0.
    2. Repeat:
       2.1 Repeat for each s: V^{π_i}(s) ← R(s, π_i(s)) + γ Σ_{s'} P(s'|s, π_i(s)) V^{π_i}(s'), until the change in V^{π_i} is below ε.
       2.2 For each s: π_{i+1}(s) ← argmax_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V^{π_i}(s') ].
       2.3 i ← i + 1.
       Until π_i = π_{i−1}.
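A corresponding sketch of policy iteration, with exact policy evaluation done by solving the linear system; the array conventions and the toy MDP mirror the value-iteration sketch above and are assumptions:

```python
import numpy as np

def policy_iteration(R, P, gamma=0.9):
    """Policy iteration: exact policy evaluation followed by greedy improvement.
    R[s, a] and P[a, s, s'] use the same layout as the value-iteration sketch."""
    n_states = R.shape[0]
    policy = np.zeros(n_states, dtype=int)
    while True:
        # policy evaluation: solve (I - gamma * P_pi) V = R_pi
        P_pi = P[policy, np.arange(n_states), :]
        R_pi = R[np.arange(n_states), policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # policy improvement
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return V, policy
        policy = new_policy

# same hypothetical 2-state, 2-action MDP as before
R = np.array([[0.0, 1.0], [2.0, 0.0]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])
print(policy_iteration(R, P))
```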
  • Modified Policy Iteration: the policy evaluation step in Policy Iteration is time-consuming, especially when the state space is large. Modified Policy Iteration computes an approximate policy evaluation by running just a few (k) evaluation iterations: k = 1 corresponds to Greedy Value Iteration, k = ∞ to full Policy Iteration, and Modified Policy Iteration lies in between.
  • Modified Policy Iteration Algorithm:
    1. For each state s ∈ S: i ← 0, initialize π_0(s) arbitrarily, V^{π_0}(s) ← 0.
    2. Repeat:
       2.1 Repeat k times: for each s, V^{π_i}(s) ← R(s, π_i(s)) + γ Σ_{s'} P(s'|s, π_i(s)) V^{π_i}(s').
       2.2 For each s: π_{i+1}(s) ← argmax_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V^{π_i}(s') ].
       2.3 i ← i + 1.
       Until π_i = π_{i−1}.
  • MDP algorithms Dynamic Information Retrieval ModelingTutorial 201484 Value Iteration Policy Iteration Modified Policy Iteration Prioritized Sweeping Temporal Difference (TD) Learning Q-Learning Model free approaches Model-based approaches [Bellman, 57, Howard,60, Puterman and Shin,78, Singh & Sutton,96, Sutton & Barto,98, Richard Sutton,88,Watkins,92] Solve Bellman equation Optimal valueV*(s) Optimal policy *(s) [Slide altered from Carlos Guestrins ML lecture]
  • Temporal Difference Learning [Sutton '88; Singh & Sutton '96; Sutton & Barto '98]. Monte Carlo sampling can be used for model-free policy evaluation: estimate V^π(s) by the average reward of trajectories starting from s. However, parts of those trajectories can be reused, so we instead estimate V^π(s) by an expectation over the next state s': V^π(s) ← E_{s'}[ r + γ V^π(s') ]. The simplest estimate: V^π(s) ← r + γ V^π(s'). A smoothed version: V^π(s) ← α( r + γ V^π(s') ) + (1 − α) V^π(s). TD-learning rule: V^π(s) ← V^π(s) + α( r + γ V^π(s') − V^π(s) ), where r is the immediate reward, α is the learning rate, and r + γ V^π(s') − V^π(s) is the temporal difference.
  • Temporal Difference Learning Algorithm:
    1. For each state s ∈ S: initialize V(s) arbitrarily.
    2. For each episode in the state sequence:
       2.1 Initialize s.
       2.2 Repeat:
           2.2.1 take action a at state s according to π;
           2.2.2 observe the immediate reward r and the next state s';
           2.2.3 V(s) ← V(s) + α( r + γ V(s') − V(s) ); s ← s';
           2.2.4 until s is a terminal state.
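A minimal sketch of TD(0) policy evaluation following the algorithm above; `env_step` and the toy two-state chain are hypothetical stand-ins for whatever interactive environment is available:

```python
import numpy as np

def td0_evaluate(env_step, policy, n_states, start_state, episodes=1000,
                 alpha=0.1, gamma=0.9):
    """Model-free policy evaluation with the TD(0) rule
    V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).
    env_step(s, a) returns (reward, next_state, done)."""
    V = np.zeros(n_states)
    for _ in range(episodes):
        s, done = start_state, False
        while not done:
            r, s_next, done = env_step(s, policy[s])
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V

# toy chain: from state 0, reach the terminal state 1 with reward 1 half the time
def toy_step(s, a):
    return (1.0, 1, True) if np.random.rand() < 0.5 else (0.0, 0, False)

print(td0_evaluate(toy_step, policy=[0, 0], n_states=2, start_state=0))
```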
  • Q-Learning [Watkins '92]. The TD-learning rule V(s) ← V(s) + α( r + γ V(s') − V(s) ) becomes, for action values, the Q-learning rule: Q(s, a) ← Q(s, a) + α( r + γ max_{a'} Q(s', a') − Q(s, a) ). Since V*(s) = max_a Q*(s, a) and π*(s) = argmax_a Q*(s, a), with Q*(s, a) = R(s, a) + γ Σ_{s'} P(s'|s, a) max_{a'} Q*(s', a'), learning Q directly yields the optimal policy without a transition model.
  • Q-Learning Algorithm:
    1. For each state s ∈ S and a ∈ A: initialize Q_0(s, a) arbitrarily.
    2. i ← 0
    3. For each episode in the state sequence:
       3.1 Initialize s.
       3.2 Repeat:
           3.2.1 i ← i + 1;
           3.2.2 select an action a at state s according to Q_{i−1};
           3.2.3 take action a, observe the immediate reward r and the next state s';
           3.2.4 Q_i(s, a) ← Q_{i−1}(s, a) + α( r + γ max_{a'} Q_{i−1}(s', a') − Q_{i−1}(s, a) ); s ← s';
           3.2.5 until s is a terminal state.
    4. For each s: π*(s) ← argmax_a Q(s, a).
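A sketch of tabular Q-learning with an epsilon-greedy behaviour policy, matching the rule above; `env_step` and the toy environment are hypothetical:

```python
import numpy as np

def q_learning(env_step, n_states, n_actions, start_state, episodes=2000,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning. env_step(s, a) returns (reward, next_state, done)."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = start_state, False
        while not done:
            # epsilon-greedy action selection from the current Q estimates
            a = np.random.randint(n_actions) if np.random.rand() < epsilon \
                else int(Q[s].argmax())
            r, s_next, done = env_step(s, a)
            # Q-learning rule: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = s_next
    return Q, Q.argmax(axis=1)

# toy environment: in state 0, action 1 ends the episode with reward 1
def toy_step(s, a):
    return (1.0, 1, True) if a == 1 else (0.0, 0, False)

Q, policy = q_learning(toy_step, n_states=2, n_actions=2, start_state=0)
print(Q, policy)
```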
  • Apply an MDP to an IR Problem Dynamic Information Retrieval ModelingTutorial 201489 We can model IR systems using a Markov Decision Process Is there a temporal component? States What changes with each time step? Actions How does your system change the state? Rewards How do you measure feedback or effectiveness in your problem at each time step? Transition Probability Can you determine this? If not, then model free approach is more suitable
  • Apply an MDP to an IR Problem, Example: a user agent in session search. States: the user's relevance judgements. Actions: new queries. Rewards: information gained.
  • Apply an MDP to an IR Problem, Example: from the search engine's perspective, what if we can't directly observe the user's relevance judgements? Do clicks equal relevance? The states become hidden.
  • Dynamic Information Retrieval ModelingTutorial 201492 Markov Chain Hidden Markov Model Markov Decision Process Partially Observable Markov Decision Process Multi-armed Bandit Family of Markov Models
  • POMDP Model Dynamic Information Retrieval ModelingTutorial 201493 s0 s1 r0 a0 s2 r1 a1 s3 r2 a2 Hidden states Observations Belief 1R. D. Smallwood et. al.,73 o1 o2 o3
  • POMDP Definition: a tuple (S, M, A, R, γ, O, Θ, B). S: state space; M: transition matrix; A: action space; R: reward function; γ: discount factor, 0 < γ ≤ 1; O: observation set, where an observation is a symbol emitted according to a hidden state; Θ: observation function, where Θ(s, a, o) is the probability that o is observed when the system transitions into state s after taking action a, i.e. P(o|s, a); B: belief space, where a belief is a probability distribution over the hidden states.
  • POMDP Belief Update: the agent uses a state estimator SE to update its belief about the hidden states, b' = SE(b, a, o), with b'(s') = P(s'|o, a, b) = P(o|s', a, b) P(s'|a, b) / P(o|a, b) = Θ(s', a, o) Σ_s M_a(s, s') b(s) / P(o|a, b).
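A short sketch of the state estimator above; M[a, s, s'] and Theta[a, s', o] hold the transition and observation probabilities, and the two-state numbers are made up for illustration:

```python
import numpy as np

def belief_update(b, a, o, M, Theta):
    """POMDP state estimator: b'(s') is proportional to
    Theta[a, s', o] * sum_s M[a, s, s'] * b(s)."""
    predicted = b @ M[a]                 # sum_s M[a, s, s'] * b(s)
    unnorm = Theta[a, :, o] * predicted
    return unnorm / unnorm.sum()         # normalize by P(o | a, b)

# toy 2-state POMDP with one action and two possible observations
M = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])
Theta = np.array([[[0.9, 0.1],
                   [0.2, 0.8]]])
print(belief_update(np.array([0.5, 0.5]), a=0, o=0, M=M, Theta=Theta))
```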
  • POMDP Bellman Equation: the Bellman equation for a POMDP is V*(b) = max_a [ r(b, a) + γ Σ_o P(o|b, a) V*(b') ], with b' = SE(b, a, o). A POMDP can be transformed into a continuous belief MDP (B, τ, A, r, γ): B is the continuous belief space; τ is the transition function τ(b, a, b') = Σ_o 1[b' = SE(b, a, o)] Pr(o|b, a), where the indicator is 1 if observation o maps b to b' and 0 otherwise; A is the action space; r is the reward function r(b, a) = Σ_s b(s) R(s, a).
  • Dynamic Information Retrieval ModelingTutorial 201497 The optimal policy of a POMDP The optimal policy of its belief MDP 1L. Kaelbling et. al., 98 A variation of the value iteration algorithm Solving POMDPs The Witness Algorithm
  • Policy Tree: a policy tree of depth i is an i-step non-stationary policy, as if we ran value iteration until the i-th iteration. The root specifies the action with i steps to go; each observation o_k leads to a subtree specifying the action with i−1 steps to go, and so on down to the final action with 1 step to go.
  • Value of a Policy Tree: we can only determine the value of a policy tree h from some belief state b, because the agent never knows the exact state: V_h(b) = Σ_s b(s) V_h(s), where V_h(s) = R(s, a(h)) + γ Σ_{s'} M_{a(h)}(s, s') Σ_{o_k} Θ(s', a(h), o_k) V_{h(o_k)}(s'), a(h) is the action at the root node of h, and h(o_k) is the (i−1)-step subtree associated with o_k under the root node of h.
  • Idea of the Witness Algorithm: for each action a, compute Q_i^a, the set of candidate i-step policy trees with action a at their roots. The optimal value function at the i-th step, V_i(b), is the upper surface of the value functions of all i-step policy trees.
  • Optimal value function: geometrically, V_i(b) = max_{h ∈ H} V_h(b) is piecewise linear and convex. In a two-state POMDP the belief space is one-dimensional because of the simplex constraint b(s1) + b(s2) = 1, and taking the upper surface over the value functions V_h1(b), ..., V_h5(b) prunes the set of policy trees.
  • Outline of the Witness Algorithm:
    1. V_1 ← {}
    2. i ← 1
    3. Repeat:
       3.1 i ← i + 1
       3.2 For each a in A: Q_i^a ← witness(V_{i−1}, a)   (the inner loop)
       3.3 Prune the union of the Q_i^a to get V_i
       until |V_i(b) − V_{i−1}(b)| < ε.
  • Inner Loop of the Witness Algorithm Dynamic Information Retrieval ModelingTutorial 2014103 Inner loop of the witness algorithm 1. Select a belief b arbitrarily. Generate a best i-step policy tree hi. Add i to an agenda. 2. In each iteration 2.1 Select a policy tree from the agenda. 2.2 Look for a witness point b using Za and . 2.3 If find such a witness point b, 2.3.1 Calculate the best policy tree for b. 2.3.2 Add to Za. 2.3.3 Add all the alternative trees of to the agenda. 2.4 Else remove from the agenda. 3. Repeat the above iteration until the agenda is empty.
  • Other Solutions Dynamic Information Retrieval ModelingTutorial 2014104 QMDP1 MC-POMDP (Monte Carlo POMDP)2 Grid BasedApproximation3 Belief Compression4 1 Thrun et. al.,06 2 Thrun et. al.,05 3 Lovejoy,91 4 Roy,03
  • Applying POMDP to Dynamic IR:
    Environment: documents.
    Agents: user, search engine.
    States: queries, the user's decision-making status, relevance of documents, etc.
    Actions: provide a ranking of documents; weigh terms in the query; add/remove/keep query terms; switch a search technology on or off; adjust parameters for a search technology.
    Observations: queries, clicks, document lists, snippets, terms, etc.
    Rewards: evaluation measures (such as DCG, NDCG or MAP); clicking information.
    Transition matrix: given in advance or estimated from training data.
    Observation function: problem dependent, estimated from sample datasets.
  • Session Search Example, States [J. Luo et al. '14]: S_RT (Relevant & Exploitation), S_RR (Relevant & Exploration), S_NRT (Non-Relevant & Exploitation), S_NRR (Non-Relevant & Exploration). (The slide illustrates these with example query transitions, e.g. around "scooter price", "Hartford Connecticut tourism", "Philadelphia NYC travel" and "New York Boston maps.bing.com".)
  • Session Search Example, Actions (A_u, A_se) [J. Luo et al. '14]. User actions (A_u): add query terms (+Δq), remove query terms (−Δq), keep query terms (q_theme), clicked documents, SAT-clicked documents. Search engine actions (A_se): increase/decrease/keep term weights, switch query expansion on or off, adjust the number of top documents used in PRF, etc.
  • Multi Page Search Example, States & Actions [Xiaoran Jin et al. '13]. State: relevance of documents. Action: ranking of documents. Observation: clicks. Belief: multivariate Gaussian. Reward: DCG over 2 pages.
  • SIGIRTutorial July 7th 2014 Grace Hui Yang Marc Sloan JunWang Guest Speaker: EmineYilmaz Dynamic Information Retrieval Modeling Exercise
  • Dynamic Information Retrieval ModelingTutorial 2014110 Markov Chain Hidden Markov Model Markov Decision Process Partially Observable Markov Decision Process Multi-Armed Bandit Family of Markov Models
  • Multi Armed Bandits (MAB) Dynamic Information Retrieval ModelingTutorial 2014111 Which slot machine should I select in this round? Reward
  • Multi Armed Bandits (MAB) Dynamic Information Retrieval ModelingTutorial 2014112 I won! Is this the best slot machine? Reward
  • MAB Definition Dynamic Information Retrieval ModelingTutorial 2014113 A tuple (S,A, R, B) S : hidden reward distribution of each bandit A: choose which bandit to play R: reward for playing bandit B: belief space, our estimate of each bandits distribution
  • Comparison with Markov Models: an MAB is a single-state Markov Decision Process with no transition probability. It is similar to a POMDP in that we maintain a belief state. An action (choosing a bandit) does not affect the state. It does not plan ahead but intelligently adapts, placing it somewhere between interactive and dynamic IR.
  • Markov Multi Armed Bandits Dynamic Information Retrieval ModelingTutorial 2014115 Markov Process 1 Markov Process 2 Markov Process k Which slot machine should I select in this round? Reward
  • Markov Multi Armed Bandits Dynamic Information Retrieval ModelingTutorial 2014116 Markov Process 1 Markov Process 2 Markov Process k Markov Process Action Which slot machine should I select in this round? Reward
  • MAB Policy & Reward: an MAB algorithm describes a policy for choosing bandits that maximises the reward from the chosen bandits over all time steps, or equivalently minimizes the regret R_T = Σ_{t=1}^{T} ( r*(t) − r(t) ), the cumulative difference between the optimal reward and the actual reward.
  • Exploration vs Exploitation Dynamic Information Retrieval ModelingTutorial 2014118 Exploration Try out bandits to find which has highest average reward Exploitation Too much exploration leads to poor performance Play bandits that are known to pay out higher reward on average MAB algorithms balance exploration and exploitation Start by exploring more to find best bandits Exploit more as best bandits become known
  • Exploration vs Exploitation Dynamic Information Retrieval ModelingTutorial 2014119
  • MAB Index Algorithms. Gittins index [Gittins '89]: play the bandit with the highest Dynamic Allocation Index; modelled using an MDP but suffers the curse of dimensionality. ε-greedy [Cesa-Bianchi et al. '98]: play the highest-reward bandit with probability 1 − ε and a random bandit with probability ε. UCB (Upper Confidence Bound) [Auer et al. '02]: play the bandit j with the highest mean reward plus sqrt(2 ln n / n_j); the chance of playing infrequently played bandits increases over time.
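A sketch of the UCB1 index policy described above (ε-greedy would differ only in the arm-selection line); `pull` and the Bernoulli arm probabilities are hypothetical:

```python
import math
import random

def ucb1(pull, n_arms, horizon=1000):
    """UCB1: play the arm with the highest mean reward plus the exploration
    bonus sqrt(2 ln n / n_j). pull(j) is a reward sampler for arm j."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:                       # play each arm once first
            arm = t - 1
        else:
            arm = max(range(n_arms),
                      key=lambda j: sums[j] / counts[j]
                      + math.sqrt(2 * math.log(t) / counts[j]))
        counts[arm] += 1
        sums[arm] += pull(arm)
    return [s / c for s, c in zip(sums, counts)]   # empirical mean per arm

# toy Bernoulli arms with unknown payout probabilities
probs = [0.2, 0.5, 0.7]
print(ucb1(lambda j: 1.0 if random.random() < probs[j] else 0.0, n_arms=3))
```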
  • MAB use in IR Dynamic Information Retrieval ModelingTutorial 2014121 Choosing ads to display to users1 Each ad is a bandit User click through rate is reward Recommending news articles2 News article is a bandit Similar to Information Filtering case Diversifying search results3 Each rank position is an MAB dependent on higher ranks Documents are bandits chosen by each rank 1Deepayan Chakrabarti et. al. ,09 2Lihong Li et. al., 10 3Radlinski et. al.,08
  • MAB Variations. Contextual bandits [Li et al. '10]: the world has some context (e.g. user location); learn a policy π that maps context to arms (online or offline). Duelling bandits [Yue et al. '09]: play two (or more) bandits at each time step, observe relative rather than absolute reward, and learn an ordering of the bandits. Mortal bandits [Chakrabarti et al. '09]: the value of bandits decays over time, so exploitation outweighs exploration.
  • Comparison of Markov Models. MC: a fully observable stochastic process; HMM: a partially observable stochastic process; MDP: a fully observable decision process; MAB: a decision process, either fully or partially observable; POMDP: a partially observable decision process.
    Model    Actions   Rewards   States
    MC       No        No        Observable
    HMM      No        No        Unobservable
    MDP      Yes       Yes       Observable
    POMDP    Yes       Yes       Unobservable
    MAB      Yes       Yes       Fixed
  • SIGIRTutorial July 7th 2014 Grace Hui Yang Marc Sloan JunWang Guest Speaker: EmineYilmaz Dynamic Information Retrieval Modeling Exercise
  • Outline Dynamic Information Retrieval ModelingTutorial 2014125 Introduction Theory and Models Session Search Reranking GuestTalk: Evaluation
  • TREC Session Tracks (2010-2012): given a series of queries {q1, q2, ..., qn}, the top-10 retrieval results {D1, ..., Di−1} for q1 to qi−1, and click information, the task is to retrieve a list of documents for the current/last query, qn. Relevance judgments are made based on how relevant the documents are for qn and for the information need of the entire session (given in the topic description); there is no need to segment the sessions.
  • 1.pocono mountains pennsylvania 2.pocono mountains pennsylvania hotels 3.pocono mountains pennsylvania things to do 4.pocono mountains pennsylvania hotels 5.pocono mountains camelbeach 6.pocono mountains camelbeach hotel 7.pocono mountains chateau resort 8.pocono mountains chateau resort attractions 9.pocono mountains chateau resort getting to 10.chateau resort getting to 11.pocono mountains chateau resort directions TREC 2012 Session 6 127 Information needs: You are planning a winter vacation to the Pocono Mountains region in Pennsylvania in the US.Where will you stay?What will you do while there? How will you get there? In a session, queries change constantly
  • Query change is an important form of feedback. We define query change as the syntactic editing change between two adjacent queries: Δq_i = q_i − q_{i−1}, which includes +Δq_i, the added terms, and −Δq_i, the removed terms; the unchanged/shared terms are called q_theme, the theme terms. Example: q1 = "bollywood legislation", q2 = "bollywood law"; theme term = bollywood, added (+Δq) = law, removed (−Δq) = legislation.
  • Where do these query changes come from? Given the TREC Session settings, we consider two sources of query change: the previous search results that a user viewed/read/examined, and the information need. Example: "Kurosawa" → "Kurosawa wife"; "wife" is not in any previous results, but is in the topic description. However, knowing the information need before the search is difficult to achieve.
  • Previous search results can influence query change in quite complex ways. Example: "Merck lobbyists" → "Merck lobbying US policy". D1 contains several mentions of "policy", such as "A lobbyist who until 2004 worked as senior policy advisor to Canadian Prime Minister Stephen Harper was hired last month by Merck". These mentions are about Canadian policies, while the user adds "US policy" in q2. Our guess is that the user might be inspired by "policy" but prefers a sub-concept different from "Canadian policy". Therefore, among the added terms "US policy", "US" is the novel term and "policy" is not, since it appeared in D1; the two terms should be treated differently.
  • Applying MDP to Session Search: we propose to model session search as a Markov decision process (MDP) with two agents, the User and the Search Engine. Environment: search results. States: queries. Actions: user actions (add/remove/keep query terms) and search engine actions (increase/decrease/keep term weights).
  • Search Engine Agent's Actions:
    Term type   In D_{i−1}?   Action            Example
    q_theme     Y             increase weight   "pocono mountain" in s6
    q_theme     N             increase weight   "france world cup 98 reaction" in s28, between "france world cup 98 reaction stock market" and "france world cup 98 reaction"
    +Δq         Y             decrease weight   "policy" in s37, "Merck lobbyists" → "Merck lobbyists US policy"
    +Δq         N             increase weight   "US" in s37, "Merck lobbyists" → "Merck lobbyists US policy"
    −Δq         Y             decrease weight   "reaction" in s28, "france world cup 98 reaction" → "france world cup 98"
    −Δq         N             no change         "legislation" in s32, "bollywood legislation" → "bollywood law"
  • Query Change retrieval Model (QCM). The Bellman equation gives the optimal value for an MDP: V*(s) = max_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V*(s') ]. The reward function is used as the document relevance score and is derived backwards from the Bellman equation: Score(q_i, d) = P(q_i|d) + γ Σ_a max_{D_{i−1}} P(q_i | q_{i−1}, D_{i−1}, a) P(q_{i−1}|D_{i−1}), where P(q_i|d) is the current reward/relevance score, P(q_i | q_{i−1}, D_{i−1}, a) the query transition model, and max P(q_{i−1}|D_{i−1}) the maximum past relevance.
  • Calculating the Transition Model: expanding the score according to query change and the search engine actions,
    Score(q_i, d) = log P(q_i|d)
      + α Σ_{t ∈ q_theme} [1 − P(t|D_{i−1})] log P(t|d)        (increase weights for theme terms)
      − δ Σ_{t ∈ −Δq} P(t|D_{i−1}) log P(t|d)                   (decrease weights for removed terms)
      + ε Σ_{t ∈ +Δq, t ∉ D_{i−1}} idf(t) log P(t|d)             (increase weights for novel added terms)
      − β Σ_{t ∈ +Δq, t ∈ D_{i−1}} P(t|D_{i−1}) log P(t|d)       (decrease weights for old added terms)
  • Maximizing the Reward Function: generate the maximum-rewarded document d*_{i−1} from D_{i−1}, i.e. the document(s) most relevant to q_{i−1}: P(q_{i−1}|d*_{i−1}) = max_{d ∈ D_{i−1}} P(q_{i−1}|d), where the relevance score can be calculated with P(t|d_{i−1}) = #(t, d_{i−1}) / |d_{i−1}|. From several options we choose to use only the document with top relevance, max_{D_{i−1}} P(q_{i−1}|D_{i−1}).
  • Scoring the Entire Session: the overall relevance score for a session of queries is aggregated recursively: Score_session(q_n, d) = Score(q_n, d) + γ Score_session(q_{n−1}, d) = Score(q_n, d) + γ[ Score(q_{n−1}, d) + γ Score_session(q_{n−2}, d) ] = Σ_{i=1}^{n} γ^{n−i} Score(q_i, d).
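A tiny sketch of the recursive session-score aggregation above; the per-query scores and the value γ = 0.92 are placeholders, not numbers from the QCM experiments:

```python
def session_score(per_query_scores, gamma=0.92):
    """Score_session(q_n, d) = sum_i gamma^(n-i) * Score(q_i, d).
    per_query_scores = [Score(q_1, d), ..., Score(q_n, d)]."""
    n = len(per_query_scores)
    return sum(gamma ** (n - i) * s
               for i, s in enumerate(per_query_scores, start=1))

# equivalent to the recursion Score_session(q_i, d) = Score(q_i, d) + gamma * Score_session(q_{i-1}, d)
print(session_score([1.2, 0.8, 2.1]))
```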
  • Experiments: TREC 2011-2012 query sets; dataset: ClueWeb09 Category B.
  • Search Accuracy (TREC 2012), nDCG@10 (official metric used in TREC):
    Approach                nDCG@10   %chg      MAP      %chg
    Lemur                   0.2474    -21.54%   0.1274   -18.28%
    TREC12 median           0.2608    -17.29%   0.1440   -7.63%
    Our TREC12 submission   0.3021    4.19%     0.1490   -4.43%
    TREC12 best             0.3221    0.00%     0.1559   0.00%
    QCM                     0.3353    4.10%     0.1529   -1.92%
    QCM+Dup                 0.3368    4.56%     0.1537   -1.41%
  • Search Accuracy (TREC 2011), nDCG@10 (official metric used in TREC):
    Approach                nDCG@10   %chg      MAP      %chg
    Lemur                   0.3378    -23.38%   0.1118   -25.86%
    TREC11 median           0.3544    -19.62%   0.1143   -24.20%
    TREC11 best             0.4409    0.00%     0.1508   0.00%
    QCM                     0.4728    7.24%     0.1713   13.59%
    QCM+Dup                 0.4821    9.34%     0.1714   13.66%
    Our TREC12 submission   0.4836    9.68%     0.1724   14.32%
  • Search Accuracy for Different Session Types. TREC 2012 sessions are classified by product (Factual / Intellectual) and goal quality (Specific / Amorphous):
    Approach    Intellectual   %chg      Amorphous   %chg     Specific   %chg     Factual   %chg
    TREC best   0.3369         0.00%     0.3495      0.00%    0.3007     0.00%    0.3138    0.00%
    Nugget      0.3305         -1.90%    0.3397      -2.80%   0.2736     -9.01%   0.2871    -8.51%
    QCM         0.3870         14.87%    0.3689      5.55%    0.3091     2.79%    0.3066    -2.29%
    QCM+DUP     0.3900         15.76%    0.3692      5.64%    0.3114     3.56%    0.3072    -2.10%
    QCM better handles sessions that demonstrate evolution and exploration, because it treats a session as a continuous process, studying the changes across query transitions and modeling the dynamics.
  • Outline Dynamic Information Retrieval ModelingTutorial 2014141 Introduction Theory and Models Session Search Reranking GuestTalk: Evaluation
  • Multi Page Search Dynamic Information Retrieval ModelingTutorial 2014142
  • Multi Page Search Dynamic Information Retrieval ModelingTutorial 2014143 Page 1 Page 2 2. 1. 2. 1.
  • Relevance Feedback Dynamic Information Retrieval ModelingTutorial 2014144 No UI Changes Interactivity is Hidden Private, performed in browser
  • Relevance Feedback Dynamic Information Retrieval ModelingTutorial 2014145 Page 1 Diverse Ranking Maximise learning potential Exploration vs Exploitation Page 2 Clickthroughs or explicit ratings Respond to feedback from page 1 Personalized
  • Model Dynamic Information Retrieval ModelingTutorial 2014146
  • Model: θ_1 is the prior estimate of relevance and Σ_1 the prior estimate of covariance, with document similarity obtained via topic clustering.
  • Model Dynamic Information Retrieval ModelingTutorial 2014148 Rank action for page 1
  • Model Dynamic Information Retrieval ModelingTutorial 2014149
  • Model: feedback from page 1 is modelled as drawn from N(θ_1, Σ_1).
  • Model: update the estimates for the page-2 documents using the page-1 feedback c_1 (conditioning the multivariate Gaussian): θ_2|c_1 = θ_2 + Σ_21 Σ_11^{-1} (c_1 − θ_1) and Σ_2|c_1 = Σ_22 − Σ_21 Σ_11^{-1} Σ_12.
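Assuming the belief over document relevance is the multivariate Gaussian described earlier, the update amounts to the standard Gaussian conditioning step sketched below; the indices, priors and covariance reuse the jaguar toy numbers and are illustrative only:

```python
import numpy as np

def gaussian_posterior(theta, Sigma, obs_idx, c_obs):
    """Condition a multivariate Gaussian relevance estimate on observed feedback.
    obs_idx: indices of page-1 documents with observed feedback c_obs;
    returns the posterior mean and covariance for the remaining documents."""
    rest_idx = [i for i in range(len(theta)) if i not in obs_idx]
    S11 = Sigma[np.ix_(obs_idx, obs_idx)]
    S21 = Sigma[np.ix_(rest_idx, obs_idx)]
    S22 = Sigma[np.ix_(rest_idx, rest_idx)]
    K = S21 @ np.linalg.inv(S11)
    mean = theta[rest_idx] + K @ (c_obs - theta[obs_idx])
    cov = S22 - K @ S21.T
    return mean, cov

# toy numbers from the jaguar example: 4 documents, two shown on page 1
theta = np.array([0.5, 0.51, 0.9, 0.49])
Sigma = np.array([[1.0, 0.8, 0.1, 0.0],
                  [0.8, 1.0, 0.1, 0.0],
                  [0.1, 0.1, 1.0, 0.95],
                  [0.0, 0.0, 0.95, 1.0]])
mean, cov = gaussian_posterior(theta, Sigma, obs_idx=[0, 2], c_obs=np.array([1.0, 0.0]))
print(mean)
```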
  • Model Dynamic Information Retrieval ModelingTutorial 2014152 Rank using PRP
  • Model: utility of a ranking is a DCG-style utility over both pages, U = Σ_{j=1}^{M} θ_{1,j} / log2(j+1) + Σ_{j=M+1}^{2M} θ_{2,j} / log2(j+1), i.e. the expected DCG of the page-1 ranking plus that of the page-2 ranking.
  • Model, Bellman Equation: optimize the page-1 ranking a_1 to improve page 2: V(θ_1, Σ_1, λ) = max_{a_1} [ λ θ_1 · a_1 + E_{c_1} max_{a_2} (1 − λ) θ_2 · a_2 ].
  • λ balances exploration and exploitation on page 1 and is tuned for different queries (navigational vs informational); λ = 1 for non-ambiguous search.
  • Approximation: Monte Carlo sampling of the page-1 feedback c_1, max_{a_1} [ λ θ_1 · a_1 + (1/N) Σ_{samples} max_{a_2} (1 − λ) θ_2 · a_2 ], combined with a sequential ranking decision.
  • Experiment Data: difficult to evaluate without access to live users, so the setup is simulated using 3 TREC collections and their relevance judgements: WT10G (explicit ratings), TREC8 (clickthroughs), Robust (difficult, ambiguous search).
  • User Simulation Dynamic Information Retrieval ModelingTutorial 2014158 Rank M documents Simulated user clicks according to relevance judgements Update page 2 ranking Measure at page 1 and 2 Recall Precision nDCG MRR BM25 prior ranking model
  • Investigating Dynamic Information Retrieval ModelingTutorial 2014159
  • Baselines (λ determined experimentally): BM25; BM25 with conditional update (λ = 1); Maximum Marginal Relevance (MMR) diversification; MMR with conditional update; Rocchio relevance feedback.
  • Results Dynamic Information Retrieval ModelingTutorial 2014161
  • Results Dynamic Information Retrieval ModelingTutorial 2014162
  • Results Dynamic Information Retrieval ModelingTutorial 2014163
  • Results Dynamic Information Retrieval ModelingTutorial 2014164
  • Results: similar results across data sets and metrics; the 2nd-page gain outweighs the 1st-page losses; outperformed Maximum Marginal Relevance when using MRR to measure diversity; BM25-U is simply the no-exploration case; similar results when M = 5.
  • Results Dynamic Information Retrieval ModelingTutorial 2014166
  • Outline Dynamic Information Retrieval ModelingTutorial 2014167 Introduction Theory and Models Session Search Reranking GuestTalk: Evaluation
  • Dynamic Information Retrieval Evaluation. Emine Yilmaz, University College London, [email protected]
  • Information Retrieval Systems Match information seekers with the information they seek
  • Retrieval Evaluation: Traditional View
  • Retrieval Evaluation: Dynamic View
  • Retrieval Evaluation: Dynamic View
  • Retrieval Evaluation: Dynamic View
  • Different Approaches to Evaluation Online Evaluation Design interactive experiments Use users actions to evaluate the quality Inherently dynamic in nature Offline Evaluation Controlled laboratory experiments The users interaction with the engine is only simulated Recent work focused on dynamic IR evaluation
  • Online Evaluation: standard click metrics include clickthrough rate and the probability that the user skips over results they have considered (pSkip). Most recently: result interleaving, click/no-click evaluation.
  • What is result interleaving? A way to compare rankers online: given the two rankings produced by two methods, present a combination of the rankings to users. Team Draft Interleaving (Radlinski et al., 2008): Input: two rankings (which can be seen as teams who pick players). Repeat: toss a coin to see which team (ranking) picks next; the winner picks its best remaining player (document); the loser picks its best remaining player (document). Output: one ranking (2 teams of 5). Credit assignment: the ranking providing more of the clicked results wins.
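A simplified sketch of team-draft interleaving and its credit assignment; real implementations also truncate to a display length and handle further tie-breaking, which this sketch omits:

```python
import random

def team_draft_interleave(ranking_a, ranking_b):
    """Teams A and B alternately pick their best remaining document;
    a coin toss decides which team picks first in each round."""
    interleaved, team = [], {}
    a, b = list(ranking_a), list(ranking_b)
    while a or b:
        order = ['A', 'B'] if random.random() < 0.5 else ['B', 'A']
        for side in order:
            pool = a if side == 'A' else b
            while pool and pool[0] in team:
                pool.pop(0)                  # skip documents already picked
            if pool:
                doc = pool.pop(0)
                team[doc] = side
                interleaved.append(doc)
    return interleaved, team

def credit(team, clicked):
    """The ranker contributing more of the clicked documents wins."""
    a = sum(1 for d in clicked if team.get(d) == 'A')
    b = sum(1 for d in clicked if team.get(d) == 'B')
    return 'A' if a > b else 'B' if b > a else 'tie'

inter, team = team_draft_interleave(['d1', 'd2', 'd3'], ['d3', 'd4', 'd1'])
print(inter, credit(team, clicked=['d4']))
```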
  • Team Draft Interleaving example:
    Ranking A: 1. Napa Valley The authority for lodging... (www.napavalley.com); 2. Napa Valley Wineries - Plan your wine... (www.napavalley.com/wineries); 3. Napa Valley College (www.napavalley.edu/homex.asp); 4. Been There | Tips | Napa Valley (www.ivebeenthere.co.uk/tips/16681); 5. Napa Valley Wineries and Wine (www.napavintners.com); 6. Napa Country, California Wikipedia (en.wikipedia.org/wiki/Napa_Valley)
    Ranking B: 1. Napa Country, California Wikipedia (en.wikipedia.org/wiki/Napa_Valley); 2. Napa Valley The authority for lodging... (www.napavalley.com); 3. Napa: The Story of an American Eden... (books.google.co.uk/books?isbn=...); 4. Napa Valley Hotels Bed and Breakfast... (www.napalinks.com); 5. NapaValley.org (www.napavalley.org); 6. The Napa Valley Marathon (www.napavalleymarathon.org)
    Presented Ranking: 1. Napa Valley The authority for lodging...; 2. Napa Country, California Wikipedia; 3. Napa: The Story of an American Eden...; 4. Napa Valley Wineries Plan your wine...; 5. Napa Valley Hotels Bed and Breakfast...; 6. Napa Valley College; 7. NapaValley.org
    In this example B wins! Repeat over many different queries.
  • Offline Evaluation Controlled laboratory experiments The users interaction with the engine is only simulated Ask experts to judge each query result Predict how users behave when they search Aggregate judgments to evaluate 180
  • Offline Evaluation. Until recently, metrics assumed that the user's information need was not affected by the documents read (e.g. Average Precision, NDCG), yet users are more likely to stop searching when they see a highly relevant document. Lately: metrics that incorporate the effect of the relevance of documents seen by the user on user behaviour, based on devising more realistic user models, e.g. EBU and ERR [Yilmaz et al. CIKM '10, Chapelle et al. CIKM '09].
  • Modeling User Behavior Cascade-based models black powder ammunition 1 2 3 4 5 6 7 8 9 10 The user views search results from top to bottom At each rank i, the user has a certain probability of being satisfied. Probability of satisfaction proportional to the relevance grade of the document at rank i. Once the user is satisfied with a document, he terminates the search.
  • Rank Biased Precision, user model: at each rank the user either stops or views the next item (illustrated on the "black powder ammunition" example over ranks 1-10).
  • Rank Biased Precision: total utility = Σ_{i=1}^{∞} rel_i · p^{i−1}; expected number of documents examined = Σ_{i=1}^{∞} i · p^{i−1} (1 − p) = 1/(1 − p); RBP = total utility / number of documents examined = (1 − p) Σ_{i=1}^{∞} rel_i · p^{i−1}.
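A one-line computation of RBP from the formula above; p = 0.8 and the relevance vector are example values:

```python
def rbp(rels, p=0.8):
    """Rank-biased precision: RBP = (1 - p) * sum_i rel_i * p^(i-1),
    where rel_i is the (binary or graded-in-[0,1]) relevance at rank i."""
    return (1.0 - p) * sum(rel * p ** i for i, rel in enumerate(rels))

print(rbp([1, 0, 1, 1, 0]))   # relevance of the top 5 results
```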
  • Expected Reciprocal Rank [Chapelle et al. CIKM '09], user model: at each rank the user asks "is this relevant?" (no / somewhat / highly); if satisfied the user stops, otherwise views the next item (same "black powder ammunition" example over ranks 1-10).
  • Expected Reciprocal Rank [Chapelle et al. CIKM '09]: ERR = Σ_{r=1}^{n} φ(r) · P(user stops at position r), where φ(r) = 1/r is the utility of finding the "perfect" document at rank r. With R_i = probability of relevance of the i-th document = (2^{g_i} − 1) / 2^{g_max}, where g_i is the relevance grade of the i-th document, the probability of stopping at rank r is Π_{i=1}^{r−1} (1 − R_i) · R_r, so ERR = Σ_{r=1}^{n} (1/r) Π_{i=1}^{r−1} (1 − R_i) R_r.
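A sketch of ERR computed exactly as in the cascade formulation above; g_max = 2 and the grade list are example values:

```python
def err(grades, g_max=2):
    """Expected reciprocal rank. R_i = (2^g_i - 1) / 2^g_max is the probability
    that the user is satisfied at rank i; the user stops at the first
    satisfying document and contributes utility 1/rank."""
    p_continue, total = 1.0, 0.0
    for r, g in enumerate(grades, start=1):
        R = (2 ** g - 1) / (2 ** g_max)
        total += p_continue * R / r
        p_continue *= (1.0 - R)
    return total

print(err([2, 0, 1, 1]))   # graded relevance of the top 4 results
```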
  • Session Evaluation: example session with reformulated queries "Paris Luxurious Hotels", "Paris Hilton", "J Lo".
  • What is a good system?
  • Measuring goodness The user steps down a ranked list of documents and observes each one of them until a decision point and either a) abandons the search, or b) reformulates While stepping down or sideways, the user accumulates utility
  • Evaluation over a single ranked list: example session queries "kenya cooking traditional swahili", "kenya cooking traditional", "kenya swahili traditional food recipes" (results shown over ranks 1-10).
  • Session DCG [Järvelin et al. ECIR 2008]: the DCG of the list returned for the j-th query is DCG(RL_j) = Σ_{r=1}^{k} (2^{rel(r)} − 1) / log_b(r + b − 1); the session DCG discounts each list by its reformulation position, sDCG = Σ_j [1 / log_c(j + c − 1)] · DCG(RL_j), e.g. (1/log_c(1 + c − 1)) DCG(RL_1) + (1/log_c(2 + c − 1)) DCG(RL_2) for the two "kenya cooking" queries above.
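A sketch of session DCG following the formulation above; the log bases b = 2 and c = 4 and the toy relevance grades are assumptions:

```python
import math

def dcg(rels, b=2):
    """DCG with log-base-b discount: sum_r (2^rel(r) - 1) / log_b(r + b - 1)."""
    return sum((2 ** rel - 1) / math.log(r + b - 1, b)
               for r, rel in enumerate(rels, start=1))

def session_dcg(ranked_lists, b=2, c=4):
    """Session DCG: discount the DCG of the j-th query's list by 1 / log_c(j + c - 1)."""
    return sum(dcg(rels, b) / math.log(j + c - 1, c)
               for j, rels in enumerate(ranked_lists, start=1))

# two reformulations in one session, graded relevance of the top results per query
print(session_dcg([[2, 1, 0], [3, 0, 1]]))
```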
  • Model-based measures [Yang and Lad ICTIR 2009, Kanoulas et al. SIGIR 2011]: a probabilistic space of users following different paths. Ω is the space of all paths, P(ω) is the probability of a user following path ω ∈ Ω, and M(ω) is a measure over a path.
  • Probability of a path: e.g. the probability of abandoning at reformulation 2 times the probability of reformulating at rank 3 (illustrated over a session with queries Q1, Q2, Q3 and relevant/non-relevant results).
  • Expected Global Utility [Yang and Lad ICTIR 2009] 1. User steps down ranked results one-by-one 2. Stops browsing documents based on a stochastic process that defines a stopping probability distribution over ranks and reformulates 3. Gains something from relevant documents, accumulating utility
  • The probability of abandoning the session at reformulation i is modelled as geometric with parameter p_reform (illustrated over a session with queries Q1, Q2, Q3).
  • The probability of reformulating at rank j is modelled as geometric with parameter p_down (same session with queries Q1, Q2, Q3).
  • Expected Global Utility [Yang and Lad ICTIR 2009]: the probability of a user following a path ω is P(ω) = P(r1, r2, ..., rK), where ri is the stopping and reformulation point in list i. Assumption: stopping positions in each list are independent, so P(r1, r2, ..., rK) = P(r1) P(r2) ... P(rK). A geometric distribution (as in RBP) models the stopping and reformulation behaviour: P(ri = r) = (1 − p) p^{r−1}.
  • Conclusions: recent focus on evaluating the dynamic nature of the search process: interleaving; new offline evaluation metrics (ERR, EBU); session evaluation metrics.
  • Outline Dynamic Information Retrieval ModelingTutorial 2014200 Introduction Theory and Models Session Search Reranking GuestTalk: Evaluation Conclusion
  • Conclusions Dynamic Information Retrieval ModelingTutorial 2014201 Dynamic IR describes a new class of interactive model Incorporates rich feedback, temporal dependency and is goal oriented. Family of Markov models and Multi Armed Bandit theory useful in building DIR models Applicable to a range of IR problems Useful in applications such as session search and evaluation
  • Dynamic IR Book Dynamic Information Retrieval ModelingTutorial 2014202 Published by Morgan & Claypool Synthesis Lectures on Information Concepts, Retrieval, and Services Due March/April 2015 (in time for SIGIR 2015)
  • Acknowledgment: We thank Dr. Emine Yilmaz for giving the guest talk. We sincerely thank Dr. Xuchu Dong for his help in the preparation of this tutorial. We also thank the following colleagues for their comments and suggestions: Dr. Jamie Callan, Dr. Ophir Frieder, Dr. Fernando Diaz, Dr. Filip Radlinski.
  • Dynamic Information Retrieval ModelingTutorial 2014204
  • Thank You Dynamic Information Retrieval ModelingTutorial 2014205
  • References, Static IR: Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro-Neto. Addison-Wesley, 1999. The PageRank Citation Ranking: Bringing Order to the Web. Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd. 1999. Implicit User Modeling for Personalized Search. Xuehua Shen et al., CIKM, 2005. A Short Introduction to Learning to Rank. Hang Li. IEICE Transactions 94-D(10): 1854-1862, 2011.
  • References, Interactive IR: Relevance Feedback in Information Retrieval. Rocchio, J. J. The SMART Retrieval System (pp. 313-323), 1971. A study in interface support mechanisms for interactive information retrieval. Ryen W. White et al., JASIST, 2006. Visualizing stages during an exploratory search session. Bill Kules et al., HCIR, 2011. Dynamic Ranked Retrieval. Cristina Brandt et al., WSDM, 2011. Structured Learning of Two-level Dynamic Rankings. Karthik Raman et al., CIKM, 2011.
  • References, Dynamic IR: A hidden Markov model information retrieval system. D. R. H. Miller, T. Leek, and R. M. Schwartz. In SIGIR '99, pages 214-221. Threshold setting and performance optimization in adaptive filtering. Stephen Robertson. JIR, 2002. A large-scale study of the evolution of web pages. Dennis Fetterly et al., WWW 2003. Learning diverse rankings with multi-armed bandits. Filip Radlinski, Robert Kleinberg, Thorsten Joachims. ICML, 2008. Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem. Yisong Yue et al., ICML 2009. Meme-tracking and the dynamics of the news cycle. Jure Leskovec et al., KDD 2009.
  • References, Dynamic IR: Mortal multi-armed bandits. Deepayan Chakrabarti, Ravi Kumar, Filip Radlinski, Eli Upfal. NIPS 2009. A Novel Click Model and Its Applications to Online Advertising. Zeyuan Allen Zhu et al., WSDM 2010. A contextual-bandit approach to personalized news article recommendation. Lihong Li, Wei Chu, John Langford, Robert E. Schapire. WWW, 2010. Inferring search behaviors using partially observable Markov model with duration (POMD). Yin He et al., WSDM, 2011. No Clicks, No Problem: Using Cursor Movements to Understand and Improve Search. Jeff Huang et al., CHI 2011. Balancing Exploration and Exploitation in Learning to Rank Online. Katja Hofmann et al., ECIR, 2011. Large-Scale Validation and Analysis of Interleaved Search Evaluation. Olivier Chapelle et al., TOIS 2012.
  • References, Dynamic IR: Using Control Theory for Stable and Efficient Recommender Systems. T. Jambor, J. Wang, N. Lathia. In WWW '12, pages 11-20. Sequential selection of correlated ads by POMDPs. Shuai Yuan et al., CIKM 2012. Utilizing query change for session search. D. Guan, S. Zhang, and H. Yang. In SIGIR '13, pages 453-462. Query Change as Relevance Feedback in Session Search (short paper). S. Zhang, D. Guan, and H. Yang. In SIGIR 2013. Interactive exploratory search for multi page search results. X. Jin, M. Sloan, and J. Wang. In WWW '13. Interactive Collaborative Filtering. X. Zhao, W. Zhang, J. Wang. In CIKM 2013, pages 1411-1420. Win-win search: Dual-agent stochastic game in session search. J. Luo, S. Zhang, and H. Yang. In SIGIR '14.
  • References, Markov Processes: A Markovian decision process. R. Bellman. Indiana University Mathematics Journal, 6:679-684, 1957. Dynamic Programming. R. Bellman. Princeton University Press, Princeton, NJ, USA, first edition, 1957. Dynamic Programming and Markov Processes. R. A. Howard. MIT Press, 1960. Linear Programming and Sequential Decisions. Alan S. Manne. Management Science, 1960. Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Baum, Leonard E.; Petrie, Ted. The Annals of Mathematical Statistics 37, 1966.
  • References, Markov Processes: Learning to predict by the methods of temporal differences. Richard Sutton. Machine Learning 3, 1988. Computationally feasible bounds for partially observed Markov decision processes. W. Lovejoy. Operations Research 39:162-175, 1991. Q-Learning. Christopher J. C. H. Watkins, Peter Dayan. Machine Learning, 1992. Reinforcement learning with replacing eligibility traces. Singh, S. P. & Sutton, R. S. Machine Learning, 22, pages 123-158, 1996. Reinforcement Learning: An Introduction. Richard S. Sutton and Andrew G. Barto. MIT Press, 1998. Planning and acting in partially observable stochastic domains. L. Kaelbling, M. Littman, and A. Cassandra. Artificial Intelligence, 101(1-2):99-134, 1998.
  • References, Markov Processes: Finding approximate POMDP solutions through belief compression. N. Roy. PhD Thesis, Carnegie Mellon, 2003. VDCBPI: an approximate scalable algorithm for large scale POMDPs. P. Poupart and C. Boutilier. In NIPS 2004, pages 1081-1088. Finding Approximate POMDP Solutions Through Belief Compression. N. Roy, G. Gordon and S. Thrun. Journal of Artificial Intelligence Research, 23:1-40, 2005. Probabilistic Robotics. S. Thrun, W. Burgard, D. Fox. MIT Press, 2005. Anytime Point-Based Approximations for Large POMDPs. J. Pineau, G. Gordon and S. Thrun. Volume 27, pages 335-380, 2006. Probabilistic Robotics. S. Thrun, W. Burgard, D. Fox. The MIT Press, 2006.
  • References, Markov Processes: The optimal control of partially observable Markov decision processes over a finite horizon. R. D. Smallwood, E. J. Sondik. Operations Research, 1973. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems. M. L. Puterman and M. C. Shin. Management Science 24, 1978. An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains. A. A. Markov. Science in Context, 19:591-600, 2006. Learning to Rank for Information Retrieval. Tie-Yan Liu. Springer Science & Business Media, 2011. Finite-Time Regret Bounds for the Multiarmed Bandit Problem. Nicolò Cesa-Bianchi, Paul Fischer. ICML, 100-108, 1998. Multi-armed bandit allocation indices. J. C. Gittins. Wiley, 1989. Finite-time Analysis of the Multiarmed Bandit Problem. Peter Auer et al. Machine Learning 47, Issue 2-3, 2002.