Personalizing Atypical Web Search Sessions (WSDM'13)

Post on 17-Jun-2015

State-of-the-art web search personalization treats users as static or slowly evolving entities with a given set of preferences defined by their past behavior. However, recent publications as well as empirical evidence suggest that in a significant number of search sessions users diverge from their regular search profiles in order to satisfy atypical, non-recurring information needs. In this work, we conduct a large-scale inspection of real-life search sessions to further the understanding of this problem. Subsequently, we design an automatic means of detecting and supporting such atypical sessions. By accounting for the typicality of search sessions, we demonstrate significant improvements over state-of-the-art web search personalization techniques. We evaluate the proposed method on web-scale search session data spanning several months of user activity. This work, carried out together with Kevyn Collins-Thompson, Paul Bennett and Susan Dumais, has been accepted for full oral presentation at the ACM International Conference on Web Search and Data Mining (WSDM) in Rome, Italy. The full version of this paper is available at: http://dl.acm.org/citation.cfm?id=2433434

Transcript of Personalizing Atypical Web Search Sessions (WSDM'13)

February 7, 2013

Delft University of Technology

Personalizing Atypical Web Search Sessions

Carsten Eickhoff, Kevyn Collins-Thompson, Paul Bennett and Susan Dumais

Introduction

• Web search personalization is used to account for user preferences based on historic observations

• This seems appropriate for typical search tasks

• Atypical search tasks in unfamiliar domains may not benefit as much from personalization

Overview

1. Investigation of nature, frequency and cause of atypical search sessions

2. Automatic prediction of atypical search sessions

3. Personalization of atypical search sessions

Atypical Search Tasks

• Motivation: atypical search tasks are often caused by external needs

• Topic domain: they cover unfamiliar, previously unseen topics & genres

• Behavior: due to limited domain knowledge, the searcher may encounter problems during query formulation and result selection

Atypical Search Sessions

• Sessions tend to be task-centered

• 200 most active Bing users in January 2012

• Navigational queries removed

• Based on a profile, each session is manually judged as typical or atypical

Human-readable User Profiles

• 55% Football (“nfl”, “philadelphia eagles”, “mark sanchez”)

• 14% Boxing (“espn boxing”, “mickey garcia”, “hbo boxing”)

• 9% Television (“modern familiy”, “dexter 8”, “tv guide”)

• 6% Travel (“rome hotels”, “tripadvisor seattle”, “rome pasta”)

• 5% Hockey (“elmira pioneers”, “umass lax”, “necbl”)
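A profile of this shape can be derived from topic-labeled queries. The following is a minimal sketch, assuming each past query already carries a topic label; the function names, the toy history and the choice of three example queries per topic are illustrative assumptions, not the method used in the talk.

```python
from collections import Counter, defaultdict


def build_profile(labeled_queries, top_queries=3):
    """Summarize (topic, query) pairs as topic shares with example queries."""
    topic_counts = Counter(topic for topic, _ in labeled_queries)
    queries_by_topic = defaultdict(Counter)
    for topic, query in labeled_queries:
        queries_by_topic[topic][query] += 1
    total = sum(topic_counts.values())
    profile = []
    for topic, count in topic_counts.most_common():
        # Most frequent queries serve as human-readable examples for the topic.
        examples = [q for q, _ in queries_by_topic[topic].most_common(top_queries)]
        profile.append((topic, count / total, examples))
    return profile


def format_profile(profile):
    """Render the profile in a bullet style similar to the slide."""
    lines = []
    for topic, share, examples in profile:
        quoted = ", ".join(f'"{q}"' for q in examples)
        lines.append(f"• {share:.0%} {topic} ({quoted})")
    return "\n".join(lines)


# Hypothetical toy history; real profiles were built from a month of activity.
history = [
    ("Football", "nfl"),
    ("Football", "nfl"),
    ("Football", "philadelphia eagles"),
    ("Boxing", "espn boxing"),
]
print(format_profile(build_profile(history)))
```

Sorting topics by share keeps the dominant interests at the top, which is what makes such a profile quick for a human judge to scan.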

Example 1:

• Boxing(“soto vs ortiz”)

• Boxing(“humberto soto”)

Example 2:

• Dentistry(“oral sores”)

• Dentistry(“aphthous sore”)

• Healthcare(“aphthous ulcer treatment”)

Frequency of Atypical Sessions

• 166 out of 2790 sessions (~6%) were judged atypical

• 74% of all users show atypical search sessions

• On average, 7.5% of a user's query volume is atypical
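The session-level share (about 6%) and the per-user query share (7.5%) are computed over different units, so they need not coincide. The sketch below reproduces the first figure from the counts on the slide and shows, on made-up per-user counts, how a pooled share and an average of per-user shares can differ; the helper names and the toy data are assumptions.

```python
def pooled_share(atypical, total):
    """Share of atypical items when all items are pooled together."""
    return atypical / total


def mean_per_user_share(per_user):
    """Average of per-user shares; per_user holds (atypical, total) pairs."""
    shares = [atypical / total for atypical, total in per_user if total > 0]
    return sum(shares) / len(shares)


# Counts reported on the slide: 166 of 2790 sessions were judged atypical.
print(f"{pooled_share(166, 2790):.1%}")  # 5.9%, i.e. roughly 6%

# Hypothetical per-user counts: a light user with one atypical query
# weighs as much as a heavy user with none when shares are averaged.
toy_users = [(1, 10), (0, 40)]
print(mean_per_user_share(toy_users))  # 0.05
print(pooled_share(1, 50))             # 0.02
```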

Typical vs. Atypical Sessions

• Atypical sessions have:

• longer queries (often natural language questions)

• more diverse query vocabulary

• higher SAT reading level

• Observed differences persist within profiles

Topic Spread

• Most frequent topics for typical sessions include sports, celebrities & gossip, and entertainment

• Often typicality describes the cut between what you choose to do (typical) and what you have to do (atypical)

Predicting Atypical Sessions

• User profiling based on activity in Jan. 2012

• Post hoc classification of sessions in Apr. 2012

• Session features (25): session length, query length, question words, POS ratios, longest query position, reading level, ...

• Profile features (27): δ session length, δ query length, …, query vocabulary divergence, topic divergence
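To make the two feature groups concrete, here is a small sketch of a few session features, their δ counterparts (session value minus the user's profile average), and one possible vocabulary divergence. The question-word list, the exact feature definitions and the use of Jensen-Shannon divergence are assumptions chosen for illustration; the slides do not specify how the features were computed.

```python
import math
from collections import Counter

# Assumed question-word list; the slides do not enumerate one.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which"}


def session_features(queries):
    """Compute a few surface features for one non-empty session."""
    if not queries:
        raise ValueError("session must contain at least one query")
    tokens = [q.lower().split() for q in queries]
    lengths = [len(t) for t in tokens]
    n_tokens = sum(lengths)
    n_question = sum(1 for t in tokens for w in t if w in QUESTION_WORDS)
    return {
        "session_length": len(queries),
        "mean_query_length": n_tokens / len(queries),
        "question_word_ratio": n_question / n_tokens if n_tokens else 0.0,
        # 0-based index of the longest query, scaled by session length.
        "longest_query_position": lengths.index(max(lengths)) / len(queries),
    }


def delta_features(session, profile_means):
    """Difference between session features and the user's profile averages."""
    return {f"delta_{k}": session[k] - profile_means[k] for k in profile_means}


def vocabulary_divergence(session_queries, profile_queries):
    """Jensen-Shannon divergence (base 2) between unigram distributions.

    Both inputs are assumed to contain at least one token.
    """
    s_counts = Counter(w for q in session_queries for w in q.lower().split())
    p_counts = Counter(w for q in profile_queries for w in q.lower().split())
    s_total, p_total = sum(s_counts.values()), sum(p_counts.values())
    divergence = 0.0
    for word in set(s_counts) | set(p_counts):
        s = s_counts[word] / s_total
        p = p_counts[word] / p_total
        m = 0.5 * (s + p)
        if s > 0:
            divergence += 0.5 * s * math.log2(s / m)
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
    return divergence


feats = session_features(["how to treat oral sores", "aphthous sore"])
print(feats)
print(delta_features(feats, {"mean_query_length": 2.0}))
```

With base-2 logarithms the divergence ranges from 0 (identical vocabularies) to 1 (no shared words), which makes it easy to combine with the other features.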

Classification Performance

• Logistic regression model

• 52-dimensional feature space

• Cross-validation performance: F1 0.84 (P 0.82 / R 0.86)

• The resulting performance is comparable to the agreement among human judges
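As a quick consistency check, F1 is the harmonic mean of precision and recall, and the reported values agree with each other. The sketch also shows how a logistic regression model turns a feature vector into a probability; the feature name, weight and bias in the usage line are hypothetical and not the trained model from the talk.

```python
import math


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def atypical_probability(features, weights, bias=0.0):
    """Logistic regression score: sigmoid of a weighted feature sum."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


print(round(f1_score(0.82, 0.86), 2))  # 0.84

# Hypothetical single-feature model, for illustration only.
print(atypical_probability({"delta_query_length": 1.5},
                           {"delta_query_length": 2.0}, bias=-1.0))
```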

Strongest Typicality Indicators

1. Query length divergence from profile

2. Absolute query length

3. Question word ratio

4. Verb ratio divergence from profile

5. Topic divergence from profile

Robustness to Sparsity

• It takes 15-20 sessions to reliably characterize users

• Most users reach this point in 14 days of search activity

Retrieval Performance

• Moving from qualitative setting to web scale

• 155k users

• 10.4M sessions

• LambdaMART learning scheme

• Re-ranking the top 10 returned results
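Re-ranking only the top 10 means the personalized model reorders the head of the original result list and leaves the rest untouched. A minimal sketch of that step follows; the `score` callable is a hypothetical stand-in for whatever a trained ranker such as LambdaMART would output, and training itself is not shown.

```python
def rerank_top_k(results, score, k=10):
    """Reorder the first k results by descending score; keep the tail as is.

    Python's sort is stable, so results with equal scores keep their
    original relative order.
    """
    head = sorted(results[:k], key=score, reverse=True)
    return head + list(results[k:])


# Hypothetical example: twelve results, the personalized scorer favors "j".
original = list("abcdefghijkl")
personal_score = {"j": 1.0}
reranked = rerank_top_k(original, lambda r: personal_score.get(r, 0.0))
print(reranked)
```

Restricting changes to the top 10 keeps the original engine's ordering as the fallback wherever the personalized scores do not separate the results.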

Profiling Scopes

• Session

• All previous activity in the same session

• Historic

• All previous activity before this session

• Aggregate

• All previous activity before current query
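The three scopes differ only in which past events feed the profile for the current query. The sketch below selects those events from a chronological log; the `Event` type, its fields and the assumption that sessions do not interleave in time are illustrative choices, not details taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: int    # assumed to increase monotonically over the log
    session_id: str
    query: str


def scope_events(log, current, scope):
    """Return the past events that a given profiling scope may use."""
    earlier = [e for e in log if e.timestamp < current.timestamp]
    if scope == "session":
        # All previous activity in the same session.
        return [e for e in earlier if e.session_id == current.session_id]
    if scope == "historic":
        # All activity before this session (sessions assumed not to interleave).
        return [e for e in earlier if e.session_id != current.session_id]
    if scope == "aggregate":
        # Everything before the current query: historic plus session.
        return earlier
    raise ValueError(f"unknown scope: {scope}")


log = [
    Event(1, "s1", "nfl"),
    Event(2, "s1", "philadelphia eagles"),
    Event(3, "s2", "oral sores"),
    Event(4, "s2", "aphthous sore"),
]
current = log[3]
for scope in ("session", "historic", "aggregate"):
    print(scope, [e.query for e in scope_events(log, current, scope)])
```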

Personalization Performance

Hybrid Approach

Hybrid Performance

Conclusion

• Investigation of atypical web search sessions: atypical sessions occur for most users and account for a significant proportion of queries

• Typicality prediction: automatic prediction is comparable to human accuracy

• Benefit for retrieval settings: including typicality information in the personalization process improved retrieval performance

Future Work

• Classification of ongoing sessions: currently looking at the last query, do it earlier for more impact; weakly supervised approach based on pre-classified complete sessions

• From atypical to typical: How do information needs change over time? Life cycle of an information need / a profile

• Information value at risk: How important is a piece of information for the user? How much effort will they invest?

Thank You!

c.eickhoff@tudelft.nl