Personalizing Atypical Web Search Sessions (WSDM'13)

February 7, 2013
Delft University of Technology
Carsten Eickhoff, Kevyn Collins-Thompson, Paul Bennett and Susan Dumais

description

State-of-the-art web search personalization treats users as static or slowly evolving entities with a given set of preferences defined by their past behavior. However, recent publications as well as empirical evidence suggest that there is a significant number of search sessions in which users diverge from their regular search profiles in order to satisfy atypical, non-recurring information needs. In this work, we conduct a large-scale inspection of real-life search sessions to further the understanding of this problem. Subsequently, we design an automatic means of detecting and supporting such atypical sessions. We demonstrate significant improvements over state-of-the-art web search personalization techniques by accounting for the typicality of search sessions. The merit of the proposed method is evaluated on web-scale search session data spanning several months of user activity. This work, carried out together with Kevyn Collins-Thompson, Paul Bennett and Susan Dumais, has been accepted for full oral presentation at the ACM International Conference on Web Search and Data Mining (WSDM) in Rome, Italy. The full version of the paper is available at: http://dl.acm.org/citation.cfm?id=2433434

Transcript of Personalizing Atypical Web Search Sessions (WSDM'13)

Page 1: Personalizing Atypical Web Search Sessions (WSDM'13)

February 7, 2013

Challenge the future

Delft University of Technology

Personalizing Atypical Web Search Sessions

Carsten Eickhoff, Kevyn Collins-Thompson, Paul Bennett and Susan Dumais

Page 2: Personalizing Atypical Web Search Sessions (WSDM'13)

Introduction

• Web search personalization is used to account for user preferences based on historic observations

• This seems appropriate for typical search tasks

• Atypical search tasks in unfamiliar domains may not benefit as much from personalization

Page 4: Personalizing Atypical Web Search Sessions (WSDM'13)

Overview

1. Investigation of nature, frequency and cause of atypical search sessions

2. Automatic prediction of atypical search sessions

3. Personalization of atypical search sessions

Page 5: Personalizing Atypical Web Search Sessions (WSDM'13)

Atypical Search Tasks

• Motivation

• Atypical search tasks are often caused by external needs

• Topic domain

• They cover unfamiliar, previously unseen topics & genres

• Behavior

• Due to limited domain knowledge, the searcher may encounter problems during query formulation and result selection

Page 6: Personalizing Atypical Web Search Sessions (WSDM'13)

Atypical Search Sessions

• Sessions tend to be task-centered

• 200 most active Bing users in January 2012

• Navigational queries removed

• Based on a profile, each session is manually judged as typical or atypical

Page 7: Personalizing Atypical Web Search Sessions (WSDM'13)

Human-readable User Profiles

• 55% Football (“nfl”, “philadelphia eagles”, “mark sanchez”)

• 14% Boxing (“espn boxing”, “mickey garcia”, “hbo boxing”)

• 9% Television (“modern family”, “dexter 8”, “tv guide”)

• 6% Travel (“rome hotels”, “tripadvisor seattle”, “rome pasta”)

• 5% Hockey (“elmira pioneers”, “umass lax”, “necbl”)

Page 8: Personalizing Atypical Web Search Sessions (WSDM'13)

Example 1 (same user profile as above), session queries:

• Boxing (“soto vs ortiz”)

• Boxing (“humberto soto”)

Page 9: Personalizing Atypical Web Search Sessions (WSDM'13)

Example 2 (same user profile as above), session queries:

• Dentistry (“oral sores”)

• Dentistry (“aphthous sore”)

• Healthcare (“aphthous ulcer treatment”)

Page 10: Personalizing Atypical Web Search Sessions (WSDM'13)

Frequency of Atypical Sessions

• 166 out of 2790 sessions (~6%) were judged atypical

• 74% of all users show atypical search sessions

• On average, 7.5% of a user's query volume is atypical

Page 15: Personalizing Atypical Web Search Sessions (WSDM'13)

Typical vs. Atypical Sessions

• Atypical sessions have:

• longer queries (often natural language questions)

• more diverse query vocabulary

• higher SAT reading level

• Observed differences persist within profiles

Page 16: Personalizing Atypical Web Search Sessions (WSDM'13)

Topic Spread

• The most frequent topics in typical sessions are sports, celebrities & gossip, and entertainment

• Typicality often marks the divide between what you choose to do (typical) and what you have to do (atypical)

Page 17: Personalizing Atypical Web Search Sessions (WSDM'13)

Predicting Atypical Sessions

• User profiling based on activity in Jan. 2012

• Post hoc classification of sessions in Apr. 2012

• Session features (25)

• Session length

• Query length

• Question words

• POS ratios

• Longest query position

• Reading level

• ...

• Profile features (27)

• δ session length

• δ query length

• …

• Query vocabulary divergence

• Topic divergence
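
A few of the session and profile features listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' feature extractor: the function names, the question-word list and the whitespace tokenization are all hypothetical, and only a handful of the listed features are shown.

```python
from statistics import mean

# Assumed question-word list; the slides do not specify one.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which"}


def session_features(queries):
    """Compute a few session-level features from a non-empty list of query strings."""
    tokens_per_query = [q.lower().split() for q in queries]
    lengths = [len(tokens) for tokens in tokens_per_query]
    all_tokens = [t for tokens in tokens_per_query for t in tokens]
    question_tokens = sum(1 for t in all_tokens if t in QUESTION_WORDS)
    return {
        "session_length": len(queries),                  # number of queries
        "query_length": mean(lengths),                   # mean words per query
        "question_word_ratio": question_tokens / len(all_tokens),
        "longest_query_position": lengths.index(max(lengths)),  # 0-based
    }


def delta_features(session, profile):
    """Difference between a session's features and the user's profile averages."""
    return {f"delta_{name}": session[name] - profile[name]
            for name in ("session_length", "query_length")}


# Queries taken from Example 2; the profile averages are made-up values.
feats = session_features(["oral sores", "aphthous sore", "aphthous ulcer treatment"])
profile = {"session_length": 3.0, "query_length": 2.0}
print(feats["session_length"], feats["longest_query_position"])  # 3 2
print(delta_features(feats, profile))
```

The delta features only make sense once a profile has accumulated enough sessions to yield stable averages.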

Page 18: Personalizing Atypical Web Search Sessions (WSDM'13)

Classification Performance

• Logistic regression model

• 52-dimensional feature space

• Cross-validation performance: F1 = 0.84 (precision 0.82, recall 0.86)

• The resulting performance is comparable to the agreement among human judges
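
As a quick consistency check, the reported F1 score follows from the reported precision and recall, since F1 is their harmonic mean. The helper name below is illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.82, 0.86), 2))  # 0.84
```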

Page 19: Personalizing Atypical Web Search Sessions (WSDM'13)

Strongest Typicality Indicators

1. Query length divergence from profile

2. Absolute query length

3. Question word ratio

4. Verb ratio divergence from profile

5. Topic divergence from profile
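
The slides do not say how topic divergence from the profile is measured. One plausible choice, used here purely as an assumption, is the Jensen-Shannon divergence between the profile's topic distribution and the session's topic distribution. The sketch below applies it to the example profile and the two example sessions shown earlier; the function names are hypothetical.

```python
import math


def normalize(weights):
    """Scale non-negative topic weights so that they sum to 1."""
    total = sum(weights.values())
    return {topic: w / total for topic, w in weights.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two topic distributions, in [0, 1]."""
    p, q = normalize(p), normalize(q)
    topics = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in topics}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture m.
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


profile = {"Football": 0.55, "Boxing": 0.14, "Television": 0.09,
           "Travel": 0.06, "Hockey": 0.05}
boxing_session = {"Boxing": 2}                     # Example 1
dental_session = {"Dentistry": 2, "Healthcare": 1}  # Example 2

print(js_divergence(profile, boxing_session) < js_divergence(profile, dental_session))  # True
```

A session whose topics never occur in the profile reaches the maximum divergence of 1, while a session on a familiar topic scores lower.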

Page 20: Personalizing Atypical Web Search Sessions (WSDM'13)

Robustness to Sparsity

• It takes 15-20 sessions to reliably characterize users

• Most users reach this point in 14 days of search activity

Page 21: Personalizing Atypical Web Search Sessions (WSDM'13)

Retrieval Performance

• Moving from qualitative setting to web scale

• 155k users

• 10.4M sessions

• LambdaMART learning scheme

• Re-ranking the top 10 returned results
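
Re-ranking only the top 10 results means that personalization permutes the first result page and leaves everything below it untouched. A minimal sketch, assuming a hypothetical `score` callable that stands in for the learned ranking model (LambdaMART itself is not implemented here):

```python
def rerank_top_k(results, score, k=10):
    """Reorder only the first k results by descending score; keep the rest as is."""
    head, tail = results[:k], results[k:]
    # sorted() is stable, so ties keep the original ranker's order.
    return sorted(head, key=score, reverse=True) + tail


# Toy data: 12 results from a base ranker and a scorer that prefers "doc7".
results = [f"doc{i}" for i in range(12)]
reranked = rerank_top_k(results, score=lambda doc: 1.0 if doc == "doc7" else 0.0)
print(reranked[:3])   # ['doc7', 'doc0', 'doc1']
print(reranked[10:])  # ['doc10', 'doc11']
```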

Page 22: Personalizing Atypical Web Search Sessions (WSDM'13)

Profiling Scopes

• Session

• All previous activity in the same session

• Historic

• All previous activity before this session

• Aggregate

• All previous activity before current query
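
The three scopes differ only in which past events of a user feed the profile. The sketch below makes that selection explicit; the `Event` record, its fields and the function name are assumptions made for illustration, and sessions are assumed not to interleave in time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    session_id: int
    timestamp: int  # any monotonically increasing clock
    query: str


def scope_events(log, current, scope):
    """Select the past events of one user that a given profiling scope may use."""
    earlier = [e for e in log if e.timestamp < current.timestamp]
    if scope == "session":    # previous activity in the same session
        return [e for e in earlier if e.session_id == current.session_id]
    if scope == "historic":   # activity before the current session started
        return [e for e in earlier if e.session_id != current.session_id]
    if scope == "aggregate":  # everything before the current query
        return earlier
    raise ValueError(f"unknown scope: {scope}")


log = [Event(1, 1, "nfl"), Event(1, 2, "philadelphia eagles"),
       Event(2, 3, "oral sores"), Event(2, 4, "aphthous sore")]
current = Event(2, 5, "aphthous ulcer treatment")

print([e.query for e in scope_events(log, current, "session")])   # ['oral sores', 'aphthous sore']
print([e.query for e in scope_events(log, current, "historic")])  # ['nfl', 'philadelphia eagles']
print(len(scope_events(log, current, "aggregate")))                # 4
```

Under these assumptions the aggregate scope is simply the union of the session and historic scopes.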

Page 23: Personalizing Atypical Web Search Sessions (WSDM'13)

Personalization Performance

Page 25: Personalizing Atypical Web Search Sessions (WSDM'13)

Hybrid Approach

Page 26: Personalizing Atypical Web Search Sessions (WSDM'13)

Hybrid Performance

Page 29: Personalizing Atypical Web Search Sessions (WSDM'13)

Conclusion

• Investigation of atypical web search sessions

• Atypical sessions occur for most users

• They account for a significant proportion of queries

• Typicality prediction

• Automatic prediction comparable to human accuracy

• Benefit for retrieval settings

• Including typicality information in the personalization process improved retrieval performance

Page 30: Personalizing Atypical Web Search Sessions (WSDM'13)

Future Work

• Classification of ongoing sessions

• Classification currently looks at the last query; doing it earlier in the session would have more impact

• Weakly supervised approach based on pre-classified complete sessions

• From atypical to typical

• How do information needs change over time?

• Life cycle of an information need / a profile

• Information value at risk

• How important is a piece of information for the user?

• How much effort will they invest?

Page 31: Personalizing Atypical Web Search Sessions (WSDM'13)

Thank You!
