Transcript of "Observational Approaches to Information Retrieval" (SIGIR 2014 Tutorial: Choices and Constraints, Part II)

Page 1:

Observational Approaches to Information Retrieval
SIGIR 2014 Tutorial: Choices and Constraints (Part II)

Diane Kelly, Filip Radlinski, Jaime Teevan

Slides available at: http://aka.ms/sigirtutorial

Page 2:

Diane Kelly, University of North Carolina, USA

Filip Radlinski, Microsoft, UK

Jaime Teevan, Microsoft Research, USA

Page 3:

Tutorial Goals

1. To help participants develop a broader perspective of research goals and approaches in IR: descriptive, predictive, and explanatory.

2. To improve participants' understanding of research choices and constraints. Every research project requires the researcher to make a series of choices about a range of factors, and usually there are constraints that influence these choices.

By using some of our own research papers, we aim to expose you to the experiential aspects of the research process, giving you a behind-the-scenes view of how we made choices in our own research.

Page 4:

Research Goals & Approaches

Describe
• Report a set of observations and provide benchmarks (e.g., average queries per user, problems a user experiences when engaging in search)
• Such studies might also present categorizations of the observations

Predict
• Seek to establish predictable relationships
• Take as input some set of features (click-through rate, dwell time) and use these to predict other variables (query abandonment, satisfaction)

Explain (why?)
• Propose a theoretical model that explains how select constructs interact and interrelate
• Devise procedures to measure those constructs (that is, translate the constructs into variables that can be controlled and measured)
• Devise a protocol (usually experimental) to observe the phenomenon of interest
• Seek to demonstrate causality, not just show that the variables are related

Page 5:

Research Goals & Approaches

                         Describe   Predict   Explain
Afternoon
  Field Observation         ✔
  Log Analysis              ✔          ✔
Morning
  Laboratory Experiment     ✔          ✔         ✔
  Field Experiment          ✔          ✔         ✔

Page 6:

Example: Search Difficulties

Describe
• A diary study might be used to gain insight into when and how users experience and address search difficulties.
• Log data might also be analyzed to identify how often these events occur.

Predict
• A model might be constructed using the signals available in a log to predict when users will abandon search result pages without clicking. This model might then be evaluated with other log data.

Explain
• Results from these studies might then be used to create an explanatory/theoretical model of search difficulty, which can be used to generate testable hypotheses. The model can include constructs and variables beyond those available in the log data.
• An experiment might be designed to test the explanatory power of the theory indirectly by examining the predictive power of the hypotheses.

Page 7:

Overview

Observational log analysis
• What we can learn
• Collecting log data
• Cleaning log data (Filip)
• Analyzing log data

Field observations (Diane)

Dumais, Jeffries, Russell, Tang & Teevan. "Understanding User Behavior through Log Data and Analysis."

Page 8:

What We Can Learn

Observational Approaches to Information Retrieval

Page 9:

Examples of famous marginalia:
• Annotations by David Foster Wallace
• Annotations by Mark Twain
• "Cowards die many times before their deaths." (annotated by Nelson Mandela)
• "I have discovered a truly marvelous proof … which this margin is too narrow to contain." (Pierre de Fermat, 1637)

Students prefer used textbooks that are annotated. [Marshall 1998]

Page 10:

Digital Marginalia

Do we lose marginalia with digital documents?

The Internet exposes information experiences
• Meta-data, annotations, relationships
• Large-scale information usage data

Change in focus
• With marginalia, interest is in the individual
• Now we can look at experiences in the aggregate

Page 11:

Page 12:

Practical Uses for Behavioral Data

Behavioral data to improve Web search
• Offline log analysis. Example: Re-finding is common, so add history support
• Online log-based experiments. Example: Interleave different rankings to find the best algorithm
• Log-based functionality. Example: Boost clicked results in a search result list

Behavioral data on the desktop
• Goal: Allocate editorial resources to create Help docs
• How to do so without knowing what people search for?

Page 13:

Value of Observational Log Analysis

Focus of observational log analysis
• Description: What do people currently do?
• Prediction: What will people do in similar situations?

Study real behavior in natural settings
• Understand how people search
• Identify real problems to study
• Improve ranking algorithms
• Influence system design
• Create realistic simulations and evaluations
• Build a picture of human interest

Page 14:

Societal Uses of Behavioral Data
• Understand people's information needs
• Understand what people talk about
• Impact public policy? (e.g., DonorsChoose.org)

Baeza-Yates, Dupret, Velasco. A study of mobile search queries in Japan. WWW 2007

Page 15:

Personal Use of Behavioral Data

Individuals now have a lot of behavioral data, and introspection of personal data is popular (e.g., My Year in Status, Status Statistics).

Expect to see more
• As compared to others
• For a purpose

Page 16:

Defining Behavioral Log Data

Behavioral log data are:
• Traces of natural behavior, seen through a sensor
• Examples: links clicked, queries issued, tweets posted
• Real-world, large-scale, real-time

Behavioral log data are not:
• Non-behavioral sources of large-scale data
• Collected data (e.g., poll data, surveys, census data)
• Recalled behavior or subjective impressions

Page 17:

Real-World, Large-Scale, Real-Time

Private behavior is exposed
• Example: porn queries, medical queries

Rare behavior is common
• Example: observe 500 million queries a day; behavior that occurs only 0.002% of the time is still observed 10 thousand times a day!

New behavior appears immediately
• Example: Google Flu Trends

Page 18:

Drawbacks

Not controlled
• Can run controlled log studies, discussed in the morning tutorial (Filip)

Adversarial
• Cleaning log data later today (Filip)

Lots of missing information
• Not annotated, no demographics, and we don't know why
• Observing richer information after the break (Diane)

Privacy concerns
• Collect and store data thoughtfully
• The next section addresses privacy

Page 19:

Query Time User

sigir 2014 10:41 am 1/15/14 142039

goldcoast sofitel 10:44 am 1/15/14 142039

learning to rank 10:56 am 1/15/14 142039

sigir 2014 11:21 am 1/15/14 659327

ool transportation 11:59 am 1/15/14 318222

restaurants brisbane 12:01 pm 1/15/14 318222

surf lessons 12:17 pm 1/15/14 318222

james allen 12:18 pm 1/15/14 142039

daytrips from brisbane 1:30 pm 1/15/14 554320

sigir 2014 1:30 pm 1/15/14 659327

sigir program 2:32 pm 1/15/14 435451

sigir2014.org 2:42 pm 1/15/14 435451

information retrieval 4:56 pm 1/15/14 142039

sigir 2014 5:02 pm 1/15/14 312055

xxx clubs on gold coast 10:14 pm 1/15/14 142039

sex videos 1:49 am 1/16/14 142039

Page 20:

Query Time User

sigir 2014 10:41 am 1/15/14 142039

goldcoast sofitel 10:44 am 1/15/14 142039

teen sex 10:56 am 1/15/14 142039

sigir 2014 11:21 am 1/15/14 659327

ool transportation 11:59 am 1/15/14 318222

restaurants brisbane 12:01 pm 1/15/14 318222

surf lessons 12:17 pm 1/15/14 318222

james allen 12:18 pm 1/15/14 142039

daytrips from brisbane 1:30 pm 1/15/14 554320

sex with animals 1:30 pm 1/15/14 659327

sigir program 2:32 pm 1/15/14 435451

sigir2014.org 2:42 pm 1/15/14 435451

Information retrieval 4:56 pm 1/15/14 142039

sigir 2014 5:02 pm 1/15/14 312055

xxx clubs on gold coast 10:14 pm 1/15/14 142039

sex videos 1:49 am 1/16/14 142039

cheap digital camera 12:17 pm 1/15/14 554320

cheap digital camera 12:18 pm 1/15/14 554320

cheap digital camera 12:19 pm 1/15/14 554320

社会科学 (social science) 11:59 am 11/3/23

[no query] 12:01 pm 11/3/23

Issues highlighted on this slide: porn, non-English language, spam, system errors.

Page 21:

Query Time User

sigir 2014 10:41 am 1/15/14 142039

goldcoast sofitel 10:44 am 1/15/14 142039

learning to rank 10:56 am 1/15/14 142039

sigir 2014 11:21 am 1/15/14 659327

ool transportation 11:59 am 1/15/14 318222

restaurants brisbane 12:01 pm 1/15/14 318222

surf lessons 12:17 pm 1/15/14 318222

james allen 12:18 pm 1/15/14 142039

daytrips from brisbane 1:30 pm 1/15/14 554320

sigir 2014 1:30 pm 1/15/14 659327

sigir program 2:32 pm 1/15/14 435451

sigir2014.org 2:42 pm 1/15/14 435451

information retrieval 4:56 pm 1/15/14 142039

sigir 2014 5:02 pm 1/15/14 312055

kangaroos 10:14 pm 1/15/14 142039

machine learning 1:49 am 1/16/14 142039

Page 22:

[Same query log table as Page 21]

Query typology

Page 23:

[Same query log table as Page 21]

Query typology
Query behavior

Page 24:

[Same query log table as Page 21]

Query typology
Query behavior
Long term trends

Uses of Analysis
• Ranking (e.g., precision)
• System design (e.g., caching)
• User interface (e.g., history)
• Test set development
• Complementary research

Page 25:

Surprises About Query Log Data

From early log analysis (e.g., Jansen et al. 2000, Broder 1998; scale: a term was "common" if it appeared 100 times!)
• Queries are not 7 or 8 words long
• Advanced operators not used, or "misused"
• Nobody used relevance feedback
• Lots of people search for sex
• Navigation behavior common
• Prior experience was with library search

Page 26:

Surprises About Microblog Search?

Page 27:

Surprises About Microblog Search?

[Screenshot: Twitter search results ordered by time vs. ordered by relevance, with an "8 new tweets" notification]

Page 28:

Surprises About Microblog Search?

[Screenshot: Twitter search results ordered by time vs. ordered by relevance, with an "8 new tweets" notification]

Microblog search:
• Time important
• People important
• Specialized syntax
• Queries common
• Repeated a lot
• Change very little

Web search:
• Often navigational
• Time and people less important
• No syntax use
• Queries longer
• Queries develop

Page 29:

Overview

Observational log analysis
• What we can learn: understand and predict user behavior
• Collecting log data
• Cleaning log data
• Analyzing log data

Field observations

Page 30:

Collecting Log Data

Observational Approaches to Information Retrieval

Page 31:

How to Get Logs for Analysis

Use existing logged data
• Explore sources in your community (e.g., proxy logs)
• Work with a company (e.g., as an FTE, intern, or visiting researcher)

Generate your own logs
• Focuses on questions of unique interest to you
• Examples: UFindIt, Wikispeedia

Construct community resources
• Shared software and tools, e.g., a client-side logger (such as the VIBE logger)
• Shared data sets
• Shared platforms, e.g., the Lemur Community Query Log Project

Page 32:

Web Service Logs

[Screenshot: results for an ambiguous query whose interpretations span a government contractor, recruiting, and an academic field]

Example sources
• Search engine
• Commercial site

Types of information
• Queries, clicks, edits
• Results, ads, products

Example analysis: click entropy
Teevan, Dumais & Liebling. To Personalize or Not to Personalize: Modeling Queries with Variation in User Intent. SIGIR 2008

Page 33:

Controlled Web Service Logs

Example sources
• Mechanical Turk
• Games with a purpose

Types of information
• Logged behavior
• Active feedback

Example analysis: search success
Ageev, Guo, Lagun & Agichtein. Find It If You Can: A Game for Modeling … Web Search Success Using Interaction Data. SIGIR 2011

Page 34:

Public Web Service Content

Example sources
• Social network sites
• Wiki change logs

Types of information
• Public content
• Dependent on service

Example analysis: Twitter topic models
Ramage, Dumais & Liebling. Characterizing microblogging using latent topic models. ICWSM 2010
http://twahpic.cloudapp.net

Page 35:

Web Browser Logs

Example sources
• Proxy
• Logging tool

Types of information
• URL visits, paths followed
• Content shown, settings

Example analysis: DiffIE
Teevan, Dumais and Liebling. A Longitudinal Study of How Highlighting Web Content Change Affects … Interactions. CHI 2010

Page 36:

Web Browser Logs

Example sources
• Proxy
• Logging tool

Types of information
• URL visits, paths followed
• Content shown, settings

Example analysis: revisitation
Adar, Teevan and Dumais. Large Scale Analysis of Web Revisitation Patterns. CHI 2008

Page 37:

Rich Client-Side Logs

Example sources
• Client application
• Operating system

Types of information
• Web client interactions
• Other interactions (rich!)

Example analysis: Stuff I've Seen
Dumais et al. Stuff I've Seen: A system for personal information retrieval and re-use. SIGIR 2003

Page 38:

A Simple Example: Logging Search Queries and Clicked Results

[Diagram: users issue queries (e.g., "dumais", "beijing", "sigir 2014", "vancouver", "chi 2014") to a Web service, which returns a search engine result page ("SERP")]

Page 39:

A Simple Example: Logging Queries

Basic data: <query, userID, time>

Which time? timeClient.send, timeServer.receive, timeServer.send, timeClient.receive

Additional contextual data
• Where did the query come from?
• What results were returned?
• What algorithm or presentation was used?
• Other metadata about the state of the system
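As a concrete illustration, here is a minimal sketch of what one logged query record might look like as a JSON line; the field names and the print-as-logging shortcut are illustrative assumptions, not the tutorial's format:

import json
import time

def log_query(user_id, query, results, variant):
    # The basic <query, userID, time> triple, plus enough context
    # to reconstruct what the user actually saw.
    record = {
        "time_server_receive": time.time(),  # one of the four possible timestamps
        "user_id": user_id,
        "query": query,
        "results": [r["url"] for r in results],  # what was returned
        "ranker_variant": variant,               # which algorithm/presentation
    }
    print(json.dumps(record))  # in practice: append to durable log storage

log_query("142039", "sigir 2014", [{"url": "http://sigir.org/sigir2014/"}], "control")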

Page 40:

A Simple Example: Logging Clicked Results (on the SERP)

How can a Web service know which SERP links are clicked?
• Proxy re-direct
• Script (e.g., JavaScript)
  – DOM and cross-browser challenges, but can instrument more than link clicks
  – No download required, but adds complexity and latency, and may influence user interaction

What happened after the result was clicked?
• What happens beyond the SERP is difficult to capture
• Browser actions (back, open in new tab, etc.) are difficult to capture
• To better interpret user behavior, richer client instrumentation is needed

A direct link vs. a logged re-direct link:
http://www.chi2014.org vs. http://redir.service.com/?q=chi2014&url=http://www.chi2014.org/&pos=3&log=DiFVYj1tRQZtv6e1FF7kltj02Z30eatB2jr8tJUFR

Script-based instrumentation (the slide's snippet, cleaned up):
<img border="0" id="imgC" src="image.gif" width="198" height="202" onmouseover="changeImage()" onmouseout="backImage()">
<script type="text/javascript">
function changeImage() { document.getElementById("imgC").src = "thank_you.gif"; }
function backImage() { document.getElementById("imgC").src = "image.gif"; }
</script>
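The proxy re-direct approach can also be sketched in a few lines. This is a toy, hypothetical server (the parameter names mirror the example URL above) that records the click and then forwards the user:

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class RedirectLogger(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. /?q=chi2014&url=http://www.chi2014.org/&pos=3
        params = parse_qs(urlparse(self.path).query)
        print("CLICK", params.get("q"), params.get("pos"), params.get("url"))
        self.send_response(302)  # log first, then send the user on their way
        self.send_header("Location", params["url"][0])
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), RedirectLogger).serve_forever()

Note that the extra redirect hop is exactly the added latency the slide warns about.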

Page 41:

A (Not-So-)Simple Example: Logging Queries, Clicked Results, and Beyond

Page 42:

What to Log

Log as much as possible
• Time-keyed events, e.g.: <time, userID, action, value, context>
• The ideal log allows the user experience to be fully reconstructed

But … make reasonable choices
• Richly instrumented client experiments can provide guidance
• Consider the amount of data and the storage required

Challenges with scale
• Storage requirements: 1k bytes/record x 10 records/query x 100 million queries/day = 1,000 GB/day (about 1 TB/day)
• Network bandwidth: client to server; data center to data center
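The storage arithmetic from the slide, spelled out as a trivial check using the slide's numbers:

bytes_per_record = 1_000
records_per_query = 10
queries_per_day = 100_000_000

bytes_per_day = bytes_per_record * records_per_query * queries_per_day
print(bytes_per_day / 1e9, "GB/day")  # -> 1000.0 GB/day, i.e. about a terabyte daily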

Page 43:

What to Do with the Data

Keep as much raw data as possible (and allowable)
• Must consider Terms of Service, IRB

Post-process the data to put it into a usable form
• Integrate across servers to organize the data, by time and by userID
• Normalize time, URLs, etc.
• Rich data cleaning
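A hedged sketch of that post-processing step: merge records from several servers, normalize URLs to one canonical form, and order by user and time. The file names, field names, and normalization rules here are all illustrative assumptions:

import json
from urllib.parse import urlsplit

def normalize_url(url):
    # One possible canonical form: lowercase the host,
    # drop the fragment and any trailing slash.
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.lower()}{parts.path.rstrip('/')}"

records = []
for path in ["server1.log", "server2.log"]:  # hypothetical per-server logs
    with open(path) as f:
        records += [json.loads(line) for line in f]

for r in records:
    r["url"] = normalize_url(r.get("url", ""))

# Organize by userID, then time, so each user's activity reads in order.
records.sort(key=lambda r: (r["user_id"], r["time"]))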

Page 44:

Practical Issues: Time

Time
• Client time is closer to the user, but can be wrong or reset
• Server time includes network latencies, but is controllable
• In both cases, time must be synchronized across multiple machines

Data integration
• Ensure that joins of data all use the same basis (e.g., UTC vs. local time)

Accurate timing data is critical for understanding the sequence of user activities, daily temporal patterns, etc.
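A small sketch of the "same basis" point: normalize timestamps to UTC before joining. The Brisbane offset is just a convenient example:

from datetime import datetime, timezone, timedelta

# A client in Brisbane (UTC+10) and a server logging in UTC record one event.
brisbane = timezone(timedelta(hours=10))
client_time = datetime(2014, 1, 15, 10, 41, tzinfo=brisbane)
server_time = datetime(2014, 1, 15, 0, 41, tzinfo=timezone.utc)

# Joined on naive local times these look 10 hours apart;
# normalized to UTC they are the same instant.
assert client_time.astimezone(timezone.utc) == server_time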

Page 45:

Practical Issues: Users

HTTP cookies, IP address, temporary ID
• Provide broad coverage and are easy to use, but …
• Multiple people use the same machine
• The same person uses multiple machines (and browsers): how many cookies did you use today?
• Lots of churn in these IDs: Jupiter Research (39% delete cookies monthly); comScore (2.5x inflation)

Login or downloaded client code (e.g., browser plug-in)
• Better correspondence to people, but …
• Requires sign-in or download
• Results in a smaller and biased sample of people or data (those who remember to log in, decided to download, etc.)

Either way, some loss of data.

Page 46:

Using the Data Responsibly

What data is collected, and how can it be used?
• User agreements (terms of service)
• Emerging industry standards and best practices

Trade-offs
• More data: more intrusive, with potential privacy concerns, but also more useful for understanding interaction and improving systems
• Less data: less intrusive, but less useful

Risk, benefit, and trust

Page 47:

Example: AOL Search Dataset

August 4, 2006: Logs released to the academic community
• 3 months, 650 thousand users, 20 million queries
• Logs contain anonymized user IDs

August 7, 2006: AOL pulled the files, but they had already been mirrored

August 9, 2006: New York Times identified Thelma Arnold
• "A Face Is Exposed for AOL Searcher No. 4417749"
• Queries for businesses and services in Lilburn, GA (pop. 11k)
• Queries for Jarrett Arnold (and others of the Arnold clan)
• NYT contacted all 14 people in Lilburn with the Arnold surname
• When contacted, Thelma Arnold acknowledged her queries

August 21, 2006: Two AOL employees fired, CTO resigned
September 2006: Class action lawsuit filed against AOL

Sample of the released data:
AnonID   Query                        QueryTime            ItemRank  ClickURL
1234567  uw cse                       2006-04-04 18:18:18  1         http://www.cs.washington.edu/
1234567  uw admissions process        2006-04-04 18:18:18  3         http://admit.washington.edu/admission
1234567  computer science hci         2006-04-24 09:19:32
1234567  computer science hci         2006-04-24 09:20:04  2         http://www.hcii.cmu.edu
1234567  seattle restaurants          2006-04-24 09:25:50  2         http://seattletimes.nwsource.com/rests
1234567  perlman montreal             2006-04-24 10:15:14  4         http://oldwww.acm.org/perlman/guide.html
1234567  uw admissions notification   2006-05-20 13:13:13
…

Page 48:

Example: AOL Search Dataset

Other well-known AOL users
• User 711391: "i love alaska" (http://www.minimovies.org/documentaires/view/ilovealaska)
• User 17556639: "how to kill your wife"
• User 927

Anonymous IDs do not make logs anonymous
• Logs contain directly identifiable information: names, phone numbers, credit cards, social security numbers
• Logs contain indirectly identifiable information: for example, Thelma's queries; birthdate, gender, and zip code together identify 87% of Americans

Page 49:

Example: Netflix Challenge

October 2, 2006: Netflix announces contest
• Predict people's ratings, for a $1 million prize
• 100 million ratings, 480k users, 17k movies
• Very careful with anonymity post-AOL

May 18, 2008: Data de-anonymized
• Paper published by Narayanan & Shmatikov
• Uses background knowledge from IMDB
• Robust to perturbations in the data

December 17, 2009: Doe v. Netflix
March 12, 2010: Netflix cancels its second competition

Format of the released data:
Ratings file:
1:                    [Movie 1 of 17770]
12, 3, 2006-04-18     [CustomerID, Rating, Date]
1234, 5, 2003-07-08
2468, 1, 2005-11-12
…

Movie titles file:
…
10120, 1982, "Bladerunner"
17690, 2007, "The Queen"
…

Netflix's claim at the time: "All customer identifying information has been removed; all that remains are ratings and dates. This follows our privacy policy. … Even if, for example, you knew all your own ratings and their dates you probably couldn't identify them reliably in the data because only a small sample was included (less than one tenth of our complete dataset) and that data was subject to perturbation."

Page 50:

Using the Data Responsibly

Control access to the data
• Internally: access control; data retention policy
• Externally: risky (e.g., AOL, Netflix, Enron, Facebook public data)

Protect user privacy
• Directly identifiable information: social security, credit card, and driver's license numbers
• Indirectly identifiable information: names, locations, phone numbers … you're so vain (e.g., AOL); putting together multiple sources (e.g., Netflix, hospital records); linking public and private data
• Techniques: k-anonymity, differential privacy, etc.

Transparency and user control
• Publicly available privacy policy
• Give users control to delete, opt out, etc.
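One small, hedged illustration of the first line of defense: scrubbing directly identifiable information from query text. The patterns are illustrative and nowhere near sufficient on their own; as the AOL case shows, indirect identifiers remain:

import re

PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<ssn>"),            # US SSN format
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "<card-number>"),   # crude credit-card match
    (re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"), "<phone>"),  # US phone number
]

def scrub(query):
    for pattern, token in PATTERNS:
        query = pattern.sub(token, query)
    return query

print(scrub("call me at 919-555-0123"))  # -> "call me at <phone>"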

Page 51:

Overview

Observational log analysis
• What we can learn: understand and predict user behavior
• Collecting log data: not as simple as it seems
• Cleaning log data – Filip!
• Analyzing log data

Field observations

Page 52:

[Filip on data cleaning]

Page 53:

Observational Approaches to Information Retrieval
SIGIR 2014 Tutorial: Choices and Constraints (Part II)

Diane Kelly, Filip Radlinski, Jaime Teevan

Page 54:

Overview

Observational log analysis
• What we can learn: understand and predict user behavior
• Collecting log data: not as simple as it seems
• Cleaning log data: a significant portion of log analysis is about cleaning
• Analyzing log data

Field observations

Page 55:

Analyzing Log Data

Observational Approaches to Information Retrieval

Page 56:

Develop Metrics to Capture Behavior

Example findings from early log studies:
• Queries average 2.35 terms [Jansen et al. 1998]
• Sessions are 2.20 queries long [Silverstein et al. 1999]
• Queries appear 3.97 times [Silverstein et al. 1999]
• Query intents: navigational, informational, transactional [Broder 2002]
• (Also cited on the slide: [Joachims 2002], [Lau and Horvitz 1999])

Summary measures
• Query frequency
• Query length

Analysis of query intent
• Query types and topics

Temporal features
• Session length
• Common re-formulations

Click behavior
• Relevant results for query
• Queries that lead to clicks
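The simplest of these summary measures take only a few lines to compute. A minimal sketch of average query length and session length, assuming the conventional 30-minute inactivity timeout to split sessions (the timeout value is an assumption, not from the slide):

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that ends a session (assumed)

def summarize(log):
    """log: list of (user_id, unix_time, query), sorted by user then time."""
    avg_terms = sum(len(q.split()) for _, _, q in log) / len(log)
    session_sizes, last_seen = [], {}
    for user, t, _ in log:
        if user in last_seen and t - last_seen[user] <= SESSION_TIMEOUT:
            session_sizes[-1] += 1   # same user, within timeout: same session
        else:
            session_sizes.append(1)  # new user or long gap: new session
        last_seen[user] = t
    return avg_terms, sum(session_sizes) / len(session_sizes)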

Page 57:

Develop Metrics to Capture Behavior

Lee, Teevan, de la Chica. Characterizing multi-click search behavior. SIGIR 2014

Page 58:

Partitioning the Data
• Language
• Location
• Time
• User activity
• Individual
• Entry point
• Device
• System variant

Baeza-Yates, Dupret, Velasco. A study of mobile search queries in Japan. WWW 2007

Page 59:

Partition by Time
• Periodicities
• Spikes
• Real-time data: new behavior, immediate feedback
• Individual: within session, across sessions

Beitzel, et al. Hourly analysis of a … topically categorized web query log. SIGIR 2004
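The simplest time partition is to bucket queries by hour of day, which makes periodicities and spikes visible directly; a sketch:

from collections import Counter
from datetime import datetime, timezone

def hourly_counts(log):
    """log: list of (user_id, unix_time, query) tuples."""
    # Query volume per hour of day; daily cycles and spikes stand out.
    return Counter(datetime.fromtimestamp(t, timezone.utc).hour for _, t, _ in log)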

Page 60:

Partition by User
• Temporary ID (e.g., cookie, IP address): high coverage but high churn; does not necessarily map directly to users
• User account: only a subset of users

Teevan, Adar, Jones, Potts. Information re-retrieval: Repeat queries … SIGIR 2007

Page 61:

Partition by System Variant
• Also known as controlled experiments
• Some people see one variant, others another
• Example: What color for search result links? Bing tested 40 colors, identified #0044CC, value: $80 million
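A common way to split people across system variants is deterministic hashing of the user ID, so each user keeps seeing the same variant; a sketch (the salt and the two-way split are illustrative):

import hashlib

def assign_variant(user_id, variants=("control", "treatment"), salt="exp42"):
    # The same user always hashes to the same bucket,
    # so their experience stays consistent across sessions.
    h = int(hashlib.md5(f"{salt}:{user_id}".encode()).hexdigest(), 16)
    return variants[h % len(variants)]

print(assign_variant("142039"))  # stable across calls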

Page 62:

Considerations When Partitioning

Choose comparison groups carefully
• From the same time period
• With comparable users, tasks, etc.

Log a lot, because it can be hard to recreate state
• Which partition did a particular behavior fall into?

Confirm partitions with metrics that should be the same

White, Dumais, Teevan. Characterizing the influence of domain expertise … WSDM 2009

Page 63:

Interpreting Significant Metrics

Often, everything is significant.

Adar, Teevan, Dumais. Large scale analysis of web revisitation patterns. CHI 2008

Page 64:

Interpreting Significant Metrics

Everything is significant, but not always meaningful
• "All differences significant except when noted."
• Choose the metrics you care about first
• Look for converging evidence
• Look at the data

Beware: typically very high variance
• Large variance by user, task, noise
• Calculate it empirically

Page 65:

Confidence Intervals

Confidence interval (C.I.): an interval around the treatment mean that contains the true value of the mean x% (typically 95%) of the time.

• Gives useful information about the size of the effect and its practical significance
• C.I.s that do not contain the control mean are statistically significant (statistically different from the control)
• This is an independent test for each metric, so with 95% C.I.s, 1 in 20 results will be spurious. Challenge: you don't know which ones are spurious.
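A sketch of a 95% confidence interval for a metric mean under the normal approximation (the 1.96 multiplier is the standard two-sided 95% z-value; the sample values are made up):

from statistics import mean, stdev

def ci95(samples):
    m, s = mean(samples), stdev(samples)
    half = 1.96 * s / len(samples) ** 0.5  # normal approximation to the 95% C.I.
    return m - half, m + half

lo, hi = ci95([2.1, 1.8, 2.6, 2.2, 1.9, 2.4])
# If the control mean falls outside (lo, hi), the treatment metric is
# statistically different from the control at the 5% level.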

Page 66:

Confidence Intervals

Lee, Teevan, de la Chica. Characterizing multi-click search behavior. SIGIR 2014

Radlinski, Kurup, Joachims. How does clickthrough data reflect retrieval quality? CIKM 2008.

Page 67:

When Significance Is Wrong

Sometimes there is spurious significance
• A 95% confidence interval only tells you there is a 95% chance that the difference is real, not 100%
• If only a few things are significant, chance is a likely explanation

Sometimes you will miss significance
• Because the true difference is tiny or zero, or because you don't have enough power
• If you did your sizing right, you have enough power to see all the differences of practical significance

Sometimes the reason for a change is unexpected
• Look at many metrics to get the big picture

Chilton, Teevan. Addressing Info. Needs Directly in the Search Result Page. WWW 2011

Page 68:

Be Thoughtful When Combining Metrics

1995 and 1996 performance != combined performance: Simpson's Paradox. Changes in the mix (the denominators) make combined metrics (ratios) inconsistent with the yearly metrics.

Batting averages:

               1995             1996             Combined
               Hits  At Bats    Hits  At Bats    Hits  At Bats
Derek Jeter    12    48 (.250)  183   582 (.314) 195   630 (.310)
David Justice  104   411 (.253) 45    140 (.321) 149   551 (.270)

Justice out-hit Jeter in both 1995 and 1996, yet Jeter has the higher combined average.
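The flip is easy to verify numerically: each year's ratio favors Justice, while the pooled ratio favors Jeter, purely because the at-bats are mixed differently:

jeter = {"1995": (12, 48), "1996": (183, 582)}     # (hits, at bats)
justice = {"1995": (104, 411), "1996": (45, 140)}

for year in ("1995", "1996"):
    print(year, jeter[year][0] / jeter[year][1],
          justice[year][0] / justice[year][1])      # Justice leads both years

pooled = lambda d: sum(h for h, _ in d.values()) / sum(ab for _, ab in d.values())
print("combined", pooled(jeter), pooled(justice))   # .310 vs .270: the order flips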

Page 69:

Detailed Analysis vs. the Big Picture

Not all effects will point in the same direction
• Take a closer look at the items going in the "wrong" direction
• Can you interpret them? (e.g., people are doing fewer next-page clicks because they are finding their answer on the first page)
• Could they be artifactual?
• What if they are real? What should the impact be on your conclusions? On your decision?

Significance and impact are not the same thing
• Looking at % change vs. absolute change helps
• Effect size depends on what you want to do with the data

Page 70:

Beware the Tyranny of the Data

Logs can provide insight into behavior
• Example: what is searched for, how needs are expressed

Logs can be used to test hypotheses
• Example: compare ranking variants or link color

Logs can only reveal what can be observed; they cannot tell you about what you cannot observe
• Example: nobody uses Twitter to re-find

Page 71:

What Logs Cannot Tell Us
• People's intent
• People's success
• People's experience
• People's attention
• People's beliefs

Behavior can mean many things: 81% of search sequences are ambiguous [Viermetz et al. 2006]

Example: the same observed log admits opposite interpretations.

Observed log:
7:12 – Query
7:14 – Click Result 1
7:15 – Click Result 3

One reading (struggling):
7:14 – Click Result 1 <Back to results>
7:15 – Click Result 3 <Back to results>
7:16 – Try new engine

Another reading (succeeding):
7:14 – Click Result 1 <Open in new tab>
7:15 – Click Result 3 <Open in new tab>
7:16 – Read Result 1
7:20 – Read Result 3
7:27 – Save links locally

Page 72:

Example: Click Entropy

Question: How ambiguous is a query?
Approach: Look at variation in clicks
Measure: Click entropy
• Low if no variation (e.g., the query "human computer interaction")
• High if lots of variation (e.g., the query "hci", whose clicked results span companies, a Wikipedia disambiguation page, and HCI itself)

Teevan, Dumais, Liebling. To personalize or not to personalize … SIGIR 2008
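Click entropy has a standard form: H(q) = -Σ_u p(u|q) · log2 p(u|q), where p(u|q) is the fraction of clicks on query q that land on result u. A sketch:

from collections import Counter
from math import log2

def click_entropy(clicked_urls):
    # clicked_urls: one entry per click observed for a single query
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

print(click_entropy(["a", "a", "a"]))       # 0.0 -> everyone clicks the same result
print(click_entropy(["a", "b", "c", "d"]))  # 2.0 -> clicks spread evenly over four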

Page 73:

Which Has Less Variation in Clicks?
• www.usajobs.gov vs. federal government jobs
• find phone number vs. msn live search
• singapore pools vs. singaporepools.com
• tiffany vs. tiffany's
• nytimes vs. connecticut newspapers
• campbells soup recipes vs. vegetable soup recipe
• soccer rules vs. hockey equipment

Click entropy alone can mislead, because:
• Results change (e.g., result entropy = 5.7 vs. 10.7)
• Result quality varies (e.g., click position = 2.6 vs. 1.6)
• Task impacts the number of clicks (e.g., clicks/user = 1.1 vs. 2.1)

Page 74:

Supplementing Log Data

Enhance log data
• Collect associated information (example: for browser logs, crawl visited webpages)
• Instrumented panels

Converging methods
• Usability studies
• Eye tracking
• Surveys
• Field studies
• Diary studies

Page 75:

Example: Re-Finding Intent

Large-scale log analysis of re-finding
• Do people know they are re-finding?
• Do they mean to re-find the result they do?
• Why are they returning to the result?

Small-scale critical-incident user study
• Browser plug-in that logs queries and clicks
• Pop-up survey on repeat clicks and on 1/8 of new clicks

Insight into intent plus a rich, real-world picture
• Re-finding is often targeted toward a particular URL
• Not targeted when the query changes or within the same session

Tyler, Teevan. Large scale query log analysis of re-finding. WSDM 2010

Page 76:

Example: Curious Browser

Browser plug-in to examine the relationship between implicit and explicit behavior
• Captures many implicit actions (e.g., click, click position, dwell time, scroll)
• Probes for explicit user judgments of the relevance of a page to the query

Deployed to ~4k people in the US and Japan; learned models to predict explicit judgments from implicit indicators
• 45% accuracy with click alone; 75% accuracy with click + dwell + session
• Used to identify important features; the model was then applied in an open-loop setting

Fox, et al. Evaluating implicit measures to improve the search experience. TOIS 2005

Page 77:

Overview

Observational log analysis
• What we can learn: partition logs to observe behavior
• Collecting log data: not as simple as it seems
• Cleaning log data: clean and sanity check
• Analyzing log data: the big picture matters more than individual metrics

Field observations – Diane!

Page 78:

[Diane on field observations]