1
Learning User Interaction Models for Predicting Web Search Result Preferences
Eugene Agichtein, Eric Brill, Susan Dumais, Robert Ragno
Microsoft Research
2
User Interactions
• Goal: Harness rich user interactions with search results to improve quality of search
• Millions of users submit queries daily and interact with the search results
  – Clicks, query refinement, dwell time
• User interactions with search engines are plentiful, but require careful interpretation
• We will predict user preferences for results
3
Related Work
• Linking implicit interactions and explicit judgments
  – Fox et al. [TOIS 2005]: predict explicit satisfaction rating
  – Joachims [SIGIR 2005]: predict preference (gaze studies, interpretation strategies)
• Broader overview of analyzing implicit interactions: Kelly & Teevan [SIGIR Forum 2003]
4
Outline
• Distributional model of user interactions
  – User Behavior = Relevance + “Noise”
• Rich set of user interaction features
• Learning framework to predict user preferences
• Large-scale evaluation
5
Interpreting User Interactions
• Clickthrough and subsequent browsing behavior of individual users influenced by many factors
  – Relevance of a result to a query
  – Visual appearance and layout
  – Result presentation order
  – Context, history, etc.
• General idea:
  – Aggregate interactions across all users and queries
  – Compute “expected” behavior for any query/page
  – Recover relevance signal for a given query
6
Case Study: Clickthrough
[Figure: Clickthrough frequency for all queries in sample. x-axis: result position (1 to 10); y-axis: relative click frequency (0 to 1); series: All queries]
Clickthrough(query q, document d, result position p) = expected(p) + relevance(q, d)
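The additive model above needs an estimate of expected(p), the background click frequency at each result position. A minimal Python sketch follows. It assumes a click log of (query, url, position) tuples and assumes that "relative click frequency" means the share of all clicks landing at a position; the slides do not state the normalization, and the function and field names are illustrative only.

```python
from collections import Counter


def expected_click_frequency(click_log):
    """Estimate expected(p) as the share of all clicks that land on position p.

    click_log: iterable of (query, url, position) tuples, one per click.
    The tuple layout and the normalization are assumptions for illustration.
    """
    clicks_per_position = Counter(position for _, _, position in click_log)
    total = sum(clicks_per_position.values())
    if total == 0:
        return {}
    return {p: n / total for p, n in sorted(clicks_per_position.items())}


# Toy log (made up): four clicks across two queries.
toy_log = [
    ("q1", "a.example", 1),
    ("q1", "b.example", 2),
    ("q2", "c.example", 1),
    ("q2", "d.example", 3),
]
expected = expected_click_frequency(toy_log)
```

Aggregating over all queries is what lets the position bias show up as a stable curve that can later be subtracted out.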
7
Clickthrough for Queries with Known Position of Top Relevant Result
[Figure: Relative clickthrough for queries whose top relevant result is known to be at position 1. x-axis: result position (1, 2, 3, 5, 10); y-axis: relative click frequency; series: All queries, PTR=1]
8
Clickthrough for Queries with Known Position of Top Relevant Result
[Figure: Relative clickthrough for queries with known relevant results in position 1 and 3 respectively. x-axis: result position (1, 2, 3, 5, 10); y-axis: relative click frequency; series: All queries, PTR=1, PTR=3]
Higher clickthrough at top non-relevant than at top relevant document
9
Deviation from Expected
Relevance component: deviation from “expected”:
Relevance(q, d) = observed - expected(p)
[Figure: Click frequency deviation by result position (1, 2, 3, 5, 10) for series PTR=1 and PTR=3; y-axis: click frequency deviation (-0.04 to 0.16). Bar labels, with series and position assignment not recoverable: -0.023, -0.029, -0.009, -0.001, -0.013, 0.010, -0.002, -0.001, 0.144, 0.063]
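The relevance component can be computed by subtracting the position baseline from what was observed for one query. The sketch below is illustrative: the dictionary layout and the numbers are made up and are not taken from the slides.

```python
def click_deviation(observed, expected):
    """Return relevance(q, d) = observed - expected(p) for each position.

    observed: {position: click frequency of this query's result at that position}
    expected: {position: background click frequency at that position}
    Positions missing from either mapping are treated as frequency 0.0.
    """
    positions = sorted(set(observed) | set(expected))
    return {p: observed.get(p, 0.0) - expected.get(p, 0.0) for p in positions}


# Made-up numbers: the result at position 3 attracts more clicks than
# position 3 usually does, while position 1 attracts fewer.
expected = {1: 0.5, 2: 0.25, 3: 0.25}
observed = {1: 0.25, 2: 0.25, 3: 0.5}
deviation = click_deviation(observed, expected)
```

A positive deviation suggests the result is more attractive than its position alone explains; a negative one suggests the opposite.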
10
Beyond Clickthrough: Rich User Interaction Space
• Observed and Distributional features
  – Observed features: aggregated values over all user interactions for each query and result pair
  – Distributional features: deviations from the “expected” behavior for the query
• Represent user interactions as vectors in “Behavior Space”
  – Presentation: what a user sees before click
  – Clickthrough: frequency and timing of clicks
  – Browsing: what users do after the click
11
Some User Interaction Features
Presentation
  ResultPosition       Position of the URL in current ranking
  QueryTitleOverlap    Fraction of query terms in result title
Clickthrough
  DeliberationTime     Seconds between query and first click
  ClickFrequency       Fraction of all clicks landing on page
  ClickDeviation       Deviation from expected click frequency
Browsing
  DwellTime            Result page dwell time
  DwellTimeDeviation   Deviation from expected dwell time for query
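As one concrete case, a presentation feature such as QueryTitleOverlap could be computed along the following lines. This is a sketch only: the slides do not say how terms are extracted, so lowercased whitespace tokenization is an assumption, and the function name is hypothetical.

```python
def query_title_overlap(query, title):
    """Fraction of distinct query terms that also appear in the result title.

    Lowercased whitespace tokenization is an assumption for illustration.
    """
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    title_terms = set(title.lower().split())
    return len(query_terms & title_terms) / len(query_terms)


# "web" and "search" match; "ranking" does not match "rank" without stemming.
overlap = query_title_overlap("web search ranking", "Learning to rank Web search results")
```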
12
Outline
• Distributional model of user interactions
• Rich set of user interaction features
• Models for predicting user preferences
• Experimental results
13
Predicting Result Preferences
• Task: predict pairwise preferences
  – A user will prefer Result A > Result B
• Models for preference prediction
  – Current search engine ranking
  – Clickthrough
  – Full user behavior model
14
Clickthrough Model
• SA+N: “Skip Above” and “Skip Next”
  – Adapted from Joachims et al. [SIGIR’05]
  – Motivated by gaze tracking
• Example
  – Click on results 2, 4
  – Skip Above: 4 > (1, 3), 2 > 1
  – Skip Next: 4 > 5, 2 > 3
[Figure: ranked result list, positions 1 to 8]
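The two strategies can be sketched in a few lines of Python. This is one reading of the slide's example, not the authors' implementation: a clicked result is preferred over every unclicked result above it (Skip Above) and over the immediately following result if that one was not clicked (Skip Next). The function name and pair representation are illustrative.

```python
def skip_above_and_next(clicked, num_results):
    """Derive (preferred, less_preferred) position pairs from one result list.

    clicked: 1-based positions that were clicked.
    num_results: number of results shown.
    """
    clicked = set(clicked)
    preferences = set()
    for c in sorted(clicked):
        # Skip Above: a clicked result beats every unclicked result above it.
        for above in range(1, c):
            if above not in clicked:
                preferences.add((c, above))
        # Skip Next: a clicked result beats the next result if it was not clicked.
        nxt = c + 1
        if nxt <= num_results and nxt not in clicked:
            preferences.add((c, nxt))
    return preferences


# The slide's example: clicks on results 2 and 4 out of 8.
prefs = skip_above_and_next(clicked=[2, 4], num_results=8)
```

For the slide's example this yields the pairs 2 > 1, 2 > 3, 4 > 1, 4 > 3, and 4 > 5.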
15
Distributional Model
[Figure: Clickthrough frequency deviation by result position (1 to 5); y-axis from -0.8 to 0.4]
• CD: distributional model, extends SA+N
  – Clickthrough considered iff its frequency exceeds the expected frequency by more than ε
• Click on result 2 likely “by chance”
• 4 > (1, 2, 3, 5), but not 2 > (1, 3)
[Figure: ranked result list, positions 1 to 8]
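A hedged sketch of the CD idea: first discard clicks whose frequency does not exceed the expected frequency by more than ε, then apply the Skip Above and Skip Next rules to the clicks that remain. The deviation values, the threshold, and the function name are made up for illustration; the slides give no concrete ε.

```python
def cd_preferences(click_deviation, num_results, epsilon):
    """Skip Above / Skip Next restricted to clicks that deviate by more than epsilon.

    click_deviation: {position: observed minus expected click frequency}
    Returns a set of (preferred, less_preferred) position pairs.
    """
    # Keep a click only when it is more frequent than expected by more than epsilon.
    reliable = {p for p, dev in click_deviation.items() if dev > epsilon}
    preferences = set()
    for c in sorted(reliable):
        # Skip Above: results above that lack a reliable click lose to c.
        for above in range(1, c):
            if above not in reliable:
                preferences.add((c, above))
        # Skip Next: the following result loses to c unless it has a reliable click.
        nxt = c + 1
        if nxt <= num_results and nxt not in reliable:
            preferences.add((c, nxt))
    return preferences


# Made-up deviations: position 2 is clicked about as often as expected,
# position 4 far more often.
deviations = {2: 0.01, 4: 0.30}
prefs = cd_preferences(deviations, num_results=8, epsilon=0.05)
```

With these toy numbers the click on result 2 is treated as chance, so only result 4 generates preferences, matching the slide's 4 > (1, 2, 3, 5).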
16
User Behavior Model
• Full set of interaction features
  – Presentation, clickthrough, browsing
• Train the model with explicit judgments
  – Input: behavior feature vectors for each query-page pair in rated results
  – Use RankNet (Burges et al. [ICML 2005]) to discover model weights
  – Output: a neural net that can assign a “relevance” score to a behavior feature vector
17
RankNet for User Behavior
• RankNet: general, scalable, robust neural net training algorithms and implementation
• Optimized for ranking: predicting an ordering of items, not scores for each
• Trains on pairs (where first point is to be ranked higher or equal to second)
  – Extremely efficient
  – Uses cross entropy cost (probabilistic model)
  – Uses gradient descent to set weights
  – Restarts to escape local minima
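The pairwise cross entropy cost can be illustrated with a small sketch. It models the probability that item i ranks above item j as a logistic function of the score difference and takes the cross entropy against a target probability. As a loudly labeled simplification, the scorer here is linear rather than a neural net, and the feature vectors, learning rate, and function names are invented for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def pairwise_cost(score_i, score_j, target=1.0):
    """Cross entropy between target and P(i ranked above j) = sigmoid(s_i - s_j)."""
    diff = score_i - score_j
    # log(1 + exp(-diff)), written to avoid overflow for large |diff|.
    log_term = max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return (1.0 - target) * diff + log_term


def sgd_step(weights, x_i, x_j, target=1.0, learning_rate=0.1):
    """One gradient-descent update for a linear scorer s(x) = w . x (a simplification)."""
    score_i = sum(w * a for w, a in zip(weights, x_i))
    score_j = sum(w * b for w, b in zip(weights, x_j))
    # Derivative of the cost with respect to the score difference.
    grad_scale = sigmoid(score_i - score_j) - target
    return [w - learning_rate * grad_scale * (a - b)
            for w, a, b in zip(weights, x_i, x_j)]


# Toy pair (made up): the first feature vector should be ranked higher.
weights = [0.0, 0.0]
preferred, other = [1.0, 0.2], [0.1, 0.9]
for _ in range(50):
    weights = sgd_step(weights, preferred, other)
final_cost = pairwise_cost(
    sum(w * a for w, a in zip(weights, preferred)),
    sum(w * b for w, b in zip(weights, other)),
)
```

Because the cost depends only on score differences, the model learns an ordering rather than calibrated per-item scores, which is the point the slide makes.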
18
Outline
• Distributional model of user interactions
• Rich set of user interaction features
• Models for predicting user preferences
• Experimental evaluation
19
Evaluation Metrics
• Task: predict user preferences
• Pairwise agreement:
  – For comparison with previous work
  – Useful for ranking and other applications
• Precision for a query:
  – Fraction of pairs predicted that agree with preferences derived from human ratings
• Recall for a query:
  – Fraction of human-rated preferences predicted correctly
• Average Precision and Recall across all queries
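These definitions translate directly into set operations. The sketch below assumes preferences are stored as (preferred, less_preferred) pairs and counts any predicted pair that is absent from the judged set as a disagreement; how pairs without a human judgment are handled is not stated on the slide, so that choice is an assumption.

```python
def pairwise_precision_recall(predicted, judged):
    """Precision and recall of predicted preference pairs for one query.

    predicted, judged: sets of (preferred_url, less_preferred_url) pairs.
    """
    correct = len(predicted & judged)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(judged) if judged else 0.0
    return precision, recall


def average_over_queries(per_query):
    """Average a list of (precision, recall) tuples, one tuple per query."""
    if not per_query:
        return 0.0, 0.0
    n = len(per_query)
    return (sum(p for p, _ in per_query) / n, sum(r for _, r in per_query) / n)


# Toy query (made up): one of two predictions agrees with three judged pairs.
judged = {("a", "b"), ("a", "c"), ("b", "c")}
predicted = {("a", "b"), ("c", "b")}
precision, recall = pairwise_precision_recall(predicted, judged)
```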
20
Datasets
• Explicit judgments
  – 3,500 queries, top 10 results, relevance ratings converted to pairwise preferences for each query
• User behavior data
  – Opt-in client-side instrumentation
  – Anonymized UserID, time, visited page
  – Detect queries submitted to MSN Search engine
  – Subsequent visited pages
• 120,000 instances of these 3,500 queries submitted at least 2 times over 21 days
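Converting per-result relevance ratings into pairwise preferences might look like the following. The rating scale and tie handling are not given on the slide; this sketch assumes a higher number means more relevant and that equally rated results produce no preference.

```python
from itertools import combinations


def ratings_to_preferences(ratings):
    """Turn {url: relevance rating} for one query into (preferred, other) pairs.

    Assumes a higher rating means more relevant; ties yield no preference.
    """
    preferences = set()
    for (url_a, rating_a), (url_b, rating_b) in combinations(ratings.items(), 2):
        if rating_a > rating_b:
            preferences.add((url_a, url_b))
        elif rating_b > rating_a:
            preferences.add((url_b, url_a))
    return preferences


# Toy ratings (made up): "a" outranks both "b" and "c", which are tied.
prefs = ratings_to_preferences({"a": 3, "b": 1, "c": 1})
```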
21
Methods Compared
Preferences inferred by:
• Current search engine ranking: Baseline
  – Result i > Result j iff i > j
• Clickthrough model: SA+N
• Clickthrough distributional model: CD
• Full user behavior model: UserBehavior
22
Results: Predicting User Preferences
[Figure: Precision (0.6 to 0.8) vs. Recall (0 to 0.4) for SA+N, CD, UserBehavior, and Baseline]
• Baseline < SA+N < CD << UserBehavior
• Rich user behavior features result in dramatic improvement
23
Contribution of Feature Types
[Figure: Precision (0.5 to 0.8) vs. Recall (0.01 to 0.45) for Clickthrough, Presentation, and Browsing features]
• Presentation features not helpful
• Browsing features: higher precision, lower recall
• Clickthrough features > CD: due to learning
24
Amount of Interaction Data
[Figure: Precision (0.65 to 0.85) vs. Recall (0.01 to 0.49); series: 2 or more, 10 or more, 20 or more]
• Prediction accuracy for varying amount of user interactions per query
• Slight increase in Recall, substantial increase in Precision
25
Learning Curve
[Figure: Recall (0 to 0.2) vs. days of user interactions observed (7, 12, 17, 21) for ClickDeviation and UserBehavior]
• Minimum precision of 0.7
• Recall increases substantially with more days of user interactions
26
Experiments Summary
• Clickthrough distributional model: more accurate than previously published work
• Rich user behavior features: dramatic accuracy improvement
• Accuracy increases for frequent queries and longer observation period
27
Some Applications
• Web search ranking (next talk):
  – Can use preference predictions to re-rank results
  – Can integrate features into ranking algorithms
• Identifying and answering navigational queries
  – Can tune model to focus on top 1 result
  – Supports classification or ranking methods
  – Details in Agichtein & Zheng [KDD 2006]
• Automatic evaluation: augment explicit relevance judgments
28
Conclusions
• General framework for training rich user interaction models
• Robust techniques for inferring user relevance preferences
• High-accuracy preference prediction in a large scale evaluation
29
Thank you
Text Mining, Search, and Navigation group: http://research.microsoft.com/tmsn/
Adaptive Systems and Interaction group: http://research.microsoft.com/adapt/
Microsoft Research
30
Presentation Features
• Query terms in Title, Summary, URL
• Position of result
• Length of URL
• Depth of URL
• …
31
Clickthrough Features
• Fraction of clicks on URL
• Deviation from “expected” given result position
• Time to click
• Time to first click in “session”
• Deviation from average time for query
32
Browsing Features
• Time on URL
• Cumulative time on URL (CuriousBrowser)
• Deviation from average time on URL
  – Averaged over the “user”
  – Averaged over all results for the query
• Number of subsequent non-result URLs