Personalizing Web Search using Long Term Browsing History
Nicolaas Matthijs, Cambridge
Filip Radlinski, Microsoft
In Proceedings of WSDM 2011
Query: “pia workshop”
[Figure: search results for the query, with the relevant result marked]
Outline
  Approaches to personalization
  The proposed personalization strategy
  Evaluation metrics
  Results
  Conclusions and future work
Approaches to Personalization
Observed user interactions
  Short-term interests
    Sriram et al. [24] and [6]: session data is too sparse to personalize
  Longer-term interests
    [23, 16]: model users by classifying previously visited Web pages
    Joachims [11]: user click-through data to learn a search function
    PClink [7] and Teevan et al. [28]
    Other related approaches: [20, 25, 26]
Representing the user
  Teevan et al. [28]: rich keyword-based representations, no use of web page characteristics
Commercial personalization systems
  Google
  Yahoo!
Slide callouts: rich user profile; promote URLs
Personalization Strategy
User Profile Generation Workflow
Input: Browsing History
Data Extraction (produces User Profile Terms)
  Title unigrams
  Metadata description unigrams
  Full text unigrams
  Metadata keywords
  Extracted terms
  Noun phrases
Filtering (produces filtered User Profile Terms)
  WordNet dictionary filtering
  Google N-Gram filtering
  No filtering
Weighting (produces User Profile Terms and Weights)
  TF weighting
  TFxIDF weighting
  BM25 weighting
Also shown in the workflow
  Visited URLs + number of visits
  Previous searches & click-through data
Personalized Search
Browsing history is collected by a Firefox add-on: AlterEgo
[Figure: a query is issued against personalized search; the browsing history yields profile terms with counts (dog 1, cat 10, india 2, mit 4, search 93, amherst 12, vegas 1), built up through data extraction and then term weighting]
Term Weighting
TF: term frequency, w_{TF}(t_i)
  Example (TF): 2 / 100 = 0.02
TF-IDF: w_{TFIDF}(t_i) = \frac{1}{\log(DF_{t_i})} \times w_{TF}(t_i)
  Example (TF-IDF), values as printed on the slide: 2/100 * 1/log(103/107) = 0.08
[Figure: example profile terms annotated with their weights]
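The TF and TF-IDF weights on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes TF is a relative frequency (matching the 2/100 example), that the logarithm is base 10, and that document frequencies arrive in a hypothetical `doc_freq` mapping. The function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Mapping


def tf_weights(terms: Iterable[str]) -> Dict[str, float]:
    """Relative term frequency: occurrences of a term / total occurrences."""
    counts = Counter(terms)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: count / total for term, count in counts.items()}


def tfidf_weights(tf: Mapping[str, float],
                  doc_freq: Mapping[str, int]) -> Dict[str, float]:
    """w_TFIDF(t) = w_TF(t) / log(DF_t).

    Terms with DF_t <= 1 are skipped because log(DF_t) would be zero or
    undefined. Base-10 log is an assumption; the slide does not name a base.
    """
    weights: Dict[str, float] = {}
    for term, w_tf in tf.items():
        df = doc_freq.get(term, 0)
        if df > 1:
            weights[term] = w_tf / math.log10(df)
    return weights
```

Dividing by log(DF) lowers the weight of terms that appear in many documents, so common words contribute less to the profile than terms specific to the user.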
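Term Weighting
Personalized BM25
w_{pBM25}(t_i) = \log \frac{(r_{t_i} + 0.5)(N - n_{t_i} + 0.5)}{(n_{t_i} + 0.5)(R - r_{t_i} + 0.5)}
  World: N documents, of which n_{t_i} contain t_i
  Browsing history: R documents, of which r_{t_i} contain t_i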
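The personalized BM25 weight maps directly onto a small function. This is a sketch under the reading that N and n count documents in the world and R and r count documents in the user's browsing history. The function name, argument names, and the natural-log base are assumptions.

```python
import math


def pbm25_weight(r_t: int, R: int, n_t: int, N: int) -> float:
    """log[(r_t + 0.5)(N - n_t + 0.5) / ((n_t + 0.5)(R - r_t + 0.5))].

    r_t: browsing-history documents containing the term
    R:   browsing-history documents in total
    n_t: world documents containing the term
    N:   world documents in total
    """
    if not (0 <= r_t <= R and 0 <= n_t <= N):
        raise ValueError("counts must satisfy 0 <= r_t <= R and 0 <= n_t <= N")
    numerator = (r_t + 0.5) * (N - n_t + 0.5)
    denominator = (n_t + 0.5) * (R - r_t + 0.5)
    return math.log(numerator / denominator)
```

The 0.5 terms keep the ratio finite when a count is zero. A term that is common in the user's history but rare in the world gets a large positive weight.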
Re-ranking
Use the user profile to re-rank the top results returned by a search engine
Candidate document vs. snippets
  Snippets are more effective (Teevan et al. [28])
  Snippets allow a straightforward personalization implementation
Scoring methods
  Matching: for each term that occurs in both the snippet and the user profile, its weight is added to the snippet's score
  Unique matching: counts each unique term once
  Language model: a language model for the user profile, with term weights used as frequency counts
  PClink: Dou et al. [7]
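The first three scoring methods can be sketched as follows. This is an illustrative reading of the slide, not the authors' code. The language-model variant uses add-one smoothing over the profile vocabulary, which is an assumption because the slide does not give its formula. The `rerank` helper and all names are hypothetical.

```python
import math
from typing import Callable, List, Mapping, Sequence

Profile = Mapping[str, float]


def matching_score(snippet_terms: Sequence[str], profile: Profile) -> float:
    """Add the profile weight of every matching term occurrence."""
    return sum(profile.get(t, 0.0) for t in snippet_terms)


def unique_matching_score(snippet_terms: Sequence[str], profile: Profile) -> float:
    """Like matching, but each distinct term is counted once."""
    return sum(profile.get(t, 0.0) for t in set(snippet_terms))


def language_model_score(snippet_terms: Sequence[str], profile: Profile) -> float:
    """Log-probability of the snippet under a unigram model of the profile.

    Profile weights act as frequency counts; add-one smoothing with one
    extra vocabulary slot for unseen terms is an assumption.
    """
    total = sum(profile.values())
    vocab = len(profile) + 1
    return sum(math.log((profile.get(t, 0.0) + 1.0) / (total + vocab))
               for t in snippet_terms)


def rerank(snippets: Sequence[Sequence[str]], profile: Profile,
           score_fn: Callable[[Sequence[str], Profile], float] = matching_score
           ) -> List[int]:
    """Return snippet indices by descending score; ties keep original order."""
    scores = [score_fn(s, profile) for s in snippets]
    return sorted(range(len(snippets)), key=lambda i: -scores[i])
```

Because `sorted` is stable, snippets with equal scores stay in the search engine's original order.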
Evaluation Metrics
Relevance judgements
  NDCG@10 = \frac{1}{Z} \sum_{i=1}^{10} \frac{2^{rel_i} - 1}{\log_2(1 + i)}
Side-by-side
  Show two alternative rankings side by side and ask users to vote for the best
Clickthrough-based
  Look at the query and click logs from a large search engine
Interleaved
  New metric for personalized search
  Combine the results of two search rankings (alternating between results, omitting duplicates)
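The NDCG@10 formula can be computed as below. This is a sketch with two assumptions: Z is taken as the DCG of the ideal ordering of the same judgements, so a perfect ranking scores 1, and relevance is graded numerically, for example 2 for Very Relevant, 1 for Relevant and 0 for Non-Relevant.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Sum of (2^rel_i - 1) / log2(1 + i) over ranks i = 1..k."""
    return sum((2 ** rel - 1) / math.log2(1 + i)
               for i, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG@k divided by Z, the DCG@k of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant results, so the score is defined as 0
    return dcg_at_k(relevances, k) / ideal
```

For example, `ndcg_at_k([2, 1, 0])` is 1.0 because the list is already in ideal order, while any ordering that pushes the most relevant result down scores below 1.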
Offline Evaluation
6 participants, 2 months of browsing history
Judge relevance of top 50 pages returned by Google for 12 queries
  25 general queries (16 from TREC 2009 Web search track), each participant will judge 6
  Most recent 40 search queries, judge 5
Each participant took about 2.5 hours to complete
Offline Evaluation
Personalization strategies (Rel: relative weighting)

            Profile parameters                                                                          Ranking parameters
Strategy    Full text  Title  Meta keywords  Meta descr.  Extracted terms  Noun phrases  Term weights  Snippet scoring  Google rank  URLs visited
MaxNDCG     -          Rel    Rel            -            -                Rel           TF-IDF        LM               1/log        v=10
MaxQuer     -          -      -              -            Rel              Rel           TF            LM               1/log        v=10
MaxNoRank   -          -      Rel            -            -                -             TF            LM               -            v=10
MaxBestPar  -          Rel    Rel            -            Rel              -             pBM25         LM               1/log        v=10

MaxNDCG: yields the highest average NDCG
MaxQuer: improves the most queries
MaxNoRank: the method with the highest NDCG that does not take the original Google ranking into account
MaxBestPar: obtained by greedily selecting each parameter sequentially
Offline Evaluation
Offline evaluation performance

Method              Average NDCG     +/=/- Queries
Google              0.502 ± 0.067    -
Teevan et al. [28]  0.518 ± 0.062    44/0/28
PClink              0.533 ± 0.057    13/58/1
MaxNDCG             0.573 ± 0.042    48/1/23
MaxQuer             0.567 ± 0.045    52/2/18
MaxNoRank           0.520 ± 0.060    13/52/7
MaxBestPar          0.566 ± 0.044    45/5/22

MaxNDCG and MaxQuer are both significantly better
Interestingly, MaxNoRank is significantly better than Google and Teevan et al. (may be due to overfitting on the small offline data)
PClink improves the fewest queries, but is better than Teevan et al. on average NDCG
Offline Evaluation
[Figure: distribution of relevance at rank for Google and MaxNDCG rankings]
3600 relevance judgements collected: 9% Very Relevant, 32% Relevant, 58% Non-Relevant
Google: places many Very Relevant results in the Top 5
MaxNDCG: adds more Very Relevant results into the Top 5, and succeeds in adding Very Relevant results between the Top 5 and Top 10
Online Evaluation
Large-scale interleaved evaluation, with users performing real day-to-day searches
The first 50 results were requested from Google; personalization strategies were picked randomly
The Team-Draft interleaving algorithm [18] was used to produce a combined ranking
41 users, 7997 queries, 6033 query impressions; 6534 queries and 5335 query impressions received a click
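Team-Draft interleaving, named on this slide, can be sketched as below. This follows the commonly described form of the algorithm: the two rankings take turns picking their best unplaced result, the team with fewer picks goes first, and a coin flip breaks ties. The function names and the simple majority-of-clicks vote are illustrative and not taken from the slides.

```python
import random
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def team_draft_interleave(ranking_a: Sequence[str], ranking_b: Sequence[str],
                          rng: random.Random) -> Tuple[List[str], Dict[str, str]]:
    """Return the combined ranking and the team ("A" or "B") of each result."""
    interleaved: List[str] = []
    team_of: Dict[str, str] = {}
    picks = {"A": 0, "B": 0}

    def next_unplaced(ranking: Sequence[str]) -> Optional[str]:
        for doc in ranking:
            if doc not in team_of:
                return doc
        return None

    while True:
        next_a, next_b = next_unplaced(ranking_a), next_unplaced(ranking_b)
        if next_a is None or next_b is None:
            break  # stop once either ranking has nothing new to contribute
        a_picks = picks["A"] < picks["B"] or (
            picks["A"] == picks["B"] and rng.random() < 0.5)
        team, doc = ("A", next_a) if a_picks else ("B", next_b)
        interleaved.append(doc)
        team_of[doc] = team
        picks[team] += 1
    return interleaved, team_of


def vote(team_of: Dict[str, str], clicked: Iterable[str]) -> str:
    """Credit each click to the team that placed the result."""
    clicks = list(clicked)
    a = sum(1 for d in clicks if team_of.get(d) == "A")
    b = sum(1 for d in clicks if team_of.get(d) == "B")
    return "A" if a > b else "B" if b > a else "tie"
```

When one ranking is a re-ordering of the other, as with re-ranked Google results, every result ends up in the combined list exactly once and the two teams place the same number of results, give or take one.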
Online Evaluation
Results of online interleaving test

Method      Queries  Google Vote   Re-ranked Vote
MaxNDCG     2090     624 (39.5%)   955 (60.5%)
MaxQuer     2273     812 (47.3%)   905 (52.7%)
MaxBestPar  2171     734 (44.8%)   906 (55.2%)

Queries impacted by personalization

Method      Unchanged      Improved      Deteriorated
MaxNDCG     1419 (67.9%)   500 (23.9%)   171 (8.2%)
MaxQuer     1639 (72.1%)   423 (18.6%)   211 (9.3%)
MaxBestPar  1485 (68.4%)   467 (21.5%)   219 (10.1%)
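The vote percentages in the interleaving results appear to be taken over queries that received a vote, not over all queries. For MaxNDCG, for instance, 624 + 955 = 1579, which is less than the 2090 queries listed. The arithmetic can be checked with a short sketch; the helper name is illustrative.

```python
from typing import Dict, Tuple

# (Google votes, re-ranked votes) as reported in the interleaving results.
votes: Dict[str, Tuple[int, int]] = {
    "MaxNDCG": (624, 955),
    "MaxQuer": (812, 905),
    "MaxBestPar": (734, 906),
}


def vote_shares(google_votes: int, reranked_votes: int) -> Tuple[float, float]:
    """Percentage of votes for each side, rounded to one decimal place."""
    total = google_votes + reranked_votes
    return (round(100 * google_votes / total, 1),
            round(100 * reranked_votes / total, 1))


shares = {method: vote_shares(g, r) for method, (g, r) in votes.items()}
```

The resulting shares reproduce the percentages in the table, which supports reading them as shares of the queries with a vote.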
Online Evaluation
[Figure: rank differences for deteriorated (light) and improved (dark) queries for MaxNDCG]
[Figure: degree of personalization per rank]
For a large majority of deteriorated queries, the clicked results lose only 1 rank
The majority of clicked results that improved a query gain 1 rank
The gains from personalization are on average more than double the losses
MaxNDCG is the most effective personalization method
Conclusions
First large-scale personalized search and online evaluation work
Proposed personalization techniques: significantly outperform default Google and best previous ones
Key to model users: use characteristics and structures of Web pages
Long-term, rich user profile is beneficial
Future Exploration
Parameter extension
  Learning parameter weights
  Using other fields (e.g., headings in HTML) and learning their weights
Incorporating temporal information
  How much browsing history?
  Should the weights of older terms decay?
  How can page visit duration be used?
Making use of more personal data
Using extracted profiles for other purposes
Thank you!