1 Web Search Personalization via Social Bookmarking and Tagging Michael G. Noll & Christoph Meinel...

Post on 12-Jan-2016

219 views 3 download

Tags:

Transcript of 1 Web Search Personalization via Social Bookmarking and Tagging Michael G. Noll & Christoph Meinel...

1

Web Search Personalization

via Social Bookmarking andTagging

Michael G. Noll & Christoph MeinelHasso-Plattner-Institut an der Universit¨at Pots

dam, GermanyISWC 2007Advisor: Prof. Hsin-Hsi Chen Reporter: Yu-Hui Chang

2008/07/30

2

Introduction

2008/07/30 Y.H. Chang 3/27

Social bookmarking and tagging

• social bookmarking:– publicly sharing your bookmarks with others– including any additional metadata

• tagging / folksonomies:– Users annotate Documents with with a flat, un

structured list of keywords called Tags

TUDRtagging

2008/07/30 Y.H. Chang 4/27

Web search personalization

• integration of user-specific data to improve results / advertising

• two main approaches:1. modify user's query:

“nyt” > “new york times”

2. re-rank search results based on user profile

5

Personalization Technique

2008/07/30 Y.H. Chang 6/27

Personalization

• Input: – user profile + document profiles

• Via social bookmarking and tagging

• Algorithm:1. calculate Similarity(user, document) for all docs

2. sort documents by similarity from highest to lowest

• Output: – re-ranked search result list

2008/07/30 Y.H. Chang 7/27

Profile

• user profile– “Tagmarking”

• A user search “research internet security”=>he/she can click single buttonto bookmark a document with auto translate tag “rese

arch”, “internet”, ”security”

• document profiles– Communicating with the bookmarking service

over its web API

Tagmarking

2008/07/30 Y.H. Chang 8/27

Personalization

• Complete process of web search personalization• * every step done in real-time

2008/07/30 Y.H. Chang 9/27

Data Aggregation: User Profile

• User’s profile example

m tags

n documents

2008/07/30 Y.H. Chang 10/27

Data Aggregation: Doc. Profile

• Document’s profile example

m tags

n users

Mu

2008/07/30 Y.H. Chang 11/27

Similarity

• Similarity(u ,d)=puT ||p‧ d||

– ||pd||: simply normalization of document profile, reset matrix element with only 1 and 0 two values ( True or False)

2008/07/30 Y.H. Chang 12/27

Similarity example

• Similarity(u ,d)=puT ||p‧ d||

11…

1111……

13192

102134

=13*0+19*1+2*0+10*1+21*0+34*1=63

2008/07/30 Y.H. Chang 13/27

Similarity score properties

• the key factor: unmodified user profile– Promotes known and similar doc., demotes

those unknown or non-similar doc.• more sophisticated normalization for both user and

document is on-going

• Score of unknown document => 0!!!• most critical factor in practice:

– “do we have sufficient data to make all this work?”

2008/07/30 Y.H. Chang 14/27

Personalization

• system setup– server: social bookmarking service– client: browser add-on

• modification of search engine UI by updating the DOM tree of the search result pages in real-time

2008/07/30 Y.H. Chang 15/27

2008/07/30 Y.H. Chang 16/27

Re-rank example

• a user with a strong interest in information technology and network security

2008/07/30 Y.H. Chang 17/27

18

Experiment and Evaluation

2008/07/30 Y.H. Chang 19/27

Experiment

• Del.icio.us– Public social bookmarking service– Large user community

• Key question:• Quantitative analysis

– How many social annotations in practice?

• Qualitative analysis– Quality evaluation

2008/07/30 Y.H. Chang 20/27

Quantitative analysis

• test set– 140 “popular tags” on del.icio.us– 1400 search results link (top 10 results )

• totaling– 981,989 bookmarks– 20,498 tag annotations (2,300 unique)

2008/07/30 Y.H. Chang 21/27

Quantitative analysis

2008/07/30 Y.H. Chang 22/27

Quantitative analysis

• we can expect to personalize approx. 85% (in the 1st page) per query in practice

Percentage of links with at least 1 tag

2008/07/30 Y.H. Chang 23/27

Qualitative analysis

• For each query, participants were presented two search result lists:

1. original list from Google Search2. The personalized version

• 8 participants evaluate the top 10 results for 13 queries each

– Participant’s job: researchers, web masters, software developers, system administrators

– The average number of bookmarks for a participant was 153.

2008/07/30 Y.H. Chang 24/27

Qualitative analysis

29.80%

70.20%6.70%

• Some discussions:– “Expert” user profiles

– Disambiguate words and contexts

Personalized version Better

Worse

Equal

25

Conclusion

2008/07/30 Y.H. Chang 26/27

Conclusion

• proposed personalization approach is feasible and viable in practice:– already sufficient user-supplied metadata available– initial evaluation of personalization quality shows very

promising results

• Open Access:• http://www.michael-noll.com/dmoz100k06/ - dat

a set• http://www.michael-noll.com/delicious-api/ - scri

pts

27

Thank you!!