IMPROVING PERSONALIZED SEARCH ON SOCIAL WEB BASED ON SIMILARITIES BETWEEN USERS

Zhenghua Xu*, Thomas Lukasiewicz*, Oana Tifrea-Marciuska*

* Department of Computer Science, University of Oxford, UK
{zhenghua.xu,thomas.lukasiewicz,oana.tifrea}@cs.ox.ac.uk

SUM 2014

Social Web Search Personalization

• Tags are valuable resources for Social Web personalization
  – Good summaries of the corresponding documents
  – Ideal data for privacy-enhanced personalization

• Collaborative tagging on the Social Web is called folksonomy.

Example

A folksonomy

• Users and documents

• Tags assigned by users to documents

[Figure: example folksonomy with the users Alice, Bob, and Carl; the tags Comedy and Action; and the documents d1 (English comedy movie), d2 (Chinese action movie), and d3 (Chinese comedy movie).]

Personalization using folksonomy

State-of-the-art approaches that use social tags to personalize search on the Social Web generally rely on the similarity between two profiles:

• User profile (the tags a user has assigned to all online documents)
  – Characterizes the user's preferences (e.g., p_Alice)

• General document profile (the tags assigned to a document)
  – Characterizes the social summary of the online document (e.g., p_{d1})
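The two profiles can be sketched as tag-count vectors built from bookmarks of the form (user, tag, document). This is a minimal illustration, not the authors' implementation: the bookmark list below is hypothetical (the tag assignments shown in the slide figure are not recoverable from this transcript), and `build_profiles` is a name introduced here.

```python
from collections import Counter, defaultdict

# Hypothetical bookmarks (user, tag, document); NOT the data from the slides.
bookmarks = [
    ("alice", "comedy", "d1"),
    ("alice", "comedy", "d3"),
    ("bob", "action", "d2"),
    ("bob", "action", "d3"),
    ("carl", "comedy", "d1"),
]


def build_profiles(bookmarks):
    """Return (user_profiles, document_profiles) as tag-count vectors."""
    user_profiles = defaultdict(Counter)
    doc_profiles = defaultdict(Counter)
    for user, tag, doc in bookmarks:
        user_profiles[user][tag] += 1  # tags this user assigned to any document
        doc_profiles[doc][tag] += 1    # tags any user assigned to this document
    return dict(user_profiles), dict(doc_profiles)


user_profiles, doc_profiles = build_profiles(bookmarks)
```

In this toy data, the general document profile of d3 mixes Alice's and Bob's tags with equal weight, which is the behaviour the later slides criticize.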

Similarity measure

Cosine similarity
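The formula on this slide is not preserved in the transcript. As a sketch, the standard cosine similarity between two sparse tag vectors can be written as follows; representing profiles as dictionaries and returning 0 for an empty profile are choices made here, not something the slides specify.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two sparse tag vectors given as dicts."""
    dot = sum(weight * q.get(tag, 0.0) for tag, weight in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # convention chosen here for empty profiles
    return dot / (norm_p * norm_q)
```

Because the measure is normalized, a profile with many tag assignments and one with few can still be maximally similar if they point in the same direction.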

Example

• Carl issues the query “Interesting Chinese film”.

• The desired personalized ranking is d3 > d1 > d2.

[Figure: the example folksonomy, showing Carl, the tags Comedy and Action, and the documents d1 (English comedy movie), d2 (Chinese action movie), and d3 (Chinese comedy movie).]

State of the art: UP-PR

User Profile Personalized Ranking (UP-PR) [1]

• The personalized ranking function combines two terms:
  – Score(q, d), the non-personalized textual matching score between query and document;
  – Sim(p_u, p_d), the personalizing factor measuring the similarity between the user profile and the general document profile.

Example UP-PR

• Using UP-PR with α = 0.5, Score(q, d1) = 0.68, Score(q, d2) = 0.55, and Score(q, d3) = 0.5, we compute the ranking scores.

• The resulting personalized ranking is d1 > d3 > d2.

• The desired ranking, however, is d3 > d1 > d2.
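The computation shown on this slide is not preserved in the transcript. The sketch below assumes that UP-PR is a convex combination in which α weights the personalizing factor; that form is an assumption. The text scores are the ones quoted on the slide, while the profile similarities are hypothetical placeholders chosen only for illustration.

```python
def up_pr_score(text_score, profile_sim, alpha=0.5):
    """Assumed form: alpha * Sim(p_u, p_d) + (1 - alpha) * Score(q, d)."""
    return alpha * profile_sim + (1.0 - alpha) * text_score


text_scores = {"d1": 0.68, "d2": 0.55, "d3": 0.5}  # quoted on the slide
profile_sims = {"d1": 1.0, "d2": 0.0, "d3": 0.7}   # hypothetical values

scores = {d: up_pr_score(text_scores[d], profile_sims[d]) for d in text_scores}
ranking = sorted(scores, key=scores.get, reverse=True)
```

With these placeholder similarities, d1 stays ahead of d3 because d3 has both a lower text score and a lower similarity to Carl's profile, which mirrors the undesired ranking reported on the slide.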

Example UP-PR

• This ranking (d1 > d3 > d2) is intuitively inaccurate because
  – Sim(p_Carl, p_{d3}) should have a value similar to Sim(p_Carl, p_{d1});
  – Score(q, d3) and Score(q, d2) should be the highest text matching scores.

Query: “Interesting Chinese film”

[Figure: the example folksonomy, showing Carl, the tags Comedy and Action, and the documents d1 (English comedy movie), d2 (Chinese action movie), and d3 (Chinese comedy movie).]

State of the art: SoPRa

Social Personalized Ranking (SoPRa) [2]

• The personalized ranking function combines three terms:
  – Sim(p_u, p_d), the personalizing factor measuring the similarity between the user profile and the general document profile;
  – Sim(q, p_d), the social matching score, which measures how relevant the social summary of a document d is to q;
  – Score(q, d), the non-personalized textual matching score between query and document.
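The ranking formula itself is not preserved in this transcript. The sketch below only illustrates how the three terms could be combined, assuming a plain weighted sum with one weight per term and a query treated as a bag of terms matched against the document's tags. The function name, the weight names `w_profile`, `w_social`, and `w_text`, and the way the deck's α, β, and δ would map onto them are all assumptions.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse vectors given as dicts."""
    dot = sum(w * q.get(t, 0.0) for t, w in p.items())
    norm = math.sqrt(sum(w * w for w in p.values())) * math.sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0


def three_term_score(user_profile, doc_profile, query_terms, text_score,
                     w_profile=0.5, w_social=0.5, w_text=0.5):
    """Weighted sum of personalizing factor, social matching score, and text score."""
    query_vector = {term: 1.0 for term in query_terms}  # query as a bag of terms
    personal = cosine(user_profile, doc_profile)        # Sim(p_u, p_d)
    social = cosine(query_vector, doc_profile)          # Sim(q, p_d)
    return w_profile * personal + w_social * social + w_text * text_score
```

The social matching term lets a document with little text still score well when its tags match the query, which is relevant for documents such as online videos.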

Example SoPRa

• Using SoPRa with α = β = δ = 0.5, Score(q, d1) = 0.68, Score(q, d2) = 0.55, and Score(q, d3) = 0.5, we compute the ranking scores.

• The personalized ranking is d1 > d3 > d2, with a narrow gap between d1 and d3.

• The desired ranking is d3 > d1 > d2.

• Score(q, d3) is low because d3 is an online video that has little text.

Why does it not work?

For the query “Interesting Chinese film” we want d3 > d1 > d2.

[Figure: the example folksonomy, showing Carl, the tags Comedy and Action, and the documents d1 (English comedy movie), d2 (Chinese action movie), and d3 (Chinese comedy movie).]

• The general document profile of d3 does not correctly characterize Carl’s real perception of d3, since tags from all users are treated equally and the tag from Bob introduces a bias.
  – Remedy: do not treat tags from all users with equal importance in the document profile.

• Carl did not tag d3, so the information used for preference modeling is not comprehensive.
  – Remedy: extend the user profile with more useful information.

Reasons

• Different users have different perceptions of the same document
  – Not all tags assigned by other users are equally helpful for summarizing a user’s real perception of a document
  – The general document profile, which treats tags from all users with equal importance, cannot properly summarize a specific user’s personal perception

• Online annotations are sparse
  – The user profile, based only on the tags assigned by the corresponding user, may not contain sufficient information to comprehensively characterize the user’s preferences

Our approach: D-PR

Dual Personalized Ranking

• Two novel profiles
  – Personalized document profile: each user has a personalized document profile for a document, which characterizes his/her perception of this document
  – Extended user profile: the sum of all personalized document profiles of u, which characterizes u’s preferences more comprehensively
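As a sketch of the extended user profile, the function below sums a user's personalized document profiles tag by tag. The input values are hypothetical, and `extended_user_profile` is a name introduced here for illustration.

```python
from collections import Counter


def extended_user_profile(personalized_doc_profiles):
    """Sum a user's personalized document profiles p_{u,d} over all documents d."""
    extended = Counter()
    for profile in personalized_doc_profiles.values():
        for tag, weight in profile.items():
            extended[tag] += weight
    return dict(extended)


# Hypothetical personalized document profiles for one user.
profiles_for_carl = {
    "d1": {"comedy": 1.0},
    "d3": {"comedy": 0.9, "action": 0.2},
}
extended = extended_user_profile(profiles_for_carl)
```

Note that the extended profile can contain tags the user never assigned, here "action", because they come from similar users' annotations.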

Our approach: D-PR

Dual Personalized Ranking

• The personalized ranking function combines three terms:
  – Sim(p'_u, p_{u,d}), the personalizing factor measuring the similarity between the extended user profile p'_u and the personalized document profile p_{u,d};
  – Sim(q, p_d), the social matching score, which measures how relevant the social summary of a document d is to q;
  – Score(q, d), the non-personalized textual matching score between query and document.

Personalized Document Profile

• Users who have similar perceptions of existing documents will very likely also share similar perceptions of future documents.

• Given a document d and a user u, we use the perception similarities between u and other users as weights to sum up the tags assigned to d by the users who have high perception similarities with u.

• The perception similarity of two users can thus be measured by the similarity of their profiles; this is called the profile-based perception similarity.

Estimate of Personalized Document Profile

1. Select the set of users U_T whose perception similarity with u is higher than a predefined threshold T.

2. Estimate u’s personalized document profile relative to a document d (denoted p_{u,d}) by using the perception similarities as weights to sum up the tags assigned to d by the users belonging to U_T, where
   – v_{ui,d} is a weighted vector of tags in which the weight of a tag is the number of times that ui assigned the tag to d;
   – U_d is the set of users who annotated document d.
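The two steps above can be sketched as follows. The exact formula is not preserved in this transcript, so this assumes that the perception similarity is the cosine similarity of the two user profiles and that p_{u,d} is the similarity-weighted sum of v_{ui,d} over users in both U_T and U_d. All data and function names below are hypothetical.

```python
import math
from collections import defaultdict


def cosine(p, q):
    """Cosine similarity between two sparse tag vectors given as dicts."""
    dot = sum(w * q.get(t, 0.0) for t, w in p.items())
    norm = math.sqrt(sum(w * w for w in p.values())) * math.sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0


def personalized_document_profile(u, d, user_profiles, tag_vectors, threshold):
    """Estimate p_{u,d} from the tags that sufficiently similar users assigned to d.

    user_profiles: user -> tag vector (the user's profile)
    tag_vectors:   (user, document) -> tag-count vector v_{ui,d}
    """
    profile = defaultdict(float)
    for (ui, doc), v in tag_vectors.items():
        if doc != d:
            continue  # only users who annotated d (the set U_d) contribute
        sim = cosine(user_profiles[u], user_profiles[ui])
        if sim <= threshold:
            continue  # step 1: keep only users whose similarity exceeds T
        for tag, count in v.items():
            profile[tag] += sim * count  # step 2: similarity-weighted sum
    return dict(profile)


# Hypothetical data; not the values from the slides.
user_profiles = {
    "carl": {"comedy": 2},
    "alice": {"comedy": 3, "action": 1},
    "bob": {"action": 2},
}
tag_vectors = {
    ("alice", "d3"): {"comedy": 1},
    ("bob", "d3"): {"action": 1},
}
p_carl_d3 = personalized_document_profile("carl", "d3", user_profiles, tag_vectors, 0.5)
```

In this toy data, Bob's profile shares no tags with Carl's, so his "action" tag on d3 is filtered out and only Alice's "comedy" tag contributes, weighted by her similarity to Carl.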

Example D-PR

• Compute the perception similarities between Carl and the other users.

• We set the threshold T to 0.5; therefore, U_T = {Alice, Bob, Carl}.

Example D-PR

With α = β = δ = 0.5, we get the desired ranking d3 > d1 > d2.

Analysis

• D-PR addresses the profile modeling problems of the state-of-the-art approaches in two ways:
  – It uses the perception similarities to weaken the influence of tags assigned by users who have different perceptions.
  – It obtains a personalized document profile for each document, so the extended user profile, computed by summing up all these personalized document profiles, contains more information and characterizes the user’s preferences more comprehensively.

Experimental Study

The dataset from [3] contains more than 100 000 URLs of online documents and their social annotations retrieved from Delicious.com.

Evaluation Methodology

• Obtaining relevance judgments is an expensive, time-consuming process
  – Who does it?
  – What are the instructions?
  – What is the level of agreement?

Evaluation Methodology

• The reciprocal rank (RR) is the reciprocal of the rank at which the first relevant document is retrieved (very sensitive to rank position).

• Mean Reciprocal Rank (MRR) is the average of the reciprocal ranks over a set of queries: MRR = (1/n) · Σ_{i=1}^{n} 1/r_i

• Here, r_i is the ranking position of the first relevant document for the i-th user query in the personalized search result ordering, and n is the total number of tested queries.

MRR

RR = 1/1 = 1

RR = 1/2 = 0.5

MRR = (1+0.5)/2 = 0.75
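The worked example above can be checked with a few lines of code. This is a minimal sketch of the standard MRR definition; the function name and the choice to reject an empty query set are made here for illustration.

```python
def mean_reciprocal_rank(first_relevant_ranks):
    """Average of 1/r_i, where r_i is the 1-based rank of the first relevant result."""
    if not first_relevant_ranks:
        raise ValueError("at least one query is required")
    return sum(1.0 / r for r in first_relevant_ranks) / len(first_relevant_ranks)


# Two queries whose first relevant documents appear at ranks 1 and 2.
mrr = mean_reciprocal_rank([1, 2])
```

Here `mrr` equals 0.75, matching the slide.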

Evaluation Methodology

• It has been shown that if a user annotates a document with some tags, the same user is very likely to visit this document when it appears in the results of a search that uses those tags as the query.
  – Therefore, for each bookmark (u, t, d), we create a query q = t, which is issued by user u and aims at finding document d.
  – We remove all selected bookmarks to avoid a bias that would promote the annotated document.
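This protocol can be sketched as follows, under some assumptions: `rank_documents` is a hypothetical callable standing in for any ranking function (such as D-PR), the test bookmarks are selected by index, and a target document missing from the results contributes a reciprocal rank of 0. None of these details are specified on the slide.

```python
def evaluate_mrr(bookmarks, test_indices, rank_documents):
    """MRR over queries generated from held-out bookmarks (user, tag, document)."""
    held_out = set(test_indices)
    if not held_out:
        raise ValueError("at least one test bookmark is required")
    # Remove all selected bookmarks so they cannot promote their own document.
    training = [b for i, b in enumerate(bookmarks) if i not in held_out]
    reciprocal_ranks = []
    for i in sorted(held_out):
        user, tag, doc = bookmarks[i]
        results = rank_documents(user, tag, training)  # query q = t issued by u
        if doc in results:
            reciprocal_ranks.append(1.0 / (results.index(doc) + 1))
        else:
            reciprocal_ranks.append(0.0)  # convention chosen here
    return sum(reciprocal_ranks) / len(reciprocal_ranks)


# Toy ranker that ignores its inputs; a real ranker would use the training data.
def toy_ranker(user, query, training):
    return ["d1", "d2"]


bookmarks = [("u", "t1", "d1"), ("u", "t2", "d2"), ("v", "t1", "d1")]
mrr = evaluate_mrr(bookmarks, [0, 1], toy_ranker)
```

With the toy ranker, the two held-out bookmarks find their documents at ranks 1 and 2, so `mrr` is 0.75.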

Summary and Outlook

• In this paper, we have proposed a dual personalized ranking (D-PR) function to improve personalized ranking of search on the Social Web via
  – an extended user profile;
  – a personalized document profile.

• In future research, we will apply our D-PR ranking function to other Social Web datasets to evaluate its performance on various kinds of social resources.

Questions?

References

[1] S. Xu, S. Bao, B. Fei, Z. Su, and Y. Yu. Exploring folksonomy for personalized search. In Proceedings of SIGIR, pages 155–162, 2008.

[2] M. R. Bouadjenek, H. Hacid, and M. Bouzeghoub. SoPRa: A new social personalized ranking function for improving Web search. In Proceedings of SIGIR, pages 861–864, 2013.

[3] M. G. Noll and C. Meinel. The metadata triumvirate: Social annotations, anchor texts and search queries. In Proceedings of WI-IAT, pages 640–647, 2008.