Making DB and IR (socially) meaningful Sihem Amer-Yahia, Human Social Dynamics Dagstuhl 03/10/2008.

Page 1:

Making DB and IR (socially) meaningful

Sihem Amer-Yahia, Human Social Dynamics

Dagstuhl 03/10/2008

Page 2:

Disclaimers

• No XML
• No Querying
• No religion
• Lots of Ranking
• Millions of people with different opinions
• A hint of DB and IR

Page 3:

Abstract

Collaborative tagging and rating sites constitute a unique opportunity to leverage implicit and explicit social ties between users in search and recommendations.

In the first part of the talk, we explore different ranking semantics which account for content popularity within a network, thereby going beyond traditional query relevance. We show that the accuracy of ranking is tied to users' behavior.

In the second part of the talk, we describe a set of novel questions that arise under the new ranking semantics. The first question is to revisit data processing in the presence of power law distributions and tag sparsity, and indexing in light of different user behaviors. We then explore different ways of explaining recommendations followed by a discussion on diversifying results. Diversity is a well-known problem in recommender systems, referred to as over-specialization, and in Web search. We propose to leverage explanations to achieve diversity on the basis that the same users tend to endorse similar content. Finally, we note that different topics (e.g., sports, photography) are popular at different points in time and argue for time-aware recommendations.

We conclude with a brief description of the infrastructure of Royal Jelly, a scalable social recommender system built on top of Hadoop.

Page 4:

Outline

• Motivation
• Ranking
• Almost-new questions
• Royal Jelly
• Wilder ideas

Page 5:

Recommendations (Amazon)

but who are these people?

Page 6:

Explaining recommendations in x.qui.site

• Leveraging user-user similarities
• Multiple recommendation methods
  – Friends network
  – Shared-bookmark-interest
  – Shared-tag-interest
  – Shared-bookmark-tag-interest
• Multiple recommendation types
  – Bookmarks
  – Users
  – Tags

Page 7:

Yahoo! Movies now

Page 8:

Reviewer biases in Yahoo! Movies

• Leveraging item-item similarities
• Socially Meaningful Attribute Collections (SMACs)
  – Sets of items which are easy to label and serve as a socially meaningful reference set:
    • Adventure movies starring Johnny Depp
    • Woody Allen Comedies
    • Scary movies from the 80's
    • Moderate French restaurants in Southern CA
• Similarities between movies are defined based on their SMACs

Page 9:

Social Context

• Heuristic Recommenders

– Content / Item-based (purple column): discover items similar to i2 (seed items) and see how u2 has rated them

– Collaborative / User-based (green row): discover users similar to u2 (seed users) and see how they rate i2

– Fusion / Filterbots: leveraging both similar items and similar users

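User-based score:

$score(u, i) = \sum_{u' \in Seed_u} similarity(u, u') \cdot rating(u', i)$

Rating matrix (rows are items, columns are users):

      u1   u2   ...  un
i1    5    1    ...  4
i2    4    ?    ...  5
:
im    5    2    ...  4

Item-based score:

$score(u, i) = \sum_{i' \in Seed_i} similarity(i, i') \cdot rating(u, i')$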
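The two heuristic scores on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say which similarity function is used, so cosine similarity is swapped in here, the seed sets are simply "everyone (or everything) else with a rating", and the toy ratings reuse the matrix values shown on the slide.

```python
from math import sqrt

# Toy rating matrix, ratings[user][item]; u2 has not rated i2 (the "?" cell).
ratings = {
    "u1": {"i1": 5, "i2": 4, "im": 5},
    "u2": {"i1": 1, "im": 2},
    "un": {"i1": 4, "i2": 5, "im": 4},
}

def cosine(a, b):
    """Cosine similarity of two sparse vectors (missing entries count as 0).
    The choice of cosine is an assumption, not taken from the slides."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def item_vector(i):
    """Ratings of item i, keyed by the users who rated it."""
    return {u: r[i] for u, r in ratings.items() if i in r}

def user_based_score(u, i):
    """Sum over seed users u' of similarity(u, u') * rating(u', i)."""
    seed_users = [v for v in ratings if v != u and i in ratings[v]]
    return sum(cosine(ratings[u], ratings[v]) * ratings[v][i] for v in seed_users)

def item_based_score(u, i):
    """Sum over seed items i' of similarity(i, i') * rating(u, i')."""
    seed_items = [j for j in ratings[u] if j != i]
    return sum(cosine(item_vector(i), item_vector(j)) * ratings[u][j]
               for j in seed_items)

user_score = user_based_score("u2", "i2")
item_score = item_based_score("u2", "i2")
```

Both scores are unnormalized sums, as in the formulas; a real recommender would typically divide by the total similarity to get a value on the rating scale.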

Page 10:

Outline

• Motivation
• Ranking
• Almost-new questions
• Royal Jelly
• Wilder ideas

Page 11:

New ranking semantics

• Collaborative tagging/reviewing sites contain a lot of high-quality user-generated content: Flickr, YouTube, del.icio.us, Yahoo! Movies

• Users need help to sift through the large number of available items

• Ranking is not only about relevance (in a traditional Web sense) but also about people whose opinion matters

Page 12:

Data model

• Items: photos in Flickr, movies in Y!Movies, URLs in del.icio.us
• Users: Seekers or Taggers
• Tagging/rating/reviewing: endorsements from users
  – $\forall u \in Taggers$, $Items(u) = \{i \in Items \mid \exists t : Tagged(u, i, t)\}$
  – $Taggers(i, t) = \{v \mid Tagged(v, i, t)\}$
• Network: implicit and explicit social links
  – $\forall u \in Seekers$, $Network(u) = \{v \in Taggers \mid Link(u, v, w)\}$
  – Flickr friends, people with similar movie tastes, del.icio.us network
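As a rough illustration of this data model, the sets Items(u), Taggers(i, t) and Network(u) can be materialized from two relations. The sample tuples and user names below are hypothetical, and reading the third component of Link(u, v, w) as a link weight is an assumption.

```python
from collections import defaultdict

# Tagged(u, i, t): user u tagged item i with tag t (hypothetical sample data).
tagged = [
    ("alice", "url1", "python"),
    ("alice", "url2", "hadoop"),
    ("bob", "url1", "python"),
    ("carol", "url1", "code"),
]

# Link(u, v, w): seeker u is linked to tagger v; w is assumed to be a weight.
links = [("dave", "alice", 1.0), ("dave", "bob", 0.5), ("erin", "carol", 1.0)]

items_of = defaultdict(set)    # Items(u)
taggers_of = defaultdict(set)  # Taggers(i, t), keyed by the (item, tag) pair
for u, i, t in tagged:
    items_of[u].add(i)
    taggers_of[(i, t)].add(u)

network_of = defaultdict(set)  # Network(u); the weight is ignored in this sketch
for u, v, w in links:
    network_of[u].add(v)
```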

Page 13:

Search

• Given a seeker s and a query Q (set of tags), return items which are most relevant to Q and are most popular in s’s network

$f(i, t, s) = |Taggers(i, t) \cap Network(s)|$

$score(i, Q, s) = g_{t \in Q}(f(i, t, s))$

f and g are monotone; assume f = count, g = sum, so that

$g(i, Q, s) = \sum_{t \in Q} f(i, t, s)$
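With f = count and g = sum, the score of an item is the number of (tag, tagger) endorsements coming from the seeker's network, summed over the query tags. A minimal sketch follows; the dictionaries, names, and the tie-breaking rule (alphabetical) are illustrative assumptions.

```python
# Taggers(i, t), keyed by (item, tag); hypothetical sample data.
taggers = {
    ("url1", "python"): {"alice", "bob"},
    ("url1", "code"): {"carol"},
    ("url2", "python"): {"carol"},
    ("url2", "code"): {"alice"},
}
# Network(s) for each seeker s.
network = {"dave": {"alice", "bob"}}

def score(item, query, seeker):
    """Sum over query tags t of |Taggers(item, t) ∩ Network(seeker)|."""
    friends = network.get(seeker, set())
    return sum(len(taggers.get((item, t), set()) & friends) for t in query)

def search(query, seeker, items, k=10):
    """Return the k best items, highest score first, ties broken by item name."""
    return sorted(items, key=lambda i: (-score(i, query, seeker), i))[:k]

top = search({"python", "code"}, "dave", ["url1", "url2"])
```

Because both f and g are monotone, a production system could plug this score into a threshold-style top-k algorithm over sorted lists rather than scoring every item, which is what makes the indexing question later in the talk relevant.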

Page 14:

Hotlists

• Evaluate different hotlist generation methods in del.icio.us to see how well they predict users' tagging actions

• 116,177 users who tagged 175,691 distinct URLs using 903 tags, for a total of 2,322,458 tagging actions over 1 month

• Each method is defined by its seed and scope and returns the 10 best-ranked items

$score(i, Seed) = |Taggers(i) \cap Seed|$
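A hotlist under this score is simply the 10 items endorsed by the most seed users. The sketch below assumes tagging actions arrive as (user, item) pairs and drops items with no endorsement from the seed; both choices are assumptions made for illustration.

```python
def hotlist(tagging_actions, seed, k=10):
    """Rank items by |Taggers(i) ∩ Seed| and keep the k best."""
    taggers = {}
    for user, item in tagging_actions:
        taggers.setdefault(item, set()).add(user)
    scores = {item: len(users & seed) for item, users in taggers.items()}
    # Items nobody in the seed tagged are left out of the hotlist.
    ranked = sorted((i for i in scores if scores[i] > 0),
                    key=lambda i: (-scores[i], i))
    return ranked[:k]

actions = [("a", "x"), ("b", "x"), ("a", "y"), ("c", "z")]
result = hotlist(actions, seed={"a", "b"})
```

Using every user as the seed gives a global hotlist; restricting the seed to a user's friends or to like-minded taggers gives the personalized variants.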

Page 15:

People who matter

• friends
• url-interest
• tag-url-interest

• Coverage - overlap of hotlist with u’s tagging actions, averaged over users in scope

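$int(u, t) = \frac{|items(u, t)|}{|items(u)|}$

$agr(u_1, u_2, t) = \frac{|items(u_1, t) \cap items(u_2, t)|}{|items(u_1, t)|}$

$scope(t) = \{u \in U \mid int(u, t) \geq int\_thres\}$

$seed(u) = \{v \in U \mid \exists t \in tags(v) : agr(u, v, t) \geq agr\_thres\}$

$coverage(HList, u) = \frac{|HList \cap Items(u)|}{\min(|Items(u)|, 10)}$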
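The coverage measure on this slide (overlap of a hotlist with a user's tagged items, capped at the hotlist length of 10 and averaged over the users in scope) can be sketched as follows. Returning 0 for users without items or for an empty scope is an assumption of this sketch.

```python
def coverage(hotlist, user_items, k=10):
    """|HList ∩ Items(u)| / min(|Items(u)|, k)."""
    if not user_items:
        return 0.0
    return len(set(hotlist) & set(user_items)) / min(len(user_items), k)

def average_coverage(hotlists, items_by_user, scope):
    """Average coverage over the users in scope; hotlists maps user -> list."""
    if not scope:
        return 0.0
    return sum(coverage(hotlists[u], items_by_user[u]) for u in scope) / len(scope)

single = coverage(["a", "b", "c"], {"a", "c", "d", "e"})
average = average_coverage(
    {"u1": ["a", "b"], "u2": ["a", "b"]},
    {"u1": {"a", "b"}, "u2": {"c"}},
    scope=["u1", "u2"],
)
```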

Page 16:

Coverage

[Figure: coverage achieved by the hotlist methods. Values shown: 42.9%, 81.7%, 8.6%, 61%]

Page 17:

Outline

• Motivation
• Ranking
• Almost-new questions
  – pre-processing & indexing
  – explanation: why a recommendation
  – diversity: be innovative, stay relevant
  – time-awareness: what matters when
• Royal Jelly
• Wilder ideas

Page 18:

Pre-processing

• Tags are sparse and may mean different things
  – Co-occurrence analysis, association rules, ontologies, EM
• Tails are long, very long
  – cut tails? average among very different users?

Page 19:

Social Meaningfulness in Y! Movies

Page 20:

Indexing

• Hotlists
  – global (1 inverted list), global-tag (900 lists, 1 list/popular tag), friends, url-interest, tag-url-interest (1 list/user)
• Search:
  – 1 list per (user, keyword) pair
  – 1 list per group of similar users
  – Cluster indices based on common user behavior
• Behavior does change

Page 21:

Explanation

• Users relate to social biases and influences
• What to display?
  – all influencers: does not scale
  – top influencers
  – distribution of opinions among influencers
    • 80% of your friends bookmarked this link
    • this reviewer rates this movie better than 40% of all reviewers
• How to display it?
  – e.g., natural language pattern, visual pattern
• Some relationship to DB annotations
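A natural language pattern such as "80% of your friends bookmarked this link" can be produced from two sets. The function below is a hypothetical sketch; the wording is taken from the slide, while the function name and the handling of users without friends are assumptions.

```python
def explain_bookmark(friends, bookmarkers):
    """Summarize the distribution of opinions among a user's friends."""
    if not friends:
        return None  # nothing to explain without influencers
    share = round(100 * len(friends & bookmarkers) / len(friends))
    return f"{share}% of your friends bookmarked this link"

message = explain_bookmark(
    friends={"ann", "bob", "cat", "dan", "eve"},
    bookmarkers={"ann", "bob", "cat", "dan", "zed"},
)
```

Reporting an aggregate rather than listing every influencer is one way around the "does not scale" concern.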

Page 22:

Diversity

• Well-known problem in recommender systems (over-specialization) and IR (Web search)
• In recommendations:
  – Stay as close as possible to the user's interests
  – But not too close
    • Woody Allen Comedies
    • Restaurants serving Chinese in the east village in NYC
  – Post-processing based on items' objective attributes
    • Many possible top-k sets
    • Pick the most diverse
• Explanation-based diversity
  – The same people (items) recommend the same items
  – Does not require presence of objective attributes
  – Independent from recommendation method
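One possible reading of explanation-based diversity: treat the set of users who endorse an item as its explanation, and prefer items whose explanations overlap little with those already selected. The greedy procedure and the Jaccard distance below are assumptions chosen for illustration; the slides do not specify an algorithm.

```python
def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b|; 0 when both sets are empty."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def diversify(candidates, explanations, k):
    """Greedy selection. `candidates` is ordered by decreasing relevance;
    `explanations` maps each item to the set of users who endorse it."""
    chosen, pool = [], list(candidates)
    while pool and len(chosen) < k:
        if not chosen:
            best = pool[0]  # start from the most relevant item
        else:
            # Pick the item farthest from its closest already-chosen item;
            # max() keeps the earlier (more relevant) item on ties.
            best = max(pool, key=lambda i: min(
                jaccard_distance(explanations[i], explanations[c])
                for c in chosen))
        chosen.append(best)
        pool.remove(best)
    return chosen

picked = diversify(
    ["m1", "m2", "m3"],
    {"m1": {"a", "b"}, "m2": {"a", "b"}, "m3": {"c", "d"}},
    k=2,
)
```

The procedure never looks at item attributes or at how the candidates were produced, which matches the two properties claimed on the slide.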

Page 23:

Time-awareness

• Recommender systems focus on most recent (hot) items
• Recovering old URLs in del.icio.us
  – Some URLs are tagged heavily for a certain period, then activity slows down: how to find those worth recovering?
• Anticipating new URLs
  – New URLs come into the system, often tagged by very few initial users: how to detect those with potential?
• Topic grouping and time patterns are key:
  – Event-driven activity (election, photography)
  – Utilizing per-topic time patterns

Page 24:

Posts with tag “photography”: consistent time pattern

[Figure: posts tagged "photography" per day. x-axis: days 30, 31, 1 through 29; y-axis: 1500 to 4000. Annotations: New Year, weekends. Average: 2948; STDEV: 533]

Page 25:

Posts with tag “election”: event-driven tagging

[Figure: posts tagged "election" per day. x-axis: days 30, 1 through 29; y-axis: 0 to 600. Annotated events: Iowa, New Hampshire, Richardson Out, Thompson Out, Michigan, Florida. Average: 240; STDEV: 105]
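The averages and standard deviations reported for the two tags already hint at the difference between a consistent and an event-driven pattern: relative to its mean, "election" varies far more than "photography". The sketch below compares the coefficient of variation (stdev / mean) of each tag; the 0.3 cut-off is a hypothetical threshold chosen only for illustration.

```python
# (average, standard deviation) of posts, as reported on the slides.
tag_stats = {"photography": (2948, 533), "election": (240, 105)}

def coefficient_of_variation(mean, stdev):
    """Spread of the series relative to its mean."""
    return stdev / mean

def time_pattern(mean, stdev, threshold=0.3):
    """Label a tag by its relative variability; the threshold is an assumption."""
    if coefficient_of_variation(mean, stdev) > threshold:
        return "event-driven"
    return "consistent"

labels = {tag: time_pattern(mean, stdev) for tag, (mean, stdev) in tag_stats.items()}
```

The ratios come out at roughly 0.18 for "photography" and 0.44 for "election". A fuller treatment would look at the daily series itself, for example to separate weekly dips from genuine event spikes.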

Page 26:

Outline

• Motivation
• Ranking
• Almost-new questions
• Royal Jelly
• Wilder ideas

Page 27:

Royal Jelly

Page 28:

Hadoop-Pig Based Processing

[Diagram components: del.icio.us backup database; MySQL Extract; research9; distributed analysis and index / view generation; MySQL Load; quicknever database. Labels: Explanation, Diversity]

• Daily analysis for a window of several months' worth of data

Page 29:

Wilder ideas

• Automatic user assessments
  – Users are willing to create new content
  – And rate it!
  – Let them rate recommendations
  – And help us define evaluation benchmarks
• Make DB social!
  – Social-awareness in databases and query languages
    • Different DB organizations
    • Different query semantics
  – SQL: a Social Query Language?
    • Who thinks like me? Who does not?