Page 1

„IP“ is not always „Internet Protocol“

A long and a very short example for IP problems in Web 2.0 research

Ralf Schenkel

Joint work with Tom Crecelius, Mouna Kacimi, Sebastian Michel, Thomas Neumann, Josiane Parreira, Marc Spaniol, Gerhard Weikum

Page 2

September 25, 2008 Dagstuhl Perspectives Workshop Web 2.0

Social Tagging Networks

Definition: Social Tagging Network
Website where people
• publish + tag information
• review + rate information
• publish their interests
• maintain network of friends
• interact with friends

Common examples:
• Flickr (images)
• YouTube (videos)
• del.icio.us (bookmarks)
• Librarything (books)
• Discogs (CDs)
• CiteULike (papers)
• Facebook
• Myspace (media)

Page 3

Part 1: Search in Social Tagging Networks

(long)

Page 4

Some Statistics

Flickr: (as of Nov 2007)
• 2+ billion photos

Facebook: (as of Apr 2007)
• 1.8 billion photos
• 31 million active users
• 100,000 new users per day

Myspace: (as of Apr 2007)
• 135 million users (6th largest country on Earth)
• 2+ billion images (150,000 req/s), millions added daily
• 25 million songs
• 60 TB of videos

Huge volume of highly dynamic data

Page 5

Showcase: librarything.com

[Screenshot labels: Ratings, Tags, Books, Others]

Page 6

librarything.com: Social Interaction

[Screenshot labels: Explicit Friends, Similar Users, Comments]

Page 7

librarything.com: Tag Clouds

Page 8

librarything.com: Search

Search results independent of the querying user (and the social context)

Page 9

Outline

• Introduction

• Modelling Social Tagging Networks
  – Graph Model
  – Different Information Needs

• Effective Query Scoring

• Efficient Query Evaluation

• Summary & Further Challenges

Page 10

Social Network Model

[Figure: graph of USERS, ITEMS, and TAGS; labels: travel Norway, travel China, queueing theory]

Page 12

Social Network Model

[Figure: graph of USERS, ITEMS, and TAGS; labels: travel Norway, travel China, queueing theory; additional labels: travel, trip, vldb, probability, queues, harrypotter]

Page 13

Information Need 1: Global

[Figure: the social network graph of USERS, ITEMS, and TAGS, here with the additional label „harry potter“]

Tags by all users equally important

Page 14

Information Need 2: Similar Users

[Figure: the social network graph of USERS, ITEMS, and TAGS, here with the additional label „travel“]

Tags by users with similar tags/items more important

Page 15

Information Need 3: Trusted Friends

[Figure: the social network graph of USERS, ITEMS, and TAGS, here with the additional label „probability“]

Tags by closely related users more important
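
The information needs on the last three slides differ mainly in how much weight each tagging user receives. The following is a minimal sketch of that idea, not the scoring model of the talk: the function `score_items`, the `(user, item, tag)` triple layout, the toy data, and the `FRIENDSHIP` weights are all assumptions made for illustration.

```python
from collections import defaultdict

def score_items(taggings, query_tag, user_weight):
    """Sum, per item, the weights of the users who attached query_tag to it.

    taggings: iterable of (user, item, tag) triples (hypothetical schema).
    user_weight: function mapping a tagging user to a non-negative weight;
                 it encodes the information need.
    """
    scores = defaultdict(float)
    for user, item, tag in taggings:
        if tag == query_tag:
            scores[item] += user_weight(user)
    return dict(scores)

# Toy data, invented for illustration only.
TAGGINGS = [
    ("alice", "travel Norway", "travel"),
    ("bob", "travel Norway", "travel"),
    ("bob", "travel China", "travel"),
    ("carol", "travel China", "travel"),
    ("carol", "queueing theory", "probability"),
]

# Information need 1 (global): every tagging user counts the same.
global_scores = score_items(TAGGINGS, "travel", lambda u: 1.0)

# Information need 3 (trusted friends): assumed friendship strengths of the
# querying user towards other users; users not listed get weight 0.
FRIENDSHIP = {"carol": 1.0, "bob": 0.5}
friend_scores = score_items(TAGGINGS, "travel",
                            lambda u: FRIENDSHIP.get(u, 0.0))
```

With these toy weights both travel items tie under the global need, while "travel China" ranks first for the querying user because it was tagged by the closer friends. Information need 2 would fit the same shape with a similarity-based weight function in place of `FRIENDSHIP`.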

Page 16

Wishlist for Social-Aware Social Search
• Search results depend on
  – Global popularity of items
  – Collection context of the querying user (books, tags)
  – Social context of the querying user (trusted friends)
• Automatic tag expansion (beyond synonyms)
• Scalable query processing
• Explanation of results

(similar wishlist for social recommendations)

Page 17

Fast Forward…

Imagine a 20-minute talk about quantified friendship measures, personalized scoring models, dynamic tag expansion, scalable query processing, …

Essence:

• Context-aware personalized search

• Tags from closely related users are more important

• Different kinds of „relatedness“ possible [SIGIR 2008]

Page 18

Experimental Evaluation: Effectiveness

Systematic evaluation of result quality difficult

Three possible setups:
• Manual queries + human assessments
• Queries + assessments derived from external info (ex: DMOZ categories)
• Automated assessments from context of user
  – Items tagged by friends
  – Items tagged in the future
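
The "items tagged in the future" setup can be read as a temporal hold-out: search only over what was known before a cutoff date, and treat items the user tags afterwards as relevant. A minimal sketch under that reading follows; the function `split_by_time`, the `(user, item, day)` triple layout, and the toy data are assumptions for illustration, not taken from the talk.

```python
from datetime import date

def split_by_time(taggings, user, cutoff):
    """Split one user's tagged items at a cutoff date.

    taggings: iterable of (user, item, day) triples (hypothetical schema).
    Returns (known, relevant): items tagged before the cutoff, and items
    tagged only on or after it, which serve as automated relevance judgments.
    """
    known = {i for u, i, d in taggings if u == user and d < cutoff}
    later = {i for u, i, d in taggings if u == user and d >= cutoff}
    return known, later - known

# Toy data, invented for illustration only.
TAGGINGS = [
    ("alice", "travel Norway", date(2008, 1, 10)),
    ("alice", "travel China", date(2008, 6, 2)),
    ("alice", "travel Norway", date(2008, 7, 1)),
    ("bob", "queueing theory", date(2008, 6, 15)),
]

known, relevant = split_by_time(TAGGINGS, "alice", date(2008, 5, 1))
```

Items the user already had before the cutoff are excluded from the relevant set, so re-tagging an old item does not count as a new judgment.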

Page 19

Prototype Implementation

[SIGIR Demo 2008], [VLDB Demo 2008]

Page 20

Preliminary User Study

LibraryThing user study: [Data Engineering Bulletin, June 2008]
• 6 librarything users with reasonably large library and friend sets
• Overall 49 queries
• Crawled (part of) librarything: ~1.3 million books, ~15 million tags, ~12,000 users, ~18,000 friends
• Measured NDCG[10]

        0.0    0.2    0.5    0.8    1.0
0.0   0.546  0.572  0.568  0.565  0.565
0.2   0.564  0.572  0.579  0.581    -
0.5   0.539  0.552  0.559    -      -
0.8   0.515  0.546    -      -      -
1.0   0.465    -      -      -      -

Axis labels: (1-α)(graph), (1-α)(content)

Authors of the paper
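
For readers unfamiliar with the measure, here is a minimal sketch of NDCG at rank 10 in one common formulation (gain divided by log2 of rank + 1). The slide does not say which variant was used, so the gain and discount choices below are assumptions, and the ideal ranking is computed only from the gains passed in.

```python
import math

def dcg(gains, k=10):
    """Discounted cumulative gain over the top-k graded relevance values."""
    # enumerate() starts at 0, so rank 1 is discounted by log2(2) = 1.
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))

def ndcg(gains, k=10):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0
```

A result list that is already in descending order of relevance scores 1.0, and a list without any relevant item scores 0.0.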

Page 21

We need a benchmark collection, but…
• Everybody „has“ data from Flickr, librarything
• Data contains private information by definition
• Data cannot be successfully anonymized (AOL)
• Data must not be anonymized (we need the users to assess results)
• Data must be large scale (a few volunteers are not enough)
• Collection must be completely available offline for stability of results (including images, …)

Page 22

Part 2: Web Archiving

(very short)

Page 23

Online Information is Volatile
• Huge amount of information available online only today
• Easily lost (hardware failure, software failure, human failure, deletion, attack, …)
• Easily inaccessible (does anybody know Interleaf?)
• Easily manipulated
• How will historians learn about the 21st century?

Strong need for long-term preservation of the evolving Web