Post on 23-Dec-2014

Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content by Bordino

Penguins in Sweaters, or Serendipitous Entity Search on User-generated-Content

chenwq, 2014/04/16

Mounia Lalmas et al. (Yahoo! Labs, CIKM 2013 Best Paper)

Mounia Lalmas

@mounialalmas

Principal Research Scientist at Yahoo! Labs

Professor of Information Retrieval at the Department of Computer Science at Queen Mary, University of London

Her research focuses on three main areas: user engagement, social media, and search.

Contents

1. What/why serendipitous search

2. How to build a serendipitous search system

3. Experimental setting and analysis

1/23

Why/when do penguins wear sweaters?

Entity Search: Building an entity-driven serendipitous search system based on enriched entity networks extracted from Wikipedia and Yahoo! Answers

Serendipity: Finding something good or useful while not specifically looking for it. Serendipitous search systems provide relevant and interesting results.

2/23

What is entity search

How people become entities

3/23

What is entity search

Entities Extraction

Proximity Measure between two entities

Entities Ranking according to their proximity to a query entity

4/23

What is Serendipity

“making fortunate discoveries by accident”

M. Ge, C. Delgado-Battenfeld, and D. Jannach. Beyond accuracy: evaluating recommender systems by coverage and serendipity. RecSys 2010.

Serendipity = unexpectedness + relevance

•“Expected” result baselines from web search

Serendipity = interestingness + relevance

•Result interestingness given the query

•Personal interest in result

P. Andre, J. Teevan, and S. T. Dumais. From x-rays to silly putty via uranus: Serendipity and its role in web search. SIGCHI 2009.

5/23

What is Serendipity

Intuition from recsys:

unexpectedness

usefulness u(RSi)

6/23

WHAT: What connections between entities do web community knowledge portals offer?

WHY: How do they contribute to an interesting, serendipitous browsing experience?

Why/when do penguins wear sweaters?

6/23

Why/when do penguins wear sweaters?

Yahoo! Answers: community-driven question & answer portal

•67M questions & 262M answers

•2 years [2010/2011]

•English-language

•minimally curated

•opinions, gossip, personal info

•variety of points of view

Wikipedia: community-driven encyclopedia

•3,795,865 articles

•from end of December 2011

•English Wikipedia

•curated

•high-quality knowledge

•variety of niche topics

7/23

Contents

1. What/why serendipitous search

2. How to build a serendipitous search system

3. Experimental setting and analysis

8/23

Entity & Relationship Extraction

Entity defined as any concept having a Wikipedia page

1. Identify surface forms [http];

2. Resolve to Wikipedia entities [Zhou];

3. Rank entities using aboutness score [Paranjpe].

https://www.otexts.org/node/832

Y. Zhou, L. Nie, O. Rouhani-Kalleh, et al. Resolving surface forms to Wikipedia topics. COLING 2010.

D. Paranjpe. Learning document aboutness from implicit user feedback and document structure. CIKM 2009.

Relationship: cosine similarity of the tf-idf vectors of two entities, where each entity's vector is built from the concatenation of the documents in which the entity appears
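The relationship weight can be illustrated with a minimal sketch. It assumes documents are already tokenized and uses a plain tf × log(N/df) weighting; the function names, the toy data, and the exact weighting scheme are assumptions for illustration, not details taken from the slides.

```python
import math
from collections import Counter

def tfidf_vectors(entity_docs):
    """entity_docs maps an entity to the token lists of documents mentioning it."""
    # One pseudo-document per entity: the concatenation of all its documents.
    merged = {e: [tok for doc in docs for tok in doc] for e, docs in entity_docs.items()}
    n = len(merged)
    df = Counter()
    for tokens in merged.values():
        df.update(set(tokens))  # document frequency over pseudo-documents
    vectors = {}
    for entity, tokens in merged.items():
        tf = Counter(tokens)
        vectors[entity] = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy corpus.
entity_docs = {
    "Penguin": [["penguin", "sweater", "oil"], ["penguin", "antarctica"]],
    "Sweater": [["sweater", "wool", "knit"], ["penguin", "sweater"]],
    "Steve Jobs": [["apple", "iphone"]],
}
vectors = tfidf_vectors(entity_docs)
edge_penguin_sweater = cosine(vectors["Penguin"], vectors["Sweater"])
edge_penguin_jobs = cosine(vectors["Penguin"], vectors["Steve Jobs"])
```

In this toy corpus the Penguin and Sweater entities share vocabulary and get a positive edge weight, while Penguin and Steve Jobs share none and get zero.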

9/23

Entity & Relationship Extraction

Aboutness

Relationship

10/23

Entity Networks

Dataset          # Nodes     # Edges       # Isolated
Yahoo! Answers   896,799     112,595,138   69,856
Wikipedia        1,754,069   237,058,218   82,381

[Figure panels: Wikipedia, Yahoo! Answers]

11/23

Retrieval

Algorithm: Lazy random walk with restart [Chung]

F. R. K. Chung. Spectral Graph Theory. American Mathematical Society, 1997.
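A lazy random walk with restart can be sketched by power iteration, as below. At each step the walker jumps back to the query entity with probability `restart`; otherwise it stays put with probability `laziness` or follows an edge chosen in proportion to its weight. The parameter values, the function name, and the toy graph are assumptions for illustration; the slides do not give the settings actually used.

```python
def lazy_rwr(graph, query, restart=0.15, laziness=0.5, iters=100):
    """graph maps each node to a dict {neighbor: weight}; every neighbor is also a key."""
    p = {v: 0.0 for v in graph}
    p[query] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in graph}
        for u, mass in p.items():
            total = sum(graph[u].values())
            if total == 0:
                # Isolated node: a non-restarting walker can only stay.
                nxt[u] += (1 - restart) * mass
                continue
            nxt[u] += (1 - restart) * laziness * mass
            for v, w in graph[u].items():
                nxt[v] += (1 - restart) * (1 - laziness) * mass * w / total
        # p sums to 1, so the restart step returns exactly `restart` mass.
        nxt[query] += restart
        p = nxt
    return p

# Hypothetical toy entity network with symmetric edge weights.
graph = {
    "Steve Jobs": {"Steve Wozniak": 0.8, "Power Macintosh": 0.5},
    "Steve Wozniak": {"Steve Jobs": 0.8},
    "Power Macintosh": {"Steve Jobs": 0.5, "SuperDrive": 0.4},
    "SuperDrive": {"Power Macintosh": 0.4},
    "Penguin": {},
}
scores = lazy_rwr(graph, "Steve Jobs")
ranking = sorted((v for v in graph if v != "Steve Jobs"),
                 key=lambda v: scores[v], reverse=True)
```

Entities are then ranked by their score with respect to the query entity; nodes unreachable from the query keep a score of zero.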

12/23

Rank Aggregation

For a given query, combine the results from different search engines

Simple median-rank aggregation [Sculley]

Ranking 1: A B C D E

Ranking 2: C D E A B

Aggregated: C A D B E

D. Sculley. Rank aggregation for similar items. SDM 2007.
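The example on this slide can be reproduced with a short sketch of median-rank aggregation. It assumes every input ranking contains the same items and breaks ties by the order of the first ranking; both choices, and the function name, are assumptions made here for illustration.

```python
from statistics import median

def median_rank_aggregate(rankings):
    """Order items by the median of their 1-based positions across all rankings."""
    items = rankings[0]
    positions = {item: [r.index(item) + 1 for r in rankings] for item in items}
    # sorted() is stable, so ties keep the order of the first ranking.
    return sorted(items, key=lambda item: median(positions[item]))

rankings = [list("ABCDE"), list("CDEAB")]
aggregated = median_rank_aggregate(rankings)
print(" ".join(aggregated))  # C A D B E
```

With two input rankings the median is the mean of the two positions: C gets 2, A 2.5, D 3, B 3.5, and E 4.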

13/23

Contents

1. What/why serendipitous search

2. How to build a serendipitous search system

3. Experimental setting and analysis

14/23

Retrieval

                 Wikipedia   Yahoo! Answers   Combined
Precision @ 5    0.668       0.724            0.744
MAP              0.716       0.762            0.782

3 labels per query-result pair

Annotator agreement (overlap): 85%

Average overlap in top 5 results: 12%

Example results for the query entity Steve Jobs:

Yahoo! Answers: Jon Rubinstein, Timothy Cook, Kane Kramer, Steve Wozniak, Jerry York

Wikipedia: System 7, PowerPC G4, SuperDrive, Power Macintosh, Power Computing Corp.
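For reference, the two retrieval measures reported here can be computed as in the sketch below. It assumes binary relevance labels per ranked result and normalizes average precision by the number of relevant results retrieved; how the three labels per pair were merged into one judgment is not stated in the slides, and the toy runs are hypothetical.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k results that are relevant (relevance is a 0/1 list)."""
    return sum(relevance[:k]) / k

def average_precision(relevance):
    """Mean of precision values taken at the rank of each relevant result."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(runs):
    """Average precision averaged over all queries."""
    return sum(average_precision(r) for r in runs) / len(runs)

# Hypothetical judgments for two queries, five results each.
runs = [[1, 0, 1, 1, 0], [0, 1, 0, 0, 1]]
p_at_5 = precision_at_k(runs[0], 5)
map_score = mean_average_precision(runs)
```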

15/23

WHAT: What connections between entities do web community knowledge portals offer?

WHY: How do they contribute to an interesting, serendipitous browsing experience?

Why/when do penguins wear sweaters?

16/23

Entity Networks with Implicit Metadata

• Sentiment

– using SentiStrength, compute positive & negative scores

– compute attitude (polarity) and sentimentality (strength)

– entity-level scores

• Quality

– Flesch Reading Ease score (readability)

• Topical Category

– Yahoo Content Taxonomy

17/23
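The metadata scores can be sketched as follows. The slides do not spell out the formulas, so two assumptions are made here: SentiStrength is taken to return a positive score in 1..5 and a negative score in -5..-1, with attitude as their sum and sentimentality as their spread above the neutral pair (1, -1); and entity-level scores are taken to be plain averages over the documents mentioning the entity. The Flesch Reading Ease formula is the standard one, computed from counts supplied by the caller.

```python
def attitude(pos, neg):
    """Polarity: sign and size of the overall leaning (assumed: pos + neg)."""
    return pos + neg

def sentimentality(pos, neg):
    """Strength: amount of sentiment regardless of sign (assumed: pos - neg - 2)."""
    return pos - neg - 2

def flesch_reading_ease(n_words, n_sentences, n_syllables):
    """Standard Flesch Reading Ease; higher values mean easier text."""
    return 206.835 - 1.015 * (n_words / n_sentences) - 84.6 * (n_syllables / n_words)

def entity_level(doc_scores):
    """Assumed aggregation: mean of per-document scores for one entity."""
    return sum(doc_scores) / len(doc_scores)

# Hypothetical (positive, negative) SentiStrength outputs for three documents.
docs = [(4, -2), (1, -1), (2, -5)]
entity_attitude = entity_level([attitude(p, n) for p, n in docs])
entity_sentimentality = entity_level([sentimentality(p, n) for p, n in docs])
readability = flesch_reading_ease(n_words=100, n_sentences=5, n_syllables=150)
```

A neutral document scores zero on both sentiment measures, while a strongly mixed one has low attitude but high sentimentality.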

Entity Networks with Metadata

Table 5: Serendipity across different runs

|relevant & unexpected| / |unexpected|: number of serendipitous results out of all of the unexpected results retrieved

|relevant & unexpected| / |retrieved|: number of serendipitous results out of all results retrieved
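Both ratios follow directly from set membership, as in this sketch. A result counts as unexpected when it is absent from an "expected" baseline (for example, results from web search, as on the earlier slide); the function name and the relevance and baseline sets below are hypothetical.

```python
def serendipity_ratios(retrieved, relevant, expected):
    """Return (serendipitous / unexpected, serendipitous / retrieved)."""
    retrieved = list(retrieved)
    unexpected = [r for r in retrieved if r not in expected]
    serendipitous = [r for r in unexpected if r in relevant]
    over_unexpected = len(serendipitous) / len(unexpected) if unexpected else 0.0
    over_retrieved = len(serendipitous) / len(retrieved) if retrieved else 0.0
    return over_unexpected, over_retrieved

retrieved = ["Jon Rubinstein", "Timothy Cook", "Kane Kramer",
             "Steve Wozniak", "Jerry York"]
expected = {"Timothy Cook", "Steve Wozniak"}                   # hypothetical baseline
relevant = {"Jon Rubinstein", "Steve Wozniak", "Jerry York"}   # hypothetical labels
over_unexpected, over_retrieved = serendipity_ratios(retrieved, relevant, expected)
```

Here three of the five results are unexpected and two of those are relevant, giving 2/3 and 2/5.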

18/23

User-perceived Quality

1. Which result is more relevant to the query?

2. If someone is interested in the query, would they also be interested in these results?

3. Even if you are not interested in the query, are these results interesting to you personally?

4. Would you learn anything new about the query?

19/23

Entity Networks with Metadata

Question                                                  Data   General   +Topic
Which result is more relevant to the query?               WP     0.162     0.194
                                                          YA     0.336     0.374
                                                          Comb   0.201     0.222
If someone is interested in the query, would they also    WP     0.162     0.176
be interested in the result?                              YA     0.312     0.343
                                                          Comb   0.184     0.222
Even if you are not interested in the query, is the       WP     0.139     0.144
result interesting to you personally?                     YA     0.324     0.359
                                                          Comb   0.168     0.198
Would you learn anything new about the query from         WP     0.167     0.164
this result?                                              YA     0.307     0.346
                                                          Comb   0.184     0.203

Table 6: Similarity (Kendall’s tau-b [Fagin]) between result sets and reference ranking (WP = Wikipedia, YA = Yahoo! Answers, Comb = combined)

•Topical category constraint promotes results of the same topic as the query entity

•Sentiment and readability constraints hurt performance

R. Fagin, R. Kumar, M. Mahdian, et al. Comparing and aggregating rankings with ties. PODS 2004.
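As a reminder of what Table 6 measures, here is a minimal sketch of the textbook Kendall tau-b coefficient, which corrects for ties in either ranking. It is the standard definition, not necessarily the exact tie-handling variant from the cited paper, and the function name and sample rankings are assumptions for illustration.

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """x[i] and y[i] are the ranks (or scores) of item i in the two rankings."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue          # tied in both rankings: ignored
        elif dx == 0:
            ties_x += 1       # tied only in x
        elif dy == 0:
            ties_y += 1       # tied only in y
        elif dx * dy > 0:
            concordant += 1   # same relative order in both rankings
        else:
            discordant += 1   # opposite relative order
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else 0.0

tau_same = kendall_tau_b([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
tau_reversed = kendall_tau_b([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
tau_with_tie = kendall_tau_b([1, 2, 3, 4], [1, 1, 2, 3])
```

Identical rankings give 1, reversed rankings give -1, and the tied example gives 5/sqrt(30), slightly below 1 because one pair is tied in the second ranking.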

22/23