Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

Penguins in Sweaters, or Serendipitous Entity Search on User-generated-Content chenwq 2014/04/16 Mounia Lalmas et al. (Yahoo! Labs, CIKM 2013 Best Paper)


Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content by Bordino

Transcript of Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

Page 1: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

Penguins in Sweaters, or Serendipitous Entity Search on User-generated-Content

chenwq 2014/04/16

Mounia Lalmas et al. (Yahoo! Labs, CIKM 2013 Best Paper)

Page 2: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

Mounia Lalmas

@mounialalmas

mounia-lalmas

mounialalmas

Principal Research Scientist at Yahoo! Labs

Professor of Information Retrieval at the Department of Computer Science at Queen Mary, University of London

Her research focuses on three main areas: user engagement, social media, and search.

Page 3: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

Contents

1. What/why serendipitous search
2. How to build a serendipitous search system
3. Experiment settings and analysis

1/23

Page 4: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

Why/when do penguins wear sweaters?

Entity Search
Building an entity-driven serendipitous search system based on enriched entity networks extracted from Wikipedia and Yahoo! Answers

Serendipity
Finding something good or useful while not specifically looking for it
Serendipitous search systems provide relevant and interesting results

2/23

Page 5: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

What is entity search

How people become entities

3/23

Page 6: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

What is entity search

Entity extraction

Proximity measure between two entities

Entity ranking according to proximity to a query entity

4/23

Page 7: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

What is Serendipity

“making fortunate discoveries by accident”

M. Ge, C. Delgado-Battenfeld, and D. Jannach. Beyond accuracy: evaluating recommender systems by coverage and serendipity. RecSys 2010.

Serendipity = unexpectedness + relevance
“Expected” result baselines from web search

Serendipity = interestingness + relevance
Result interestingness given the query
Personal interest in result

P. André, J. Teevan, and S. T. Dumais. From x-rays to silly putty via Uranus: serendipity and its role in web search. SIGCHI 2009.

5/23

Page 8: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

What is Serendipity

Intuition from recsys:

unexpectedness

usefulness u(RSi)
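The two ingredients named on this slide are usually combined as below. This is a sketch recalled from the Ge et al. paper cited on the previous slide, not a transcription of the slide itself: PM (the results of a primitive baseline model), RS (the system's results), UNEXP and N are symbols assumed here, while u(RS_i) is the usefulness term the slide shows.

```latex
\mathrm{UNEXP} = RS \setminus PM,
\qquad
\mathrm{SRDP} = \frac{1}{N} \sum_{i=1}^{N} u(RS_i),
\quad RS_i \in \mathrm{UNEXP},\; N = |\mathrm{UNEXP}|
```

Read this way, unexpectedness is whatever the system returns that the baseline does not, and serendipity is the fraction of those unexpected results that are also useful.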

6/23

Page 9: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

WHAT
What connections between entities do web community knowledge portals offer?

WHY
How do they contribute to an interesting, serendipitous browsing experience?

Why/when do penguins wear sweaters?

6/23

Page 10: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

Why/when do penguins wear sweaters?

Yahoo! Answers: community-driven question & answer portal
• 67M questions & 262M answers
• 2 years [2010/2011]
• English-language
• minimally curated
• opinions, gossip, personal info
• variety of points of view

Wikipedia: community-driven encyclopedia
• 3,795,865 articles
• from end of December 2011
• English Wikipedia
• curated
• high-quality knowledge
• variety of niche topics

7/23

Page 11: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

Contents

1. What/why serendipitous search
2. How to build a serendipitous search system
3. Experiment settings and analysis

8/23

Page 12: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

Entity & Relationship Extraction

An entity is defined as any concept that has a Wikipedia page

1. Identify surface forms [http],
2. resolve to Wikipedia entities [Zhou],
3. rank entities using aboutness score [Paranjpe].

[http] https://www.otexts.org/node/832

[Zhou] Y. Zhou, L. Nie, O. Rouhani-Kalleh, et al. Resolving surface forms to Wikipedia topics. COLING 2010, pp. 1335-1343.

[Paranjpe] D. Paranjpe. Learning document aboutness from implicit user feedback and document structure. CIKM 2009.

Relationship: cosine similarity of the tf-idf vectors of two entities, where each entity's vector is built from the concatenation of the documents in which the entity appears
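The relationship weight described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the toy token lists, the plain `tf * log(N / df)` weighting and the function names `tfidf_vectors` and `cosine` are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(entity_docs):
    """Build one sparse tf-idf vector per entity.

    entity_docs maps an entity name to the token list obtained by
    concatenating every document in which the entity appears.
    """
    n = len(entity_docs)
    df = Counter()
    for tokens in entity_docs.values():
        df.update(set(tokens))  # document frequency: one count per entity text
    vectors = {}
    for entity, tokens in entity_docs.items():
        tf = Counter(tokens)
        vectors[entity] = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy data, invented for illustration only.
entity_docs = {
    "Penguin": "penguin sweater oil spill rescue penguin".split(),
    "Sweater": "sweater wool knit penguin".split(),
    "Steve Jobs": "apple iphone ceo".split(),
}
vectors = tfidf_vectors(entity_docs)
edge_weight = cosine(vectors["Penguin"], vectors["Sweater"])
```

Entities whose concatenated texts share no terms get a weight of zero, which is one plausible reason some nodes end up isolated in a network built this way.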

9/23

Page 13: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

Entity & Relationship Extraction

Aboutness

Relationship

10/23

Page 14: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

Entity Networks

Dataset          # Nodes     # Edges       # Isolated
Yahoo! Answers   896,799     112,595,138   69,856
Wikipedia        1,754,069   237,058,218   82,381

[Figure panels: Wikipedia, Yahoo! Answers]

11/23

Page 15: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

Retrieval

Algorithm: lazy random walk with restart [Chung]

[Chung] F. R. K. Chung. Spectral Graph Theory. American Mathematical Society, 1997.
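A lazy random walk with restart can be approximated by power iteration, as in the sketch below. The slide does not give parameter values, so `restart=0.15`, `stay=0.5`, the iteration count and the toy edge weights are assumptions; `lazy_rwr` is a hypothetical helper name, not taken from the paper.

```python
def lazy_rwr(graph, query, restart=0.15, stay=0.5, iters=100):
    """Approximate the stationary distribution of a lazy random walk with restart.

    graph maps each node to a dict {neighbor: weight}; every neighbor must
    also be a key of graph. At each step the walker jumps back to the query
    node with probability `restart`; otherwise it stays put with probability
    `stay` (the lazy part) or follows an edge chosen proportionally to weight.
    """
    p = {v: 0.0 for v in graph}
    p[query] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in graph}
        for u, mass in p.items():
            if mass == 0.0:
                continue
            nxt[query] += restart * mass
            total = sum(graph[u].values())
            if total == 0.0:
                # Isolated node: all non-restart mass stays where it is.
                nxt[u] += (1.0 - restart) * mass
                continue
            nxt[u] += (1.0 - restart) * stay * mass
            move = (1.0 - restart) * (1.0 - stay) * mass
            for v, w in graph[u].items():
                nxt[v] += move * w / total
        p = nxt
    return p


# Hypothetical toy network; the weights are invented for illustration.
toy_graph = {
    "Steve Jobs": {"Steve Wozniak": 0.8, "Timothy Cook": 0.6},
    "Steve Wozniak": {"Steve Jobs": 0.8, "Apple I": 0.5},
    "Timothy Cook": {"Steve Jobs": 0.6},
    "Apple I": {"Steve Wozniak": 0.5},
    "Penguin": {},
}
scores = lazy_rwr(toy_graph, "Steve Jobs")
ranking = sorted((v for v in scores if v != "Steve Jobs"), key=lambda v: -scores[v])
```

Sorting the remaining nodes by score gives the result list for the query entity. The restart term keeps the ranking anchored to the query, and nodes unreachable from it score zero.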

12/23

Page 16: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

Rank Aggregation

For a given query, combine the results from different search engines

Simple median-rank aggregation [Sculley]

Ranking 1: A B C D E
Ranking 2: C D E A B
Aggregated: C A D B E

[Sculley] D. Sculley. Rank aggregation for similar items. SDM 2007.
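The letter sequences on this slide can be read as two input rankings (A B C D E and C D E A B) and their aggregate (C A D B E); that reading is an interpretation of the flattened layout. Under it, a minimal median-rank aggregation looks like the sketch below, which assumes every ranking contains the same items and breaks ties alphabetically (both are assumptions, as is the function name).

```python
from statistics import median


def median_rank_aggregate(rankings):
    """Order items by the median of their 1-based positions across rankings."""
    items = rankings[0]
    positions = {
        item: [ranking.index(item) + 1 for ranking in rankings] for item in items
    }
    # Ties on the median are broken by item name to keep the output deterministic.
    return sorted(items, key=lambda item: (median(positions[item]), item))


ranking_1 = list("ABCDE")
ranking_2 = list("CDEAB")
aggregated = median_rank_aggregate([ranking_1, ranking_2])
```

With two lists the median is the mean of the two positions: A gets 2.5, B 3.5, C 2, D 3 and E 4, which yields the order C A D B E.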

13/23

Page 17: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

Contents

1. What/why serendipitous search
2. How to build a serendipitous search system
3. Experiment settings and analysis

14/23

Page 18: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

Retrieval

                 Wikipedia   Yahoo! Answers   Combined
Precision @ 5    0.668       0.724            0.744
MAP              0.716       0.762            0.782

3 labels per query-result pair

Top results for the query entity Steve Jobs:

Yahoo! Answers: Jon Rubinstein, Timothy Cook, Kane Kramer, Steve Wozniak, Jerry York

Wikipedia: System 7, PowerPC G4, SuperDrive, Power Macintosh, Power Computing Corp.

Annotator agreement (overlap): 85%

Average overlap in top 5 results: 12%
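For readers unfamiliar with the two retrieval measures reported on this slide, the sketch below computes them from binary relevance judgments. The judgment lists are invented, and average precision is normalized by the number of relevant results retrieved, which is an assumption because the slide does not say how the authors normalized it.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k results judged relevant (relevance holds 0/1 flags)."""
    return sum(relevance[:k]) / k


def average_precision(relevance):
    """Mean of precision values taken at each rank holding a relevant result."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(runs):
    """Average of per-query average precision."""
    return sum(average_precision(r) for r in runs) / len(runs)


# Hypothetical judgments for two queries, five results each.
judged = [
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 0],
]
p_at_5 = precision_at_k(judged[0], 5)
map_score = mean_average_precision(judged)
```

Precision @ 5 ignores the order of the results within the top five, while MAP rewards placing relevant results earlier.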

15/23

Page 19: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

WHAT
What connections between entities do web community knowledge portals offer?

WHY
How do they contribute to an interesting, serendipitous browsing experience?

Why/when do penguins wear sweaters?

16/23

Page 20: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

• Sentiment
– use SentiStrength to compute positive & negative scores
– compute attitude and sentimentality
– entity-level scores

• Quality
– Flesch Reading Ease score

• Topical Category
– Yahoo Content Taxonomy

[Figure panels: Attitude (Polarity), Sentimentality (Strength), Readability]
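The sentiment and quality metadata listed on this slide can be sketched as follows. The Flesch Reading Ease formula is the standard one; the attitude and sentimentality definitions (sum of the two scores, and their spread minus 2) are one formulation used with SentiStrength's 1..5 and -1..-5 scales and are an assumption here, since the slide does not spell them out. The syllable counter is a crude vowel-group heuristic, and no SentiStrength call is shown because the scores are taken as given inputs.

```python
import re


def attitude(pos, neg):
    """Polarity: pos in 1..5 and neg in -5..-1, so the result lies in -4..4."""
    return pos + neg


def sentimentality(pos, neg):
    """Strength: 0 for neutral text (pos=1, neg=-1), up to 8 for very emotional text."""
    return pos - neg - 2


def count_syllables(word):
    """Rough syllable estimate: number of vowel groups, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))


score = flesch_reading_ease("The cat sat.")
```

Higher Flesch scores mean easier text. Entity-level values could then be obtained by aggregating these scores over the documents that mention an entity, though the aggregation used by the authors is not stated on the slide.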

Entity Networks with Implicit Metadata

17/23

Page 21: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

Entity Networks with Metadata

Table 5: Serendipity across different runs

|relevant & unexpected| / |unexpected|: number of serendipitous results out of all of the unexpected results retrieved

|relevant & unexpected| / |retrieved|: number of serendipitous results out of all retrieved results
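The two ratios defined for Table 5 translate directly into set operations. In the sketch below the retrieved list reuses the Yahoo! Answers results shown earlier for Steve Jobs, while the `expected` baseline and the `relevant` judgments are hypothetical values invented for the example.

```python
def serendipity_ratios(retrieved, expected, relevant):
    """Return (|relevant & unexpected| / |unexpected|, |relevant & unexpected| / |retrieved|)."""
    retrieved = list(retrieved)
    unexpected = [r for r in retrieved if r not in expected]
    serendipitous = [r for r in unexpected if r in relevant]
    per_unexpected = len(serendipitous) / len(unexpected) if unexpected else 0.0
    per_retrieved = len(serendipitous) / len(retrieved) if retrieved else 0.0
    return per_unexpected, per_retrieved


retrieved = ["Jon Rubinstein", "Timothy Cook", "Kane Kramer", "Steve Wozniak", "Jerry York"]
expected = {"Timothy Cook", "Steve Wozniak"}  # hypothetical baseline results
relevant = {"Jon Rubinstein", "Timothy Cook", "Steve Wozniak", "Jerry York"}  # hypothetical labels
per_unexpected, per_retrieved = serendipity_ratios(retrieved, expected, relevant)
```

Here three results are unexpected and two of those are relevant, so the ratios come out as 2/3 and 2/5. The first ratio measures the quality of the surprises, and the second measures how much of the whole result list is serendipitous.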

18/23

Page 22: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

User-perceived Quality

1. Which result is more relevant to the query?

2. If someone is interested in the query, would they also be interested in these results?

3. Even if you are not interested in the query, are these results interesting to you personally?

4. Would you learn anything new about the query?

19/23

Page 23: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content

Entity Networks with Metadata

Question / Data                                                   General   +Topic
Which result is more relevant to the query?
  WP                                                              0.162     0.194
  YA                                                              0.336     0.374
  Comb                                                            0.201     0.222
If someone is interested in the query, would they also be interested in the result?
  WP                                                              0.162     0.176
  YA                                                              0.312     0.343
  Comb                                                            0.184     0.222
Even if you are not interested in the query, is the result interesting to you personally?
  WP                                                              0.139     0.144
  YA                                                              0.324     0.359
  Comb                                                            0.168     0.198
Would you learn anything new about the query from this result?
  WP                                                              0.167     0.164
  YA                                                              0.307     0.346
  Comb                                                            0.184     0.203

(WP = Wikipedia, YA = Yahoo! Answers, Comb = combined)

Topical category constraint promotes results of the same topic as the query entity

Sentiment and readability constraints hurt performance

Table 6: Similarity (Kendall’s tau-b [Fagin]) between result sets and reference ranking

[Fagin] R. Fagin, R. Kumar, M. Mahdian, et al. Comparing and aggregating rankings with ties. PODS 2004, pp. 47-58.
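Kendall's tau-b, the similarity measure named in the Table 6 caption, can be computed pairwise as in the sketch below. It is a plain O(n^2) illustration with invented score lists, not the evaluation code behind the table; `kendall_tau_b` is a hypothetical helper name.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equally long score (or rank) lists, allowing ties."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both lists: contributes to neither count
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt(
        (concordant + discordant + ties_x) * (concordant + discordant + ties_y)
    )
    return (concordant - discordant) / denom if denom else 0.0


# Hypothetical rankings: the second list ties its first two items.
system = [1, 2, 3, 4]
reference = [1, 1, 2, 3]
tau = kendall_tau_b(system, reference)
```

The value ranges from -1 (reversed order) to 1 (identical order). The tie correction in the denominator matters when annotators judge two results as equally good.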

22/23

Page 24: Penguins in-sweaters-or-serendipitous-entity-search-on-user-generated-content