Post on 15-Apr-2017
Dynamic Collective Entity Representations for Entity Ranking
David Graus, Manos Tsagkias, Wouter Weerkamp, Edgar Meij, Maarten de Rijke
4
Entity search?
- Index = Knowledge Base (= Wikipedia)
- Documents = Entities
- “Real world entities” have a single representation (in KB)
5
Representation is not static
- Associations between words and entities change over time
  - “ferguson shooting” -> Ferguson, Missouri
- People talk about entities all the time
7
Dynamic Collective Entity Representations
- Use “collective intelligence” to mine entity descriptions to enrich the representation.
- Is like document expansion (add terms found through explicit links)
- Is not query expansion (terms found through predicted links)
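The document-expansion idea above can be illustrated with a small sketch. It assumes a fielded entity document in which each description source gets its own field; the class name `EntityRepresentation`, the field names, and the tokenizer are hypothetical and not taken from the slides.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-word characters (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


class EntityRepresentation:
    """Hypothetical fielded entity document: one term list per description source."""

    def __init__(self, entity_id, kb_text):
        self.entity_id = entity_id
        self.fields = defaultdict(list)
        # The original KB text is the starting representation.
        self.fields["kb_text"].extend(tokenize(kb_text))

    def expand(self, source, description):
        """Document expansion: append the terms of a description that
        explicitly links to this entity to the field of its source."""
        self.fields[source].extend(tokenize(description))


entity = EntityRepresentation("Ferguson,_Missouri", "Ferguson is a city in Missouri")
# A tweet that links to the entity's Wikipedia page adds out-of-document terms.
entity.expand("tweets", "ferguson shooting protests")
```

After the expansion, a query such as “ferguson shooting” can match the entity through the `tweets` field even though “shooting” never occurs in the KB text. The query itself is left unchanged, which is what separates this from query expansion.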
8
Advantages
- Cheap: change the document in the index, leverage tried & tested retrieval algorithms
- Free “smoothing”: descriptions (e.g., tweets) may capture ‘newly evolving’ word associations (Ferguson shooting) and incorporate out-of-document terms
- “Move relevant documents closer to queries” (= close the gap between searcher vocabulary & docs in index)
9
Haven’t we seen this before?
- Anchors & queries in particular have been shown to improve retrieval [1]
- Tweets have been shown to be similar to anchors [2]
- Social tags, same [3]
- But: in batch (i.e., add data, see if/how it improves retrieval)
[1] T. Westerveld, W. Kraaij, and D. Hiemstra. Retrieving web pages using content, links, urls and anchors. TREC 2001.
[2] G. Mishne and J. Lin. Twanchor text: A preliminary study of the value of tweets as anchor text. SIGIR ’12.
[3] C.-J. Lee and W. B. Croft. Incorporating social anchors for ad hoc retrieval. OAIR ’13.
10
Description sources
Static sources:
- KB: Wikipedia dump (Aug ’14). 57M descriptions for 4.8M entities.
- Web anchors: Anchors from Google WikiLinks corpus. 9.8M descriptions for 876,063 entities.
Dynamic sources:
- Tweets: Tweets w/ links to Wikipedia pages (2011-2014). 52,631 descriptions for 38,269 entities.
- Queries: Queries from MSN query logs that yield Wikipedia clicks. 47,002 descriptions for 18,724 entities.
- Social tags: Delicious tags for Wiki pages, from the SocialBM0311 corpus. 4.4M descriptions for 289,015 entities.
11
Original entity representation
Original entity description:
Tupac Shakur
Tupac Amaru Shakur (previously known as Lesane Parish Crooks) (too-pahk shə-koor;[1] June 16, 1971 – September 13, 1996), also known by his stage names 2Pac and (briefly) Makaveli, was an American rapper, author, actor, and poet.[2] As of 2007, Shakur has sold over 75 million records worldwide, making him one of the best-selling music artists of all time.[3] His double disc albums All Eyez on Me and his Greatest Hits are among the [...]
12
Static description sources
- KB anchors: 2Pac; Tupac; Makaveli
- KB linked entities: The Notorious B.I.G.; Black Panther Party; Muammar Gaddafi
- KB redirects: 2pac Shakur; Thug Immortal
- KB categories: Murdered Rappers; Death Row Record Artists; American deists
- Web anchors:
  - What job did Tupac have before he was a rapper
  - Tupac
  - Tupac is arguably more influential
  - Tupac Amaru Shakur
  - Tupac Shakur-style drive-by shooting
  - Tupac Shakur
  - Tupac Shakur reciting Shakespeare at art school
13
Dynamic description sources
Dynamic expansions (tweets, tags, queries):
- tupac and the law
- hiphop/icons
- dead rappers
- people influenced by tupac
- awesomeartist rapd
- Happy Birthday Tupac!!! 2Pac Gemini
- RT: Las cenizas de Tupac, el mejor rapero de la historia, fueron mezcladas con marihuana y fumadas por miembros de Outlawz [English: The ashes of Tupac, the best rapper in history, were mixed with marijuana and smoked by members of Outlawz]
- Even more crazy that this was announced just one day before what would have been Pac’s 40th birthday.
14
Challenge
- Heterogeneity
  1. Description sources
  2. Entities
- Dynamic nature
  - Content changes over time
15
Adaptive ranking
- Supervised single-field weighting model: each field’s contribution towards the final score is individually weighted, with weights learned from clicks at set intervals.
- Features:
  - Field similarity: retrieval score per field.
  - Field “importance”: length, novel terms, etc.
  - Entity “importance”: time since last update.
- Learn optimal field weights from clicks
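A minimal sketch of how per-field weights might be combined and adjusted from clicks is given below. The slides do not say which learner is used, so the linear scoring function and the perceptron-style pairwise update are assumptions made for illustration; the feature names (`sim_kb`, `sim_tweets`) and the learning rate are hypothetical as well.

```python
def score(weights, features):
    """Final score: weighted sum of an entity's feature values."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rank(weights, candidates):
    """Order candidate entity ids by descending score."""
    return sorted(candidates, key=lambda eid: score(weights, candidates[eid]), reverse=True)


def update_from_click(weights, candidates, clicked, lr=0.1):
    """Assumed pairwise update: for every entity scored at least as high as
    the clicked one, shift the weights towards the clicked entity's features."""
    clicked_feats = candidates[clicked]
    clicked_score = score(weights, clicked_feats)
    # Collect violators first so all comparisons use the same weights.
    violators = [
        feats for eid, feats in candidates.items()
        if eid != clicked and score(weights, feats) >= clicked_score
    ]
    for feats in violators:
        for name in set(feats) | set(clicked_feats):
            diff = clicked_feats.get(name, 0.0) - feats.get(name, 0.0)
            weights[name] = weights.get(name, 0.0) + lr * diff
    return weights


# Toy example: entity B matches the query mainly through its tweets field.
weights = {"sim_kb": 1.0, "sim_tweets": 0.0}
candidates = {
    "A": {"sim_kb": 0.9, "sim_tweets": 0.0},
    "B": {"sim_kb": 0.5, "sim_tweets": 0.8},
}
for _ in range(10):  # repeated clicks on B
    update_from_click(weights, candidates, clicked="B")
```

With the initial weights A outranks B; after repeated clicks on B the weight of the tweets field grows until B is ranked first. Field “importance” and entity “importance” features would enter the same weighted sum as additional entries in each feature dictionary.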
16
Experimental setup
1. Data:
   - MSN query log (62,841 queries that yield entity clicks)
- For each query:
   - Produce ranking
   - Observe click
   - Evaluate ranking (MAP/P@1)
   - Expand entities (w/ descriptions from dynamic sources)
   - [re-train ranker]
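The per-query loop above can be sketched as follows. This is a toy illustration under stated assumptions, not the actual experimental code: the clicked entity is treated as the only relevant one, the ranker is a simple term-overlap function (`overlap_rank`, hypothetical), the only dynamic description added is the query itself, and the re-training step is left as a comment.

```python
def average_precision(ranking, relevant):
    """Average precision of one ranking given a set of relevant entity ids."""
    hits, total = 0, 0.0
    for position, entity in enumerate(ranking, start=1):
        if entity in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0


def precision_at_1(ranking, relevant):
    """1.0 if the top-ranked entity is relevant, else 0.0."""
    return 1.0 if ranking and ranking[0] in relevant else 0.0


def overlap_rank(query, index):
    """Hypothetical ranker: order entities by number of shared query terms."""
    terms = set(query.lower().split())
    return sorted(index, key=lambda e: len(terms & set(index[e])), reverse=True)


def run_log(query_log, index, rank_fn=overlap_rank):
    """Replay (query, clicked entity) pairs in order; return (MAP, mean P@1)."""
    ap_scores, p1_scores = [], []
    for query, clicked in query_log:
        ranking = rank_fn(query, index)                       # produce ranking
        relevant = {clicked}                                  # observe click
        ap_scores.append(average_precision(ranking, relevant))  # evaluate
        p1_scores.append(precision_at_1(ranking, relevant))
        index[clicked].extend(query.lower().split())          # expand entity
        # [re-train ranker] would go here at set intervals.
    n = len(query_log)
    return sum(ap_scores) / n, sum(p1_scores) / n


index = {
    "Ferguson_(name)": ["ferguson", "surname"],
    "Ferguson,_Missouri": ["ferguson", "missouri", "city"],
}
log = [("ferguson shooting", "Ferguson,_Missouri")] * 2
mean_ap, mean_p1 = run_log(log, index)
```

Because the ranking is evaluated before the entity is expanded, each query is scored against a representation that has only seen earlier descriptions. In the toy log the second occurrence of the query benefits from the expansion triggered by the first.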
17
Results
- Comparing the effectiveness of different description sources
- Comparing adaptive vs. non-adaptive ranker performance
18
Description sources
[Plot: y-axis ticks from 0.50 to 0.60; x-axis ticks from 0 to 30,000 in steps of 5,000]
19
Feature weights over time
20
Adaptive vs. non-adaptive ranking
[Plot: y-axis ticks from 0.50 to 0.60; x-axis ticks from 0 to 30,000 in steps of 5,000]
21
In summary
- Expanding entity representations with different sources enables better matching of queries to entities
- As new content comes in, it is beneficial to retrain the ranker
- Informing the ranker of the “expansion state” further improves performance
22
Thank you