Entity Search: The Last Decade and the Next
Krisztian Balog, University of Stavanger
@krisztianbalog
10th Russian Summer School in Information Retrieval (RuSSIR 2016) | Saratov, Russia, 2016
WHAT IS AN ENTITY?
• An entity is an "object" or "thing" in the real world that can be distinctly identified and is characterized by the following properties:
  • unique identifier(s)
  • name(s)
  • type(s)
  • attributes (or description)
  • (typed) relationships to other entities
people
products
organizations
locations
OUTLINE
1 Past (-10y)
2 Present (now)
3 Future (+10y)
THE PAST
PART 1
The core problem of entity ranking and its investigation at various benchmarking evaluation campaigns
EVALUATION CYCLE
IDEA
01. Task definition
02. Experimental design
03. Method development
04. Experimental evaluation
05. Reporting
REVISION
ENTITY RANKING TASK
search query → retrieval method → search results
EVALUATION CAMPAIGNS
2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
TREC Enterprise
TREC Entity
INEX Entity Ranking
SemSearch
INEX Linked Data
Question Answering over Linked Data
EVALUATION CAMPAIGNS: TREC ENTERPRISE
Task: expert finding
Input: keyword query
Data collection: enterprise intranet
Entity ID: email address
ontology engineering climate change
TREC ENTERPRISE EXPERT FINDING
• How to rank entities that have no direct representations?
• Idea: Look at co-occurrences of entities and query terms in documents
[Figure: documents in which query terms co-occur with entity mentions]
PROFILE-BASED METHODS
• Build a direct term-based entity representation based on associated language usage
• "You shall know a word by the company it keeps." [Firth, 1957]
• Use document retrieval techniques for ranking entity profile documents
[Figure: query q is matched against the profile documents of entities e]
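The profile-based idea can be illustrated with a small sketch. It assumes that document-entity associations are already known and uses a Dirichlet-smoothed query likelihood as the retrieval function; the function names, the toy documents, and the smoothing parameter `mu` are illustrative assumptions and not taken from the slides.

```python
import math
from collections import Counter


def build_profiles(docs, associations):
    """Pool the terms of every document associated with an entity into one profile."""
    profiles = {}
    for doc_id, entities in associations.items():
        for entity in entities:
            profiles.setdefault(entity, Counter()).update(docs[doc_id].lower().split())
    return profiles


def rank_profiles(query, profiles, mu=10.0):
    """Rank entity profiles by smoothed query log-likelihood (illustrative choice)."""
    collection = Counter()
    for term_freqs in profiles.values():
        collection.update(term_freqs)
    total = sum(collection.values())
    scores = {}
    for entity, term_freqs in profiles.items():
        length = sum(term_freqs.values())
        score = 0.0
        for term in query.lower().split():
            # Add-one estimate keeps the background probability non-zero.
            p_coll = (collection[term] + 1) / (total + len(collection))
            score += math.log((term_freqs[term] + mu * p_coll) / (length + mu))
        scores[entity] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: entities are identified by email address.
docs = {
    "d1": "ontology engineering workshop notes",
    "d2": "climate change report",
    "d3": "ontology design patterns",
}
associations = {
    "d1": ["alice@example.org"],
    "d2": ["bob@example.org"],
    "d3": ["alice@example.org"],
}
profiles = build_profiles(docs, associations)
ranking = rank_profiles("ontology engineering", profiles)
```

Because each entity becomes a single pseudo-document, any standard document ranking function could be substituted for the scoring step.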
DOCUMENT-BASED METHODS
• First rank documents (or document snippets)
• Then aggregate evidence for the associated entities
[Figure: query q retrieves documents; evidence from the retrieved documents is aggregated for the associated entities e]
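A minimal sketch of the two steps, under stated assumptions: the document scorer is a toy term-overlap function, and `associations` holds hypothetical document-entity association weights. Neither is prescribed by the slides.

```python
from collections import defaultdict


def score_documents(query, docs):
    """Toy document scorer: fraction of query terms that occur in the document."""
    q_terms = query.lower().split()
    return {
        doc_id: sum(t in text.lower().split() for t in q_terms) / len(q_terms)
        for doc_id, text in docs.items()
    }


def rank_entities(query, docs, associations, top_k=10):
    """Step 1: rank documents. Step 2: aggregate their scores per associated entity."""
    doc_scores = score_documents(query, docs)
    top_docs = sorted(doc_scores, key=doc_scores.get, reverse=True)[:top_k]
    entity_scores = defaultdict(float)
    for doc_id in top_docs:
        for entity, weight in associations.get(doc_id, {}).items():
            entity_scores[entity] += doc_scores[doc_id] * weight
    return sorted(entity_scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data; weights express how strongly a document is tied to an entity.
docs = {
    "d1": "ontology engineering workshop",
    "d2": "climate change report",
    "d3": "ontology design",
}
associations = {
    "d1": {"alice": 1.0},
    "d2": {"bob": 1.0},
    "d3": {"alice": 0.5, "bob": 0.5},
}
ranked = rank_entities("ontology engineering", docs, associations)
```

Unlike the profile-based variant, no entity representation is built in advance; entities only receive evidence through the documents retrieved for the query.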
EVALUATION CAMPAIGNS: INEX ENTITY RANKING
Task: entity ranking in Wikipedia
Input: keyword++ query (target types/examples)
Data collection: Wikipedia
Entity ID: Wikipedia article ID
Movies with eight or more Academy Awards
+category: best picture oscar +category: british films +category: american films
INEX ENTITY RANKING
• Term-based representation: Movies with eight or more Academy Awards
• Category-based representation: +category: best picture oscar +category: british films +category: american films
EVALUATION CAMPAIGNS: TREC ENTITY
Task: related entity finding
Input: keyword++ query (input entity, target type)
Data collection: Web
Entity ID: entity homepage URL
airlines that currently use Boeing-747 planes
+entity: Boeing-747 (clueweb09-..292) +target type: organization
EVALUATION CAMPAIGNS: SEMSEARCH
Task: entity search in the Web of Data
Input: keyword query
Data collection: RDF triples
Entity ID: URI
nokia e73
boroughs of New York City
disney orlando
FIELDED DOCUMENT REPRESENTATION FROM RDF TRIPLES
[Figure: triples of the form (subject, predicate, object) and (subject, predicate, literal)]
dbpedia:Audi_A4
  foaf:name                       Audi A4
  rdfs:label                      Audi A4
  rdfs:comment                    The Audi A4 is a compact executive car produced since late 1994 by the German car manufacturer Audi, a subsidiary of the Volkswagen Group. The A4 has been built [...]
  dbpprop:production              1994 2001 2005 2008
  rdf:type                        dbpedia-owl:MeanOfTransportation dbpedia-owl:Automobile
  dbpedia-owl:manufacturer        dbpedia:Audi
  dbpedia-owl:class               dbpedia:Compact_executive_car
  owl:sameAs                      freebase:Audi A4
  is dbpedia-owl:predecessor of   dbpedia:Audi_A5
  is dbpprop:similar of           dbpedia:Cadillac_BLS
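The mapping from triples to a fielded document can be sketched as follows. This is an illustration under assumptions: triples are plain Python tuples, each predicate becomes one field, and incoming triples are stored under an "is ... of" field name as in the example above. The function name `fielded_document` is hypothetical.

```python
from collections import defaultdict


def fielded_document(subject, triples):
    """Group the triples about one entity into fields keyed by predicate."""
    fields = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            # Outgoing triple: the object (or literal) is the field value.
            fields[p].append(o)
        elif o == subject:
            # Incoming triple: the entity is the object, so the field is inverted.
            fields[f"is {p} of"].append(s)
    return dict(fields)


triples = [
    ("dbpedia:Audi_A4", "foaf:name", "Audi A4"),
    ("dbpedia:Audi_A4", "rdfs:label", "Audi A4"),
    ("dbpedia:Audi_A4", "rdf:type", "dbpedia-owl:MeanOfTransportation"),
    ("dbpedia:Audi_A4", "rdf:type", "dbpedia-owl:Automobile"),
    ("dbpedia:Audi_A4", "dbpedia-owl:manufacturer", "dbpedia:Audi"),
    ("dbpedia:Audi_A5", "dbpedia-owl:predecessor", "dbpedia:Audi_A4"),
]
doc = fielded_document("dbpedia:Audi_A4", triples)
```

In practice the number of distinct predicates is large, so fields are often folded into a handful of groups (for example names, attributes, types, and related entities) before indexing; that grouping is a design choice outside this sketch.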
EVALUATION CAMPAIGNS: QUESTION ANSWERING OVER LINKED DATA
Task: question answering over RDF data
Input: natural language query
Data collection: RDF triples
Entity ID: URI
Which German cities have more than 250000 inhabitants?
Who is the youngest Pulitzer Prize winner?
EVALUATION CAMPAIGNS: INEX LINKED DATA
Task: ad-hoc entity retrieval
Input: keyword query
Data collection: Wikipedia + RDF triples
Entity ID: Wikipedia article ID
NASA missions country German language
DATA EVOLUTION
[Figure: timeline 2005-2015 of TREC Enterprise, TREC Entity, INEX Entity Ranking, SemSearch, INEX Linked Data, and Question Answering over Linked Data, grouped by data type: unstructured, semistructured, structured]
• Clear trend moving towards structured data
• No meaningful/successful attempt at combining unstructured and structured data
QUERY EVOLUTION
[Figure: timeline 2005-2015 of TREC Enterprise, TREC Entity, INEX Entity Ranking, SemSearch, INEX Linked Data, and Question Answering over Linked Data, grouped by query type: keyword, keyword++, natural language]
• Keyword queries are still the most common way to search
• From providing explicit semantic annotations to natural language questions
WHAT HAVE WE BEEN DOING?
• Core focus has been on retrieval models, and more specifically on entity representations
  • In terms of associated language usage, description, types, attributes
• Richer query representations (i.e., query annotations) were taken for granted
image source: https://www.pinterest.com/pin/382946774535857111/
THE BIGGER PICTURE
Understanding information needs
Data source(s)
Result presentation & user interaction
Retrieval method
THE PRESENT
PART 2
Current research themes on various aspects of entity search.
DATA
KNOWLEDGE BASES
• Modern entity-oriented search features are fueled by knowledge bases, which need continuous updating
• Critical to be able to verify the validity of data
• Supply provenance information for each statement
• Validity check (still) needs to be performed by a human
• Can we help human editors to maintain and expand knowledge bases?
UNDERSTANDING INFORMATION NEEDS
F. Hasibi, K. Balog, and S. E. Bratsberg. Exploiting Entity Linking in Queries for Entity Retrieval. ICTIR’16.
ANNOTATING QUERIES WITH ENTITIES
• Semantic annotations of queries were taken for granted so far
• How can automatic entity annotations of queries be leveraged to improve entity retrieval?
barack obama parents
APPROACH
Query: barack obama parents
Entity linking (automatic entity annotation) → Annotations: <Barack_Obama>

Knowledge base entry for ANN DUNHAM
<rdfs:label>: Ann Dunham
<dbo:abstract>: Stanley Ann Dunham, the mother of Barack Obama, was an American anthropologist who …
<dbo:birthPlace>: [ <Honolulu>, <Hawaii> ]
<dbo:child>: <Barack_Obama>
<dbo:wikiPageWikiLink>: [ <United_States>, <Family_of_Barack_Obama>, …]

Term-based representation D (term-based matching against the query terms)
<rdfs:label>: Ann Dunham
<dbo:abstract>: Stanley Ann Dunham the mother Barack Obama, was an American anthropologist who …
<dbo:birthPlace>: Honolulu Hawaii …
<dbo:child>: Barack Obama
<dbo:wikiPageWikiLink>: United States Family Barack Obama

Entity-based representation D̂ (entity-based matching against the annotations)
<dbo:birthPlace>: [ <Honolulu>, <Hawaii> ]
<dbo:child>: <Barack_Obama>
<dbo:wikiPageWikiLink>: [ <United_States>, <Family_of_Barack_Obama>, …]
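The interplay of the two matching components can be illustrated with a deliberately simplified sketch. It is not the ELR model from the cited paper: the overlap-based scores, the interpolation weight `lam`, the annotation confidence values, and the second knowledge base entry are all assumptions made for illustration only.

```python
def term_match(query, term_repr):
    """Fraction of query terms found anywhere in the term-based representation."""
    tokens = set()
    for text in term_repr.values():
        tokens.update(text.lower().split())
    q_terms = query.lower().split()
    return sum(t in tokens for t in q_terms) / len(q_terms)


def entity_match(annotations, entity_repr):
    """Confidence-weighted share of annotated entities linked from the entry."""
    linked = set()
    for uris in entity_repr.values():
        linked.update(uris)
    total = sum(annotations.values())
    if total == 0:
        return 0.0
    return sum(conf for uri, conf in annotations.items() if uri in linked) / total


def combined_score(query, annotations, term_repr, entity_repr, lam=0.5):
    """Linear interpolation of term-based and entity-based matching (assumed form)."""
    return (1 - lam) * term_match(query, term_repr) + lam * entity_match(annotations, entity_repr)


query = "barack obama parents"
annotations = {"<Barack_Obama>": 0.9}  # hypothetical entity linker output

ann_dunham = combined_score(
    query,
    annotations,
    term_repr={"rdfs:label": "Ann Dunham", "dbo:child": "Barack Obama"},
    entity_repr={"dbo:child": ["<Barack_Obama>"], "dbo:birthPlace": ["<Honolulu>", "<Hawaii>"]},
)
# Hypothetical distractor entry that shares a query term but no linked entity.
obama_city = combined_score(
    query,
    annotations,
    term_repr={"rdfs:label": "Obama Fukui"},
    entity_repr={"dbo:country": ["<Japan>"]},
)
```

The distractor matches the term "obama" but has no link to the annotated entity, so the entity-based component is what separates the two entries.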
RESULTS
[Figure: bar chart of MAP (axis from 0.00 to 0.22) for LM, MLM-tc, MLM-all, PRMS, SDM, and FSDM, comparing baseline and +ELR]
ANALYSIS
SUMMARY
• Automatically annotating queries with entities can significantly improve retrieval performance
• Open research problem:
  • How should a query be answered (list, fact, table, etc.)?
ENTITY SUMMARIES
• Summaries serve a dual purpose
  • Synopsis of the entity
  • Provide evidence why the entity is a good answer for the given query
• How to generate dynamic entity summaries that can directly address users’ information needs?
• Two subtasks
  • Fact ranking: What should be in the summary?
  • Summary generation: How should it be presented?
ANTICIPATING INFORMATION NEEDS
J. Benetka, K. Balog, and K. Nørvåg. Anticipating Information Needs Based on Check-in Activity. WSDM’17.
ZERO-QUERY SEARCH
• Proactive instead of reactive search
  • "Anticipate user needs and respond with information appropriate to the current context without the user having to enter a query" (Allan et al., SIGIR Forum 2012)
• Using a person's check-in activity as context, can we anticipate her information needs, and respond with a set of information cards that directly address those needs?
[Figure: information cards, e.g., Terminal, Weather (21ºC), Traffic]
INFORMATION NEEDS FOR ACTIVITIES
• What are relevant information needs in the context of a given activity?
• Use POI categories (Foursquare) to represent activities
• Mine information needs from search suggestions
ANTICIPATING INFORMATION NEEDS
• Maximize the likelihood of satisfying the user's information needs by considering each possible activity that might follow next
• Transition probabilities are estimated based on historical check-in data
[Figure: Activity A followed by the possible next activities B, C, and D, with transition probabilities 45%, 34%, and 21%]
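The two bullets above can be sketched as a small computation: estimate transition probabilities from check-in sequences, then score each information need by summing over the possible next activities. The activity names, the need distributions, and the function names are hypothetical, and the maximum-likelihood estimate is an assumed simplification.

```python
from collections import Counter, defaultdict


def transition_probabilities(checkin_sequences):
    """Maximum-likelihood estimate of P(next activity | current activity)."""
    counts = defaultdict(Counter)
    for seq in checkin_sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for current, nxts in counts.items()
    }


def anticipate(last_activity, transitions, needs_given_activity, k=3):
    """Score each need by sum over next activities of P(next | last) * P(need | next)."""
    scores = defaultdict(float)
    for nxt, p_next in transitions.get(last_activity, {}).items():
        for need, p_need in needs_given_activity.get(nxt, {}).items():
            scores[need] += p_next * p_need
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Hypothetical check-in histories expressed as POI category sequences.
history = [
    ["Airport", "Hotel", "Restaurant"],
    ["Airport", "Hotel"],
    ["Airport", "Restaurant"],
    ["Airport", "Hotel"],
]
# Hypothetical need distributions per activity.
needs = {
    "Hotel": {"wifi": 0.6, "check-in time": 0.4},
    "Restaurant": {"menu": 0.7, "opening hours": 0.3},
}
transitions = transition_probabilities(history)
cards = anticipate("Airport", transitions, needs)
```

The top-k needs returned by `anticipate` correspond to the information cards that would be shown proactively.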
EVALUATION METHODOLOGY
[Figure: check-in dataset for User 1, User 2, and User 3 split into Train and Test (80%); information cards: Terminal, Weather (21ºC), Traffic]
RESULTS
[Figure: bar chart of NDCG@5 (axis from 0.00 to 0.90) for models M0 to M3, at the top level and the second level]
• M0: Most frequent information needs, regardless of the last activity
• M1: Consider information needs for all possible upcoming activities
• M2: In addition, consider the information needs relevant to the past activity (fixed weight for all info needs)
• M3: Consider the temporal sensitivity of each information need individually
SUMMARY
• Identifying information needs that are relevant in the context of a given activity and proactively presenting information cards addressing those needs
• Open research problems
  • Other contexts
  • (Access to data, privacy...)
THE FUTURE
PART 3
Making the right information available to the right person at the right time.
IMAGINARY SCENARIO WITH AN INTELLIGENT PERSONAL ASSISTANT
I see you're wasting time away on Facebook. Do you have time now to talk about your holiday plans?
Sure. I want an active holiday with the family in beautiful nature.
It sounds like you would definitely love Norway. A cabin in the mountains maybe?
Could be. But I want to go kayaking and also catch some fish. And not too much rain, please.
And something fun for the kids nearby, I suppose?
Of course.
How does Oltedal sound? People have been quite successful with catching lake trout based on what I found on Instagram.
There is also a theme park and horse riding, both within 50 km.
And what about the weather?
You know we’re talking about Norway, right…? Anyway, based on statistics from the past 30 years, this is one of the areas with the least amount of rain if you go in August.
I see. What about accommodation?
Here is a list of places that I think you might like.
Any opinions on this one?
According to the reviews that I can find on the web, the cabins are well equipped, the staff is nice and they even allow guests to borrow their kayaks.
OK. Let’s find a date that works for everyone. According to your wife's calendar, her parents will be visiting you in the first week of August. School starts for the kids on the week of Aug 22. So there is a two-week window between Aug 8 and 21, assuming that I can cancel the regular weekly meetings with your PhD students.
That's fine. The students won't mind. Write them an email to upload their holiday plans to the group wiki, and add summer planning to the next group meeting's agenda.
Guys,
What are your plans for the summer? Please upload your away times to the group wiki.
-Kr
To: XXX, YYY, ZZZ
Send
Agenda item Summer planning added
In the meantime, I called the cabin to check availability. Their online booking system is down at the moment. They still have some cabins available. Do you want to see them?
No, I had enough of this for today. Mail the pictures to my wife with some kind words.
Anything else I can do for you?
Order a water filter for my espresso machine. I just found out that it'll need to be replaced soon.
Darling, You will love the place I found for us for a vacation in August. It is by the water; at night we will hear the waves. We will be able to take our morning breakfasts on the balcony, which ...
To: Wife
Send
FUTURE RESEARCH THEMES
UNDERSTANDING INFORMATION NEEDS
• Natural language conversational interface
• Anticipating information needs
• Proactive recommendations
It sounds like you would definitely love Norway. A cabin in the mountains maybe?
And something fun for the kids nearby, I suppose?
I see you're wasting time away on Facebook. Do you have time now to talk about your holiday plans?
DATA
• Long-tail entities
• On-the-fly information extraction
• "Personal" knowledge base
  • "Wife", "My students", "my group", "my espresso machine", ... entities I care about
Here is a list of places that I think you might like.
According to the reviews that I can find on the web, ...
Order a water filter for my espresso machine. I just found out that it'll need to be replaced soon.
Breville BES860XL Barista Express Espresso Machine
RESULT PRESENTATION & USER INTERACTION
• Providing evidence
• "Actionable" entities
  • Make booking, order item, write email, ...
• Helping the user to get things done
  • Support for task completion
... based on statistics from the past 30 years, ...
According to your wife's calendar, ...
Agenda item Summer planning added
Write them an email to upload their holiday plans to the group wiki, and add summer planning to the next group meeting's agenda.
SUMMARY
Understanding information needs
• Semantic annotations
• Anticipating info needs
• Natural language conversational interfaces
Data source(s)
• Long tail entities
• Personal knowledge base
• On-the-fly information extraction
Retrieval method
• Hybrid approaches
Result presentation & user interaction
• Entity cards
• Actionable entities
• Support for task completion
ACKNOWLEDGMENTS
• Joint work with
  • Faegheh Hasibi
  • Jan Benetka
  • Darío Garigliotti
  • Kjetil Nørvåg
  • Svein Erik Bratsberg