Entity Search: The Last Decade and the Next

Entity Search: The Last Decade and the Next. Krisztian Balog, University of Stavanger, @krisztianbalog. 10th Russian Summer School in Information Retrieval (RuSSIR 2016) | Saratov, Russia, 2016

Transcript of Entity Search: The Last Decade and the Next

Page 1: Entity Search: The Last Decade and the Next

Entity Search: The Last Decade and the Next

Krisztian Balog, University of Stavanger

@krisztianbalog

10th Russian Summer School in Information Retrieval (RuSSIR 2016) | Saratov, Russia, 2016

Page 2: Entity Search: The Last Decade and the Next

WHAT IS AN ENTITY?

• An entity is an "object" or "thing" in the real world that can be distinctly identified and is characterized by the following properties:

  • unique identifier(s)
  • name(s)
  • type(s)
  • attributes (or description)
  • (typed) relationships to other entities

Examples: people, products, organizations, locations
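The list of properties above maps naturally onto a record type. The following is a minimal Python sketch using a hypothetical `Entity` class of our own (it is not a schema used in the talk); the example values are taken from the DBpedia entry for the Audi A4 shown later in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical record holding the entity properties listed on the slide."""
    entity_id: str                                   # unique identifier
    names: list = field(default_factory=list)        # name(s)
    types: list = field(default_factory=list)        # type(s)
    attributes: dict = field(default_factory=dict)   # attributes (or description)
    relations: list = field(default_factory=list)    # typed relationships: (relation, target id)

    def related(self, rel_type):
        """Return the ids of entities linked to this one by `rel_type`."""
        return [target for rel, target in self.relations if rel == rel_type]


audi_a4 = Entity(
    entity_id="dbpedia:Audi_A4",
    names=["Audi A4"],
    types=["dbpedia-owl:Automobile"],
    attributes={"dbpprop:production": "1994"},
    relations=[
        ("dbpedia-owl:manufacturer", "dbpedia:Audi"),
        ("dbpedia-owl:class", "dbpedia:Compact_executive_car"),
    ],
)
print(audi_a4.related("dbpedia-owl:manufacturer"))  # ['dbpedia:Audi']
```

Keeping relationships as (type, target) pairs rather than plain links is what makes them "typed": the same target can be reached through different relations.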

Page 3: Entity Search: The Last Decade and the Next
Page 4: Entity Search: The Last Decade and the Next
Page 5: Entity Search: The Last Decade and the Next

OUTLINE

1 Past (now-10y)

2 Present (now)

3 Future (now+10y)

Page 6: Entity Search: The Last Decade and the Next

THE PAST

PART 1

The core problem of entity ranking and its investigation at various benchmarking evaluation campaigns

Page 7: Entity Search: The Last Decade and the Next

EVALUATION CYCLE

IDEA

01. Task definition

02. Experimental design

03. Method development

04. Experimental evaluation

05. Reporting

REVISION

Page 8: Entity Search: The Last Decade and the Next

ENTITY RANKING TASK

search query → retrieval method → search results

Page 9: Entity Search: The Last Decade and the Next

EVALUATION CAMPAIGNS

Timeline, 2005-2015:

• TREC Enterprise
• TREC Entity
• INEX Entity Ranking
• SemSearch
• INEX Linked Data
• Question Answering over Linked Data

Page 10: Entity Search: The Last Decade and the Next

EVALUATION CAMPAIGNS

Campaign: TREC Enterprise

Task: expert finding

Input: keyword query

Data collection: enterprise intranet

Entity ID: email address

Example queries:

• ontology engineering
• climate change

Page 11: Entity Search: The Last Decade and the Next


TREC ENTERPRISE EXPERT FINDING

• How to rank entities that have no direct representations?

• Idea: Look at co-occurrences of entities and query terms in documents

[Figure: documents containing query terms and entity mentions]

Page 12: Entity Search: The Last Decade and the Next

PROFILE-BASED METHODS

• Build a direct term-based entity representation based on associated language usage

• "You shall know a word by the company it keeps." [Firth, 1957]

• Use document retrieval techniques for ranking entity profile documents

[Figure: the documents associated with an entity e are combined into a profile document for e, which is then ranked against the query q]
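A profile-based ranker can be sketched in a few lines of Python. This is an illustrative sketch under assumptions of our own: an entity is associated with every document that mentions it, its profile is the concatenated terms of those documents, and the "document retrieval technique" is query likelihood with Dirichlet smoothing (the value `mu=100` is arbitrary). The toy documents and the entities `alice` and `bob` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy collection: document terms and the entities each document mentions.
DOCS = {
    "d1": "ontology engineering semantic web".split(),
    "d2": "climate change policy".split(),
    "d3": "ontology climate model".split(),
}
MENTIONS = {"d1": {"alice"}, "d2": {"bob"}, "d3": {"alice", "bob"}}


def build_profiles(docs, mentions):
    """Merge the terms of all documents that mention an entity into one profile."""
    profiles = {}
    for doc_id, terms in docs.items():
        for entity in mentions.get(doc_id, ()):
            profiles.setdefault(entity, Counter()).update(terms)
    return profiles


def query_log_likelihood(query, profile, collection, coll_len, mu=100.0):
    """log p(q | profile) under a Dirichlet-smoothed unigram language model."""
    prof_len = sum(profile.values())
    score = 0.0
    for term in query:
        p_coll = collection[term] / coll_len
        if p_coll == 0:
            continue  # skip terms that never occur in the collection
        score += math.log((profile[term] + mu * p_coll) / (prof_len + mu))
    return score


def rank_entities_by_profile(query, docs, mentions, mu=100.0):
    """Rank entities by the likelihood of their profile documents."""
    collection = Counter(t for terms in docs.values() for t in terms)
    coll_len = sum(collection.values())
    scores = {
        entity: query_log_likelihood(query, profile, collection, coll_len, mu)
        for entity, profile in build_profiles(docs, mentions).items()
    }
    return sorted(scores, key=scores.get, reverse=True)


print(rank_entities_by_profile(["ontology", "engineering"], DOCS, MENTIONS))  # ['alice', 'bob']
```

Once profiles are built, any standard document ranking model could replace the query-likelihood scorer; that reuse is the main appeal of the profile-based approach.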

Page 13: Entity Search: The Last Decade and the Next

DOCUMENT-BASED METHODS

• First rank documents (or document snippets)

• Then aggregate evidence for the associated entities

[Figure: documents are ranked for the query q, and their evidence is aggregated for the entities e they mention]
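The two-step document-based strategy can be sketched in the same way. Again, this follows our own assumptions: documents are scored by Dirichlet-smoothed query likelihood p(q|d), only the top-k documents are kept, and each document's score is split evenly among the entities it mentions. The toy data and the names `alice` and `bob` are hypothetical.

```python
from collections import Counter

# Hypothetical toy collection: document terms and the entities each document mentions.
DOCS = {
    "d1": "ontology engineering semantic web".split(),
    "d2": "climate change policy".split(),
    "d3": "ontology climate model".split(),
}
MENTIONS = {"d1": {"alice"}, "d2": {"bob"}, "d3": {"alice", "bob"}}


def query_likelihood(query, terms, collection, coll_len, mu=100.0):
    """p(q | d) under a Dirichlet-smoothed unigram language model."""
    tf = Counter(terms)
    p = 1.0
    for term in query:
        p_coll = collection[term] / coll_len
        if p_coll == 0:
            continue  # skip terms that never occur in the collection
        p *= (tf[term] + mu * p_coll) / (len(terms) + mu)
    return p


def rank_entities_by_documents(query, docs, mentions, top_k=10, mu=100.0):
    collection = Counter(t for terms in docs.values() for t in terms)
    coll_len = sum(collection.values())
    # Step 1: rank documents by p(q | d) and keep the top k.
    doc_scores = {d: query_likelihood(query, terms, collection, coll_len, mu) for d, terms in docs.items()}
    top_docs = sorted(doc_scores, key=doc_scores.get, reverse=True)[:top_k]
    # Step 2: spread each document's score evenly over the entities it mentions.
    entity_scores = Counter()
    for d in top_docs:
        entities = mentions.get(d, set())
        for entity in entities:
            entity_scores[entity] += doc_scores[d] / len(entities)
    return [entity for entity, _ in entity_scores.most_common()]


print(rank_entities_by_documents(["ontology", "engineering"], DOCS, MENTIONS))  # ['alice', 'bob']
```

Unlike the profile-based sketch, no entity representation is built ahead of time; the cost is paid at query time, and the choice of `top_k` controls how much document evidence reaches each entity.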

Page 14: Entity Search: The Last Decade and the Next

EVALUATION CAMPAIGNS

Campaign: INEX Entity Ranking

Task: entity ranking in Wikipedia

Input: keyword++ query (target types/examples)

Data collection: Wikipedia

Entity ID: Wikipedia article ID

Example query: Movies with eight or more Academy Awards +category: best picture oscar +category: british films +category: american films

Page 15: Entity Search: The Last Decade and the Next

INEX ENTITY RANKING

Movies with eight or more Academy Awards

+category: best picture oscar +category: british films +category: american films

Term-based representation

Category-based representation
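One simple way to use both representations is to interpolate a term-based score with a category-based score. The sketch below assumes a term score already normalized to [0, 1], measures category overlap with the Jaccard coefficient, and uses an arbitrary weight `lam`; these choices and the candidate's categories are hypothetical and not taken from any INEX participant's system.

```python
def category_score(target_categories, entity_categories):
    """Jaccard overlap between the query's target categories and an entity's categories."""
    target, cats = set(target_categories), set(entity_categories)
    if not target or not cats:
        return 0.0
    return len(target & cats) / len(target | cats)


def combined_score(term_score, target_categories, entity_categories, lam=0.5):
    """Interpolate a (normalized) term-based score with the category-based score."""
    return (1 - lam) * term_score + lam * category_score(target_categories, entity_categories)


target = ["best picture oscar", "british films", "american films"]
candidate = ["best picture oscar", "american films", "epic films"]  # hypothetical categories
print(category_score(target, candidate))  # 0.5
print(combined_score(0.4, target, candidate))
```

The category component rewards entities that sit in the requested part of the Wikipedia category graph even when their text matches the keywords only weakly.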

Page 16: Entity Search: The Last Decade and the Next

EVALUATION CAMPAIGNS

Campaign: TREC Entity

Task: related entity finding

Input: keyword++ query (input entity, target type)

Data collection: Web

Entity ID: entity homepage URL

Example query: airlines that currently use Boeing-747 planes +entity: Boeing-747 (clueweb09-..292) +target type: organization

Page 17: Entity Search: The Last Decade and the Next

EVALUATION CAMPAIGNS

Campaign: SemSearch

Task: entity search in the Web of Data

Input: keyword query

Data collection: RDF triples

Entity ID: URI

Example queries:

• nokia e73
• boroughs of New York City
• disney orlando

Page 18: Entity Search: The Last Decade and the Next

FIELDED DOCUMENT REPRESENTATION FROM RDF TRIPLES

dbpedia:Audi_A4 (triples: subject, predicate, object; objects are URIs or literals)

• foaf:name: Audi A4
• rdfs:label: Audi A4
• rdfs:comment: The Audi A4 is a compact executive car produced since late 1994 by the German car manufacturer Audi, a subsidiary of the Volkswagen Group. The A4 has been built [...]
• dbpprop:production: 1994, 2001, 2005, 2008
• rdf:type: dbpedia-owl:MeanOfTransportation, dbpedia-owl:Automobile
• dbpedia-owl:manufacturer: dbpedia:Audi
• dbpedia-owl:class: dbpedia:Compact_executive_car
• owl:sameAs: freebase:Audi A4
• is dbpedia-owl:predecessor of: dbpedia:Audi_A5
• is dbpprop:similar of: dbpedia:Cadillac_BLS
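Triples like these can be grouped into a fielded document for the entity. The field names below (`names`, `types`, `attributes`, `outgoing_links`, `incoming_links`) and the rule for assigning predicates to them are one possible choice made for illustration, not necessarily the scheme used in the talk; each triple carries a flag saying whether its object is a URI or a literal.

```python
from collections import defaultdict

NAME_PREDICATES = {"foaf:name", "rdfs:label"}  # assumed name-bearing predicates


def fielded_document(entity, triples):
    """Group the triples around `entity` into named fields.

    triples: iterable of (subject, predicate, object, object_is_uri).
    """
    doc = defaultdict(list)
    for s, p, o, o_is_uri in triples:
        if s == entity:
            if p in NAME_PREDICATES:
                doc["names"].append(o)
            elif p == "rdf:type":
                doc["types"].append(o)
            elif o_is_uri:
                doc["outgoing_links"].append(o)
            else:
                doc["attributes"].append(o)
        elif o == entity and o_is_uri:
            # Incoming relation, e.g. (dbpedia:Audi_A5, dbpedia-owl:predecessor, dbpedia:Audi_A4).
            doc["incoming_links"].append(s)
    return dict(doc)


triples = [
    ("dbpedia:Audi_A4", "foaf:name", "Audi A4", False),
    ("dbpedia:Audi_A4", "rdfs:label", "Audi A4", False),
    ("dbpedia:Audi_A4", "dbpprop:production", "1994", False),
    ("dbpedia:Audi_A4", "rdf:type", "dbpedia-owl:Automobile", True),
    ("dbpedia:Audi_A4", "dbpedia-owl:manufacturer", "dbpedia:Audi", True),
    ("dbpedia:Audi_A5", "dbpedia-owl:predecessor", "dbpedia:Audi_A4", True),
]
doc = fielded_document("dbpedia:Audi_A4", triples)
print(doc["incoming_links"])  # ['dbpedia:Audi_A5']
```

The resulting dictionary can be fed to any fielded retrieval model (for example one that weights the `names` field more heavily than `attributes`).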

Page 19: Entity Search: The Last Decade and the Next

EVALUATION CAMPAIGNS

Campaign: Question Answering over Linked Data

Task: question answering over RDF data

Input: natural language query

Data collection: RDF triples

Entity ID: URI

Example queries:

• Which German cities have more than 250000 inhabitants?
• Who is the youngest Pulitzer Prize winner?

Page 20: Entity Search: The Last Decade and the Next

EVALUATION CAMPAIGNS

Campaign: INEX Linked Data

Task: ad-hoc entity retrieval

Input: keyword query

Data collection: Wikipedia + RDF triples

Entity ID: Wikipedia article ID

Example queries:

• NASA missions
• country German language

Page 21: Entity Search: The Last Decade and the Next

EVALUATION CAMPAIGNS


Page 22: Entity Search: The Last Decade and the Next

DATA EVOLUTION

[Figure: the campaigns placed on a 2005-2015 timeline by the type of data they use: unstructured, semistructured, or structured]

• Clear trend moving towards structured data
• No meaningful or successful attempt at combining unstructured and structured data

Page 23: Entity Search: The Last Decade and the Next

QUERY EVOLUTION

[Figure: the campaigns placed on a 2005-2015 timeline by query type: keyword, keyword++, or natural language]

• Keyword queries are still the most common way to search
• From providing explicit semantic annotations to natural language questions

Page 24: Entity Search: The Last Decade and the Next

WHAT HAVE WE BEEN DOING?

• Core focus has been on retrieval models, and more specifically on entity representations

• In terms of associated language usage, description, types, attributes

• Richer query representations (i.e., query annotations) were taken for granted

Page 25: Entity Search: The Last Decade and the Next

image source: https://www.pinterest.com/pin/382946774535857111/

Page 26: Entity Search: The Last Decade and the Next

THE BIGGER PICTURE

Understanding information needs

Data source(s)

Result presentation & user interaction

Retrieval method

Page 27: Entity Search: The Last Decade and the Next

THE PRESENT

PART 2

Current research themes on various aspects of entity search.

Page 28: Entity Search: The Last Decade and the Next

DATA

Page 29: Entity Search: The Last Decade and the Next

KNOWLEDGE BASES

• Modern entity-oriented search features are fueled by knowledge bases, which need continuous updating

• Critical to be able to verify the validity of data

• Supply provenance information for each statement

• Validity check (still) needs to be performed by a human

• Can we help human editors to maintain and expand knowledge bases?

Page 30: Entity Search: The Last Decade and the Next
Page 31: Entity Search: The Last Decade and the Next

UNDERSTANDING INFORMATION NEEDS

F. Hasibi, K. Balog, and S. E. Bratsberg. Exploiting Entity Linking in Queries for Entity Retrieval. ICTIR'16.

Page 32: Entity Search: The Last Decade and the Next

ANNOTATING QUERIES WITH ENTITIES

• Semantic annotations of queries were taken for granted so far

• How can automatic entity annotations of queries be leveraged to improve entity retrieval?

barack obama parents

Page 33: Entity Search: The Last Decade and the Next

APPROACH

Query: barack obama parents

Entity annotation (automatic, via entity linking): <Barack_Obama>

Knowledge base entry for ANN DUNHAM:

<rdfs:label>: Ann Dunham
<dbo:abstract>: Stanley Ann Dunham, the mother of Barack Obama, was an American anthropologist who …
<dbo:birthPlace>: [ <Honolulu>, <Hawaii> ]
<dbo:child>: <Barack_Obama>
<dbo:wikiPageWikiLink>: [ <United_States>, <Family_of_Barack_Obama>, …]

Term-based representation D (term-based matching against the query terms):

<rdfs:label>: Ann Dunham
<dbo:abstract>: Stanley Ann Dunham the mother Barack Obama, was an American anthropologist who …
<dbo:birthPlace>: Honolulu Hawaii …
<dbo:child>: Barack Obama
<dbo:wikiPageWikiLink>: United States Family Barack Obama

Entity-based representation D̂ (entity-based matching against the query annotations):

<dbo:birthPlace>: [ <Honolulu>, <Hawaii> ]
<dbo:child>: <Barack_Obama>
<dbo:wikiPageWikiLink>: [ <United_States>, <Family_of_Barack_Obama>, …]
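The idea of complementing term-based matching with entity-based matching can be illustrated with a deliberately simplified scoring function. This is not the ELR model from the paper; it is a hypothetical linear combination in which each linked query entity, weighted by the entity linker's confidence, earns credit for every field of D̂ that contains it. The weights `lam_t` and `lam_e`, the confidence value 0.9, and the truncated toy version of Ann Dunham's D̂ are all assumptions.

```python
def entity_match_score(query_entities, entity_fields):
    """Entity-based matching against an entity-based representation D-hat.

    query_entities: {entity URI: linker confidence} from annotating the query.
    entity_fields: {field name: set of entity URIs} for the candidate entity.
    """
    if not entity_fields:
        return 0.0
    score = 0.0
    for uri, confidence in query_entities.items():
        hits = sum(1 for uris in entity_fields.values() if uri in uris)
        score += confidence * hits / len(entity_fields)
    return score


def combined_elr_style_score(term_score, query_entities, entity_fields, lam_t=0.8, lam_e=0.2):
    """Hypothetical linear mix of a term-based score and the entity-based score."""
    return lam_t * term_score + lam_e * entity_match_score(query_entities, entity_fields)


# "barack obama parents", annotated with <Barack_Obama> (confidence assumed to be 0.9).
annotations = {"<Barack_Obama>": 0.9}
ann_dunham = {
    "dbo:birthPlace": {"<Honolulu>", "<Hawaii>"},
    "dbo:child": {"<Barack_Obama>"},
    "dbo:wikiPageWikiLink": {"<United_States>", "<Family_of_Barack_Obama>"},
}
print(entity_match_score(annotations, ann_dunham))
```

Here the query term "parents" never matches a field of Ann Dunham's entry literally, but the linked entity <Barack_Obama> does match her `dbo:child` field, which is exactly the kind of evidence term-based matching alone misses.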

Page 34: Entity Search: The Last Decade and the Next

RESULTS

[Figure: MAP of the LM, MLM-tc, MLM-all, PRMS, SDM, and FSDM models, baseline vs. +ELR]

Page 35: Entity Search: The Last Decade and the Next

ANALYSIS

Page 36: Entity Search: The Last Decade and the Next

SUMMARY

• Automatically annotating queries with entities can significantly improve retrieval performance

• Open research problem:
  • How should a query be answered (list, fact, table, etc.)?

Page 37: Entity Search: The Last Decade and the Next

ENTITY SUMMARIES

Page 38: Entity Search: The Last Decade and the Next

ENTITY SUMMARIES

• Summaries serve a dual purpose:
  • Synopsis of the entity
  • Provide evidence why the entity is a good answer for the given query

• How to generate dynamic entity summaries that can directly address users' information needs?

• Two subtasks:
  • Fact ranking: What should be in the summary?
  • Summary generation: How should it be presented?

Page 39: Entity Search: The Last Decade and the Next
Page 40: Entity Search: The Last Decade and the Next

ANTICIPATING INFORMATION NEEDS

J. Benetka, K. Balog, and K. Nørvåg. Anticipating Information Needs Based on Check-in Activity. WSDM'17.

Page 41: Entity Search: The Last Decade and the Next

ZERO-QUERY SEARCH

• Proactive instead of reactive search
  • "Anticipate user needs and respond with information appropriate to the current context without the user having to enter a query" (Allan et al., SIGIR Forum 2012)

• Using a person's check-in activity as context, can we anticipate her information needs, and respond with a set of information cards that directly address those needs?

[Example information cards: Terminal, Weather (21ºC), Traffic]

Page 42: Entity Search: The Last Decade and the Next

INFORMATION NEEDS FOR ACTIVITIES

• What are relevant information needs in the context of a given activity?

• Use POI categories (Foursquare) to represent activities
• Mine information needs from search suggestions

Page 43: Entity Search: The Last Decade and the Next

ANTICIPATING INFORMATION NEEDS

• Maximize the likelihood of satisfying the user's information needs by considering each possible activity that might follow next

• Transition probabilities are estimated based on historical check-in data

[Figure: after Activity A, the next activity (?) is Activity B (45%), Activity C (34%), or Activity D (21%)]
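The scoring idea can be written as score(i | a) = Σ_b P(b | a) · P(i | b), summing over candidate next activities b after the last activity a. The sketch below is our own illustration of that idea, not the paper's model: transition probabilities are maximum-likelihood estimates from consecutive check-ins, and the per-activity need distributions P(i | b) are hypothetical inputs (on the slides, needs are mined from search suggestions). The activities and needs are toy values.

```python
from collections import Counter, defaultdict


def transition_probs(checkin_sequences):
    """Estimate P(next activity | current activity) from consecutive check-ins."""
    counts = defaultdict(Counter)
    for seq in checkin_sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for cur, nexts in counts.items()
    }


def rank_info_needs(last_activity, transitions, needs_given_activity):
    """Score each need by sum_b P(b | last) * P(need | b), highest first."""
    scores = Counter()
    for nxt, p_next in transitions.get(last_activity, {}).items():
        for need, p_need in needs_given_activity.get(nxt, {}).items():
            scores[need] += p_next * p_need
    return [need for need, _ in scores.most_common()]


sequences = [["Home", "Airport"], ["Home", "Airport"], ["Home", "Office"]]
needs = {
    "Airport": {"terminal": 0.5, "traffic": 0.3, "weather": 0.2},
    "Office": {"traffic": 0.6, "weather": 0.4},
}
transitions = transition_probs(sequences)
print(rank_info_needs("Home", transitions, needs))  # ['traffic', 'terminal', 'weather']
```

"traffic" wins even though it is not the top need for the most likely next activity, because it is relevant to both possible activities; this is the benefit of marginalizing over all upcoming activities rather than betting on the single most likely one.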

Page 44: Entity Search: The Last Decade and the Next

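EVALUATION METHODOLOGY

[Figure: the check-in dataset, with each user's check-ins (User 1, User 2, User 3) split into training (80%) and test portions; example information cards: Terminal, Weather (21ºC), Traffic]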
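A per-user split like the one in the figure can be sketched as follows. We assume each user's check-ins are ordered by timestamp and the earliest 80% go to training; whether the original split is chronological is our assumption, and the user and activity names are hypothetical.

```python
def split_per_user(checkins, train_ratio=0.8):
    """Split every user's check-ins into training and test portions.

    checkins: {user: list of (timestamp, activity) pairs}. Each list is sorted by
    timestamp and the earliest `train_ratio` share goes to training (an assumption).
    """
    train, test = {}, {}
    for user, events in checkins.items():
        events = sorted(events)
        cut = int(len(events) * train_ratio)
        train[user] = events[:cut]
        test[user] = events[cut:]
    return train, test


checkins = {"user1": [(t, "Coffee Shop") for t in range(10)]}
train, test = split_per_user(checkins)
print(len(train["user1"]), len(test["user1"]))  # 8 2
```

Splitting within each user (rather than across users) means every test user also has history available for estimating personal patterns.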

Page 45: Entity Search: The Last Decade and the Next

RESULTS

[Figure: NDCG@5 of models M0-M3, at the top level and at the second level]

• M0: Most frequent information needs, regardless of the last activity
• M1: Consider information needs for all possible upcoming activities
• M2: In addition, consider the information needs relevant to the past activity (fixed weight for all info needs)
• M3: Consider the temporal sensitivity of each information need individually
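For reference, NDCG@5 divides the discounted cumulative gain of the top five results by that of an ideal ranking. A minimal sketch with linear gains and a log2(rank + 1) discount follows; the gain function is an assumption, since some definitions use 2^rel - 1 instead, and the slides do not say which variant was used.

```python
import math


def ndcg_at_k(ranked_gains, all_gains, k=5):
    """NDCG@k with linear gains and a log2(rank + 1) discount.

    ranked_gains: relevance grades of the returned items, in rank order.
    all_gains: relevance grades of all judged items (defines the ideal ranking).
    """
    def dcg(gains):
        # enumerate starts at 0, so rank = i + 1 and the discount is log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

    ideal = dcg(sorted(all_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0


print(ndcg_at_k([3, 2, 1], [3, 2, 1]))  # 1.0
print(ndcg_at_k([0, 1], [1]))
```

In the second call the only relevant item appears at rank 2 instead of rank 1, so the score drops to 1 / log2(3), about 0.63.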

Page 46: Entity Search: The Last Decade and the Next

SUMMARY

• Identifying information needs that are relevant in the context of a given activity and proactively presenting information cards addressing those needs

• Open research problems:
  • Other contexts
  • (Access to data, privacy...)

Page 47: Entity Search: The Last Decade and the Next

THE FUTURE

PART 3

Making the right information available to the right person at the right time.

Page 48: Entity Search: The Last Decade and the Next

IMAGINARY SCENARIO WITH AN INTELLIGENT PERSONAL ASSISTANT

Page 49: Entity Search: The Last Decade and the Next

I see you're wasting time away on Facebook. Do you have time now to talk about your holiday plans?

Sure. I want an active holiday with the family in beautiful nature.

It sounds like you would definitely love Norway. A cabin in the mountains maybe?

Could be. But I want to go kayaking and also catch some fish. And not too much rain, please.

And something fun for the kids nearby, I suppose?

Of course.

How does Oltedal sound? People have been quite successful with catching lake trout based on what I found on Instagram.

There is also a theme park and horse riding, both within 50 km.

Page 50: Entity Search: The Last Decade and the Next

And what about the weather?

You know we're talking about Norway, right…? Anyway, based on statistics from the past 30 years, this is one of the areas with the least amount of rain if you go in August.

I see. What about accommodation?

Here is a list of places that I think you might like.

Any opinions on this one?

According to the reviews that I can find on the web, the cabins are well equipped, the staff is nice and they even allow guests to borrow their kayaks.

Page 51: Entity Search: The Last Decade and the Next

OK. Let's find a date that works for everyone. According to your wife's calendar, her parents will be visiting you in the first week of August. School starts for the kids in the week of Aug 22. So there is a two-week window between Aug 8 and 21, assuming that I can cancel the regular weekly meetings with your PhD students.

That's fine. The students won't mind. Write them an email to upload their holiday plans to the group wiki, and add summer planning to the next group meeting's agenda.

Guys,

What are your plans for the summer? Please upload your away times to the group wiki.

-Kr

To: XXX, YYY, ZZZ

Send

Agenda item Summer planning added

Page 52: Entity Search: The Last Decade and the Next

In the meantime, I called the cabin to check availability. Their online booking system is down at the moment. They still have some cabins available. Do you want to see them?

No, I had enough of this for today. Mail the pictures to my wife with some kind words.

Anything else I can do for you?

Order a water filter for my espresso machine. I just found out that it'll need to be replaced soon.

Darling, You will love the place I found for us for a vacation in August. It is by the water; at night we will hear the waves. We will be able to take our morning breakfasts on the balcony, which ...

To: Wife

Send

Page 53: Entity Search: The Last Decade and the Next

FUTURE RESEARCH THEMES

Page 54: Entity Search: The Last Decade and the Next

UNDERSTANDING INFORMATION NEEDS

• Natural language conversational interface

• Anticipating information needs
• Proactive recommendations

It sounds like you would definitely love Norway. A cabin in the mountains maybe?

And something fun for the kids nearby, I suppose?

I see you're wasting time away on Facebook. Do you have time now to talk about your holiday plans?

Page 55: Entity Search: The Last Decade and the Next

DATA

• Long-tail entities
• On-the-fly information extraction
• "Personal" knowledge base
  • "Wife", "My students", "my group", "my espresso machine", ... entities I care about

Here is a list of places that I think you might like.

According to the reviews that I can find on the web, ...

Order a water filter for my espresso machine. I just found out that it'll need to be replaced soon.

Breville BES860XL Barista Express Espresso Machine

Page 56: Entity Search: The Last Decade and the Next

RESULT PRESENTATION & USER INTERACTION

• Providing evidence
• "Actionable" entities
  • Make booking, order item, write email, ...
• Helping the user to get things done
  • Support for task completion

... based on statistics from the past 30 years, ...

According to your wife's calendar, ...

Agenda item Summer planning added

Write them an email to upload their holiday plans to the group wiki, and add summer planning to the next group meeting's agenda.

Page 57: Entity Search: The Last Decade and the Next

SUMMARY

Understanding information needs
• Semantic annotations • Anticipating info needs • Natural language conversational interfaces

Data source(s)
• Long-tail entities • Personal knowledge base • On-the-fly information extraction

Retrieval method
• Hybrid approaches

Result presentation & user interaction
• Entity cards • Actionable entities • Support for task completion

Page 58: Entity Search: The Last Decade and the Next

ACKNOWLEDGMENTS

• Joint work with:
  • Faegheh Hasibi
  • Jan Benetka
  • Darío Garigliotti
  • Kjetil Nørvåg
  • Svein Erik Bratsberg

Page 59: Entity Search: The Last Decade and the Next

QUESTIONS?

@krisztianbalog | krisztianbalog.com