Motivation - Krisztian Balog

Transcript of Motivation - Krisztian Balog

Page 1: Motivation - Krisztian Balog

Entity Search: Building Bridges between Two Worlds

Krisztian Balog, Edgar Meij, and Maarten de Rijke
ISLA, University of Amsterdam
http://ilps.science.uva.nl

Entity search

• Information organized around entities

• Instead of finding documents about the entity, find the entity itself

• Problem looked at by both the Information Retrieval (IR) and the Semantic Web (SW) communities

Entity search tasks

• Entity ranking

• List completion

• Related entity finding

Motivation

• To what extent are IR and SW methods capable of answering information needs related to entity finding?

Where are we now?

• Information Retrieval

• Identifying and ranking entities in large volumes of data

• Mostly based on co-occurrences between terms and entities

• Generated models are not always meaningful for human consumption

Where are we now?

• Semantic Web

• Structured data, naturally organized around entities

• Is entity retrieval as simple as running SPARQL queries?

• Free-text querying is more appealing to (naive) end users

Related entity finding

• Given

• Input entity E (name plus homepage)

• Type T of the target entity (person, organization, or product)

• Narrative R (describes nature of relation)

• Return homepages of related entities

Example topics

(E) Source entity name: Medimmune, Inc.
(E) Source entity URL: clueweb09-en0008-26-39300
(T) Target type: Product
(R) Narrative: Products of Medimmune, Inc.

(E) Source entity name: Boeing 747
(E) Source entity URL: clueweb09-en0005-75-02292
(T) Target type: Organisation
(R) Narrative: Airlines that currently use Boeing 747 planes.
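A topic of this form can be held in a small record type. The sketch below is only illustrative: the class name `Topic` and its field names are assumptions made here, not part of the TREC topic format, and the values are copied from the two example topics.

```python
from dataclasses import dataclass


# Hypothetical container for one related-entity-finding topic.
# The class and field names are illustrative, not taken from TREC.
@dataclass(frozen=True)
class Topic:
    source_name: str  # (E) source entity name
    source_url: str   # (E) source entity URL, as given in the topic
    target_type: str  # (T) target type
    narrative: str    # (R) free-text description of the relation


EXAMPLE_TOPICS = [
    Topic(
        source_name="Medimmune, Inc.",
        source_url="clueweb09-en0008-26-39300",
        target_type="Product",
        narrative="Products of Medimmune, Inc.",
    ),
    Topic(
        source_name="Boeing 747",
        source_url="clueweb09-en0005-75-02292",
        target_type="Organisation",
        narrative="Airlines that currently use Boeing 747 planes.",
    ),
]

for topic in EXAMPLE_TOPICS:
    print(topic.source_name, "->", topic.target_type)
```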

Aim

• Compare IR and SW approaches on the related entity finding task

• Focusing on finding all relevant entities, but not on actually ranking them

Related entity finding: Our variation

• TREC Entity 2009 topics (20)

• Map source entity to a Wikipedia page (17)

• Map target category to the most specific class within the DBPedia ontology

• Ground truth: Wikipedia pages from relevance assessments
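Because the aim is to find all relevant entities rather than to rank them, a set-based comparison against the ground truth is one natural way to score a method. The sketch below assumes recall as the measure; the slides do not name a metric, and the page titles are invented placeholders, not results from the talk.

```python
def recall(found: set, relevant: set) -> float:
    """Fraction of the relevant entities that a method found; order is ignored."""
    if not relevant:
        return 0.0
    return len(found & relevant) / len(relevant)


# Placeholder Wikipedia page titles, invented for illustration only.
relevant_pages = {"Page_A", "Page_B", "Page_C"}
found_pages = {"Page_A", "Page_C", "Page_X"}

print(recall(found_pages, relevant_pages))
```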

Example topic

(E) Source entity name: Boeing 747
(E) Source entity URL: clueweb09-en0005-75-02292
(T) Target type: Organisation
(R) Narrative: Airlines that currently use Boeing 747 planes.

Source entity: Boeing_747
DBPedia-owl: Organisation/Company/Airline
Relation: Airlines that currently use Boeing 747 planes.

IR approaches

• Aggregation of approaches employed at the TREC Entity track

• Various ways of recognizing and ranking entities

• Common to all is a mechanism for capturing the co-occurrence between source and target entities

A typical IR approach

• Query (input entity, relation)

• Document/snippet retrieval

• Answer candidate extraction

• Answer candidate (type) filtering

• Answer candidate ranking

• Output (related entities)
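The pipeline stages above can be sketched in a few lines. This is a toy illustration under strong assumptions, not any TREC participant's system: retrieval is plain substring matching, candidate extraction is a lookup in a hand-made gazetteer, ranking is a raw co-occurrence count, and the relation narrative is ignored. The airline and city names and the snippets are invented.

```python
from collections import Counter

# Toy gazetteer mapping entity names to types. Invented for illustration.
GAZETTEER = {
    "Acme Air": "organization",
    "Borealis Airways": "organization",
    "Springfield": "location",
}


def related_entities(source, target_type, snippets, gazetteer=GAZETTEER):
    # 1. Snippet retrieval: keep snippets that mention the source entity.
    hits = [s for s in snippets if source.lower() in s.lower()]

    counts = Counter()
    for snippet in hits:
        # 2. Candidate extraction: look for known entity names in the snippet.
        for name, entity_type in gazetteer.items():
            if name == source or name.lower() not in snippet.lower():
                continue
            # 3. Type filtering: keep only candidates of the target type.
            if entity_type == target_type:
                counts[name] += 1

    # 4. Ranking: order candidates by the number of snippets shared with the source.
    return [name for name, _ in counts.most_common()]


# Invented snippets; they make no factual claims about real airlines.
snippets = [
    "Acme Air operates the Boeing 747 on long-haul routes.",
    "The Boeing 747 was on display in Springfield.",
    "Borealis Airways leased one Boeing 747.",
    "Acme Air ordered another Boeing 747.",
    "Acme Air opened a new lounge.",
]

print(related_entities("Boeing 747", "organization", snippets))
```

Counting shared snippets is the simplest form of the co-occurrence mechanism that the IR approaches have in common; real systems would replace each stage with a retrieval model, a named entity recognizer, and a probabilistic ranking function.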

Two SW approaches

• SPARQL query

• Exhaustive graph search

• Find all paths between E and T in a knowledge base

• The depth of search is limited
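The exhaustive graph search described in the bullets above can be sketched as a depth-limited enumeration of paths from the source entity E to any node of the target type T. The function name `find_paths`, the `ex:` identifiers, and the toy triples are assumptions made for illustration; the slides do not specify the implementation or the depth limit used.

```python
def find_paths(triples, source, is_target, max_depth=2):
    """Enumerate simple paths from source to nodes accepted by is_target."""
    # Build an adjacency list that can be walked in both directions.
    # Inverse edges are marked with a leading "^" on the predicate.
    neighbours = {}
    for s, p, o in triples:
        neighbours.setdefault(s, []).append((p, o))
        neighbours.setdefault(o, []).append(("^" + p, s))

    paths = []

    def walk(node, path, visited):
        if path and is_target(node):
            paths.append(list(path))
        if len(path) == max_depth:  # the depth of the search is limited
            return
        for predicate, nxt in neighbours.get(node, []):
            if nxt not in visited:  # avoid cycles
                path.append((predicate, nxt))
                walk(nxt, path, visited | {nxt})
                path.pop()

    walk(source, [], {source})
    return paths


# Toy knowledge base with invented identifiers.
triples = [
    ("ex:DrugA", "ex:developedBy", "ex:CompanyX"),
    ("ex:CompanyX", "ex:product", "ex:DrugB"),
    ("ex:DrugC", "ex:wikilink", "ex:DrugB"),
    ("ex:CompanyX", "ex:locatedIn", "ex:CityY"),
]
drugs = {"ex:DrugA", "ex:DrugB", "ex:DrugC"}

for path in find_paths(triples, "ex:CompanyX", lambda n: n in drugs):
    print(path)
```

Each returned path keeps its predicates, so the relation between E and a candidate entity stays inspectable, which is the main thing the co-occurrence-based IR models lack.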

SELECT DISTINCT ?m ?r
WHERE {
  ?m rdf:type dbpedia-owl:Drug .
  { ?m ?r dbpedia:MedImmune } UNION { dbpedia:MedImmune ?r ?m }
}
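The query above has only two variable parts: the source entity and the target class. A minimal sketch of instantiating it per topic follows. The helper `build_query` and the restriction of names to letters, digits, and underscores are assumptions made here so that the values are safe to use as prefixed names; the prefix declarations use the standard RDF and DBpedia namespaces.

```python
import re

QUERY_TEMPLATE = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?m ?r
WHERE {{
  ?m rdf:type dbpedia-owl:{target_class} .
  {{ ?m ?r dbpedia:{source} }} UNION {{ dbpedia:{source} ?r ?m }}
}}
"""

# Conservative check (an assumption of this sketch): only simple local names.
_LOCAL_NAME = re.compile(r"[A-Za-z0-9_]+")


def build_query(source: str, target_class: str) -> str:
    """Fill the template with a source entity and a target ontology class."""
    for value in (source, target_class):
        if not _LOCAL_NAME.fullmatch(value):
            raise ValueError(f"unsupported local name: {value!r}")
    return QUERY_TEMPLATE.format(source=source, target_class=target_class)


print(build_query("MedImmune", "Drug"))
```

The same helper would cover the Boeing 747 topic with `build_query("Boeing_747", "Airline")`, assuming the target type has been mapped to the most specific ontology class.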

SPARQL on DBPedia

Query: Products of Medimmune, Inc.

?m ?r

dbpedia:Amifostine dbp-prop:wikilink

dbpedia:Blinatumomab dbp-prop:wikilink

dbpedia:Motavizumab dbp-prop:wikilink

dbpedia:Palivizumab dbp-prop:wikilink

SPARQL on DBPedia

Query: Airlines that Air Canada has code share flights with.

?m ?r

dbpedia:Air_Canada dbp-prop:wikilink

dbpedia:Austrian_Airlines dbp-prop:wikilink

dbpedia:Japan_Airlines dbp-prop:wikilink

dbpedia:Lufthansa dbp-prop:wikilink

dbpedia:Turkish_Airlines dbp-prop:wikilink

......

dbpedia:Air_Ontario dbp-ontology:Company/parentCompany

dbpedia:Air_Canada_Tango dbp-ontology:Company/parentCompany

dbpedia:Canadian_Airlines dbp-ontology:foundationPerson

SPARQL on DBPedia

Query: Members of the band Jefferson Airplane.

?m ?r

dbpedia:Jim_Morrison dbp-prop:wikilink

dbpedia:Jimi_Hendrix dbp-prop:wikilink

......

dbpedia:Jack_Casady dbp-ontology:associatedMusicalArtist

dbpedia:Paul_Kantner dbp-ontology:associatedMusicalArtist

dbpedia:Joey_Covington dbp-ontology:associatedMusicalArtist

dbpedia:Marty_Balin dbp-ontology:associatedMusicalArtist

......

dbpedia:Grace_Slick dbp-prop:pastMembers

dbpedia:Jorma_Kaukonen dbp-prop:pastMembers

......

Findings

• IR and SW methods find largely the same set of entities

• Most relations returned by SW methods are of type wikilink

Next

• Extend search to Linked Open Data (LOD)

• We use the Linked Data Semantic Repository (LDSR)

SPARQL on LOD

?m ?r

dbpedia:Amifostine dbp-prop:wikilink

dbpedia:Blinatumomab dbp-prop:wikilink

dbpedia:Motavizumab dbp-prop:wikilink

dbpedia:Palivizumab dbp-prop:wikilink

dbpedia:Motavizumab fb:base.bioventurist.product.developed_by

dbpedia:Palivizumab fb:base.bioventurist.science_or_technology_company.products

Query: Products of Medimmune, Inc.

Graph search on LOD

Findings

• More entities as well as more diverse relations

• Having more data does not automatically improve results

• Some of the identified entities are now too general

Summarizing findings

• Information Retrieval

• Excellent ways of finding associations between topics and entities

• Tend to perform better for less popular entities (not represented in LOD)

• Missing: semantics of the found associations

Summarizing findings

• Semantic Web

• Has the potential of generating a large number of candidate entities and relations

• Could be as simple as instantiating a SPARQL query

• For many queries LOD is very sparse w.r.t. semantically meaningful links between entities

Zooming out

• Enhance text-based models with semantic information from LOD

• Use IR models to discover and label links between entities in LOD

TREC Entity 2010

• Main task: Related entity finding

• Pilot task: List completion

• Given URIs of related entities, complete the list with additional entities from LOD

Questions?

Krisztian Balog

http://staff.science.uva.nl/~kbalog