EVELIN: Exploration of Event and Entity Links in Implicit ... · tion of events. EVELIN is an...

1
EVELIN: Exploration of Event and Entity Links in Implicit Networks Andreas Spitz, Satya Almasian and Michael Gertz Database Systems Research Group, Heidelberg University, Im Neuenheimer Feld 205, 69120 Heidelberg, Germany Representing Document Collections as Implicit Networks of Entities EVELIN Online Query Interface Implicit entity networks are graph struc- tures that can be constructed easily from large collections of text documents. Un- like knowledge bases, implicit networks do not encode explicit, discrete relations between entities. Instead, they represent the strength of entity relations as weights of the edges between entities. These weights support ad-hoc information re- trieval on the underlying document col- lection. The recently introduced LOAD networks [1] include entities of the types location, organization, actor, and date, which frequently appear in the descrip- tion of events. EVELIN is an interactive online tool for the exploration and analy- sis of such entity-centric networks. How It Works: For edges between entities x X and y Y , we generate weights ω (x, y ) from their textual distances δ (measured in the number of sentences) over all instances I in which x and y are mentioned together. We normalize by neighbourhood size N . ω (x, y ) := log |Y | |N (x) Y | X iI e -δ (x,y,i) Demo Document Collection: LOAD network created from the text of the English Wikipedia (excluding struc- tured tables). All entity mentions are linked to Wikidata entities by following Wikipedia links. Temporal expressions are annotated with HeidelTime [2], entity types classified according to YAGO [3]. Online Demonstration: The EVELIN online search interface for the demonstration data is available as a web application. The in- terface is mobile friendly and you can try it out for yourself at: evelin.ifi.uni-heidelberg.de Input Entity Selection Entity Queries and Term Queries Composing Queries from Entities: Enter a string and use the pop-up menu to select one of the suggested Wikidata entities from the database of stored en- tities that occur in the documents. Alter- natively, press enter to input the string as an untyped search term. Continue adding more entities or launch a query. A query consists of a set of query enti- ties Q and some target set X . They are launched by selecting the corresponding target set button. The result of a query is a ranking of all entities in X accord- ing to their relation strength to entities in the query set. An entity query results in a ranking of locations, organizations, ac- tors or dates. Entities in the result include links to the Wikidata knowledge base. A term query returns a ranking of stemmed keywords for the query entities. Query Extension: Discovered entities can be added to grow the query with a double click on the entity. How It Works: We rank all entities x in target set X in relation to entities in the query set Q in a two component ranking scheme. It ac- counts for the number of common neigh- bours and the weights of edges between the target and query entities. r (x|Q) := |N (x) Q|- 1+ 1 s max X q Q ω (q,x) Sentence Queries Similar to entities, sentences can be used as a query target. Here, the aim is to extract sentences that best describe an entity or a combination of entities. How It Works: A set of the most relevant terms T Q for the query entities in Q is determined and used to refine the ranking of sentences s in a two component scheme. r (s|Q) := |N (s) Q| + |N (s) T Q | |T Q | +1 Page Queries Entities or sets of entities can be linked to informative pages in the document col- lection that describe them well. Since we use Wikipedia as a document collection, we link entities to Wikipedia pages. How It Works: Ranking scores are computed for sen- tences s based on the input entities and then propagated to the pages p. r (p|Q) := max sp {|N (s)Q|}+ 1 s max X sp r (s|Q) Subgraph Queries For exploratory purposes, subgraphs can be extracted around the set of query enti- ties to highlight the immediate neighbour- hood. Here, we consider the three high- est ranked entities of each type for the subgraph construction. Due to the den- sity of neighbourhoods, this extraction tends to be slower than ranking queries. Contact Information: Andreas Spitz [email protected] http://dbs.ifi.uni-heidelberg.de/ References [1 ] A. Spitz and M. Gertz: Terms over LOAD: Leveraging Named Entities for Cross-Document Extraction and Summarization of Events. SIGIR’16, 2016 [2 ] J. Str¨ otgen and M. Gertz: Multilingual and Cross-domain Temporal Tagging. Language Resources and Evaluation, 47(2): 269–298, 2013 [3] F. Mahdisoltani, J. Biega and F. Suchanek: YAGO3: A Knowledge Base from Multilingual Wikipedias. CIDR’14, 2014 This demonstration was presented at the 26 th International World Wide Web Conference (WWW ’17), April 3-7, 2017, Perth, Australia.

Transcript of EVELIN: Exploration of Event and Entity Links in Implicit ... · tion of events. EVELIN is an...

Page 1: EVELIN: Exploration of Event and Entity Links in Implicit ... · tion of events. EVELIN is an interactive online tool for the exploration and analy-sis of such entity-centric networks.

EVELIN:Exploration of Event and Entity Links in Implicit Networks

Andreas Spitz, Satya Almasian and Michael GertzDatabase Systems Research Group, Heidelberg University, Im Neuenheimer Feld 205, 69120 Heidelberg, Germany

Representing Document Collections as Implicit Networks of Entities EVELIN Online Query Interface

Implicit entity networks are graph struc-tures that can be constructed easily fromlarge collections of text documents. Un-like knowledge bases, implicit networksdo not encode explicit, discrete relationsbetween entities. Instead, they representthe strength of entity relations as weightsof the edges between entities. Theseweights support ad-hoc information re-trieval on the underlying document col-lection. The recently introduced LOADnetworks [1] include entities of the typeslocation, organization, actor, and date,which frequently appear in the descrip-tion of events. EVELIN is an interactiveonline tool for the exploration and analy-sis of such entity-centric networks.

How It Works:For edges between entities x ∈ X andy ∈ Y , we generate weights ω(x, y) fromtheir textual distances δ (measured in thenumber of sentences) over all instances Iin which x and y are mentioned together.We normalize by neighbourhood size N .

ω(x, y) :=

(log

|Y ||N(x) ∩ Y |

)∑i∈I

e−δ(x,y,i)

Demo Document Collection:LOAD network created from the text ofthe English Wikipedia (excluding struc-tured tables). All entity mentions arelinked to Wikidata entities by followingWikipedia links. Temporal expressionsare annotated with HeidelTime [2], entitytypes classified according to YAGO [3].

Online Demonstration:The EVELIN online search interface forthe demonstration data is available as aweb application. The in-terface is mobile friendlyand you can try it out foryourself at:

evelin.ifi.uni-heidelberg.de

Input Entity Selection Entity Queries and Term Queries

Composing Queries from Entities:Enter a string and use the pop-up menuto select one of the suggested Wikidataentities from the database of stored en-tities that occur in the documents. Alter-natively, press enter to input the stringas an untyped search term. Continueadding more entities or launch a query.

A query consists of a set of query enti-ties Q and some target set X. They arelaunched by selecting the correspondingtarget set button. The result of a queryis a ranking of all entities in X accord-ing to their relation strength to entities inthe query set. An entity query results ina ranking of locations, organizations, ac-tors or dates. Entities in the result includelinks to the Wikidata knowledge base. Aterm query returns a ranking of stemmedkeywords for the query entities.

Query Extension:Discovered entities can be added to growthe query with a double click on the entity.

How It Works:We rank all entities x in target set X inrelation to entities in the query set Q ina two component ranking scheme. It ac-counts for the number of common neigh-bours and the weights of edges betweenthe target and query entities.

r(x|Q) := |N(x)∩Q| − 1 +1

smax

∑q∈Q

ω(q, x)

Sentence Queries

Similar to entities, sentences can beused as a query target. Here, the aimis to extract sentences that best describean entity or a combination of entities.

How It Works:A set of the most relevant terms TQ forthe query entities in Q is determined andused to refine the ranking of sentences sin a two component scheme.

r(s|Q) := |N(s) ∩Q| + |N(s) ∩ TQ||TQ| + 1

Page Queries

Entities or sets of entities can be linkedto informative pages in the document col-lection that describe them well. Since weuse Wikipedia as a document collection,we link entities to Wikipedia pages.

How It Works:Ranking scores are computed for sen-tences s based on the input entities andthen propagated to the pages p.

r(p|Q) := maxs∈p{|N(s)∩Q|}+ 1

smax

∑s∈p

r(s|Q)

Subgraph Queries

For exploratory purposes, subgraphs canbe extracted around the set of query enti-ties to highlight the immediate neighbour-hood. Here, we consider the three high-est ranked entities of each type for thesubgraph construction. Due to the den-sity of neighbourhoods, this extractiontends to be slower than ranking queries.

Contact Information:Andreas [email protected]://dbs.ifi.uni-heidelberg.de/

References[1 ] A. Spitz and M. Gertz: Terms over LOAD: Leveraging Named Entities for Cross-Document

Extraction and Summarization of Events. SIGIR’16, 2016[2 ] J. Strotgen and M. Gertz: Multilingual and Cross-domain Temporal Tagging.

Language Resources and Evaluation, 47(2): 269–298, 2013[3 ] F. Mahdisoltani, J. Biega and F. Suchanek: YAGO3: A Knowledge Base from Multilingual Wikipedias. CIDR’14, 2014

This demonstration was presented at the 26th International World Wide Web Conference (WWW ’17), April 3-7, 2017, Perth, Australia.