Representing Texts as Contextualized
Entity-Centric Linked Data Graphs
Andre Freitas (1), Joao C. P. da Silva (2), Danilo S. Carvalho (2),
Sean O’Riain (1), Edward Curry (1)
(1) DERI - Ireland
(2) Universidade Federal Rio de Janeiro - Brazil
August 27, 2013
Freitas, Silva, Carvalho, O’Riain, Curry 1/ 54
,
Outline
Motivation & Objective
Existing Approaches
Structured Discourse Graphs (SDGs)
Representation Requirements
Semantic Model Elements & Graph Patterns
Semantic Model - Formalization
Extraction
Graphia Extractor
Preliminary Evaluation
Extraction Examples
Conclusion & Future Work
,
Motivation & Objective
,
Motivation
Integration of unstructured information into the Linked Data Web
Natural Language Texts
Terminological variation
Complex context patterns
Ambiguous sentences
Linked Data
Terminological and structural regularity
Consensual semantics - agreement between data consumers
,
Motivation
Traditional Information Extraction
Triples - Primary concerns
accuracy
consistency
lexical and structural normalization (schema/ontology)
Complemented by Best-Effort Information Extraction
domain independence
context capture
maximization of the text semantics representation
wider extraction scope
,
Motivation
What is the relationship between Barack Obama and Indonesia?
,
Objective
Structured Discourse Graph (SDG) Model
Improve semantic integration, representation and interpretation of
unstructured text within the context of Linked Data.
,
Objective
Features
Entity-Centric Data Model
Ontology/Vocabulary Agnostic Representation
Context Representation
Graph Extraction
Interpretation (Navigational) Model
,
Existing Approaches
,
Existing Approaches
Simple Relations (Triples)
Discourse Representation Structure (DRS)
Ontology-based
,
Our Work
Focuses on:
Providing a principled description of a representation model complementary to existing approaches
Supporting context representation
Defining a graph representation approach for texts which can be easily integrated into the Linked Data Web
,
Structured Discourse Graphs - SDGs
Representation Requirements
,
SDGs - Representation Requirements
Entity-centric graph model
Entity pivot : named entity (subject or object)
Document Centric ⇒ Entity Centric
Maximized representation of text semantics
Information in sentence ⇒ graph representation
The resulting graph should support an algorithmic interpretation of the extracted SDG
Conceptual model independence
Ontology agnostic
,
SDGs - Representation Requirements
Context Capture & Representation
Contextual information related to a triple statement should be represented in the extracted graph
Contextual statements (such as temporality) define the context in which another statement holds
Dependencies between sentences in the text should also be made explicit
,
SDGs - Representation Requirements
Pay-as-you-go semantic reference
Unstructured text may contain complex semantic dependencies
Should support the evolution and refinement of the semantic model
Standardized representation compatibility
Maximize compatibility with a standards-based data representation format, facilitating
graph integration
interoperability on the Web
,
Structured Discourse Graphs - SDGs
Semantic Model Elements & Graph Patterns
,
SDGs - Semantic Model Elements
In late 1988, Obama entered Harvard Law School.
Text Segmentation: (subject, predicate, object)
Named Entities
Refer to the description of entities for which one or more rigid designators stand for the referent
Rigid designators: people, places, organizations, biological species, substances, etc.
,
SDGs - Semantic Model Elements
In late 1988, Obama entered Harvard Law School.
Co-Referential Elements
Two types of co-reference: pronominal and non-pronominal
Co-references can be intra-sentence or inter-sentence
Substituting the co-referent term with the named entity
can corrupt the semantics of the representation
,
SDGs - Semantic Model Elements
In late 1988, Obama entered Harvard Law School.
Context Elements
A semantic interpretation may depend on different contexts (e.g. temporal context)
The main contextual information is intra-sentence
Intra-sentence context ⇒ reification
,
SDGs - Semantic Model Elements
In late 1988, Obama entered Harvard Law School.
Quantifiers & Generic Operators
Quantifier: one, two (cardinal numbers), many (much), some, all, thousands of, one of, several, only, most of
Negation: not
Modal: could, may, shall, need to, have to, must, maybe, always, possibly
Comparative: largest, smallest, most, the same, is equal, like, similar to, more than, less than
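The operator classes above can be read as a small lexicon. The sketch below is only an illustration under that assumption, not Graphia's implementation: the table `OPERATOR_LEXICON` and the function `find_operators` are hypothetical names, and only a subset of the listed terms is included.

```python
# Hypothetical lexicon holding a subset of the operator terms listed on the slide.
OPERATOR_LEXICON = {
    "quantifier": ["one of", "thousands of", "most of", "several", "many", "some", "all", "only"],
    "negation": ["not"],
    "modal": ["have to", "need to", "could", "may", "shall", "must", "maybe", "always", "possibly"],
    "comparative": ["more than", "less than", "similar to", "largest", "smallest", "the same"],
}

def find_operators(sentence):
    """Return (operator_class, term) pairs whose term occurs as whole words."""
    # Lowercase, drop commas and periods, and pad with spaces for whole-word matching.
    cleaned = sentence.lower().replace(",", " ").replace(".", " ")
    padded = " " + " ".join(cleaned.split()) + " "
    found = []
    for op_class, terms in OPERATOR_LEXICON.items():
        for term in terms:
            if " " + term + " " in padded:
                found.append((op_class, term))
    return found

example = "In 1935, GE was one of the top 30 companies traded at the London Stock Exchange."
print(find_operators(example))
```

A plain substring lookup like this ignores ambiguity (for example "may" as a month name), so a real extractor would need part-of-speech information as well.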
,
SDGs - Semantic Model Elements & Graph Patterns
In late 1988, Obama entered Harvard Law School.
Text segmentation: (Obama, entered, Harvard Law School)
Named entities: Obama, Harvard Law School
Resolve co-references: Barack Obama
Context representation: time
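The four steps listed for this sentence can be sketched as a small data structure. This is a minimal sketch under stated assumptions: the class names `Triple` and `ContextualizedTriple`, the `COREFERENCES` table and the `build_fragment` helper are hypothetical, and the segmentation is supplied by hand rather than produced by a parser.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str

@dataclass(frozen=True)
class ContextualizedTriple:
    triple: Triple        # the basic (subject, predicate, object) statement
    context_link: str     # kind of context, e.g. "time"
    context_value: str    # the context expression taken from the sentence

# Hypothetical co-reference table: surface form -> resolved named entity.
COREFERENCES = {"Obama": "Barack Obama"}

def build_fragment(subject, predicate, obj, time_expr):
    """Resolve the subject co-reference and attach the temporal context."""
    resolved = COREFERENCES.get(subject, subject)
    return ContextualizedTriple(Triple(resolved, predicate, obj), "time", time_expr)

# "In late 1988, Obama entered Harvard Law School."
fragment = build_fragment("Obama", "entered", "Harvard Law School", "late 1988")
print(fragment)
```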
,
SDGs - Semantic Model Elements & Graph Patterns
In late 1988, Obama entered Harvard Law School.
Resolved & Normalized Entities
Resolved entities: a co-reference node in the graph is replaced by the named entity it refers to
Normalized entities: entities are transformed to a normalized form (e.g. "September 1st of 2010" becomes 01/09/2010)
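The date normalization mentioned above can be sketched with the Python standard library. This is an illustrative sketch, not the normalizer used by the authors: the accepted input pattern and the function name `normalize_date` are assumptions, and only the day/month/year output format is taken from the slide.

```python
import re
from datetime import date

MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

# Assumed input shapes: "September 1st of 2010", "September 1, 2010", "September 1 2010".
DATE_PATTERN = re.compile(
    r"^(?P<month>[A-Za-z]+)\s+(?P<day>\d{1,2})(?:st|nd|rd|th)?(?:\s+of|,)?\s+(?P<year>\d{4})$"
)

def normalize_date(text):
    """Return the date as DD/MM/YYYY, or None if the text is not recognized."""
    match = DATE_PATTERN.match(text.strip())
    if match is None:
        return None
    month = MONTHS.get(match.group("month").lower())
    if month is None:
        return None
    try:
        parsed = date(int(match.group("year")), month, int(match.group("day")))
    except ValueError:
        return None  # e.g. "February 31st of 2010"
    return parsed.strftime("%d/%m/%Y")

print(normalize_date("September 1st of 2010"))
```

Vague expressions such as "late 1988" are deliberately left unnormalized here and return `None`.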
,
SDGs - Semantic Model Elements & Graph Patterns
Later in 2007, Obama sponsored an amendment to the Defense Authorization Act to add safeguards for personality-disorder military discharges.
,
SDGs - Semantic Model Elements & Graph Patterns
He served three terms representing the 13th District in the Illinois Senate from 1997 to 2004.
Non-Named (Generic) Entities
Non-named entities map to non-rigid designators
Are more subject to vocabulary variation
Have more complex compositional patterns
,
SDGs - Semantic Model Elements & Graph Patterns
Following high school, Obama moved to Los Angeles in 1979 to attend Occidental College.
Triple Trees
Not all facts extracted can be represented in one triple.
Transformation from the syntactic tree to a set of triples
The sentence subject defines the root node
Interpretation: DFS traversal of the tree
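The DFS interpretation of a triple tree can be sketched as follows. The tree below is a hypothetical hand-built example for the sentence on this slide; where the "in 1979" and "to attend" edges attach is an assumption, since the original figure is not part of the transcript.

```python
# Hypothetical triple tree: node -> ordered list of (predicate, child) edges.
# The sentence subject ("Obama") is the root node.
TRIPLE_TREE = {
    "Obama": [("moved to", "Los Angeles")],
    "Los Angeles": [("in", "1979"), ("to attend", "Occidental College")],
}

def dfs_triples(tree, node):
    """Return (subject, predicate, object) triples in depth-first order.

    Assumes the structure is a tree (no cycles), as stated on the slide.
    """
    triples = []
    for predicate, child in tree.get(node, []):
        triples.append((node, predicate, child))
        # Descend into the child before visiting the next sibling edge.
        triples.extend(dfs_triples(tree, child))
    return triples

for triple in dfs_triples(TRIPLE_TREE, "Obama"):
    print(triple)
```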
,
SDGs - Semantic Model Elements & Graph Patterns
He won election to the U.S. Senate in Illinois in November 2004.
Pronominal Co-Reference
,
SDGs - Semantic Model Elements & Graph Patterns
On August 23, Obama announced his selection of Delaware Senator Joe Biden as his vice presidential running mate.
Pronominal Co-Reference
,
SDGs - Semantic Model Elements & Graph Patterns
As a member of the Senate Foreign Relations Committee, Obama made official trips to Eastern Europe, the Middle East, Central Asia and Africa.
Conjunctive Co-Reference
,
Structured Discourse Graphs - SDGs
Semantic Model - Formalization
,
Semantic Model - Formalization
Graph pattern: atomic graph structure which maps to a discourse structure
Named and Non-named Entities: $\llbracket ne \rrbracket, \llbracket {\sim}ne \rrbracket \in U$, where $U$ is a set of IRIs
Basic Triple: $tr = (e_s, p, e_o)$, where $e_s$, $e_o$ represent named/non-named entities associated with the subject ($s$) and object ($o$), and $p$ represents a relation between $e_s$ and $e_o$, interpreted by $\llbracket tr \rrbracket = (\llbracket e_s \rrbracket, \llbracket p \rrbracket, \llbracket e_o \rrbracket) \in U \times U \times U$
Core Triple: $tr_c = (ne_s, p, ne_o)$
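The distinction between a basic triple and a core triple can be sketched in a few lines. This is a sketch under assumptions: the `Entity` and `BasicTriple` types, the `is_core` helper and the `http://example.org/` IRIs are all hypothetical and only illustrate the definitions above.

```python
from typing import NamedTuple

class Entity(NamedTuple):
    iri: str      # an element of U (hypothetical example.org IRIs are used below)
    named: bool   # True for a named entity (ne), False for a non-named one (~ne)

class BasicTriple(NamedTuple):
    subject: Entity
    predicate: str
    obj: Entity

def is_core(triple: BasicTriple) -> bool:
    """A core triple has named entities in both subject and object position."""
    return triple.subject.named and triple.obj.named

EX = "http://example.org/"  # hypothetical namespace
obama = Entity(EX + "Barack_Obama", named=True)
harvard = Entity(EX + "Harvard_Law_School", named=True)
terms = Entity(EX + "three_terms", named=False)

entered = BasicTriple(obama, EX + "entered", harvard)
served = BasicTriple(obama, EX + "served", terms)

print(is_core(entered), is_core(served))
```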
,
Semantic Model - Formalization
Reification Triple: $tr_{rei} = (tr, rei_{link}, rei_{obj})$
$tr$ - basic triples
$rei_{link}$ - relation
$rei_{obj}$ - reification object
Temporal Reification: special reification triple
$rei_{link}$: time
$rei_{obj}$: explicit or implicit data references
Interpretation: $\llbracket tr_{rei} \rrbracket = (\llbracket tr \rrbracket, \llbracket rei_{link} \rrbracket, \llbracket rei_{obj} \rrbracket) \in U \times U \times U$
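One way to make a reification triple concrete is the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`). The slides do not say which serialization is used, so the sketch below is an assumption; the `reify` function and the `http://example.org/` IRIs are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace

def reify(triple, statement_iri, rei_link, rei_obj):
    """Expand (tr, rei_link, rei_obj) into flat triples about a statement node."""
    subject, predicate, obj = triple
    return [
        (statement_iri, RDF + "type", RDF + "Statement"),
        (statement_iri, RDF + "subject", subject),
        (statement_iri, RDF + "predicate", predicate),
        (statement_iri, RDF + "object", obj),
        # The reification link attaches the context (here: time) to the statement.
        (statement_iri, rei_link, rei_obj),
    ]

# "In late 1988, Obama entered Harvard Law School."
tr = (EX + "Barack_Obama", EX + "entered", EX + "Harvard_Law_School")
statements = reify(tr, EX + "stmt1", EX + "time", "late 1988")
for statement in statements:
    print(statement)
```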
,
Semantic Model - Formalization
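Quantifier Operators & Generic Operators: $op_t = (e_o, op_{link}, op)$
$e_o$ - object element in a basic triple $tr$
$op$ - specific operator of $e_o$ with respect to $op_{link}$
Interpretation:
$\llbracket op_t \rrbracket = (\llbracket proj_3(tr) \rrbracket, \llbracket op_{link} \rrbracket, \llbracket op \rrbracket) = (\llbracket e_o \rrbracket, \llbracket op_{link} \rrbracket, \llbracket op \rrbracket)$
where $proj_3(tr) = proj_3((e_s, p, e_o)) = e_o$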
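The operator definition above amounts to attaching an operator to the object of a basic triple. A minimal sketch, assuming plain Python tuples for triples; the function names and the example segmentation are hypothetical.

```python
def proj3(triple):
    """Third projection: the object element of a (subject, predicate, object) triple."""
    return triple[2]

def operator_triple(triple, op_link, op):
    """Build the operator triple (e_o, op_link, op) for the object of a basic triple."""
    return (proj3(triple), op_link, op)

# Hypothetical segmentation of "GE was one of the top 30 companies ..."
tr = ("GE", "was", "top 30 companies")
opt = operator_triple(tr, "quantifier", "one of")
print(opt)
```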
,
Semantic Model - Formalization
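Conjunctive Co-Reference: $ccr = \bigcup_{i=0}^{n} \{(e, conj_{link_i}, ne_i)\}$
Interpretation:
$\llbracket ccr \rrbracket = \{(\llbracket e \rrbracket, \llbracket conj_{link_i} \rrbracket, \llbracket ne_i \rrbracket) \in U \times U \times U \text{ such that } \llbracket e \rrbracket = \llbracket proj_3(tr) \rrbracket \text{ and } (\bigwedge_{i=0}^{n} \llbracket ne_i \rrbracket \; sameas \; \llbracket e \rrbracket)\}$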
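The conjunctive co-reference pattern can be illustrated with the earlier sentence about Obama's official trips. This is a sketch under assumptions: the placeholder node `_:regions`, the link names `conjlink_0` ... `conjlink_n` and the function name are hypothetical, and the `sameas` condition of the interpretation is not modelled.

```python
def conjunctive_coreference(triple, members):
    """Link the object e of a basic triple to each conjoined named entity ne_i."""
    e = triple[2]  # e = proj3(tr)
    return {(e, f"conjlink_{i}", ne) for i, ne in enumerate(members)}

# "... Obama made official trips to Eastern Europe, the Middle East,
#  Central Asia and Africa."
tr = ("Barack Obama", "made official trips to", "_:regions")
regions = ["Eastern Europe", "the Middle East", "Central Asia", "Africa"]
ccr = conjunctive_coreference(tr, regions)

for link in sorted(ccr):
    print(link)
```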
,
Semantic Model - Formalization
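Possessive/Reflexive/Demonstrative Co-Reference:
$pcr = \{({\sim}ne_i, coref_{link}, pr), (pr, coref_{link}, e_j)\}$
Interpretation:
$\llbracket pcr \rrbracket = \{(\llbracket proj_1(tr) \rrbracket, \llbracket coref_{link} \rrbracket, \llbracket pr \rrbracket), (\llbracket pr \rrbracket, \llbracket coref_{link} \rrbracket, \llbracket proj_3(tr) \rrbracket) \in U \times U \times U \text{ such that } tr = ({\sim}ne_i, p, e_j)\}$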
,
Semantic Model - Formalization
Extracted Graph: set of basic and reified triples and generic, quantifier and co-reference operators.
Paths
Basic: sequence of basic triples
Reified: basic paths with some reified triples associated
Operational: basic paths with some operators associated
Complex: contains both reified and operational paths
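The four path types can be told apart with a simple check. The sketch below assumes a path is a list of basic triples and that the sets `reified` and `operated` hold the basic triples that carry a reification or an operator; these names and this encoding are assumptions made for illustration.

```python
def classify_path(path, reified, operated):
    """Classify a path of basic triples as basic, reified, operational or complex."""
    has_reified = any(triple in reified for triple in path)
    has_operator = any(triple in operated for triple in path)
    if has_reified and has_operator:
        return "complex"
    if has_reified:
        return "reified"
    if has_operator:
        return "operational"
    return "basic"

t1 = ("Barack Obama", "moved to", "Los Angeles")
t2 = ("Los Angeles", "to attend", "Occidental College")
path = [t1, t2]

print(classify_path(path, reified=set(), operated=set()))
print(classify_path(path, reified={t1}, operated=set()))
print(classify_path(path, reified={t1}, operated={t2}))
```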
,
Semantic Model - Formalization
Context Triples: $(tr, context_{link}, ct)$, which indicates that a basic triple $tr$ can be associated with a specific context $ct$
$\llbracket context \rrbracket = (\llbracket tr \rrbracket, \llbracket context_{link} \rrbracket, \llbracket ct \rrbracket) \in U^3 \times U \times U$
Multi-Context Graphs: an extracted graph with more than one context associated with its triples
if all basic triples in a path belong to the same context, the path is a unique-context (basic, reified, operational or complex) path
otherwise, we call this path a multi-context path
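The unique-context versus multi-context distinction reduces to counting the distinct contexts along a path. A minimal sketch, assuming a hypothetical mapping `context_of` from each basic triple to its context; the context labels are made up for the example.

```python
def context_kind(path, context_of):
    """Return "unique-context" if all triples in the path share one context."""
    contexts = {context_of.get(triple) for triple in path}
    return "unique-context" if len(contexts) <= 1 else "multi-context"

t1 = ("Barack Obama", "moved to", "Los Angeles")
t2 = ("Los Angeles", "to attend", "Occidental College")
t3 = ("Barack Obama", "entered", "Harvard Law School")

# Hypothetical context assignment (temporal contexts).
context_of = {t1: "1979", t2: "1979", t3: "late 1988"}

print(context_kind([t1, t2], context_of))
print(context_kind([t1, t3], context_of))
```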
,
Extraction
Graphia
,
Graphia
http://graphia.dcc.ufrj.br
Graphia is an information extraction pipeline
Takes factual text as input and produces SDGs as output
Graphia’s modules combine state-of-the-art NLP tools with an efficient set of heuristics
Can build graphs for sentences and entire documents.
,
Graphia - Extraction Pipeline Architecture
,
Graphia - Preliminary Evaluation
1033 relations (triples) from 150 sentences from 5 randomly selected Wikipedia articles
Manually classified the graphs
Correct Graphs: fully consistent with the semantic model
Complete Graphs: correct graph which maps all the information of a sentence
Interpretable Graphs: graph fragment which has the correct semantics of its basic triple paths
,
Graphia - Extraction Examples
In 2002 GE acquired the wind power assets of Enron during its bankruptcy proceedings.
,
Graphia - Extraction Examples
In 1935, GE was one of the top 30 companies traded at the London Stock Exchange.
,
Graphia - Extraction Examples
The Radio Corporation of America (RCA) was founded by GE in 1919 to further international radio.
,
Graphia - Extraction Examples
It acquired ScanWind in 2009.
,
Graphia - Extraction Examples
The new company, named GXS, is based in Gaithersburg, Maryland.
,
Conclusion & Future Work
,
Conclusion & Future Work
Conclusion
SDGs provide a discourse representation as a set of contextualized relationships
Supports entity-centric integration between graphs
Conceptual model independence
Worth investing effort in enumerable patterns (timestamps, operators)
Future Work
More complex sentence patterns