Semantic relations: new (terminological) challenges in a world of Linked Data
Semantic relations: new challenges in a world of linked data
Nathalie Aussenac-Gilles
(IRIT – CNRS, Toulouse, France)
Outline of the talk
• Semantic relations
– what do we mean?
– what do we need them for?
• Finding semantic relations
– what are the issues?
– what can large corpora and machine learning do for you?
– how can terminology contribute to Linked Data?
Semantic relations - Aussenac, 09/01/2015
Semantic relations, what do we mean?
Research field and what a relation is
• Linguistics: semantic relations, semantic roles, discourse relations
– Example: A tree comprises at least a trunk, roots and branches.
– Annotated: A tree [Plants] comprises [meronymy] at least a trunk, roots and branches (tree = whole; trunk, roots, branches = parts).
• Terminology
– Weak structure
– Stored in DB or SKOS models
– Example: tree has_parts trunk, roots, branches, i.e. (tree, has parts, trunk) … in a gardening terminology
• Information extraction
– Looks for relations between instances

Tree    Plantation year    Species    Branches
Tree1   1990               Oak        > 20
Tree2   1995               Oak        15
Semantic relations, what do we mean?
Research field and what a relation is
• Domain ontology engineering
– Formal (logic, RDF, OWL …) and may lead to inferring new knowledge
– The relation is part of a network
– May be shared or not
– [Figure: network with nodes Plant, Fungus, Cereals, Tree, Trunk, Root and Branch, linked by is_a and Has-part]
• Semantic web
– Independent triples
– Publicly available in data repositories in W3C standard formats
– Connect triples with existing ones, with web ontologies
– Example: bot:Tree bot:has_part bot:Branch
– [Figure: bot:myTree bot:has-part bot:MyTreeRoots; bot:Tree bot:has-part bot:Branch; rdf:type]
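The difference between a class-level relation and an instance-level one can be sketched with plain triples. This is only an illustration in Python: the identifiers reuse the "bot:" names from the slide, while the `Triple` type and the `objects` helper are assumptions introduced here, not part of any RDF library.

```python
from collections import namedtuple

# A triple is just (subject, predicate, object); names are hypothetical.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

triples = [
    # Class-level statement, as in a domain ontology.
    Triple("bot:Tree", "bot:has_part", "bot:Branch"),
    # Typing an instance, as on the semantic web.
    Triple("bot:myTree", "rdf:type", "bot:Tree"),
    # Instance-level statement about one particular tree.
    Triple("bot:myTree", "bot:has_part", "bot:MyTreeRoots"),
]


def objects(graph, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


print(objects(triples, "bot:myTree", "bot:has_part"))
print(objects(triples, "bot:myTree", "rdf:type"))
```

Each triple stands on its own, which is what makes it easy to connect new triples with existing ones.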
Example: tree in DBpedia
[Figure: dbpedia-owl:tree, dbpedia-owl:Species, dbpedia-owl:Place]
Example: Plants in DBpedia
[Figure: class hierarchy with dbpedia-owl:PhysicalEntity, dbpedia-owl:Organism and dbpedia-owl:Plant (rdfs:subClassOf); dbpedia-owl:Plant owl:sameAs yago:WordNet_Plant_100017222; dbpedia-owl:Acer_Stonebergae, dbpedia-owl:Alopecurus_carolinianus, dbpedia-owl:Alsmithia_longipes, dbpedia-owl:… each with an rdf:type link]
Semantic web specificities
- New types of relations: mappings
- Focus 1: resources, either classes or instances
- Focus 2: relations between resources
- Focus 3: give a type to resources
Semantic relations: what do we need them for?
Research field and what relations contribute to
• Linguistics: to understand how language produces meaning
– Semantics and language interpretation
– Formal semantics and discourse semantics
• Terminology: to capture domain-specific terms and meanings
– Structure terminologies and make browsing easier
– Connect terms
– Give meaning to “concepts”
• Information extraction: to collect structured data; the schema (classes and relations) is known
– Find related entities (values) to build a DB and then mine these data (statistics)
Semantic relations: what do we need them for?
Research field and what relations contribute to
• Formal ontologies: to define axioms and inferences associated with relations
– Inferences and reasoning
– Ex: sub-classes inherit some class properties
– Ex: transitivity (cf. some parthood relations), symmetry
– Ex: cardinality …
• Domain ontologies: human and machine “understanding”
– Relations define concepts by differences and similarities
– Relations have labels and are human readable; each one can be processed in a specific way
• Linked data: interoperability, data connection from various sites or applications
– Relations connect resources (data)
– Their semantics is defined in ontologies or produces behaviors in inference engines (rdfs:subClassOf)
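Two of the inferences listed above, inheritance along is_a and transitivity of a parthood relation, can be sketched in a few lines of Python. This is a toy illustration, not how an OWL reasoner works: the naive fixpoint, the function names, and the classes "Oak" and "Twig" are assumptions added for the example, and has_part is treated as transitive only because the slide says some parthood relations are.

```python
def transitive_closure(pairs):
    """Naive fixpoint: add (a, d) whenever (a, b) and (b, d) are present."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical mini-ontology: (subclass, superclass) and (whole, part).
is_a = {("Oak", "Tree"), ("Tree", "Plant")}
has_part = {("Tree", "Branch"), ("Branch", "Twig"), ("Plant", "Root")}


def inherited_parts(cls):
    """Parts of `cls`, including those inherited from its superclasses."""
    ancestors = {cls} | {sup for sub, sup in transitive_closure(is_a) if sub == cls}
    parts = transitive_closure(has_part)
    return sorted({part for whole, part in parts if whole in ancestors})


print(inherited_parts("Oak"))
```

Nothing states directly that an oak has roots or twigs; both follow from the relations attached to its superclasses.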
Source: Erhard Rahm, http://dbs.uni-leipzig.de/file/paris-Octob2014.pdf
Semantic web specificities
- Relations connect web data called resources
- Relations connect data with ontology classes: importance of hypernymy
- Relations may map ontology classes
Finding semantic relations, what are the issues?
• Knowledge sources
– Where can we find relations?
• Extraction techniques
– How can we identify them?
• Representation
– How do I represent this information?
• Validation
– What makes a relation representation valid? Relevant?
Finding semantic relations, what are the issues?
• Knowledge sources
– Text, human experts, existing “semantic” resources (lexicons, terminologies, ontologies, Linked Data vocabularies)
– Domain-specific vs general knowledge
• Extraction techniques
– “Obvious” language regularities, known relations and classes (or entities) -> patterns
– Issues with patterns: domain dependence, domain coverage, variation and flexibility, rigidity (need to be regularly updated)
– Research issue with patterns: automatic building by machine learning
– “More implicit” language regularities, medium-size corpora, open list of classes/entities -> supervised learning
– Very large corpora, unexpected relations -> unsupervised learning
Pattern based relation extraction, an issue: variation
• A tree comprises at least a trunk, roots and branches.
– Comprises: very systematic pattern; the parts may be difficult to spot; enumeration -> several parts
• With branches reaching the ground, the willow is an ornamental tree.
– With: meronymy pattern only in some genres (such as catalogs, biology documents)
• The tree of the neighbor has been delimbed.
– Delimbed: term and pattern are in the same word; requires background knowledge: delimbed -> has_part branches (and branches are cut)
• He climbs on the branches of the tree.
– Of: very ambiguous pattern; polysemy reduced in [verb N1 of N2]
• This tree is wonderful. Its branches reach the ground.
– Its: very ambiguous pattern; both sentences must be taken into account
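The variation problem shows up as soon as a pattern is written down. Below is a minimal sketch in Python of one hand-written meronymy pattern. The regular expression, the trigger verbs "contains" and "consists of", and the function name are assumptions made for illustration; it handles the first example sentence and, as expected, misses the four others.

```python
import re

# One rigid surface pattern: "<Det> <whole> comprises (at least) <parts>."
PATTERN = re.compile(
    r"^(?:An?|The)\s+(?P<whole>\w+)\s+(?:comprises|contains|consists of)\s+"
    r"(?:at least\s+)?(?P<parts>.+?)\.$",
    re.IGNORECASE,
)


def extract_has_part(sentence):
    """Return (whole, 'has_part', part) triples, or [] if the pattern fails."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return []
    whole = match.group("whole").lower()
    # Split the enumeration on commas and on "and".
    items = re.split(r",\s*|\s+and\s+", match.group("parts"))
    # Drop a leading determiner from each item.
    parts = [re.sub(r"^(?:a|an|the)\s+", "", i.strip(), flags=re.IGNORECASE) for i in items]
    return [(whole, "has_part", p) for p in parts if p]


for s in [
    "A tree comprises at least a trunk, roots and branches.",
    "With branches reaching the ground, the willow is an ornamental tree.",
    "He climbs on the branches of the tree.",
]:
    print(s, "->", extract_has_part(s))
```

Covering "with", "of" or "its" would need further patterns, each with its own ambiguity, which is the rigidity and coverage issue raised earlier in the talk.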
Pattern based relation extraction, learning patterns (1)
• Patterns are seen (and stored) as lexicalizations of ontology properties
• Patterns are “extracted” from syntactic dependencies between related entities (in triples)
• Assumes that patterns are structured around ONE lexical entry
• Lemon format for lexical ontologies
• Entries can be frames
ATOLL—A framework for the automatic induction of ontology lexica S. Walter, C. Unger, P. Cimiano, DKE (94), 148-162 (2014)
Pattern based relation extraction, learning patterns (2)
Relation: Dbpedia:spouse
• Michelle Obama is the wife of Barack Obama, the current president.
• Michelle Obama allegedly told her husband, Barack Obama, to ..
• Michelle Obama, the 44th first lady and wife of President Barack
Find all lexicalizations of the entities: Michelle Obama, Mrs. Obama, Michelle Robinson …
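The sentence-selection step can be sketched as follows: given a known triple and the surface forms of its two entities, keep the sentences that mention both. This is an illustrative Python sketch and not the ATOLL implementation; the function name, the plain substring matching, and the surface forms listed for Barack Obama are assumptions.

```python
def sentences_for_triple(sentences, subject_forms, object_forms):
    """Keep sentences that contain a surface form of both entities."""
    hits = []
    for sentence in sentences:
        low = sentence.lower()
        has_subject = any(form.lower() in low for form in subject_forms)
        has_object = any(form.lower() in low for form in object_forms)
        if has_subject and has_object:
            hits.append(sentence)
    return hits


corpus = [
    "Michelle Obama is the wife of Barack Obama, the current president.",
    "Michelle Obama allegedly told her husband, Barack Obama, to ..",
    "Mrs. Obama visited a school garden.",  # hypothetical distractor
]
subject_forms = ["Michelle Obama", "Mrs. Obama", "Michelle Robinson"]
object_forms = ["Barack Obama", "President Obama"]  # assumed forms

for s in sentences_for_triple(corpus, subject_forms, object_forms):
    print(s)
```

The selected sentences are the candidates from which lexicalizations of Dbpedia:spouse (such as "wife of" or "husband") can then be induced.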
Pattern based relation extraction, learning patterns (3)
• Pattern = shortest path between the 2 entities in the dependency graph
– [MichelleObama (subject), wife (root), of (preposition), BarackObama (object)]
• Lexical entry in the ontology
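The shortest-path idea can be illustrated with a breadth-first search over a dependency graph. In this Python sketch the edges are written by hand for the first example sentence (a real system would take them from a dependency parser), and the function name and the node names for the auxiliary words are assumptions.

```python
from collections import deque

# Hand-written (head, dependent) edges for
# "Michelle Obama is the wife of Barack Obama"; hypothetical parse.
EDGES = [
    ("wife", "MichelleObama"),  # subject
    ("wife", "is"),             # copula
    ("wife", "the"),            # determiner
    ("wife", "of"),             # preposition
    ("of", "BarackObama"),      # object of the preposition
]


def shortest_path(edges, start, goal):
    """Breadth-first search on the dependency graph, ignoring edge direction."""
    neighbours = {}
    for head, dependent in edges:
        neighbours.setdefault(head, []).append(dependent)
        neighbours.setdefault(dependent, []).append(head)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(shortest_path(EDGES, "MichelleObama", "BarackObama"))
```

The words strictly between the two entities on that path ("wife", "of") are what gets stored as a lexicalization of the property.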
Relation extraction: learning relations from enumerative structures
[Figure: example enumerative structure annotated with IS_A relations]
Learning relations from a parallel enumerative structure =
- Classification task to identify the relation (IS_A, part_Of, other)
- Term extraction to identify the primer and the items
Relation extraction: learning relations from enumerative structures
• Corpus
– 745 enumerative structures from Wikipedia pages
– 3 relation types: taxonomic, ontological_non_taxonomic, non_ontological
• Classification task
– Feature definition
– Automatic evaluation of features
– 3 algorithms are compared: SVM, MaxEntropy and baseline (majority)
– Training of the 2 learning algorithms (SVM and MaxEntropy)
• Results
– 82% f-measure for SVM
– Best result with a 2-step process (ontological yes/no -> feature, and then taxonomic yes/no)
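One possible reading of the 2-step process is a cascade in which the first decision (ontological or not) is passed as a feature to the second (taxonomic or not). The Python sketch below only shows that control flow. The two "classifiers" are toy keyword rules standing in for trained models; they are assumptions for illustration, not the SVM or MaxEntropy models or the features of the study.

```python
def two_step_classify(features, is_ontological, is_taxonomic):
    """Step 1: ontological yes/no. Step 2: taxonomic yes/no, given step 1."""
    ontological = is_ontological(features)
    if not ontological:
        return "non_ontological"
    # The first decision is added to the features seen by the second step.
    enriched = dict(features, ontological=ontological)
    return "taxonomic" if is_taxonomic(enriched) else "ontological_non_taxonomic"


# Toy stand-ins for trained classifiers (hypothetical keyword rules).
def toy_is_ontological(features):
    return features["primer"].rstrip().endswith(":")


def toy_is_taxonomic(features):
    primer = features["primer"].lower()
    return features["ontological"] and ("types of" in primer or "kinds of" in primer)


examples = [
    {"primer": "There are several types of trees:", "items": ["oak", "willow"]},
    {"primer": "A tree comprises:", "items": ["a trunk", "roots", "branches"]},
    {"primer": "See also", "items": ["forestry", "gardening"]},
]
for ex in examples:
    print(ex["primer"], "->", two_step_classify(ex, toy_is_ontological, toy_is_taxonomic))
```

In a real setting each step would be a model trained on the annotated enumerative structures, and the cascade would be compared against a single 3-class classifier.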
From interpretation to representation
• A tree comprises at least a trunk, roots and branches.
• With branches reaching the ground, the willow is an ornamental tree.
• The tree of the neighbor has been delimbed.
• He’s climbing on the branches of the tree.
• This tree is wonderful. Its branches reach the ground.
[Figure: graph built from these sentences, with nodes Tree, Trunk, Roots, Branches, Ornamental Tree, Willow Tree and NeighborTree, linked by Has-part and Instance_of]
Finding semantic relations: what can large corpora and machine learning do for you?
• Learning patterns
– Poor results
– Requires very large data sets
– Reasonable for general knowledge
• Learning relations
– Much more relevant
– Large variety of approaches in the state of the art
– Key step = feature selection
– The more features, the better the results
Finding semantic relations, what can terminology do for the semantic web?
• Terminology can play a key role
– Change its practices
– Contribute to enrich the LOD with QUALITY data
• Contribute to define / evaluate tools
– Evaluate machine learning tools
– Test and experiment with them on various textual genres, in various domains
– Propose more features
• Propose terminology tools
– … to improve these approaches with the studies carried out during the last 20 years to build terminologies and terminological knowledge bases
– Term extractors, relation extractors, patterns
Finding semantic relations, what can terminology do for the semantic web?
• Contribute to enrich the LOD with terminologies
– Terminologies as structuring networked vocabularies
– Publish your ontologies as public web resources
– Conform to linked data requirements
– Use standard formats like SKOS (Simple Knowledge Organization System), the W3C’s OWL ontology for creating thesauri, taxonomies, and controlled vocabularies
• Contribute to the multilingual LOD
– Add lexicons to ontologies
– Multilingual efforts: the LEMON and MONNET projects DID NOT include terminologists
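As a rough idea of what publishing a terminology entry in SKOS involves, the Python sketch below serializes one concept with multilingual labels to Turtle. The properties skos:Concept, skos:prefLabel and skos:broader come from the SKOS vocabulary; the "ex:" namespace, the concept identifiers and the helper function are hypothetical, and labels are assumed to contain no characters that need escaping.

```python
# The skos namespace is the W3C one; the ex: namespace is a placeholder.
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex: <http://example.org/terminology/> .\n"
)


def concept_to_turtle(concept_id, labels, broader=None):
    """Serialize one concept: type, one prefLabel per language, optional broader."""
    lines = [f"ex:{concept_id} a skos:Concept ;"]
    for lang, label in sorted(labels.items()):
        lines.append(f'    skos:prefLabel "{label}"@{lang} ;')
    if broader:
        lines.append(f"    skos:broader ex:{broader} ;")
    # Close the statement: replace the trailing " ;" with " .".
    lines[-1] = lines[-1][:-2] + " ."
    return "\n".join(lines)


print(PREFIXES)
print(concept_to_turtle("tree", {"en": "tree", "fr": "arbre"}))
print(concept_to_turtle("oak", {"en": "oak", "fr": "chêne"}, broader="tree"))
```

A production workflow would more likely rely on an RDF library and dereferenceable URIs; the sketch only shows how little is needed to move from a term record to linked data.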
The LOD and the linguistic LOD need you!
http://linguistics.okfn.org/resources/llod/
Some landmark KOS LD implementations
• Many libraries
– Swedish National Library’s Libris catalogue and thesaurus http://libris.kb.se/
– Library of Congress’ vocabularies, including LCSH http://id.loc.gov/
– DNB’s Gemeinsame Normdatei (incl. SWD subject headings) http://dnb.info/gnd/ (documentation at https://wiki.dnb.de/display/LDS)
– BnF’s RAMEAU subject headings http://stitch.cs.vu.nl/
– OCLC’s DDC classification http://dewey.info/ and VIAF http://viaf.org/
– STW economy thesaurus http://zbw.eu/stw
– National Library of Hungary’s catalogue and thesauri http://oszkdk.oszk.hu/resource/DRJ/404 (example)
• Other fields
– Wikipedia categories through DBpedia http://dbpedia.org/
– New York Times subject headings http://data.nytimes.com/
– IVOA astronomy vocabularies http://www.ivoa.net/Documents/latest/Vocabularies.html
– GEMET environmental thesaurus http://eionet.europa.eu/gemet
– Agrovoc http://aims.fao.org/
– Linked Life Data http://linkedlifedata.com/
– Taxonconcept http://www.taxonconcept.org/
– UK Public sector vocabularies http://standards.esd.org.uk/ (e.g., http://id.esd.org.uk/lifeEvent/7)