Post on 13-Jan-2015
Yannis Tzitzikas et al., MTSR 2013, Thessaloniki
Integrating Heterogeneous and Distributed Information about Marine Species through a Top Level Ontology
Y. Tzitzikas 1,2, C. Alloca 1, C. Bekiari 1, Y. Marketakis 1, P. Fafalios 1,2, M. Doerr 1, N. Minadakis 1, T. Patkos 1, L. Candela 3
1 Institute of Computer Science, FORTH-ICS
2 Computer Science Department, University of Crete, GREECE
3 Consiglio Nazionale delle Ricerche, CNR-ISTI, Pisa, Italy
7th Metadata and Semantics Research Conference (MTSR), Thessaloniki, Nov 19-22, 2013
Outline
• Context, Problem, Objectives
• Main Approaches for Integration
• The Followed Approach
– The Ontology MarineTLO
• Objectives, Benefits, Architecture
– The MarineTLO-based Warehouse
• Exploitation Scenarios
• Concluding Remarks
Context: iMarine
Id: It is an FP7 Research Infrastructure Project (2011-2014)
Final goal: launch an initiative aimed at establishing and operating an e-infrastructure supporting the principles of the Ecosystem Approach to fisheries management and conservation of marine living resources.
Partners:
Problem and objectives
The Problem
• Several sources cover the marine domain, but each stores complementary information, structured according to its own needs.
Our objective
• Harmonize and integrate (link, connect) information of the marine domain
– Specific motivating scenarios and use cases will be given at the end
Marine Information: in several sources
WoRMS: World Register of Marine Species
Registers more than 200K species
ECOSCOPE- A Knowledge Base About Marine Ecosystems (IRD, France)
FLOD (Fisheries Linked Data) of the Food and Agriculture Organization (FAO) of the United Nations
FishBase: Probably the largest and most extensively accessed online database of fish species.
DBpedia
Marine Information: in several sources
Storing complementary information
Taxonomic information
Ecosystem information (e.g. which fish eats which fish)
Commercial codes
General information, occurrence data, including information from other sources
General information, figures
Marine Information: in several sources
Using and accessed through different technologies
Web services (SOAP/WSDL)
RDF + OWL files
SPARQL Endpoint
Relational Database
SPARQL Endpoint
Main approaches for Integration
In general there are two main approaches for integration
Warehouse approach (materialized integration)
• Design Phase: The underlying sources (and their parts) have to be selected
• Creation Phase: Process for getting and creating the warehouse
• Maintenance Phase: Ability to create the warehouse from scratch, and/or ability to update parts of it
• Mappings are exploited to extract information from data sources, to transform it to the target model and then to store it at the central repository
Mediator approach (virtual integration)
• The mediator receives a query formulated in terms of the unified model/schema. The mappings are used to enable query translation. The derived sub-queries are sent to the wrappers of the individual sources, which transform them into queries over the underlying sources. The results of these sub-queries are sent back to the mediator where they are assembled to form the final answer
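The difference between the two approaches can be sketched in a few lines of Python. Everything in the sketch is hypothetical and for illustration only: the two toy sources, their field names, the codes, the prey placeholder and the unified (subject, property, value) model are assumptions, not taken from any of the actual systems.

```python
# Toy comparison of materialized (warehouse) vs. virtual (mediator) integration.
# All sources, field names, codes and the unified model are hypothetical.

# Two "sources" with different native structures.
source_a = [{"sci_name": "Thunnus albacares", "code": "X1"}]
source_b = [("Thunnus albacares", "eats", "prey-1")]

# Mappings: translate each source's records into unified (s, p, o) triples.
def map_a(rec):
    return (rec["sci_name"], "hasCode", rec["code"])

def map_b(rec):
    return (rec[0], "isPredatorOf", rec[2])

MAPPINGS = [(source_a, map_a), (source_b, map_b)]

def build_warehouse():
    """Warehouse: extract, transform and store everything up front."""
    return [mapping(rec) for source, mapping in MAPPINGS for rec in source]

def query_warehouse(warehouse, species):
    """Answer from the central copy; the sources are not contacted."""
    return sorted(t for t in warehouse if t[0] == species)

def query_mediator(species):
    """Mediator: at query time, ask every source and assemble the answer."""
    answer = []
    for source, mapping in MAPPINGS:
        # Stand-in for a wrapper that translates the sub-query for the source.
        translated = (mapping(rec) for rec in source)
        answer.extend(t for t in translated if t[0] == species)
    return sorted(answer)

warehouse = build_warehouse()

# A source update is visible to the mediator at once, but not to the
# warehouse until the warehouse is refreshed.
source_a.append({"sci_name": "Thunnus albacares", "code": "X2"})

stale = query_warehouse(warehouse, "Thunnus albacares")
live = query_mediator("Thunnus albacares")
fresh = query_warehouse(build_warehouse(), "Thunnus albacares")
```

Here `stale` misses the newly added code, while `live` and `fresh` both contain it: the warehouse trades freshness for independence from the sources at query time.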
Main approaches for integration (cont.)
Warehouse
• Benefit: Flexibility in transformation logic (including the ability to curate and fix problems)
• Benefit: Decoupling of the release management of the integrated resource from the management cycles of the underlying sources
• Benefit: Decoupling of access load from the underlying sources
• Benefit: Faster responses (in query answering but also in other tasks, e.g. if one wants to use it for applying an entity matching technique)
• Shortcomings: You have to pay the cost of hosting the warehouse, and you have to refresh the warehouse periodically
Mediator
• Benefit: One advantage (but in some cases a disadvantage) of virtual integration is the real-time reflection of source updates in integrated access
• Comment: The higher complexity of the system (and the quality-of-service demands on the sources) is only justified if immediate access to updates is indeed required
Main approaches for integration (cont.)
In both cases we need a unified model/schema
The ontology MarineTLO (Marine Top Level Ontology)
MarineTLO: Objectives
• MarineTLO aims at being a global core model that
– provides a common, agreed-upon understanding of the concepts and relationships holding in the marine domain, to enable knowledge sharing, information exchange and integration between heterogeneous sources,
– covers the marine domain with suitable abstractions to enable the most fundamental queries,
– can be extended to any level of detail on demand, and
– allows data originating from distinct sources to be adequately mapped and integrated
• MarineTLO is not supposed to be the single ontology covering the entirety of what exists
MarineTLO: Benefits from a Top-Level Ontology
• The adoption of a global core model has various benefits:
– reduced effort for improving and evolving
• the focus is on one model rather than many (the results are beneficial for the entire community)
– reduced effort for constructing mappings
• this approach avoids the inevitable combinatorial explosion and complexity that result from pair-wise mappings between individual metadata formats and/or ontologies
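The saving in mapping effort can be made concrete with a simple count. With n formats, pair-wise integration needs one mapping per unordered pair, n(n-1)/2 (twice that if each direction is written separately), whereas a core model needs only n. The sketch below uses the five source names that appear in this deck; the two helper function names are made up for illustration.

```python
from itertools import combinations

SOURCES = ["FLOD", "ECOSCOPE", "WoRMS", "DBpedia", "FishBase"]

def pairwise_mappings(sources):
    """One mapping per unordered pair of sources."""
    return list(combinations(sources, 2))

def core_model_mappings(sources, core="MarineTLO"):
    """One mapping from each source to the core model."""
    return [(source, core) for source in sources]

pairs = pairwise_mappings(SOURCES)
to_core = core_model_mappings(SOURCES)
print(len(pairs), "pair-wise mappings vs.", len(to_core), "mappings to the core model")
```

For these five sources the count is 10 against 5; adding a sixth source would add five new pair-wise mappings but only one mapping to the core model.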
MarineTLO: Key Design Principles
• Formulation
– It is an object-oriented semantic model, expressed in a form comprehensible to both documentation experts and information scientists, while it can readily be converted to machine-readable formats such as RDF Schema, OWL, etc.
• Metaclasses
– certain types of inference about classes are supported, in an analogous way as classes support certain types of inference about instances
• Monotonicity
– It aims to be monotonic in the sense of Domain Theory: the existing constructs and the deductions made from them should remain valid and well-formed, even as new constructs are added to the MarineTLO
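The monotonicity principle can be illustrated loosely with a tiny subclass reasoner: whatever was deducible before an extension must still be deducible after it. This is only a sketch under assumptions. The subclass links between Vessel, Man Made Object and Physical Man Made Thing are assumed for the example (the class names appear in the deck, the links do not), and Trawler and the two instances are invented.

```python
def deductions(triples):
    """Derive (instance, class) facts by following subClassOf transitively."""
    sub = {(s, o) for s, p, o in triples if p == "subClassOf"}
    typed = {(s, o) for s, p, o in triples if p == "type"}
    changed = True
    while changed:
        changed = False
        for inst, cls in list(typed):
            for child, parent in sub:
                if child == cls and (inst, parent) not in typed:
                    typed.add((inst, parent))
                    changed = True
    return typed

# Assumed core constructs (hierarchy is illustrative, not from the ontology).
core = {
    ("Vessel", "subClassOf", "Man Made Object"),
    ("Man Made Object", "subClassOf", "Physical Man Made Thing"),
    ("vessel-1", "type", "Vessel"),
}

# A hypothetical extension that only adds new constructs.
extension = {
    ("Trawler", "subClassOf", "Vessel"),
    ("vessel-2", "type", "Trawler"),
}

before = deductions(core)
after = deductions(core | extension)

# Monotonic: every earlier deduction survives the extension.
assert before <= after
```

The extension adds new deductions (about vessel-2) without invalidating any of the earlier ones, which is the behaviour the design principle asks for.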
MarineTLO: Query capabilities
It allows formulating complex queries, e.g.:
1. Given the scientific name of a species, find its predators with the related taxon-rank classification and with the different codes that the organizations use to refer to them.
2. Given the scientific name of a species, find the ecosystems, water areas and countries that this species is native to, and the common names that are used for this species in each of the countries.
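A minimal sketch of how the first query could be answered over integrated triples is given below. It uses a plain Python list instead of a real triple store, and all identifiers, property names, organizations and codes are hypothetical placeholders, not MarineTLO terms. The point is that the predator relation and the codes would typically come from different sources, so the query only works once they sit in one graph.

```python
# Hypothetical integrated triples; property names are NOT MarineTLO terms.
TRIPLES = [
    ("sp:1", "scientificName", "Thunnus albacares"),
    ("sp:2", "scientificName", "Predator species (placeholder)"),
    ("sp:2", "isPredatorOf", "sp:1"),          # e.g. from an ecosystem source
    ("sp:2", "hasRank", "Species"),            # e.g. from a taxonomic source
    ("sp:2", "hasCode", ("OrgA", "A-002")),    # e.g. from a codes source
    ("sp:2", "hasCode", ("OrgB", "B-77")),
]

def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

def subjects(predicate, obj):
    """All subjects of triples with the given predicate and object."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]

def predators_of(scientific_name):
    """Predators of a species, with their rank and organization codes."""
    results = []
    for species in subjects("scientificName", scientific_name):
        for predator in subjects("isPredatorOf", species):
            results.append({
                "name": objects(predator, "scientificName")[0],
                "rank": objects(predator, "hasRank"),
                "codes": dict(objects(predator, "hasCode")),
            })
    return results

answer = predators_of("Thunnus albacares")
```

Against a SPARQL endpoint the same join would be written as a graph pattern over the corresponding properties.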
The notion of competence queries as driver
#   Query: For a scientific name of a species (e.g. Thunnus albacares or Poromitra crassiceps), find/give me
Q1  the biological environments (e.g. ecosystems) in which the species has been introduced and more general descriptive information of it (such as the country)
Q2  its common names and their complementary info (e.g. languages and countries where they are used)
Q3  the water areas and their FAO codes in which the species is native
Q4  the countries in which the species lives
Q5  the water areas and the FAO portioning code associated with a country
Q6  the presentation w.r.t. Country, Ecosystem, Water Area and Exclusive Economic Zone (of the water area)
Q7  the projection w.r.t. Ecosystem and Competitor, providing for each competitor the identification information (e.g. several codes provided by different organizations)
Q8  a map w.r.t. Country and Predator, providing for each predator both the identification information and the biological classification
Q9  who discovered it, in which year, the biological classification, the identification information, and the common names, providing for each common name the language and the countries where it is used
MarineTLO as Product
• The “full” version of MarineTLO (Version 3.0.0)
– aims at covering any part of the marine domain
– contains 70 classes and 41 properties
• The “operational” version, for the needs of iMarine (Version 3.0.0)
– used for building the MarineTLO Warehouse (Version 3.0.0)
– contains 92 classes and 41 properties
– applied for integrating data mainly from FLOD, ECOSCOPE, part of WoRMS and FISHBASE sources
• URL: www.ics.forth.gr/isl/MarineTLO
Class Level (excerpt), Version 3.0.0
[Figure: class diagram; hierarchy links not recoverable. Classes shown:]
TLO Entity, Temporal Phenomenon, Persistent Item, Actor, Physical Man Made Thing, Man Made Thing, Conceptual Object, Physical Thing, Event
Exclusive Economic Zone, Codification System, Identifier, EEZCode, FAOGearTypeIdentifier, FAOVesselTypeIdentifier
Man Made Object, Vessel, Water Area, Area, Sub Area, Division, Sub Division, Ecosystem
Human Activity, Attribute Assignment, Country Code Assignment, Ecosystem Code Assignment, Scientific Name Assignment, Common Name Assignment, Water Area Code Assignment, Country
Meta Class Level (excerpt), Version 3.0.0
[Figure: metaclass diagram; hierarchy links not recoverable. Metaclasses shown:]
TLO Entity Type, Temporal Phenomenon Type, Persistent Item Type, Actor Type, Digital Object Type, Conceptual Object Type, Identifier Type, Physical Thing Type, Event Type
Equipment Type, Gear Type, Vessel Type, Ecosystem Type, Human Activity Type, Marine Ecosystem Type, Attribute Assignment Type
Biotic Element Type, Marine Animal Type, FishBase Marine Animal Type, DBpedia Marine Animal Type, WoRMS Marine Animal Type, FLOD Marine Animal Type, ECOSCOPE Marine Animal Type
Example 1: Thunnus albacares
Example 2: Scientific name assignment
[Figure: instance diagram; links between nodes not recoverable. Labels shown:]
Classes: MarineSpecies, Actor, Attribute Assignment, PersistentItem, Scientific Name Assignment, Event
Instances: Thunnus_albacares, blank_node_Thunnus_albacares, blank_node_Bonnaterre
Properties: relatedIdentifierAssigment, relatedAuthorshipAssigment, assignedName, assignedDate, assignedIdentifier, name, reference
Literals: “Thunnus Albacares”, “Bonnaterre”, “1788”
Example 3: Species Establishment
[Figure: instance diagram; links between nodes not recoverable. Labels shown:]
Classes: Ecosystem, Country, Water Area, Marine Species
Instances: Poromitra crassiceps, Antarctic, Elephant I, Atlantic Antarctic
Properties: isAssociatedWith, usualluIsBioticElementOf, native / Introduced / Endemic
Exploiting MarineTLO
Ways to use/exploit MarineTLO
1. For constructing semantic warehouses which:
– can answer queries which cannot be answered by the underlying sources individually
– can aid the construction of mappings between instances
– can be exploited for various other tasks
• We shall see how they are exploited in the context of semantic post-processing of search results
2. Various other uses
– For publishing Linked Data
– For mashing up facts
[Figure] Uses shown:
Publishing Linked Data, Mashups
Semantic post-processing of search results
Constructing Warehouses offering complex query answering
The MarineTLO-based Warehouse
Warehouse construction and evolution process
1. Define requirements in terms of competence queries
2. Fetch the data from the selected sources (SPARQL endpoints, services, etc.)
3. Transform and ingest to the Warehouse
4. Inspect the connectivity of the Warehouse
5. Formulate rules creating sameAs relationships
6. Apply the rules to the Warehouse
7. Ingest the sameAs relationships to the Warehouse
8. Test and evaluate the Warehouse (using competence queries)
Artifacts shown in the diagram: Queries, Triples, Rules for Instance Matching, sameAs triples, Warehouse
Tool used in three of the steps: MaTWare
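The construction steps can be sketched end to end on toy data. This is not MaTWare and not the project's actual matching rules: the two sources, their records, the property names and the single rule used here (two entities are the same if their scientific names match after normalizing case and whitespace) are assumptions made only to show how sameAs links change what a competence query returns.

```python
def normalize(name):
    """Case- and whitespace-insensitive key for a scientific name."""
    return " ".join(name.lower().split())

def transform(source_id, records):
    """Map source records to (subject, predicate, object) triples."""
    triples = []
    for i, rec in enumerate(records):
        subject = f"{source_id}:{i}"
        for key, value in rec.items():
            triples.append((subject, key, value))
    return triples

def same_as_rule(triples):
    """Hypothetical rule: same normalized scientific name => sameAs."""
    by_name = {}
    for s, p, o in triples:
        if p == "scientificName":
            by_name.setdefault(normalize(o), []).append(s)
    links = []
    for subjects in by_name.values():
        first = subjects[0]
        links.extend((first, "sameAs", other) for other in subjects[1:])
    return links

def properties_of(triples, name):
    """Competence query: all properties of a species, following sameAs."""
    ids = {s for s, p, o in triples if p == "scientificName" and o == name}
    links = [(s, o) for s, p, o in triples if p == "sameAs"]
    changed = True
    while changed:  # expand over sameAs links until nothing new is reached
        changed = False
        for a, b in links:
            if (a in ids) != (b in ids):
                ids |= {a, b}
                changed = True
    return sorted((p, o) for s, p, o in triples if s in ids and p != "sameAs")

# Steps 2-3: fetch (here: literals) and transform/ingest.
SOURCES = {
    "srcA": [{"scientificName": "Thunnus albacares", "code": "A-001"}],
    "srcB": [{"scientificName": "Thunnus  Albacares", "commonName": "yellowfin tuna"}],
}
warehouse = []
for source_id, records in SOURCES.items():
    warehouse.extend(transform(source_id, records))

before = properties_of(warehouse, "Thunnus albacares")   # step 4: inspect
warehouse.extend(same_as_rule(warehouse))                # steps 5-7
after = properties_of(warehouse, "Thunnus albacares")    # step 8: test
```

Before the sameAs links are ingested the query sees only the facts from srcA; afterwards it also returns the common name held by srcB, which is the kind of connectivity gain the process is designed to check.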
The MarineTLO-based warehouse’s contents: used sources
[Figure: architecture diagram] An RDF triple store, structured according to MarineTLO, holds replicated data from five sources, each brought in through its own mapping:
• FLOD: FLOD-to-TLO mapping
• ECOSCOPE: ECOSCOPE-to-TLO mapping
• WoRMS (part of): WoRMS-to-TLO mapping
• DBpedia (part of): DBpedia-to-TLO mapping
• FishBase (part of): FishBase-to-TLO mapping
The MarineTLO-based warehouse’s contents: in numbers
iMarine 2nd Review, September 2013, Brussels
Source      Species Number
DBpedia     14,291
FLOD        10,849
WoRMS        1,124
Ecoscope       277
FishBase    31,277
Common Species (size of intersections)
            FLOD    WoRMS   Ecoscope   FishBase
DBpedia     3,046   731     56         9,833
FLOD                768     73         6,141
WoRMS                       53         1,288
Ecoscope                               53
• Now contains information about 37,000 distinct marine species (including FishBase). Number of triples: 2,970,058
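A table of pair-wise intersections like the one above can be computed with set operations once the species of each source have been reduced to comparable keys (for instance after instance matching). The sketch below shows only the counting step; the three sources and their contents are invented toy data, not the warehouse's figures.

```python
from itertools import combinations

def common_species(species_by_source):
    """Return {(source_a, source_b): number of shared species}."""
    return {
        (a, b): len(species_by_source[a] & species_by_source[b])
        for a, b in combinations(species_by_source, 2)
    }

def distinct_species(species_by_source):
    """Number of distinct species over all sources."""
    return len(set().union(*species_by_source.values()))

# Toy data: each source is a set of already-matched species keys.
toy = {
    "srcA": {"s1", "s2", "s3"},
    "srcB": {"s2", "s3", "s4"},
    "srcC": {"s3", "s5"},
}

overlaps = common_species(toy)
total = distinct_species(toy)
```

Small or empty intersections are a quick signal that two sources are poorly connected in the warehouse, for example because their names have not been matched yet.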
The MarineTLO-based warehouse’s contents: concepts
[Table: concepts covered per source (columns: Ecoscope, FLOD, WoRMS, DBpedia, FishBase); the per-source coverage marks are not preserved]
Concepts: Species, Scientific Names, Authorships, Common Names, Predators, Ecosystems, Countries, Water Areas, Vessels, Gears, EEZ
Exploiting the MarineTLO-based Warehouse for Semantic Post-Processing of Search Results
For Semantic Post-Processing: The process
[Figure: process diagram. Elements shown:]
• Input: query terms, giving (top-L) results (+ metadata); or contents from web browsing
• Entity Mining, producing entities / contents
• Semantic Analysis over semantic data from the MarineTLO Warehouse: grouping, ranking, retrieving more properties
• Visualization/Interaction (faceted search, entity exploration, annotation, top-k graphs, etc.)
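The post-processing idea can be sketched as follows: mine entity names from the top results, then use the warehouse to group, rank and enrich them. This is a deliberately naive illustration and not the XSearch implementation. The warehouse slice, the result snippets and the function names are made up, and entity mining is reduced to a lookup of known names in the text.

```python
import re

# Hypothetical slice of a warehouse: entity name -> known properties.
WAREHOUSE = {
    "thunnus albacares": {"type": "Species", "commonName": "yellowfin tuna"},
    "greece": {"type": "Country"},
}

def mine_entities(snippet):
    """Naive gazetteer lookup: warehouse entity names found in the text."""
    text = snippet.lower()
    return [name for name in WAREHOUSE
            if re.search(r"\b" + re.escape(name) + r"\b", text)]

def analyse(results):
    """Group mined entities by type, rank by hits, attach known properties."""
    counts = {}
    for snippet in results:
        for name in mine_entities(snippet):
            counts[name] = counts.get(name, 0) + 1
    grouped = {}
    for name, hits in counts.items():
        entry = dict(WAREHOUSE[name], name=name, hits=hits)
        grouped.setdefault(entry["type"], []).append(entry)
    for entries in grouped.values():
        entries.sort(key=lambda e: (-e["hits"], e["name"]))
    return grouped

# Made-up result snippets standing in for the top-L search results.
top_results = [
    "A made-up snippet mentioning Thunnus albacares.",
    "Another made-up snippet: Thunnus albacares and Greece.",
]
facets = analyse(top_results)
```

The resulting groups correspond to the facets a user could click on, and the extra properties attached to each entity are what an entity card would display.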
XSearch-Portlet Screenshot
[Screenshot] Panels: Search Results; Result of Entity Mining; Result of textual clustering
The Warehouse is used (two callouts in the screenshot)
Example of an EntityCard of XSearch (if the entity’s type is Species)
[Screenshot] The card shows properties from FLOD, from DBpedia, from Ecoscope and from WoRMS
The Warehouse is used
XSearch as a bookmarklet
[Screenshot] Annotating entities over the original page; entity exploration
The Warehouse is used
Concluding Remarks
• To tackle the need for having integrated sets of facts about marine species, and thus to assist research about species and biodiversity, we have described a top level ontology for that domain.
– It provides a unified and coherent core model for schema mapping which enables formulating and answering queries which cannot be answered by any individual source.
• We detailed the process of constructing MarineTLO-based warehouses. The current warehouse contains information about more than 37K marine species.
• We have identified and described particular use cases and applications that exploit this ontology and its warehouse.
Future Work and Research
• Next steps
– Finalize and make accessible the next release of the warehouse (in 2013)
• Current and Future Research
– Focus on quality/connectivity issues
Links
• MarineTLO
• http://www.ics.forth.gr/isl/MarineTLO/
• TripleStores
– MarineTLO-Warehouse: http://virtuoso.i-marine.d4science.org:8890/sparql
– also browsable through http://virtuoso.i-marine.d4science.org:8890/fct
• Systems
– X-Search and gCube Search
• Portlet: https://i-marine.d4science.org/ (in various VREs, e.g. FCPPS, iSearch)
• Web Applications:
– http://62.217.127.118/x-search/ (over Bing and MarineTLO-Warehouse)
– http://62.217.127.118/x-search-fao/ (over ECOSCOPE and MarineTLO-Warehouse)
Thank you for your attention
Visit and send us feedback: www.ics.forth.gr/isl/MarineTLO