Daniele Bailo
METADATA & BROKERING: a modern approach. EPISODE #2
Previously on… Metadata & Brokering #1
Main concepts
- Digital Data
- Metadata
- Brokering system
- The triad <PID, MD, DO>
- Database
- APIs (web services)
Side concepts
- Ontologies / Semantics
- PID
- Digital Object
- Standard
- Interoperability
- Open Access
[Diagram: datasets and their APIs]
Discovery (DC) and (CKAN, eGMS)
Contextual (CERIF metadata model)
Detailed (community specific)
Features
1. APIs
2. <PID, metadata, DO>
3. Contextualization metadata
4. Support ontologies
Data from Irpinia
<PID, metadata, DO>
request response
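The request/response around the triad <PID, metadata, DO> can be sketched in a few lines. This is only an illustration: the `Record` class, the in-memory `STORE`, the `handle_request` function and the PID value are all hypothetical names, not part of any real system shown in the slides.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One <PID, metadata, DO> triad."""
    pid: str               # persistent identifier (hypothetical value below)
    metadata: dict         # descriptive metadata
    digital_object: bytes  # the data itself


# Hypothetical in-memory store standing in for the real system.
STORE = {}


def register(record):
    """Make a record resolvable through its PID."""
    STORE[record.pid] = record


def handle_request(pid):
    """Answer a request for a PID with the full triad."""
    record = STORE[pid]
    return {
        "pid": record.pid,
        "metadata": record.metadata,
        "digital_object": record.digital_object,
    }


irpinia = Record(
    pid="example/irpinia-0001",              # made-up PID
    metadata={"title": "Data from Irpinia"},
    digital_object=b"placeholder bytes",     # stands in for the real data
)
register(irpinia)
response = handle_request("example/irpinia-0001")
```

The point of the sketch is that the response always carries all three parts together, so the data never travels without its identifier and its description.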
THE PERFECT SYSTEM #6: Metadata-driven canonical Brokering with contextualization & PID
BROKERING SYSTEM
NEW & OLD CHARACTERS
Metadata
Purposes
1. Discovery (humans & machines)
2. Contextualization: what is the context of the data
3. Use it for processing or other advanced tasks
Usually attached to D.O.
Interoperability
What & Why
Enables 2 systems to
1. Exchange information
2. Understand information
Usually achieved through:
- Agreed language
- Software: “translators”, interfaces, thin layers
...what, are you speaking Arabic???
Ontologies
Why an ontology?
It is the way machines manage “meaning”
How does it work?
1. Connects concepts
2. Needs vocabulary
Issues
• Many ontologies exist
• Vocabulary Mapping
[Diagram: example ontology graph]
Nodes: Michelini, CNT, INGV, Gresta, Sailing, Trieste, Italy, Boat, sea
Relations: Is Director of, Is section of, Is president of, Has hobby, Is Born, Located in, use, use
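An ontology graph like the one in the diagram can be stored as subject/predicate/object triples. The sketch below is an assumption-laden reading of the slide: which node each relation connects is reconstructed from the labels (the attachment of the two "use" edges in particular is a guess), and the helper names `objects` and `country_of_birth` are hypothetical.

```python
# Triples (subject, predicate, object). Edge directions are assumed from
# the slide's labels, not stated explicitly in the source.
TRIPLES = [
    ("Michelini", "is director of", "CNT"),
    ("CNT", "is section of", "INGV"),
    ("Gresta", "is president of", "INGV"),
    ("Michelini", "has hobby", "Sailing"),
    ("Michelini", "is born in", "Trieste"),
    ("Trieste", "located in", "Italy"),
    ("Sailing", "uses", "Boat"),
    ("Sailing", "uses", "sea"),
]


def objects(subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def country_of_birth(person):
    """Chain two relations: person -> city of birth -> country."""
    for city in objects(person, "is born in"):
        for country in objects(city, "located in"):
            return country
    return None
```

Chaining "is born in" with "located in" is the kind of inference that connected concepts make possible: the country is never stated for the person directly, yet a machine can derive it.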
Metadata Catalogue #1
Purposes
Store metadata, e.g.
1. producer
2. date of creation
3. data format
Misleading Example (why?)
Metadata Catalogue #2
How to implement it?
Single table (bad habit)
One table with all data
Multi table (good habit)
- Data is stored in multiple tables (one per concept)
- Tables are linked
- Can contextualize data
Metadata catalogue = relational database *
(*) = also noSQL... We’ll see it later...
Single table
Multi table
Multi table and contextualization
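A minimal sketch of the multi-table idea, using Python's built-in `sqlite3` module. The table and column names are hypothetical, and the dates, the format and the second dataset title are made-up sample values. Only "INGV" and "Data from Irpinia" come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per concept; the tables are linked through producer_id.
conn.executescript(
    """
    CREATE TABLE organisation (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE dataset (
        id          INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        created     TEXT,
        data_format TEXT,
        producer_id INTEGER NOT NULL REFERENCES organisation(id)
    );
    """
)

# The producer is stored once, not repeated on every dataset row.
conn.execute("INSERT INTO organisation (id, name) VALUES (1, 'INGV')")
conn.executemany(
    "INSERT INTO dataset (title, created, data_format, producer_id) "
    "VALUES (?, ?, ?, ?)",
    [
        ("Data from Irpinia", "2015-01-01", "miniSEED", 1),  # sample values
        ("Example dataset B", "2015-02-01", "miniSEED", 1),  # sample values
    ],
)

# Contextualization: follow the link from a dataset to its producer.
rows = conn.execute(
    """
    SELECT d.title, o.name
    FROM dataset AS d
    JOIN organisation AS o ON o.id = d.producer_id
    ORDER BY d.id
    """
).fetchall()
```

With a single table, the producer name would be copied into every row. Here it lives in one place, and the link is what lets the catalogue put each dataset in context.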
Catalogue Interface
Human interface (GUI)
Website or portal
Machine interface
- API or Web service
- Executes scripts or queries
- Returns metadata in a given standard
What is it?
It does something for the user (deliver value to customer)*
A “thin layer”
We usually don’t know what’s under the hood
Examples
- FDSN stations
- FDSN dataselect
(web) service
FDSN stations
FDSN Dataselect
Database (MD catalogue)
Waveform repository
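To make the two example services concrete, the sketch below builds request URLs following the FDSN web service path layout (`/fdsnws/station/1/query` and `/fdsnws/dataselect/1/query`). The host name, the network code "XX" and the station code "ABC" are placeholders, and the function names are hypothetical. No request is actually sent.

```python
from urllib.parse import urlencode

BASE_URL = "https://example.org"  # placeholder host, not a real data centre


def station_url(network, station, level="station"):
    """URL asking the station service for metadata (served by the MD catalogue)."""
    params = {
        "network": network,
        "station": station,
        "level": level,
        "format": "text",
    }
    return f"{BASE_URL}/fdsnws/station/1/query?{urlencode(params)}"


def dataselect_url(network, station, channel, starttime, endtime):
    """URL asking the dataselect service for waveforms (served by the repository)."""
    params = {
        "network": network,
        "station": station,
        "channel": channel,
        "starttime": starttime,
        "endtime": endtime,
    }
    return f"{BASE_URL}/fdsnws/dataselect/1/query?{urlencode(params)}"


meta_url = station_url("XX", "ABC")
data_url = dataselect_url(
    "XX", "ABC", "HHZ", "2015-01-01T00:00:00", "2015-01-01T00:10:00"
)
```

From the caller's side both are just URLs with query parameters. Whether a database or a waveform repository answers is hidden behind the thin layer.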
CKAN
CKAN GUI
METADATA catalogue
CKAN APIs
EIDA stations
ISIDE stations
Metadata replication
What is it?
- Metadata Catalogue
- With interfaces (GUI+API)
- No direct CKAN <-> sources connection
Examples
- Works with FDSN stations
- Doesn’t work with FDSN dataselect
Plugins
Brokering System (e.g. VERCE framework)
BROKER GUI
METADATA catalogue
BROKER APIs
EIDA stations
ISIDE stations
Metadata replication
What is it?
- Metadata Catalogue
- With interfaces (GUI+API)
- System manager
- Other modules
- BROKER <-> sources interactive connection
Examples
- EIDA stations
- EIDA dataselect
- Processing Job at CINECA
System manager
Interactive access to service
EIDA dataselect
Processing facility
? ? ?
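The difference between a plain catalogue and a broker can be sketched as follows. This is a toy model under stated assumptions, not how CKAN or VERCE are implemented: the classes `MetadataCatalogue` and `Broker`, their methods, and the fake service are all hypothetical.

```python
class MetadataCatalogue:
    """Holds replicated metadata only; it never contacts the sources at query time."""

    def __init__(self):
        self._records = []

    def replicate(self, source_name, records):
        """Copy metadata records from a source into the catalogue."""
        for record in records:
            self._records.append({"source": source_name, **record})

    def search(self, **criteria):
        """Return the replicated records matching every criterion."""
        return [
            r for r in self._records
            if all(r.get(key) == value for key, value in criteria.items())
        ]


class Broker:
    """A catalogue plus a system manager that talks to services interactively."""

    def __init__(self, catalogue):
        self.catalogue = catalogue
        self._services = {}

    def register_service(self, name, handler):
        self._services[name] = handler

    def request(self, service, **params):
        """Forward a live request to a registered service."""
        if service not in self._services:
            raise LookupError(f"no such service: {service}")
        return self._services[service](**params)


def fake_dataselect(station, start, end):
    """Stand-in for a remote data service (no network access here)."""
    return f"waveforms for {station} from {start} to {end}"


catalogue = MetadataCatalogue()
catalogue.replicate("EIDA stations", [{"station": "ABC"}])  # placeholder code
broker = Broker(catalogue)
broker.register_service("dataselect", fake_dataselect)

found = broker.catalogue.search(station="ABC")
data = broker.request("dataselect", station="ABC", start="t0", end="t1")
```

Station metadata can be answered from the replicated copy, which is why a catalogue alone works for stations. Waveform requests depend on the time window asked for, so they need the interactive connection that only the broker has.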
Comments & Questions
Why was the example misleading?
A global view
Data initiatives
RDA: “regulate” data sharing/use
EUDAT: Common data infrastructure
EGI: Organize National Grid Infrastructures (CINECA)
EPOS: ESFRI integrating Solid Earth data
RDA
Do for data what has been done for the internet (TCP/IP)
RDA concepts
Data Fabric
What?
Identifies mechanisms, standards, components and interfaces making data science efficient and cost effective
Data Management Plan
• Data management
• Data analysis
• Data preservation
• Data publication
• Data sharing
[UK data Archive http://www.data-archive.ac.uk/]
RDA concepts
Data Fabric
[RDA WG outputs https://indico.cern.ch/event/370271/session/2/contribution/6/material/0/0.pdf]
How to store?How to register?
How to discover?How to cite?
How to document processing?
How to integrate?
How to collect new DP?
How to access?
How to describe data?
How to discover data?
Metadata system
WE ALREADY KNOW EVERYTHING ABOUT IT
METADATA catalogue
How to have standards?
How to preserve data?
Registry system
What?
An agreed/legacy catalog of:
- data formats (schemas)
- metadata formats
- Vocabularies & semantic categories
- Data types
- Trusted repositories
- ….
Registry
Ahaa.. So in practice it's a database..
…exactly…
How to register/cite data or publications?
PID system
Purpose
- DO / publication can be uniquely referenced
- Assign a PID at data creation time
Issues
- Need for a simple mechanism to implement it
- Now EUDAT can help
- Peter & Massimo comments…
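A simple mechanism for assigning a PID at creation time could look like the sketch below. It is a toy under stated assumptions: the `PidRegistry` class, the `create_dataset` helper, the prefix and the file locations are hypothetical, and this is not the EUDAT service or any real PID system.

```python
import uuid

PREFIX = "example-prefix"  # made-up naming authority, not a real prefix


class PidRegistry:
    """Toy registry mapping each PID to the current location of the object."""

    def __init__(self):
        self._locations = {}

    def mint(self, location):
        """Create a new unique PID and bind it to a location."""
        pid = f"{PREFIX}/{uuid.uuid4()}"
        self._locations[pid] = location
        return pid

    def resolve(self, pid):
        """Return the location currently bound to the PID."""
        return self._locations[pid]


def create_dataset(registry, location):
    """Create a dataset record and assign its PID in the same step."""
    return {"pid": registry.mint(location), "location": location}


registry = PidRegistry()
first = create_dataset(registry, "/data/irpinia/a.dat")   # hypothetical path
second = create_dataset(registry, "/data/irpinia/b.dat")  # hypothetical path
```

Because the PID is minted inside `create_dataset`, no object can exist without one, which is what makes every DO uniquely referenceable from the start.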
How to access data?
AAI system (federated & distributed)
Purpose
- Authenticate users
- Authorize users
Issues
- Delegation
- Many systems, sometimes not interoperable
How to store data?
Data repository (trusted)
What?
- Store data
- Couple with PIDs
- Ensure preservation (not curation)
- Can be trusted (DSA)
Opportunity
- INGV DSA repository…
How to document data processing?
Workflow engines
Purpose
- Tracks data transformation
- Allows versioning
- Allows reproducibility
Comments
- Interoperability among various workflow engines
- VERCE did it
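Tracking data transformation can be illustrated with a very small provenance log. This is a sketch only, not how VERCE or any real workflow engine works: `run_workflow`, the two step functions and the input bytes are hypothetical.

```python
import hashlib


def digest(data):
    """Fingerprint of a piece of data, used to identify it in the log."""
    return hashlib.sha256(data).hexdigest()


def run_workflow(data, steps):
    """Apply each step in order and record what went in and what came out."""
    provenance = []
    for step in steps:
        result = step(data)
        provenance.append(
            {
                "step": step.__name__,
                "input": digest(data),
                "output": digest(result),
            }
        )
        data = result
    return data, provenance


def strip_whitespace(data):
    return data.strip()


def to_upper(data):
    return data.upper()


steps = [strip_whitespace, to_upper]
result, provenance = run_workflow(b"  irpinia  ", steps)

# Running the same steps on the same input reproduces the same log.
result_again, provenance_again = run_workflow(b"  irpinia  ", steps)
```

Each log entry's output fingerprint matches the next entry's input fingerprint, so the chain of transformations can be checked end to end, and identical reruns yield identical logs.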
Brokering System (e.g. VERCE framework)
BROKER GUI
METADATA catalogue
BROKER APIs
Full version includes
- Metadata Catalogue
- interfaces (GUI+API)
- System manager
- AAI system
- Workflow engine
External actors
- PID System
- Trusted repositories
- Registries
- Processing facilities
System manager
[Diagram: datasets and their APIs]
AAI system
Workflow Engine
Trusted repository
Trusted repository
Registry
PID system
HPC center
Q&A