An ecosystem to support FAIR data

Luiz Olavo Bonino - [email protected]

April 3rd 2017



FAIR DATA PRINCIPLES

Findable:

F1. (meta)data are assigned a globally unique and persistent identifier;

F2. data are described with rich metadata;

F3. metadata clearly and explicitly include the identifier of the data it describes;

F4. (meta)data are registered or indexed in a searchable resource;

Accessible:

A1. (meta)data are retrievable by their identifier using a standardized communications protocol;

A1.1. the protocol is open, free, and universally implementable;

A1.2. the protocol allows for an authentication and authorization procedure, where necessary;

A2. metadata are accessible, even when the data are no longer available;

Interoperable:

I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation;

I2. (meta)data use vocabularies that follow FAIR principles;

I3. (meta)data include qualified references to other (meta)data;

Reusable:

R1. (meta)data are richly described with a plurality of accurate and relevant attributes;

R1.1. (meta)data are released with a clear and accessible data usage license;

R1.2. (meta)data are associated with detailed provenance;

R1.3. (meta)data meet domain-relevant community standards;

FAIR DATA PRINCIPLES - METADATA

Findable:

F1. metadata are assigned a globally unique and persistent identifier;

F2. data are described with rich metadata;

F3. metadata clearly and explicitly include the identifier of the data it describes;

F4. (meta)data are registered or indexed in a searchable resource;

Accessible:

A1. metadata are retrievable by their identifier using a standardized communications protocol;

A1.1. the protocol is open, free, and universally implementable;

A1.2. the protocol allows for an authentication and authorization procedure, where necessary;

A2. metadata are accessible, even when the data are no longer available;

Interoperable:

I1. metadata use a formal, accessible, shared, and broadly applicable language for knowledge representation;

I2. metadata use vocabularies that follow FAIR principles;

I3. metadata include qualified references to other (meta)data;

Reusable:

R1. metadata are richly described with a plurality of accurate and relevant attributes;

R1.1. metadata are released with a clear and accessible data usage license;

R1.2. metadata are associated with detailed provenance;

R1.3. metadata meet domain-relevant community standards;
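Some of these metadata-level principles lend themselves to automated spot checks. The sketch below is illustrative only: it assumes a metadata record is a plain Python dict with hypothetical keys `identifier`, `data_identifier` and `license`, and it treats "is an absolute http(s) IRI" as a rough proxy for F1, F3 and R1.1. It cannot verify persistence, richness or provenance, so it is not a FAIR assessment.

```python
from urllib.parse import urlparse


def is_http_iri(value):
    """Rough proxy only: an absolute http(s) IRI. Persistence cannot be checked locally."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_metadata(record):
    """Return {principle: passed} for a few mechanically checkable principles."""
    return {
        "F1": is_http_iri(record.get("identifier")),       # metadata has its own identifier
        "F3": is_http_iri(record.get("data_identifier")),  # metadata points to the data
        "R1.1": is_http_iri(record.get("license")),        # license given as a resolvable IRI
    }


# Hypothetical example record; all IRIs are placeholders.
record = {
    "identifier": "https://example.org/fdp/dataset/1",
    "data_identifier": "https://example.org/data/1.ttl",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}
print(check_metadata(record))
```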

FAIR DATA PRINCIPLES - DATA

Findable:

F1. data are assigned a globally unique and persistent identifier;

F2. data are described with rich metadata;

F3. metadata clearly and explicitly include the identifier of the data it describes;

F4. (meta)data are registered or indexed in a searchable resource;

Accessible:

A1. data are retrievable by their identifier using a standardized communications protocol;

A1.1. the protocol is open, free, and universally implementable;

A1.2. the protocol allows for an authentication and authorization procedure, where necessary;

A2. metadata are accessible, even when the data are no longer available;

Interoperable:

I1. data use a formal, accessible, shared, and broadly applicable language for knowledge representation;

I2. data use vocabularies that follow FAIR principles;

I3. data include qualified references to other (meta)data;

Reusable:

R1. data are richly described with a plurality of accurate and relevant attributes;

R1.1. data are released with a clear and accessible data usage license;

R1.2. data are associated with detailed provenance;

R1.3. data meet domain-relevant community standards;

FAIR DATA PRINCIPLES - SUPPORTING INFRASTRUCTURE

Findable:

F1. (meta)data are assigned a globally unique and persistent identifier;

F2. data are described with rich metadata;

F3. metadata clearly and explicitly include the identifier of the data it describes;

F4. (meta)data are registered or indexed in a searchable resource;

Accessible:

A1. (meta)data are retrievable by their identifier using a standardized communications protocol;

A1.1. the protocol is open, free, and universally implementable;

A1.2. the protocol allows for an authentication and authorization procedure, where necessary;

A2. metadata are accessible, even when the data are no longer available;

Interoperable:

I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation;

I2. (meta)data use vocabularies that follow FAIR principles;

I3. (meta)data include qualified references to other (meta)data;

Reusable:

R1. (meta)data are richly described with a plurality of accurate and relevant attributes;

R1.1. (meta)data are released with a clear and accessible data usage license;

R1.2. (meta)data are associated with detailed provenance;

R1.3. (meta)data meet domain-relevant community standards;

FAIR transformation

Analysis transformation

FAIR DATA ECOSYSTEM (DTL)

Create, Publish, Annotate, Find

BYOD, FAIR Hackathon

FAIR DATA ECOSYSTEM (DTL)

Create, Publish, Annotate, Find

Data FAIRport, DTL

BRING YOUR OWN DATA - BYOD

■ Goals:

■ Learn how to make data linkable “hands-on” with experts

■ Create a “telling story” to demonstrate its use

■ Make FAIR Data at the source

■ Composition:

■ Data owners – specialists on given datasets

■ Data interoperability experts

■ Domain experts

Source: Marco Roos

BYOD: Domain Expert, Data Owner, FAIR Data Expert

BYOD Planning

Preparation, Execution, Follow-Up

BYOD Planning

Preparation

Identify Plan

Datasets

Attendees' profile

Output data access

Tentative dates

Tentative venue

Costs

Funds

Coordination

Set date

Invite attendees

Set venue

Catering

Lodging

Financial planning

Publicity

Working document

Preparatory calls

Data hosting

Software hosting

Documentation hosting

BYOD Planning

Execution

Day One

Introduction

Semantic Web, Linked Data and ontology intro

Use case intro

Division into workgroups

Working sessions

WWW/TTTALA

Day Two

Progress report

Working sessions

Group reports

WWW/TTTALA

Day Three

Data integration

Answer driving question

Explore data

Demo improvement

Final report

WWW/TTTALA

BYOD Planning

Follow-Up

D+15

Report difficulties

Clarifications

Next steps

D+45

Report difficulties

Clarifications

Next steps

Implementation

Expand FAIRification

Implement solution

Scale-up solution

Deploy

Based on OpenRefine

FAIRIFICATION PROCESS

■ Retrieve original data

■ Dataset identification and analysis

■ Definition of the semantic model

■ Data transformation

■ License assignment

■ Metadata definition

■ FAIR Data Resource (data, metadata, license) deployment

Diagram: FAIRIFICATION (submit, generate FAIR Data Resource; generic semantic model)
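The FAIRification process steps listed above can be read as an ordered pipeline in which each step enriches the state left by the previous one. A minimal sketch under stated assumptions: the input is a small CSV string, every function name is hypothetical, the "semantic model" is reduced to a column-to-predicate map under the placeholder namespace `http://example.org/vocab/`, and the CC BY 4.0 license is an arbitrary example choice.

```python
import csv
import io


# Hypothetical stand-ins for each FAIRification step; each takes and returns a state dict.
def retrieve_original_data(state):
    state["rows"] = list(csv.DictReader(io.StringIO(state["raw"])))
    return state


def identify_and_analyse(state):
    state["columns"] = list(state["rows"][0].keys()) if state["rows"] else []
    return state


def define_semantic_model(state):
    # Toy model: one predicate IRI per column (placeholder namespace).
    state["model"] = {c: f"http://example.org/vocab/{c}" for c in state["columns"]}
    return state


def transform_data(state):
    state["triples"] = [
        (f"http://example.org/record/{i}", state["model"][col], value)
        for i, row in enumerate(state["rows"])
        for col, value in row.items()
    ]
    return state


def assign_license(state):
    state["license"] = "https://creativecommons.org/licenses/by/4.0/"
    return state


def define_metadata(state):
    state["metadata"] = {"title": state["title"], "license": state["license"]}
    return state


PIPELINE = [retrieve_original_data, identify_and_analyse, define_semantic_model,
            transform_data, assign_license, define_metadata]


def fairify(raw, title):
    """Run the steps in order and deploy the result as a data/metadata/license bundle."""
    state = {"raw": raw, "title": title}
    for step in PIPELINE:
        state = step(state)
    return {k: state[k] for k in ("triples", "metadata", "license")}


resource = fairify("id,name\n1,aspirin\n", "Example drugs")
print(len(resource["triples"]))  # 2
```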

FAIRIFIER

■ Transform non-FAIR datasets into FAIR Data Resources (dataset in FAIR format, license and metadata)

■ Data munging

■ Semantic modeling

■ License definition

■ Metadata definition and extraction

■ Data publication
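The FAIRifier itself is based on OpenRefine, and the code below is not its implementation. It is a standard-library sketch of what the semantic modeling and data publication steps amount to for tabular data: map columns to predicate IRIs and serialise each row as N-Triples. The column mapping, subject base IRI and function names are hypothetical.

```python
import csv
import io


def nt_literal(value):
    """Serialise a string as an N-Triples plain literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def rows_to_ntriples(csv_text, subject_base, column_map, id_column):
    """Turn each CSV row into triples; column_map maps column name -> predicate IRI."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{subject_base}{row[id_column]}>"
        for column, predicate in column_map.items():
            if row.get(column):  # skip empty cells rather than emit empty literals
                lines.append(f"{subject} <{predicate}> {nt_literal(row[column])} .")
    return "\n".join(lines)


csv_text = "gene_id,symbol\n672,BRCA1\n"
column_map = {"symbol": "http://example.org/vocab/symbol"}  # hypothetical mapping
print(rows_to_ntriples(csv_text, "http://example.org/gene/", column_map, "gene_id"))
```

Keeping the mapping as data rather than code is what makes it possible to store it and reuse it later for datasets of the same type.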

FAIRIFIER


Diagram: FAIRIFICATION (submit, generate FAIR Data Resource; semantic model)

FAIRIFICATION - NEW DATASET TYPE

Diagram: submit, generate FAIR Data Resource; the Semantic Model & Non-FAIR to FAIR mapping is stored in the FAIR Data Model Registry

FAIRIFICATION - RECURRING DATASET TYPE

Diagram: submit, generate FAIR Data Resource; the FAIR Data Model Registry is queried and the stored Semantic Model & Non-FAIR to FAIR mapping is retrieved
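The difference between the new and the recurring dataset-type flows is whether the semantic model and the non-FAIR to FAIR mapping have to be built (and stored) or can simply be queried and retrieved. A minimal in-memory sketch, assuming the registry is keyed by a dataset-type string; `ModelRegistry`, `ModelEntry`, `resolve_model` and the example IRIs are hypothetical names, not the registry's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class ModelEntry:
    semantic_model: str  # IRI of the semantic model (hypothetical)
    mapping: dict        # non-FAIR column -> FAIR predicate IRI


@dataclass
class ModelRegistry:
    """In-memory stand-in for the FAIR Data Model Registry (hypothetical interface)."""
    entries: dict = field(default_factory=dict)

    def store(self, dataset_type, entry):
        self.entries[dataset_type] = entry

    def query(self, dataset_type):
        return self.entries.get(dataset_type)


def resolve_model(dataset_type, columns, registry, build_entry):
    """Retrieve a stored model for recurring types; build and store one for new types."""
    entry = registry.query(dataset_type)
    recurring = entry is not None
    if not recurring:
        entry = build_entry(columns)  # expensive, expert-driven step for a new type
        registry.store(dataset_type, entry)
    return entry, recurring


def build_entry(columns):
    # Stand-in for the modelling work done once for a new dataset type.
    return ModelEntry("http://example.org/model/v1",
                      {c: f"http://example.org/vocab/{c}" for c in columns})


registry = ModelRegistry()
_, first_is_recurring = resolve_model("example-type", ["chrom", "pos"], registry, build_entry)
_, second_is_recurring = resolve_model("example-type", ["chrom", "pos"], registry, build_entry)
print(first_is_recurring, second_is_recurring)  # False True
```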

FAIR DATA POINT

A particular class of FAIR Data System that provides access to datasets in a FAIR manner. The datasets can be external or internal to the FAIR Data Point, and the source data can be either a non-FAIR dataset or a FAIR Data Resource. If the source data are non-FAIR, the FAIR Data Point must perform the necessary FAIR transformations on the fly.
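The on-the-fly behaviour can be pictured as a dispatch at request time: FAIR Data Resources are served as they are, while non-FAIR sources carry a transformation that is applied when the data are requested. A minimal sketch; the `NonFairSource` class, the `serve` function and the dict representation of a FAIR Data Resource are assumptions for illustration.

```python
from typing import Callable, Union

FairResource = dict  # hypothetical: a bundle of FAIR data, metadata and license


class NonFairSource:
    """A raw, non-FAIR source plus the transformation that makes it FAIR (hypothetical)."""

    def __init__(self, raw: str, transform: Callable[[str], FairResource]):
        self.raw = raw
        self.transform = transform


def serve(source: Union[FairResource, NonFairSource]) -> FairResource:
    """Return FAIR content, transforming non-FAIR sources at request time."""
    if isinstance(source, NonFairSource):
        return source.transform(source.raw)  # on-the-fly FAIR transformation
    return source  # already a FAIR Data Resource: passed through unchanged


csv_source = NonFairSource(
    "a,b",
    lambda text: {"data": text.split(","),
                  "license": "https://creativecommons.org/licenses/by/4.0/"},
)
print(serve(csv_source)["data"])  # ['a', 'b']
```

A real FAIR Data Point would likely cache transformed output rather than recompute it per request; the sketch leaves that out.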

FAIR Data Point metadata

Title

Responsible institution(s)

Contact

FAIR API version

License

FAIR Data Point metadata

Catalog metadata

Title

Theme taxonomy

Issued date

DCAT

FAIR Data Point metadata

Catalog 1 metadata

Dataset metadata

Title

Publisher

License

Theme(s)

Version

DCAT/HCLS

FAIR Data Point metadata

Catalog 1 metadata

Dataset 1 metadata

Distribution metadata

Title

Media type

Download/access URL

License

DCAT

FAIR Data Point metadata

Catalog metadata

Dataset metadata

Distribution metadata

Data record metadata

Type

Domain

Range

RML

FAIR Data Point metadata

Catalog 2 metadata

Catalog 1 metadata

Dataset 1 metadata

Distribution 1.a metadata

Data record metadata

Distribution 1.b metadata

Dataset 2 metadata

Distribution 2.a metadata

Data record metadata

Distribution 2.b metadata

Dataset 3 metadata

Distribution 3.a metadata

Data record metadata

METADATA LAYERS

Data Repository (FDP)

(Dataset) Catalog(s)

Dataset

Distribution

Data Record
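These layers nest: a repository (the FDP) holds catalogs, catalogs hold datasets and datasets hold distributions. The sketch below walks that hierarchy and emits (subject, predicate, object) tuples using real DCAT and Dublin Core terms (`dcat:dataset`, `dcat:distribution`, `dcat:mediaType`, `dcat:downloadURL`, `dct:title`). The repository-to-catalog property is a placeholder (`http://example.org/fdp#hasCatalog`), the data record layer is omitted, and IRIs and literals are both kept as plain strings for brevity.

```python
from dataclasses import dataclass, field
from typing import List

DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
HAS_CATALOG = "http://example.org/fdp#hasCatalog"  # placeholder repository -> catalog link


@dataclass
class Distribution:
    iri: str
    title: str
    media_type: str
    download_url: str


@dataclass
class Dataset:
    iri: str
    title: str
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class Catalog:
    iri: str
    title: str
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class Repository:
    iri: str
    title: str
    catalogs: List[Catalog] = field(default_factory=list)


def triples(repo):
    """Walk the metadata layers top-down, yielding (subject, predicate, object)."""
    yield repo.iri, DCT + "title", repo.title
    for cat in repo.catalogs:
        yield repo.iri, HAS_CATALOG, cat.iri
        yield cat.iri, RDF_TYPE, DCAT + "Catalog"
        yield cat.iri, DCT + "title", cat.title
        for ds in cat.datasets:
            yield cat.iri, DCAT + "dataset", ds.iri
            yield ds.iri, RDF_TYPE, DCAT + "Dataset"
            yield ds.iri, DCT + "title", ds.title
            for dist in ds.distributions:
                yield ds.iri, DCAT + "distribution", dist.iri
                yield dist.iri, RDF_TYPE, DCAT + "Distribution"
                yield dist.iri, DCT + "title", dist.title
                yield dist.iri, DCAT + "mediaType", dist.media_type
                yield dist.iri, DCAT + "downloadURL", dist.download_url


repo = Repository("https://example.org/fdp", "Example FDP", [
    Catalog("https://example.org/fdp/catalog/1", "Catalog 1", [
        Dataset("https://example.org/fdp/dataset/1", "Dataset 1", [
            Distribution("https://example.org/fdp/distribution/1a", "Distribution 1.a",
                         "text/turtle", "https://example.org/files/1a.ttl"),
        ]),
    ]),
])
for triple in triples(repo):
    print(triple)
```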

FAIR DATA POINT - ARCHITECTURE

FAIR DATA POINT - GUI - FOR TECHIES

FAIR DATA POINT - GUI - FOR “NORMAL” PEOPLE

Repository metadata

Catalog metadata summary

FAIR DATA POINT - GUI

Repository metadata

Catalog metadata summary

Dataset/distribution metadata summary

Catalog metadata

FAIR DATA POINT - GUI - DATASET

FAIR DATA POINT

EXISTING DATA REPOSITORIES

EXTENDING EXISTING DATA REPOSITORIES


FAIR HACKATHON - GOALS

■ Align solutions with the FAIR Data Point specifications:

■ Metadata content

■ API

■ Data

FAIR HACKATHON OUTCOME

■ FAIR data model for solutions content;

■ Architecture of the required adjustments/extensions;

■ Technical specification of the adjustments/extensions;

■ Proof-of-concept of the adjusted solution.

FDP-COMPLIANT (BETA) SOLUTIONS

RDRF

Diagram: a metadata index retrieves metadata from the FDP-compliant solutions and offers search interfaces (GUI and API)
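The index side of this picture can be sketched as a harvester feeding an inverted index. This toy version assumes harvested metadata arrive as `(iri, dict)` pairs and answers keyword queries by intersecting token postings; the `MetadataIndex` class and its methods are hypothetical and say nothing about how the actual index is built.

```python
import re
from collections import defaultdict


class MetadataIndex:
    """Toy inverted index over metadata harvested from FAIR Data Points (hypothetical)."""

    def __init__(self):
        self.records = {}
        self.postings = defaultdict(set)  # token -> set of record IRIs

    def harvest(self, fdp_records):
        # fdp_records: iterable of (iri, metadata dict). Re-harvesting an IRI does not
        # remove stale postings; kept simple on purpose.
        for iri, metadata in fdp_records:
            self.records[iri] = metadata
            text = " ".join(str(v) for v in metadata.values())
            for token in re.findall(r"\w+", text.lower()):
                self.postings[token].add(iri)

    def search(self, query):
        """Return IRIs whose metadata contain every query token (AND semantics)."""
        tokens = re.findall(r"\w+", query.lower())
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return sorted(hits)


index = MetadataIndex()
index.harvest([
    ("https://example.org/fdp/dataset/1", {"title": "Rare disease registry", "theme": "biobank"}),
    ("https://example.org/fdp/dataset/2", {"title": "Gene expression atlas"}),
])
print(index.search("rare registry"))  # ['https://example.org/fdp/dataset/1']
```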

Open RDF Knowledge Annotator (ORKA)

■ Allow third-party annotation of existing knowledge bases

■ Capture the provenance of the annotator and of the original statement

DEMO: HTTP://DEV-VM.FAIR-DTLS.SURF-HOSTED.NL:8080/#/


ANNOTATIONS GO TO NANOPUB STORE
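A nanopublication packages an assertion together with its provenance and publication information in separate named graphs, which is what lets an annotation keep track of its annotator and of the statement it refers to. The sketch below builds one as TriG text using only the standard library. The `np:` terms come from the nanopublication schema (`http://www.nanopub.org/nschema#`); using `prov:wasAttributedTo` for the annotator and `prov:wasDerivedFrom` for the original statement is an assumed modelling choice, and all example IRIs (including the ORCID) are placeholders.

```python
from datetime import datetime, timezone

NP = "http://www.nanopub.org/nschema#"
PROV = "http://www.w3.org/ns/prov#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def iri(value):
    """Wrap an IRI in angle brackets for TriG output."""
    return f"<{value}>"


def annotation_nanopub(base, assertion_triples, annotator, original_statement, created=None):
    """Serialise one annotation as TriG: head, assertion, provenance and pubinfo graphs."""
    if created is None:
        created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    head, assertion = base + "#Head", base + "#assertion"
    provenance, pubinfo = base + "#provenance", base + "#pubinfo"
    lines = [
        f"{iri(head)} {{",
        f"  {iri(base)} {iri(RDF_TYPE)} {iri(NP + 'Nanopublication')} ;",
        f"    {iri(NP + 'hasAssertion')} {iri(assertion)} ;",
        f"    {iri(NP + 'hasProvenance')} {iri(provenance)} ;",
        f"    {iri(NP + 'hasPublicationInfo')} {iri(pubinfo)} .",
        "}",
        f"{iri(assertion)} {{",
    ]
    # The annotation itself; IRI-only triples for simplicity.
    lines += [f"  {iri(s)} {iri(p)} {iri(o)} ." for s, p, o in assertion_triples]
    lines += [
        "}",
        f"{iri(provenance)} {{",
        # Who made the annotation and which original statement it refers to.
        f"  {iri(assertion)} {iri(PROV + 'wasAttributedTo')} {iri(annotator)} ;",
        f"    {iri(PROV + 'wasDerivedFrom')} {iri(original_statement)} .",
        "}",
        f"{iri(pubinfo)} {{",
        f'  {iri(base)} {iri(DCT + "created")} "{created}"^^{iri(XSD_DATETIME)} .',
        "}",
    ]
    return "\n".join(lines)


trig = annotation_nanopub(
    "http://example.org/np/1",
    [("http://example.org/gene/672", "http://example.org/vocab/associatedWith",
      "http://example.org/disease/breast-cancer")],
    "https://orcid.org/0000-0000-0000-0000",
    "http://example.org/statement/42",
    created="2017-04-03T00:00:00Z",
)
print(trig)
```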

■ A particular class of FAIR Data System that provides support for data interoperability;

■ Supports publication of, and access to, FAIR data;

■ Fosters an ecosystem of applications and services;

■ Federated architecture: different FAIRports (and other FAIR Data Systems) are interconnectable;

■ Supports citation of datasets and data items;

■ Provides metrics for data usage and citation.


QUESTIONS?

Luiz Bonino

[email protected]

www.dtls.nl

METADATA LAYERS

Data Repository (FDP)

(Dataset) Catalog(s)

Dataset

Distribution

Data Record

DCAT/HCLS

RML

METADATA LAYERS’ EXTENSIONS - VOCABULARIES

Data Repository (FDP)

(Dataset) Catalog(s)

Dataset

Distribution

Data Record

METADATA LAYERS’ EXTENSIONS - VOCABULARIES

DCAT: dct:publisher

biosch:organization

{
  "@context": "http://schema.org",
  "@type": ["Organization", "NGO"],
  "name": "Dutch Techcentre for Life Sciences",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "Catharijnesingel 54",
    "postalCode": "3511 GC",
    "addressLocality": "Utrecht, The Netherlands"
  },
  "email": "info(at)dtls.nl",
  "telephone": "(+31) 85 30 30 711"
}
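To attach such an organisation description to a dataset, the organisation node can be embedded as the value of the dataset's `dct:publisher`. A minimal sketch assuming JSON-LD embedded node objects with a local schema.org context; the dataset IRI and title are placeholders, and no JSON-LD processor is used, so the output is only serialised, not expanded or validated.

```python
import json

# Publisher description using schema.org terms (values taken from the slide above).
publisher = {
    "@context": "http://schema.org",
    "@type": ["Organization", "NGO"],
    "name": "Dutch Techcentre for Life Sciences",
    "email": "info(at)dtls.nl",
}

# Hypothetical DCAT dataset description pointing to the publisher via dct:publisher.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@id": "https://example.org/fdp/dataset/1",
    "@type": "dcat:Dataset",
    "dct:title": "Example dataset",
    "dct:publisher": publisher,  # embedded node object with its own local context
}

document = json.dumps(dataset, indent=2)
print(document)
```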

METADATA LAYERS’ EXTENSIONS - VOCABULARIES

dbpedia: biobank

edam: biobank

METADATA LAYERS’ EXTENSIONS - EXTENDED MODEL

Data Repository (FDP)

(Dataset) Catalog(s)

Dataset

Distribution

Data Record

DatA Tag Suite (DATS)

PROV

Diagram (DATS): a Dataset is linked to a Publication through citations and primaryPublications