A NATIONAL STRATEGY TO SUPPORT THE TECHNICAL IMPLEMENTATION OF DATA MANAGEMENT


SURFSARA’S TECHNICAL IMPLEMENTATION

Data – where is the problem?


Supporting the Data Life Cycle

Stages of the data life cycle:

• Creating data

• Processing data

• Analysing data

• Preserving data

• Giving access to data

• Re-using data

Services shown: Data ingest service, Archive

The current challenge for users

Offer:

Different storage media

Different protocols to steer data transfers

Different clients and user interfaces

Different services

Accommodating various needs and use cases

Requires from users:

Proper book-keeping

Knowledge about storage infrastructure

When to use what?

Example protocols: WebDAV, GridFTP, ssh/scp

Goal:

One entrance point to data

Good interface to compute services

Configurable data policies per use case or community

Data Management Platform: iRODS enabling Data Orchestration

iRODS hosting

• Storage scale-out

• iRODS environment

Integration

• Persistent Identifiers

• Repositories

• Compute Workflows

• COmanage

Orchestration

• Metadata handling

• Configurable data policies

Other services: data management training, consultancy

Data Management Platform and related Services

[Figure: Data Management Platform. Storage back ends: on-site storage, off-site storage, object store and a further, still open option (?). User interfaces: CLI, APIs, plugins, apps and clients. Plugins and apps cover metadata handling, data repositories, cloud storage and notebooks.]

iRODS hosting @ SURFsara

User interacts with platform:

• Put data

• Get data

• Edit metadata

• Search through metadata

Rules:

• Communicate between storage and iRODS

• Retrieve status of data

• Communicate status to iRODS client

[Figure labels: NFS, SURF archive]

Plug SURFsara storage into any iRODS instance

SURF archive connection:

Tiered storage system is by default not transparent to iRODS
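The role of the rules listed above can be sketched in a few lines of Python. This is only an illustration of the idea, not SURFsara's rule set: the three status values and the `status_of` and `request_staging` callables are hypothetical stand-ins for whatever the storage system actually reports and offers.

```python
from enum import Enum


class DataStatus(Enum):
    """Hypothetical states of a file on a tiered storage system."""
    ONLINE = "online"      # a copy is on disk and can be read directly
    OFFLINE = "offline"    # only on a slower tier, must be staged first
    STAGING = "staging"    # already being brought back to disk


def handle_get(path, status_of, request_staging):
    """Decide what to tell the iRODS client when a user asks for `path`.

    `status_of` and `request_staging` are assumed callables that talk to
    the storage system; they are placeholders, not a real API.
    """
    status = status_of(path)
    if status is DataStatus.ONLINE:
        return "ready"
    if status is DataStatus.OFFLINE:
        request_staging(path)          # ask the storage system to stage the file
        return "staging requested"
    return "staging in progress"       # nothing to do but wait


# Usage with an in-memory stand-in for the storage system.
fake_storage = {"/zone/home/user/a.dat": DataStatus.ONLINE,
                "/zone/home/user/b.dat": DataStatus.OFFLINE}
staged = []


def fake_stage(path):
    fake_storage[path] = DataStatus.STAGING
    staged.append(path)


print(handle_get("/zone/home/user/a.dat", fake_storage.get, fake_stage))
print(handle_get("/zone/home/user/b.dat", fake_storage.get, fake_stage))
print(handle_get("/zone/home/user/b.dat", fake_storage.get, fake_stage))
```

The point of the sketch is that the client always receives a status message instead of a transfer that silently blocks while data is fetched from a slower tier.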


Targeted at:

Infrastructure administrators

Purpose:

Seamlessly scale out to SURFsara storage

Use SURFsara storage systems according to institutional or community-specific policies

Status: Moving to production.

Future plans: Host whole iRODS instances for communities and institutes

Service integration


Data publication

User workspace:

/zone/home/user/collection + metadata

Data steward space:

/zone/repository/collection + metadata + access for data steward

External publication platforms:

• Figshare

• Dataverse

• Zenodo

• SURF Digital Rep

• EUDAT B2SHARE

• EUDAT B2FIND (only metadata)

Python publication client:

• Create deposit

• Retrieve publication information, e.g. DOI

User: upload data, attach metadata, flag for publication

Data steward: data quality checks, metadata checks, automatic metadata extraction and adding, create draft and publish
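The division of work between user and data steward can be pictured as a small state machine. The following Python sketch is an assumption-laden illustration, not the actual publication client: the `Collection` class, the state names, the list of required keys and the example DOI are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical minimum set of metadata keys a steward might insist on.
REQUIRED_KEYS = ("TITLE", "ABSTRACT", "CREATOR")


@dataclass
class Collection:
    path: str
    metadata: dict = field(default_factory=dict)
    flagged: bool = False
    state: str = "workspace"   # workspace -> draft -> published


def flag_for_publication(collection):
    """User step: mark the collection as ready for the data steward."""
    collection.flagged = True


def steward_review(collection):
    """Data steward step: run the checks, create a draft if all pass.

    Returns the list of problems found (empty when the draft was created).
    """
    problems = []
    if not collection.flagged:
        problems.append("not flagged for publication")
    for key in REQUIRED_KEYS:
        if not collection.metadata.get(key):
            problems.append(f"missing metadata: {key}")
    if not problems:
        collection.state = "draft"
    return problems


def publish(collection, doi):
    """Final step: store the identifier returned by the external platform."""
    if collection.state != "draft":
        raise ValueError("only a reviewed draft can be published")
    collection.metadata["DOI"] = doi
    collection.state = "published"


# Usage: the DOI below is a made-up placeholder.
coll = Collection("/zone/home/user/collection",
                  {"TITLE": "Example", "ABSTRACT": "Text",
                   "CREATOR": "Surname, First name"})
flag_for_publication(coll)
if not steward_review(coll):
    publish(coll, "10.1234/example")
print(coll.state)
```

Keeping the identifier in the collection metadata mirrors the "retrieve publication information, e.g. DOI" step of the client.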

Example metadata mapping: Dataverse

iRODS key | Value | Dataverse access
TITLE | String | 0 title
ABSTRACT | String | 7 dsDescription
PID/TICKET for collection | iRODS Ticket or PID to iRODS data | 4 otherId
TECHNICALINFO | {"irods_host": "", "irods_port": 1247, "irods_user_name": "anonymous", "irods_zone_name": ""}; iget/ils -t <ticket> <path> | 27 dataSources
OTHER | http endpoint for iRODS, e.g. Metalnx | 3 alternativeURL
CREATOR | Surname, First name | 5 author
Data PIDs | http://hdl.handle.net/<PID> | 29 otherReferences
Data TICKETs | String, <ticket>, <path> | 29 otherReferences
SUBJECT | controlled vocabulary | 8 subject
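A mapping like the one in this table can be held as a plain lookup structure. The Python sketch below copies a subset of the rows (number and field name as listed under "Dataverse access"); the function name, the shortened key "PID/TICKET" and the return format are assumptions made for illustration, and nothing here calls the Dataverse API.

```python
# iRODS key -> (number, Dataverse field), copied from the table.
# "PID/TICKET" is a shortened, hypothetical spelling of the collection-level key.
IRODS_TO_DATAVERSE = {
    "TITLE": (0, "title"),
    "ABSTRACT": (7, "dsDescription"),
    "PID/TICKET": (4, "otherId"),
    "TECHNICALINFO": (27, "dataSources"),
    "OTHER": (3, "alternativeURL"),
    "CREATOR": (5, "author"),
    "SUBJECT": (8, "subject"),
}


def map_metadata(irods_metadata):
    """Translate iRODS key/value pairs into Dataverse field names.

    Returns a tuple (mapped, unmapped): `mapped` is a dict from Dataverse
    field name to a list of values, `unmapped` lists the iRODS keys for
    which the table defines no target field.
    """
    mapped = {}
    unmapped = []
    for key, value in irods_metadata.items():
        if key in IRODS_TO_DATAVERSE:
            _, dataverse_field = IRODS_TO_DATAVERSE[key]
            mapped.setdefault(dataverse_field, []).append(value)
        else:
            unmapped.append(key)
    return mapped, unmapped


# Usage with made-up values.
mapped, unmapped = map_metadata({
    "TITLE": "Example collection",
    "CREATOR": "Surname, First name",
    "PROJECT": "not in the table",
})
print(mapped)
print(unmapped)
```

Reporting unmapped keys separately gives a data steward a simple hook for the metadata checks that precede publication.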

Data publication


Targeted at:

Institutional and community Data stewards/Data managers

Purpose:

Implement cross-service data policies

Status:

Proof of concept

Use case from Maastricht UB/Maastricht data hub

Compute integration

[Figure: Data Management Platform (CLI, APIs), on-site storage, HPC data node and HPC cluster. Labels: "iRODS sends job", "Triggers workflow".]

Data node needs to be managed by iRODS as resource server

Future work

The iRODS Consortium is working on a Lustre plugin

Test plugin with the Dutch national HPC cluster

Compute integration


PRACE and ELIXIR training


ORGANISING DATA MANAGEMENT ON NATIONAL LEVEL

SURF: Who are we?

Knowledge sharing

Shared digital infrastructures and services

ICT market place for acquiring software, cloud services and digital content


Not-for-profit cooperative for ICT in Dutch education and research

Slide taken from Frank Heere, iRODS UGM, 2017, Utrecht

Members:

14 Research universities

8 Academic Medical Centers

34 Universities of applied sciences

6 Research institutions

Many other research and education institutes

SURF: Companies

SURFmarket: negotiates with ICT providers on behalf of institutes connected to SURF

SURFnet: supports, develops and operates advanced, reliable and interconnected ICT infrastructure networks

SURFsara: the Netherlands' national supercomputing centre; supplies high-performance computing services, data storage and visualisations

Hub to EU infrastructures

Slide taken from Frank Heere, iRODS UGM, 2017, Utrecht

RDM-TEC

Universities and research institutes are scouting for data management solutions

Some already started with own implementations

Own implementation costs a lot of effort

Research Data Management Technology Expertise Consortium (RDM-TEC)

Aim of RDM-TEC:

Sustainable Consortium

Defines functional blocks

Manages technical RDM infrastructure services and software

Provides technical experts for on-site implementation

Improving orchestration and speed of action across all members

Enabling cross institute data management

RDM-TEC Vision and stakeholders

Professionalise data management for research

iRODS shows potential to orchestrate data in the whole data life cycle

Stakeholders:

10 institutions which (will) all use iRODS

Three software producers

• Utrecht University

• Data centre Groningen

• SURFsara


RDM-TEC - Activities

Facilitating the exchange of knowledge about technical developments at institutes, in the Netherlands and in the EU

Developing software components and making them available to the benefit of the community

Making technical (architectural) choices

Provisioning resources (people) on location for SURF members

Realising and providing infrastructure services, such as hosting or storage facilities

Data management services:

Hylke Koers

AAI: Fatih Turkmen, Gerben Venekamp, Harry Kodden

Special thanks to Saskia van Eeuwijk, Mark Cole and Frank Heere

DM: Arthur Newton, Christine Staiger, Sharif Islam, Stefan Wolfsheimer

Arthur Newton, Christine Staiger (SURFsara)

Arthur.newton(at)surfsara.nl, Christine.staiger(at)surfsara.nl