A NATIONAL STRATEGY TO SUPPORT THE TECHNICAL ...
Supporting the Data Life Cycle
Data life cycle stages:
• Creating data
• Processing data
• Analysing data
• Preserving data
• Giving access to data
• Re-using data
Services shown: data ingest service, archive
The current challenge for users
On offer:
• Different storage media
• Different protocols to steer data transfers (e.g. WebDAV, GridFTP, ssh/scp)
• Different clients and user interfaces
• Different services
This accommodates various needs and use cases, but requires from users:
• Proper book-keeping
• Knowledge about the storage infrastructure
• Knowing when to use what
Goal:
One entrance point to data
Good interface to compute services
Configurable data policies per use case or community
Data Management Platform: iRODS enabling Data Orchestration
iRODS hosting
• Storage scale-out
• iRODS environment
Integration
• Persistent Identifiers
• Repositories
• Compute Workflows
• COmanage
Orchestration
• Metadata handling
• Configurable data policies
Other services: Data management Training, Consultancy
Data Management Platform and related Services
[Diagram: DATA MANAGEMENT PLATFORM. Storage: on-site storage, off-site storage, object store, "?". Access: CLI, APIs, plugins, apps. User interfaces and clients: metadata handling, data repositories, cloud storage, notebooks.]
iRODS hosting @ SURFsara
Plug SURFsara storage into any iRODS instance
SURF archive connection:
Tiered storage system is by default not transparent to iRODS
[Diagram labels: DATA MANAGEMENT PLATFORM, NFS, SURF archive]
User interacts with platform:
• Put data
• Get data
• Edit metadata
• Search through metadata
Rules:
• Communicate between storage and iRODS
• Retrieve status of data
• Communicate status to iRODS client
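The role of the rules on this slide can be pictured as a small status check that sits between the iRODS client and the tiered archive. The Python sketch below is only an illustration of that idea, not the rule code used at SURFsara: the state names `online` and `offline` and the caller-supplied functions `query_status` and `request_staging` are assumptions introduced here, and the staging step itself is not mentioned on the slide.

```python
# Hypothetical state names; the real archive may report different ones.
ONLINE = "online"
OFFLINE = "offline"


def report_status(path, query_status, request_staging=None):
    """Retrieve the storage status of `path` and build a message for the client.

    `query_status` and `request_staging` are stand-ins for whatever the
    storage system offers; both are supplied by the caller.
    """
    status = query_status(path)
    if status == ONLINE:
        return {"path": path, "status": status, "message": "data can be read now"}
    # Data is not directly readable: optionally ask the storage to stage it.
    if request_staging is not None:
        request_staging(path)
        message = "staging requested, retry later"
    else:
        message = "data is not online"
    return {"path": path, "status": status, "message": message}
```

Passing the storage calls in as functions keeps the sketch independent of any particular archive interface; in a real deployment this logic would live in iRODS rules.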
iRODS hosting @ SURFsara
Targeted at:
Infrastructure administrators
Purpose:
Seamlessly scale out to SURFsara storage
Use SURFsara storage systems according to institutional or community-specific policies
Moving to production.
Future plans: Host whole iRODS instances for communities and institutes
Service integration
[Diagram: Data Management Platform overview, repeated from the earlier slide "Data Management Platform and related Services"]
Data publication
User workspace: /zone/home/user/collection + metadata
Data steward space: /zone/repository/collection + metadata + access for data steward
Python publication client: create deposit; retrieve publication information, e.g. DOI
External publication platforms:
• Figshare
• Dataverse
• Zenodo
• SURF Digital Rep
• EUDAT B2SHARE
• EUDAT B2FIND (only metadata)
User: upload data, attach metadata, flag for publication
Data steward: data quality checks, metadata checks, automatic metadata extraction and adding, create draft and publish
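The hand-over from user to data steward described on this slide can be read as a small state machine. The sketch below is a hedged illustration only, not the actual publication client: the class `Collection`, the state names, the `REQUIRED_KEYS` check and the caller-supplied `create_deposit` function are all hypothetical names introduced here. It also assumes that a checked collection moves from the user workspace path to the repository path shown on the slide.

```python
from dataclasses import dataclass, field

REQUIRED_KEYS = ("TITLE", "CREATOR")  # hypothetical minimum for the metadata check


@dataclass
class Collection:
    path: str                      # e.g. /zone/home/user/collection
    metadata: dict = field(default_factory=dict)
    state: str = "uploaded"        # uploaded -> flagged -> checked -> published
    publication_id: str = ""       # filled in after publishing, e.g. a DOI


def flag_for_publication(col):
    """User step: mark an uploaded collection as ready for review."""
    if col.state != "uploaded":
        raise ValueError(f"cannot flag a collection in state {col.state!r}")
    col.state = "flagged"


def steward_check(col):
    """Data steward step: check metadata, then move to the steward space."""
    if col.state != "flagged":
        raise ValueError("collection has not been flagged for publication")
    missing = [k for k in REQUIRED_KEYS if not col.metadata.get(k)]
    if missing:
        raise ValueError(f"missing metadata: {', '.join(missing)}")
    # /<zone>/home/<user>/<collection...> -> /<zone>/repository/<collection...>
    zone, _home, _user, *rest = col.path.strip("/").split("/")
    col.path = "/" + "/".join([zone, "repository", *rest])
    col.state = "checked"


def publish(col, create_deposit):
    """Create the deposit and store the identifier the platform returns."""
    if col.state != "checked":
        raise ValueError("collection has not passed the steward checks")
    col.publication_id = create_deposit(col.metadata)
    col.state = "published"
```

Keeping `create_deposit` as a parameter means the same flow could, in principle, be pointed at any of the external platforms listed above without changing the checks.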
Example metadata mapping: Dataverse
iRODS key | Value | Dataverse access
TITLE | String | 0 title
ABSTRACT | String | 7 dsDescription
PID/TICKET for collection | iRODS ticket or PID to iRODS data | 4 otherId
TECHNICALINFO | {"irods_host": "", "irods_port": 1247, "irods_user_name": "anonymous", "irods_zone_name": ""}; iget/ils -t <ticket> <path> | 27 dataSources
OTHER | http endpoint for iRODS, e.g. Metalnx | 3 alternativeURL
CREATOR | Surname, First name | 5 author
Data PIDs | http://hdl.handle.net/<PID> | 29 otherReferences
Data TICKETs | String, <ticket>, <path> | 29 otherReferences
SUBJECT | controlled vocabulary | 8 subject
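The mapping on this slide can be written down as a lookup table. The sketch below is an illustration under stated assumptions, not the publication client itself: the function names `to_dataverse_fields` and `technical_info` are hypothetical, the key `PID/TICKET` abbreviates the slide's "PID/TICKET for collection", and the numbers are carried along without interpreting what they index in Dataverse.

```python
import json

# iRODS key -> (number from the slide, Dataverse field name).
# "PID/TICKET" is listed on the slide as "PID/TICKET for collection".
IRODS_TO_DATAVERSE = {
    "TITLE": (0, "title"),
    "ABSTRACT": (7, "dsDescription"),
    "PID/TICKET": (4, "otherId"),
    "TECHNICALINFO": (27, "dataSources"),
    "OTHER": (3, "alternativeURL"),
    "CREATOR": (5, "author"),
    "Data PIDs": (29, "otherReferences"),
    "Data TICKETs": (29, "otherReferences"),
    "SUBJECT": (8, "subject"),
}


def to_dataverse_fields(irods_metadata):
    """Group iRODS (key, value) pairs by their Dataverse field name."""
    fields = {}
    for key, value in irods_metadata:
        if key not in IRODS_TO_DATAVERSE:
            continue  # unmapped keys are skipped in this sketch
        _number, field_name = IRODS_TO_DATAVERSE[key]
        fields.setdefault(field_name, []).append(value)
    return fields


def technical_info(host, zone, port=1247, user="anonymous"):
    """Build the TECHNICALINFO JSON string with the keys shown on the slide."""
    return json.dumps({
        "irods_host": host,
        "irods_port": port,
        "irods_user_name": user,
        "irods_zone_name": zone,
    })
```

Because both "Data PIDs" and "Data TICKETs" map to `otherReferences`, the sketch collects values in a list per Dataverse field rather than overwriting them.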
Data publication
Targeted at:
Institutional and community Data stewards/Data managers
Purpose:
Implement cross-service data policies
Status:
Proof of concept
Use case from Maastricht UB/Maastricht data hub
Compute integration
[Diagram labels: DATA MANAGEMENT PLATFORM (CLI, APIs), on-site storage, HPC data node, HPC cluster]
• iRODS sends job
• Triggers workflow
The data node needs to be managed by iRODS as a resource server
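The flow on this slide, in which iRODS sends a job that triggers a workflow on the HPC cluster, follows a common trigger pattern. The sketch below shows that pattern in plain Python and makes no claim about how the integration is actually built: the collection `WATCHED_COLLECTION`, the handler `on_data_put`, the workflow name and the caller-supplied `submit_job` function are all hypothetical.

```python
import posixpath

WATCHED_COLLECTION = "/zone/home/user/to_process"  # hypothetical collection


def on_data_put(logical_path, submit_job):
    """Call `submit_job` when a data object lands in the watched collection.

    Returns whatever `submit_job` returns, or None if the path is ignored.
    """
    if posixpath.dirname(logical_path) != WATCHED_COLLECTION:
        return None
    # Minimal job description; a real one would carry scheduler options.
    job = {"input": logical_path, "workflow": "example-workflow"}
    return submit_job(job)
```

In a real setup the trigger would be an iRODS rule and `submit_job` would hand the job description to the cluster's scheduler.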
Future work
The iRODS Consortium is working on a Lustre plugin
Test the plugin with the Dutch national HPC cluster
Compute integration
[Diagram: Data Management Platform overview, repeated from the earlier slide "Data Management Platform and related Services"]
PRACE and ELIXIR training
SURF: Who are we?
Knowledge sharing
Shared digital infrastructures and services
ICT market place for acquiring software, cloud services and digital content
Not-for-profit cooperative for ICT in Dutch education and research
Slide taken from Frank Heere, iRODS UGM, 2017, Utrecht
Members:
14 Research universities
8 Academic Medical Centers
34 Universities of applied sciences
6 Research institutions
Many other research and education institutes
SURF: Companies
SURFmarket: negotiates with ICT providers on behalf of institutes connected to SURF
SURFnet: supports, develops and operates advanced, reliable and interconnected ICT infrastructure networks
SURFsara: the Netherlands' national supercomputing centre; supplies high-performance computing services, data storage and visualizations
Hub to EU infrastructures
Slide taken from Frank Heere, iRODS UGM, 2017, Utrecht
RDM-TEC
Universities and research institutes are scouting for data management solutions
Some have already started their own implementations
Building an own implementation costs a lot of effort
Research Data Management Technology Expertise Consortium (RDM-TEC)
Aim of RDM-TEC:
Sustainable Consortium
Defines functional blocks
Manages technical RDM infrastructure services and software
Provides technical experts for on-site implementation
Improving orchestration and speed of action across all members
Enabling cross institute data management
RDM-TEC vision and stakeholders
Professionalise data management for research
iRODS shows potential to orchestrate data in the whole data life cycle
Stakeholders:
10 institutions which (will) all use iRODS
Three software producers
• Utrecht University
• Data centre Groningen
• SURFsara
RDM-TEC - Activities
Facilitating the exchange of knowledge about technical developments at institutes, in the Netherlands and in the EU
Developing software components and making them available for the benefit of the community
Making technical (architectural) choices
Provisioning resources (people) on location for SURF members
Realising and providing infrastructure services, such as hosting or storage facilities
Data management services:
Hylke Koers
AAI: Fatih Turkmen, Gerben Venekamp, Harry Kodden
Special thanks to Saskia van Eeeuwijk, Mark Cole and Frank Heere
DM: Arthur Newton, Christine Staiger, Sharif Islam, Stefan Wolfsheimer
Arthur Newton, Christine Staiger (SURFsara)
Arthur.newton(at)surfsara.nl, Christine.staiger(at)surfsara.nl