31 October 2013 DIGITAL PRESERVATION OF CULTURAL DATA WORKSHOP, 2nd EUDAT conference
Maciej Brzeźniak, Norbert Meyer, PSNC
Damien Lecarpentier, CSC
Using EUDAT services to replicate, store, share, and find cultural heritage data
…in PSNC… and beyond…
31 October 2013 Event 2
Plan and purpose of presentation
Summarize the work done and planned in the EUDAT and DCH-RP projects
• How can the DCH sector use e-Infrastructures?
Understand how selected EUDAT services can be used in the DCH domain
• How can the EUDAT CDI architecture be integrated with domain-specific services?
• Overview of current services (Simple Store, Safe Replication)
• Presentation of integration work in DCH-RP
• Discussion on possible future extensions / services
Some aspects of EUDAT
EUDAT – European Data Infrastructure • Vision: to support a Collaborative Data Infrastructure
• Aims:
• Provide a sustainable platform of technologies, tools, services driven by user needs
• Engage users in defining/shaping a platform for shared services
• Support data-intensive, multi-disciplinary research: • Humanities and Social Science: CLARIN
• But also: earth science (ENES, Earth system modelling; EPOS: European Plate Observing System), ecology (LifeWatch), Virtual Physiological Human (VPH)
• Deliver the common low-level services required to provide interoperability of, and trust in, data
• Ensure that the data infrastructure is robust and scalable (able to address the “data tsunami”)
• Build community/domain-specific services on top of the common services with participation of users
CDI layers: common vs community-specific services
CDI layers vs tools and services in DCH: integration challenges…
[Diagram: layered architecture]
High-level services:
• dLibra, dArceo, dLab
• EUDAT Simple Store
• eCulture Science Gateway
• Invenio
• Another community solution
Low-level services:
• Cloud storage: S3, CDMI
• Grid storage: FTS, GridFTP
• Local LTS: long-term storage
• EUDAT storage: Safe Replication
DCH-RP <-> EUDAT proof of concept: Do they go together?
EUDAT: services for DCH domain
Services covered in the presentation
• Simple Store service: • enables researchers and scientists to upload, store and share data
• designed for the “long tail of data”
• Safe Replication: • Allow communities to replicate data to selected EUDAT data centers
• Automated replication (iRODS), PID registration (EPIC)
• Data Staging: • Staging data from user community premises/systems (iRODS)
• to computing systems, e.g. PRACE’s HPC centres (GridFTP, FTS)
• Metadata service: • Joint metadata domain for all EUDAT data centres
• Searchable catalogue covering all data stored within EUDAT
• AAI: • Provide a solution for a working AAI system in a federated
EUDAT for DCH Simple Storage Service
Simple Storage Service: • Addresses the needs of small user groups and individual users
• Provides a solution for “long tail data”, often stored on laptops and departmental servers
Functionality: • allows registered users to upload typical data objects into the EUDAT store
• enables users to share such objects and collections with other researchers
• allows use of other EUDAT services:
• Safe Replication
• PIDs
• etc.
• May be integrated with AAI
More: http://www.eudat.eu/simple-store
EUDAT for DCH Simple Storage Service
Simple Storage Service internals: • Also referred to as Researcher Data Store
• Based on Invenio:
• Developed by CERN
• http://invenio-software.org/
• Storage backend:
• Disk
• iRODS (EUDAT safe replication)
• Front-end:
• A new submission portal for Invenio was developed
EUDAT for DCH Safe Replication Service
Safe Replication Service: • Allows communities to replicate data to EUDAT data centres
• Can be integrated with portals and community tools
Functionality: • Ingested files/data are:
• automatically replicated to multiple data centres
• assigned registered persistent identifiers (PIDs based on EPIC)
• Various interfaces supported:
• iRODS: icommands, API
• WebDAV, GridFTP
• Can replicate data on top of various data stores:
• Disks, tapes, HSMs
• Clouds (e.g. S3)
More: http://www.eudat.eu/safe-replication
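The Safe Replication flow described above (ingest, replicate to several data centres, register a PID) can be illustrated with a minimal local sketch. It is not how EUDAT implements the service: in the real system replication is handled by iRODS and PIDs are registered with EPIC. Here the data stores are plain directories, and the `ingest` function, the `registry` dictionary and the `demo-prefix/...` identifier format are hypothetical names introduced only for illustration.

```python
import hashlib
import shutil
import tempfile
import uuid
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest(source: Path, data_stores: list[Path], registry: dict) -> str:
    """Copy `source` into every data store, verify each copy, record a PID.

    Hypothetical stand-in: real replication would be done by iRODS and the
    PID would be registered with EPIC, not generated locally.
    """
    checksum = sha256_of(source)
    replicas = []
    for store in data_stores:
        store.mkdir(parents=True, exist_ok=True)
        target = store / source.name
        shutil.copy2(source, target)
        # A replica only counts if its checksum matches the source.
        if sha256_of(target) != checksum:
            raise IOError(f"replica at {target} does not match the source")
        replicas.append(str(target))
    pid = f"demo-prefix/{uuid.uuid4()}"  # made-up identifier format
    registry[pid] = {"checksum": checksum, "replicas": replicas}
    return pid


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        scan = root / "scan_0001.tif"
        scan.write_bytes(b"example payload")
        registry = {}
        pid = ingest(scan, [root / "store_psnc", root / "store_epcc"], registry)
        print(pid, registry[pid]["replicas"])
```

Recording the checksum next to the PID means any replica can later be checked against the registered value, which is the property a preservation workflow usually cares about.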
EUDAT: Simple Storage Service and Safe Replication Service
Integration option 1: EUDAT Simple Store + EUDAT Safe Replication
[Diagram: Simple Store Service; Safe Replication between Data Store 1 and Data Store 2]
• Easy deposit & access to the data
• Transparent replication of data, persistent storage
• Support for sharing
Integration option 2: Community portal/solution + EUDAT Safe Replication
[Diagram: Community service; Safe Replication between Data Store 1 and Data Store 2]
• Storage & access typical for the community
• Transparent replication of data
• Support for sharing
Option 1 in practice: EUDAT Simple Store + EUDAT Safe Replication (DCH-RP <-> EUDAT PoC 1)
[Diagram: memory institutions; EUDAT Simple Store (Simple Store Service); EUDAT Safe Replication between Data Store 1 @PSNC and Data Store 2 @EPCC]
• EUDAT Simple Store: easy storage & access to the data; provided by PSNC
• EUDAT Safe Replication: transparent replication of data; provided by PSNC (Poznan) & EPCC (Edinburgh)
• Support for sharing
Option 2 in practice: Community CMS solution + EUDAT Safe Replication (DCH-RP <-> EUDAT PoC 2)
[Diagram: memory institutions; community service / EUDAT Simple Store; EUDAT Safe Replication between Data Store 1 @PSNC and Data Store 2 @EPCC]
• Community service: feature-full domain-specific tool with a large user base; provided by PSNC or EUDAT partners
• EUDAT Safe Replication: transparent replication of data; provided by PSNC (Poznan) & EPCC (Edinburgh)
• Support for sharing
Domain-specific solutions dLibra/dArceo/dLab
Basic statistics:
• ± 100 digital libraries
• several hundred memory institutions
• over 1.1 million digital objects
• 98% of content delivered via services based on dLibra software (http://dlibra.psnc.pl/), which uses Solr for content and metadata indexing and searching
http://dlab.psnc.pl/
EUDAT <-> DCH-RP PoC summary
We investigate two possible ways of offering data preservation services for DCH
• Top-down solution for “citizen scientists” / “citizen DCH people”
• Well-established solution backed by EUDAT Safe Replication
We exploit the layered EUDAT CDI architecture
• In theory: It enables integration with existing solutions
• We try to understand how it works in practice
Planned extensions of Simple Store: discussion trigger 1
Development roadmap
• Premium service: • Customisation of layout and metadata for community needs
• increased storage capacity
• increased support
• increased bandwidth
• Premium vs regular service: • Providing premium service requires enrolling with EUDAT
• Regular services to be offered to “citizen scientists/users” (no close relationship needed)
• AAI integration • On the roadmap
Possible extensions of Simple Store: discussion trigger 2
From yesterday's discussion about Simple Store:
• Thousands of files?
• Upload reliability / robustness?
=> Batch upload
• API to be developed
» Enables integration with existing systems
» Tools can be offered by EUDAT to support batch upload
• Collections upload?
=> Support for meta-data extraction
• Implemented client-side?
• E.g. based on pre-prepared collections (e.g. DIPs)
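The batch upload idea raised above (thousands of files, with reliability as the main concern) could look roughly like the following sketch. The Simple Store API was still to be developed at the time of this talk, so no real endpoint is used: the `send` callback is a hypothetical placeholder for whatever transport that API would offer, and `batch_upload`, `max_attempts` and `base_delay` are names invented for this example.

```python
import time
from pathlib import Path
from typing import Callable, Iterable


def batch_upload(
    files: Iterable[Path],
    send: Callable[[Path], None],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> dict:
    """Upload each file with retries; report which files succeeded or failed.

    `send` is a hypothetical transport callback (it would wrap a future
    Simple Store API); it must raise an exception when an upload fails.
    """
    report = {"uploaded": [], "failed": []}
    for path in files:
        for attempt in range(1, max_attempts + 1):
            try:
                send(path)
                report["uploaded"].append(path)
                break
            except Exception:
                if attempt == max_attempts:
                    # Give up on this file but keep going with the rest.
                    report["failed"].append(path)
                else:
                    # Exponential back-off before the next attempt.
                    time.sleep(base_delay * 2 ** (attempt - 1))
    return report


if __name__ == "__main__":
    calls = {}

    def flaky_send(path: Path) -> None:
        # Simulated transport: every file fails once, then succeeds.
        calls[path] = calls.get(path, 0) + 1
        if calls[path] == 1:
            raise ConnectionError("simulated network failure")

    result = batch_upload([Path("a.tif"), Path("b.tif")], flaky_send, base_delay=0.0)
    print(len(result["uploaded"]), "uploaded,", len(result["failed"]), "failed")
```

Returning a report instead of stopping at the first failure lets a client retry only the failed files later, which matters when a collection contains thousands of objects.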
Possible extensions of Simple Store: implementation as a user-side tool?
[Diagram: a client-side application handles data and meta-data; it talks to the Simple Store API through a data upload API (data upload control) and a meta-data exchange API (meta-data extraction and upload)]
Highlights:
• Automation: ease of use, reliability, performance
• Functionality: data upload, meta-data extraction
Challenges:
• Portability
• Universality: standards need to be identified
• Sustainability
Discussion needed!
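As a rough illustration of the client-side meta-data extraction discussed for the user-side tool, the sketch below walks a prepared collection directory and derives basic technical meta-data from each file. It uses only the Python standard library; the record fields (`name`, `size_bytes`, `mime_type`, `sha256`) and the function names are assumptions made for this example, since the standards to be supported were still to be identified.

```python
import hashlib
import json
import mimetypes
import tempfile
from pathlib import Path


def extract_metadata(path: Path) -> dict:
    """Collect technical meta-data that can be derived from the file itself.

    The whole file is read into memory for hashing, which is fine for a
    sketch but would need chunked reading for large objects.
    """
    mime_type, _ = mimetypes.guess_type(path.name)
    return {
        "name": path.name,
        "size_bytes": path.stat().st_size,
        "mime_type": mime_type or "application/octet-stream",
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
    }


def describe_collection(directory: Path) -> list[dict]:
    """Return one meta-data record per regular file, in a stable order."""
    return [extract_metadata(p) for p in sorted(directory.rglob("*")) if p.is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        collection = Path(tmp)
        (collection / "page_001.txt").write_text("sample page")
        (collection / "cover.png").write_bytes(b"\x89PNG")
        print(json.dumps(describe_collection(collection), indent=2))
```

Descriptive meta-data (titles, creators, rights) cannot be derived this way and would still have to come from the collection itself or from the user.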
Extensions of EUDAT services: discussion summary
Message:
• EUDAT infrastructure and services are layered and modular • This enables integration
• Further extensions are possible • Users are welcome to influence them
• We want to make sure that we recognise and support the necessary standards
• Technical details / organisation / etc. to be discussed
THANK YOU!