EUDAT and Big Data in Science
Wolfgang Gentzsch, Advisor, EUDAT
HPCC 2013 Newport RI, 26-28 March 2013
Data trends
Increasing complexity and variety
Exponential growth: Gigabytes, Terabytes, Petabytes, Exabytes, Zettabytes
• Where to store it?
• How to find it?
• How to make the most of it?
• How to ensure interoperability?
If there are hundreds of Research Infrastructures, how many different data management systems can we sustain?
The EUDAT Case
Collaborative Data Infrastructure - A framework for the future? -
Data Generators and Users
User functionalities, data capture & transfer, virtual research environments
Community Support Services
Data discovery & navigation, workflow generation, annotation, interpretability
Common Data Services
Persistent storage, identification, authenticity, workflow execution, mining
Across all layers: Trust, Data Curation
Data Centers and Communities
Five research communities on board
• EPOS: European Plate Observing System
• CLARIN: Common Language Resources and Technology Infrastructure
• ENES: Service for Climate Modelling in Europe
• LifeWatch: Biodiversity Data and Observatories
• VPH: The Virtual Physiological Human
• All share common challenges:
– Reference models and architectures
– Persistent data identifiers
– Metadata management
– Distributed data sources
– Data interoperability
Communities ↔ Data Centers
Building Blocks of the CDI
• Data Staging: dynamic replication to HPC workspace for processing
• Safe Replication: data curation and access optimization
• Simple Store: researcher data store (simple upload, share and access)
• AAI: network of trust among authentication and authorization actors
• Metadata Catalogue: aggregated EUDAT metadata domain; data inventory
• EUDAT Portal: integrated APIs and harmonized access to EUDAT facilities
SAFE_REPLICATION@EUDAT
Allow communities to replicate data to selected data centers for storage, and do this in a robust, reliable and highly available manner.
Improve data curation and accessibility.
More info: [email protected]
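The idea of robust replication can be illustrated with a minimal sketch. It assumes that each data center is just a local directory and that a replica counts as good when its SHA-256 checksum matches the original. The names `sha256_of`, `replicate` and `demo_replication` are hypothetical and do not describe the actual EUDAT service.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, data_centers: list[Path]) -> dict[str, bool]:
    """Copy `source` into every target directory and verify each copy.

    Returns a mapping from target directory to whether the replica's
    checksum matches the checksum of the original file.
    """
    expected = sha256_of(source)
    results = {}
    for center in data_centers:
        center.mkdir(parents=True, exist_ok=True)
        replica = center / source.name
        shutil.copy2(source, replica)
        results[str(center)] = sha256_of(replica) == expected
    return results


def demo_replication() -> dict[str, bool]:
    """Replicate one small file to two stand-in 'data centers'."""
    with tempfile.TemporaryDirectory() as root:
        root_path = Path(root)
        source = root_path / "community" / "dataset.bin"
        source.parent.mkdir()
        source.write_bytes(b"example research data")
        centers = [root_path / "center_a", root_path / "center_b"]
        return replicate(source, centers)
```

A real deployment would replicate across sites and re-verify checksums periodically; the sketch only shows the copy-then-verify step.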
DATA_STAGING@EUDAT
Allow the communities to dynamically replicate a subset of their data stored in EUDAT to an HPC workspace in order to be processed.
More info: [email protected]
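Staging a subset of stored data into a workspace might look like the following sketch. It assumes the store and the HPC workspace are both local directories and that the subset is chosen by a caller-supplied predicate. `stage_subset` and `demo_staging` are hypothetical names, not part of any EUDAT interface.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def stage_subset(
    store: Path, workspace: Path, wanted: Callable[[Path], bool]
) -> list[str]:
    """Copy the files in `store` that satisfy `wanted` into `workspace`.

    Returns the names of the staged files in sorted order.
    """
    workspace.mkdir(parents=True, exist_ok=True)
    staged = []
    for item in sorted(store.iterdir()):
        if item.is_file() and wanted(item):
            shutil.copy2(item, workspace / item.name)
            staged.append(item.name)
    return staged


def demo_staging() -> list[str]:
    """Stage only the .nc files from a stand-in store."""
    with tempfile.TemporaryDirectory() as root:
        store = Path(root) / "eudat_store"
        store.mkdir()
        for name in ("run1.nc", "run2.nc", "notes.txt"):
            (store / name).write_text("placeholder")
        workspace = Path(root) / "hpc_workspace"
        return stage_subset(store, workspace, lambda p: p.suffix == ".nc")
```

Passing the selection rule as a function keeps the staging step independent of how a community decides which data it needs for a given computation.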
METADATA@EUDAT
Create a joint metadata domain for all data stored by EUDAT data centers and a catalogue which exposes the data stored within EUDAT, allowing data searches.
The EUDAT repository should provide an inventory of metadata from different communities.
More info: [email protected]
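A joint metadata domain with search can be sketched as follows, assuming each community delivers a list of simple records and the catalogue is an in-memory list. The `Record` fields, the function names and the sample titles are all invented for illustration; only the community names come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    community: str
    identifier: str
    title: str
    keywords: tuple[str, ...]


def build_catalogue(*sources: list[Record]) -> list[Record]:
    """Merge per-community record lists into one list, dropping duplicates.

    Two records are duplicates when community and identifier both match.
    """
    seen = set()
    catalogue = []
    for source in sources:
        for record in source:
            key = (record.community, record.identifier)
            if key not in seen:
                seen.add(key)
                catalogue.append(record)
    return catalogue


def search(catalogue: list[Record], term: str) -> list[Record]:
    """Return records whose title or keywords contain `term` (case-insensitive)."""
    needle = term.lower()
    return [
        record
        for record in catalogue
        if needle in record.title.lower()
        or any(needle in keyword.lower() for keyword in record.keywords)
    ]


def demo_catalogue() -> list[Record]:
    """Build a catalogue from two made-up community feeds."""
    epos = [Record("EPOS", "e-1", "Seismic waveform archive", ("seismology",))]
    enes = [Record("ENES", "c-1", "Climate model output", ("climate", "simulation"))]
    # The ENES feed is passed twice to show that duplicates are dropped.
    return build_catalogue(epos, enes, enes)
```

For example, `search(demo_catalogue(), "climate")` returns only the ENES record, even though the two communities describe their data differently.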
SIMPLE_STORE@EUDAT
Create an easy-to-use service that will help researchers, mediated by the participating communities, to upload and store data which is not part of the officially handled data sets of the community.
This service will address the long tail of “small” data and the researchers/citizen scientists creating/manipulating them.
More info: [email protected]
Persistent_Identifiers@EUDAT
Deploy a robust, highly available and effective PID service that can be used within the communities and by EUDAT.
Keeping track of the “names” of data sets deposited with the CDI requires robust mechanisms.
More info: [email protected]
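The point of a persistent identifier is that the name stays fixed while the location behind it may change. A minimal in-memory sketch of that idea follows. `PidRegistry`, the `example` prefix and the URLs in the usage note are hypothetical and do not reflect the PID system EUDAT deploys.

```python
import uuid


class PidRegistry:
    """Toy registry mapping stable identifiers to current data locations."""

    def __init__(self, prefix: str = "example"):
        self.prefix = prefix
        self._locations: dict[str, str] = {}

    def mint(self, location: str) -> str:
        """Create a new identifier that points at `location`."""
        pid = f"{self.prefix}/{uuid.uuid4()}"
        self._locations[pid] = location
        return pid

    def resolve(self, pid: str) -> str:
        """Return the current location; raises KeyError for unknown PIDs."""
        return self._locations[pid]

    def update(self, pid: str, new_location: str) -> None:
        """Point an existing identifier at a new location."""
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_location
```

A data set minted at `https://center-a.example/data/1` and later moved to `https://center-b.example/data/1` keeps the same PID; only the registry entry changes, so existing citations still resolve.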
AAI@EUDAT
Provide a solution for a working AAI system in a federated scenario.
Design the AA infrastructure to be used during the EUDAT project and beyond.
More info: [email protected]
OPERATION TEAM
Work plan for the next months
• Moving the services to a production environment
• Capturing additional requirements
• Integrating new partners to EUDAT (in particular research communities)
– Working groups, pilots, observers and associate partners
• Collaborating with other initiatives
– European e-Infrastructures: EGI, PRACE, DANTE, HELIX NEBULA, SCIDIPS-ES, etc.
– Global initiatives: RDA, CODATA, etc.
• Defining EUDAT’s path to sustainability
– Cost and funding models
– Governance
Welcome to the 2nd EUDAT Conference!
28-30 October 2013, Rome
• International event with keynotes from Europe and US
• A forum to discuss the future of data infrastructures
• Project presentations and poster sessions
• Training tutorials