EUDAT Towards a European Collaborative Data Infrastructure
EUDAT
Towards a European Collaborative Data Infrastructure
Damien Lecarpentier – CSC, IT Center for Science, Finland
ISC’11, Hamburg, 20 June 2011
Outline of the talk
EUDAT concept
EUDAT consortium
EUDAT service approach
Expected benefits and challenges of a CDI
Initiative
Funded through FP7 e-Infrastructure Call 9 (WP11): INFRA-2011-1.2.2: Data infrastructure for e-Science (November 2010)
Call 9 objective: “Establish a persistent and robust service infrastructure for scientific data in Europe that responds to the need of data-intensive Science of 2020”
Call budget: 43 M€
EUDAT selected for funding (three-year project)
Official starting date: 1 October 2011
Biggest budget of the call: 9,3 M€ EC grant
Total budget: 16,3 M€
Consortium
23 partners representing 13 countries
15 user communities from a wide range of disciplines (Biomed, Earth Science, Climate, SSH, etc.)
Targets
EUDAT objective: “To deliver a Collaborative Data Infrastructure (CDI) with the capacity and capability for meeting researchers’ needs in a flexible and sustainable way, across geographical and disciplinary boundaries.”
EUDAT Key facts and objectives
The infrastructure must be collaborative
The infrastructure must be driven by researchers’ needs
The infrastructure must be sustainable yet flexible
The infrastructure must be pan-European
The infrastructure must be multi-disciplinary
The current data infrastructure landscape: challenges and opportunities
Long history of data management in Europe: several existing data infrastructures dealing with established and growing user communities (e.g., ESO, ESA, EBI, CERN)
New Research Infrastructures are emerging and are also trying to build data infrastructure solutions to meet their needs (CLARIN, EPOS, ELIXIR, ESS, etc.)
A large number of projects providing excellent data services (EURO-VO, GENESI-DR, Geo-Seas, HELIO, IMPACT, METAFOR, PESI, SEALS, etc.)
However, most of these infrastructures and initiatives address primarily the needs of a specific discipline and user community
Challenges
Compatibility, interoperability, and cross-disciplinary research
Data growth in volume and complexity (the so-called “data tsunami”): strong impact on costs, threatening the sustainability of the infrastructure
Opportunities
Potential synergies do exist: although disciplines have different ambitions, they have common basic needs and requirements that could be matched with generic pan-European services supporting multiple communities and ensuring greater interoperability.
Strategy needed at pan-European level
Towards a Collaborative Data Infrastructure
Source: HLEG report, p. 31
EUDAT will focus on building this generic data infrastructure layer and offer a trusted domain for long-term data preservation, accompanied by related services to store, identify, authenticate and mine these data.
This needs to be done in close collaboration with the communities:
Core services must match the requirements of the communities
Community services can also be incorporated into the common data service infrastructure when they are of use to other communities.
The EUDAT Consortium
The EUDAT Communities
The EUDAT Communities (by field)
EUDAT targets all scientific disciplines (discipline neutral):
To enable the capture and identification of cross-discipline requirements
To involve the scientists of all the communities in the shaping of the infrastructure and its services
Biological and Medical Science: VPH, ELIXIR, BBMRI, ECRIN
Environmental Science: ENES, EPOS, Lifewatch, EMSO, IAGOS-ERI, ICOS
Social Sciences and Humanities: CLARIN
Physical Sciences and Engineering: WLCG, ISIS
Material Science: ESS…
Energy: EUFORIA…
EUDAT Services Activities – Iterative Design
EUDAT’s Services activity is concerned with identification of the types of data services needed by the European research communities, delivering them through a federated data infrastructure and supporting their users
1. Capturing Communities’ Requirements (WP4)
Services to be deployed must be based on user communities’ needs
Strong engagement and collaboration with user communities (EUDAT communities and beyond) to capture requirements
2. Building the services (WP5)
User requirements must be matched with available technologies
Need to identify:
• available technologies and tools to develop the required services (technology appraisal)
• gaps and market failures that should be addressed by EUDAT research activities
Services must be designed, built and tested in a pre-production test bed environment and made available to WP4 for evaluation by their users
3. Deploying the services and operating the federated infrastructure (WP6)
Services must be deployed on the EUDAT infrastructure and made available to users, with interfaces for cross-site, cross-community operation
Reliability, 24h/7d availability and accessibility of the shared services, with operational security, data integrity and compliance with stakeholder requirements and policies.
Core services are the building blocks of EUDAT’s Common Data Infrastructure, mainly included on the bottom layer of data services
Fundamental Core Services
• Long-term preservation
• Persistent identifier service
• Data access and upload
• Workspaces
• Web execution and workflow services
• Single Sign On (federated AAI)
• Monitoring and accounting services
• Network services
Extended Core Services (community-supported)
• Joint metadata service
• Joint data mining service
EUDAT core services
No need to match the needs of all at the same time; addressing a group of communities can be very valuable, too
Service Model Approach and Generic Collaboration
Generic Service Model
• Fundamental Core Services meet strongly overlapping service requirements
• Extended Core Services are mainly community-supported; community requirements typically overlap between some disciplines
Collaboration between Teams
• Fundamental Core Services are operated and supported by an Operations Team which collaborates across the participating centres.
• Extended Core Services and other joint multi-disciplinary services must be community-supported; the requirements overlap between a specific subset of disciplines
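The two-tier service model above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the EUDAT project: the function name classify_services, the placeholder community names, the requirement matrix, and the thresholds (a service is treated as "fundamental core" when every community requires it, and "extended core" when at least two do).

```python
from typing import Dict, Set


def classify_services(
    requirements: Dict[str, Set[str]], communities: Set[str]
) -> Dict[str, str]:
    """Label each service by how widely it is required (illustrative rule only)."""
    tiers: Dict[str, str] = {}
    for service, needed_by in requirements.items():
        # Ignore communities that are not part of the set under consideration.
        relevant = needed_by & communities
        if relevant == communities:
            # Assumption: "strongly overlapping" means required by all communities.
            tiers[service] = "fundamental core"
        elif len(relevant) >= 2:
            # Assumption: shared by a subset of at least two communities.
            tiers[service] = "extended core"
        else:
            tiers[service] = "community-specific"
    return tiers


# Hypothetical communities and requirement matrix, for illustration only.
communities = {"community_a", "community_b", "community_c"}
requirements = {
    "persistent identifier service": {"community_a", "community_b", "community_c"},
    "joint metadata service": {"community_a", "community_b"},
    "discipline-specific tool": {"community_c"},
}

tiers = classify_services(requirements, communities)
for service, tier in sorted(tiers.items()):
    print(f"{service}: {tier}")
```

In practice the boundary between tiers would be settled through the requirements capture with the communities rather than by a fixed numeric rule; the thresholds here only make the distinction concrete.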
EUDAT Timeline
[Figure: project timeline, 2012 to 2015. Activity tracks: user requirements, service design, service deployment. Milestones shown: EUDAT kick-off, first services available, cross-community services, full core services deployed, sustainability plan. User Forums: 1st, 2nd, 3rd and 4th.]
Expected benefits of a Collaborative Data Infrastructure
Enabling multi-disciplinary data-intensive research and collaboration
• Development of common services supporting research communities
• Support to existing scientific communities’ infrastructures
• Support to smaller communities through access to sophisticated services
Inter-disciplinary collaboration and exploitation of synergies between communities
• Communities from different disciplines working together to build services
• Data sharing between disciplines
Collaboration with other large-scale infrastructures
• European e-Infrastructures: Géant, PRACE, EGI, etc.
• Global initiatives in the US, Japan, Australia, etc.
Ensuring wide access to and preservation of data in a sustainable way
• A robust generic infrastructure capable of handling the scale and complexity of data that will be generated over the next 10-20 years
• Greater access to existing data and better management of data for the future
• Increased security by managing multiple copies in geographically distant locations
• Put Europe in a competitive position for important data repositories of world-wide relevance
Economies of scale and cost-efficiency
• Shared resources and work are less costly
Challenges and Opportunities
Delivering high-level multi-disciplinary data services
• Achieving a high level of interoperability in the context of diversity of data, research disciplines and practices
• Need to strongly involve the different communities in the design and evaluation of services
• EUDAT as a platform to discuss interoperability issues (along with other initiatives, e.g. DAITF)
Building trust among stakeholders
• Trust between service providers and users, but also between the researchers and disciplines themselves
• Trust in the EUDAT infrastructure, the data deposited and collected, data integrity
Ensuring the sustainability of the infrastructure
• Providing a framework and a plan to ensure the continuity of services beyond the immediate funding window, through the setting up of a sustainable entity
• Funding and business models
• Partnerships (new communities, industry, etc.) and governance models
“Do the difficult things while they are easy and do the great things while they are small. A journey of a
thousand miles must begin with a single step.”
Lao Tzu
The beginning of a long journey…
How to get in touch with EUDAT?
Kimmo Koski, CSC - IT Center for Science
EUDAT Project Coordinator
Peter Wittenburg, Max Planck Institute for Psycholinguistics at Nijmegen (MPI-PL)
EUDAT Scientific Coordinator
Damien Lecarpentier, CSC - IT Center for Science
EUDAT Project Manager
EUDAT@ISC’11
BoF session on “e-Infrastructure for science in Europe”, on Tuesday 21 June, 14:30-15:15, Hall B
Partners’ booths at ISC:
CSC #146, BSC #114, DKRZ #140, EPCC #152
THANK YOU!