EGI support for Research Infrastructures · 2017-11-14
Foreword
Dear Reader,
EGI traces its origin back to the early days of distributed computing and was initially designed to cater for the computational needs of the Large Hadron Collider.
Today, EGI operates the largest federated e-infrastructure in the world and is committed to supporting all fields of science, from physics to astronomy, environmental sciences to humanities, life sciences, chemistry and many other disciplines.
On behalf of the EGI community, I am happy to present this brochure: a selection of collaborations through which we support the setup and operation of European Research Infrastructures.
EGI now supports 31 Research Infrastructures and communities. This brochure features a small but representative subset of collaborations completed during the EGI-Engage project. The use cases display the diversity of challenges that research facilities pose to IT providers, and the richness of responses that the EGI community offers to tackle these challenges.
I hope the booklet will also serve as a catalogue of blueprints for emerging Research Infrastructures that are going to explore the power of e-infrastructures in the coming years.
Gergely Sipos
User Community Support Manager
EGI Foundation
Environmental sciences
EISCAT_3D · page 4
ICOS · page 5
EMSO · page 6
ESA · page 7
EPOS · page 8
Humanities
DARIAH · page 9
Physics & Astronomy
WLCG · page 10
CTA · page 11
Life & Biomedical sciences
LifeWatch · page 12
ELIXIR · page 13
MoBrain · page 14
ERIC · page 15
EGI & Research Infrastructures | 1
The collaborative work described in this publication was co-funded through the EGI-Engage project, supported by the European Union Horizon 2020 programme, under grant number 654142.
EGI: Advanced computing for research
EGI is a federation of almost 300 data and compute centres and 21 cloud providers united by a mission to support research activities, business and innovation with advanced computing services. EGI provides technical and human services, from distributed high-throughput computing and cloud computing, storage and data resources to consultancy, support and co-development.
The federation is governed by the EGI Council and coordinated by the EGI Foundation, with headquarters in Amsterdam, the Netherlands. Since its establishment in 2010, the EGI e-infrastructure has been delivering unprecedented data analysis capabilities to tens of thousands of researchers from over a hundred virtual research communities covering many scientific disciplines.
In March 2017, EGI became the first European-wide publicly-funded e-infrastructure to be certified to ISO standards, a sign of our dedication to continuously improve our service offering.
Countries represented in the EGI Council: Belgium, Bulgaria, Croatia, Czech Republic, Estonia, Finland, France, Germany, Greece, Italy, Rep. of Macedonia, Netherlands, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom.
CERN participates in the EGI Council as an international research organisation.
How EGI supports Research Infrastructures
EGI Services featured in this publication: Cloud Compute, High-Throughput Compute, Cloud Container Compute, Online Storage, Data Transfer, Validated Software & Repository, Check-in, Workload Manager, Accounting, Attribute Management, Operational Tools, Configuration Database, Helpdesk, Security Coordination.
EGI reaches out to Research Infrastructures (RIs) to discuss usage scenarios and assess requirements and preferences. The discussion generates an initial joint service configuration, development and delivery plan.
The document is then used to build a Competence Centre (CC): a distributed working group where national e-infrastructure initiatives, scientific institutes, technology and service providers join forces. The CC collectively refines and implements the collaboration workplan.
CC membership is based on joint interest and matching priorities of the RIs and members of the EGI Community.
The CCs collect and analyse requirements, integrate community-specific applications with EGI services, foster interoperability across infrastructures and evolve the services through a user-centric development model.
CCs also take ownership of the resulting services, configuring and providing them through Service Level Agreements to scientific user communities brought in by the RIs, and jointly promoting them through events and channels.
EISCAT_3D: The next generation incoherent scatter radar system
About EISCAT_3D
EISCAT_3D will be the world's leading facility for research into the upper atmosphere and near-Earth space (the geospace environment). Construction kicked off in September 2017, with the first stage of the radar system expected to become operational in 2021.
The EISCAT_3D facility will be distributed across three sites in Scandinavia, each with a 70 metre circular area and 10,000 antennas.
Challenge
The EISCAT_3D radar opens up new opportunities for physicists to explore a variety of research fields, but it comes with significant challenges in handling large-scale experimental data. The incoming data rates of 99 Tb/s need to be reduced to a few PB/year to stay within funding limits.
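To see the scale of that reduction, a back-of-envelope calculation helps. The figures below are illustrative assumptions, not official EISCAT_3D numbers: the quoted 99 Tb/s is read as terabits per second sustained year-round, and "a few PB/year" is taken as 3 PB.

```python
# Back-of-envelope estimate of the EISCAT_3D data-reduction factor.
# Assumptions (illustrative only): 99 Tb/s is terabits per second,
# sustained for a full year; the funding limit is ~3 PB/year.

SECONDS_PER_YEAR = 365 * 24 * 3600            # ~3.15e7 s

raw_tb_per_s = 99                              # terabits per second
raw_tb_per_year = raw_tb_per_s * SECONDS_PER_YEAR
raw_pb_per_year = raw_tb_per_year / 8 / 1000   # terabits -> terabytes -> petabytes

target_pb_per_year = 3                         # assumed "a few PB/year"
reduction_factor = raw_pb_per_year / target_pb_per_year

print(f"raw data: ~{raw_pb_per_year:,.0f} PB/year")
print(f"required reduction: ~{reduction_factor:,.0f}x")
```

Under these assumptions the raw stream comes out at roughly 390,000 PB/year, so the on-site processing must discard around five orders of magnitude of raw signal before archiving.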
EGI & EISCAT_3D Collaboration
EISCAT and EGI set up a Competence Centre (CC) in the context of the EGI-Engage project to provide researchers with data analysis tools to improve their scientific discovery opportunities. The team developed a web portal for researchers to discover, access and analyse the data generated by EISCAT_3D.
The CC opted for the EGI Workload Manager tool, based on the DIRAC interware. The service provides a web-based graphical interface and a command line interface to interact with data search and job management.
The system also facilitates the development of data models and modelling tools within the EISCAT_3D community, and is a pilot of a central portal service for scientists to interact and compute with EISCAT data.
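As an illustration of the DIRAC-based workflow (a hypothetical sketch, not taken from the EISCAT_3D portal itself, which wraps this behind its web interface), a job handed to DIRAC through its command line interface is described by a JDL file along these lines; the `analyse.sh` script and its argument are invented placeholders:

```
[
  JobName       = "eiscat3d-analysis";
  Executable    = "analyse.sh";
  Arguments     = "2021-10-01.dump";
  InputSandbox  = {"analyse.sh"};
  OutputSandbox = {"StdOut", "StdErr"};
  StdOutput     = "StdOut";
  StdError      = "StdErr";
]
```

Submission and retrieval then go through the standard DIRAC commands: `dirac-wms-job-submit job.jdl` (which returns a job ID), `dirac-wms-job-status <JobID>` and `dirac-wms-job-get-output <JobID>`.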
Partners involved
EISCAT Scientific Association
EGI: CNRS, CSC, EGI Foundation, SNIC & UAB
Website: http://eiscat3d.se
Services used by EISCAT_3D: Cloud Compute, High-Throughput Compute, Workload Manager
A web portal for scientific data discovery and analysis
Image: Kristian Pikner / Wikimedia Commons
ICOS: Integrated Carbon Observation System
About ICOS
ICOS is a pan-European research infrastructure for quantifying and understanding Europe's greenhouse gas (GHG) balance. Its mission is to collect high-quality observational data and to promote its use, e.g. to model GHG fluxes or to support verification of emission data. ICOS brings together 120+ measurement stations across the atmosphere, ecosystem and marine domains.
Challenge
ICOS needed cloud resources to support the development of the Footprint Tool, a new web application for calculating the GHG footprint of a given location over time, based on meteorological analyses and emissions data. In addition, ICOS wants to deploy cloud resources for near-real-time processing of Ecosystem GHG Flux data.
ICOS was at the 1st 'Design your e-infrastructure' Workshop in 2016 and is also a test case for interoperability of EGI and EUDAT services.
EGI & ICOS Collaboration
In the Footprint Tool case, data is stored on B2SAFE and taken to/from the ICOS Carbon Portal with B2STAGE. Users interact only with the ICOS Carbon Portal, which instantiates virtual machines (VMs) in the EGI Federated Cloud. Within EGI, data is saved in storage linked to the EGI Data Hub and accessible by all the ICOS VMs. These include a VM hosting a web interface handling model run requests and output visualisation, a set of worker node VMs, and one VM that manages data sharing across the worker nodes.
For the Ecosystem GHG Flux case, ICOS would build on a current pilot under development in the framework of ENVRIplus. This is currently using the gCube Data Analytics platform, combining resources from D4Science and EGI to orchestrate parallelised HTC computational processes.
Partners involved
The ICOS use case is still in a testing phase and is supported through the incubator VO fedcloud.egi.eu, using resources from the CESNET-MetaCloud cloud provider.
Website: https://www.icos-ri.eu/
Services used by ICOS: Cloud Compute, Cloud Container Compute, Online Storage
Piloting applications on the EGI Federated Cloud
EMSO: European Multidisciplinary Seafloor and water-column Observatory
About EMSO
EMSO is a large-scale research infrastructure of seafloor & water-column observatories, set up to monitor long-term environmental processes and their interactions. The observatories measure many chemical and physical parameters, such as water temperature and acidity, or the ground/water interactions during earthquakes and tsunamis. The EMSODEV H2020 project is working on the data management systems.
Challenge
Collecting and analysing ocean data on a day-to-day basis is crucial to monitoring environmental processes. To do this, EMSODEV needed to develop a Data Management Platform (DMP) to set up a flexible and scalable data management service for long-term, high-resolution and near-real-time monitoring.
EGI & EMSO Collaboration
EMSODEV expressed interest in evaluating the use of EGI Federated Cloud resources for hosting the DMP. EGI secured a pool of cloud resources provided by data centres in Italy, Portugal and Spain through a Service Level Agreement (SLA). In total, 4 cloud providers committed 9 Terabytes of storage capacity and about 340 virtual CPU cores.
EMSODEV developed the DMP platform on top of the EGI Federated Cloud with EGI support on virtualisation, storage, networking and security.
The experimental prototype of the DMP is now deployed in the RECAS-BARI cloud and fully integrated with the EGI Federated Cloud and the EGI Authentication and Authorisation services.
Partners involved
EMSO: INGV
EGI: CESGA, EGI Foundation, INFN-Padova, NCG-INGRID-PT, RECAS-BARI
Website: http://www.emsodev.eu
Services used by EMSO: Cloud Compute, Online Storage
Access to cloud resources from EGI Federation providers
ESA: European Space Agency
About ESA
The European Space Agency (ESA) is Europe's gateway to space. Its mission is to shape the development of Europe's space capability and ensure that investment continues to deliver benefits to the citizens of Europe and the world. ESA is an international organisation with 22 member states.
Challenge
Terradue is a company specialised in delivering cloud services for earth sciences and was tasked by ESA to lead the development of a cloud infrastructure to support the Geohazards and Hydrology thematic exploitation platforms. Terradue needed cloud resources to make this possible and to be able to handle massive and complex data streams.
EGI & ESA Collaboration
The two thematic exploitation platforms were connected to the EGI Federated Cloud to guarantee enough computational power for their use cases.
This task was successfully completed by developing an interface between the Terradue Cloud Platform and one of the available interfaces of the EGI Federated Cloud.
Seven EGI cloud providers from Italy, the UK, Greece, Germany, Poland, Belgium and Spain committed the cloud resources necessary for the project through Service and Operational Level Agreements.
Partners involved
ESA: Terradue
EGI: 100%IT, BEgrid-BELNET, CESGA, CYFRONET-Cloud, EGI Foundation, GoeGrid, HG-09-Okeanos-Cloud, RECAS-BARI
Website: https://www.terradue.com/
Services used by ESA / Terradue: Cloud Compute, Online Storage
Cloud resources to enable two Thematic Exploitation Platforms
EPOS: European Plate Observing System
About EPOS
EPOS is the ESFRI initiative for the solid Earth sciences. EPOS wants to establish a European infrastructure where datasets, metadata and resources can be searched and accessed across sites. This will create new e-science opportunities to monitor and better understand the dynamics of earthquakes, volcanic eruptions and other geological processes.
Challenge
The understanding of solid Earth dynamics and tectonic processes relies on the analysis of seismological data, ground deformation from terrestrial and satellite observations, geological and petro-chemical studies and lab experiments. Setting up a RI to integrate and enable secure access to hundreds of Terabytes of data, and to use them for 3D simulation including data staging and visualisation, is an incredible challenge.
EGI & EPOS Collaboration
During the EGI-Engage project, EGI established a Competence Centre for EPOS to collect, analyse and compare Earth Science community needs with EGI technical offerings. The work of the EPOS Competence Centre covered three areas:
(1) Establish a prototype service for theauthentication and authorization of EPOS usersto access federated e-infrastructure services;
(2) Set up a cloud-based virtual research environment to conduct MISFIT earthquake simulations;
(3) Enable the computation of Sentinel-1 data on a scalable cloud platform, in collaboration with the European Space Agency (see also page 7).
Partners involved
EPOS: KNMI, INGV
EGI: CNRS, CYFRONET, EGI Foundation, Fraunhofer SCAI, GRNET
Website: https://www.epos-ip.org/
Services used by EPOS: Cloud Compute, Online Storage
Online applications, big data and user identification
DARIAH: Digital Research Infrastructure for Arts and Humanities
About DARIAH
DARIAH EU is a pan-European infrastructure for arts and humanities scholars working with computational tools. Research in arts and humanities often involves huge data sets that can benefit from digital research methods. The goal is to provide tools and services to enhance innovation and generate new knowledge.
Challenge
DARIAH EU members required a way to provide transparent access to cloud storage resources in order to organise, store and process their datasets and to improve the resilience of the data.
They also wanted to investigate ways to make digital applications and databases available to arts and humanities scholars.
EGI & DARIAH collaboration
DARIAH EU and EGI set up a joint team as a Competence Centre (CC) in the context of the EGI-Engage project. The CC started by identifying pilot applications, for example: a Simple Semantic Search Engine (SSE) to allow users to search in the Knowledge Base, or the DBO@Cloud, a cloud-based repository with a centuries-old collection of Bavarian dialects.
The CC developed the DARIAH Science Gateway to host the applications and provide access to file transfer and cloud compute services (available at: go.egi.eu/dariahSG). The DARIAH Science Gateway is now in production and relies on the resources committed by two EGI Federated Cloud data centres: INFN-Bari and INFN-Catania.
Partners involved
DARIAH: AAS, DANS, RBI
EGI: EGI Foundation, GWDG, INFN-Bari, INFN-Catania, MTA-SZTAKI
Website: http://www.dariah.eu/
Services used by DARIAH EU: Cloud Compute, Online Storage
A science gateway to host applications and access to cloud resources
Image: Joaquim Alves Gaspar / Wikimedia Commons
WLCG: Worldwide LHC Computing Grid

About WLCG
WLCG is a global collaboration of more than 170 computing centres in 42 countries, linking up national and international grid infrastructures. The mission is to provide global computing resources to store, distribute and analyse the data generated by the High Energy Physics experiments hosted by the Large Hadron Collider (LHC) at CERN.

Challenge
The observations collected by the LHC detectors generate an enormous amount of data that needs to be stored, distributed across hundreds of scientific institutions worldwide and then analysed by thousands of scientists. This regularly involves data transfers of over 80 Petabytes per month and around 700 million computing jobs per year, which consume about 5 billion CPU hours.

EGI & WLCG
The services provided by the EGI Federation are essential to run a global, exascale, distributed computing infrastructure such as WLCG.

The collaboration between WLCG and what is now EGI is over 10 years old: WLCG has been involved in every step of the development of EGI and is the biggest consumer of EGI compute resources. The four largest EGI Virtual Organisations are all LHC experiments: ATLAS, ALICE, CMS and LHCb.

The computing is in itself an achievement, but the scientific results also endorse the approach. WLCG managed the computational challenges that led, for example, to the discovery of the Higgs Boson in 2012 by ATLAS and CMS, or the LHCb discovery of 'charmonium particles' earlier in 2017.

Resource providers: EGI Federated Data Centres | Website: http://wlcg.web.cern.ch/
Services used by WLCG: High-Throughput Compute, Online Storage, Data Transfer, Accounting, Attribute Management, Validated Software & Repository, Helpdesk, Operational Tools, Security Coordination, Configuration Database
Storing, sharing and analysing an unprecedented scale of LHC data
CTA: Cherenkov Telescope Array
About CTA
The Cherenkov Telescope Array (CTA) will be the world's leading gamma-ray public observatory. CTA will be used to understand the role of high-energy particles in the most violent phenomena of the Universe and to search for annihilating dark matter particles.
Challenge
CTA is a team of 1350 scientists and engineers from 32 countries and will be a distributed array of more than 100 telescopes built in Spain and at the European Southern Observatory site in Chile.
This collaborative model comes with its challenges: transferring data from telescopes to scientists worldwide, archiving, and finding the processing power for data reduction and large-scale Monte Carlo simulations.
EGI & CTA Collaboration
CTA is using EGI's High-Throughput Compute and Online Storage services to handle computational demands during the project's preparatory phase.
The compute and storage services are provided via the CTA Virtual Organisation, one of the most active user groups of EGI. Since 2012 the consortium has used EGI services to guide the choice of the best sites to host CTA telescopes in the Northern and Southern hemispheres.
CTA EGI usage (2013-2016)
> 360 million HS06 CPU hours
> 11 Petabytes of data transferred
> 2 Petabytes currently in storage
> 11 million computation jobs
Resource providers
CC-IN2P3, CYFRONET-LCG2, DESY-ZN, GRIF, IN2P3-LAPP, INFN-T1
With additional resources provided by the National e-Infrastructures of the Czech Republic, France, Germany, Italy, Poland and Spain.
Website: https://www.cta-observatory.org/
Services used by CTA: High-Throughput Compute, Online Storage
HTC and storage solutions for CTA's computational challenges
Image: Akihiro Ikeshita, Mero-TSK, International
LifeWatch: European Infrastructure for Biodiversity and Ecosystem Research

About LifeWatch
LifeWatch is a Research Infrastructure set up to support the fields of ecosystems research and biodiversity by equipping scientists with access to data, analytical tools and state-of-the-art virtual laboratories.

Challenge
Developing the LifeWatch objectives requires a close collaboration between the scientific groups leading the development of digital tools and IT experts that can advise on the best technologies to carry out the vision.

These tools will allow for new avenues of research, but they also come with a high demand for computational resources.

Collaboration
LifeWatch partnered with EGI through an EGI-Engage Competence Centre (CC) to:

> deploy the basic tools required to support data management, data processing and modelling for ecological observatories,
> evaluate the services required to support workflows for the deployment of virtual labs,
> and support the participation of citizens in LifeWatch's observation records, e.g. in uploading and processing of images.

The work led to a total of seven LifeWatch services for data management and modelling currently in production using the computational resources of the EGI Federated Cloud.

Partners involved
LifeWatch: CESGA, ICETA, INRA, UNIZAR, UPVLC, VLIZ
EGI: CSIC, EGI Foundation, INFN, LIP

Website: http://www.lifewatch.eu/
Services used by LifeWatch: Cloud Compute
Computing resources, access to data, tools and virtual labs
Image: Didier Descouens / Wikimedia Commons
ELIXIR: bringing together life science resources from across Europe
About ELIXIR
ELIXIR unites Europe's leading life science organisations in managing and safeguarding the increasing volume of data being generated by publicly funded research. It coordinates, integrates and sustains bioinformatics resources – such as databases, compute services, applications and training – across its member states and enables users in academia and industry to access what is vital for their research.
Challenge
The ELIXIR community initiated the development of the reference architecture for biomedical applications, called the 'ELIXIR Compute Platform'. The platform is envisaged as a federation of compute and storage resources, operated according to community-agreed principles and with the use of centrally provided user access management services.
EGI & ELIXIR Collaboration
ELIXIR was one of the Competence Centres (CCs) in the EGI-Engage project. The CC had two main activities:
> Established a cloud infrastructure: the CC brought together four providers (EMBL-EBI, CNRS, GRNET, CESNET) into a federated cloud resource pool accessible to biomedical researchers with ELIXIR user IDs. The team simplified the way to federate clouds, performed a compatibility study with a US-based cloud system (JetStream) and integrated the Terraform orchestrator technology to simplify porting of scientific applications.
> Building on this cloud, the CC set up a set of scientific demonstrators to pilot its use. They were: META-Pipe (a metagenomics pipeline), cBioPortal, Marine Metagenomics, PhenoMeNal and Insyght Comparative Genomics.
Partners involved
The ELIXIR Competence Centre:
CESNET, CSC, CNRS, EGI Foundation, EMBL-EBI, GRNET, SURFsara, University of Indiana
Website: https://www.elixir-europe.org/
Services used by ELIXIR: Cloud Compute, Check-in, Operational Tools
A federated cloud for scientific applications
MoBrain: Translational Research from Molecule to Brain
About MoBrain
The MoBrain Competence Centre (CC) has developed online portals for life scientists worldwide, by integrating structural biology and medical imaging services and data.
The MoBrain collaboration relies on the existing expertise available within the WeNMR and N4U projects and technology providers.
Challenge
MoBrain provides biomedical scientists with a portfolio of portals for the study of molecular forces and biomolecular interactions to improve drug design, treatments and diagnostics.
To do this, the MoBrain community needs advanced computational power to sustain the work of their portals.
EGI & MoBrain Collaboration
In 2016, MoBrain signed an SLA (Service Level Agreement) with seven EGI data centres allowing the CC to use the High-Throughput Compute and Online Storage services needed to develop portals for life and brain scientists worldwide. In total, the providers pledged around 75 million hours of computing time and 50 Terabytes of storage capacity.
The MoBrain portals powered by EGI resources are: HADDOCK, DisVis, AMBER, CS-Rosetta, FANTEN and PowerFit.
To give an example, HADDOCK has so far processed more than 178,000 submissions from over 9,900 scientists, which translates into about 9 million High-Throughput Compute jobs per year on the EGI infrastructure.
Partners involved
MoBrain: University of Utrecht, CIRMMP-Florence, INFN-Padova and the consortium partners
EGI: CESNET-MetaCloud, EGI Foundation, INFN-Padova, NCG-INGRID-PT, NIKHEF, RAL-LCG2, SURFsara, TW-NCHC
Website: http://go.egi.eu/MoBrain
Services used by MoBrain: High-Throughput Compute, Online Storage
Compute resources to power online portals for biomedical researchers
ERIC: European Research Initiative on chronic lymphocytic leukemia
About ERIC
ERIC is a European organisation dedicated to improving the outcome of patients with chronic lymphocytic leukemia (CLL) and related diseases. As of November 2017, the initiative counts 907 members from 64 countries.
Challenge
The majority of published research is either hidden behind subscription walls or lost in the fog of nondescript information. The ERIC initiative wants to address this issue in the field of chronic lymphocytic leukemia by implementing a hub of accurate CLL-specific information within a federated environment and incorporating both data and computational services.
EGI & ERIC Collaboration
ERIC was selected as a use case of the second 'Design your e-infrastructure' Workshop in 2016. During the event, participants identified the need for a Virtual Research Environment (VRE) hosting a data repository (patient and clinical trial data) and the computational services to support it.
EGI is supporting this initiative and facilitates the allocation of cloud resources to integrate the CLL VRE and its applications into the EGI Cloud services. The VRE is made of a portal, a database of CLL datasets, analysis applications and a Galaxy workflow system. EGI and ERIC have completed the migration of the portal onto the EGI Federated Cloud. The portal has now been extended to support federated authentication mechanisms using the OIDC protocol.
Partners involved
ERIC
EGI: BEgrid-BELNET, EGI Foundation
Website: http://www.ericll.org/
A dedicated VRE to facilitate open access to health related information
Services used by ERIC: Cloud Compute, Online Storage
November 2017
This publication was written and prepared by the Communications and User Community Support Teams of the EGI Foundation. We would like to thank:
> all the researchers and teams of the Research Infrastructures and projects for their support.
> all the EGI service & resource providers for making the collaborations possible.
The content of this brochure is correct to the best of our knowledge as of November 2017. More information about EGI's support for Research Infrastructures is available on our website:
https://www.egi.eu/use-cases/research-infrastructures/
Design: EGI Communications Team
Cover design concept: Tom Jansen
Copyright: EGI-Engage Consortium, Creative Commons Attribution 4.0 International License.
The EGI-Engage project was co-funded for 30 months by the European Union (EU) Horizon 2020 programme under grant number 654142, as a collaborative effort involving more than 70 institutions in over 30 countries.
The butterfly image on the cover: © Derek Ramsey / derekramsey.com (used with permission, via Wikimedia Commons). The other Wikimedia Commons images used in this publication are credited in the captions. A special thanks is in order to all Wikimedia Commons photographers.