EGI support for · 2017-11-14 · EGI: Advanced computing for research EGI is a federation of...

EGI support for Research Infrastructures

EGI: advanced computing for research

www.egi.eu

Foreword

Dear Reader,

EGI traces its origin back to the early days of distributed computing and was initially designed to cater for the computational needs of the Large Hadron Collider.

Today, EGI operates the largest federated e-infrastructure in the world and is committed to supporting all fields of science, from physics to astronomy, environmental sciences to humanities, life sciences, chemistry and many other disciplines.

On behalf of the EGI community, I am happy to present this brochure: a selection of collaborations through which we support the setup and operation of European Research Infrastructures.

EGI now supports 31 Research Infrastructures and communities. This brochure features a small but representative subset, completed during the EGI-Engage project. The use cases display the diversity of challenges that research facilities raise for IT providers, and the richness of responses that the EGI community offers to tackle these challenges.

I hope the booklet will also serve as a catalogue of blueprints for emerging Research Infrastructures that are going to explore the power of e-infrastructures in the coming years.

Gergely Sipos
User Community Support Manager
EGI Foundation

Environmental sciences

Humanities

Physics & Astronomy

Life & Biomedical sciences

EGI & Research Infrastructures | 1

The collaborative work described in this publication was co-funded through the EGI-Engage project, supported by the European Union Horizon 2020 program, under grant number 654142.

EGI: Advanced computing for research

EGI is a federation of almost 300 data and compute centres and 21 cloud providers united by a mission to support research activities, business and innovation with advanced computing services. EGI provides technical and human services, from distributed high-throughput computing and cloud computing, storage and data resources to consultancy, support and co-development.

The federation is governed by the EGI Council and coordinated by the EGI Foundation, with headquarters in Amsterdam, the Netherlands. Since its establishment in 2010, the EGI e-infrastructure has been delivering unprecedented data analysis capabilities to tens of thousands of researchers from over a hundred virtual research communities covering many scientific disciplines.

In March 2017, EGI became the first European-wide publicly-funded e-infrastructure to be certified to ISO standards, a sign of our dedication to continuously improving our service offering.

Countries represented in the EGI Council

Belgium, Bulgaria, Croatia, Czech Republic, Estonia, Finland, France, Germany, Greece, Italy, Rep. of Macedonia, Netherlands, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom

CERN participates in the EGI Council as an international research organisation.

How EGI supports Research Infrastructures

EGI reaches out to Research Infrastructures (RIs) to discuss usage scenarios and assess requirements and preferences. The discussion generates an initial joint service configuration, development and delivery plan.

This plan is then used to build a Competence Centre (CC): a distributed working group where national e-infrastructure initiatives, scientific institutes, technology and service providers join forces. The CC collectively refines and implements the collaboration workplan.

CC membership is based on joint interest and matching priorities of the RIs and members of the EGI Community.

The CCs collect and analyse requirements, integrate community-specific applications with EGI services, foster interoperability across infrastructures and evolve the services through a user-centric development model.

CCs also take ownership of the resulting services, configuring and providing them through Service Level Agreements to scientific user communities brought in by the RIs, and jointly promoting them through events and channels.

EGI Services featured in this publication

Cloud Compute, High-Throughput Compute, Cloud Container Compute, Online Storage, Data Transfer, Validated Software & Repository, Check-in, Workload Manager, Accounting, Attribute Management, Operational Tools, Configuration Database, Helpdesk, Security Coordination

EISCAT_3D: The next generation incoherent scatter radar system

About EISCAT_3D

EISCAT_3D will be the world’s leading facility for research into the upper atmosphere and near-Earth space or the geospace environment. Construction kicked off in September 2017, with the first stage of the radar system expected to become operational in 2021.

The EISCAT_3D facility will be distributed across three sites in Scandinavia, each with a 70 metre circular area and 10,000 antennas.

Challenge

The EISCAT_3D radar opens up new opportunities for physicists to explore a variety of research fields, but it comes with significant challenges in handling large-scale experimental data. The incoming data rates of 99 Tb/s need to be reduced to a few PB/year to be within funding limits.
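To give a sense of scale for the data reduction described above, the following Python sketch converts the quoted 99 Tb/s into a yearly volume. It assumes continuous operation over a 365-day year, decimal units (1 PB = 10^15 bytes) and a hypothetical archive target of 2 PB/year standing in for "a few PB/year". None of these assumptions come from the brochure, and the function names are illustrative only.

```python
# Rough arithmetic sketch, not an EISCAT_3D specification.
SECONDS_PER_YEAR = 365 * 24 * 3600   # assumption: 365 days of continuous operation
BYTES_PER_PB = 10**15                # assumption: decimal petabyte
BITS_PER_TERABIT = 10**12


def raw_petabytes_per_year(rate_terabits_per_s):
    """Convert a sustained rate in terabits per second to petabytes per year."""
    bits_per_year = rate_terabits_per_s * BITS_PER_TERABIT * SECONDS_PER_YEAR
    return bits_per_year / (8 * BYTES_PER_PB)


def reduction_factor(rate_terabits_per_s, target_pb_per_year):
    """How many times smaller the archived volume is than the raw volume."""
    return raw_petabytes_per_year(rate_terabits_per_s) / target_pb_per_year


if __name__ == "__main__":
    raw = raw_petabytes_per_year(99)     # 99 Tb/s is the figure quoted in the text
    factor = reduction_factor(99, 2.0)   # 2 PB/year is a hypothetical target
    print(f"raw volume: {raw:,.0f} PB/year, reduction factor: {factor:,.0f}x")
```

Under these assumptions the raw stream would amount to roughly 390,000 PB (about 390 exabytes) per year, so a 2 PB/year archive would imply a reduction of roughly five orders of magnitude.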

EGI & EISCAT_3D Collaboration

EISCAT and EGI set up a Competence Centre (CC) in the context of the EGI-Engage project to provide researchers with data analysis tools to improve their scientific discovery opportunities. The team developed a web portal for researchers to discover, access and analyse the data generated by EISCAT_3D.

The CC opted for the EGI Workload Manager tool, based on DIRAC interware. The service provides a web-based graphical interface and a command line interface to interact with data search and job management.

The system also facilitates the development of data models and modelling tools within the EISCAT_3D community, and is a pilot of a central portal service for scientists to interact and compute with EISCAT data.

Partners involved

EISCAT Scientific Association

EGI: CNRS, CSC, EGI Foundation, SNIC & UAB

Website: http://eiscat3d.se

Services used by EISCAT_3D: Cloud Compute, High-Throughput Compute, Workload Manager

A web portal for scientific data discovery and analysis

Image: Kristian Pikner/Wikimedia Commons

ICOS: Integrated Carbon Observation System

About ICOS

ICOS is a pan-European research infrastructure for quantifying and understanding Europe's greenhouse gas (GHG) balance. Its mission is to collect high-quality observational data and to promote its use, e.g. to model GHG fluxes or to support verification of emission data. ICOS brings together 120+ measurement stations across the atmosphere, ecosystem and marine domains.

Challenge

ICOS needed cloud resources to support the development of the Footprint Tool, a new web application for calculating the GHG footprint of a given location over time, based on meteorological analyses and emissions data. In addition, ICOS wants to deploy cloud resources for near-real-time processing of Ecosystem GHG Flux data.

ICOS took part in the first ‘Design your e-infrastructure’ Workshop in 2016 and is also a test case for interoperability of EGI and EUDAT services.

EGI & ICOS Collaboration

In the Footprint Tool case, data is stored on B2SAFE and transferred to and from the ICOS Carbon Portal with B2STAGE. Users interact only with the ICOS Carbon Portal, which instantiates virtual machines (VMs) in the EGI Federated Cloud. Within EGI, data is saved in storage linked to the EGI Data Hub and accessible by all the ICOS VMs. These include a VM hosting a web interface handling model run requests and output visualisation, a set of worker node VMs, and one VM that manages data sharing across the worker nodes.

For the Ecosystem GHG Flux case, ICOS would build on a current pilot under development in the framework of ENVRIplus. This is currently using the gCube Data Analytics platform, combining resources from D4Science and EGI to orchestrate parallelised HTC computational processes.

Partners involved

The ICOS use case is still in a testing phase and is supported through the incubator VO, fedcloud.egi.eu, and uses resources from the CESNET-MetaCloud cloud provider.

Website: https://www.icos-ri.eu/

Services used by ICOS: Cloud Compute, Online Storage, Cloud Container Compute

Piloting applications on the EGI Federated Cloud

EMSO: European Multidisciplinary Seafloor and water-column Observatory

About EMSO

EMSO is a large-scale research infrastructure of seafloor & water-column observatories, set up to monitor long-term environmental processes and their interactions. The observatories measure many chemical and physical parameters, such as water temperature and acidity, or the ground/water interactions during earthquakes and tsunamis. The EMSODEV H2020 project is working on the data management systems.

Challenge

Collecting and analysing ocean data on a day-to-day basis is crucial to monitoring environmental processes. To do this, EMSODEV needed to develop a Data Management Platform (DMP) to set up a flexible and scalable data management service for long-term, high-resolution and (near-)real-time monitoring.

EGI & EMSO Collaboration

EMSODEV expressed interest in evaluating the use of EGI Federated Cloud resources for hosting the DMP. EGI secured a pool of cloud resources provided by data centres in Italy, Portugal and Spain through a Service Level Agreement (SLA). In total, 4 cloud providers committed 9 Terabytes of storage capacity and about 340 virtual CPU cores.

EMSODEV developed the DMP on top of the EGI Federated Cloud, with EGI support on virtualisation, storage, networking and security.

The experimental prototype of the DMP is now deployed in the RECAS-BARI cloud and fully integrated with the EGI Federated Cloud and the EGI Authentication and Authorisation services.

Partners involved

EMSO: INGV

EGI: CESGA, EGI Foundation, INFN-Padova, NCG-INGRID-PT, RECAS-BARI

Website: http://www.emsodev.eu

Services used by EMSO: Cloud Compute, Online Storage

Access to cloud resources from EGI Federation providers

ESA: European Space Agency

About ESA

The European Space Agency (ESA) is Europe’s gateway to space. Its mission is to shape the development of Europe’s space capability and ensure that investment continues to deliver benefits to the citizens of Europe and the world. ESA is an international organisation with 22 member states.

Challenge

Terradue is a company specialised in delivering cloud services for earth sciences and was tasked by ESA to lead the development of a cloud infrastructure to support the Geohazards and Hydrology thematic exploitation platforms. Terradue needed cloud resources to make this possible and to be able to handle massive and complex data streams.

EGI & ESA Collaboration

The two thematic exploitation platforms were connected to the EGI Federated Cloud to guarantee enough computational power for their use cases.

This task was successfully completed by developing an interface between the Terradue Cloud Platform and one of the available interfaces of the EGI Federated Cloud.

Seven EGI cloud providers from Italy, UK, Greece, Germany, Poland, Belgium and Spain committed the cloud resources necessary for the project through Service and Operational Level Agreements.

Partners involved

ESA: Terradue

EGI: 100%IT, BEgrid-BELNET, CESGA, CYFRONET-Cloud, EGI Foundation, GoeGrid, HG-09-Okeanos-Cloud, RECAS-BARI

Website: https://www.terradue.com/

Services used by ESA / Terradue: Cloud Compute, Online Storage

Cloud resources to enable two Thematic Exploitation Platforms

EPOS: European Plate Observing System

About EPOS

EPOS is the ESFRI initiative for the solid Earth sciences. EPOS wants to establish a European infrastructure where datasets, metadata and resources can be searched and accessed across sites. This will create new e-science opportunities to monitor and better understand the dynamics of earthquakes, volcanic eruptions and other geological processes.

Challenge

The understanding of solid Earth dynamics and tectonic processes relies on the analysis of seismological data, ground deformation from terrestrial and satellite observations, geological and petro-chemical studies and lab experiments. Setting up an RI to integrate and enable secure access to hundreds of terabytes of data, and to use them for 3D simulation including data staging and visualisation, is an incredible challenge.

EGI & EPOS Collaboration

During the EGI-Engage project, EGI established a Competence Centre for EPOS to collect, analyse and compare Earth Science community needs with EGI technical offerings. The work of the EPOS Competence Centre covered three areas:

(1) Establish a prototype service for the authentication and authorization of EPOS users to access federated e-infrastructure services;

(2) Set up a cloud-based virtual research environment to conduct MISFIT earthquake simulations;

(3) Enable the computation of Sentinel-1 data on a scalable cloud platform, in collaboration with the European Space Agency (see also the ESA section).

Partners involved

EPOS: KNMI, INGV

EGI: CNRS, CYFRONET, EGI Foundation, Fraunhofer SCAI, GRNET

Website: https://www.epos-ip.org/

Services used by EPOS: Cloud Compute, Online Storage

Online applications, big data and user identification

DARIAH: Digital Research Infrastructure for Arts and Humanities

About DARIAH

DARIAH EU is a pan-European infrastructure for arts and humanities scholars working with computational tools. Research in arts and humanities often involves huge data sets that can benefit from digital research methods. The goal is to provide tools and services to enhance innovation and generate new knowledge.

Challenge

DARIAH EU members required a way to provide transparent access to cloud storage resources in order to organise, store and process their datasets and to improve the resilience of the data.

They also wanted to investigate ways to make digital applications and databases available to arts and humanities scholars.

EGI & DARIAH Collaboration

DARIAH EU and EGI set up a joint team as a Competence Centre (CC) in the context of the EGI-Engage project. The CC started by identifying pilot applications, for example: a Simple Semantic Search Engine (SSE) to allow users to search in the Knowledge Base, or the DBO@Cloud, a cloud-based repository with a centuries-old collection of Bavarian dialects.

The CC developed the DARIAH Science Gateway to host the applications and provide access to file transfer and cloud compute services (available at: go.egi.eu/dariahSG). The DARIAH Science Gateway is now in production and relies on the resources committed by two EGI Federated Cloud data centres: INFN-Bari and INFN-Catania.

Partners involved

DARIAH: AAS, DANS, RBI

EGI: EGI Foundation, GWDG, INFN-Bari, INFN-Catania, MTA-SZTAKI

Website: http://www.dariah.eu/

Services used by DARIAH EU: Cloud Compute, Online Storage

A science gateway to host applications and access to cloud resources

Image: Joaquim Alves Gaspar/Wikimedia Commons

WLCG: Worldwide LHC Computing Grid

About WLCG

WLCG is a global collaboration of more than 170 computing centres in 42 countries, linking up national and international grid infrastructures. The mission is to provide global computing resources to store, distribute and analyse the data generated by the High Energy Physics experiments hosted by the Large Hadron Collider (LHC) at CERN.

Challenge

The observations collected by the LHC detectors generate an enormous amount of data that needs to be stored, distributed across hundreds of scientific institutions worldwide and then analysed by thousands of scientists. This regularly involves data transfers of more than 80 Petabytes per month and about 700 million computing jobs per year, which consume about 5 billion CPU hours.

EGI & WLCG

The services provided by the EGI Federation are essential to run a global exascale, distributed computing infrastructure such as WLCG.

The collaboration between WLCG and what is now EGI is over 10 years old: WLCG has been involved in every step of the development of EGI and is the biggest consumer of EGI compute resources. The four largest EGI Virtual Organisations are all LHC experiments: ATLAS, ALICE, CMS and LHCb.

The computing is in itself an achievement, but the scientific results also endorse the approach. WLCG managed the computational challenges that led, for example, to the discovery of the Higgs Boson in 2012 by ATLAS and CMS, or the LHCb discovery of ‘charmonium particles’ earlier in 2017.

Resource providers: EGI Federated Data Centres | Website: http://wlcg.web.cern.ch/

Services used by WLCG: Online Storage, HTC, Accounting, Data Transfer, Attribute Management, Software Repository, Helpdesk, Operational Tools, Security Coordination, Configuration Database

Storing, sharing and analysing an unprecedented scale of LHC data
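As a rough cross-check of the WLCG figures quoted in the Challenge paragraph above, this Python sketch derives two averages: the sustained transfer throughput and the CPU time per job. It assumes a 30-day month and decimal units (1 PB = 10^15 bytes), and it takes the quoted lower bounds at face value. The function names are illustrative only.

```python
# Back-of-the-envelope averages from the figures quoted in the text.
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumption: 30-day month
BYTES_PER_PB = 1e15                  # assumption: decimal petabyte


def average_throughput_gbit_s(petabytes_per_month):
    """Average transfer rate in gigabits per second for a monthly volume."""
    bits_per_month = petabytes_per_month * BYTES_PER_PB * 8
    return bits_per_month / SECONDS_PER_MONTH / 1e9


def cpu_hours_per_job(cpu_hours_per_year, jobs_per_year):
    """Mean CPU hours consumed by a single job."""
    return cpu_hours_per_year / jobs_per_year


if __name__ == "__main__":
    throughput = average_throughput_gbit_s(80)   # 80 PB/month, as quoted
    per_job = cpu_hours_per_job(5e9, 7e8)        # 5 billion hours, 700 million jobs
    print(f"average throughput: {throughput:.0f} Gb/s")
    print(f"average CPU time per job: {per_job:.1f} hours")
```

With the quoted values this works out to an average of roughly 247 Gb/s sustained and about 7 CPU hours per job. These are averages only, and real traffic and job lengths will vary widely around them.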

CTA: Cherenkov Telescope Array

About CTA

The Cherenkov Telescope Array (CTA) will be the world’s leading gamma-ray public observatory. CTA will be used to understand the role of high-energy particles in the most violent phenomena of the Universe and to search for annihilating dark matter particles.

Challenge

CTA is a team of 1350 scientists and engineers from 32 countries and will be a distributed array of more than 100 telescopes built in Spain and at the European Southern Observatory site in Chile.

This collaborative model comes with its challenges: transferring data from telescopes to scientists worldwide, archiving and finding the processing power for data reduction and large-scale Monte Carlo simulations.

EGI & CTA Collaboration

CTA is using EGI’s High-Throughput Compute and Online Storage services to handle computational demands during the project’s preparatory phase.

The compute and storage services are provided via the CTA Virtual Organisation, one of the most active user groups of EGI. Since 2012 the consortium has used EGI services to guide the choice of the best sites to host CTA telescopes in the northern and southern hemispheres.

CTA EGI usage (2013-2016)

> 360 million HS06 CPU hours

> 11 Petabytes of data transferred

> 2 Petabytes currently in storage

> 11 million computation jobs

Resource providers

CC-IN2P3, CYFRONET-LCG2, DESY-ZN, GRIF, IN2P3-LAPP, INFN-T1

With additional resources provided by the National e-Infrastructures of the Czech Republic, France, Germany, Italy, Poland and Spain.

Website: https://www.cta-observatory.org/

Services used by CTA: High-Throughput Compute, Online Storage

HTC and storage solutions for CTA’s computational challenges

Image: Akihiro Ikeshita, Mero-TSK, International

LifeWatch: European Infrastructure for Biodiversity and Ecosystem Research

About LifeWatch

LifeWatch is a Research Infrastructure set up to support the fields of ecosystems research and biodiversity by equipping scientists with access to data, analytical tools and state-of-the-art virtual laboratories.

Challenge

Developing the LifeWatch objectives requires a close collaboration between the scientific groups leading the development of digital tools and IT experts that can advise on the best technologies to carry out the vision.

These tools will allow for new avenues of research, but they also come with a high demand for computational resources.

Collaboration

LifeWatch partnered with EGI through an EGI-Engage Competence Centre (CC) to:

> deploy the basic tools required to support data management, data processing and modelling for ecological observatories,

> evaluate the services required to support workflows for the deployment of virtual labs,

> and support the participation of citizens in LifeWatch's observation records, e.g. in uploading and processing of images.

The work led to a total of seven LifeWatch services for data management and modelling currently in production using the computational resources of the EGI Federated Cloud.

Partners involved

LifeWatch: CESGA, ICETA, INRA, UNIZAR, UPVLC, VLIZ

EGI: CSIC, EGI Foundation, INFN, LIP

Website: http://www.lifewatch.eu/

Services used by LifeWatch: Cloud Compute

Computing resources, access to data, tools and virtual labs

Image: Didier Descouens/Wikimedia Commons

ELIXIR: bringing together life science resources from across Europe

About ELIXIR

ELIXIR unites Europe’s leading life science organisations in managing and safeguarding the increasing volume of data being generated by publicly funded research. It coordinates, integrates and sustains bioinformatics resources – such as databases, compute services, applications and training – across its member states and enables users in academia and industry to access what is vital for their research.

Challenge

The ELIXIR community initiated the development of the reference architecture for biomedical applications, called the ‘ELIXIR Compute Platform’. The platform is envisaged as a federation of compute and storage resources, operated according to community-agreed principles and with the use of centrally provided user access management services.

EGI & ELIXIR Collaboration

ELIXIR was one of the Competence Centres (CCs) in the EGI-Engage project. The CC had two main activities:

> Established a cloud infrastructure: the CC brought together four providers (EMBL-EBI, CNRS, GRNET, CESNET) into a federated cloud resource pool accessible to biomedical researchers with ELIXIR user IDs. The team simplified the way to federate clouds, performed a compatibility study with a US-based cloud system (JetStream) and integrated the Terraform orchestrator technology to simplify porting of scientific applications.

> Building on this cloud, the CC set up a series of scientific demonstrators to pilot its use. They were: META-Pipe (a metagenomics pipeline), cBioPortal, Marine Metagenomics, PhenoMeNal and Insyght Comparative Genomics.

Partners involved

The ELIXIR Competence Centre: CESNET, CSC, CNRS, EGI Foundation, EMBL-EBI, GRNET, SURFsara, University of Indiana

Website: https://www.elixir-europe.org/

Services used by ELIXIR: Operational Tools, Check-in, Cloud Compute

A federated cloud for scientific applications

MoBrain: Translational Research from Molecule to Brain

About MoBrain

The MoBrain Competence Centre (CC) has developed online portals for life scientists worldwide, by integrating structural biology and medical imaging services and data.

The MoBrain collaboration relies on the existing expertise available within the WeNMR and N4U projects and technology providers.

Challenge

MoBrain provides biomedical scientists with a portfolio of portals for the study of molecular forces and biomolecular interactions to improve drug design, treatments and diagnostics.

To do this, the MoBrain community needs advanced computational power to sustain the work of their portals.

EGI & MoBrain Collaboration

In 2016, MoBrain signed a Service Level Agreement (SLA) with seven EGI data centres, allowing the CC to use the High-Throughput Computing and Online Storage services needed to develop portals for life and brain scientists worldwide. In total, the providers pledged around 75 million hours of computing time and 50 Terabytes of storage capacity.

The MoBrain portals powered by EGI resources are: HADDOCK, DisVis, AMBER, CS-Rosetta, FANTEN and PowerFit.

To give an example, HADDOCK has so far processed more than 178,000 submissions from over 9,900 scientists, which translates into about 9 million High-Throughput Compute jobs per year on the EGI infrastructure.

Partners involved

MoBrain: University of Utrecht, CIRMMP-Florence, INFN-Padova and the consortium partners

EGI: CESNET-MetaCloud, EGI Foundation, INFN-Padova, NCG-INGRID-PT, NIKHEF, RAL-LCG2, SURFsara, TW-NCHC

Website: http://go.egi.eu/MoBrain

Services used by MoBrain: High-Throughput Compute, Online Storage

Compute resources to power online portals for biomedical researchers

ERIC: European Research Initiative on chronic lymphocytic leukemia

About ERIC

ERIC is a European organisation dedicated to improving the outcome of patients with chronic lymphocytic leukemia (CLL) and related diseases. As of November 2017, the initiative counts 907 members from 64 countries.

Challenge

The majority of published research is either hidden behind subscription walls or lost in the fog of nondescript information. The ERIC initiative wants to address this issue in the field of chronic lymphocytic leukemia by implementing a hub of accurate CLL-specific information within a federated environment and incorporating both data and computational services.

EGI & ERIC Collaboration

ERIC was selected as a use case of the second 'Design your e-infrastructure' Workshop in 2016. During the event, participants identified the need for a Virtual Research Environment (VRE) with a data repository (patient and clinical trial data) and the computational services to support it.

EGI is supporting this initiative and facilitates the allocation of cloud resources to integrate the CLL VRE and its applications into the EGI Cloud services. The VRE is made of a portal, a database of CLL datasets, analysis applications and a Galaxy workflow system. EGI and ERIC have completed the migration of the portal onto the EGI Federated Cloud. The portal has now been extended to support federated authentication mechanisms using the OIDC protocol.

Partners involved

ERIC

EGI: BEgrid-BELNET, EGI Foundation

Website: http://www.ericll.org/

Services used by ERIC: Cloud Compute, Online Storage

A dedicated VRE to facilitate open access to health related information

November 2017

This publication was written and prepared by the Communications and User Community Support Teams of the EGI Foundation. We would like to thank:

> all the researchers and teams of the Research Infrastructures and projects for their support.

> all the EGI service & resource providers for making the collaborations possible.

The content of this brochure is correct to the best of our knowledge as of November 2017. More information about EGI's support for Research Infrastructures is available on our website:

https://www.egi.eu/use-cases/research-infrastructures/

Design: EGI Communications Team

Cover design concept: Tom Jansen

Copyright: EGI-Engage Consortium, Creative Commons Attribution 4.0 International License.

The EGI-Engage project was co-funded for 30 months by the European Union (EU) Horizon 2020 program under grant number 654142, as a collaborative effort involving more than 70 institutions in over 30 countries.

The butterfly image on the cover: © Derek Ramsey / derekramsey.com (used with permission, via Wikimedia Commons). The other Wikimedia Commons images used in this publication are credited in the caption. Special thanks are due to all Wikimedia Commons photographers.

EGI Foundation | Science Park 140, 1098 XG Amsterdam | The Netherlands

+31 (0)20 89 32 007 | www.egi.eu